Master's Thesis on “Fine-tuning Open-source LLMs for Processing Open-vocabulary Commands for Robotic Navigation”
Supervisor: Prof. Ville Kyrki (ville.kyrki@aalto.fi)
Advisors: Dr. Tsvetomila Mihaylova (tsvetomila.mihaylova@aalto.fi), Dr. Francesco Verdoja (francesco.verdoja@aalto.fi)
Keywords: robotic navigation, natural language processing
Project Description
Recent breakthroughs in natural language processing are enabling robots to understand human language far better than before. Recent work proposes augmenting robotic maps with embeddings from visual-language models, which gives the robot a richer understanding of its environment and makes it possible to command robots using open vocabulary.
Most systems that perform open-vocabulary navigation use closed LLMs accessible via an API. If such systems are to be adopted in practical robotic applications, it is important that information is not sent to third-party systems. Many open-source LLMs are now available for download. They usually perform much worse on general language understanding, but they can be fine-tuned to specialize in specific tasks.
The goal of the project is to evaluate fine-tuning of open-source models for processing natural language commands for robotic navigation. The thesis will explore the following research questions: What is necessary for fine-tuning models for robotic navigation? Is fine-tuning helpful in this context? How does the performance of fine-tuned open-source models compare to that of third-party closed models? The hope is that, with enough carefully crafted data, smaller open-source models can reach competitive performance in this specific context. The project will include collecting or generating data for the task.
Optionally, the resulting models could be tested on a real robot, the Hello Robot Stretch 2, using maps obtained by the robot.
The project offers an opportunity to gain experience in machine learning and to become familiar with large (visual-)language models and robotic navigation.
Deliverables
- Literature review of commonly used navigation commands.
- Preparation of a dataset for open-vocabulary navigation.
- Fine-tuning of several open-source LLMs.
- Evaluation of the performance of the fine-tuned models and comparison with closed models.
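As a rough illustration of the dataset and evaluation deliverables, the sketch below pairs free-form commands with a navigation target and scores a predictor by exact-match accuracy. Everything in it is an assumption made for illustration: the record format (`NavExample`), the metric, the toy commands, and the stand-in `dummy_model` are hypothetical, and the actual data format and metrics are left for the thesis to define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NavExample:
    """One hypothetical dataset record: a free-form command and its target."""
    command: str  # natural-language instruction given to the robot
    target: str   # object or place the robot should navigate to


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(examples, predict) -> float:
    """Fraction of examples where predict(command) equals the reference target."""
    if not examples:
        return 0.0
    hits = sum(
        normalize(predict(ex.command)) == normalize(ex.target) for ex in examples
    )
    return hits / len(examples)


# Toy data and a stand-in "model"; both are invented for illustration only.
EXAMPLES = [
    NavExample("Go to the kitchen sink", "kitchen sink"),
    NavExample("Please drive over to the red sofa", "red sofa"),
]


def dummy_model(command: str) -> str:
    """Placeholder predictor: returns the last two words of the command."""
    return " ".join(command.split()[-2:])


if __name__ == "__main__":
    print(exact_match_accuracy(EXAMPLES, dummy_model))
```

In such a setup, the same `exact_match_accuracy` call could be applied to a fine-tuned open-source model and to a closed model by swapping the `predict` function, so both are scored on identical data.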
Practical Information
Prerequisites: Machine learning; some familiarity with large language models or robotic navigation is useful
Programming languages: Python
Start: Available immediately
References
- CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation, https://cow.cs.columbia.edu/
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning, https://concept-graphs.github.io/
- Open-vocabulary Queryable Scene Representations for Real World Planning, https://nlmap-saycan.github.io/
- LM-Nav: Robotic Navigation with Large Pre-Trained Models, https://sites.google.com/view/lmnav
- Visual Language Maps for Robot Navigation, https://vlmaps.github.io/
- Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation, https://hovsg.github.io/
- OpenScene: 3D Scene Understanding with Open Vocabularies, https://pengsongyou.github.io/openscene