Master's Thesis on “Natural Language Commands for Robotic Navigation”
Supervisor: Prof. Ville Kyrki (ville.kyrki@aalto.fi)
Advisors: Dr. Tsvetomila Mihaylova (tsvetomila.mihaylova@aalto.fi), Dr. Francesco Verdoja (francesco.verdoja@aalto.fi)
Keywords: robotic navigation, natural language processing
Project Description
Recent breakthroughs in natural language processing have greatly improved the ability of robots to interpret human language. Recent work proposes using large visual-language models to let humans command robots, with an open vocabulary, to navigate an environment and execute different tasks.
The thesis will address the problem of translating human language into executable robot skills. It will aim to answer the following research questions: What methods exist for translating natural-language text into executable robotic skills? At what level are robotic skills currently represented? At which levels of skill specification can navigation be addressed?
Based on the answers to these questions, the goal is to develop a prototype of a software component that serves as the entry point to a system for open-vocabulary robotic navigation. The component should be able to process any given text and determine whether it contains a navigation command. If a navigation command is found, it should identify and return a list of pre-defined skills to execute, or objects to navigate to, taking into account the constraints specified in the command. The processor should be flexible enough to work with an arbitrary list of robotic skills and objects.
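To make the intended input and output concrete, the following is a minimal sketch of what such an entry point could look like. Everything in it is an assumption made for illustration: the names `Skill`, `ParsedCommand` and `process_query`, the trigger phrases, and the single "avoid" constraint cue are hypothetical, and the keyword matching is only a stand-in for whatever classifier or language model the thesis would actually develop.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical skill description: a name plus phrases that hint at it."""
    name: str
    triggers: tuple[str, ...]


@dataclass
class ParsedCommand:
    """Hypothetical result returned by the query processor."""
    is_navigation: bool
    skills: list[str] = field(default_factory=list)
    targets: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)


def _mentions(phrase: str, text: str) -> bool:
    """True if `phrase` occurs in `text` as a whole word or word sequence."""
    return re.search(rf"\b{re.escape(phrase.lower())}\b", text) is not None


def process_query(text: str, skills: list[Skill], objects: list[str]) -> ParsedCommand:
    """Placeholder processor: keyword matching instead of a learned model."""
    lowered = text.lower()

    # A skill is selected if any of its trigger phrases appears in the text.
    matched = [s.name for s in skills if any(_mentions(t, lowered) for t in s.triggers)]

    # Assumed constraint cue: "avoid <object>" or "avoiding the <object>".
    avoided = [
        o for o in objects
        if re.search(rf"\bavoid(?:ing)?\s+(?:the\s+)?{re.escape(o.lower())}\b", lowered)
    ]

    # Remaining mentioned objects are treated as navigation targets.
    targets = [o for o in objects if _mentions(o, lowered) and o not in avoided]

    return ParsedCommand(
        is_navigation=bool(matched),
        skills=matched,
        targets=targets,
        constraints=[f"avoid: {o}" for o in avoided],
    )


if __name__ == "__main__":
    skills = [Skill("navigate_to", ("go to", "navigate to", "drive to"))]
    objects = ["kitchen table", "sofa", "door"]
    print(process_query("Please go to the kitchen table but avoid the sofa.", skills, objects))
    print(process_query("What is the weather today?", skills, objects))
```

The skill and object lists are passed in as arguments rather than hard-coded, which mirrors the requirement that the processor handle arbitrary lists of skills and objects.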
Optionally, the system could be tested on a real robot, Hello Robot Stretch 2, using maps obtained by the robot.
The project offers an opportunity to gain experience in machine learning and to become familiar with large (visual-)language models and robotic navigation.
Deliverables
- Literature review of commonly used navigation commands.
- Specification of how abstract robotic skills can be defined and passed to the query processor.
- A text classifier for open-vocabulary queries.
Practical Information
Prerequisites: Machine Learning; some familiarity with large language models or robotic navigation is useful
Programming languages: Python
Start: Available immediately
References
- CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation, https://cow.cs.columbia.edu/
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning, https://concept-graphs.github.io/
- Open-vocabulary Queryable Scene Representations for Real World Planning, https://nlmap-saycan.github.io/
- LM-Nav: Robotic Navigation with Large Pre-Trained Models, https://sites.google.com/view/lmnav
- Visual Language Maps for Robot Navigation, https://vlmaps.github.io/
- Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation, https://hovsg.github.io/
- OpenScene: 3D Scene Understanding with Open Vocabularies, https://pengsongyou.github.io/openscene