Robotic Navigation with Natural Language Commands

Navigation in a scene graph.

Supervisor: Prof. Ville Kyrki (ville.kyrki@aalto.fi)

Advisor: Dr. Tsvetomila Mihaylova (tsvetomila.mihaylova@aalto.fi), Dr. Francesco Verdoja (francesco.verdoja@aalto.fi)

Keywords: robotic navigation, natural language processing

Project Description

Recent breakthroughs in natural language processing are enabling robots to understand human language like never before. Recent work proposes using the capabilities of large vision-language models to let humans command robots to navigate their environment and execute different tasks using an open vocabulary.

The main goal of this project is to add natural-language text-query processing to an existing robot navigation system. The component will be tested in simulation, as well as on a Hello Robot Stretch 2.

The currently implemented system allows creating maps and navigating the robot by manually selecting an object present on the map. The new component will accept any text query as input and extract the navigation target from it. Such queries can be “Move to the table”, where the navigation target is explicit, but also “Bring me something to drink”, where more complex reasoning is required. The text will be processed by querying a large language model (for example, GPT-4).
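To illustrate the idea (this is a minimal sketch, not the project's actual interface; the function names, prompt wording, and map format are all assumptions), target extraction can be framed as asking the language model to choose among the objects known to be on the map and parsing its structured reply. The actual model call is left out, since it depends on the chosen API:

```python
import json

# Objects present on the map (illustrative example).
KNOWN_OBJECTS = ["table", "chair", "sofa", "fridge"]


def build_prompt(query, known_objects):
    """Compose a prompt asking the model to pick a navigation target
    from the objects available on the map."""
    return (
        "You control a robot. The map contains these objects: "
        + ", ".join(known_objects)
        + '. Reply with JSON {"target": <object>} naming the object the '
        + "robot should navigate to for this command: " + query
    )


def parse_target(reply, known_objects):
    """Extract the target object from the model's JSON reply,
    returning None if it names an object not on the map."""
    target = json.loads(reply).get("target")
    return target if target in known_objects else None


# In the real system build_prompt's output would be sent to an LLM
# (e.g. GPT-4); here a canned reply stands in for the model.
prompt = build_prompt("Bring me something to drink", KNOWN_OBJECTS)
canned_reply = '{"target": "fridge"}'
print(parse_target(canned_reply, KNOWN_OBJECTS))  # fridge
```

Constraining the model to the map's vocabulary and to a JSON reply keeps the output machine-checkable, which matters when the answer drives a physical robot.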

Depending on the interests of the student, additional capabilities can be added, for example processing spoken commands, disambiguation (e.g., the system can ask for clarification: “Two chairs are found, which one should I go to?”), or handling of constraints (e.g., “Bring me a coffee, but avoid the living room.”).
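As a sketch of the disambiguation extension (a hypothetical helper; the map format and function name are illustrative, not part of the existing system), the resolver can return either a single navigation goal or a clarification question when the requested label matches several map objects:

```python
def resolve_target(label, map_objects):
    """Resolve a requested object label against the objects on the map.

    Returns (goal, question): either one map object to navigate to, or a
    clarification question when the label is missing or ambiguous.
    """
    matches = [obj for obj in map_objects if obj["label"] == label]
    if not matches:
        return None, f"I cannot find a {label} on the map."
    if len(matches) > 1:
        return None, f"{len(matches)} {label}s are found, which one should I go to?"
    return matches[0], None


# Example map with two chairs: the robot must ask for clarification.
objects = [
    {"label": "chair", "position": (1.0, 2.0)},
    {"label": "chair", "position": (4.0, 0.5)},
    {"label": "table", "position": (3.0, 3.0)},
]
goal, question = resolve_target("chair", objects)
print(question)  # 2 chairs are found, which one should I go to?
```

A unique label (“table” above) yields a goal directly; the ambiguous case routes the question back to the user instead of guessing.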

The project offers an opportunity to gain experience in machine learning and familiarity with large (vision-)language models and robotic navigation.

Deliverables

  • Implementation of a ROS component that processes a text query and returns an object to navigate to.
  • Integration of the component in an existing system for navigation using Hello Robot Stretch 2.
  • Depending on the student's interests, the component can be extended with speech processing, specification of navigation constraints, or disambiguation handling.

Practical Information

Prerequisites: ROS, machine learning; some familiarity with large language models or robotic navigation is useful

Programming languages: Python

Start: Available immediately

References

  • Hydra: A Real-time Spatial Perception System for 3D Scene Graph Construction and Optimization, https://www.roboticsproceedings.org/rss18/p050.pdf
  • Speech-to-speech pipeline, https://github.com/huggingface/speech-to-speech