Master's Thesis on “Controlling a Robotic Arm with Instructions in Natural Language”

Supervisor: Prof. Ville Kyrki (ville.kyrki@aalto.fi)

Advisors: Dr. Tsvetomila Mihaylova (tsvetomila.mihaylova@aalto.fi), Tran Nguyen Le (tran.nguyenle@aalto.fi)

Keywords: robotic manipulation, natural language processing

As robots become more capable and more integrated into society, there is a growing need for people-friendly communication between humans and robots. A natural way of giving instructions to an autonomous system is through language. A recent direction of research is the integration of large pretrained vision-language models into robotic control, where a human gives a robot instructions in natural language and the robot performs the corresponding manipulation task.

Project Description

The goal of this master's thesis is to integrate a large vision-language model (VLM) with a manipulation policy in order to control a robotic arm for predefined manipulation tasks, such as grasping or pushing.
The thesis includes reviewing the latest work on the topic, selecting suitable datasets, a VLM, and a manipulation model, and configuring an experimental setup. Initially, the experiments will be executed in a simulator. An additional option is to run the experiments on a real robotic arm (Franka Emika Panda) and identify the challenges in transferring from the simulation to the real robot.
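
To make the intended integration concrete, the sketch below shows how a natural-language instruction could flow from a VLM through a manipulation policy to the robot. This is only an illustrative skeleton: all class and method names (VisionLanguageModel, ManipulationPolicy, robot.apply_action, etc.) are placeholders for the models and simulator interface to be selected during the thesis, not an existing API.

    # Hypothetical pipeline sketch: instruction -> VLM -> policy -> robot.
    # The classes below are placeholders for the components the thesis would select and integrate.

    class VisionLanguageModel:
        """Stand-in for a large pretrained VLM that grounds instructions in images."""
        def encode(self, image, instruction: str):
            # Would return a task representation, e.g. a goal embedding or a target object pose.
            raise NotImplementedError

    class ManipulationPolicy:
        """Stand-in for a learned or scripted manipulation policy (grasping, pushing, ...)."""
        def act(self, observation, task_representation):
            # Would return a low-level action, e.g. joint targets or end-effector velocities.
            raise NotImplementedError

    def run_episode(robot, camera, vlm, policy, instruction, max_steps=200):
        """Close the perception-grounding-action loop for one instruction."""
        for _ in range(max_steps):
            image = camera.capture()               # current RGB observation
            task = vlm.encode(image, instruction)  # ground the instruction in the scene
            state = robot.get_state()              # proprioceptive state (joints, gripper)
            action = policy.act(state, task)       # compute the next action
            robot.apply_action(action)             # execute in simulation or on hardware
            if robot.task_done():
                break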

Deliverables

  • Literature review on the use of vision-language models for different manipulation tasks, and selection of a task to focus on based on the review
  • Integration of a vision-language model with a manipulation policy for the selected task
  • Execution of experiments in a simulation environment
  • Execution of the experiments on a real robotic arm and identification of the gaps between the simulation and the real-world implementation

Practical Information

Prerequisites: Python, PyTorch, experience with robotic manipulation and machine learning

Simulators to be used: Isaac Sim
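
As a starting point for the simulation setup, a minimal Isaac Sim standalone script that loads a Franka Panda and steps the physics could look roughly like the sketch below. It assumes the omni.isaac.core and omni.isaac.franka extensions shipped with recent Isaac Sim releases; exact module paths and APIs may differ between versions.

    # Minimal Isaac Sim standalone sketch (module paths and APIs may vary by version).
    from omni.isaac.kit import SimulationApp
    simulation_app = SimulationApp({"headless": True})  # must be created before other Isaac imports

    from omni.isaac.core import World
    from omni.isaac.franka import Franka

    world = World()
    world.scene.add_default_ground_plane()
    franka = world.scene.add(Franka(prim_path="/World/Franka", name="franka"))
    world.reset()

    for _ in range(200):
        # A manipulation policy would compute and apply joint/gripper actions here.
        world.step(render=False)

    simulation_app.close()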

Start: Available immediately
