Design of a pick-and-place setup for a Franka robot manipulator using ROS 2 for visual-language model deployment

Supervisor: Prof. Ville Kyrki (ville.kyrki@aalto.fi)

Advisor: Dr. Almas Shintemirov (almas.shintemirov@aalto.fi)

Keywords: robotic manipulation, ROS, robot perception, visual-language model

Project Description

Visual-Language Models (VLMs) have recently demonstrated remarkable capabilities in natural language and image processing tasks and beyond [1]. With the growing number of proposed VLMs, the efficiency with which tasks can be completed through language commands on intelligent devices has been steadily improving. VLMs can also help robots understand natural language, aiding instruction following and collaborative problem solving. VLMs therefore have very promising applications in robotics research.

In this project, the student will work on developing an experimental robotic setup for research projects on VLM-based object manipulation. The setup will consist of a Franka robot manipulator, a 3D camera mounted on the robot gripper, and an external-view 3D camera. The Robot Operating System (ROS 2) [2] will be used as the hardware/software integration platform.
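
To illustrate the ROS 2 integration layer, a minimal sketch of a Python (rclpy) node that listens to both camera streams is given below. The topic names /wrist_camera/points and /external_camera/points are hypothetical placeholders; the actual names will depend on the camera drivers used in the final setup.

    # Minimal sketch of a ROS 2 node bridging the two 3D cameras.
    # Topic names are assumptions, not the actual driver topics.
    import rclpy
    from rclpy.node import Node
    from sensor_msgs.msg import PointCloud2

    class SetupBridge(Node):
        def __init__(self):
            super().__init__('setup_bridge')
            # Point cloud from the gripper-mounted 3D camera.
            self.create_subscription(
                PointCloud2, '/wrist_camera/points', self.on_wrist_cloud, 10)
            # Point cloud from the external-view 3D camera.
            self.create_subscription(
                PointCloud2, '/external_camera/points', self.on_external_cloud, 10)

        def on_wrist_cloud(self, msg: PointCloud2) -> None:
            self.get_logger().info(f'Wrist cloud: {msg.width * msg.height} points')

        def on_external_cloud(self, msg: PointCloud2) -> None:
            self.get_logger().info(f'External cloud: {msg.width * msg.height} points')

    def main():
        rclpy.init()
        rclpy.spin(SetupBridge())
        rclpy.shutdown()

    if __name__ == '__main__':
        main()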

Deliverables

The student will follow provided tutorials to implement a “pick-and-place” task, in which an object's position and orientation are estimated by processing 3D point-cloud data with ROS perception libraries and algorithms. The robot will be controlled using the ROS MoveIt 2 library.
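
The sketch below shows one plausible shape of the perception step: estimating an object position as the centroid of a segmented point cloud and publishing it as a grasp target for a downstream MoveIt 2 planner. The topic names /object_points and /grasp_pose are hypothetical, the segmentation is assumed to happen elsewhere, and the orientation is left as a placeholder.

    # Sketch of the perception step only, under the assumptions above;
    # orientation estimation and the MoveIt 2 planning call are omitted.
    import numpy as np
    import rclpy
    from rclpy.node import Node
    from sensor_msgs.msg import PointCloud2
    from sensor_msgs_py import point_cloud2
    from geometry_msgs.msg import PoseStamped

    class GraspPoseEstimator(Node):
        def __init__(self):
            super().__init__('grasp_pose_estimator')
            self.create_subscription(
                PointCloud2, '/object_points', self.on_cloud, 10)
            # A downstream MoveIt 2 planning node would consume this pose.
            self.pose_pub = self.create_publisher(PoseStamped, '/grasp_pose', 10)

        def on_cloud(self, cloud: PointCloud2) -> None:
            # Read x/y/z fields, dropping NaN returns from the depth sensor.
            points = np.array([
                (p[0], p[1], p[2])
                for p in point_cloud2.read_points(
                    cloud, field_names=('x', 'y', 'z'), skip_nans=True)
            ])
            if points.size == 0:
                return
            centroid = points.mean(axis=0)

            pose = PoseStamped()
            pose.header = cloud.header  # pose expressed in the camera frame
            pose.pose.position.x = float(centroid[0])
            pose.pose.position.y = float(centroid[1])
            pose.pose.position.z = float(centroid[2])
            pose.pose.orientation.w = 1.0  # placeholder: grasp orientation not derived here
            self.pose_pub.publish(pose)

    def main():
        rclpy.init()
        rclpy.spin(GraspPoseEstimator())
        rclpy.shutdown()

In practice, the grasp orientation would also be derived from the cloud (for example via principal component analysis of the object points), and the resulting pose would be handed to MoveIt 2 for motion planning.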

Prerequisites: C++, Python, GitHub, ROS (desired)

References:

  1. Google DeepMind, “RT-2: New model translates vision and language into action.” https://deepmind.google/discover/blog/rt-2-new-model-translates-vision-and-language-into-action/
  2. Robot Operating System (ROS). https://www.ros.org/

Start: Available immediately