Master's thesis on “Visual Action Planning for Complex Object Manipulation”

Supervisor: Prof. Ville Kyrki (ville.kyrki(at)

Advisor: Dr. Almas Shintemirov (almas.shintemirov(at)

Keywords: robotic manipulation, deformable object manipulation

Figure 1 – Examples of visual action plans for a stacking task (top), a rope/box manipulation task (middle) and a shirt folding task (bottom) [1].

Project Description

Complex manipulation of rigid and deformable objects, such as stacking cubes to form a given figure or folding a T-shirt, requires a robot manipulator to plan a sequence of consecutive actions. Recent approaches tackle such tasks with either Reinforcement Learning or Supervised Learning. The thesis will follow the latter strategy and will focus on implementing and improving recently presented visual action planning methods for complex manipulation tasks, e.g. [1, 2]. NVIDIA Isaac Sim, a state-of-the-art robotics simulator offering photorealistic, physically accurate virtual environments to develop, test, and manage AI-based robots [3], will be used for synthetic data generation, deep learning model training, and action verification by deploying the generated motion planning scenarios on a simulated and a real Franka research robot. The thesis will also include exploring and implementing deep-learning-based robot visual servoing control approaches, e.g. [4].


  • Review of relevant literature.
  • Implementation of relevant robot motion planning approaches for rigid and/or deformable object manipulation in a robotic simulator.
  • Evaluation of simulation results.
  • Evaluation of the algorithms on a real robot.

Practical Information

Prerequisites: Python (high/medium), Machine Learning (medium), C++ (medium/beginner)

Tools: NVIDIA Omniverse Isaac Sim, PyTorch, Robot Operating System (ROS)

Start: Available immediately


References

  1. M. Lippi et al., Enabling Visual Action Planning for Object Manipulation Through Latent Space Roadmap, IEEE Transactions on Robotics, 2022.
  2. X. Ma, D. Hsu, W. S. Lee, Learning Latent Graph Dynamics for Visual Manipulation of Deformable Objects.
  4. Q. Bateux et al., Training Deep Neural Networks for Visual Servoing, IEEE International Conference on Robotics and Automation (ICRA), 2018.