Master's thesis on “Efficient learning of task-specific trajectories from demonstrations”

Supervisor: Prof. Ville Kyrki (ville.kyrki@aalto.fi)
Advisors: Dr. Fares J. Abu-Dakka (fares.abu-dakka@aalto.fi), Karol Arndt (karol.arndt@aalto.fi)

Keywords: robot manipulation, learning from demonstration, reinforcement learning, Gaussian mixture model/regression.

Project description

Reinforcement learning (RL) revolves around the idea of learning skills by trial and error. This requires the learning agent to take exploratory actions in order to discover how the environment behaves and, eventually, learn to perform the task successfully. This exploration, however, is often random and uninformed, which may be unsafe both for the robot and for the surrounding environment.

To address these problems, safe and targeted exploration can be facilitated by providing demonstrations of the task and using RL to train a model that generalizes these demonstrations to new conditions. One way to do this is to use the demonstrations to learn a low-dimensional representation of useful trajectories. Such representation learning methods, however, typically require hundreds or thousands of trajectories. In some cases these trajectories can be programmed explicitly by an expert or produced by a path planning algorithm, but this approach is limited to simple tasks and requires tedious engineering. This reliance on large amounts of data makes it effectively impossible to learn useful representations directly from a few human demonstrations.
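
To make the idea of a learned low-dimensional trajectory representation concrete, the sketch below compresses demonstrated trajectories into a small latent code with a plain autoencoder in PyTorch. The architecture, latent dimension, and training details are illustrative assumptions only, not part of the project description; training such a model reliably is exactly where a large number of trajectories would normally be needed.

    # A minimal sketch of a low-dimensional trajectory representation learned with
    # an autoencoder in PyTorch; the architecture, dimensions, and names below are
    # illustrative assumptions, not a prescribed design.
    import torch
    import torch.nn as nn

    class TrajectoryAutoencoder(nn.Module):
        """Compress trajectories, flattened to vectors of length T * D, into a small latent code."""
        def __init__(self, traj_dim, latent_dim=8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(traj_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, traj_dim))

        def forward(self, x):
            z = self.encoder(x)
            return self.decoder(z), z

    def train(model, trajectories, epochs=200, lr=1e-3):
        """Reconstruction training on a batch of flattened trajectories, shape (N, T * D)."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            recon, _ = model(trajectories)
            loss = nn.functional.mse_loss(recon, trajectories)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return model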

A possible solution to this problem lies in combining the expressive power of neural networks with the data efficiency of probabilistic models, such as Gaussian mixture regression (GMR), to learn the general structure of such trajectories and then use the resulting models to generate additional demonstrations.
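
As a rough illustration of this idea, the sketch below fits a Gaussian mixture model over (time, position) pairs pooled from a handful of demonstrations and then uses Gaussian mixture regression to regenerate smooth trajectories, which could serve as additional training data. The use of scikit-learn and SciPy, the toy 2-D demonstrations, and the function names are illustrative assumptions rather than a prescribed implementation.

    # A minimal GMR sketch; the demonstration format (arrays of shape (T, D)),
    # the use of scikit-learn/SciPy, and all hyperparameters are assumptions
    # made for illustration only.
    import numpy as np
    from scipy.stats import norm
    from sklearn.mixture import GaussianMixture

    def fit_gmm(demos, n_components=5):
        """Fit a joint GMM over (time, position) pairs pooled from a few demonstrations."""
        data = np.vstack([
            np.column_stack([np.linspace(0.0, 1.0, len(d)), d]) for d in demos
        ])
        return GaussianMixture(n_components=n_components, covariance_type="full").fit(data)

    def gmr(gmm, t_query):
        """Gaussian mixture regression: condition the joint GMM on time to get positions."""
        means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
        out = []
        for t in t_query:
            # responsibility of each component at this time step
            h = np.array([w * norm.pdf(t, m[0], np.sqrt(c[0, 0]))
                          for w, m, c in zip(weights, means, covs)])
            h /= h.sum()
            # conditional mean of the position dimensions given t, per component
            cond = [m[1:] + c[1:, 0] / c[0, 0] * (t - m[0])
                    for m, c in zip(means, covs)]
            out.append(np.einsum("k,kd->d", h, np.array(cond)))
        return np.array(out)

    # Example: three noisy demonstrations of a 2-D motion, then 200 regenerated points.
    t = np.linspace(0.0, 1.0, 100)
    demos = [np.column_stack([t, np.sin(np.pi * t)]) + 0.01 * np.random.randn(100, 2)
             for _ in range(3)]
    new_trajectory = gmr(fit_gmm(demos), np.linspace(0.0, 1.0, 200))  # shape (200, 2)

Querying the model at different time resolutions, or perturbing the conditional means within the learned covariances, would yield trajectory variations rather than a single mean trajectory.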

The goal of this thesis is to increase the efficiency of reinforcement learning when only a limited number of examples is available, by developing a method that obtains a large number of task-specific trajectories from a few demonstrations.

Deliverables

  • Review of relevant state-of-the-art literature;
  • Design of a data-efficient trajectory generation method;
  • Selection/design of an experimental setup;
  • Training of relevant trajectory and policy models.

Practical information

Prerequisites: Basics of machine learning, Python, Linux.
Suggested tools: PyTorch, MuJoCo, ROS.
Platform: Franka Panda robotic arm.
Start: Available immediately.

References

[1] Sylvain Calinon, A Tutorial on Task-Parameterized Movement Learning and Retrieval, Intelligent Service Robotics, 2016.
[2] Aleksi Hämäläinen, Karol Arndt, Ali Ghadirzadeh and Ville Kyrki, Affordance Learning for End-to-End Visuomotor Robot Control, International Conference on Intelligent Robots and Systems, 2019.