Master's thesis on “Reinforcement learning with kernelized movement primitives”

Supervisor: Prof. Ville Kyrki
Advisors: Dr. Fares J. Abu-Dakka, Karol Arndt

Keywords: kernelized movement primitives, reinforcement learning, robot learning and manipulation, learning from demonstration.

Project description

Current reinforcement learning techniques for robotic tasks suffer from two main shortcomings: low sample efficiency and potentially unsafe exploration. Learning even a simple task from scratch requires a very large amount of training data, and during exploration the robot could, for example, hit a table or a wall at high velocity, damaging both itself and its environment.

To partially address these problems, robot learning often relies on skill models, i.e., models used to represent motions performed by the robot. In reinforcement learning, these models can be used to constrain and guide exploration. Dynamic movement primitives (DMP) and Gaussian mixture models (GMM) provide a feedback controller for a skill. Both have been widely used for robot skill modelling, and multiple extensions have been proposed to address different limitations of the basic methods.

Kernelized movement primitives (KMP) are a recent probabilistic formulation that focuses on the flexibility to adapt demonstrated trajectories to new situations. KMP allows the adapted trajectory to pass through new start-, end-, and via-points defined by the user, and the same adaptation process supports extrapolating a trajectory to new goals. Unlike DMP, KMP can handle high-dimensional inputs. The method has shown promising results in few-shot learning from demonstration.
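To make via-point adaptation concrete, the following is a minimal sketch of a KMP-style mean prediction, simplified to a scalar output over time with a squared-exponential kernel. The reference trajectory, the kernel length scale, the regularization value, and the helper names (rbf_kernel, kmp_mean, add_via_point) are illustrative assumptions, not the full formulation from [1].

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def kmp_mean(t_ref, mu_ref, var_ref, t_query, lam=1.0, length_scale=0.1):
    """Predicted mean k* (K + lam * Sigma)^-1 mu for a scalar output."""
    K = rbf_kernel(t_ref, t_ref, length_scale)
    k_star = rbf_kernel(t_query, t_ref, length_scale)
    weights = np.linalg.solve(K + lam * np.diag(var_ref), mu_ref)
    return k_star @ weights


def add_via_point(t_ref, mu_ref, var_ref, t_via, y_via, var_via=1e-6):
    """Append a desired point with a small variance so that it dominates locally."""
    return (
        np.append(t_ref, t_via),
        np.append(mu_ref, y_via),
        np.append(var_ref, var_via),
    )


# Assumed reference trajectory standing in for one learned from demonstrations.
t_ref = np.linspace(0.0, 1.0, 21)
mu_ref = np.sin(np.pi * t_ref)
var_ref = np.full_like(t_ref, 1e-2)

# Ask the adapted trajectory to pass through height 1.5 at t = 0.5.
t_via, y_via = 0.5, 1.5
t_new, mu_new, var_new = add_via_point(t_ref, mu_ref, var_ref, t_via, y_via)

original_at_via = kmp_mean(t_ref, mu_ref, var_ref, np.array([t_via]))[0]
adapted_at_via = kmp_mean(t_new, mu_new, var_new, np.array([t_via]))[0]
print(original_at_via, adapted_at_via)
```

The small variance assigned to the via-point acts as a high-precision constraint: the adapted mean is pulled onto the desired point, while the reference points with larger variance keep the rest of the trajectory close to the demonstration.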

The goal of this thesis is to integrate KMP with reinforcement learning so that the trajectory and goal are adapted automatically to optimize a desired task.
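One possible shape of such an integration, given only as a hedged sketch: treat a via-point as the policy parameter and improve it with an episodic, black-box update based on reward-weighted averaging, in the spirit of the methods surveyed in [3]. The toy cost, the target height, the hyperparameters, and the function names below are all assumptions made for illustration. In the thesis, the cost would come from rollouts in simulation or on the robot.

```python
import numpy as np


def kmp_mean_at(t_ref, mu_ref, var_ref, t_query, lam=1.0, length_scale=0.1):
    """Simplified scalar KMP-style mean prediction at the query time(s)."""
    def kernel(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

    weights = np.linalg.solve(kernel(t_ref, t_ref) + lam * np.diag(var_ref), mu_ref)
    return kernel(np.atleast_1d(t_query), t_ref) @ weights


def episode_cost(y_via, t_ref, mu_ref, var_ref, t_via=0.5, target=1.4):
    """Hypothetical task cost: squared error to a target height at t_via."""
    t = np.append(t_ref, t_via)
    mu = np.append(mu_ref, y_via)
    var = np.append(var_ref, 1e-6)  # small variance enforces the via-point
    reached = kmp_mean_at(t, mu, var, t_via)[0]
    return (reached - target) ** 2


def optimize_via_point(cost_fn, init_mean, init_std=0.3, n_updates=30,
                       n_samples=20, h=10.0, decay=0.9, seed=0):
    """Reward-weighted averaging over sampled via-point values."""
    rng = np.random.default_rng(seed)
    mean, std = init_mean, init_std
    for _ in range(n_updates):
        samples = rng.normal(mean, std, size=n_samples)
        costs = np.array([cost_fn(y) for y in samples])
        spread = costs.max() - costs.min()
        # Low-cost samples receive exponentially larger weights.
        weights = np.exp(-h * (costs - costs.min()) / (spread + 1e-12))
        mean = float(np.sum(weights * samples) / np.sum(weights))
        std *= decay  # shrink exploration over time
    return mean


# Assumed reference trajectory standing in for one learned from demonstrations.
t_ref = np.linspace(0.0, 1.0, 21)
mu_ref = np.sin(np.pi * t_ref)
var_ref = np.full_like(t_ref, 1e-2)


def cost(y):
    return episode_cost(y, t_ref, mu_ref, var_ref)


learned_via = optimize_via_point(cost, init_mean=1.0)
print(learned_via, cost(learned_via))
```

Exploring in the space of via-points rather than raw motor commands keeps every sampled trajectory close to the demonstration, which is one way a skill model can constrain exploration.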


  • Review the relevant state-of-the-art literature;
  • Integrate KMP with reinforcement learning for automatic adaptation;
  • Implement the relevant algorithms;
  • Select or design appropriate evaluation setups;
  • Evaluate the method on a physical robot.

Practical information

Prerequisites: Basics of reinforcement learning, Python, Linux.
Suggested tools: C++, MATLAB, or PyTorch; MuJoCo or V-REP; ROS.
Platform: Franka Panda robotic arm.
Start: Available immediately


References

[1] Yanlong Huang, Leonel Rozo, João Silvério and Darwin G. Caldwell, Kernelized Movement Primitives, International Journal of Robotics Research, 38(7), pp. 833-852, 2019.
[2] Yanlong Huang, Fares J. Abu-Dakka, João Silvério and Darwin G. Caldwell, Towards Orientation Learning and Adaptation in Cartesian Space, IEEE Transactions on Robotics, 2020.
[3] Freek Stulp and Olivier Sigaud, Policy Improvement Methods: Between Black-Box Optimization and Episodic Reinforcement Learning, JFPDA 2013.
[4] John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford and Oleg Klimov, Proximal Policy Optimization Algorithms, arXiv, 2017.