Preventing Mode Collapse When Imitating Latent Policies From Observation
Speaker: Oliver Struckmeier
Robotics Seminar Series. Next Session – 25th November 2022, 13:00-14:00, on-site (Maarintie 8, 1593) and via Zoom. Link to event: https://aalto.zoom.us/j/62124942899
Imitation from observations only (ILfO) is an extension of the classic imitation learning setting to cases where expert observations are easy to obtain but no expert actions are available. Most existing ILfO methods require either access to task-specific cost functions or large amounts of interaction with the target environment. Learning a forward dynamics model in combination with a latent policy has been shown to solve these issues. However, the limited supervision in the ILfO scenario can lead to mode collapse when learning the generative forward model and the corresponding latent policy. In this paper, we analyse the mode collapse problem and show that it can occur whenever the expert is deterministic, and may also occur due to bad initialization of the models. Under the assumption of piecewise continuous system dynamics, we propose a method that prevents mode collapse by clustering expert transitions and using the clusters to pre-train the generative model and the latent policy. We show that the resulting method prevents mode collapse and improves performance in five different OpenAI Gym environments.
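To illustrate the clustering idea mentioned in the abstract, here is a minimal sketch rather than the speaker's actual method. It assumes that expert transitions are clustered with plain k-means on the state differences (next state minus state), and that the resulting cluster indices act as pseudo latent-action labels for supervised pre-training. The function name `cluster_transitions`, the choice of k-means, and the use of state differences as features are all assumptions made for this example.

```python
import numpy as np


def cluster_transitions(states, next_states, n_latent_actions, n_iters=50, seed=0):
    """Hypothetical helper: k-means over state deltas (next_state - state).

    Returns one pseudo latent-action label per transition and the cluster
    centroids. This is an illustrative assumption, not the published method.
    """
    rng = np.random.default_rng(seed)
    deltas = np.asarray(next_states, dtype=float) - np.asarray(states, dtype=float)

    # Initialise centroids from distinct, randomly chosen transitions.
    init_idx = rng.choice(len(deltas), size=n_latent_actions, replace=False)
    centroids = deltas[init_idx].copy()

    for _ in range(n_iters):
        # Squared distance from every delta to every centroid: shape (N, K).
        dists = ((deltas[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_latent_actions):
            members = deltas[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)

    # Recompute labels so they are consistent with the final centroids.
    dists = ((deltas[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)
    return labels, centroids
```

Under these assumptions, the labels could be used as classification targets for the latent policy, and the centroids as initial per-action predictions for the generative forward model, so that each latent action starts out covering a distinct mode of the expert data.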