Benchmark on deep reinforcement learning-based placing using a robot arm

Conference: ISR Europe 2023 - 56th International Symposium on Robotics
26.09.2023-27.09.2023 in Stuttgart, Germany

Proceedings: ISR Europe 2023

Pages: 7 | Language: English | Type: PDF

Authors:
Kernbach, Andreas (Institute of Industrial Manufacturing and Management IFF, University of Stuttgart & Department Cyber Cognitive Intelligence CCI, Fraunhofer Institute for Manufacturing Engineering and Automation IPA, Germany & University of Stuttgart, Institute for System Dynamics, Germany)
Hoffmann, Kathrin; Sawodny, Oliver (University of Stuttgart, Institute for System Dynamics, Germany)
Eivazi, Shahram (Festo SE & Co. KG and University of Tübingen, Germany)

Abstract:
Deep reinforcement learning (DRL) is an approach by which an agent can learn to solve complex sequential tasks in a trial-and-error manner. As such, it has been widely explored in the robotics domain for solving manipulation tasks. In this paper, we benchmark DRL approaches utilizing pure image-based observations and simulation-based numerical ground-truth (oracle) observations. The underlying benchmark task is inspired by an industrial application in which a robotic arm is required to position up to nine boxes of random sizes into a container with variable pose. Central to the comparison are the robustness of the task execution and the duration of DRL training. We first simulate the task based on a real-world logistics setup and parameters for a vision-based reinforcement learning agent that operates in real time. Second, we compare DRL algorithms and observation types with policies that decide the desired end-effector pose of the to-be-placed box. Our results show that the Proximal Policy Optimization (PPO) algorithm achieves the highest performance with satisfactory robustness and learning time. Moreover, we successfully perform the placing task relying only on image-based input, in which the robot arm robustly places 8.8 (+0.2/−0.61) of nine objects on average into the container, increasing the required learning time by 233% compared to oracle observations.
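
The paper's simulation and training code are not reproduced here, but the benchmark setup it describes maps naturally onto off-the-shelf RL tooling. The following is a minimal sketch, assuming a Gymnasium-style environment and Stable-Baselines3's PPO; the environment class `BoxPlacingEnv`, its pose-command action parametrization, and the +1-per-box reward are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class BoxPlacingEnv(gym.Env):
    """Toy stand-in for the paper's (non-public) placing simulation.

    Observation: an RGB camera image of the workspace (image-based variant).
    Action: a normalized desired end-effector pose for the to-be-placed box;
    the 4-DoF (x, y, z, yaw) parametrization is an assumption.
    """

    def __init__(self, n_boxes=9, img_size=64):
        super().__init__()
        self.n_boxes = n_boxes
        self.observation_space = spaces.Box(
            low=0, high=255, shape=(img_size, img_size, 3), dtype=np.uint8)
        self.action_space = spaces.Box(
            low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self._placed = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._placed = 0
        return self._render_obs(), {}

    def step(self, action):
        # Placeholder dynamics: a real simulator would execute the pose
        # command and check whether the box lands inside the container.
        self._placed += 1
        reward = 1.0  # assumed: +1 per placed box, so the episode return
        # corresponds to the "objects placed out of nine" metric.
        terminated = self._placed >= self.n_boxes
        return self._render_obs(), reward, terminated, False, {}

    def _render_obs(self):
        # Dummy camera image; a real setup would render the scene.
        return self.np_random.integers(
            0, 256, self.observation_space.shape, dtype=np.uint8)


# "CnnPolicy" consumes the image observations used in the image-based variant.
model = PPO("CnnPolicy", BoxPlacingEnv(), verbose=1)
model.learn(total_timesteps=10_000)
```

Swapping `"CnnPolicy"` for `"MlpPolicy"` together with a low-dimensional numerical observation space would correspond to the oracle-observation variant that the paper compares against, which is where the reported 233% difference in learning time arises.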