Opening Articulated Objects
in the Real World
University of Illinois at Urbana-Champaign
Enabling mobile manipulators to perform everyday tasks in unfamiliar environments remains a fundamental challenge in robotics. In this work, we tackle the task of opening previously unseen articulated objects (cabinets, drawers, ovens) in diverse, real-world settings as a testbed, and present MOSART (a MOdular System for opening ARTiculated objects). MOSART integrates state-of-the-art perception, planning, and proprioceptive adaptation into an end-to-end system that estimates articulation parameters, computes a whole-body motion plan, navigates to the object, secures a grasp, and opens the object without any privileged information about the object or environment. Through large-scale tests at 13 sites spanning 31 unique objects, all entirely unseen during development, we demonstrate that our modular design convincingly outperforms a recent end-to-end imitation learning method, Robot Utility Models (RUM), despite RUM being trained on 1,200 demonstrations for cabinets and 525 for drawers. Our analysis further reveals that state-of-the-art articulation prediction models from the literature struggle when tested on real-world, robot-centric viewpoints, prompting us to develop a specialized model that yields substantially more accurate estimates. Surprisingly, our study also reveals that perception, not precise end-effector control, is the primary bottleneck to task success, accounting for the majority of failures, even as last-centimeter grasping remains challenging. These findings expose the limitations of developing pipeline components in isolation, underscore the need for system-level research, and provide a pragmatic roadmap for building generalizable mobile manipulation systems.