Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals.
We present a large-scale empirical study of semantic visual navigation methods, comparing representative classical, modular, and end-to-end learning approaches across six homes with no prior experience, maps, or instrumentation.
For researchers, we identify two key issues that prevent today’s simulators from being reliable evaluation benchmarks, (a) a large sim-to-real gap in images and (b) a disconnect between simulation and real-world error modes, and propose concrete steps forward.
For practitioners, we show that modular learning is a reliable approach to navigate to objects due to modularity and abstraction in policy design.
In this work, we tackle the object goal navigation task where a robot is asked to find an object belonging to a particular category, like a bed or a couch, in a completely unseen environment.
Humans can navigate in unseen environments effortlessly.
For example, when looking for a glass of water at a friend’s house we’re visiting for the first time, we can easily find the kitchen without going to bedrooms or storage closets.
We can draw on our experience in prior environments to explore a new environment and find any target object efficiently.
Learning such semantic priors is essential to deploy autonomous mobile robots in uncontrolled environments like our homes, schools, and hospitals.
Following recent advances in machine learning and computer vision, there has been considerable interest in designing learning-based policies for visual navigation that are capable of learning these semantic priors.
Most relevant to us, we propose end-to-end policies to navigate to object goals.
Result
We consider the problem of semantic navigation instantiated by the object goal task, which requires spatial scene understanding (obstacle and navigable space detection), semantic scene understanding (object detection), learning semantic priors (for efficient exploration), and episodic memory (keeping track of explored and unexplored areas).
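To make the roles of these components concrete, the following is a minimal sketch, not the implementation of any method evaluated here. It assumes a toy 2D grid as episodic memory, with hypothetical names (`EpisodicMap`, `next_target`) and cell states chosen for illustration. In place of learned semantic priors, it uses a plain nearest-frontier heuristic: go to the goal object if it has been seen, otherwise to the closest free cell that borders unexplored space.

```python
from dataclasses import dataclass, field

# Hypothetical cell states for a top-down grid (illustrative, not from the paper).
UNKNOWN, FREE, OBSTACLE = 0, 1, 2


@dataclass
class EpisodicMap:
    """Toy episodic memory: an occupancy grid plus per-cell object labels."""
    height: int
    width: int
    cells: list = field(init=False)
    objects: dict = field(default_factory=dict)  # (row, col) -> category name

    def __post_init__(self):
        # Every cell starts unexplored.
        self.cells = [[UNKNOWN] * self.width for _ in range(self.height)]

    def observe(self, row, col, state, category=None):
        """Record one observation: occupancy (spatial) and optional label (semantic)."""
        self.cells[row][col] = state
        if category is not None:
            self.objects[(row, col)] = category

    def goal_cells(self, category):
        """Cells where an object of the requested category has been detected."""
        return [pos for pos, cat in self.objects.items() if cat == category]

    def frontiers(self):
        """Free cells that have at least one unexplored 4-neighbour."""
        found = []
        for r in range(self.height):
            for c in range(self.width):
                if self.cells[r][c] != FREE:
                    continue
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    in_bounds = 0 <= nr < self.height and 0 <= nc < self.width
                    if in_bounds and self.cells[nr][nc] == UNKNOWN:
                        found.append((r, c))
                        break
        return found


def next_target(memory, robot, category):
    """Pick the goal cell if the object was seen, else the nearest frontier.

    Distance is Manhattan distance on the grid; a learned policy would
    replace this heuristic with semantic priors about where objects occur.
    """
    candidates = memory.goal_cells(category) or memory.frontiers()
    if not candidates:
        return None  # nothing left to explore and no goal detected
    return min(candidates, key=lambda p: abs(p[0] - robot[0]) + abs(p[1] - robot[1]))
```

For example, on a 3x3 map where only the centre cell is known to be free, `next_target` returns that cell as the frontier to explore from; once a cell is labelled `"couch"`, the same call returns the labelled cell instead.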
We evaluate a policy representative of each of the classical, end-to-end learning, and modular learning approaches to object goal navigation on a Hello Robot Stretch robot.