Shaohui Liu
ETH Zurich

Simultaneous localization and mapping (SLAM) is a fundamental technique with applications spanning robotics, spatial AI, and autonomous navigation. It addresses two tightly coupled challenges: localizing the device while incrementally building a coherent map of its surroundings. Localization, or positioning, involves estimating a 6 Degrees-of-Freedom (6-DoF) pose for each image in a continuous sequence, typically aided by other sensor data, while mapping involves constructing an evolving representation of the surrounding environment. The two tasks reinforce each other: accurate localization helps the device track its movement and improves the map's quality, while a better map in turn refines the device's pose estimates. Positioning is also crucial in real-world applications: it ensures the persistence of digital content and enables seamless sharing across devices, which is especially important for augmented reality, where precise placement enhances user experience and interaction.
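To make the notion of a 6-DoF pose concrete, the following is a minimal sketch (not part of any particular SLAM system; helper names are illustrative) of representing a pose as a 4x4 rigid transform in homogeneous coordinates, with 3 rotational and 3 translational degrees of freedom:

```python
import numpy as np

def pose_matrix(R, t):
    """Assemble a 4x4 rigid transform (6-DoF pose) from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_z(theta):
    """Rotation about the z-axis by theta radians (3 of the 6 DoF come from rotations like this)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical camera pose in the world frame: 90-degree yaw, 1 m along x.
T_wc = pose_matrix(rot_z(np.pi / 2), np.array([1.0, 0.0, 0.0]))

# A point observed in the camera frame maps to world coordinates through the pose.
p_c = np.array([0.0, 1.0, 0.0, 1.0])  # homogeneous coordinates
p_w = T_wc @ p_c

# Relative motion between frames composes by matrix multiplication,
# which is how a pose is propagated along an image sequence:
# T_w_next = T_wc @ T_c_next
```

Estimating `T_wc` for every image in the stream, while jointly refining the map points it is measured against, is exactly the coupled problem SLAM solves.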
Recent advancements in mobile computing have fueled the development of wearable devices equipped with multiple color or depth cameras, inertial measurement units, and GPS. These devices capture egocentric, multi-modal data that pose challenges often overlooked by traditional SLAM research, which typically relies on curated datasets featuring controlled viewpoints and restricted motion patterns. In contrast, egocentric data exhibits significantly more diversity in motion patterns, viewpoints, and environments. These devices aspire to be all-day wearables that capture data over extended durations, during which factors like sensor calibration can drift. In this tutorial, we address the task of accurate positioning for large-scale egocentric data using visual-inertial SLAM and visual-inertial odometry (VIO).
As the academic community has been driven mainly by benchmarks disconnected from the specifics of egocentric data, we introduce LaMAria, a city-scale egocentric dataset collected with Project Aria devices to track progress in egocentric VIO/SLAM. These devices capture rich multi-sensor streams in a glasses-like form factor, so they can be worn over extended durations and distances without impeding the wearer's motion. The dataset exhibits the key characteristics of egocentric data, with a focus on challenges that break existing algorithms: long trajectories, extremely low illumination, fast motion, time-varying calibration, and travel on a moving platform or vehicle. In this tutorial, we aim to provide hands-on experience with the new dataset, while laying the groundwork for a forthcoming benchmark that will offer insights for research on accurate localization in the context of egocentric VIO/SLAM.
The tutorial will be structured as follows: