1st Place

Team Getingarna

Summary of Approach

The team’s approach builds on their recent research on deep learning-based image matching, in particular the two methods RoMa and DeDoDe. These image matching techniques are able to reliably estimate relative pose for a majority of the consecutive image pairs in the challenge sequence. To get good estimates for the most challenging image pairs in the sequence and to handle loop closures, the team further built a pipeline using image retrieval with DINOv2 and full structure-from-motion reconstruction with COLMAP.

Georg Bökman

Georg Bökman is a final-year PhD student in the Computer Vision group at Chalmers University of Technology in Gothenburg, Sweden. His main research interests include equivariant neural networks and deep learning for computer vision.

Johan Edstedt

Johan Edstedt is a third-year PhD student at the Computer Vision Laboratory, Linköping University, Sweden. His research interests lie in 3D computer vision, particularly image matching and 3D reconstruction. Johan and Georg started collaborating after meeting in the Swedish nationwide research program WASP.

2nd Place

Team kbrodt

Summary of Approach

The team observed that classical approaches based on ORB detectors and FLANN matchers typically failed on the challenge task, so they concentrated on the feature detection and feature matching steps. They replaced the classical detector and matcher with RoMa, a state-of-the-art deep neural network for dense feature matching, and further refined the matches with the DeDoDe descriptor, another deep neural network model. They used publicly available pretrained weights for both models. Their method required no retraining and generalised well to new tasks.

Kirill Brodt

Kirill Brodt is currently doing research in computer graphics at the University of Montréal. He earned his master’s degree in mathematics at Novosibirsk State University, Russia. He is involved in an educational program where he teaches machine learning and deep learning courses and supervises undergraduate theses.

3rd Place

Team Sylish

Summary of Approach

The solution proposed by Team Sylish involved a deep image retrieval network and deep feature matching to obtain 2D-2D correspondences between similar images. The relative motion between frames was then obtained by estimating essential matrices within a RANSAC scheme. The translation scale factor between frames was estimated using a deep scale estimation network. Finally, the resulting poses were refined through a nonlinear least-squares optimisation to solve a rotation graph problem.
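To make the "essential matrix within a RANSAC scheme" step concrete, here is a minimal NumPy sketch. It is not the team's implementation: it assumes calibrated (normalised) image coordinates, uses the linear eight-point algorithm as the minimal solver, and scores hypotheses with the Sampson error. The function names, the threshold, and the iteration count are all illustrative choices.

```python
import numpy as np

def eight_point_essential(x1, x2):
    """Linear estimate of E from >= 8 normalised correspondences (N x 2 arrays),
    such that [x2, 1]^T E [x1, 1] = 0 for every pair."""
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    # Each row holds the coefficients of the row-major entries of E.
    A = np.stack([u2 * u1, u2 * v1, u2,
                  v2 * u1, v2 * v1, v2,
                  u1, v1, np.ones_like(u1)], axis=1)
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Project onto the set of essential matrices (singular values 1, 1, 0).
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt

def sampson_error(E, x1, x2):
    """First-order approximation of the squared distance to the epipolar constraint."""
    x1h = np.hstack([x1, np.ones((len(x1), 1))])
    x2h = np.hstack([x2, np.ones((len(x2), 1))])
    Ex1 = x1h @ E.T    # row i is E @ x1_i
    Etx2 = x2h @ E     # row i is E.T @ x2_i
    num = np.sum(x2h * Ex1, axis=1) ** 2
    den = Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2
    return num / den

def ransac_essential(x1, x2, threshold=1e-8, iterations=200, seed=0):
    """Keep the hypothesis with the most inliers, then refit on those inliers."""
    rng = np.random.default_rng(seed)
    best_E, best_inliers = None, np.zeros(len(x1), dtype=bool)
    for _ in range(iterations):
        idx = rng.choice(len(x1), size=8, replace=False)
        E = eight_point_essential(x1[idx], x2[idx])
        inliers = sampson_error(E, x1, x2) < threshold
        if inliers.sum() > best_inliers.sum():
            best_E, best_inliers = E, inliers
    if best_inliers.sum() >= 8:
        best_E = eight_point_essential(x1[best_inliers], x2[best_inliers])
    return best_E, best_inliers
```

The very tight default threshold only suits noise-free synthetic data; with real matches it would be set according to the expected pixel noise, and a five-point solver is the more common minimal solver for calibrated cameras. The essential matrix determines the translation only up to scale, which is why a separate scale estimate is needed, as in the team's pipeline.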

Clémentin Boittiaux

Clémentin Boittiaux received an engineering degree in computer science from ESIEE Paris. He is currently finishing his PhD at Ifremer Underwater Robotics Lab / COSMER / LIS, working on visual localization and related computer vision problems.

Maxime Ferrera

Maxime Ferrera received an engineering degree in computer science & electrical engineering from Polytech Annecy-Chambéry and a PhD from ONERA – DTIS / LIRMM – CNRS. He is currently a research scientist at Ifremer Underwater Robotics Lab, working on robotic perception and computer vision problems.

Team sdrnr

Summary of Approach

The team started with a basic CNN approach, feeding the model two RGB images from consecutive states along with the corresponding precomputed depth maps and flow maps. The model was trained to predict the quaternion representing the rotation between these states and the corresponding translation. The team later improved on this with a visual matching model that identifies matches between anchor points on the RGB images, even when the camera pose has changed dramatically. The translation and rotation are then deduced from these matches. Additionally, the team implemented heuristics for situations where no matches could be found, typically during a 180° turn or an anomalously long time delta between two states.
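As an illustration of how a predicted quaternion and translation can be turned into a trajectory, the sketch below converts each prediction into a 4x4 rigid transform and chains the transforms. The conventions are assumptions, not taken from the team's code: quaternions are ordered (w, x, y, z), and each relative pose expresses the next state in the frame of the previous one. The function names are hypothetical.

```python
import numpy as np

def quat_to_rotmat(q):
    """Quaternion (w, x, y, z) -> 3x3 rotation matrix; the input is normalised first."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

def relative_pose(q, t):
    """Pack a rotation quaternion and a translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = quat_to_rotmat(q)
    T[:3, 3] = t
    return T

def accumulate(relative_poses):
    """Chain relative poses into absolute poses, starting from the identity."""
    poses = [np.eye(4)]
    for T in relative_poses:
        poses.append(poses[-1] @ T)
    return poses

# Two identical steps: move 1 unit along x, then turn 90 degrees about z.
q = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]
step = relative_pose(q, [1.0, 0.0, 0.0])
trajectory = accumulate([step, step])
```

Because errors in each relative pose compound along the chain, a fallback heuristic for frames without matches, as the team describes, affects every later pose in the trajectory.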

Stepan Konev

Stepan Konev is a Machine Learning Engineer at Booking.com in Amsterdam, where he applies his expertise in recommender systems. He earned his MSc degree at the Mobile Robotics Lab at Skoltech, supervised by Prof. Gonzalo Ferrer, focusing on motion prediction. After that, he joined the Yandex Self-Driving Group, where he contributed to the development of motion prediction algorithms for autonomous vehicles.