This project implements a Structure from Motion (SfM) pipeline in Python that reconstructs 3D scenes from 2D images. SfM is a computer vision technique that recovers camera poses and 3D scene structure from overlapping images, and it is widely applied in robotics, AR/VR, and mapping.
The pipeline consists of several stages. Each stage processes the input images or the output of the previous stage and produces intermediate results that feed the final 3D reconstruction. The stages are described below:
- The pipeline begins by detecting and matching keypoints between image pairs.
- The Fundamental Matrix (F), which encodes the epipolar geometry between two images, is estimated with a robust method such as RANSAC.
- Matches that violate the estimated epipolar constraint are rejected as outliers, so only high-confidence correspondences are passed to later stages. This filtering is crucial for a stable and accurate reconstruction.
- Using the Fundamental Matrix and intrinsic camera parameters, the Essential Matrix (E) is computed.
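- The Essential Matrix is decomposed into rotation (R) and translation (t) components to estimate the relative pose between cameras. The decomposition yields four candidate poses, and the translation is recovered only up to scale; the valid candidate is the one that places the triangulated points in front of both cameras.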
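
To make the two bullets above concrete, here is a NumPy sketch of the relation `E = K2^T F K1` and the standard SVD-based decomposition. The function names are hypothetical, and the sketch stops at listing the pose candidates: picking the correct one needs a depth check on triangulated points, which is not shown.

```python
import numpy as np


def essential_from_fundamental(F, K1, K2=None):
    """Compute E = K2^T F K1 and project it onto the space of valid
    essential matrices (two equal singular values, one zero).

    If K2 is omitted, both images are assumed to share the intrinsics K1.
    """
    K2 = K1 if K2 is None else K2
    E = K2.T @ F @ K1
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt


def decompose_essential(E):
    """Return the two candidate rotations and the unit translation direction.

    The four pose hypotheses are (R1, t), (R1, -t), (R2, t), (R2, -t).
    """
    U, _, Vt = np.linalg.svd(E)
    # Flip signs so that both factors are proper rotations (det = +1).
    # E is only defined up to sign, so this does not change the result.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # direction only; the scale of t is not observable
    return R1, R2, t
```

OpenCV's `cv2.recoverPose` performs the decomposition and the candidate selection in one call and may be the more convenient choice in practice.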
- Given the relative camera poses and matched feature points, 3D points are reconstructed via triangulation.
- Triangulation uses the epipolar geometry and camera projection matrices to estimate the 3D location of points in the scene.
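
A minimal sketch of triangulation, assuming the linear (DLT) method: each observation `(u, v)` under projection matrix `P` contributes two rows to a homogeneous system `A X = 0`, which is solved by SVD. The function name `triangulate_points` is hypothetical, and the project may instead rely on `cv2.triangulatePoints` or a non-linear refinement.

```python
import numpy as np


def triangulate_points(P1, P2, pts1, pts2):
    """Linear (DLT) triangulation of N correspondences.

    P1, P2:     3x4 projection matrices, e.g. K @ [R | t].
    pts1, pts2: (N, 2) pixel coordinates in image 1 and image 2.
    Returns an (N, 3) array of 3D points.
    """
    pts1 = np.asarray(pts1, dtype=float)
    pts2 = np.asarray(pts2, dtype=float)
    points_3d = np.empty((len(pts1), 3))

    for i, ((u1, v1), (u2, v2)) in enumerate(zip(pts1, pts2)):
        # Each image contributes two linear constraints on the
        # homogeneous 3D point X.
        A = np.stack([
            u1 * P1[2] - P1[0],
            v1 * P1[2] - P1[1],
            u2 * P2[2] - P2[0],
            v2 * P2[2] - P2[1],
        ])
        # The solution is the right singular vector belonging to the
        # smallest singular value of A.
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        points_3d[i] = X[:3] / X[3]

    return points_3d
```

For the first image pair, a common convention is `P1 = K @ [I | 0]` and `P2 = K @ [R | t]`, so the reconstruction is expressed in the first camera's coordinate frame.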
- After reconstructing some initial 3D points and camera poses, the pipeline incorporates additional images.
- New camera poses are registered with the Perspective-n-Point (PnP) algorithm, which solves for the camera pose that aligns 2D keypoints in the new image with previously reconstructed 3D points.
- This step incrementally expands the reconstruction.
- Bundle Adjustment refines the entire structure by simultaneously optimizing:
  - Camera extrinsic and intrinsic parameters (pose and focal length).
  - 3D point coordinates.
- The goal is to minimize the overall reprojection error, so that the reconstructed 3D points, when projected into each image, align well with the observed 2D points.
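
The reprojection-error minimization can be sketched with `scipy.optimize.least_squares`. Everything about the parameter layout here is an assumption made for the example: each camera is stored as 7 numbers (rotation vector, translation, focal length), the principal point is fixed, and lens distortion is ignored. The dense finite-difference Jacobian used by default is only practical for small problems.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx,
                           observed, principal_point):
    """Residuals (projected - observed) for every 2D observation.

    params packs n_cams * 7 camera values [rvec(3), t(3), f] followed by
    n_pts * 3 point coordinates (assumed layout).
    cam_idx[k], pt_idx[k] say which camera saw which point in observed[k].
    """
    cams = params[: n_cams * 7].reshape(n_cams, 7)
    pts = params[n_cams * 7:].reshape(n_pts, 3)

    # Transform each observed point into its camera's frame.
    rot = Rotation.from_rotvec(cams[cam_idx, :3])
    X_cam = rot.apply(pts[pt_idx]) + cams[cam_idx, 3:6]

    # Pinhole projection with per-camera focal length.
    f = cams[cam_idx, 6:7]
    projected = f * X_cam[:, :2] / X_cam[:, 2:3] + principal_point
    return (projected - observed).ravel()


def bundle_adjust(cams, pts, cam_idx, pt_idx, observed, principal_point):
    """Jointly refine cameras (n_cams, 7) and points (n_pts, 3)."""
    n_cams, n_pts = len(cams), len(pts)
    x0 = np.hstack([cams.ravel(), pts.ravel()])
    result = least_squares(
        reprojection_residuals, x0, method="trf",
        args=(n_cams, n_pts, cam_idx, pt_idx, observed, principal_point),
    )
    cams_opt = result.x[: n_cams * 7].reshape(n_cams, 7)
    pts_opt = result.x[n_cams * 7:].reshape(n_pts, 3)
    return cams_opt, pts_opt, result
```

For larger reconstructions, passing a sparsity pattern through the `jac_sparsity` argument of `least_squares` and a robust `loss` such as `"huber"` are common refinements.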
- The final output is a 3D point cloud representing the reconstructed scene.
- The pipeline supports visualizing intermediate and final results, such as camera positions and dense or sparse point clouds.
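
As a rough illustration of the visualization step, the sketch below plots a point cloud and camera centers with Matplotlib. The function name `plot_reconstruction` and its arguments are hypothetical; the project's own plotting code may look different. For a camera with pose `(R, t)` such that `X_cam = R @ X_world + t`, the camera center in world coordinates is `-R.T @ t`.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_reconstruction(points_3d, camera_centers=None):
    """Scatter-plot a sparse point cloud and, optionally, camera centers.

    points_3d:      (N, 3) array of reconstructed points.
    camera_centers: optional (M, 3) array of camera positions.
    Returns the Matplotlib figure so the caller can show or save it.
    """
    points_3d = np.asarray(points_3d, dtype=float)
    fig = plt.figure(figsize=(7, 6))
    ax = fig.add_subplot(projection="3d")
    ax.scatter(points_3d[:, 0], points_3d[:, 1], points_3d[:, 2],
               s=2, c="tab:blue", label="3D points")

    if camera_centers is not None:
        centers = np.asarray(camera_centers, dtype=float)
        ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2],
                   s=40, c="tab:red", marker="^", label="cameras")

    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("Z")
    ax.legend()
    return fig
```

Call `plt.show()` after the function to open an interactive window, or `fig.savefig("reconstruction.png")` to write the plot to disk.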
To provide a clear overview, here’s a step-by-step flow of the pipeline:
- Feature Detection & Matching
- Fundamental Matrix Estimation (Epipolar Geometry)
- Essential Matrix Decomposition (Camera Pose)
- Triangulation

The project requires:
- Python 3.8 or higher
- OpenCV (opencv-python)
- NumPy
- Matplotlib
- SciPy
Install dependencies using:
pip install opencv-python numpy matplotlib scipy