Overview of DrivingForward. Given sparse surround-view input from vehicle-mounted cameras (either single-frame or multi-frame surround-view images), our model learns scale-aware localization of Gaussian primitives from the small overlap between spatial and temporal context views. A Gaussian network predicts the remaining Gaussian parameters from each image individually. This feed-forward pipeline enables real-time reconstruction of driving scenes, and because each single-frame image is predicted independently, it supports flexible input modes. At inference, only the depth network and the Gaussian network are used, as shown in the lower part of the figure.
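
To make the localization step concrete, here is a minimal sketch of how a per-pixel depth map could be lifted to 3D Gaussian centers. It assumes a standard pinhole camera model and that the depth network outputs metric depth per pixel; the function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative and are not taken from the DrivingForward code.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def unproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3]:
    """Lift each pixel (u, v) with depth d to a 3D point in the camera frame.

    Assumption: pinhole intrinsics (fx, fy, cx, cy) and metric depth.
    Each returned point would serve as the center of one Gaussian primitive.
    """
    centers: List[Point3] = []
    for v, row in enumerate(depth):      # v indexes image rows
        for u, d in enumerate(row):      # u indexes image columns
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            centers.append((x, y, d))
    return centers


# Toy 2x2 depth map with unit focal length and principal point at the center.
centers = unproject_depth([[2.0, 2.0], [4.0, 4.0]], fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In this sketch every pixel yields one center, so the remaining per-Gaussian parameters (for example scale, rotation, opacity, and color) could be predicted per pixel by a separate network, which is consistent with each image being processed individually.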
@inproceedings{tian2025drivingforward,
title={DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input},
author={Qijian Tian and Xin Tan and Yuan Xie and Lizhuang Ma},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2025}
}