Overview of DrivingForward. Given sparse surround-view input from vehicle-mounted cameras (either single-frame or multi-frame surround-view images), our model learns scale-aware localization of Gaussian primitives from the small overlap between spatial and temporal context views, while a Gaussian network predicts the remaining Gaussian parameters from each image individually. This feed-forward pipeline enables real-time reconstruction of driving scenes, and the independent per-image prediction supports flexible input modes. At the inference stage, only the depth network and the Gaussian network are used, as shown in the lower part of the figure.
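To make the two-network inference flow concrete, here is a minimal sketch of how such a pipeline could be wired up. The network bodies are hypothetical stand-ins (the real DrivingForward networks are learned models); what the sketch shows is the data flow: a depth network yields a per-pixel depth map that is unprojected with the camera intrinsics into scale-aware 3D Gaussian centers, while a Gaussian network predicts the remaining per-pixel Gaussian attributes from each image independently. All function names and channel counts below are illustrative assumptions, not the paper's API.

```python
import numpy as np

def depth_network(image):
    # Hypothetical stand-in for the learned depth network:
    # predicts a per-pixel depth map of shape (H, W).
    h, w, _ = image.shape
    return np.full((h, w), 5.0)

def gaussian_network(image):
    # Hypothetical stand-in for the learned Gaussian network:
    # predicts remaining per-pixel Gaussian parameters
    # (e.g. opacity 1 + scale 3 + rotation quaternion 4 + RGB 3 = 11 channels).
    h, w, _ = image.shape
    return np.zeros((h, w, 11))

def unproject(depth, K):
    # Lift each pixel (u, v) to a 3D point in camera coordinates:
    # p = depth(u, v) * K^{-1} [u, v, 1]^T
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    return pts.T.reshape(h, w, 3)

def feed_forward_inference(images, intrinsics):
    # One feed-forward pass per surround-view image; no per-scene
    # optimization. Each image is processed independently, which is
    # what allows flexible single-/multi-frame input modes.
    gaussians = []
    for img, K in zip(images, intrinsics):
        depth = depth_network(img)             # (H, W) depth map
        means = unproject(depth, K)            # scale-aware 3D centers
        params = gaussian_network(img)         # remaining Gaussian attributes
        gaussians.append((means.reshape(-1, 3), params.reshape(-1, 11)))
    return gaussians
```

The per-image Gaussians from all surround views would then be merged into a single scene representation and rendered with a standard Gaussian splatting rasterizer.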
@misc{tian2024drivingforwardfeedforward3dgaussian,
title={DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input},
author={Qijian Tian and Xin Tan and Yuan Xie and Lizhuang Ma},
year={2024},
eprint={2409.12753},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2409.12753},
}