Byungchul Tak, Shu Tao, et al.
IC2E 2016
Although recent text-to-video generative models are increasingly capable of following external camera controls, imposed by either text descriptions or camera trajectories, they still struggle to generalize to unconventional camera motions, a capability that is crucial for creating truly original and artistic videos. The challenge lies in the difficulty of finding sufficient training videos with the intended uncommon camera motions. To address this challenge, we propose VIVIDCAM, a training paradigm that enables diffusion models to learn complex camera motions from synthetic videos, removing the reliance on collecting realistic training videos. VIVIDCAM incorporates multiple disentanglement strategies that isolate camera motion learning from synthetic appearance artifacts, ensuring more robust motion representation and mitigating domain shift. We show that our design synthesizes a wide range of precisely controlled camera motions using surprisingly simple synthetic data. Notably, this synthetic data often consists of basic geometries within a low-poly 3D scene and can be efficiently rendered by engines like Unity. Our video results can be found at https://wuqiuche.github.io/VividCamDemoPage/.
Vidushi Sharma, Andy Tek, et al.
NeurIPS 2025
Zongyuan Ge, Sergey Demyanov, et al.
BMVC 2017
Kristjan Greenewald, Yuancheng Yu, et al.
NeurIPS 2024