I agree, that's the future of these video models. For professional use you want more control and the obvious next step towards that is to generate the full 3D scene (in the form of animated gaussian splats since that's more AI friendly than the mesh based 3D). That also helps the model to be more consistent but also adds the ability for the user to have more control over the camera or the scene.