On April 21, 2025, ShengShu Technology released Vidu Q1 worldwide. The browser-based tool lets creators turn two still images and a text prompt into a 5-second 1080p clip. Its “First-to-Last Frame” pipeline guides the motion between the two stills so that characters stay coherent even when the source images are unrelated, bringing movie-style transitions within reach of solo editors.
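To make the workflow concrete, here is a minimal sketch of what a first-to-last-frame request could look like. ShengShu has not published an API schema in this announcement, so every field name below is a hypothetical stand-in; the sketch only illustrates the inputs the article describes: two stills, a text prompt, and a fixed 5-second 1080p output.

```python
import json

# Hypothetical request builder for a first-to-last-frame generation job.
# None of these field names come from ShengShu documentation; they are
# illustrative placeholders for the workflow described in the article.
def build_frame_to_frame_request(first_frame: str, last_frame: str, prompt: str) -> str:
    payload = {
        "model": "vidu-q1",           # assumed model identifier
        "first_frame": first_frame,   # path or URL of the opening still
        "last_frame": last_frame,     # path or URL of the closing still
        "prompt": prompt,             # text guidance for the in-between motion
        "duration_seconds": 5,        # Q1 clips run five seconds
        "resolution": "1080p",
    }
    return json.dumps(payload, indent=2)

if __name__ == "__main__":
    print(build_frame_to_frame_request(
        first_frame="castle_day.png",
        last_frame="castle_night.png",
        prompt="slow dusk falling over the castle, lanterns lighting one by one",
    ))
```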
Audio now lives inside the same workflow. Text cues generate 48 kHz background music or foley, tracks can be layered for up to ten seconds across multiple channels, and timestamped commands such as “0–2 s wind” place each sound precisely, removing the need for external sound libraries. Anime output also gains sharper line work and steadier frame-to-frame blending, building on the multiple-entity consistency method that debuted in Vidu 1.5.
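The timestamp commands imply a simple cue grammar. The Python sketch below parses cues of the form the article quotes (“0–2 s wind”) into start and end seconds plus a sound label; the exact syntax Vidu Q1 accepts is not documented here, so the pattern and the ten-second window check are assumptions drawn from the description above.

```python
import re

# Assumed cue format: "<start>-<end> s <sound description>", hyphen or en dash.
CUE_PATTERN = re.compile(
    r"(?P<start>\d+(?:\.\d+)?)\s*[-–]\s*(?P<end>\d+(?:\.\d+)?)\s*s\s+(?P<sound>.+)"
)

def parse_cue(cue: str) -> tuple[float, float, str]:
    """Split a cue like '0-2 s wind' into (start, end, sound)."""
    match = CUE_PATTERN.fullmatch(cue.strip())
    if match is None:
        raise ValueError(f"Unrecognized cue format: {cue!r}")
    start = float(match.group("start"))
    end = float(match.group("end"))
    # The article describes ten-second multitrack layers, so we assume
    # cues must fall within a 0-10 s window.
    if not 0 <= start < end <= 10:
        raise ValueError(f"Cue must fall within a 0-10 s window: {cue!r}")
    return start, end, match.group("sound")

if __name__ == "__main__":
    for cue in ["0-2 s wind", "2-7 s rain on glass", "7-10 s distant thunder"]:
        print(parse_cue(cue))
```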
On ShengShu’s internal VBench runs, Q1 scores ahead of Runway Gen-2, OpenAI Sora, and Luma Dream Machine on prompt fidelity and frame coherence, while those rivals still rely on external audio tools or longer render times to reach similar resolution. Aura Productions, which is testing Q1 for a 50-episode sci-fi anime series, reports post-production costs dropping by an order of magnitude.
Vidu Q1 unites first-to-last-frame transitions, five-second 1080p rendering, refined anime generation, and prompt-driven 48 kHz audio layers, giving small teams and influencers a direct route to cinematic polish without dedicated VFX or sound departments.
Founded in Singapore in 2023, ShengShu Technology focuses on multimodal large models. After opening the Vidu platform to commercial users in July 2024, the firm now serves creators in more than 200 regions and is courting film, advertising, and social-media studios with Q1’s new capabilities.