Last updated on June 2nd, 2025 at 06:16 pm
Until now, AI video generators could create scenes, but the audio had to be sourced separately. We then had to record a voiceover, find the right voice artist if we could not do it ourselves, or write a script and feed it to an AI voice generator. Finally, we brought all the elements into a video editor and edited the video professionally to get a (hopefully) satisfying result.
But with the new Google Veo 3, it is now possible to produce videos with built-in audio.
Developed by Google DeepMind, Veo 3 generates high-quality videos that closely match the prompts provided. It feels like a turning point: the technology is extraordinarily impressive, and it is set to bring a major shift in how we think about and create video content.
When OpenAI’s Sora was introduced, it was a game changer. Veo 3 goes a step further because it can also create audio, including dialogue between characters and even animal sounds, which makes scenes much more realistic. This will be especially significant for professionals working on large-scale cinematic projects.
Google has also launched Imagen 4, another generation tool, which is expected to produce high-quality images from user prompts. Alongside Imagen 4, Google announced Flow, a tool for creating high-quality, engaging videos from prompts in which users describe specific locations, preferred shots, and how they would like the video to be made.
To access these tools, users can go through Gemini, Vertex AI, Whisk, and Workspace.
AI-generated video still struggles to keep characters, camera angles, and mood consistent across sequences, and in storytelling, visual and narrative consistency play a crucial role. This is where Veo 3 comes in, helping to solve such problems. Demand for video generation technology is also massive: in some cases, early-stage generative video models appear to freeze under the overwhelming number of requests for videos and images.
Image and video generation still produce inaccuracies because these are early-stage models. Even so, tech giants like Google and OpenAI are launching powerful tools for users who want to create highly engaging video content, and these could become critical tools for storytelling. While many AI tools today still generate images with distorted limbs or unnatural expressions, the capabilities of Veo 3 are promising.
Google has also updated Veo 2 with capabilities such as adding or removing objects in a video through text prompts. The tech giant has also opened up music generation tools like Lyria 2 for creators to use on YouTube Shorts via Vertex AI.
Google’s Veo 3 is definitely an upgrade from Veo 2, but can you give it a prompt and expect a 100% done-for-you video that you can blindly publish to a platform of your choice? We’re not there yet. Veo 3 is a new product, and Google’s AI models are promising: its offerings keep getting better with each new version.
Veo 3 is currently available on Vertex AI for enterprise users, and to Ultra subscribers in the U.S. via the Gemini app and Flow, on a monthly plan of $249.99.