DomoAI Launches Built-In Text-to-Speech and Integrates OpenAI’s GPT Image 2.0 in Talking Avatar Workflow

SINGAPORE, May 5, 2026 /PRNewswire/ -- DomoAI, a Singapore-based generative AI video platform now used by more than 4 million creators worldwide, today highlighted rising demand from creators for its Talking Avatar product as the AI avatar market accelerates toward US$5.93 billion by 2032, according to a recent MarketsandMarkets report.

AI hosts and avatars are visible across TikTok, YouTube Shorts, and Instagram Reels, often standing in for presenters in cases where shooting different language versions of the same explainer would have meant a full week of studio time. This is especially true in Japan, where DomoAI’s Talking Avatar has built a meaningful following among VTuber and anime creators who use it to give voice to their original characters.

The Talking Avatar feature rests on three key points. Lip sync had to stay consistent even when the voice ran long. The tool had to render continuous video output of up to 60 seconds, longer than what most AI avatar tools can output in a single take. The current flow is simple: drop in or generate an image, type a script, pick a voice, and hit generate. Alternatively, users can upload a voice recording themselves. Within one minute, the output is ready with the lip sync already aligned. A traditional workflow could take one or two days. With its built-in text-to-speech (TTS) functionality, DomoAI’s Talking Avatar allows all of these steps to be completed on a single screen.

DomoAI also recently integrated OpenAI’s GPT Image 2.0, so a creator can generate the source image, animate it, voice it, and upscale it in one platform. The end-to-end loop matters most for creators producing scripted content at volume: VTubers, indie animators, language educators, marketing teams.

“They’ve been around since creators started using them daily to publish content. Two years ago, creating a clear and smooth avatar video would take an afternoon, stringing together multiple tools. Now, it can be done in just a few minutes within a single app,” said Joe Lam, CEO of DomoAI. “The voice is key. In the past, they sounded entirely like AI voices, but not anymore. We’ve added emotion control features, allowing creators to adjust the tone of voice appropriately, rather than struggling with a flat, monotonous sound.”

One use case has stood out in particular: music videos. Japanese AI creator Azuki, who runs the Azuki Channel on YouTube, featured DomoAI in a tutorial that has now drawn over 30,000 views.

“With just one image, DomoAI brings my characters to life. They can speak, sing, and perform in a full music video,” said Azuki. “The Talking Avatar feature is one of the standout tools that makes DomoAI feel like a complete creative toolkit, even for beginners.”

About DomoAI

DomoAI is a Singapore-headquartered generative AI video platform serving more than four million creators worldwide. The company is dedicated to developing a unified workflow for AI-generated video and image content.

For media inquiries, please contact:

Public Relations

[email protected]

View original content: https://www.prnewswire.co.uk/news-releases/domoai-launches-built-in-text-to-speech-and-integrates-openais-gpt-image-2-0-in-talking-avatar-workflow-302762491.html

Disclaimer: The above press release comes to you under an arrangement with PR Newswire. We take no editorial responsibility for the same.