
LiveKit Deepens Agentic AI Capabilities With Avatars, Voice Cloning and Guided Data Tools

LiveKit continued to build out its agentic AI platform this week, introducing multiple product enhancements and deepening ecosystem partnerships. The company positioned these updates as part of a broader strategy to become a core infrastructure layer for real-time, multimodal AI communications and workflow automation.

LiveKit expanded its Agents framework by integrating LiveAvatar from HeyGen, enabling developers to add real-time, customizable video avatars to existing AI voice agents. The integration allows facial rendering, lip sync, and synchronized video to run on top of existing speech recognition, language models, and text-to-speech pipelines without requiring major architectural changes.

Audio and video are synchronized and published into LiveKit rooms over WebRTC, with sandbox testing and video quality controls to manage latency and fidelity tradeoffs. This capability targets use cases such as sales assistants, tutors, and customer support agents that benefit from more immersive, human-like experiences while preserving performance and reliability.
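The layering described above can be pictured as a wrapper around an existing text-to-speech stage: each audio chunk is paired with a rendered frame before being published, so the voice pipeline itself is untouched. The sketch below is purely conceptual and does not use LiveKit's or HeyGen's actual APIs; all names are illustrative.

```python
# Conceptual sketch (not LiveKit's actual API): an avatar layer that wraps an
# existing TTS stage, pairing each audio chunk with a lip-synced video frame
# so the underlying voice pipeline needs no architectural changes.

from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass
class AVChunk:
    audio: bytes        # one segment of synthesized speech
    video_frame: str    # placeholder for the matching rendered frame

def with_avatar(tts: Callable[[str], Iterator[bytes]],
                render_frame: Callable[[bytes], str]) -> Callable[[str], Iterator[AVChunk]]:
    """Wrap a TTS stage so every audio chunk ships with a synced frame."""
    def avatar_tts(text: str) -> Iterator[AVChunk]:
        for audio in tts(text):
            yield AVChunk(audio=audio, video_frame=render_frame(audio))
    return avatar_tts

# Toy stand-ins for a real TTS engine and avatar renderer.
def fake_tts(text: str) -> Iterator[bytes]:
    for word in text.split():
        yield word.encode()

def fake_render(audio: bytes) -> str:
    return f"frame[{audio.decode()}]"

pipeline = with_avatar(fake_tts, fake_render)
chunks = list(pipeline("hello there"))
```

The wrapper pattern mirrors the article's point: the avatar is additive, consuming the existing pipeline's output rather than replacing any of its stages.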

In parallel, LiveKit introduced cross-provider voice cloning on its LiveKit Inference platform through partnerships with Inworld AI and Cartesia. Users can upload a single audio sample to create a unified LiveKit voice ID that works across multiple text-to-speech providers, allowing developers to compare outputs and select the best option for their specific application.

The cloned voice can also be used as a fallback if a primary TTS provider fails mid-call, maintaining continuity and brand consistency in voice agents. The feature is available on all paid LiveKit Cloud plans, with cloning itself free and ongoing usage billed at providers’ standard per-character rates, potentially driving higher platform stickiness and paid usage.
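The fallback behavior can be sketched as a simple provider chain: if the primary provider errors mid-call, the same text is retried against a backup using the same cloned voice ID, so the caller hears a consistent voice. Provider names, the voice ID, and the `synthesize` signature below are all illustrative, not LiveKit's API.

```python
# Conceptual sketch of cross-provider TTS fallback with a shared cloned voice.
# Everything here is a toy stand-in; the real LiveKit Inference API differs.

from typing import Callable, Sequence

VOICE_ID = "lk_voice_abc123"  # hypothetical unified LiveKit voice ID

def synthesize_with_fallback(
    text: str,
    providers: Sequence[Callable[[str, str], bytes]],
    voice_id: str = VOICE_ID,
) -> bytes:
    """Try each provider in order; the shared voice ID keeps output consistent."""
    last_error = None
    for provider in providers:
        try:
            return provider(text, voice_id)
        except Exception as exc:
            last_error = exc  # provider failed mid-call; fall through to next
    raise RuntimeError("all TTS providers failed") from last_error

# Toy providers standing in for, e.g., Inworld AI and Cartesia endpoints.
def flaky_primary(text: str, voice_id: str) -> bytes:
    raise TimeoutError("primary provider unavailable")

def stable_backup(text: str, voice_id: str) -> bytes:
    return f"{voice_id}:{text}".encode()

audio = synthesize_with_fallback("Thanks for calling!", [flaky_primary, stable_backup])
```

Because both providers resolve the same voice ID, the fallback is inaudible to the end user, which is the brand-consistency point the article makes.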

LiveKit also highlighted Inworld AI’s Realtime TTS-2 model for emotionally expressive speech when used with its conversational agents. More lifelike audio is expected to make interactions feel less automated, potentially encouraging longer conversations and richer context sharing, which could improve agent performance and user engagement.

Beyond core technology, LiveKit advanced its guided data collection tools for the Agents platform, targeting structured workflows like lead qualification, patient intake, bookings, and surveys. Developers can orchestrate conversations via Tasks and TaskGroups in Python and TypeScript SDKs, while non-technical users leverage a browser-based Agent Builder that outputs structured JSON ready for downstream systems.
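The guided-collection idea reduces to a fixed sequence of tasks, each gathering and validating one field, with the whole group emitting structured JSON for downstream systems. The sketch below illustrates that shape only; class and field names are hypothetical, and LiveKit's actual Tasks/TaskGroups API in the Python SDK will differ in detail.

```python
# Conceptual sketch of guided data collection: a task group gathers one
# validated field per task and emits structured JSON. Names are illustrative,
# not the LiveKit SDK's real Task/TaskGroup classes.

import json
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    name: str
    prompt: str
    validate: Callable[[str], bool] = field(default=lambda answer: bool(answer.strip()))

def run_task_group(tasks: list[Task], answers: dict[str, str]) -> str:
    """Collect a validated answer for each task; return JSON for downstream use."""
    record = {}
    for task in tasks:
        answer = answers.get(task.name, "")
        if not task.validate(answer):
            raise ValueError(f"invalid answer for {task.name!r}")
        record[task.name] = answer
    return json.dumps(record)

# A toy patient-intake flow, one of the workflows the article mentions.
intake = [
    Task("name", "What is the patient's full name?"),
    Task("dob", "Date of birth?"),
]
payload = run_task_group(intake, {"name": "Ada Lovelace", "dob": "1815-12-10"})
```

The same structured-JSON output is what the browser-based Agent Builder produces for non-technical users, per the article.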

The company also increased its visibility in the developer community by participating in MongoDB’s Agentic Evolution Hackathon in London, focused on AI agents that can see, hear, speak, and act. Collectively, these product and ecosystem moves suggest a strengthening of LiveKit’s position in real-time AI infrastructure, with potential long-term benefits in adoption, monetization, and competitive differentiation.
