According to a recent LinkedIn post from Together AI, the company’s speech-to-text (STT) models hold the top two positions for transcription speed on the Artificial Analysis Speech to Text leaderboard. The post highlights that NVIDIA Parakeet TDT 0.6B V3, hosted on Together AI, transcribes 303 seconds of audio per second of processing time, or roughly 303 times faster than real time. Pricing is listed at $1.50 per 1,000 minutes of audio, and the model's AA-WER, the leaderboard's word error rate metric, is 4.6% across three real-world datasets.
The post suggests that Together AI is positioning fast, low-cost STT as core infrastructure for “AI natives” building real-time voice agents on its AI Native Cloud. For investors, this performance and cost profile could strengthen the firm’s value proposition in latency-sensitive AI applications, potentially supporting higher developer adoption and usage-based revenue. It could also improve the company's competitive position against other cloud and model-serving providers in the emerging real-time voice AI segment.

