
Together AI – Weekly Recap

Together AI continued to build out its AI infrastructure platform this week with new model integrations and performance disclosures aimed at demanding, agentic workloads. The company operates an inference and tooling stack that targets enterprise developers seeking cost-efficient, high-throughput AI services.

Together AI added Qwen3.7-Max to its serverless inference platform, describing the model as tailored to “AI natives” and long-horizon autonomous workflows. The integration emphasizes capabilities for extended coding tasks, reasoning-heavy agents, and 1M-token context use cases that require reliable, persistent infrastructure.

The Qwen3.7-Max launch highlights results such as maintaining coherence during a 35-hour autonomous kernel optimization run and strong scores on Terminal-Bench 2.0-Terminus for terminal-based engineering. These results are intended to attract developers building complex agent systems that stress KV cache, scheduling, and throughput.

In voice AI, Together AI expanded support for MiniMax Speech 2.8 Turbo on its dedicated infrastructure, targeting expressive, real-time text-to-speech agents. The model offers sound tags for nonverbal cues, sub-250 millisecond streaming latency, and multilingual coverage in more than 40 languages, with reported prosody improvements over earlier versions.

These voice capabilities are aimed at enterprise customers building latency-sensitive, global voice interfaces in sectors such as customer support and healthcare. Higher-quality, multilingual audio and low latency may drive additional infrastructure utilization and deepen customer reliance on the platform.

The company also promoted internal benchmarks for its Inference Engine, arguing that standard tests understate the challenges of real-world, high-concurrency workloads. Together AI reported 31% higher throughput than the next-fastest open-source engine, a twofold improvement in time-to-first-token at saturation, and a 76% lower cost per request versus Claude Opus 4.6.

If sustained in production, these performance and cost metrics could strengthen Together AI’s competitive position in the inference layer and support higher usage-based revenue. The focus on realistic workloads and detailed technical breakdowns also appears aimed at building credibility with technical buyers and enterprise decision-makers.

Alongside these infrastructure and model updates, Together AI continued to advance partnerships such as its Pearl Research Labs collaboration around discounted Gemma-4-31B-it-Pearl endpoints. Overall, the week’s developments reinforced the company’s strategy of combining high-performance, cost-efficient inference with specialized capabilities in coding agents and voice AI.
