
Together AI Showcases Mamba-3 Inference-Focused AI Architecture

In a recent LinkedIn post, Together AI highlighted a new model architecture called Mamba-3, described as a state-space model (SSM) optimized for fast inference. The post contrasts Mamba-3 with prior linear architectures, arguing that earlier designs prioritized training efficiency at the expense of decoding performance, which is constrained by GPU memory bandwidth.
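To make the bandwidth point concrete, the sketch below (using assumed, illustrative dimensions, not figures from the post) compares the per-token, per-layer memory traffic of autoregressive decoding with attention, where the key/value cache that must be read grows with sequence length, against an SSM, whose recurrent state stays a fixed size:

```python
# Back-of-the-envelope sketch with illustrative assumptions
# (not Together AI's figures): memory read per decoded token, per layer.
d_model = 2048      # assumed hidden size for a ~1.5B-parameter model
d_state = 16        # assumed SSM state size per channel
bytes_fp16 = 2      # bytes per fp16 value

for seq_len in (1_024, 8_192, 65_536):
    # Attention decode: each new token reads the whole key/value cache,
    # which grows linearly with the sequence length.
    kv_cache = 2 * seq_len * d_model * bytes_fp16      # keys + values
    # SSM decode: each new token reads a fixed-size recurrent state,
    # independent of how long the sequence already is.
    ssm_state = d_model * d_state * bytes_fp16
    print(f"t={seq_len:>6}: attention reads {kv_cache / 2**20:8.1f} MiB/token, "
          f"SSM reads {ssm_state / 2**20:6.3f} MiB/token")
```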

The post points to several technical changes, including an exponential-trapezoidal discretization scheme, complex-valued state tracking, and multi-input multi-output (MIMO) SSMs, all aimed at improving accuracy without increasing decode latency. At the 1.5-billion-parameter scale on a single Nvidia H100, the post claims class-leading prefill and decode latency across sequence lengths versus Mamba-2, Gated DeltaNet, and Llama-3.2-1B served with vLLM.
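The post does not spell out the math, but the trapezoidal idea can be sketched on a simplified diagonal SSM: rather than a zero-order hold that uses only the current input, the input integral over a step is approximated by averaging the current and previous input contributions. Everything below (shapes, names, and the single-input/single-output simplification) is a hypothetical illustration, not Mamba-3's actual formulation:

```python
import numpy as np

def decode_step(h, x_t, x_prev, a, b_t, b_prev, c, dt):
    """One decode step of a simplified diagonal SSM (hypothetical sketch).

    h            : (d, n) recurrent state carried across tokens
    x_t, x_prev  : (d,)   current and previous inputs
    a            : (d, n) continuous-time decay (negative values)
    b_t, b_prev  : (d, n) input projections at the current / previous step
    c            : (d, n) output projection
    dt           : (d, 1) per-channel step size
    """
    decay = np.exp(dt * a)          # exact decay of the state over one step
    # Trapezoidal-style input term: average the current input's contribution
    # with the previous input's contribution (propagated through the decay),
    # instead of a zero-order hold on the current input alone.
    u = 0.5 * dt * (b_t * x_t[:, None] + decay * b_prev * x_prev[:, None])
    h = decay * h + u               # constant-size state update
    y = (c * h).sum(axis=-1)        # (d,) output for this token
    return h, y

# Toy usage with random parameters.
d, n = 4, 8
rng = np.random.default_rng(0)
a = -np.abs(rng.standard_normal((d, n)))
c = rng.standard_normal((d, n))
dt = np.full((d, 1), 0.1)

h = np.zeros((d, n))
x_prev, b_prev = np.zeros(d), np.zeros((d, n))
for _ in range(3):                               # decode three tokens
    x_t = rng.standard_normal(d)                 # input for this token
    b_t = rng.standard_normal((d, n))            # (input-dependent) projection
    h, y = decode_step(h, x_t, x_prev, a, b_t, b_prev, c, dt)
    x_prev, b_prev = x_t, b_t
print(y.shape)  # (4,)
```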

The post presents MIMO configurations as matching Mamba-2's speed while delivering stronger downstream accuracy. The kernels associated with Mamba-3 are described as open-sourced, which may lower adoption barriers for developers and researchers, potentially expanding Together AI's ecosystem and its use in latency-sensitive applications such as real-time inference and in cost-sensitive deployments.

For investors, the emphasis on faster inference and open-source kernels suggests a strategy focused on performance leadership in model serving rather than solely on training scale. If Mamba-3’s performance and accuracy advantages are validated and widely adopted, Together AI could strengthen its competitive position in the AI infrastructure and model serving market, potentially improving pricing power, customer acquisition, and long-term platform stickiness.
