Together AI Highlights Mamba-3 Architecture Aimed at Faster AI Inference

According to a recent LinkedIn post from Together AI, the company is highlighting Mamba-3, a new state space model (SSM) architecture designed to speed up inference relative to prior linear models such as Mamba-2. The post suggests that earlier architectures, optimized for training efficiency, leave decoding on GPUs memory-bound, limiting effective compute utilization.
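To make the memory-bound point concrete, here is a rough back-of-envelope comparison of an H100's compute-to-bandwidth ratio against the arithmetic intensity of batch-1 decoding. This is our illustration rather than anything from the post, and the hardware figures are approximate published specifications:

```python
# Back-of-envelope sketch of why autoregressive (batch-1) decoding is memory-bound.
# Illustrative only; approximate H100 SXM specifications, not numbers from the post.
h100_flops = 989e12                                # ~dense BF16 tensor-core throughput, FLOP/s
h100_bandwidth = 3.35e12                           # ~HBM3 bandwidth, bytes/s
roofline_intensity = h100_flops / h100_bandwidth   # FLOPs per byte needed to be compute-bound

# Batch-1 decode is dominated by matrix-vector products: each 2-byte (BF16)
# weight is read once and used for roughly 2 FLOPs (one multiply, one add).
decode_intensity = 2 / 2                           # ~1 FLOP per byte actually achieved

print(f"compute-bound threshold: ~{roofline_intensity:.0f} FLOPs/byte")
print(f"batch-1 decode:          ~{decode_intensity:.0f} FLOP/byte (bandwidth-limited)")
```

On these rough numbers, the GPU would need hundreds of FLOPs per byte of memory traffic to stay compute-bound, while token-by-token decoding supplies only about one, which is why decode speed is governed by memory traffic rather than raw compute.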

The company’s LinkedIn post indicates that Mamba-3 introduces a more expressive recurrence via exponential-trapezoidal discretization, complex-valued state tracking, and multi-input, multi-output (MIMO) SSMs, aimed at improving model accuracy without increasing decode latency. At the 1.5-billion-parameter scale on a single Nvidia H100 GPU, the post claims faster prefill and decode performance across sequence lengths compared with Mamba-2, Gated DeltaNet, and Llama-3.2-1B in a vLLM setting.
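For readers who want a feel for what an SSM decode step looks like, below is a minimal, illustrative sketch in Python. It is not Together AI's open-sourced kernel and does not reproduce the exponential-trapezoidal discretization; the function name, shapes, and parameters are assumptions chosen to show the key property behind the decode-speed claims: the per-token state stays a fixed size, unlike a transformer's growing key-value cache. The complex-valued state and multiple input/output channels loosely echo the complex state tracking and MIMO ideas mentioned in the post.

```python
import numpy as np

def ssm_decode_step(h, x, A_log, B, C, dt):
    """One recurrent decode step of a diagonal SSM (illustrative, not Mamba-3's kernel).

    h      : (d_state,) complex state carried across tokens (fixed size)
    x      : (d_in,) current token's input channels (MIMO: d_in > 1)
    A_log  : (d_state,) diagonal of the continuous-time state matrix
    B, C   : (d_state, d_in) and (d_out, d_state) input/output projections
    dt     : step size used to discretize the continuous-time system
    """
    A_bar = np.exp(dt * A_log)        # exponential discretization of the diagonal state matrix
    h = A_bar * h + dt * (B @ x)      # state update: cost independent of sequence length
    y = (C @ h).real                  # read out d_out channels from the state
    return h, y

# Toy usage: state size and per-token cost stay constant as the sequence grows.
rng = np.random.default_rng(0)
d_state, d_in, d_out = 16, 4, 4
A_log = -np.abs(rng.standard_normal(d_state)) + 1j * rng.standard_normal(d_state)  # decaying, oscillating modes
B = rng.standard_normal((d_state, d_in))
C = rng.standard_normal((d_out, d_state))
h = np.zeros(d_state, dtype=complex)
for _ in range(8):
    h, y = ssm_decode_step(h, rng.standard_normal(d_in), A_log, B, C, dt=0.1)
```

By contrast, a transformer's decode step must re-read a key-value cache that grows with every generated token, which is where the memory-bound behavior described above comes from.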

The post presents the MIMO variant as matching Mamba-2’s speed while delivering stronger downstream accuracy, and describes the underlying kernels as open-sourced. For investors, these technical advances may signal Together AI’s intent to compete in high-performance, cost-efficient model serving, potentially improving the economics of inference-heavy workloads for enterprise customers.

If Mamba-3’s reported performance gains are validated in production environments, Together AI could strengthen its positioning in the AI infrastructure and model-optimization segment relative to both transformer-based and alternative SSM providers. Open-sourcing the kernels may also help build developer adoption and ecosystem influence, which could translate into greater platform usage and longer-term monetization opportunities around hosted services and tooling.
