According to a recent LinkedIn post from FriendliAI, independent leaderboards from Artificial Analysis suggest the company’s Model APIs deliver comparatively strong output speed and latency for the open-weight models GLM-5.1 and Gemma-4-31B. The post cites roughly 133 output tokens per second at about 1 second time-to-first-token for GLM-5.1, and approximately 62 tokens per second for Gemma-4-31B, figures it presents as a notable lead over the other third-party endpoints tracked.
The post highlights FriendliAI’s focus on balancing throughput and latency rather than headline peak token rates, positioning its infrastructure for production workloads that require open-weight flexibility, multi-model portability, and reliability. For investors, this emphasis on practical, production-grade performance may support FriendliAI’s competitive standing in the inference infrastructure segment and could help attract enterprise AI customers seeking scalable, open-weight model deployment options.

