FriendliAI Deepens AI Inference Footprint With New Models, Web-Agent Tools, and San Francisco Expansion

FriendliAI had a busy week, expanding its AI inference infrastructure, deepening partnerships, and highlighting customer traction. The company integrated DeepSeek AI’s new DeepSeek-V4-Pro and DeepSeek-V4-Flash models into its Dedicated Endpoints, offering 1 million-token context windows and targeting high-capacity, cost-efficient workloads.

DeepSeek-V4-Flash is positioned as a performance-efficiency option with 284 billion total parameters and 13 billion active per token, while DeepSeek-V4-Pro is aimed at more demanding tasks in coding, reasoning, and long-context use cases. By aligning with popular open-weight models that are quickly gaining adoption on OpenRouter, FriendliAI is reinforcing its role as an infrastructure provider for advanced LLM deployments.

The company also underscored its focus on agentic AI by rolling out Qwen WebWorld models, in collaboration with Alibaba Cloud, on Friendli Dedicated Endpoints. WebWorld functions as an offline “flight simulator” for web agents, enabling training in sandboxed environments across HTML, XML, Markdown, and natural language without touching the live internet.

Three Qwen WebWorld variants (8B, 14B, and 32B) target different stages of the agent-training lifecycle, with reported benchmark gains on WebWorld-Bench, MiniWob++, and WebArena. One-click deployment and dedicated compute are marketed as reducing infrastructure overhead for enterprises building large-scale web agents and simulation-heavy workloads.

FriendliAI simultaneously highlighted updated customer case studies showcasing its platform in production-scale AI inference. Reported results include 3x traffic growth, 5x throughput, 3x cost savings, 50% lower GPU costs, and more than 1 billion monthly interactions across various deployments.

These metrics, while marketing-driven, underline the firm’s positioning around performance, reliability, and cost efficiency for high-volume AI workloads. If broadly representative, such outcomes could support recurring usage-based revenue and improve customer retention for the platform.

On the go-to-market front, FriendliAI announced plans to showcase its “frontier-grade” inference technology at the SuperAI conference in Singapore on June 10–11. The company is targeting enterprise and developer customers looking to improve latency and cost for open models and coding agents, using the tagline “Same model. Better inference” to stress its value proposition.

FriendliAI also continued its physical and organizational expansion with the opening of a 7,000-square-foot office in San Francisco’s SoMa district. The site is intended as a hub for the AI builder community, hosting meetups and hackathons, while the company ramps hiring across go-to-market, partnerships, and engineering roles to support growth in inference demand.

Strategically, the convergence of new model integrations, agent-focused simulation capabilities, visible customer outcomes, and geographic expansion points to a deepening commitment to AI inference infrastructure. Overall, the week’s developments suggest FriendliAI is strengthening its competitive position in serving open-weight, agentic, and enterprise-scale AI workloads, while laying groundwork for future revenue growth.
