According to a recent LinkedIn post from Baseten, the company’s researchers are exploring ways to give large language models a more human-like working memory. The post describes work on a 7M-parameter “perceiver” model that compresses key-value (KV) caches by 8x while reportedly retaining over 90% factual accuracy in downstream recall.
The LinkedIn post suggests this compression is achieved in a single forward pass, a design it positions as distinct from existing multi-step compaction methods. If the approach proves scalable, it could lower inference costs and enable longer context windows, potentially strengthening Baseten’s competitiveness in serving memory-intensive AI applications.
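
For readers curious what such a compressor might look like in practice, the sketch below illustrates one plausible perceiver-style design: a small cross-attention module that maps a long key-value cache to an 8x-smaller set of latent slots in a single forward pass. Baseten has not published implementation details, so every name, dimension, and architectural choice here is an illustrative assumption, not the company's actual method.

```python
import torch
import torch.nn as nn

class KVPerceiverCompressor(nn.Module):
    """Hypothetical perceiver-style module that maps a long KV cache to a
    fixed, 8x-smaller set of latent KV pairs in one forward pass.
    Architecture and names are illustrative, not Baseten's actual design."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, compression: int = 8):
        super().__init__()
        self.compression = compression
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.latent_proj = nn.Linear(d_model, d_model)

    def forward(self, keys: torch.Tensor, values: torch.Tensor):
        # keys, values: (batch, seq_len, d_model)
        batch, seq_len, d_model = keys.shape
        n_latents = max(1, seq_len // self.compression)
        # Derive latent queries by average-pooling the keys into n_latents
        # slots, so the module imposes no fixed maximum sequence length.
        pooled = nn.functional.adaptive_avg_pool1d(
            keys.transpose(1, 2), n_latents
        ).transpose(1, 2)  # (batch, n_latents, d_model)
        latents = self.latent_proj(pooled)
        # A single cross-attention pass: the latents attend over the full
        # cache, summarizing it without any iterative compaction steps.
        comp_keys, _ = self.cross_attn(latents, keys, keys)
        comp_values, _ = self.cross_attn(latents, keys, values)
        return comp_keys, comp_values  # each (batch, n_latents, d_model)

# Usage: compress a 1024-token cache down to 128 latent slots (8x).
k = torch.randn(2, 1024, 64)
v = torch.randn(2, 1024, 64)
compressor = KVPerceiverCompressor()
ck, cv = compressor(k, v)
print(ck.shape, cv.shape)  # torch.Size([2, 128, 64]) for both
```

The key property the sketch captures is that the compressed cache size is a fixed fraction of the input, and the whole reduction happens in one forward pass rather than through repeated summarize-and-merge rounds.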
The post frames this research as an early step toward models that “learn from experience,” implying longer-term ambitions in continual learning and adaptive systems. For investors, such capabilities could translate into differentiated infrastructure offerings for enterprise AI, supporting higher-value workloads and improving Baseten’s strategic position in the rapidly evolving LLM tooling ecosystem.

