According to a recent LinkedIn post from Exa, the company is highlighting a new “Highlights” text-extraction model designed to cut input tokens for web-based AI agents by about 96%. The post indicates that the system selects only the roughly 500 most relevant tokens from a webpage for a given query, while aiming to match the retrieval-augmented generation (RAG) performance of a full 10,000-token input.
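A back-of-envelope check of those numbers is straightforward. The 500- and 10,000-token figures come from the post; the per-token price below is a hypothetical placeholder, not Exa's or any model provider's actual pricing.

```python
# Token and cost arithmetic for the compression described in the post.
FULL_PAGE_TOKENS = 10_000          # full webpage context, per the post
HIGHLIGHT_TOKENS = 500             # compressed "Highlights" context, per the post
PRICE_PER_1K_INPUT_TOKENS = 0.01   # hypothetical $/1K input tokens (placeholder)

savings_fraction = 1 - HIGHLIGHT_TOKENS / FULL_PAGE_TOKENS
full_cost = FULL_PAGE_TOKENS / 1000 * PRICE_PER_1K_INPUT_TOKENS
highlight_cost = HIGHLIGHT_TOKENS / 1000 * PRICE_PER_1K_INPUT_TOKENS

print(f"Input-token reduction: {savings_fraction:.0%}")       # 95%, in the ballpark of the ~96% claimed
print(f"Cost per page: ${full_cost:.4f} -> ${highlight_cost:.4f}")
```

At these assumed prices, per-page input cost drops from $0.10 to $0.005, which is the kind of inference-cost argument the post is making.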
The post suggests that this compression capability may be particularly relevant for frontier large language models such as GPT-5.5, where managing context length and cost is a growing constraint. For investors, this focus on improving content density and reducing context “bloat” could position Exa as an enabling infrastructure player in the AI tooling ecosystem, potentially driving API usage as model providers and enterprise developers look to optimize performance and inference costs.
As shared in the post, Exa frames the “Highlights” feature as immediately accessible via a dedicated content type in its API, signaling an emphasis on developer adoption and integration into existing AI workflows. If the technology proves effective at scale, it may enhance Exa’s competitive standing among vector search, RAG, and AI data-layer vendors, and could support future monetization through higher-value usage tiers or partnerships with model platforms.
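For developers, a request using such a content type might be shaped roughly as follows. The endpoint URL, field names, and option values here are illustrative assumptions based on the post, not a verified API reference; consult Exa's own documentation for the actual schema. No request is sent, the sketch only builds and inspects the payload.

```python
import json

API_URL = "https://api.exa.ai/search"  # assumed endpoint for illustration

# Hypothetical payload: ask for highlight snippets instead of full page text.
payload = {
    "query": "How do frontier LLMs manage context length?",
    "contents": {
        # Assumed option names; the real schema may differ.
        "highlights": {"numSentences": 5, "highlightsPerUrl": 1},
    },
}

print(json.dumps(payload, indent=2))
```

The design point the post emphasizes is that this is an opt-in content type on an existing endpoint, so adopting it would be a payload change rather than a new integration.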