According to a recent LinkedIn post from Crusoe, the company has been working with NVIDIA to address tokenization latency in large language model inference, particularly for long-context, agent-style workloads. The post highlights integration of its open-source fastokens library with NVIDIA Dynamo and SGLang, and support for models such as NVIDIA Nemotron, DeepSeek, Qwen, GLM, MiniMax, and Mistral.
The post reports that fastokens achieved an average 9.1× speedup over the Hugging Face AutoTokenizer across four models, multiple datasets, and three CPU architectures. For prompts longer than 50K tokens, reported speedups rise as high as 31×, with time-to-first-token reductions of up to 40% in real inference workloads.
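To make clear what a headline figure like "9.1× speedup" measures, here is a minimal benchmarking sketch. The tokenizer below is a hypothetical stand-in (a naive whitespace splitter), not fastokens or the Hugging Face AutoTokenizer; in a real comparison you would substitute each library's actual encode call and compare the two timings.

```python
import time
from statistics import median

def bench_tokenizer(tokenize, texts, repeats=5):
    """Return the median wall-clock seconds to tokenize all texts once."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            tokenize(text)
        timings.append(time.perf_counter() - start)
    return median(timings)

def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

if __name__ == "__main__":
    # Hypothetical stand-in; swap in AutoTokenizer's and fastokens'
    # encode functions to reproduce the kind of comparison described.
    naive = lambda s: s.split()
    texts = ["the quick brown fox jumps over the lazy dog"] * 1000
    baseline = bench_tokenizer(naive, texts)
    print(f"baseline: {baseline:.4f}s for {len(texts)} prompts")
```

A 9.1× average speedup would correspond to the baseline time being 9.1 times the candidate time in such a measurement; the reported 31× figure applies only to the longest (50K+ token) prompts.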
For investors, these performance metrics, if replicated broadly in production environments, could strengthen Crusoe’s value proposition in high-performance AI infrastructure and very-long-context workloads. Closer technical collaboration with NVIDIA and alignment with its software ecosystem may also improve Crusoe’s competitive positioning and open opportunities for deeper enterprise and cloud partnerships in the AI acceleration market.

