Mercor has shared an update. The company has expanded APEX, its benchmark designed to assess whether frontier AI models can perform economically valuable work, doubling the evaluation set to 400 real-world cases across four high-value professions: investment banking associate, management consultant, big law associate, and primary care physician. According to the latest results, OpenAI’s GPT-5 currently leads overall, while investment banking remains the most challenging domain, with top models scoring around 60%. Models are also improving over time: Anthropic’s Claude Opus 4.5 gained nearly 12 points over Opus 4.1, and Google’s Gemini 3 Pro is approaching GPT-5. Mercor is also open-sourcing 100 cases (25 per domain) along with its evaluation harness.
For investors, this development reinforces Mercor’s positioning as an emerging provider of standardized, high-signal benchmarks for applied AI in professional services and healthcare. By focusing on economically meaningful tasks and making part of its dataset and tooling open-source, Mercor may increase adoption of APEX by AI labs, enterprises, and researchers, which could translate into data partnerships, benchmarking-as-a-service offerings, or consulting revenue. The finding that there remains a significant gap between model capability and economic usefulness underscores the ongoing need for rigorous, domain-specific evaluation frameworks, a niche in which Mercor could build durable relevance as the AI ecosystem matures. If APEX becomes a reference standard for assessing AI readiness in high-value knowledge work, Mercor’s influence and monetization potential within the AI tooling and evaluation segment could strengthen over time.

