In a recent LinkedIn post, Deccan AI draws attention to what it describes as limitations in current benchmarks for deep research agents, which often focus primarily on information retrieval. The post suggests that such benchmarks may overlook critical capabilities such as forward reasoning, outcome forecasting, and the ability to generate novel hypotheses.
The post indicates that Deccan AI has developed a proprietary benchmark incorporating new rubrics labeled “Outcome Forecasting” and “Future Scope,” and applied these to 104 PhD-level prompts tested on ChatGPT Deep Research and Gemini Deep Research, with scoring by subject-matter experts. According to the shared results, ChatGPT reportedly achieved stronger retrieval scores but weaker forecasting performance, while Gemini showed comparatively better forecasting but weaker retrieval.
In addition, the LinkedIn post highlights what Deccan AI calls a “compliance paradox”: more specific instructions in prompts appeared to degrade model performance, with divergent effects across the two systems. For investors, this line of research could position Deccan AI as a specialized evaluator of AI research agents, potentially creating opportunities in tooling, benchmarking services, or consulting for enterprises deploying advanced AI in research workflows.
If the benchmark gains industry traction, Deccan AI could benefit from increased visibility among AI vendors, institutional research organizations, and corporate R&D teams seeking more nuanced evaluation frameworks. However, financial implications will depend on whether the company can translate its methodological work into scalable products, standardized evaluation services, or partnerships with major AI model providers and large enterprise clients.