Insilico Medicine Benchmarks LLM Limits in 3D Drug Discovery Tasks

According to a recent LinkedIn post from Insilico Medicine, the company’s ongoing ScienceAIBench series is now testing whether leading large language models can infer specific 3D protein‑ligand interactions rather than just bulk molecular properties. The benchmark uses the LP‑PDBBind dataset and the Chemistry42 pharmacophore engine as ground truth, and scores models on three measures: restored interactions, interaction powers (strengths), and the rate of false or “fake” interactions.

The post highlights that some models, such as GPT 5.1 and Opus 4.5, produced validly formatted output at a high rate, while others like GPT 5.2 and Nemotron struggled to produce usable responses. Even among valid outputs, spatial accuracy appears limited, with Opus 4.6 restoring the highest fraction of exact interactions and Opus 4.5 performing best on interaction strength predictions.

Insilico’s analysis underscores a high hallucination rate in 3D interaction predictions, with even the strongest performers generating a large share of non‑existent interactions. The post suggests a significant performance gap between 1D scalar molecular predictions and reliable 3D spatial reasoning, implying that current frontier models may not be sufficient on their own for precision tasks in structure‑based drug discovery.
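
To make the two headline measures concrete, the sketch below shows one plausible way a restored fraction and a fake-interaction rate could be computed for a single protein-ligand complex. It is an illustration only: the post does not give Insilico's exact definitions, so the function name `score_interactions`, the representation of an interaction as a (ligand atom, residue, type) tuple, the exact-match rule, and the sample data are all assumptions.

```python
def score_interactions(predicted, reference):
    """Compare predicted and reference interaction sets for one complex.

    Assumed definitions (not taken from the benchmark itself):
      restored fraction = reference interactions the model reproduced exactly
                          / all reference interactions
      fake rate         = predicted interactions absent from the reference
                          / all predicted interactions
    """
    predicted, reference = set(predicted), set(reference)
    restored = predicted & reference   # exact matches
    fake = predicted - reference       # hallucinated interactions
    restored_fraction = len(restored) / len(reference) if reference else 0.0
    fake_rate = len(fake) / len(predicted) if predicted else 0.0
    return restored_fraction, fake_rate


# Hypothetical data: (ligand atom, residue, interaction type)
reference = {
    ("N1", "ASP86", "hbond"),
    ("C7", "PHE80", "hydrophobic"),
    ("O2", "LYS33", "hbond"),
}
predicted = {
    ("N1", "ASP86", "hbond"),   # matches the reference
    ("O2", "GLU81", "hbond"),   # not in the reference, counted as fake
}

restored_fraction, fake_rate = score_interactions(predicted, reference)
print(f"restored: {restored_fraction:.2f}, fake: {fake_rate:.2f}")
```

In this toy case the model recovers one of three reference interactions and half of what it predicts is fake, which is the kind of pattern the post describes: a model can score on one measure while still hallucinating heavily on the other.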

For investors, the benchmark activity indicates that Insilico is positioning its Chemistry42 platform as a reference standard for evaluating AI models in 3D pharmacophore prediction. This could support the company’s competitive standing in AI‑driven drug discovery tools, especially if pharmaceutical and biotech customers view its benchmarking results as an argument for specialized, domain‑specific platforms over general‑purpose LLMs.

The findings may also temper expectations around rapid automation of complex structural biology workflows using generic AI models, reinforcing the need for proprietary data, physics‑aware engines, and integrated platforms. If Insilico continues to publish comparative performance data, it could enhance the firm’s visibility in the AI‑for‑drug‑discovery ecosystem and potentially strengthen its appeal as a technology partner or acquisition target for larger life‑science players.
