A LinkedIn post from Insilico Medicine describes the latest installment of its ScienceAIBench series, focusing on antibody developability benchmarks. The post outlines how the company is comparing large language models on their ability to predict two chromatography-based readouts: heparin affinity chromatography (HAC) and hydrophobic interaction chromatography (HIC) retention times.
According to the post, HAC is used as a proxy for surface charge distribution and electrostatic interaction propensity, while HIC reflects exposed hydrophobic patches and aggregation risk. These metrics are presented as relevant for assessing nonspecific interaction liability and the overall developability of therapeutic antibodies, an important factor for biologics pipelines.
The company’s benchmark reportedly evaluates several frontier AI models, including Anthropic’s Sonnet 4.5 and Opus 4.6, Grok 4.1, Gemini 3 Flash, DeepSeek 3.2, GPT 5.1 and 5.2, and Nemotron 30B. Performance is measured using Spearman correlation between model predictions and experimental retention times. The metric ranges from -1 to 1: values near 1 indicate that a model ranks antibodies in nearly the same order as the experiment, values near 0 indicate essentially random ordering, and negative values indicate an inverted ordering.
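To make the metric concrete, the following is a minimal sketch of how a Spearman correlation between predicted and measured retention times can be computed. The post does not describe the benchmark's actual scoring code, so the function names here are hypothetical and the retention times are invented purely for illustration; they are not taken from the benchmark.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(predicted) != len(measured):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(predicted), average_ranks(measured))


# Invented retention times (minutes) for five hypothetical antibodies.
predicted = [3.1, 2.4, 5.0, 4.2, 1.9]
measured = [10.2, 9.8, 12.5, 14.1, 9.1]
rho = spearman(predicted, measured)
print(f"Spearman rho = {rho:.3f}")
```

Because only the ranks enter the calculation, a model is rewarded for ordering antibodies correctly even when its predicted retention times are on a different scale from the experimental values. In this toy example the model swaps the order of just one pair out of five antibodies.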
The post highlights that Anthropic’s Sonnet 4.5 and Opus 4.6 achieved the highest Spearman correlations for HAC, at 0.499 and 0.492, respectively, followed by Grok 4.1 at 0.430. This suggests that some models may be increasingly capable of capturing electrostatic features relevant to antibody developability, which could be leveraged in early-stage biologics design.
For HIC, which the post describes as a more challenging hydrophobic interaction prediction task, Grok 4.1 is cited as leading with a correlation of 0.339. Gemini 3 Flash and Sonnet 4.5 showed weaker but still positive correlations of 0.215 and 0.140, respectively, indicating that hydrophobic aggregation risk remains a tougher modeling problem for current AI systems.
The LinkedIn post emphasizes a “predictability gap,” with models performing substantially better on HAC than HIC. From an investor perspective, this gap underscores both the current limitations and future opportunity in AI-driven biophysics, where improved modeling of hydrophobic interactions could enhance candidate screening and reduce downstream development risk.
Several models are characterized as lagging, with GPT 5.2 and Nemotron 30B showing near-random ordering for HAC, and GPT 5.1 displaying a negative correlation for HIC. This wide spread in performance might signal that Insilico’s benchmarking framework could become a reference point for evaluating foundation models in biotech, potentially positioning the company as an arbiter of AI model suitability for drug discovery tasks.
The ongoing daily series, as mentioned in the post, indicates a systematic effort to build a comprehensive benchmark suite around molecular and biophysical prediction tasks. For investors, this activity may suggest that Insilico Medicine is investing in differentiated infrastructure and datasets that could strengthen its competitive moat in AI-first drug discovery and generate partnership or licensing opportunities over time.

