Figures and data

Signaling networks generated by general-purpose large language models.
A) Schematic of the pipeline for LLM-generated models of signaling networks. B) Network reactions recalled by three large language models (Gemini2.0, orange; ChatGPT4, blue; Claude3.7, green) compared with a “Ground Truth” literature-curated and validated cardiomyocyte hypertrophy signaling network¹³ (gray reactions). LLM networks were generated using iterative prompts based on the gene set list of the Ground Truth hypertrophy network. C–E) Summary of reaction recall accuracy by Gemini, GPT, and Claude for three literature-curated signaling networks (C, hypertrophy¹³; D, fibroblast¹⁴; E, mechanosignaling¹⁵). * Indicates p < 10⁻⁹ in a one-sample t-test comparing LLM-generated replicates (n = 10) with the ground truth network.
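As a rough illustration of the quantities summarized in panels C–E, the sketch below computes reaction recall for one replicate and the statistic of a one-sample t-test across replicates. It assumes that recall is the fraction of ground-truth reactions also present in the LLM network and that reactions can be compared as (source, target) pairs; the function names and the toy reactions are hypothetical and are not taken from the study.

```python
import math
import statistics


def reaction_recall(llm_reactions, ground_truth):
    """Fraction of ground-truth reactions that also appear in the LLM network.

    Assumption: a reaction is a hashable (source, target) pair.
    """
    truth = set(ground_truth)
    if not truth:
        raise ValueError("ground truth network has no reactions")
    return len(truth & set(llm_reactions)) / len(truth)


def one_sample_t(values, reference):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == reference."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return (mean - reference) / (sd / math.sqrt(n)), n - 1


# Toy example with placeholder species names (not real pathway data).
ground_truth = {("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")}
replicate = {("A", "B"), ("B", "C"), ("C", "X")}
recall = reaction_recall(replicate, ground_truth)  # 2 of 4 reactions recovered
```

Converting the t statistic to a p value requires the Student t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.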

Experimental validation of perturbation responses predicted by LLM-generated signaling network models.
A) Representative validations of network models generated by manual curation or by LLMs (Gemini, GPT, Claude), compared with experiments from the literature¹³ under angiotensin II (AngII) or isoproterenol (ISO) conditions. B) Summary of systematic validations of manually curated (Ground Truth) and LLM-generated models of the hypertrophy, fibroblast, and mechanosignaling networks against perturbation experiments from the literature (n = 114, 83, and 171 experiments, respectively). * Indicates p < 10⁻¹¹ in a one-sample t-test comparing LLM-generated model validation scores (n = 10 replicates) with the ground truth model validation accuracy.
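The validation score in panel B can be pictured with a minimal sketch like the one below. It assumes that each experiment and each model prediction is reduced to a qualitative label (for example "increase", "decrease", or "no change") and that the score is the fraction of experiments where the two labels agree; the function name and the labels are hypothetical, and the study's actual scoring procedure may differ.

```python
def validation_accuracy(predicted, observed):
    """Fraction of experiments where the predicted response matches the observed one.

    Assumption: both inputs are equal-length sequences of qualitative labels,
    one entry per perturbation experiment.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("no experiments to validate against")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy example: four hypothetical experiments, three correct predictions.
predicted = ["increase", "decrease", "no change", "increase"]
observed = ["increase", "decrease", "increase", "increase"]
score = validation_accuracy(predicted, observed)
```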

Visualization of LLM-generated fibroblast signaling networks, as recalled by three general-purpose large language models.
Network reactions recalled by three large language models (Gemini2.0, orange; ChatGPT4, blue; Claude3.7, green) compared with a “Ground Truth” literature-curated and validated fibroblast signaling network (gray reactions). LLM networks were generated using prompts based on the gene set list of the Ground Truth fibroblast network. This visualization corresponds to the analyses in Figure 1D.

Visualization of LLM-generated mechanosignaling networks, as recalled by three general-purpose large language models.
Network reactions recalled by three large language models (Gemini2.0, orange; ChatGPT4, blue; Claude3.7, green) compared with a “Ground Truth” literature-curated and validated mechanosignaling network (gray reactions). LLM networks were generated using prompts based on the gene set list of the Ground Truth mechanosignaling network. This visualization corresponds to the analyses in Figure 1E.