Karen Paco, Keck Graduate Institute

Large Language Models Accurately Predict Antigen-Specific B Cell Receptors from Unselected Immune Repertoire Sequencing

Identifying antigen-specific B cell receptors (BCRs) from unselected immune repertoire sequences remains a central challenge in immunology. Here, we present a large language model (LLM)-based framework, trained on curated datasets of SARS-CoV-2, influenza, and HIV, that accurately predicts antigen specificity directly from BCR sequences. When applied to mice immunized with a challenge antigen, the model accurately identified antibodies specific to the antigen without prior selection based on binding. Computational and experimental validation of the top predictions, through molecular dynamics simulations and surface plasmon resonance (SPR) respectively, confirmed their binding affinities, demonstrating the model’s capability to predict antigen-antibody specificity from sequence data alone. These results suggest that BCR-antigen sequence pairs contain sufficient information for LLMs to reliably predict antigen-antibody interaction specificity, potentially opening new avenues for the computational design of synthetic antibodies for vaccine and therapeutic development, and for the rapid identification of autoantigen-specific circulating antibodies.