
Method for identification of condition-associated public antigen receptor sequences

  1. Mikhail V Pogorelyy
  2. Anastasia A Minervina
  3. Dmitriy M Chudakov
  4. Ilgar Z Mamedov
  5. Yuri B Lebedev (corresponding author)
  6. Thierry Mora (corresponding author)
  7. Aleksandra M Walczak (corresponding author)
  1. Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, Russia
  2. Skolkovo Institute of Science and Technology, Russia
  3. Central European Institute of Technology, Czech Republic
  4. Moscow State University, Russia
  5. CNRS, Sorbonne University, Paris-Diderot University, École Normale Supérieure, France
  6. CNRS, Sorbonne University, École Normale Supérieure, France
Tools and Resources
Cite as: eLife 2018;7:e33050 doi: 10.7554/eLife.33050
4 figures, 2 tables, 2 data sets and 1 additional file


Method principle and pipeline.

(Top left) Sequence overlap between two TCR or BCR repertoires. (Bottom left) Two major mechanisms produce sequence sharing between repertoires: convergent recombination and convergent selection. Because convergent recombination favors sequences with high generation probabilities, the two classes of shared sequences have different distributions of the generative probability, Pgen(σ). (Right) We estimate the theoretical Pgen(σ) for each sequence σ and compare it to Pdata(σ), which is derived empirically from the sharing pattern of that sequence in the cohort. Comparing these two values lets us calculate the analog of a p-value, namely the posterior probability that the sharing pattern is explained by convergent recombination alone, with no selection for a common antigen.
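The comparison in the right-hand panel can be illustrated with a simplified tail probability. The sketch below is not the authors' implementation: it assumes that a sequence with per-rearrangement probability `p_seq` appears in a donor with `n` unique clonotypes with probability 1 − exp(−n·p_seq), treats donors as independent, and asks how likely it is to see the sequence in at least `n_shared` donors under recombination alone. The function names and parameters are hypothetical.

```python
import math

def presence_probabilities(p_seq, donor_sizes):
    """Assumed model: probability that the sequence occurs at least once
    among the n unique clonotypes sampled from each donor."""
    return [1.0 - math.exp(-n * p_seq) for n in donor_sizes]

def sharing_tail_probability(p_seq, donor_sizes, n_shared):
    """P(sequence seen in >= n_shared donors) under recombination alone.

    Donors are independent Bernoulli trials with unequal success
    probabilities (a Poisson-binomial distribution), evaluated exactly
    by dynamic programming over donors.
    """
    dist = [1.0]  # dist[k] = P(sequence present in exactly k donors so far)
    for p in presence_probabilities(p_seq, donor_sizes):
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)   # sequence absent in this donor
            new[k + 1] += mass * p       # sequence present in this donor
        dist = new
    return sum(dist[n_shared:])
```

A small value of `sharing_tail_probability(p_seq, donor_sizes, n_shared)` would suggest that recombination alone is an unlikely explanation for the observed sharing; how this maps onto the posterior probability used in the paper is described in its Materials and methods.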

Identification of condition-associated clonotypes using generative probability

(A) CDR3aa sequences of antigen-specific clonotypes (red circles) have lower generative probability than other clonotypes shared among the same number of donors. The number of in silico rearrangements obtained for each TCRβ sequence in our simulation (which is proportional to the generation probability Ppost(σ) of each clonotype in a given VJ combination) is plotted against the number of patients with that TCRβ clonotype. (B) Model prediction of generative probabilities agrees well with data. To directly compare Ppost(σ) to data, we estimate the empirical probability of occurrence of each sequence, Pdata(σ), from its sharing pattern across donors (see Materials and methods). In (A) and (B), red dots indicate significant results (adjusted P<0.01, Holm's multiple testing correction), while red circles mark the responsive clonotypes identified in the source studies.
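As a rough illustration of the two computational steps named in this caption, the sketch below estimates an empirical occurrence probability from a sharing pattern and applies Holm's step-down correction. The likelihood model is an assumption made for illustration (presence in a donor with n unique clonotypes has probability 1 − exp(−n·P)); the paper's exact estimator is given in its Materials and methods. `estimate_p_data` and `holm_adjust` are hypothetical names.

```python
import math

def estimate_p_data(donor_sizes, present):
    """Maximum-likelihood P under the assumed model p_i = 1 - exp(-N_i * P).

    donor_sizes: unique clonotypes sequenced per donor.
    present: booleans, True where the sequence was observed.
    """
    found = [n for n, x in zip(donor_sizes, present) if x]
    absent_total = sum(n for n, x in zip(donor_sizes, present) if not x)
    if not found:
        return 0.0
    if absent_total == 0:
        return math.inf  # seen in every donor: likelihood has no finite maximum

    def score(p):
        # Derivative of the log-likelihood; strictly decreasing in p.
        return sum(n * math.exp(-n * p) / (-math.expm1(-n * p))
                   for n in found) - absent_total

    lo = hi = 1.0 / max(donor_sizes)
    while score(lo) < 0:
        lo /= 2.0
    while score(hi) > 0:
        hi *= 2.0
    for _ in range(200):  # bisection on a logarithmic scale
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted
```

With equal donor sizes the estimate reduces to P = −ln(1 − x/n)/N for a sequence seen in x of n donors, which gives a quick sanity check on the numerical solver.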

Calibration curve for TRBV5-1 TRBJ2-6 combination.

Here we plot the ratio of unique amino acid sequences to recombination events against the logarithm of the number of recombination events. The blue line corresponds to the theoretical solution with selection, and the red line to the theoretical solution without selection.
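To see why such a curve falls as the number of recombination events grows, consider a toy calculation. The sketch below assumes events are drawn independently from a fixed distribution over sequences, so the expected number of distinct sequences after N events is the sum over sequences of 1 − (1 − p)^N. It covers only the case without selection; the selection model behind the blue line is not reproduced here. The function names are hypothetical.

```python
import math

def expected_unique_fraction(probs, n_events):
    """Expected (unique sequences) / (recombination events) after n_events
    independent draws from the distribution given by probs."""
    expected_unique = sum(1.0 - (1.0 - p) ** n_events for p in probs)
    return expected_unique / n_events

def calibration_curve(probs, event_counts):
    """Pairs of (log10 of event count, expected unique fraction)."""
    return [(math.log10(n), expected_unique_fraction(probs, n))
            for n in event_counts]
```

For example, `calibration_curve([0.5, 0.25, 0.25], [1, 10, 100])` starts at a fraction of 1 for a single event and decreases as repeated draws of the same sequences accumulate.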

Simulation of the method performance with different cohort sizes, sequencing depths, effect sizes and target clone abundances in population.

In panels (A, B, C) we plot the number of simulations (out of 100) in which a clone with a given effect size q (line color, see legend) and P̃data (x-axis) is found to be significant using our approach, for cohort sizes of 10, 30 and 100 donors respectively. Larger cohort sizes and effect sizes make it possible to resolve clonotypes with lower abundance in the population. In panel (D) we show the effect of sequencing depth for fixed q=10: sequencing more clonotypes per donor allows us to resolve less frequent clones, since a clone of a given P̃data is detected in a larger fraction of donors (panel E).
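A simulation of this kind can be sketched as follows. This is a simplified stand-in, not the authors' simulation code: it assumes every donor is sequenced to the same depth, that a clone with occurrence probability `p_data` is present in a donor with probability 1 − exp(−depth·p_data), that the recombination-only probability is `p_data / q`, and that significance is a plain binomial tail below a threshold `alpha`, with no multiple testing correction. All names and defaults are hypothetical.

```python
import math
import random

def binomial_tail(n, p, x):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(x, n + 1))

def detections(p_data, q, n_donors, depth, alpha=0.01, n_sim=100, seed=0):
    """Number of simulated cohorts (out of n_sim) in which the clone's
    sharing is significant against the recombination-only expectation."""
    rng = random.Random(seed)
    p_true = 1.0 - math.exp(-depth * p_data)      # presence with selection
    p_null = 1.0 - math.exp(-depth * p_data / q)  # presence by recombination alone
    hits = 0
    for _ in range(n_sim):
        # Number of donors in which the clone is observed in this simulation.
        x = sum(rng.random() < p_true for _ in range(n_donors))
        if binomial_tail(n_donors, p_null, x) < alpha:
            hits += 1
    return hits
```

Calling `detections` over a grid of `p_data`, `q`, `n_donors` and `depth` values gives curves of the same general shape as the panels: detection counts rise with effect size, cohort size and sequencing depth.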



Table 1
Published antigen-specific clonotypes used to test the algorithm.
CDR3aa            V-segment    J-segment  Antigen source  Ref.
CASSLAPGATNEKLFF  TRBV07-06    TRBJ1-4    CMV             (Emerson et al., 2017)
CASSPGQEAGANVLTF  TRBV05-01    TRBJ2-6    CMV             (Emerson et al., 2017)
CASASANYGYTF      TRBV12-3,-4  TRBJ1-2    CMV             (Emerson et al., 2017)
CASSLVGGPSSEAFF   TRBV05-01    TRBJ1-1    self            (Seay et al., 2016; Gebe et al., 2009)
Table 2
Output of the algorithm for sequences from Table 1.
CDR3aa  V  J  Ag. source  p-value rank  p-value  Effect size

Data availability

The following previously published data sets were used
  1. 1
  2. 2

Additional files
