B cells are a relatively undercharacterized component of the tumor microenvironment. In human tumors, infiltration with B-cells, as cells and T-cells is often a positive prognostic factor, especially in conjunction with the formation of TLS. Mechanistically, tumor-infiltrating B-cells (TIBs) may act via presentation of BCR-cognate antigens to T-cells 13, production of pro- or anti-tumor cytokines 47, and production of antibodies, which may be tumor-specific8 and enhance killing of tumor cells via ADCC9 and complement-induced cytotoxicity, enhance antigen-presentation by dendritic cells10 or form immune complexes that promote the activation of MDSC11. The effector functions of antibodies are highly dependent on antibody isotype12 and specificity. For instance, IgG1 is the most efficient factor in ADCC, and IgG3 (together with IgG1) is capable of complement-dependent cytotoxicity 13. These functions depend on the ability of an antibody to bind its cognate antigen, which increases during B-cell maturation and somatic hypermutation14. Isotype switching, somatic hypermutation (SHM), and B-cell proliferation and differentiation are influenced by the tumor microenvironment, neoantigen burden, and presence of certain driver mutations15. Therefore, the properties of immunoglobulins produced within the tumor are both a reflection of TME features and a factor that shapes the TME.

The whole population of immunoglobulin receptors and immunoglobulins produced within a certain tissue or cell population is known as a BCR repertoire. BCR repertoire analysis has become an important approach for characterizing adaptive immune responses in health and disease1618. Starting from genomic DNA or RNA, libraries representing the sequences of variable domains of antibodies can be produced and sequenced using high throughput sequencing1923. Alternatively, immune receptor sequences can be derived from RNA-Seq data2427. Then, the properties of the variable parts of immune receptor sequences are studied using bioinformatics tools that are currently quite elaborately developed 2834.

For BCR repertoires, features that are considered functionally important are clonality, isotype composition, biases in V- and J-gene usage, and extent of SHM3538. In addition, characteristics of the immunoglobulin CDR3 region, such as the number of added nucleotides, length, and amino acid physicochemical properties, have functional implications and have been linked to antibody specificity39 and to the B-cell maturation stage40.

Correspondingly, the characteristics of the immune repertoire have been found to be associated with clinical outcomes (15,25,4143). Specifically, high intratumoral IGH expression, high IGH clonality, and a high proportion of IgG1 among all IGH transcripts were strongly correlated with higher overall survival in melanoma25. Similarly, high IgG1 mRNA levels are positively associated with improved prognosis in early breast cancer44. A high intratumoral IgG1 to IgA ratio was associated with improved overall survival in bladder cancer45 and for KRASmut but not KRASwt LUAD, suggesting the first link between driver mutation and B-cell response15. In lung adenocarcinoma, breast cancer, and bladder cancer, only a subset of V-segments is associated with improved survival46. The co-occurrence of specific antibody motifs and mutated tumor-derived peptides, presumably indicating specificity to particular tumor neoantigens, was also correlated with longer survival in colorectal cancer43.

The tumor immune repertoire may be used as a source of tumor-antigen-specific antibodies for therapy development 4749. For instance, in melanoma 30%-80% of the total BCR repertoire 25 may constitute one or several dominant B cell clones. Corresponding antibodies can be produced, verified for tumor reactivity 49, and further employed in CAR-T or other therapeutic approaches. The knowledge of antigen specificity of cancer-associated antibodies is currently insufficient, and it is mostly limited to autoantigens and some tumor-associated antigens 5055.

However, even in the absence of knowledge of cognate antigens, certain hypotheses may be derived from BCR repertoire characteristics. Some individual IGHV-genes genes and subgroups have been associated with autoimmunity 56. Likewise, an increase in the BCR clonal expansion index 57, dominated by IgA and IgM isotypes, was associated with SLE and Crohn’s disease 38.

The ability to accurately characterize the B cell repertoire in the tumor microenvironment is of vital importance for both fundamental and clinical challenges. However, tumor tissue may not always be available for analysis; therefore, it would be beneficial to derive information about the tumor BCR repertoire from peripheral blood or draining lymph nodes. It is not known, however, whether tumor-dominant BCR clonotypes can be found in the peripheral blood of cancer patients and at what frequencies. The ability to detect these clonotypes in a patient’s peripheral blood could have a predictive value for disease prognosis. In addition, it is not known how the BCR repertoire characteristics of peripheral blood B-cells and tumor-draining lymph nodes relate to the repertoire of TIL-B.

Therefore, in the current study, we aimed to comprehensively and systematically evaluate tumor, lymph node, and peripheral blood BCR repertoires and their interrelationships. We show that the tumor BCR repertoire is closely related to that of draining lymph nodes, both in clonal composition and isotype proportion. Furthermore, we observed that different lymph nodes from one draining lymph node pool may be more or less involved in the interaction with the tumor, as reflected by the similarity of the BCR repertoire clonal composition. CDR3 properties indicate a less mature and less specific BCR repertoire of tumor-infiltrating B cells compared to circulating B cells. BCR clonotypes that expand in a tumor relative to other tissues are, on average, less hypermutated than non-expanded (ubiquitous) clonotypes.


Experimental and Computational Study Design

To systematically study the relationships between BCR repertoires in tumors and normal peripheral compartments, we performed RNA-based targeted BCR repertoire analysis from four tissue types: tumor (tum), corresponding normal tissue (norm), draining lymph node (DLN), and blood (PBMC), of 17 cancer patients (melanoma, n = 9; lung cancer, n = 4; and colorectal cancer, n = 4). To account for spatial heterogeneity, we obtained 3 fragments of tumor tissue per patient. For DLNs, we either dissected them into three separate pieces to study intra-LN spatial heterogeneity (lung cancer and melanoma), or, where available, obtained three separate DLNs to study the differential involvement of DLNs in the interaction with tumors (colorectal cancer, patients ccp2 and ccp3) (Fig. 1A). All fragments were homogenized separately into single-cell suspensions. To account for sampling noise at the level of individual cells, we obtained two replicate samples from each fragment after homogenization. The relative similarity of clonotype frequencies in replicate samples and separate tumor fragments is shown in Fig. 1B. BCR repertoires were obtained using the 5’-RACE protocol as previously described21.

Experimental design.

A - Overview of the experimental design and procedures. B - Using technical replicates at the cell suspension level to account for sampling bias in the clonotype frequencies.

In this study, we used bulk RNA-based BCR profiling, with the understanding that the dominant clonotypes and B cell lineages reproducibly represented in biological replicates reflect the presence of clonally expanded plasma cells, plasmablasts, and the most expanded memory B cell clones. In other words, the RNA-based approach mainly revealed the repertoire of functionally active B-cell lineages (Supplementary Note 1).

In most experimental setups, we employed short 150+150 nt sequencing covering the IGH CDR3 regions, using our 5’RACE method. This approach allowed us to achieve high coverage for the obtained libraries (Table S1) to reveal information on clonal composition, CDR3 properties, IgM/IgG/IgA isotypes, somatic hypermutation load within CDR3, and clonal B cell lineages.

Tumor/non-tumor repertoire overlap and isotype composition

First, we characterized the relative similarity of IGH repertoires derived from tumors, DLN, and PBMC using the repertoire overlap metric F2, which accounts for both the number and frequency of overlapping clonotypes (Fig. 2A). As overlap metrics are dependent on overall repertoire richness, we normalized the comparison using the same number of top most frequent clonotypes of each isotype from each sample (N = 109). Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point.

Repertoire overlap, clonality and isotype composition.

A - repertoire overlap between pairs of tissues, by F2 metric, repertoires split by immunoglobulin isotype; B - network representation of Ig repertoires from PBMC, LN, and tumor of mp3 patient (melanoma); individual clonotypes of the same origin (PBMC, tumors, or LN) are shown as bubbles connected with the edges to one anchor node. Clonotypes shared between tissues were connected with two or three edges to the corresponding anchor nodes and were located between them. The size of the bubbles represents the relative frequency of clonotypes within a sample, and the color represents the isotype. The relative distance between anchor nodes corresponded to the similarity of repertoires (the number of shared clonotypes). C, D - Clonality of Ig repertoires in PBMC, LN, and tumors of N cancer patients. This reflects the presence of clonal expansion. Calculated as in 58: [1-normalized Shannon-Wiener index]; C–total IGrepertoire; D–IgM repertoire; E–LN/tumor overlap for IgA and IgG repertoires; F– PBMC/tumor overlap for IgA and IgG repertoires; G, H–isotype fraction correlation between PBMC and tumor repertoires (G), or between LN and tum repertoires (H).

As expected, significantly lower overlaps were observed between the IGH repertoires of peripheral blood and tumors compared to DLN/tumor overlaps. The DLN/PBMC overlap also tended to be lower, but the difference was not statistically significant. At the individual patient level, we used the Cytoscape network visualization platform to visualize the structure of the repertoire overlap.

As exemplified by mp3 (Fig. 2B), the repertoire from the tumor is closely related to the DLN repertoire, whereas the PBMC repertoire has very few overlapping clonotypes with both tumors and DLN. Overall, these analyses revealed that the extent of clonal exchange between tumors and PBMC was significantly lower than that between tumors and DLN. The frequencies of overlapping clonotypes were also more strongly correlated between tumors and DLN than between tumors and peripheral blood (Fig. S2A, R metric). The level of clonal exchange between tissues was dependent on isotype (Fig. 2 E, F). DLN/tumour overlap was higher in the IgG repertoire (Fig. 2E), whereas PBMC/tumour overlap was lower in the IgG repertoire (Fig. 2F) than in the IgA repertoire. This suggests that tumor-induced IgG-expressing B-cells (IgG-TIL-B) avoid systemic circulation, whereas IgA-expressing tumor-infiltrating B-cells (IgA-TIL-B) may be found in the peripheral blood with a higher probability. In addition, the tumor repertoire was significantly more clonal (clonality calculated as [1-normalized Shannon-Wiener index]58)than the PBMC or DLN repertoire, both overall (Fig. 2C) and separately for the IGHM repertoire (Fig. 2D).

We did not find a statistically significant difference in isotype composition between cancers in any of the studied tissues, with the exception of IgM percentage in melanoma tumors. In BCR repertoires from melanoma tumors, the total percentage of repertoire consisting of IgM clonotypes was significantly lower than that in other types of cancers (colorectal and lung) (Fig. S2B). The isotype composition of non-tumor tissues correlated with the isotype composition of the tumor; this effect was less prominent in peripheral blood (R = 0.42, p = 0.028)(Fig. 2G) and more prominent for DLN (R = 0.74, p < 0.01)(Fig. 2H).

CDR3 properties

Analysis of averaged CDR3 repertoire characteristics revealed increased CDR3 length in tumors compared to PBMC for the total repertoire (Fig. 3A) and also in IgA and IgG repertoires separately (Fig. S3A, B), but this was not the case for IgM (FigS3C). In addition, the increase in CDR3 length in IgA repertoires from lymph nodes compared to PBMC was statistically significant (Fig. S3A). Interestingly, the only significant difference we found when comparing CDR3 lengths between cancer types was reduced IgA CDR3 length in colorectal cancer, especially compared to melanoma (p=0.02) (Fig. 3B). This could reflect a generally more mature IgA repertoire in colon tissues owing to the previous history of interactions with microbiota59,60. However, the relationship between this difference and tumor antigen specificity remains to be verified.

CDR3 amino acid properties

A - Mean amino CDR3 length of top 100 most frequent clonotypes from tumor, lymph node and PBMC tissues irrespective of isotype CDR3 are on average significantly higher in tumor than PBMC for total repertoire (p<0.01, two-sided t-test, Bonferroni correction); B - Comparison of mean amino acid CDR3 length of 100 most frequent clonotypes for colon, lung and melanoma cancer samples from, tumor. CDR3s of tumor-infiltrating clonotypes were shorter for colorectal cancer patients compared to melanoma in IgA repertoires (p=0.02); C - Comparison of amino acid properties in the central region of CDR-H3, for total repertoire (C-left) or IGHG repertoire (C-right), all cancers (significantly increased - red, significantly decreased - blue) two-sided t-test, Bonferroni-Holm correction; D - Comparison of amino acid properties in the central region of CDR-H3, for IGHM repertoire, lung cancer (significantly increased - red, significantly decreased - blue); (p<0.01, two-sided t-test, Bonferroni-Holm correction)

To explore CDR3 physicochemical properties, we calculated the mean charge, hydropathy, predicted interaction strength, and Kidera factors 1 - 9 61 for five central amino acids of the CDR3 region for the 100 most frequent clonotypes of each sample using VDJtools. Comparing between tissues, we found that kf4 value for the tumor repertoire was decreased compared to PBMC in the total repertoire, and kf5 was decreased in tumor vs. PBMC in the IgG repertoire (Fig. 3C). In addition, kf6 expression was decreased in the LN repertoire compared to PBMC. Kf4 was inversely correlated with hydrophobicity, indicating a higher proportion of hydrophobic residues in BCR CDR3s from the tumor repertoire. Kf5 reflects a double-bend preference and has not been previously found to be significant in the context of antibody properties. Kf6 is a measure of partial specific volume; therefore, a lower kf6 value indicates less bulky amino acid residues in CDR3s from the LN repertoire. Between the tumor and normal tissue repertoires, the hydropathy value was lower in normal lung tissue, also indicating a higher proportion of hydrophobic residues in tumor-derived CDR-H3 repertoires (Fig. 3D). According to 40, more mature B-cell subpopulations have higher mjenergy, disorder, kf4, kf6, and kf7 and lower cdr3 length, strength, volume, kf2, and kf3. Again, the mean CDR3 charge was negatively associated with specificity 62 and beta-sheet propensity was associated with antibody promiscuity 63.

Poly reactive and self-reactive antibodies have, on average, longer CDR3s 64 with a higher charge62 and net hydrophobicity6567.

Therefore, collectively, our observations suggest a less mature and less specific BCR repertoire of tumor-infiltrating B cells compared to circulating B cells and B cells infiltrating normal tissue, which may indicate less stringent control for antibody-producing B cell development in the TME.

Immunoglobulin hypermutation analysis across tissues and isotypes

The intensity of somatic hypermutation (average number of mutations relative to the most recent common ancestor, MRCA) reflects the average extent of antigenic selection experience of the clonotypes found in a given tissue or cancer type. No significant difference was found between PBMC, DLN, and tumors for the top 100 most frequent clonotypes in the total repertoire as well as for the top 100 most frequent clonotypes of each isotype separately (not shown). This indicates that in the RNA-based BCR repertoires, in all studied tissues, the most dominant immunoglobulin clonotypes belong to cell populations with an equivalent history of antigen exposure, selection, and maturation. However, there was a statistically significant difference in the number of hypermutations in IgG between the cancers (Fig.3E).

Clonal exchange between tissues at the level of B cell lineages

Next, we investigated clonal exchange between the blood, lymph nodes, and tumors at the level of hypermutating clonal lineages, which are likely to be involved in recent and ongoing immune responses. The results are shown in Fig. 4A. After pooling patient-wise clonotype data, clonal groups were formed by sequence similarity, and then IGH clonotypes within a group were assembled into clonal lineages that shared a common ancestor 68 and represent a B-cell clone undergoing the affinity maturation process. Each clonotype within a clonal group was attributed to the tissue of origin, tumor, DLN, and/or peripheral blood, and to a particular isotype. For each clonal group, the percentage of clonotypes belonging to each isotype and tissue was calculated. Clonal groups from all patients with a given cancer type were plotted on a triangle plot using the percentage of clonotypes from the tumor, DLN, and blood as coordinates and colored according to the dominant isotype (>60%).

Immunoglobulin hypermutation analysis across tissues and isotypes

A - Schematic representation of analysis and triangle plot visualization of clonal group distribution between tissues; B, C, D - clonal group distribution between tissues for colorectal (B), lung (C), and melanoma (D); stars represent non-weighted by size mean center of triangle coordinates. Chi-2 test for goodness of fit was used to test whether each tissue equally contributed to clonal group formation.

In all studied cancer types, IgA-dominated clonal groups were evenly distributed among the three tissues, indicating no preference for IgA-switched cells towards lymphoid or tumor tissue residence. IgG-dominated clonal groups showed a preference for tumor and DLN residence in lung cancer and melanoma, in accordance with the idea of tight interaction between DLN and tumor in mounting anti-tumor immune responses. IgM-dominated clonal groups showed a strong preference for colorectal cancer tumors, indicating intensive intratumoral somatic hypermutation without isotype switching (Fig. 4B, red triangle).

LN-to-LN heterogeneity in colorectal cancer

Next, we sought to investigate whether BCR repertoire analysis could discern among a group of tumor-draining lymph nodes that are in more intensive clonal exchange with the tumor. This question may be addressed at the individual clonotype level and at the level of hypermutating clonal groups.

At the individual clonotype level, we compared F2 metric values for pairwise tumor/DLN BCR repertoire overlaps from 2 colorectal cancer patients, for which we obtained three separate lymph nodes from the excised surgical material. For patient ccp2, a significantly higher overlap of tumor repertoire with one of the DLNs was observed (Fig.5 A). Similarly, Cytoscape network analysis showed more clonotype sharing between LN3 than between other DLNs (Fig.5 B). For ccp3, no significant difference was observed between DLNs (Fig.5 A, right panel). Similarly, an unequal interaction of tumors with DLNs was observed at the level of hypermutating clonal groups. We used the proportions of clonotypes originating from LN1, 2 and 3 among all LN-clonotypes within a clonal group as coordinates for triangle plots (Fig. 5 C). For ccp2, most of the LN3-focussed clonal groups were also prominently represented in the tumor. However, most of the other clonal groups resided in the center of the triangle, so the mass center was not shifted towards any of the LNs overall. Functionally, this may again indicate that within a group of DLNs, nodes are unequal in terms of access to tumor antigens, and this inequality shapes the BCR repertoires within these lymph nodes. For ccp3, no such inequality was observed. To validate this hypothesis, it would be beneficial to obtain a BCR repertoire from non-tumor-draining lymph nodes; however, this was not possible in the current study.

LN-to-LN difference of BCR repertoires in colorectal cancer.

A - repertoire overlap between tumor and three separate lymph nodes from the draining lymph node pool; Mann-Whitney test, Bonferroni correction. B: Network representation of Ig repertoires from tumor and three separate lymph nodes, with circles representing individual CDR3 sequences, size of the circles corresponding to the clone frequency, and color corresponding to the isotype; C - triangle plot visualization of clonal group distribution between three different lymph nodes, with size corresponding to the number of individual CDR3 sequences (clonotypes) within a given clonal group, and color corresponding to the percentage of tumor-derived clonotypes–within the clonal group; D–example of a clonal lineage consisting of CDR3 sequences derived from tumor (blue) and all three LNs (shades of red) from patient ccp2, shapes representing isotypes and size representing frequency of a given sequence in a given sample.

These observations reflect a complex interplay between tumors and DLNs, which may be involved in the adaptive immune response to tumor antigens.

Intra-LN heterogeneity

Likewise, we asked whether parts of lymph nodes are equally involved in interaction with tumor. For ccp6 and lcp3, we obtained BCR repertoires from 3 fragments of one DLN and compared them to tumor repertoires on the individual clonotype and clonal group level. For ccp6, repertoire from LN2 showed significantly higher overlap with the tumor than other two LNs (Fig. 6A, left). Triangle plot analysis of clonal group distribution also showed that the majority of the tumor-dominated (red) clonal groups resided in the LN2 angle of the triangle (Fig.6B, left). For ccp6, repertoire from LN1 showed significantly less overlap with tumor on the individual clonotype level(Fig. 6A, right), whereas on the clonal group level no obvious LN dominance was observed (Fig.6B, right).

Intra-LN and intratumoral heterogeneity

A - repertoire overlap between tumor and lymph node fragments for colorectal cancer patient ccp6 and lung cancer patient lcp3; Mann-Whitney test, Bonferroni corr. B - triangle plot visualisation of clonal group distribution between lymph node fragments, with size corresponding to the number of individual CDR3 sequences (clonotypes) within a given clonal group, and color corresponding to the percentage of tumor-derived clonotypes within the clonal group; C - network representation of Ig repertoires from tumor fragments, with circles representing individual CDR3 sequences, size of the circles corresponding to the clone frequency, and color corresponding to the isotype, edges connect clonotypes to their fragment of origin; D - triangle plot visualization of clonal group distribution between tumor fragments, with size corresponding to the number of individual CDR3 sequences (clonotypes) within a given clonal group, and color representing dominant clonotype; Stars represent non-weighted by size mean center of triangle coordinates. Chi-2 test for goodness of fit is used to test if each tumor segment equally contributes to clonal groups formation.

Intratumoral heterogeneity of immune repertoires

Spatial heterogeneity of the T cell receptor repertoire is known to reflect the mutational landscape of tumor 41,69. Statistically, the magnitude of intratumoral genetic heterogeneity correlates with the heterogeneity of immune cell infiltration, implying the co-evolution of the tumor genetic architecture and immune microenvironment 70. Higher heterogeneity of tumor-infiltrating T-cell repertoire is associated with higher risk of recurrence and shorter disease-free survival 41.

B-cell repertoire heterogeneity, however, remains relatively understudied. The mechanisms underlying TIL-B heterogeneity may involve not only differential infiltration and accumulation of lymphocytes, but also local development of tertiary lymphoid structures which then drive the development of local immune response specific to subclonal neoantigens.

Here we aim to quantify intratumoral BCR repertoire heterogeneity, measured as the extent of clonotype or clonal group sharing between parts of the tumor. First, we confirm for all studied tumor samples, that repertoires derived from separate tumor fragments are significantly less overlapping (F2 metric) than repertoires from biological replicates of one tumor fragment (Fig.S4). Then we pooled repertoires from replicates belonging to the same tumor fragments and analyzed the overlap between top 300 clonotypes from fragments of the same tumor using Cytoscape platform (Fig.6C). We observed significant patient-to-patient variability in the degree of heterogeneity between tumor fragments. In a tumor from patient ccp2, all three fragments have a high number of clonotypes in common (represented also by close distance between anchor nodes on the bubble diagram), and the isotype composition is also similar and dominated by IgA (Fig.6C,D left). Overall, this draws a picture of a relatively homogenous immune infiltrate in the tumor of this patient. In patient ccp3 (Fig.6 C, D, right), we see a remarkably different pattern. In ccp3, fragment tum_1 is dominated by IgA and has many shared clonotypes with fragment tum_3, which are also IgA-clonotypes. Fragment tum_2 shares many clonotypes with tum_3, and these clonotypes, as also total repertoires of fragments tum_2 and tum_3, are mostly IgG and IgM clonotypes. Heterogeneity of clonal group distribution was also significantly more prominent in ccp3 compared to ccp2 (Fig.6D). In ccp2, significant eccentricity was observed only for IgG-dominated clonal groups (red star on Fig.6D, left), which tended to share between fragments tum_2 and tum_3, and not tum_1. This may indicate that IgA clonal groups are specific either to clonal tumor antigens (that is, present in all parts of tumor), or to other ubiquitous antigenic determinants, whereas IgG clonal groups more likely are specific to tumor specific and/or subclonal antigens (present only in some parts of tumor). In ccp3, most clonal groups show a significant degree of eccentricity on triangle plot (Fig.6D, right), indicating heterogeneous distribution between tumor fragments, for IgA-dominated as well as for IgG-dominated clonal groups. Given that clonality of ccp3 tumor fragments tum_2 and tum_3 is also higher than that of ccp2 tumor, we hypothesize that heterogeneity plus high clonality indicate formation of TLS and thus efficient local immune response towards tumor-related antigens.

To analyze the presence of TLSs in our tumor samples, we used histology and immunohistochemistry. We found frequent dense leukocytic accumulations in the peritumoral region, as well as distant from the tumor. The latter were the mucosal/submucosal layers, whereas peritumoral ones were found in all the intestinal layers (Fig. S5B). Such an appearance in the outer layers and prominent accumulation near the tumor indicates that these are more likely to be TLS, rather than Peyer’s patch-like structures that are localized predominantly in the mucosal and submucosal layers. In order to evaluate whether this accumulation has organized TLS-related lymphocyte distribution, we used multicolor immunohistochemical staining of T and B cells together with TLS-related markers: high endothelial venule marker PNAd and CXCL13 chemokine, which drives TLS formation. We found that CD20 B cells and CD3 T cells were the major cell types within these patches, indicating their lymphoid origin (Fig S5E,F). Peritumoral structures have condensed B cells follicles surrounded by T cells and high endothelial vessels (HEV), confirming that these are primary follicular TLS71,72. Distant lymphoid structures in the mucosal layer have a more dispersed and intermixed T and B-cell distribution, indicating a lower degree of maturation. Similar lymphoid aggregates in the mucosa and peritumoral regions have been observed on all tissue blockers for patients with colorectal cancer (two for each patient). However, their comparison is unrepresentative, owing to variations in the sampling of tissue fragments.

The design of our study included obtaining cellular-level replicates for each processed tissue fragment. This allowed us to reliably detect CDR-H3 clonotypes that were significantly expanded in a given sample relative to other samples. Fig.7 shows an example of such an analysis. We used the EdgeR library from the Bioconductor package to detect clonotypes that were differentially expanded in separate fragments of melanoma tumors from patient mp3. These clonotypes are represented in Fig.7 A, B as colored circles on a frequency correlation plot. Expanded clonotypes with the highest frequencies were also visible on Cytoscape plots (Fig.7 C) and labeled according to their sequences. Interestingly, most expanded clonotypes are not attributed to any clonal group (and thus are not actively hypermutating), and the largest of these are IgM isotypes. Only one of the expanded clonotypes in this tumor was involved in the hypermutation process, and the structure of its clonal lineage is shown in Fig.7 D. On; other hand, none of the largest clonal lineages detected in this patient contained any expanded clonotypes (Fig.7 E). In addition, on average, expanded clonotypes have fewer mutations than non-expanded clonotypes, both in tumors and in tumor-draining LN (Fig.7 F, all patients).

Expanded clonotypes and hypermutation analysis.

A, B - Visualization of expanded clonotypes on frequency correlation plots for pairs of tumor fragments of melanoma tumor from patient mp3; C - Cytoscape network visualization of top 300 most frequent individual Ig CDR-H3 clonotypes, colored by isotype, with size of circles proportional to the frequency of a given clonotype; D - example of a clonal lineage containing expanded CDR-H3 sequence; G - examples of the biggest clonal lineages, none of which contain expanded CDR-H3 sequences; F - average number of mutations in expanded and non-expanded clonotypes; Mann-Whitney test.

Short vs long trees - phylogeny analysis

CDR3 is the most variable component of the BCR and is also the most important in terms of antigen recognition. However, missing mutations in other segments may lead to difficulties or a complete inability to accurately recover the phylogeny of hypermutations. Moreover, shorter sequencing reads, which may be sufficient to obtain CDR3 data, cover the V segment to a lesser extent, which may lead to less accurate germline gene assignment and, consequently, erroneous rooting of the phylogenetic tree.

To estimate the extent to which phylogenetic trees built on short-read data and CDR3 sequences alone reflected the ‘true’ phylogeny recovered from full-length (CDR1 to FR4 regions) sequences, we used two types of BCR libraries obtained from the same lung cancer patient material. Short CDR3 sequences were obtained from libraries prepared according to the 5’RACE protocol and sequenced with a 150+150bp read length. Long sequences were obtained from libraries prepared according to the IgMultiplex protocol and sequenced at 250+250 read length. Then, for each dataset, we independently inferred clonal lineages and built rooted phylogenetic trees for each clonal lineage of size five or more (see Materials and Methods for more details). To compare the structures of these tree sets, we selected clonotypes with the CDR3 sequence and the corresponding V and J genes present in both the long and short trees. We then calculated their average ranks by distance to the root within their clonal lineages and compared the ranks (Fig.S6 A). The correlation was around 0.7 (p<6*10−16). Therefore, phylogeny inferred based on short CDR3 sequences generally reflects the ‘true’ phylogeny inferred from the CDR1-FR4 repertoires. However, the average distance to the root for IgM and IgA/IgG isotypes, which have significant differences for long repertoires, did not show significance for short (CDR3-based) repertoires (Fig.S6 B).

One potential explanation for this could be that a larger sample size and a higher sequencing depth are required for short sequences to yield statistically significant differences. Indeed, it is impossible to avoid sporadic misplacements of short sequences in the tree when the information on hypermutations is limited. On the other hand, we clearly observed a correlation between long and short tree phylogenies. Therefore, we suppose that the reason for this contradiction is an insufficient sample size. However, this is evidence that CDR3 data do not always perfectly represent the ‘picture’ and should be treated with particular attention.

Grimsholm et al. in 2020 that memory B cells have, on average, higher Kidera factor 4 values and lower predicted interaction strength than naïve B-cells40. We assumed that these features may evolve during the affinity maturation process. Therefore, we checked how the mean kf4, strength, and charge of the seven central CDR3 amino acids depend on the clonotype’s distance to the MRCA rank on the phylogenetic tree. To ensure that there was no bias in phylogeny as a result of short sequencing length and small sample size, we used full-length sequence trees inferred from lung cancer patient lcp3 and full-length repertoire of very high sequencing depth from a healthy donor. However, we did not find any correlation between strength, charge, and kf4 and the clonotype’s position on a tree (Fig.S6 C). One exception was a near-zero increase in charge down the tree found in the healthy donor repertoire (r=0.04, p=4.9*10−05), which, despite being statistically significant, is probably not mechanistically meaningful. In addition, the largest (>20 members) trees had dN/dS values < 1, indicating negative selection pressure (Fig.S6 D). Taken together, these data suggest that, in the absence of a defined time-controlled antigenic stimulus, such as vaccination, in steady-state equilibrium, only a few immunoglobulin clonal groups show definitive signs of positive selection.

Productive involvement in hypermutating lineages depends on CDR3 characteristics

It was shown by Grimsholm et al. in 2020 that memory B cells have, on average, higher Kidera factor 4 values and lower predicted interaction strength than naïve B-cells40. We assumed that these features may evolve during the affinity maturation process. Therefore, we checked how the mean kf4, strength, and charge of the five central CDR3 amino acids depend on the clonotype’s distance to the MRCA rank on the phylogenetic tree. To ensure no short-sequence bias in phylogeny or uncertainty of a small sample size, we used full-length sequence trees inferred from lung cancer patient lcp3 and a full-length repertoire of very high sequencing depth from a healthy donor. However, as mentioned before, we did not find any correlation between strength, charge, and kf4 with the position of a clonotype on a tree (Fig. 9 C). One exception was a near-zero increase in charge down the tree found in the healthy donor repertoire (r=0.04, p=4.9×10−5).

Taking into consideration the previously found differences between memory and naïve cell features by Grimsholm et al.40, we hypothesized that amino acid features of CDR3 may be selected even before the start of hypermutation and affinity maturation. To test this hypothesis, we checked how the mean kf4, strength, and charge of five central CDR3 amino acids depend on the clonal status of the clonotype (Fig. 10) and found the results to be tissue-specific. In particular, clonotypes not assigned to any clonal lineage (singles) had lower kf4 values, and hence higher hydrophobicity, than members of some clonal groups in the tumor and PBMC repertoires. The predicted interaction strength was higher for singles in PBMC and lymph nodes, but not in the tumor, potentially indicating that tumor-resident clonotypes have undergone selection for CDR3 properties, irrespective of their involvement in SHM. Finally, clonal lineages had a higher charge in tumor repertoires, but not in the lymph nodes or PBMC, indicating a tendency towards polyreactivity in clonotypes evolving locally under the influence of TME.

These results generally match the findings of Grimsholm et al. and support the hypothesis that selection of CDR3 amino acid features occurs before cells undergo SHM. Tissue differences could potentially be explained by differences in the proportions of B cell subsets in different tissues.


We performed a multi-location study of BCR repertoires in cancer patients using an unbiased BCR library preparation technique complemented by deep sequencing. We analyzed these repertoires from two complementary perspectives. At the level of individual BCR clonotypes, we studied various repertoire features and their differences between tissues of origin, as well as repertoire heterogeneity within tumor or lymph node tissue. At the level of B-cell clonal lineages, we tracked lineage sharing between tissues, and differences in clonotype features between clonotypes that belong to some clonal lineage, and those that do not.

First, we compared BCR repertoires from PBMC and bulk tumor infiltrates and the corresponding sorted CD19+CD20CD38high plasma cells, and confirmed that the bulk RNA-based BCR repertoire is dominated by plasma cells across different peripheral compartments. This is expected, given that the IG expression level is 50-500 times higher in plasma cells than in memory B-cells21. However, PC has a higher IgM isotype proportion in the tumor and a lower proportion of IgG in the lymph node. The higher proportion of IgM in PCs relative to the bulk repertoire may indicate that some IgM-producing plasma cells, while having an appropriate plasmablast/PC surface marker phenotype, produce relatively low levels of antibodies, which leads to them being masked in the bulk repertoire by non-IgM non-PC B cells. A higher level of antibody production correlates with the differentiation of plasmablasts/short-lived plasma cells into long-lived plasma cells. Therefore, one could hypothesize that tumor-infiltrating IgM-expressing antibody-producing cells are skewed towards a less differentiated B-cell compartment compared to other isotypes. Similar logic applies to the IgG-producing plasma cells in the lymph nodes: a lower proportion of IgG-clonotypes in PCs compared to the bulk repertoire indicates that some of the most frequent IgG clonotypes in the bulk repertoire belong to cells with a non-PC/plasmablast phenotype, presumably stimulated memory B-cells.

BCR repertoire overlaps and the correlation of isotype proportions of tumor and lymph node samples were higher than those of tumor and peripheral blood. A higher repertoire overlap is indicative of a higher proportion of tumor-related BCR clonotypes in the DLN than in the peripheral blood, which illustrates the crucial role of draining lymph nodes in the development of the anti-tumor immune response. In addition, in accordance with previous studies, our data confirmed that the tumor BCR repertoire is more clonal than that in the lymph node or PBMC. We also showed that lymph nodes within the same draining LN pool may be differentially involved in tumor interaction. Overall, lymph nodes are a better source of material to study anti-tumor B-cell response than PBMC, and a better source of material for a potential cancer B-cell or antibody therapy, if tumor tissue is unavailable.

The mean CDR-H3 length of the most frequent BCR clonotypes in tumors was higher than that in other tissues. This finding may indicate that tumors tend to be infiltrated with less mature B-cells than other compartments 40, and may also indicate a higher proportion of autoreactive and polyreactive 64,73 antibodies, which is consistent with previous research. However, in colorectal cancer, the mean CDR-H3 length in the IgA repertoire was lower than that in melanoma. This may indicate that the IgA repertoire in colorectal cancer is less subject to TME-induced biases than in other cancers studied in this study. Differences in other CDR-H3 properties, such as mean kf3 (b-sheet preference), kf4 (inversely correlated with hydrophobicity), predicted interaction strength, and charge between different tissues, including increased central-CDR3 charge in the tumor compared to normal tissue, may suggest a less mature and less specific BCR repertoire of tumor-infiltrating B-cells.

Clonal groups representing hypermutating B-cell lineages are distinct in their isotypes and tissue composition. In colorectal cancer, clonal groups dominated by IgA (> 60% of IgA clonotypes) tend to be shared between tumors and blood, whereas IgG-dominated (> 60% of IgG clonotypes) clonal groups tend to be shared between tumors and DLN. The probability of detection of a given clone in a given location is dependent on the mRNA quantity that is produced by a particular B-cell or group of B cells (BCR expression level), as well as on the lifetime of these cells in this location (lifetime being a superposition of cells entering and leaving the tissue, and the literal lifespan of the cells and their progeny). Assuming that BCR is expressed at the same level in all members of a certain clonal population, whether it is found in the blood, LN, or tumor, the probability of detecting a clone is then only a function of the “combined lifetime” of a clone within a given tissue. Therefore, the proportion of clones within a clonal group derived from a certain tissue is a snapshot of their preference for tissue residence or circulation. By comparing IgA- and IgG-dominated clonal groups, we observed subtle differences in the behavior of B cells during ongoing clonal selection and interaction with the tumor. Whereas IgG clonal groups preferentially reside in tumors and draining lymph nodes, IgA-dominated clonal groups show a greater preference for circulation. An obvious question is whether this corresponds to the proportion of isotypes at the cellular level(by surface or intracellular staining). The relative frequencies of antibody-secreting B cells in peripheral blood were IgA1 > IgG1 > IgA2 > IgM > IgG4 > IgG2 > IgG3 74, which corresponds to our observations of the peripheral blood bulk repertoire (total IgA>total IgG>total IgM). Therefore, the observed clonal group sharing between tissues may simply be due to the higher number of clonotypes, with a certain isotype detected in a certain tissue. However, for tissue distribution triangle plots, the data were normalized for isotype usage between tissues (i.e., equal numbers of IgG/IgA/IgM most frequent clonotypes were used for PBMC, LN, and blood) to correct for quantitative bias. Therefore, the trend for tissue preference reflects true clonal behavior and not simply quantitative prevalence.

It is well known that only draining lymph nodes show signs of interaction with tumor antigens 75. In murine models, non-draining usually occurs contralateral to the tumor site, which is anatomically distant. However, whether lymph nodes within the same anatomical site may differ in terms of their interaction with the tumor has yet to be determined. For some of the patients in this study, we were able to obtain more than one lymph node from the same tumor-draining pool, which allowed us to address this question directly on the level of BCR repertoires. A higher overlap between the tumor and DLN repertoire signifies a more intensive interaction with the tumor and its antigens. In at least one patient, the lymph nodes studied were significantly different in this regard, with one of the lymph nodes being significantly more similar to the tumor in its BCR repertoire. This observation was also reproduced at the clonal group level, where the lymph node that was in closer interaction with the tumor in terms of clonal overlap also contained more clonal groups that included tumor-derived clonotypes and were confined to this lymph node only. These clonal groups represent antibody maturation and selection processes involving tumors, and occur preferentially in the DLN, which is in tight interaction with the tumor, but not in the others. Whether this also reflects the response to tumor antigens and the maturation of tumor-specific antibodies remains to be tested.

It has been previously shown that within a single lymph node, individual germinal centers (GCs) share their immunoglobulin clonal composition to some extent, and the offspring of one B-cell clone may be found in several individual germinal centers 76. During GC reactions, individual GCs become oligoclonal (containing 4-13 major B cell clones with functional IgVH, as detected in the pre-NGS era) due to an affinity selection process 7779. However, in the case of chronic antigen stimulation, as in cancer, repeated cycles of antigen exposure, memory B-cell reactivation, and germinal center reactions should lead to uniform involvement of the whole DLN in interaction with tumors and production of tumor-specific immune responses. Again, we sought to study the spatial heterogeneity of DLNs at the level of individual immunoglobulin clonotypes and clonal lineages. We found that the BCR repertoires derived from fragments of a single lymph node were significantly different in their similarity to the pooled tumor BCR repertoire (F2 metric). We conclude that despite chronic antigen exposure, despite the ability of memory B cells to leave and re-enter germinal center reactions and form new germinal centers in subsequent rounds of affinity maturation, the BCR landscape of a lymph node remains significantly heterogeneous in space.

Likewise, for tumor immune infiltrates, it was shown that the TCR repertoire is heterogeneous in space, and this heterogeneity correlates with subclonal mutations within the tumor tissue 69 and likely represents expanded mutation-specific T cell clones. BCRs are expected to be even more clonal, both in lymph nodes and tumors. However, for BCRs, heterogeneity analysis is complicated by the orders-of-magnitude difference in immunoglobulin expression between plasma cells and other B-cells21. A rare plasma cell will have a low probability of sampling at the cellular level but will produce a sufficient amount of immunoglobulin RNA to be reliably detected in the BCR repertoire. Therefore, appropriate use of biological (cellular) replicates sequenced at comparable depths is crucial for accurate detection of differentially expanded clonotypes23. Following the logic described above, we were able to analyze repertoire overlap between tumor fragments and detect BCR clonotypes that were differentially expanded in individual fragments. We observed that, in some patients, the most abundant BCR clonotypes were different in different fragments. In this case, the most expanded clonotypes rarely overlapped between fragments, thus probably representing clonotypes that recognize subclonal tumor antigens. This needs to be tested directly and, if proven true, may potentially be a reliable method for the identification of tumor-specific antibodies. In other cases, the tumor may be relatively homogeneous, without significant expansion, and with highly overlapping repertoires of individual fragments. These observations are mostly reproduced at the level of clonal groups, where in heterogeneous tumors, we observed clonal groups that concentrated almost entirely in single tumor fragments, whereas in homogenous tumors, clonal groups mostly include clonotypes derived from all fragments of the tumor.

The physicochemical properties of the CDR-H3 region differ between clonotypes that represent clonal lineages, and thus, are actively hypermutating, and those that do not. The most prominent difference was observed for kf4 (inversely related to hydrophobicity) in PBMC and tumors; clonotypes that undergo hypermutation are, on average, less hydrophobic, which indicates higher binding specificity. However, there was no correlation between kf4 and the extent of hypermutation, even in large full-length Ig-profiling datasets. One possible explanation may be that the selection of less hydrophobic CDR3s occurred before the onset of hypermutation.

Interestingly, analysis of silent vs. non-silent hypermutations (dNdS) suggests that in the absence of a defined time-controlled antigenic stimulus, such as vaccination, in steady-state equilibrium, only a few immunoglobulin clonal groups show definitive signs of positive selection. It would be interesting to systematically explore the specificity of these clones in patients with cancer as a potential source of tumor-reactive antibodies.

In general, this study has several limitations, such as the small number of patients, both per cancer type and overall, and our 5’RACE library preparation protocol, which was used for most samples in this study, does not allow for an accurate distinction between the IgA and IgG isotype subtypes. Regarding clonal lineage structure analysis, we found that our study design has a major limitation of a relatively short (150 bp) read length, which does not provide sufficient information for hypermutation analysis.

Nevertheless, our observations contribute to the understanding of the interactions among the tumor immune microenvironment, tumor-draining lymph nodes, and peripheral blood.

Materials and methods


All clinical samples were acquired from the N.N. Blokhin National Medical Research Center of Oncology or Volga District Medical Centre under the Federal Medical and Biological Agency. This study was conducted in accordance with ICH-GCP. The protocol was approved by the Ethical Committees of the Volga District Medical Centre under the Federal Medical and Biological Agency, and by the N.N. Blokhin National Medical Research Center of Oncology. Written informed consent was obtained from all patients.

Fourteen patients were included in the study, of which four were diagnosed with colon cancer, four with lung adenocarcinoma, and six with melanoma. For each patient, samples from the tumor (tum), draining lymph node(s) (DLN), peripheral blood (PBMC), and normal tissue (norm) were collected, or only some of these tissues (Table S1). Several distant pieces were resected from each tumor and lymph node, unless there were several lymph nodes available (as for colon cancer patients, ccp2, ccp3). All samples were bulk (unsorted), but for two colon cancer patients (ccp2 and ccp3), plasma cells were sorted from a part of each sample. All the samples were processed and stored for library preparation.

Blood sampling was performed immediately prior to surgery, and the total volume of obtained blood did not exceed 35 ml for each patient. Fragments of tumor, LN, or normal tissue were placed in 50 ml Falcon tubes containing 7-10 ml of MACS® Tissue Storage Solution (Miltenyi Biotec, cat. 130-100-008) and stored at room temperature for no longer than 2 hours. All the procedures were performed under sterile conditions. PBMC were isolated from the blood samples using the Ficoll-Hypaque™ centrifugation protocol. Briefly, whole blood was diluted 2.5 times with sterile PBS buffer, carefully layered over Ficoll-Paque PLUS (GE Healthcare, cat. 17-1440-03) and centrifuged at 600g for 20 min. Afterwards, buffy coats were harvested from the interface, washed twice with PBS buffer (20 min 350 g), calculated, and lysed in RLT buffer (Qiagen, cat. 79216) at 0.5 to 3 *10^6 cells/sample density.

Tum, LN, and normal tissues were mechanically cut into 3-7 mm fragments and incubated in RPMI 1640 medium (Gibco™, cat. 42401042) containing 1 mg/ml Liberase TL Research Grade (Roche, cat. 5401020001) and 30 U/ml DNase I (Qiagen, cat. 15200-50), at 37°C for 30–60 min with gentle shaking. Samples were then processed using a gentleMACS™ Dissociator (Miltenyi Biotec, cat. 130-093-235) and passed through a 100 um Nylon cell strainer to remove non-dissociated fragments. The resulting cell suspension was concentrated by centrifugation (7 min, 350 g) and lysed in RLT buffer(Qiagen, cat. 79216) 0.1-0.5 *10^6 lymphocytes /sample density. All RLT samples (PBMC, tum, LN, and norm lysates) were stored at −80°C before preparation of the BCR IGH repertoire libraries.

PC isolation

For colon cancer patients, three distant samples (at least 3 cm apart) were obtained from the tumor border with a portion of normal tissue. Cells from three digested tumor samples, three lymph nodes (for patients ccp2 and ccp3) and two replicates of PBMC were stained with a panel of fluorescent antibodies: CD45-PerCP/Cy5.5 (BD 564105, clone HI30), CD38-PE (BD 555460, clone HIT2), CD19-Alexa700 (BD 557921, clone HIB19), CD20-BV510 (BD 563067, clone 2H7), CD25-V450 (BD 560458, clone M-T271) and isolated BD FACSAriaIII cell sorter (BD Bioscience). First, lymphocytes were gated as CD45+ cells, and the plasma cell population was isolated as CD20_neg/CD19_low/CD27_high/CD38_high. Sorting was performed using a FACSAria III cell sorter (BD Bioscience) directly into RLT lysis buffer (Qiagen, cat 79216). Cells from the ccp6 lymph nodes were lysed and unsorted.

BCR library preparation and sequencing

All BCR IGH libraries were generated using the 5RACE methodology described by Turchaninova et al.21 and sequenced with a 150+150 bp read length. For one of the lung cancer patients, lcp3, additional tumor, lymph node, and normal lung tissue libraries were generated using IgMultiplex methodology and sequenced with 250+250 read length. To account for sampling bias, we also obtained technical replicate samples at the cell suspension level.

Preprocessing of sequencing data

To process sequencing reads, we used the MiNNN software to extract UMI from raw sequences, correct errors, and assemble consensus sequences for each UMI. For bulk libraries prepared with the 5’RACE protocol and sequenced with 150+150 read length, we filtered out UMI with less than three reads. For bulk libraries prepared with the Multiplex protocol and sequenced with 250+250 read length, we filtered out UMI with less than four reads. For libraries of sorted plasma cells prepared with the 5’RACE protocol and sequenced with 150+150 read length, we filtered out UMI for which there were fewer than two reads.

To ensure the absence of contamination, we checked for the presence of identical UMI in different samples. If such a UMI was identified, and BCR sequences for such UMIs were identical or almost identical (suspecting amplification error), then the number of reads for this UMI in both samples were compared. If the number of reads of this particular UMI in one sample exceeded the number of reads of this UMI in another sample by more than five times, this UMI was eliminated from the sample with a smaller number of reads per UMI. If the ratio was lower than 5, the UMI was eliminated from both samples.

After UMI-based decontamination, we used MIXCR software to assemble reads into quantitated clonotypes, determine germline V, D, and J genes, isotypes, and find the boundaries of target regions, such as CDR3. Clonotypes derived from only one UMI were excluded from the analysis of individual clonotype features but were used to analyze clonal lineages and hypermutation phylogeny. Single-UMI sequences were eliminated to avoid errors during cDNA synthesis during library preparation. However, modern reverse transcriptases have high fidelity, with 2.5e-05 error rates for some of them 80. Therefore, we considered it reasonable to include single UMI sequences in those parts of the analysis, where a larger sample size was important.

Samples with 50 or less clonotypes left after preprocessing were excluded from the analysis.

Clonal lineage inference

We identified sequences belonging to the same clonal lineage using the ChangeO software. The criteria for the initial grouping were the same V and J germline genes identified by MIXCR, and the same CDR3 length. These criteria do not account for the D segment, as there is insufficient confidence in the germline annotations due to its short length and high level of mutations. Sequences within each group were defined as belonging to the same clonal lineage if they had a CDR3 sequence identity above a certain threshold. Such a threshold was individually defined for each patient’s dataset as a local minimum of the distance-to-nearest distribution function31. In most cases, this threshold is set between 80% and 85%.

Phylogenetic inference

The phylogeny of B-cell hypermutations was inferred for each clonal lineage of size five or more using the maximum likelihood method and the GTR GAMMA nucleotide substitution model. To find the most recent common ancestor (MRCA) or root of the tree, we used an outgroup constructed as a conjugate of germline segments V and J defined by MIXCR. The D segment was excluded from the outgroup formation, as there was insufficient confidence in the germline annotations due to its short length and high level of mutations. The MUSCLE tool was used for multiple sequence alignment and RAxML software was used to build and root phylogenetic trees.

Sample pooling and normalization

For some parts of the analysis, we considered samples irrespective of technical replicates or specific tumor/LN segments. In such cases, the corresponding repertoire datasets were pooled in one, where for each clonotype, its new frequency was calculated as the mean of the clonotype’s frequencies from separate files. Such an approach helps avoid sampling bias and achieve equal contributions of all clonotype sets being pooled.

Statistical analysis

In our analysis, we often used repertoires of different parts of the same tissue as separate observations within the same comparison. Examples of these could be isotypes, different pieces of tumors or lymph nodes, or PBMC samples taken at different time points. Understanding the pitfalls of this approach in general, we argue that it can be justified in some cases considering the heterogeneity of tissues, especially tumors, and the distinctive characteristics of different isotypes.

Most of the analysis was performed using VDJtools and custom Python and R scripts.

Expanded clonotypes detection

We used the EdgeR package in R to identify the clonotypes that were differentially expressed between the two sample sets. The problem with this approach is determining the correct number of counts required to pass into the DGEList function of EdgeR. Using a number of unique UMIs detected for each clonotype in the sample might not be a good idea, considering the possibility of sampling bias (e.g., resecting tumors into pieces of slightly different sizes). To account for sampling bias, we defined clonotype count for the DGEList function as clonotype frequency in a normalized sample multiplied by the total number of unique UMIs in all groups of samples.

The output of the DGEList function is then normalized and passed to the exactTest function of the EdgeR. Clonotypes with FDR<0.05 and logFC>0 were considered expanded in a corresponding group of samples.


Sections of formalin-fixed paraffin-embedded tissue were sliced on an RM2235 microtome (Leica Biosystems). Slices were deparaffinized with Xylol and Ethanol. H&E staining was performed using Mayer’s hematoxylin and eosin (Biovitrum, Russia). For IHC staining, the slices were demasked in AR9 buffer (PerkinElmer) for 10 min at 98 0C in a water bath, washed twice in PBS, and blocked for 30 min with Protein Block (Leica Novocastra, UK). Primary antibodies were added without washing the blocking solution and incubated overnight at 4 0C. After incubation with primary antibodies, slices were washed twice in PBS and stained with the NovoLink polymer detection system (Leica Biosystems, UK) according to the manufacturer’s instructions or with HRP-streptavidin (Jackson Immunoresearch 016-620-084, 1:500 for 30 min) and contrasted with 10 µM CF tyramide dye (Biotium, USA). The following primary antibodies were used at the consequence and with indicated dilutions and CF dyes: CD20 (Leica Biosystems PA0200), 1:1 with CF488; CXCL13 (GeneTex GTX108471, Taiwan), 1:100 with CF660; CD3 (HuaBio HA720082, China), 1:200 with CF430; PNAd-biotin (BioLegend 120804, USA), 1:200 with CF555. Before staining with non-biotinylated antibodies, the antibodies were stripped in AR9 buffer for 10 min at 98 0C. After staining with antibodies, the slides were counterstained with DAPI (0,2 nM) for 5 min, embedded in Mowiol 4-88 (Sigma-Aldrich), and coverslipped.

Brightfield and fluorescence images were acquired using an EVOS M7000 microscope (Thermo Fisher Scientific) with a 20x dry objective. The following light cubes were used: DAPI (AMEP4650) for DAPI fluorescence, CFP (AMEP4653) for CF430 fluorescence, YFP (AMEP4654) for CF488 fluorescence, RFP (AMEP4652) for CF555 fluorescence, and Cy5 (AMEP4656) for CF660. Whole slice images were stitched using microscope imaging software. Fluorescence images were contrasted, colorized into pseudocolors, and overlaid using Fiji software (