Introduction

B cells are a relatively undercharacterized component of the tumor microenvironment. In human tumors, infiltration with B-cells, as cells and T-cells is often a positive prognostic factor, especially in conjunction with the formation of tertiary lymphoid structures (TLSss). Mechanistically, tumor-infiltrating B-cells (TI-Bs) may act via presentation of BCR-cognate antigens to T-cells 13, production of pro- or anti-tumor cytokines 47, and production of antibodies, which may be tumor-specific8 and enhance killing of tumor cells via antibody-dependent cytotoxicity (ADCC)9 and complement-induced cytotoxicity, enhance antigen-presentation by dendritic cells10 or form immune complexes that promote the activation of myeloid-derived suppressor cells (MDSC)11. The effector functions of antibodies are highly dependent on antibody isotype12 and specificity. For instance, IgG1 is the most efficient factor in ADCC, and IgG3 (together with IgG1) is capable of complement-dependent cytotoxicity 13. These functions depend on the ability of an antibody to bind its cognate antigen, which increases during B-cell maturation and somatic hypermutation14. Isotype switching, SHM, and B-cell proliferation and differentiation are influenced by the tumor microenvironment, neoantigen burden, and presence of certain driver mutations15. Therefore, the properties of immunoglobulins produced within the tumor are both a reflection of TME features and a factor that shapes the TME.

The whole population of immunoglobulin receptors and immunoglobulins produced within a certain tissue or cell population is known as a BCR repertoire. BCR repertoire analysis has become an important approach for characterizing adaptive immune responses in health and disease1618. Starting from genomic DNA or RNA, libraries representing the sequences of variable domains of antibodies can be produced and sequenced using high throughput sequencing1923. Alternatively, immune receptor sequences can be derived from RNA-Seq data2427. Then, the properties of the variable parts of immune receptor sequences are studied using bioinformatics tools that are currently quite elaborately developed 2834.

For BCR repertoires, features that are considered functionally important are clonality, isotype composition, biases in V- and J-gene usage, and extent of SHM3538. In addition, characteristics of the immunoglobulin CDR-H3 region, such as the number of added nucleotides, length, and amino acid physicochemical properties, have functional implications and have been linked to antibody specificity39 and to the B-cell maturation stage40.

Correspondingly, the characteristics of the immune repertoire have been found to be associated with clinical outcomes (15,25,4143). Specifically, high intratumoral immunoglobulin heavy chain (IGH) expression, high IGH clonality, and a high proportion of IgG1 among all IGH transcripts were strongly correlated with higher overall survival in melanoma25. Similarly, high IgG1 mRNA levels are positively associated with improved prognosis in early breast cancer44. A high intratumoral IgG1 to IgA ratio was associated with improved overall survival in bladder cancer45 and for KRASmut but not KRASwt lung adenocarcinoma (LUAD), suggesting the first link between driver mutation and B-cell response15. In lung adenocarcinoma, breast cancer, and bladder cancer, only a subset of V-segments is associated with improved survival46. The co-occurrence of specific antibody motifs and mutated tumor-derived peptides, presumably indicating specificity to particular tumor neoantigens, was also correlated with longer survival in colorectal cancer43.

The tumor immune repertoire may be used as a source of tumor-antigen-specific antibodies for therapy development 4749. For instance, in melanoma 30%-80% of the total BCR repertoire 25 may constitute one or several dominant B cell clones. Corresponding antibodies can be produced, verified for tumor reactivity 49, and further employed in chimeric antigen receptor T cell (CAR-T) therapy or other therapeutic approaches. The knowledge of antigen specificity of cancer-associated antibodies is currently insufficient, and it is mostly limited to autoantigens and some tumor-associated antigens 5055.

However, even in the absence of knowledge of cognate antigens, certain hypotheses may be derived from BCR repertoire characteristics. Some individual IGHV-genes genes and subgroups have been associated with autoimmunity 56. Likewise, an increase in the BCR clonal expansion index 57, dominated by IgA and IgM isotypes, was associated with systemic lupus erythematosus (SLE) and Crohn’s disease 38.

The ability to accurately characterize the B cell repertoire in the tumor microenvironment is of vital importance for both fundamental and clinical challenges. However, tumor tissue may not always be available for analysis; therefore, it would be beneficial to derive information about the tumor BCR repertoire from peripheral blood or draining LNs. It is not known, however, whether tumor-dominant BCR clonotypes can be found in the peripheral blood of cancer patients and at what frequencies. The ability to detect these clonotypes in a patient’s peripheral blood could have a predictive value for disease prognosis. In addition, it is not known how the BCR repertoire characteristics of peripheral blood B-cells and tumor-draining LNs relate to the repertoire of TI-Bs.

Therefore, in the current study, we aimed to comprehensively and systematically evaluate tumor, lymph node, and peripheral blood BCR repertoires and their interrelationships. We show that the tumor BCR repertoire is closely related to that of draining LNs, both in clonal composition and isotype proportion. Furthermore, we observed that different LNs from one draining lymph node pool may be differentially involved in the interaction with the tumor, as reflected by the similarity of the BCR repertoire clonal composition. CDR-H3 properties indicate a less mature and less specific BCR repertoire of tumor-infiltrating B cells compared to circulating B cells. BCR clonotypes that expand in a tumor relative to other tissues are, on average, less hypermutated than non-expanded (ubiquitous) clonotypes.

Results

Experimental and Computational Study Design

To systematically study the relationships between BCR repertoires in tumors and normal peripheral compartments, we performed RNA-based targeted BCR repertoire analysis from four tissue types: tumor (tum), corresponding normal tissue (norm), tumor-draining LNs, and peripheral blood mononuclear cells (PBMC), of 14 cancer patients (melanoma, n = 6; lung cancer, n = 4; and colorectal cancer, n = 4). To account for spatial heterogeneity, we obtained 3 fragments of tumor tissue per patient. For draining LNs, we either dissected them into three separate pieces to study intra-LN spatial heterogeneity (lung cancer and melanoma, parts of LNs designated as LN11, LN12, LN13), or, where available, obtained three separate draining LNs (designated as LN1, LN2, LN3) (Fig. 1A). All fragments were homogenized separately into single-cell suspensions. To account for sampling noise at the level of individual cells, we obtained two replicate samples from each fragment after homogenization. The relative similarity of clonotype frequencies in replicate samples and separate tumor fragments is shown in Fig. 1B. BCR repertoires were obtained using the 5’-RACE (Rapid Amplification of cDNA Ends) protocol as previously described21.

Experimental design.

A - Overview of the experimental design and procedures. B - Using technical replicates at the cell suspension level to account for sampling bias in the clonotype frequencies.

In this study, we used bulk RNA-based BCR profiling, with the understanding that the dominant clonotypes and B cell lineages reproducibly represented in biological replicates reflect the presence of clonally expanded plasma cells, plasmablasts, and the most expanded memory B cell clones. In other words, the RNA-based approach mainly revealed the repertoire of functionally active B-cell lineages (Supplementary Note 1).

In most experimental setups, we employed short 150+150 nt sequencing covering the IGH CDR-H3 regions, using 5’RACE method. This approach allowed us to achieve high coverage for the obtained libraries (Table S1) to reveal information on clonal composition, CDR-H3 properties, IgM/IgG/IgA isotypes, somatic hypermutation load within CDR-H3, and clonal B cell lineages.

Tumor/non-tumor repertoire overlap and isotype composition

First, we characterized the relative similarity of IGH repertoires derived from tumors, tumor-draining LNs, and PBMC on the individual CDR-H3 clonotype level. We define clonotype as an instance with an identical CDR-H3 nucleotide sequence and identical V- and J-segment attribution (isotype attribution may be different). Unlike other authors, here we do not pool together similar CDR-H3 sequences to account for hypermutation. (Hypermutation analysis is done separately and defined as clonal group analysis. )

As overlap metrics are dependent on overall repertoire richness, we normalized the comparison using the same number of top most frequent clonotypes of each isotype from each sample (N = 109). Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point. We used the repertoire overlap metric F2 (Сlonotype-wise sum of geometric mean frequencies of overlapping clonotypes), which accounts for both the number and frequency of overlapping clonotypes (Fig. 2A). As expected, significantly lower overlaps were observed between the IGH repertoires of peripheral blood and tumors compared to LN/tumor overlaps. The LN/PBMC overlap also tended to be lower, but the difference was not statistically significant. We also analyzed D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (Dij=dij/(di*dj), where dij is the number of clonotypes present in both samples, while di and dj are the diversities of samples i and j respectively). The results for D metric are not shown, as they indicate a similar trend to that of F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of tumor-draining LNs than to those of peripheral blood, both if clonotype frequency is taken into account, and when it is not.

Repertoire overlap, clonality and isotype composition (clonotype level, pooled patient data (except B, data from melanoma patient 3)).

A - repertoire overlap between pairs of tissues, by F2 metric, repertoires split by immunoglobulin isotype, (N=6); B - network representation of Ig repertoires from PBMC, tumor-draining LN, and tumor of mp3 patient (melanoma); individual clonotypes of the same origin (PBMC, tumors, or draining LN) are shown as bubbles connected with the edges to one anchor node. Clonotypes shared between tissues were connected with two or three edges to the corresponding anchor nodes and were located between them. The size of the bubbles represents the relative frequency of clonotypes within a sample, and the color represents the isotype. The relative distance between anchor nodes corresponded to the similarity of repertoires (the number of shared clonotypes). C, D - Clonality of Ig repertoires in PBMC, draining LNs, and tumors of 14 cancer patients. This reflects the presence of clonal expansion. Calculated as in 58: [1-normalized Shannon-Wiener index]; C–total IG repertoire; D–IgM repertoire; E–LN/tumor overlap for IgA and IgG repertoires (N=7); F– PBMC/tumor overlap for IgA and IgG repertoires (N=9); G, H–isotype fraction correlation between PBMC and tumor repertoires (G, N=9), or between LN and tumor repertoires (H, N=7).

These results are corroborated by visualization at the individual patient level, using Cytoscape network visualization platform to visualize the structure of the repertoire overlap. As exemplified by melanoma patient mp3 (Fig. 2B), the repertoire from the tumor is closely related to the tumor-draining LN repertoire, whereas the PBMC repertoire has very few overlapping clonotypes with both tumors and draining LNs. Overall, these analyses revealed that the extent of clonal exchange between tumors and PBMC was significantly lower than that between tumors and draining LNs. The frequencies of overlapping clonotypes were also more strongly correlated between tumors and draining LNs than between tumors and peripheral blood (Fig. S2A, R metric). The level of clonal exchange between tissues was dependent on isotype (Fig. 2 E, F). LN/tumor overlap was higher in the IgG repertoire (Fig. 2E), whereas PBMC/tumor overlap was lower in the IgG repertoire (Fig. 2F) than in the IgA repertoire. This suggests that tumor-infiltrating IgG-expressing B-cells (IgG-TI-Bs) avoid systemic circulation, whereas IgA-expressing tumor-infiltrating B-cells (IgA-TI-Bs) may be found in the peripheral blood with a higher probability. In addition, the tumor repertoire was significantly more clonal (clonality calculated as [1-normalized Shannon-Wiener index]58) than the PBMC or draining LN repertoire, both overall (Fig. 2C) and separately for the IGHM repertoire (Fig. 2D).

We did not find a statistically significant difference in isotype composition between cancers in any of the studied tissues, with the exception of IgM percentage in melanoma tumors. In BCR repertoires from melanoma tumors, the total percentage of repertoire consisting of IgM clonotypes was significantly lower than that in other types of cancers (colorectal and lung) (Fig. S2B). The isotype composition of non-tumor tissues correlated with the isotype composition of the tumor; this effect was less prominent in peripheral blood (R = 0.42, p = 0.028)(Fig. 2G) and more prominent for LN (R = 0.74, p < 0.01)(Fig. 2H).

CDR-H3 properties

Analysis of averaged CDR-H3 repertoire characteristics revealed increased CDR-H3 length in tumors compared to PBMC for the total repertoire (Fig. 3A) and also in IgA and IgG repertoires separately (Fig. S3A, B), but this was not the case for IgM (Fig. S3C). In addition, the increase in CDR-H3 length in IgA repertoires from tumor-draining LNs compared to PBMC was statistically significant (Fig. S3A).

CDR-H3 amino acid properties (clonotype level, pooled patient data)

A - Mean amino CDR-H3 length of top 100 most frequent clonotypes from tumor, lymph node and PBMC tissues irrespective of isotype CDR-H3 are on average significantly longer in tumor than PBMC for total repertoire (N=14, p<0.01, two-sided t-test, Bonferroni correction); B - Comparison of mean amino acid CDR-H3 length of 100 most frequent clonotypes for colon, lung and melanoma cancer samples from, tumor. CDR-H3s of tumor-infiltrating clonotypes were shorter for colorectal cancer patients compared to melanoma in IgA repertoires (N=11, p=0.02); C - Comparison of amino acid properties in the central region of CDR-H3, for total repertoire (C-left) or IGHG repertoire (C-right), all cancers (significantly increased - red, significantly decreased - blue) two-sided t-test, Bonferroni-Holm correction; D - Comparison of amino acid properties in the central region of CDR-H3, for IGHM repertoire, lung cancer (significantly increased - red, significantly decreased - blue); (p<0.01, two-sided t-test, Bonferroni-Holm correction); E - Average number of mutations relative to germline for tumor samples from different types of cancers, N=14.

Interestingly, the only significant difference we found when comparing CDR-H3 lengths between cancer types was reduced IgA CDR-H3 length in colorectal cancer, especially compared to melanoma (p=0.02) (Fig. 3B). This could reflect a generally more mature IgA repertoire in colon tissues owing to the previous history of interactions with microbiota59,60. However, the relationship between this difference and tumor antigen specificity remains to be verified.

To explore CDR-H3 physicochemical properties, we calculated the mean charge, hydropathy, predicted interaction strength, and Kidera factors 1 - 9 (kf1-kf9) for five central amino acids of the CDR-H3 region for the 100 most frequent clonotypes of each sample using VDJtools. Kidera factors are a set of scores which quantify physicochemical properties of protein sequences 61. 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques, to yield 9 factors which are used to quantitatively characterize physicochemical properties of amino acid sequences. Comparing between tissues, we found that kf4 value for the tumor repertoire was decreased compared to PBMC in the total repertoire, and kf5 was decreased in tumor vs. PBMC in the IgG repertoire (Fig. 3C). In addition, kf6 value was decreased in the LN repertoire compared to PBMC. Kf4 inversely correlates with hydrophobicity, indicating a higher proportion of hydrophobic residues in BCR CDR-H3s from the tumor repertoire. Kf5 reflects a double-bend preference and has not been previously found to be significant in the context of antibody properties. Kf6 is a measure of partial specific volume; therefore, a lower kf6 value indicates less bulky amino acid residues in CDR-H3s from the LN repertoire. Between the tumor and normal tissue repertoires, the hydropathy value was lower in normal lung tissue, also indicating a higher proportion of hydrophobic residues in tumor-derived CDR-H3 repertoires (Fig. 3D). According to 40, more mature B-cell subpopulations have higher mjenergy, disorder, kf4, kf6, and kf7 and lower CDR-H3 length, strength, volume, kf2, and kf3. Again, the mean CDR-H3 charge was negatively associated with specificity 62 and beta-sheet propensity was associated with antibody promiscuity 63. Poly-reactive and self-reactive antibodies have, on average, longer CDR-H3s 64 with a higher charge62 and net hydrophobicity6567.

Therefore, collectively, our observations suggest a less mature and less specific BCR repertoire of tumor-infiltrating B cells compared to circulating B cells and B cells infiltrating normal tissue, which may indicate less stringent control for antibody-producing B cell development in the TME.

Immunoglobulin hypermutation analysis across tissues and isotypes

The intensity of somatic hypermutation (average number of mutations relative to the most recent common ancestor, MRCA) reflects the average extent of antigenic selection experience of the clonotypes found in a given tissue or cancer type. No significant difference was found between PBMC, tumor-draining LNs, and tumors for the top 100 most frequent clonotypes in the total repertoire as well as for the top 100 most frequent clonotypes of each isotype separately (not shown). This indicates that in the RNA-based BCR repertoires, in all studied tissues, the most dominant immunoglobulin clonotypes belong to cell populations with an equivalent history of antigen exposure, selection, and maturation. However, there was a statistically significant difference in the number of hypermutations in IgG between the cancers (Fig.3E). IGHG clonotypes from lung cancer samples show higher number of hypermutations, possibly reflecting high mutational load found in lung cancer tissue. For melanoma, another cancer known for high mutational load, no statistically significant difference was found. This may be due to higher variance between melanoma samples, which hinders the analysis, or due to the small sample size.

Clonal exchange between tissues at the level of B cell lineages

Next, we investigated clonal exchange between the PBMC, tumor-draining LNs, and tumors at the level of hypermutating clonal lineages, which are likely to be involved in recent and ongoing immune responses. The results are shown in Fig. 4A. After pooling patient-wise clonotype data, clonal groups were formed by sequence similarity, and then IGH clonotypes within a group were assembled into clonal lineages that shared a common ancestor68 and represent a B-cell clone undergoing the affinity maturation process. Each clonotype within a clonal group was attributed to the tissue of origin, tumor, LN, and/or PBMC, and to a particular isotype. For each clonal group, the percentage of clonotypes belonging to each isotype and tissue was calculated. Clonal groups from all patients with a given cancer type were plotted on a triangle plot using the percentage of clonotypes from the tumor, LN, and PBMC as coordinates and colored according to the dominant isotype (>60%).

Immunoglobulin hypermutation analysis across tissues and isotypes (clonal group level pooled patient data)

A - Schematic representation of analysis and triangle plot visualization of clonal group distribution between tissues; B, C, D - clonal group distribution between tissues for colorectal (B), lung (C), and melanoma (D); stars represent non-weighted by size mean center of triangle coordinates. Chi-2 test for goodness of fit was used to test whether each tissue equally contributed to clonal group formation.

In all studied cancer types, IgA-dominated clonal groups were evenly distributed among the three tissues, indicating no preference for IgA-switched cells towards lymphoid or tumor tissue residence. IgG-dominated clonal groups showed a preference for tumor and LN residence in lung cancer and melanoma, in accordance with the idea of tight interaction between LN and tumor in mounting anti-tumor immune responses. IgM-dominated clonal groups showed a strong preference for colorectal cancer tumors, indicating intensive intratumoral somatic hypermutation without isotype switching (Fig. 4B, red triangle).

LN-to-LN heterogeneity in colorectal cancer

Next, we sought to investigate whether BCR repertoire analysis could discern among a group of tumor-draining LNs that are in more intensive clonal exchange with the tumor. This question may be addressed at the individual clonotype level and at the level of hypermutating clonal groups.

At the individual clonotype level, we compared F2 metric values for pairwise tumor/LN BCR repertoire overlaps from 2 colorectal cancer patients, for which we obtained three separate LNs from the excised surgical material. For patient ccp2, a significantly higher overlap of tumor repertoire with one of the LNs was observed (Fig.5 A). Similarly, Cytoscape network analysis showed more clonotype sharing between LN3 than between other LNs (Fig.5 B). For ccp3, no significant difference was observed between LNs (Fig.5 A, right panel). Similarly, an unequal interaction of tumors with LNs was observed at the level of hypermutating clonal groups. We used the proportions of clonotypes originating from LN1, 2 and 3 among all LN-clonotypes within a clonal group as coordinates for triangle plots (Fig. 5 C). For ccp2, most of the LN3-focussed clonal groups were also prominently represented in the tumor. However, most of the other clonal groups resided in the center of the triangle, so the mass center was not shifted towards any of the LNs overall. Functionally, this may again indicate that within a group of LNs, nodes are unequal in terms of access to tumor antigens, and this inequality shapes the BCR repertoires within these LNs. On Fig. 5D we show an example of a clonal group that includes clonotypes from tumor (multiple tumor fragments) and only from one of the LNs, to illustrate asymmetric involvement of LNs in the development of a hypermutated clonotype. For ccp3, no such inequality was observed. To validate this hypothesis, it would be beneficial to obtain a BCR repertoire from non-tumor-draining LNs; however, this was not possible in the current study.

LN-to-LN difference of BCR repertoires in colorectal cancer (clonotype level - panels A, B; clonal group level - panels C, D; individual patient data).

A - repertoire overlap between tumor and three separate LNs from the draining lymph node pool; Mann-Whitney test, Bonferroni correction. B: Network representation of Ig repertoires from tumor and three separate LNs, with circles representing individual CDR-H3 sequences, size of the circles corresponding to the clone frequency, and color corresponding to the isotype; C - triangle plot visualization of clonal group distribution between three different LNs, with size corresponding to the number of individual CDR-H3 sequences (clonotypes) within a given clonal group, and color corresponding to the percentage of tumor-derived clonotypes–within the clonal group; D–example of a clonal lineage consisting of CDR-H3 sequences derived from a lymph node (blue) and all three tumor fragments (shades of red) from patient ccp2, shapes representing isotypes and size representing frequency of a given sequence in a given sample.

These observations reflect a complex interplay between tumors and LNs, which may be involved in the adaptive immune response to tumor antigens.

Intra-LN heterogeneity

Likewise, we asked whether parts of LNs are equally involved in interaction with tumor. For lcp3, we obtained BCR repertoires from 3 fragments of one draining LN and compared them to tumor repertoires on the individual clonotype and clonal group level. Repertoire from fragment LN11 showed significantly lower overlap with the tumor than other two LN fragments, LN12 and LN13 (Fig. 6A, left). Triangle plot analysis of clonal group distribution also showed that the majority of the tumor-dominated (red) clonal groups resided in the LN12-LN13 side of the triangle (Fig.6B, left).

Intra-LN and intratumoral heterogeneity (clonotype level - panels A, C; clonal group level - panel B, D; individual patient data).

A - repertoire overlap between tumor and lymph node fragments for colorectal cancer patient ccp6 and lung cancer patient lcp3; Mann-Whitney test, Bonferroni corr. B - triangle plot visualization of clonal group distribution between lymph node fragments, with size corresponding to the number of individual CDR-H3 sequences (clonotypes) within a given clonal group, and color corresponding to the percentage of tumor-derived clonotypes within the clonal group; C - network representation of Ig repertoires from tumor fragments, with circles representing individual CDR-H3 sequences, size of the circles corresponding to the clone frequency, and color corresponding to the isotype, edges connect clonotypes to their fragment of origin; D - triangle plot visualization of clonal group distribution between tumor fragments, with size corresponding to the number of individual CDR-H3 sequences (clonotypes) within a given clonal group, and color representing dominant clonotype; Stars represent non-weighted by size mean center of triangle coordinates. Chi-2 test for goodness of fit is used to test if each tumor segment equally contributes to clonal groups formation.

Intratumoral heterogeneity of immune repertoires

Spatial heterogeneity of the T cell receptor repertoire is known to reflect the mutational landscape of tumor 41,69. Statistically, the magnitude of intratumoral genetic heterogeneity correlates with the heterogeneity of immune cell infiltration, implying the co-evolution of the tumor genetic architecture and immune microenvironment 70. Higher heterogeneity of tumor-infiltrating T-cell repertoire is associated with higher risk of recurrence and shorter disease-free survival 41.

B-cell repertoire heterogeneity, however, remains relatively understudied. The mechanisms underlying TI-Bs heterogeneity may involve not only differential infiltration and accumulation of lymphocytes, but also local development of tertiary lymphoid structures which then drive the development of local immune response specific to subclonal neoantigens.

Here we aim to quantify intratumoral BCR repertoire heterogeneity, measured as the extent of clonotype or clonal group sharing between parts of the tumor. First, we confirm for all studied tumor samples, that repertoires derived from separate tumor fragments are significantly less overlapping (F2 metric) than repertoires from biological replicates of one tumor fragment (Fig.S4). Then we pooled repertoires from replicates belonging to the same tumor fragments and analyzed the overlap between top 300 clonotypes from fragments of the same tumor using Cytoscape platform (Fig.6C). We observed significant patient-to-patient variability in the degree of heterogeneity between tumor fragments. In a tumor from patient ccp2, all three fragments have a high number of clonotypes in common (represented also by close distance between anchor nodes on the bubble diagram), and the isotype composition is also similar and dominated by IgA (Fig.6C,D left). Overall, this draws a picture of a relatively homogenous immune infiltrate in the tumor of this patient. In patient ccp3 (Fig.6 C, D, right), we see a remarkably different pattern. In ccp3, fragment tum_1 is dominated by IgA and has many shared clonotypes with fragment tum_3, which are also IgA-clonotypes.

Fragment tum_2 shares many clonotypes with tum_3, and these clonotypes, as also total repertoires of fragments tum_2 and tum_3, are mostly IgG and IgM clonotypes. Heterogeneity of clonal group distribution was also significantly more prominent in ccp3 compared to ccp2 (Fig.6D). In ccp2, significant eccentricity was observed only for IgG-dominated clonal groups (red star on Fig.6D, left), which tended to share between fragments tum_2 and tum_3, and not tum_1. This may indicate that IgA clonal groups are specific either to clonal tumor antigens (that is, present in all parts of tumor), or to other ubiquitous antigenic determinants, whereas IgG clonal groups more likely are specific to tumor specific and/or subclonal antigens (present only in some parts of tumor). In ccp3, most clonal groups show a significant degree of eccentricity on triangle plot (Fig.6D, right), indicating heterogeneous distribution between tumor fragments, for IgA-dominated as well as for IgG-dominated clonal groups. Given that clonality of ccp3 tumor fragments tum_2 and tum_3 is also higher than that of ccp2 tumor, we hypothesize that heterogeneity plus high clonality indicate formation of TLSs and thus efficient local immune response towards tumor-related antigens.

To analyze the presence of TLSss in our tumor samples, we used histology and immunohistochemistry. We found frequent dense leukocytic accumulations in the peritumoral region, as well as distant from the tumor. The latter were the mucosal/submucosal layers, whereas peritumoral ones were found in all the intestinal layers (Fig. S5B). Such an appearance in the outer layers and prominent accumulation near the tumor indicates that these are more likely to be TLSs, rather than Peyer’s patch-like structures that are localized predominantly in the mucosal and submucosal layers. In order to evaluate whether this accumulation has organized TLSs-related lymphocyte distribution, we used multicolor immunohistochemical staining of T and B cells together with TLSs-related markers: high endothelial venule marker PNAd and CXCL13 chemokine, which drives TLSs formation. We found that CD20 B cells and CD3 T cells were the major cell types within these patches, indicating their lymphoid origin (Fig S5E,F). Peritumoral structures have condensed B cells follicles surrounded by T cells and high endothelial vessels (HEV), confirming that these are primary follicular TLSs71,72. Distant lymphoid structures in the mucosal layer have a more dispersed and intermixed T and B-cell distribution, indicating a lower degree of maturation. Similar lymphoid aggregates in the mucosa and peritumoral regions have been observed on all tissue blockers for patients with colorectal cancer (two for each patient). However, their comparison is unrepresentative, owing to variations in the sampling of tissue fragments.

The design of our study included obtaining cellular-level replicates for each processed tissue fragment. This allowed us to reliably detect CDR-H3 clonotypes that were significantly expanded in a given sample relative to other samples. Fig.7 shows an example of such an analysis. We used the EdgeR library from the Bioconductor package to detect clonotypes that were differentially expanded in separate fragments of tumors. These clonotypes are represented in Fig.7 A, B as colored circles on a frequency correlation plot. Expanded clonotypes with the highest frequencies were also visible on Cytoscape plots (Fig.7 C) and labeled according to their sequences. Interestingly, most expanded clonotypes are not attributed to any clonal group (and thus are not actively hypermutating), and the largest of these are of IgM isotype. Only one of the expanded clonotypes in this tumor was involved in the hypermutation process, and the structure of its clonal lineage is shown in Fig.7 D. On; other hand, none of the largest clonal lineages detected in this patient contained any expanded clonotypes (Fig.7 E). In addition, on average, expanded clonotypes have fewer mutations than non-expanded clonotypes, both in tumors and in tumor-draining LNs (Fig.7 F, all patients).

Expanded clonotypes and hypermutation analysis (clonotype level - A, B, C, F; clonotype group level - D, E; individual patient data).

A, B - Visualization of expanded clonotypes on frequency correlation plots for pairs of tumor fragments of melanoma tumor from patient mp3; C - Cytoscape network visualization of top 300 most frequent individual Ig CDR-H3 clonotypes, colored by isotype, with size of circles proportional to the frequency of a given clonotype; D - example of a clonal lineage containing expanded CDR-H3 sequence; E - examples of t e biggest clonal lineages, none of which contain expanded CDR-H3 sequences; F - average number of mutations in expanded and non-expanded clonotypes; Mann-Whitney test; N=11.

Short vs long trees - phylogeny analysis

CDR-H3 is the most variable component of the BCR and is also the most important in terms of antigen recognition. However, missing mutations in other segments may lead to difficulties or a complete inability to accurately recover the phylogeny of hypermutations. Moreover, shorter sequencing reads, which may be sufficient to obtain CDR-H3 data, cover the V segment to a lesser extent, which may lead to less accurate germline gene assignment and, consequently, erroneous rooting of the phylogenetic tree.

To estimate the extent to which phylogenetic trees built on short-read data and CDR-H3 sequences alone reflected the ‘true’ phylogeny recovered from full-length (CDR1 to FR4 regions) sequences, we used two types of BCR libraries obtained from the same lung cancer patient material. Short CDR-H3 sequences were obtained from libraries prepared according to the 5’RACE protocol and sequenced with a 150+150bp read length. Long sequences were obtained from libraries prepared according to the IgMultiplex protocol and sequenced at 250+250bp read length. Then, for each dataset, we independently inferred clonal lineages and built rooted phylogenetic trees for each clonal lineage of size five or more (see Materials and Methods for more details). To compare the structures of these tree sets, we selected clonotypes with the CDR-H3 sequence and the corresponding V and J genes present in both the long and short trees. We then calculated their average ranks by distance to the root within their clonal lineages and compared the ranks (Fig.S6 A). The correlation was around 0.7 (p<6*10-16). Therefore, phylogeny inferred based on short CDR-H3 sequences generally reflects the ‘true’ phylogeny inferred from the CDR1-FR4 repertoires. However, the average distance to the root for IgM and IgA/IgG isotypes, which have significant differences for long repertoires, did not show significance for short (CDR-H3-based) repertoires (Fig.S6 B).

One potential explanation for this could be that a larger sample size and a higher sequencing depth are required for short sequences to yield statistically significant differences. Indeed, it is impossible to avoid sporadic misplacements of short sequences in the tree when the information on hypermutations is limited. On the other hand, we clearly observed a correlation between long and short tree phylogenies. Therefore, we suppose that the reason for this contradiction is an insufficient sample size. However, there is evidence that CDR-H3 data do not always perfectly represent the ‘picture’ and should be treated with particular attention.

Grimsholm et al. in 2020 that memory B cells have, on average, higher Kidera factor 4 values and lower predicted interaction strength than naïve B-cells40. We assumed that these features may evolve during the affinity maturation process. Therefore, we checked how the mean kf4, strength, and charge of the five central CDR-H3 amino acids depend on the clonotype’s distance to the MRCA rank on the phylogenetic tree. To ensure that there was no bias in phylogeny as a result of short sequencing length and small sample size, we used full-length sequence trees inferred from lung cancer patient lcp3 and full-length repertoire of very high sequencing depth from a healthy donor. However, we did not find any correlation between strength, charge, and kf4 and the clonotype’s position on a tree (Fig.S6 C). One exception was a near-zero increase in charge down the tree found in the healthy donor repertoire (r=0.04, p=4.9*10-05), which, despite being statistically significant, is probably not mechanistically meaningful. In addition, the largest (>20 members) trees had dN/dS values < 1, indicating negative selection pressure (Fig.S6 D). Taken together, these data suggest that, in the absence of a defined time-controlled antigenic stimulus, such as vaccination, in steady-state equilibrium, only a few immunoglobulin clonal groups show definitive signs of positive selection.

Productive involvement in hypermutating lineages depends on CDR-H3 characteristics

Taking into consideration the previously found differences between memory and naïve cell features by Grimsholm et al.40, we hypothesized that amino acid features of CDR-H3 may be selected even before the start of hypermutation and affinity maturation. To test this hypothesis, we checked how the mean kf4, strength, and charge of five central CDR-H3 amino acids depend on the clonal status of the clonotype (Fig. S7) and found the results to be tissue-specific. In particular, clonotypes not assigned to any clonal lineage (singles) had lower kf4 values, and hence higher hydrophobicity, than members of some clonal groups in the tumor and PBMC repertoires. The predicted interaction strength was higher for singles in PBMC and LNs, but not in the tumor, potentially indicating that tumor-resident clonotypes have undergone selection for CDR-H3 properties, irrespective of their involvement in SHM. Finally, clonal lineages had a higher charge in tumor repertoires, but not in the LNs or PBMC, indicating a tendency towards polyreactivity in clonotypes evolving locally under the influence of TME.

These results generally match the findings of Grimsholm et al. and support the hypothesis that selection of CDR-H3 amino acid features occurs before cells undergo SHM. Tissue differences could potentially be explained by differences in the proportions of B cell subsets in different tissues.

Discussion

We performed a multi-location study of BCR repertoires in cancer patients using an unbiased BCR library preparation technique complemented by deep sequencing. We analyzed these repertoires from two complementary perspectives. At the level of individual BCR clonotypes, we studied various repertoire features and their differences between tissues of origin, as well as repertoire heterogeneity within tumor or lymph node tissue. At the level of B-cell clonal lineages, we tracked lineage sharing between tissues, and differences in clonotype features between clonotypes that belong to some clonal lineage, and those that do not.

First, we compared BCR repertoires from PBMC and bulk tumor infiltrates and the corresponding sorted CD19+CD20-CD38high plasma cells, and confirmed that the bulk RNA-based BCR repertoire is dominated by plasma cells across different peripheral compartments. This is expected, given that the IG expression level is 50-500 times higher in plasma cells than in memory B-cells21. However, PC has a higher IgM isotype proportion in the tumor and a lower proportion of IgG in the lymph node. The higher proportion of IgM in PCs relative to the bulk repertoire may indicate that some IgM-producing plasma cells, while having an appropriate plasmablast/PC surface marker phenotype, produce relatively low levels of antibodies, which leads to them being masked in the bulk repertoire by non-IgM non-PC B cells. A higher level of antibody production correlates with the differentiation of plasmablasts/short-lived plasma cells into long-lived plasma cells. Therefore, one could hypothesize that tumor-infiltrating IgM-expressing antibody-producing cells are skewed towards a less differentiated B-cell compartment compared to other isotypes. Similar logic applies to the IgG-producing plasma cells in the LNs: a lower proportion of IgG-clonotypes in PCs compared to the bulk repertoire indicates that some of the most frequent IgG clonotypes in the bulk repertoire belong to cells with a non-PC/plasmablast phenotype, presumably stimulated memory B-cells.

BCR repertoire overlaps and the correlation of isotype proportions of tumor and lymph node samples were higher than those of tumor and peripheral blood. A higher repertoire overlap is indicative of a higher proportion of tumor-related BCR clonotypes in the draining LNs than in the peripheral blood, which illustrates the crucial role of draining LNs in the development of the anti-tumor immune response. In addition, in accordance with previous studies, our data confirmed that the tumor BCR repertoire is more clonal than that in the lymph node or PBMC. We also showed that LNs within the same draining LN pool may be differentially involved in tumor interaction. Overall, LNs are a better source of material to study anti-tumor B-cell response than PBMC, and a better source of material for a potential cancer B-cell or antibody therapy, if tumor tissue is unavailable.

The mean CDR-H3 length of the most frequent BCR clonotypes in tumors was higher than that in other tissues. This finding may indicate that tumors tend to be infiltrated with less mature B-cells than other compartments 40, and may also indicate a higher proportion of autoreactive and polyreactive 64,73 antibodies, which is consistent with previous research. However, in colorectal cancer, the mean CDR-H3 length in the IgA repertoire was lower than that in melanoma. This may indicate that the IgA repertoire in colorectal cancer is less subject to TME-induced biases than in other cancers studied in this study. Differences in other CDR-H3 properties, such as mean kf3 (b-sheet preference), kf4 (inversely correlated with hydrophobicity), predicted interaction strength, and charge between different tissues, including increased central-CDR-H3 charge in the tumor compared to normal tissue, may suggest a less mature and less specific BCR repertoire of tumor-infiltrating B-cells.

Clonal groups representing hypermutating B-cell lineages are distinct in their isotypes and tissue composition. In colorectal cancer, clonal groups dominated by IgA (> 60% of IgA clonotypes) tend to be shared between tumors and blood, whereas IgG-dominated (> 60% of IgG clonotypes) clonal groups tend to be shared between tumors and draining LNs. The probability of detection of a given clone in a given location is dependent on the mRNA quantity that is produced by a particular B-cell or group of B cells (BCR expression level), as well as on the lifetime of these cells in this location (lifetime being a superposition of cells entering and leaving the tissue, and the literal lifespan of the cells and their progeny). Assuming that BCR is expressed at the same level in all members of a certain clonal population, whether it is found in the blood, LN, or tumor, the probability of detecting a clone is then only a function of the “combined lifetime” of a clone within a given tissue. Therefore, the proportion of clones within a clonal group derived from a certain tissue is a snapshot of their preference for tissue residence or circulation. By comparing IgA- and IgG-dominated clonal groups, we observed subtle differences in the behavior of B cells during ongoing clonal selection and interaction with the tumor. Whereas IgG clonal groups preferentially reside in tumors and draining LNs, IgA-dominated clonal groups show a greater preference for circulation. An obvious question is whether this corresponds to the proportion of isotypes at the cellular level(by surface or intracellular staining). The relative frequencies of antibody-secreting B cells in peripheral blood were IgA1 > IgG1 > IgA2 > IgM > IgG4 > IgG2 > IgG3 74, which corresponds to our observations of the peripheral blood bulk repertoire (total IgA>total IgG>total IgM). Therefore, the observed clonal group sharing between tissues may simply be due to the higher number of clonotypes, with a certain isotype detected in a certain tissue.

However, for tissue distribution triangle plots, the data were normalized for isotype usage between tissues (i.e., equal numbers of IgG/IgA/IgM most frequent clonotypes were used for PBMC, LN, and blood) to correct for quantitative bias. Therefore, the trend for tissue preference reflects true clonal behavior and not simply quantitative prevalence.

It is well known that only draining LNs show signs of interaction with tumor antigens 75. In murine models, non-draining usually occurs contralateral to the tumor site, which is anatomically distant. However, whether LNs within the same anatomical site may differ in terms of their interaction with the tumor has yet to be determined. For some of the patients in this study, we were able to obtain more than one lymph node from the same tumor-draining pool, which allowed us to address this question directly on the level of BCR repertoires. A higher overlap between the tumor and LN repertoire signifies a more intensive interaction with the tumor and its antigens. In at least one patient, the LNs studied were significantly different in this regard, with one of the LNs being significantly more similar to the tumor in its BCR repertoire. This observation was also reproduced at the clonal group level, where the lymph node that was in closer interaction with the tumor in terms of clonal overlap also contained more clonal groups that included tumor-derived clonotypes and were confined to this lymph node only. These clonal groups represent antibody maturation and selection processes involving tumors, and occur preferentially in the LN, which is in tight interaction with the tumor, but not in the others. Whether this also reflects the response to tumor antigens and the maturation of tumor-specific antibodies remains to be tested.

It has been previously shown that within a single lymph node, individual germinal centers (GCs) share their immunoglobulin clonal composition to some extent, and the offspring of one B-cell clone may be found in several individual germinal centers 76. During GC reactions, individual GCs become oligoclonal (containing 4-13 major B cell clones with functional IgVH, as detected in the pre-NGS era) due to an affinity selection process 7779. However, in the case of chronic antigen stimulation, as in cancer, repeated cycles of antigen exposure, memory B-cell reactivation, and germinal center reactions should lead to uniform involvement of the whole LN in interaction with tumors and production of tumor-specific immune responses. Again, we sought to study the spatial heterogeneity of LNs at the level of individual immunoglobulin clonotypes and clonal lineages. We found that the BCR repertoires derived from fragments of a single lymph node were significantly different in their similarity to the pooled tumor BCR repertoire (F2 metric). We conclude that despite chronic antigen exposure, despite the ability of memory B cells to leave and re-enter germinal center reactions and form new germinal centers in subsequent rounds of affinity maturation, the BCR landscape of a lymph node remains significantly heterogeneous in space.

Likewise, for tumor immune infiltrates, it was shown that the TCR repertoire is heterogeneous in space, and this heterogeneity correlates with subclonal mutations within the tumor tissue 69 and likely represents expanded mutation-specific T cell clones. BCRs are expected to be even more clonal, both in LNs and tumors. However, for BCRs, heterogeneity analysis is complicated by the orders-of-magnitude difference in immunoglobulin expression between plasma cells and other B-cells21. A rare plasma cell will have a low probability of sampling at the cellular level but will produce a sufficient amount of immunoglobulin RNA to be reliably detected in the BCR repertoire. Therefore, appropriate use of biological (cellular) replicates sequenced at comparable depths is crucial for accurate detection of differentially expanded clonotypes23.

Following the logic described above, we were able to analyze repertoire overlap between tumor fragments and detect BCR clonotypes that were differentially expanded in individual fragments. We observed that, in some patients, the most abundant BCR clonotypes were different in different fragments. In this case, the most expanded clonotypes rarely overlapped between fragments, thus probably representing clonotypes that recognize subclonal tumor antigens. This needs to be tested directly and, if proven true, may potentially be a reliable method for the identification of tumor-specific antibodies. In other cases, the tumor may be relatively homogeneous, without significant expansion, and with highly overlapping repertoires of individual fragments. These observations are mostly reproduced at the level of clonal groups, where in heterogeneous tumors, we observed clonal groups that concentrated almost entirely in single tumor fragments, whereas in homogenous tumors, clonal groups mostly include clonotypes derived from all fragments of the tumor.

The physicochemical properties of the CDR-H3 region differ between clonotypes that represent clonal lineages, and thus, are actively hypermutating, and those that do not. The most prominent difference was observed for kf4 (inversely related to hydrophobicity) in PBMC and tumors; clonotypes that undergo hypermutation are, on average, less hydrophobic, which indicates higher binding specificity. However, there was no correlation between kf4 and the extent of hypermutation, even in large full-length Ig-profiling datasets. One possible explanation may be that the selection of less hydrophobic CDR-H3s occurred before the onset of hypermutation.

Interestingly, analysis of silent vs. non-silent hypermutations (dNdS) suggests that in the absence of a defined time-controlled antigenic stimulus, such as vaccination, in steady-state equilibrium, only a few immunoglobulin clonal groups show definitive signs of positive selection. It would be interesting to systematically explore the specificity of these clones in patients with cancer as a potential source of tumor-reactive antibodies.

In general, this study has several limitations, such as the small number of patients, both per cancer type and overall, and our 5’RACE library preparation protocol, which was used for most samples in this study, does not allow for an accurate distinction between the IgA and IgG isotype subtypes. Regarding clonal lineage structure analysis, we found that our study design has a major limitation of a relatively short (150 bp) read length, which does not provide sufficient information for hypermutation analysis.

Nevertheless, our observations contribute to the understanding of the interactions among the tumor immune microenvironment, tumor-draining LNs, and peripheral blood.

Materials and methods

Patients

All clinical samples were acquired from the N.N. Blokhin National Medical Research Center of Oncology or Volga District Medical Centre under the Federal Medical and Biological Agency. This study was conducted in accordance with ICH-GCP. The protocol was approved by the Ethical Committees of the Volga District Medical Centre under the Federal Medical and Biological Agency, and by the N.N. Blokhin National Medical Research Center of Oncology. Written informed consent was obtained from all patients.

Fourteen patients were included in the study, of which four were diagnosed with colon cancer, four with lung adenocarcinoma, and six with melanoma. For each patient, samples from the tumor (tum), draining LN(s) (further referred to as LN1, LN2, LN3 if multiple LNs were analyzed), peripheral blood (PBMC), and normal tissue (norm) were collected, or only some of these tissues (Table S2). Several distant pieces were resected from each tumor and lymph node, unless there were several LNs available (as for colon cancer patients, ccp2, ccp3). All samples were bulk (unsorted), but for two colon cancer patients (ccp2 and ccp3), plasma cells were sorted from a part of each sample. All the samples were processed and stored for library preparation. Blood sampling was performed immediately prior to surgery, and the total volume of obtained blood did not exceed 35 ml for each patient. Fragments of tumor, LN, or normal tissue were placed in 50 ml Falcon tubes containing 7-10 ml of MACS® Tissue Storage Solution (Miltenyi Biotec, cat. 130-100-008) and stored at room temperature for no longer than 2 hours. All the procedures were performed under sterile conditions. PBMC were isolated from the blood samples using the Ficoll-Hypaque™ centrifugation protocol. Briefly, whole blood was diluted 2.5 times with sterile PBS buffer, carefully layered over Ficoll-Paque PLUS (GE Healthcare, cat. 17-1440-03) and centrifuged at 600g for 20 min. Afterwards, buffy coats were harvested from the interface, washed twice with PBS buffer (20 min 350 g), calculated, and lysed in RLT buffer (Qiagen, cat. 79216 ) at 0.5 to 3 *10^6 cells/sample density.

Tum, LN, and normal tissues were mechanically cut into 3-7 mm fragments and incubated in RPMI 1640 medium (Gibco™, cat. 42401042) containing 1 mg/ml Liberase TL Research Grade (Roche, cat. 5401020001) and 30 U/ml DNase I (Qiagen, cat. 15200-50), at 37°С for 30–60 min with gentle shaking. Samples were then processed using a gentleMACS™ Dissociator (Miltenyi Biotec, cat. 130-093-235) and passed through a 100 um Nylon cell strainer to remove non-dissociated fragments. The resulting cell suspension was concentrated by centrifugation (7 min, 350 g) and lysed in RLT buffer(Qiagen, cat. 79216 ) 0.1-0.5 *10^6 lymphocytes /sample density. All RLT samples (PBMC, tum, LN, and norm lysates) were stored at -80°C before preparation of the BCR IGH repertoire libraries.

PC isolation

For colon cancer patients, three distant samples (at least 3 cm apart) were obtained from the tumor border with a portion of normal tissue. Cells from three digested tumor samples, three LNs (for patients ccp2 and ccp3) and two replicates of PBMC were stained with a panel of fluorescent antibodies: CD45-PerCP/Cy5.5 (BD 564105, clone HI30), CD38-PE (BD 555460, clone HIT2), CD19-Alexa700 (BD 557921, clone HIB19), CD20-BV510 (BD 563067, clone 2H7), CD25-V450 (BD 560458, clone M-T271) and isolated BD FACSAriaIII cell sorter (BD Bioscience). First, lymphocytes were gated as CD45+ cells, and the plasma cell population was isolated as CD20_neg/CD19_low/CD27_high/CD38_high. Sorting was performed using a FACSAria III cell sorter (BD Bioscience) directly into RLT lysis buffer (Qiagen, cat 79216). Cells from the ccp6 LNs were lysed and unsorted.

BCR library preparation and sequencing

All BCR IGH libraries were generated using the 5RACE methodology described by Turchaninova et al.21 and sequenced with a 150+150 bp read length. For one of the lung cancer patients, lcp3, additional tumor, lymph node, and normal lung tissue libraries were generated using IgMultiplex methodology and sequenced with 250+250 read length. To account for sampling bias, we also obtained technical replicate samples at the cell suspension level.

Preprocessing of sequencing data

To process sequencing reads, we used the MiNNN software to extract UMI from raw sequences, correct errors, and assemble consensus sequences for each UMI. For bulk libraries prepared with the 5’RACE protocol and sequenced with 150+150 read length, we filtered out UMI with less than three reads. For bulk libraries prepared with the Multiplex protocol and sequenced with 250+250 read length, we filtered out UMI with less than four reads. For libraries of sorted plasma cells prepared with the 5’RACE protocol and sequenced with 150+150 read length, we filtered out UMI for which there were fewer than two reads.

To ensure the absence of contamination, we checked for the presence of identical UMI in different samples. If such a UMI was identified, and BCR sequences for such UMIs were identical or almost identical (suspecting amplification error), then the number of reads for this UMI in both samples were compared. If the number of reads of this particular UMI in one sample exceeded the number of reads of this UMI in another sample by more than five times, this UMI was eliminated from the sample with a smaller number of reads per UMI. If the ratio was lower than 5, the UMI was eliminated from both samples.

After UMI-based decontamination, we used MIXCR software to assemble reads into quantitated clonotypes, determine germline V, D, and J genes, isotypes, and find the boundaries of target regions, such as CDR-H3. Clonotypes derived from only one UMI were excluded from the analysis of individual clonotype features but were used to analyze clonal lineages and hypermutation phylogeny. Single-UMI sequences were eliminated to avoid errors during cDNA synthesis during library preparation. However, modern reverse transcriptases have high fidelity, with 2.5e-05 error rates for some of them 80. Therefore, we considered it reasonable to include single UMI sequences in those parts of the analysis, where a larger sample size was important.

Samples with 50 or less clonotypes left after preprocessing were excluded from the analysis.

Clonal lineage inference

We identified sequences belonging to the same clonal lineage using the ChangeO software. The criteria for the initial grouping were the same V and J germline genes identified by MIXCR, and the same CDR-H3 length. These criteria do not account for the D segment, as there is insufficient confidence in the germline annotations due to its short length and high level of mutations. Sequences within each group were defined as belonging to the same clonal lineage if they had a nucleotide CDR-H3 sequence identity above a certain threshold. Such a threshold was individually defined for each patient’s dataset as a local minimum of the distance-to-nearest distribution function31. In most cases, this threshold is set between 80% and 85%.

Phylogenetic inference

The phylogeny of B-cell hypermutations was inferred for each clonal lineage of size five or more using the maximum likelihood method and the GTR GAMMA nucleotide substitution model. To find the most recent common ancestor (MRCA) or root of the tree, we used an outgroup constructed as a conjugate of germline segments V and J defined by MIXCR. The D segment was excluded from the outgroup formation, as there was insufficient confidence in the germline annotations due to its short length and high level of mutations. The MUSCLE tool was used for multiple sequence alignment and RAxML software was used to build and root phylogenetic trees.

Sample pooling and normalization

For some parts of the analysis, we considered samples irrespective of technical replicates or specific tumor/LN segments. In such cases, the corresponding repertoire datasets were pooled in one, where for each clonotype, its new frequency was calculated as the mean of the clonotype’s frequencies from separate files. Such an approach helps avoid sampling bias and achieve equal contributions of all clonotype sets being pooled.

Statistical analysis

In our analysis, we often used repertoires of different parts of the same tissue as separate observations within the same comparison. Examples of these could be isotypes, different pieces of tumors or LNs, or PBMC samples taken at different time points. Understanding the pitfalls of this approach in general, we argue that it can be justified in some cases considering the heterogeneity of tissues, especially tumors, and the distinctive characteristics of different isotypes.

Most of the analysis was performed using VDJtools and custom Python and R scripts.

Expanded clonotypes detection

We used the EdgeR package in R to identify the clonotypes that were differentially expressed between the two sample sets. The problem with this approach is determining the correct number of counts required to pass into the DGEList function of EdgeR. Using a number of unique UMIs detected for each clonotype in the sample might not be a good idea, considering the possibility of sampling bias (e.g., resecting tumors into pieces of slightly different sizes). To account for sampling bias, we defined clonotype count for the DGEList function as clonotype frequency in a normalized sample multiplied by the total number of unique UMIs in all groups of samples.

The output of the DGEList function is then normalized and passed to the exactTest function of the EdgeR. Clonotypes with FDR<0.05 and logFC>0 were considered expanded in a corresponding group of samples.

Immunohistochemistry

Sections of formalin-fixed paraffin-embedded tissue were sliced on an RM2235 microtome (Leica Biosystems). Slices were deparaffinized with Xylol and Ethanol. H&E staining was performed using Mayer’s hematoxylin and eosin (Biovitrum, Russia). For IHC staining, the slices were demasked in AR9 buffer (PerkinElmer) for 10 min at 98 0С in a water bath, washed twice in PBS, and blocked for 30 min with Protein Block (Leica Novocastra, UK). Primary antibodies were added without washing the blocking solution and incubated overnight at 4 0С. After incubation with primary antibodies, slices were washed twice in PBS and stained with the NovoLink polymer detection system (Leica Biosystems, UK) according to the manufacturer’s instructions or with HRP-streptavidin (Jackson Immunoresearch 016-620-084, 1:500 for 30 min) and contrasted with 10 µM CF tyramide dye (Biotium, USA). The following primary antibodies were used at the consequence and with indicated dilutions and CF dyes: CD20 (Leica Biosystems PA0200), 1:1 with CF488; CXCL13 (GeneTex GTX108471, Taiwan), 1:100 with CF660; CD3 (HuaBio HA720082, China), 1:200 with CF430; PNAd-biotin (BioLegend 120804, USA), 1:200 with CF555. Before staining with non-biotinylated antibodies, the antibodies were stripped in AR9 buffer for 10 min at 98 0С. After staining with antibodies, the slides were counterstained with DAPI (0,2 nM) for 5 min, embedded in Mowiol 4-88 (Sigma-Aldrich), and coverslipped.

Brightfield and fluorescence images were acquired using an EVOS M7000 microscope (Thermo Fisher Scientific) with a 20x dry objective. The following light cubes were used: DAPI (AMEP4650) for DAPI fluorescence, CFP (AMEP4653) for CF430 fluorescence, YFP (AMEP4654) for CF488 fluorescence, RFP (AMEP4652) for CF555 fluorescence, and Cy5 (AMEP4656) for CF660. Whole slice images were stitched using microscope imaging software. Fluorescence images were contrasted, colorized into pseudocolors, and overlaid using Fiji software (https://imagej.net/).

Abbreviations

  • (PBMC): Peripheral blood mononuclear cells

  • (LN): lymph node

  • ((tumor)-draining LNs): (tumor)-draining lymph nodes

  • (BCR): B cell receptor

  • (CDR-H3): complementarity-determining region 3 of a BCR heavy chain

  • (TME): tumor microenvironment

  • (SHM): somatic hypermutation

  • (TLSs): tertiary lymphoid structures

  • (TI-Bs): tumor-infiltrating B-cells

  • (ADCC): antibody-dependent cytotoxicity

  • (MDSC): myeloid-derived suppressor cells

  • (LUAD): lung adenocarcinoma

  • (Rapid Amplification of cDNA Ends): 5’RACE

  • (IGH): immunoglobulin heavy chain

  • (SLE): systemic lupus erythematosus

  • (CAR-T): chimeric antigen receptor T cell

  • (MRCA): most recent common ancestor.

Terms “tumor-draining LN”, “draining LN” and “LN” are used interchangeably throughout the text and figures.