Common virulence gene expression in adult first-time infected malaria patients and severe cases
Sequestration of Plasmodium falciparum(P. falciparum)-infected erythrocytes to host endothelium through the parasite-derived P. falciparum erythrocyte membrane protein 1 (PfEMP1) adhesion proteins is central to the development of malaria pathogenesis. PfEMP1 proteins have diversified and expanded to encompass many sequence variants, conferring each parasite a similar array of human endothelial receptor-binding phenotypes. Here, we analyzed RNA-seq profiles of parasites isolated from 32 P. falciparum-infected adult travellers returning to Germany. Patients were categorized into either malaria naive (n = 15) or pre-exposed (n = 17), and into severe (n = 8) or non-severe (n = 24) cases. For differential expression analysis, PfEMP1-encoding var gene transcripts were de novo assembled from RNA-seq data and, in parallel, var-expressed sequence tags were analyzed and used to predict the encoded domain composition of the transcripts. Both approaches showed in concordance that severe malaria was associated with PfEMP1 containing the endothelial protein C receptor (EPCR)-binding CIDRα1 domain, whereas CD36-binding PfEMP1 was linked to non-severe malaria outcomes. First-time infected adults were more likely to develop severe symptoms and tended to be infected for a longer period. Thus, parasites with more pathogenic PfEMP1 variants are more common in patients with a naive immune status, and/or adverse inflammatory host responses to first infections favor the growth of EPCR-binding parasites.
Despite considerable efforts during recent years to combat malaria, the disease remains a major threat to public health in tropical countries. The most severe clinical courses of malaria are due to infections with the protozoan species Plasmodium falciparum. In 2019, there were 229 million cases of malaria worldwide, resulting in more than 400,000 deaths (WHO, 2020). Currently, about half of the world’s population lives in infection-prone areas, and more than 90% of the malaria deaths occur in Africa. In particular, children under five years of age and pregnant women suffer from severe disease, but adults from areas of lower endemicity and non-immune travelers are also vulnerable to severe malaria. Both, children and adults are affected by cerebral malaria, but the prevalence of different features of severe malaria differs with increasing age. Anemia and convulsions are more frequent in children, jaundice indicative of hepatic dysfunction and oliguric renal failure are the dominant manifestations in adults (Dondorp et al., 2008; World Health Organization, 2014). Moreover, the mortality increases with age (Dondorp et al., 2008) and was previously determined as a risk factor for severe malaria and fatal outcome in non-immune patients, but the causing factors are largely unknown (Schwartz et al., 2001).
The virulence of Plasmodium falciparum (P. falciparum) is linked to the infected erythrocytes binding to endothelial cell surface molecules expressed on blood vessel walls. This phenomenon, known as sequestration, prevents the passage of infected erythrocytes through the spleen, which would otherwise remove the infected erythrocytes from the circulation and kill the parasite (Saul, 1999). The membrane proteins mediating sequestration are exposed to the host’s immune system, and through evolution, P. falciparum parasites have acquired several multi-copy gene families coding for variant surface antigens, allowing immune escape through extensive sequence polymorphisms. Endothelial sequestration is mediated by the P. falciparum erythrocyte membrane protein 1 (PfEMP1) family, the members of which have different binding capacities for host vascular tissue receptors such as CD36, endothelial protein C receptor (EPCR), ICAM-1, PECAM1, receptor for complement component C1q (gC1qR), and chondroitin sulphate (CSA) (Magallón-Tejada et al., 2016; Turner et al., 2013; Rowe et al., 2009). The long, variable, extracellular PfEMP1 region responsible for receptor binding contains a single N-terminal segment (NTS; main classes A, B, and pam) and a variable number of different Duffy binding-like (DBL; main classes DBLα-ζ and pam) and cysteine-rich inter-domain region domains (CIDR; main classes CIDRα-δ and pam) (Rask et al., 2010). These domains were initially allocated to subclasses (e.g., the DBLβ subclasses 1–13); however, due to frequent recombinations between members of the different subclasses, many of these are indistinct and poorly defined (Otto et al., 2019).
PfEMP1 molecules have been grouped into four categories (A, B, C, and E) depending on the type of N-terminal domains (the PfEMP1 ‘head structure’) as well as the 5’ upstream sequence, the chromosomal localization, and the direction of transcription of their encoding var gene (Rask et al., 2010; Kyes et al., 2007; Kraemer and Smith, 2003; Lavstsen et al., 2003). Each parasite possesses about 60 var genes with approximately the same distribution among the different groups (Rask et al., 2010). About 20% of PfEMP1 variants belong to group A and are typically longer proteins with a head structure containing DBLα1 and either an EPCR-binding CIDRα1 domain or a CIDRβ/γ/δ domain of unknown function. Further, the group A includes two conserved, strain-transcendent subfamilies: the var1 subfamily (previously known as var1csa), found in two different variants in the parasite population (3D7- and IT-type) (Otto et al., 2019), and the var3 subfamily, the shortest var genes with only two extracellularly exposed domains (DBLα1.3 and DBLε8). Groups B and C include most, about 75%, of PfEMP1 and typically have DBLα0-CIDRα2–6 head structures binding CD36, followed by a DBLδ1-CIDRβ/γ domain combination. A particular subset of B-type proteins, also known as group B/A chimeric genes, possesses a chimeric DBLα0/1 domain (a.k.a. DBLα2) and an EPCR-binding CIDRα1 domain. Thus, the head structure confers mutually exclusive binding properties either to EPCR and CD36, or to an unknown receptor via the CIDRβ/γ/δ domains. C-terminally to the head structure, most group A and some group B and C PfEMP1 have additional DBL domains, of which specific subsets of DBLβ domains of group A and B PfEMP1 bind ICAM-1 (Lennartz et al., 2017; Janes et al., 2011), and another DBLβ domain, DBLβ12, has been suggested to bind gC1qR (Magallón-Tejada et al., 2016). The consistent co-occurrence of specific domain subsets in the same PfEMP1 gave rise to the definition of domain cassettes (DCs) (Otto et al., 2019; Berger et al., 2013; Rask et al., 2010). The best example of this is the VAR2CSA PfEMP1 (group E, DC2), which binds placental CSA and causes pregnancy-associated malaria (Salanti et al., 2004). The VAR2CSA proteins share domain composition, their encoding genes are less diversified than other var groups, and all parasites possess one or two var2csa copies. Another example is the chimeric group B/A PfEMP1, also known as DC8, which includes DBLα2, specific CIDRα1.1/8 subtypes capable to bind EPCR and typically DBLβ12 domains.
Due to the sequence diversity of var genes, studies of var expression in patients have relied on the analysis of DBLα-expressed sequence tags (EST) (Warimwe et al., 2009; Warimwe et al., 2012) informing on relative distribution of different var transcripts and qPCR primer sets covering some, but not all, subsets of DBL and CIDR domains (Lavstsen et al., 2012; Mkumbaye et al., 2017). So far, only very few studies (Tonkin-Hill et al., 2018; Andrade et al., 2020; Duffy et al., 2016; Kamaliddin et al., 2019) have used the RNA-seq technology to quantify assembled var transcripts in vivo. Moreover, most studies have focused on the role of PfEMP1 in severe pediatric malaria. A consensus from these studies is that severe malaria in children is associated with expression of PfEMP1 with EPCR-binding CIDRα1 domains (Jespersen et al., 2016; Kessler et al., 2017; Storm et al., 2019; Shabani et al., 2017; Mkumbaye et al., 2017; Magallón-Tejada et al., 2016), but elevated expression of dual EPCR and ICAM-1-binding PfEMP1 (Lennartz et al., 2017) and the group A-associated DC5 and DC6 have also been associated with severe disease outcomes (Magallón-Tejada et al., 2016; Avril et al., 2013; Avril et al., 2012; Claessens et al., 2012; Lavstsen et al., 2012; Duffy et al., 2019). Less effort has been put into understanding the role of PfEMP1 in relation to severe disease in adults, and its different symptomatology and higher fatality rate. Two gene expression studies from regions of unstable transmission in India showed elevated expression of EPCR-binding variants (DC8, DC13) and DC6 (Bernabeu et al., 2016; Subudhi et al., 2015), and also of transcripts encoding B- and C-type PfEMP1 in severe cases (Subudhi et al., 2015).
In this study, we applied an improved genome-wide expression profiling approach using RNA-seq to study gene expression, in particular var gene expression, in P. falciparum parasites from hospitalized adult travellers and combined it with a novel prediction analysis of var transcripts from DBLα EST. Individuals were clustered into (i) first-time infected (n = 15) and (ii) pre-exposed (n = 17) individuals on the basis of serological data or into (iii) severe (n = 8) and (iv) non-severe (n = 24) cases according to medical reports. Our multi-dimensional analysis revealed a clear association of domain cassettes with EPCR-binding properties with a naive immune status and severe malaria, whereas CD36-binding PfEMP1 proteins and the conserved var1-3D7 variant were expressed at higher levels in pre-exposed patients and non-severe cases. Interestingly, severe complications occurred only in the group of first-time infected patients, who also tended to be infected for a longer period, indicating that severity of infection in adults is dependent on duration of infection, host immunity, and parasite virulence gene expression.
This study is based on a cohort of 32 adult malaria patients hospitalized in Hamburg, Germany. All patients had fever, indicative of symptomatic malaria. MSP1 genotyping estimated a low number of different parasite genotypes present in the patients (Table 1). For 10 patients, the present malaria episode was their first recorded P. falciparum infection. Nine individuals had previously experienced malaria episodes according to the medical reports, whereas malaria exposure was unknown for 13 patients. In order to determine if patients already had an immune response to P. falciparum antigens, indicative of previous exposure to malaria, plasma samples were analyzed by a Luminex-based assay displaying the antigens AMA1, MSP1, and CSP (Supplementary file 1). Immune responses to AMA1 and MSP1 are known to be long-lasting, and seroconversion to AMA1 is assumed to occur only after a single or very few infections (Drakeley et al., 2005). Principal component analysis (PCA) of the Luminex data resulted in separation of the patients into two discrete groups corresponding to first-time infected adults (‘naive cluster’) and malaria pre-exposed individuals (‘pre-exposed cluster’) (Figure 1A). The 13 patients with an unknown malaria exposure status could be grouped into either the naive or the pre-exposed group defined by the PCA of antigen reactivity. The only outlier in the clustering was a 19-year-old patient (#21) from Sudan, who reported several malaria episodes during childhood, but clustered with the malaria-naive patients.
Plasma samples were further subjected to (i) a merozoite-directed antibody-dependent respiratory burst (mADRB) assay (Kapelski et al., 2014), (ii) a parasitophorous vacuolar membrane-enclosed merozoite structure (PEMS)-specific ELISA, and (iii) a protein microarray with 228 P. falciparum antigens (Borrmann, 2020). Analysis of these serological assays in relation to the patient clustering confirmed the expected higher and broader antigen recognition by ELISA and protein microarray, and the stronger ability to induce burst of neutrophils by serum from the group of malaria-pre-exposed patients (Figure 1B–D, Supplementary file 1). Data from all the serological assays were next used for an unsupervised random forest machine learning approach to build models predictive of each individual’s protective status. A multidimensional scaling plot was used to visualize cluster allocation, confirming the classification of patient #21 as being non-immune (Figure 1E). Patient #26, positioned at the borderline to pre-exposed patients, was grouped into the naive cluster in accordance with the Luminex data and the patient statement that this potentially pre-exposed patient returned from his first trip to Africa. The calculated variable importance highlighted the relevance of mADRB, ELISA, and Luminex assays to allocate patients into clusters (Figure 1F).
Using protein microarrays, the antibody response against described antigens was analyzed in detail. As expected, pre-exposed individuals showed significantly elevated IgG antibody responses against a broad range of parasite antigens, especially typical parasite blood stage markers, including MSP1, MSP2, MSP4, MSP10, EBA175, REX1, and AMA1 (Figure 1D, upper panel). Markers for a recent infection, MSP1, MSP4, GLURP, and ETRAMP5 (van den Hoogen et al., 2018), were significantly elevated in the pre-exposed individuals in comparison to the defined first-time infected group. In addition, further members of the ETRAMP family, including ETRAMP10, ETRAMP14, ETRAMP10.2, and ETRAMP4, and also antibodies against pre-erythrocytic antigens, such as CSP, STARP, and LSA3, were highly elevated. Similar effects were detectable for IgM antibodies; previous exposure to the malaria parasite led to higher antibody levels (Figure 1D, lower panel).
Eight patients from the malaria-naive group were considered as having severe malaria based on the predefined criteria. The remaining 24 cases were assigned to the non-severe malaria group (Figure 1G, Supplementary file 2). Comparing the IgG antibody response of severe and non-severe cases within the previously malaria-naive group, elevated antibody levels were found in the severe subgroup. The highest fold change was observed for antibodies directed against intracellular proteins, such as DnaJ protein, GTPase-activating protein, or heat shock protein 70 (Figure 1—figure supplement 1). Interestingly, IgM antibodies against ETRAMP5 were detectable in the severely infected individuals, suggesting they were infected for a prolonged period of time compared to the mild malaria population (Helb et al., 2015; van den Hoogen et al., 2018).
Parasites were isolated from the venous blood of all patients for subsequent transcriptional profiling (Figure 2A). Transcriptome libraries were sequenced for all 32 patient samples (NCBI BioProject ID: PRJNA679547). The number of trimmed reads ranged between 29,142,684 and 82,000,248 (median: 41,383,289) within the individual libraries derived from patients. The proportion of total reads specific for P. falciparum was 87.7% (median; IQR 76.7–91.3) for the 30 samples included in the de novo assembly (Supplementary file 3). Variation in parasite ages – defined as the progression of the intraerythrocytic development cycle measured in hours post invasion – in the different patient samples was analyzed with a mixture model in accordance with Tonkin-Hill et al., 2018, using the published data from López-Barragán et al., 2011. Parasites from first-time infected and pre-exposed patients revealed no obvious difference in the proportion of the different parasite stages or in median age (Table 1, Figure 2—figure supplement 1). However, a small, statistically not significant bias (p=0.17) towards younger parasites in the severe cases was observed with a median age of 8.2 hours post infection (hpi) (IQR 8.0–9.8) in comparison to 9.8 hpi in the non-severe cases (IQR 8.2–11.4) (Table 1, Figure 2—figure supplement 1D). None of the samples revealed high proportions of late trophozoites (all <3%), schizonts (0%), or gametocytes (all <6%) (Figure 2—figure supplement 1A,B). The estimated proportions were used to control for differences in parasite stages between samples by including them as covariates in the regression analysis of differential core gene expression (Figure 2A, Figure 2—figure supplement 2).
Genome-wide analysis of differential gene expression
Global gene expression analysis according to Tonkin-Hill et al., 2018 identified 420 genes to be higher and 236 to be lower expressed (p≤0.05) in first-time infected patients, together corresponding to 11.3% of all P. falciparum genes (Supplementary file 4). Similarly, 362 genes were significantly higher and 219 genes lower expressed in severe cases (Supplementary file 5). A gene set enrichment analysis (GSEA) of the differentially expressed genes using Gene Ontology (GO) terms and KEGG pathway annotations showed that the KEGG pathway 03410 ‘base excision repair’ facilitating the maintenance of genome integrity by repairing small base lesions in the DNA was expressed at significantly higher levels in first-time infected patient samples (Figure 2B, Figure 2—figure supplement 3). In total, six out of 15 P. falciparum genes included in this KEGG pathway were found to be statistically significantly enriched upon first-time infection, including the putative endonuclease III (PF3D7_0614800) from the short-patch pathway and the putative A-/G-specific adenine glycosylase (PF3D7_1129500), the putative apurinic/apyrimidinic endonuclease Apn1 (PF3D7_1332600), the proliferating cell nuclear antigens 1 (PF3D7_1361900), and the catalytic (PF3D7_1017000) and small (PF3D7_0308000) subunits from the DNA polymerase delta from the long-patch pathway (Figure 2C). Additionally, a significantly lower expression level for genes associated with several GO terms involved in antigenic variation and host cell remodeling was found in first-time infected patients (Figure 2B, Supplementary file 4) and severe cases (Supplementary file 5).
As variant surface antigens like var, rif, and stevor are largely clone-specific, analysis of reads from the clinical isolates mapping to homologous regions in 3D7 genes would be distorted and flawed. Therefore, we analyzed var gene expression first by de novo assembling var genes from the RNA-seq data and subsequently analyzing the expression of specific var gene subsets according to the domains encoded. In addition, we manually screened differentially expressed genes known to be involved in var gene regulation or the correct display of PfEMP1 at the host cell surface (Supplementary file 4, 5).
Differential var gene expression
To correlate individual var genes or common var gene-encoded traits (Figure 3) with a naive immune status or disease severity, differential var transcript levels were analyzed as in Tonkin-Hill et al., 2018;Figure 2A, Figure 2—figure supplement 2. Var transcripts were assembled from each patient sample separately and annotated. In total, 6441 contigs with over 500 bp lengths were generated with an N50 of 2302 bp and a maximum length of 10,412 bp (Source data 1.). A median of 200 contigs (IQR: 137–279) with lengths >500 bp was assembled per sample. One or more DBL or CIDR domains could be annotated to 5488 of the contigs, whereas the remaining contigs could only be annotated by these smaller domain building blocks, the so-called homology blocks as defined by Rask et al., 2010; Supplementary file 6, 7.
Differential var transcript levels
We first looked for highly similar transcripts present in multiple samples. The Salmon RNA-seq quantification pipeline (Patro et al., 2017), which identifies equivalence classes allowing reads to contribute to the expression estimates of multiple transcripts, was used to estimate expression levels for each transcript. Due to the high diversity in var genes, mainly assembled transcripts of the strain-transcendent variants var1, var2csa, and var3 were found to be differentially expressed. Notably, the var1-IT variant was expressed at higher levels in parasites from first-time infected patients, whereas the var1-3D7 variant was expressed at higher levels in parasites from pre-exposed and non-severe patients (Figure 4A,B, Figure 5A,B, Supplementary file 9, 10). This was confirmed by mapping normalized reads from all patients to both var1 variants as well as var2csa (Figure 3—figure supplement 1). Beyond the conserved variants, several var fragments from B- or C-type var genes were associated with a naive immune status, and three transcripts from A-, DC8-, and B-type var genes as well as var2csa were linked to severe malaria patients (Figures 4 and 5, Supplementary file 9, 10).
Differential var domain transcript levels
To assess the differential expression of specific var domains, read counts corresponding to domains with the same classification were calculated. This showed that different EPCR-binding CIDRα1 domain variants and other domains found in DC with CIDRα1 domains were expressed at significantly higher levels in first-time infected patients (Figure 4C–E, Supplementary file 9). Specifically, besides domains from DC8 (DBLα2, CIDRα1.1, DBLβ12) and DC13 (DBLα1.7, CIDRα1.4), the CIDRα1.7 and DBLα1.2 (DC15) domains were increased upon infection of malaria-naive individuals. The DBLα1.2 domain was, in all of the 32 gene assemblies, flanked by an EPCR-binding CIDRα1 domain, and 56% of these were a CIDRα1.5 domain (Supplementary file 6, 7). In addition, parasites from first-time infected patients showed a significantly higher level of transcripts encoding the CIDRδ1 domain of DC16 (DBLα1.5/6-CIDRδ1/2) and the DBLβ6 domain (Figure 4D–E). The DBLβ6 domain is associated with A-type var genes and can be found adjacent to DC15 and DC16 (Otto et al., 2019; Supplementary file 6, 7). In general, domains associated with the same domain cassette showed the same trend even if some domains did not reach statistical significance set at p<0.05 (Supplementary file 9).
Domains found expressed at lower levels in malaria-naive individuals included group B and C N-terminal head structure domains NTSB and DBLα0.13/22/23, and CD36-binding CIDR domains (CIDRα2.8/9,6) as well as the C-terminal CIDRγ11 domain and domains of the var1-3D7 variant (DBLα1.4, DBLγ15, DBLε5) (Figure 4C–E).
When comparing severe to the non-severe cases, domains of DC8 (DBLα2, CIDRα1.1, DBLβ12), DC15 (DBLα1.2), and DC16 (DBLα1.6) and the A-type-linked DBLβ6 were found associated with severe disease. Domain types expressed at significantly higher levels in non-severe cases included N-terminal head structure domains, DBLα0.23, CIDRα2.4/9 from group B and C PfEMP1, DC16 (DBLα1.5), and the CIDRα1.3 domain from the var1-3D7 allele (Figure 5C–E, Supplementary file 10).
As DBL and CIDR subclasses are poorly defined (Otto et al., 2019) and different domain subclasses confer the same binding phenotype (Higgins and Carrington, 2014), the domains of the N-terminal head structure were grouped according to their binding phenotype and the normalized read counts (transcripts per million [TPM]) were summarized for each patient (Figures 4F and 5F). This showed significant differences for domains associated with EPCR- or CD36-binding PfEMP1. As expected, domains associated with EPCR-binding as well as the CIDRγ3 domain were expressed at higher levels in naive and more severe cases, whereas domains associated with the CD36-binding phenotype were higher expressed in pre-exposed and non-severe patients.
Differential var homology block transcript levels
PfEMP1 domains have been described as composed of 628 homology blocks (Rask et al., 2010). Homology block expression levels were obtained by aggregating read counts for each block after first identifying all occurrences of the block within the assembled contigs. Transcripts encoding block numbers 255, 584, and 614, all typically located within DBLβ domains of DC8- and CIDRα1-containing type A PfEMP1 (Figure 4G,H, Supplementary file 9), number 557, located in the interdomain region between DBLβ and DBLγ domains (no PfEMP1 type association), and block number 155, found in NTSA, were found associated with a naive immune status. Conversely, transcripts encoding block number 88 from DBLα0 domains and 269 from ATSB were found at lower levels in malaria-naive patients, indicating that B- and C-type genes are more frequently expressed in pre-exposed individuals (Figure 4G,H, Supplementary file 9). No homology blocks were associated with severe cases, but two blocks, 591 and 559, found in group B PfEMP1, were found to be lower expressed in severe malaria cases (Figure 5G,H, Supplementary file 10).
Var expression profiling by DBLα-tag sequencing
To supplement the RNA-seq analysis with an orthogonal analysis, we conducted deep sequencing of RT-PCR-amplified DBLα ESTs from 30 of the patient samples (Lavstsen et al., 2012; Figure 2A). Between 851 and 3368 reads with a median of 1666 over all samples were analyzed. Identical DBLα-tag sequences were clustered to generate relative expression levels of each unique var gene tag. Overall, the relative expression levels were similar for sequences found in both the RNA-seq and the DBLα-tag approach with a mean log2(DBLα-PCR/RNA-seq) ratio of 0.4 (CI of 95%: −2.5–3.3) determined by Bland-Altman plotting (Figure 6—figure supplement 1). Around 82.6% (median) of all unique DBLα-tag sequences detected with >10 reads (92.9% of all DBLα-tag sequences) were found in the RNA-seq approach, and 81.8% of the upper 75th percentile of RNA-seq contigs (spanning across the DBLα-tag region) were found by the DBLα-tag approach.
Using the Varia tool (Mackenzie et al., 2020), the domain composition of the var genes from which the DBLα-tag sequences originated was predicted. The tool searches an extended varDB containing >200K annotated var genes for genes with near identical DBLα-tag sequences and returns the consensus domain annotation among the hit sequences. A partial domain annotation was made for ~85% of the DBLα-tag sequences (Figure 2A, Supplementary file 11). In line with the RNA-seq data, this analysis showed that DBLα1 and DBLα2 sequences were enriched in first-time infected and severe malaria patients. Conversely, a significant higher proportion of DBLα0 sequences was found in pre-exposed individuals and mild cases (Figure 6A,B). No difference was observed in the number of reads or unique DBLα-tags detected between patient groups, although a trend towards more DBLα-tag clusters was observed in first-time infected patients and severe cases (Figure 6A,B). Prediction of the NTS and CIDR domains flanking the DBLα domain showed a significantly higher proportion of NTSA in severe cases as well as EPCR-binding CIDRα1 domains in first-time infected and severe cases. Expression of var genes encoding NTSB and CIDRα2–6 domains was significantly associated within pre-exposed and non-severe cases (Figure 6A,B). Subsequent analysis of var expression in relation to other domains showed that var transcripts with DBLβ, γ, and ζ or CIDRγ domains were more frequently expressed in first-time infected and severe malaria patients whereas those encoding DBLδ and CIDRβ were less frequent (Figure 6—figure supplement 2). Assessing expression in relation to the domain subtype, CIDRα1.1/5, DBLβ12, DBLγ2/12, DBLα2, DBLα1.2/2, and DBLδ5 (together with DBLγ12 components of the DC5) were found associated with severe malaria, while CIDRα3.1/4, DBLα0.12/16, and DBLδ1 were associated with non-severe cases (Figure 6—figure supplement 2).
Overall, these data corroborate the main observations from the RNA-seq analysis, confirming the association of EPCR-binding PfEMP1 variants with development of severe malaria symptoms and CD36-binding PfEMP1 variants with establishment of less severe infections in semi-immune individuals.
Correlation of var gene expression with antibody levels against head structure CIDR domains
A detailed analysis of the antibody repertoire of the patients against head structure CIDR domains of PfEMP1 was carried out using a panel of 19 different EPCR-binding CIDRα1 domains, 12 CD36-binding CIDRα2–6 domains, three CIDRδ1 domains as well as a single CIDRγ3 domain (Obeng-Adjei et al., 2020; Bachmann et al., 2019). Additionally, the minimal binding region of VAR2CSA was included. In general, plasma samples from malaria-naive as well as severe cases showed lower mean fluorescence intensity (MFI) values for all antigens tested in comparison to samples from pre-exposed or non-severe cases with significant differences for CIDRα2–6, CIDRδ1, and CIDRγ3, but not for EPCR-binding CIDRα1 domains (Figure 7A,B).
Data were also analyzed using the average MFI reactivity (plus two standard deviations) of a Danish control cohort as a cut off for seropositivity to calculate the coverage of antigen recognition (Supplementary file 1; Cham et al., 2010). In this analysis, almost half of the tested antigens were recognized by pre-exposed (median: 47.2%) and non-severe patients (median: 44.4%), but only 1/4 of the antigens were recognized by first-time infected patients (median: 25.0%) and 1/20 by severely ill patients (median: 5.6%). PfEMP1 antigens recognized by over 60% of the pre-exposed and/or non-severe patient sera were (i) four CIDRα1 domains capable of EPCR-binding (CIDRα1.5, CIDRα1.6, CIDRα1.7, and the DC8 domain CIDRα1.8), (ii) two CD36-binding CIDRα domains (CIDRα2.10, CIDRα3.1), and (iii) two CIDR domains with unknown binding phenotypes (CIDRδ1 and CIDRγ3) (Figure 7C, Supplementary file 1).
Taken together, this analysis indicates that higher levels of antibodies against severe malaria-associated EPCR-binding variants are present in pre-exposed and non-severe cases, which might have selected against parasites expressing CIDRα1 domains during the current infection.
Non-immune travellers and adults from areas of unstable malaria transmission are prone to severe malaria. Currently, only scarce information on the PfEMP1-mediated pathogenicity responsible for the different symptomatology in comparison to paediatric severe malaria and the higher fatality rate in adults is available. Here, we present the first in-depth gene expression analysis of 32 ex vivo blood samples from adult travellers using RNA-seq and expressed sequence tag analyses. Despite the relatively low number of patient samples recruited in 5 years, our data confirmed previously reported associations between transcripts encoding type A and B EPCR-binding PfEMP1 and infections in naive hosts and disease severity (Duffy et al., 2019; Tonkin-Hill et al., 2018; Kessler et al., 2017; Bernabeu et al., 2016; Jespersen et al., 2016; Lavstsen et al., 2012). Our results further suggest that parasite interaction with EPCR is linked to severe disease in children as well as in adults. However, since CIDRα1-containing PfEMP1 possess multiple binding traits (Lennartz et al., 2017; Magallón-Tejada et al., 2016), co-interaction with other receptors may further increase the risk for severe malaria.
Overall, there was a high degree of consensus between observations made on var expression analyzed at different levels of PfEMP1 domain annotation. Stratifying var gene expression according to the different main and subtypes of DBL and CIDR domains showed only A- and DC8-type PfEMP1 domains, and predominantly those linked to EPCR-binding PfEMP1, to be associated with first-time infections. Conversely, domains typical for CD36-binding PfEMP1 proteins were found at higher levels in malaria-experienced adults. Specifically, expression of PfEMP1 domains included in DC8, DC13, and DC15 as well as all EPCR-binding CIDRα1 domains was associated with first-time adult infections, whereas DBLα0- and CD36-binding CIDRα2–6 domains were linked to pre-exposed individuals. These differences were largely due to the differential expression between the first-time infected patients with more severe symptoms and patients with non-severe malaria. Here, domains of DC8 and DC15 as well as all DBLα1/2 and CIDRα1were associated with severe symptoms, while NTSB, DBLα0, and CIDRα2–6 domains, including specific subsets of CIDRα2, were linked to non-severe symptoms. These conclusions were closely mirrored in the DBLα-tag analysis and were further corroborated by the differential RNA-seq expression stratified according to the smaller homology blocks, which identified mainly homology blocks of DBLβ1, -3, -5, and -12 to be associated with first-time infected patients. These DBLβ domains are parts of DCs associated with EPCR binding; so it is hard to distinguish between co-occurring domains and clear associations.
In addition, three other group A PfEMP1-associated domains, CIDRγ3, CIDRδ from DC16, DBLβ9 from DC5, and DBLβ6 were found to be associated with first-time infected patients. DBLβ9 and DBLβ6 could have been detected due to their presence C-terminally to some EPCR-binding or A-type PfEMP1. However, the CIDRδ domain of DC16 (DBLα1.5/6-CIDRδ1/2) constitutes a different subset of A-type PfEMP1, which, together with CIDRβ2- and CIDRγ3-containing group A PfEMP1 (found in DC11), may be associated with rosetting (Carlson et al., 1990; Ghumra et al., 2012). Direct evidence that any of these CIDR domains have intrinsic rosetting properties is lacking (Rowe et al., 2002). Rather, their association with rosetting may be related to their tandem expression with DBLα1 at the N-terminal head (Ghumra et al., 2012). The DC16 group A signature was not associated with severe disease outcomes in previous DBLα-tag studies or qPCR studies by Lavstsen et al., 2012 and Bernabeu et al., 2016, but DBLα1.5/6 and CIDRδ of DC16 were enriched in cerebral malaria cases with retinopathy in the study of Shabani et al., 2017 and Kessler et al., 2017 using the same qPCR primer set. Also, the association of DC11 with severe malaria in Indonesia was found using the same RNA-seq approach as used here (Tonkin-Hill et al., 2018). Rosetting is thought to enhance microvascular obstruction, but its role in severe malaria pathogenesis remains unclear (McQuaid and Rowe, 2020). However, together with previous observations, our data suggest that paediatric cerebral malaria infections are dominated by the expansion of parasites expressing EPCR-binding domains accompanied by parasites expressing other group A PfEMP1, possibly acting as rosetting variants. This was consistent with the more frequent recognition of CIDRα1 domains than CIDRα2–6 domains in pre-exposed patients, indicating that IgG against EPCR-binding CIDR domains were acquired before IgG to other CIDR domains, as has been observed for malaria endemic populations (Obeng-Adjei et al., 2020; Cham et al., 2009; Cham et al., 2010; Turner et al., 2015).
Data from adult cohorts are rather limited, restricting our comparison mainly to three var gene expression studies based on qPCR or a custom cross-strain microarray (Argy et al., 2017; Bernabeu et al., 2016; Subudhi et al., 2015), but the majority of cases from the Indonesian RNA-seq study are also adults (Tonkin-Hill et al., 2018). A high expression of A- and B-type var genes and an association of DC4, -8, and -13 with disease severity has been reported for malaria cases imported to France (Argy et al., 2017), and parasites from severe Indian adults show an elevated expression of DC6, -8 (Bernabeu et al., 2016), and DC13 (Subudhi et al., 2015). In the severe cases from Indonesia, mainly the DCs 4, 8, and 11 were found on an elevated level (Tonkin-Hill et al., 2018). All studies are in agreement with the expression data from our cohort of adult travellers, although we here in addition found a higher expression of DC15 (EPCR binding) and DC16 (putative rosetting variants) var genes in malaria-naive patients.
Measurements of the total abundances of the different var groups are challenging (e.g., due differences in the coverage or sensitivity of qPCR or DBLα-tag primer pairs targeting the different var groups), but there is cumulative evidence that CIDRα1-containing transcripts are dominating the infection in children with severe malaria, severe anemia, and cerebral malaria, and transcripts with CIDRα2–6 domains are most abundantly expressed during uncomplicated malaria (Jespersen et al., 2016; Duffy et al., 2019; Warimwe et al., 2009; Warimwe et al., 2012). Although the median expression of CIDRα2–6 is lower in first-time infected and severe cases compared to pre-exposed and non-severe cases, in most of our adult patients, CD36-binding var transcripts appear to dominate the expression pattern. This is in concordance with all three other adult studies also indicating a substantial expression of B- and C-type variants associated with the binding of CD36 (Argy et al., 2017; Bernabeu et al., 2016; Subudhi et al., 2015), and (Subudhi et al., 2015) even showed an association with complicated adult malaria. Maybe parasite binding to CD36 is specifically enhanced in adult severe malaria cases compared to children, which is interesting due to their different disease symptomatology (Dondorp et al., 2008; Schwartz et al., 2001). Alternatively, our adult cohort differs not only in age but also in terms of disease severity from paediatric cohorts, and less sick patients may simply have a less dominant expression of EPCR-binding variants. However, for the parasite’s survival and transmission, it may be highly beneficial to express more of the less virulent PfEMP1 variants that are able to bind CD36. This interaction may not, or is less likely to, result in obstruction of blood flow, inflammation, and organ failure at least of the brain, where CD36 is nearly absent (Turner et al., 1994).
To the best of our knowledge, this study is the first description of expression differences between the two var1 variants, 3D7 and IT. The var1-IT variant was found enriched in parasites from first-time infected patients, whereas several transcripts of the var1-3D7 variant were increased in pre-exposed and non-severely ill patients. Expression of var1 gene was previously observed to be elevated in malaria cases imported to France with an uncomplicated disease phenotype (Argy et al., 2017). In general, the var1 subfamily is ubiquitously transcribed (Winter et al., 2003; Duffy et al., 2006), atypically late in the cell cycle after transcription of var genes encoding the adhesion phenotype (Kyes et al., 2003; Duffy et al., 2002; Dahlbäck et al., 2007) and is annotated as a pseudogene in 3D7 due to a premature stop codon. Similarly, numerous isolates display frame-shift mutations often in exon two in the full gene sequences (Rask et al., 2010). However, none of these studies addressed the differences in the two var1 variants that were recently identified by comparing var gene sequences from 714 P. falciparum genomes (Otto et al., 2019), and to date, it is still unclear if both variants fulfill the same function or have the same characteristics as previously described. Overall, the var1 gene – the first 3.2 kb of the 3D7 variant in particular – seems to be under high evolutionary pressure (Otto et al., 2019), and both variants can be traced back before the split of P. reichenowi from P. praefalciparum and P. falciparum (Otto et al., 2018b). Our data indicate that the two variants, VAR1-3D7 and VAR1-IT, may have different roles during disease; however, this remains to be determined in future studies.
In summary, our data show a significant increase in transcripts encoding EPCR-binding and other A-type variants in parasites from severe and first-time infected patients. Conversely, transcripts of CD36-binding variants are found more frequently in parasites from non-severe and pre-exposed patients. Since CD36-binding variants are still overrepresented in all groups of adult malaria patients, we postulate that the parasite population in first-time infected individuals may have a broad binding potential after liver release as there is no pre-existing immunity to clear previously experienced PfEMP1 variants. During the blood-stage infection, selection towards EPCR binding and other A-type variants, which may confer a parasite growth advantage and also increase the risk for severe malaria, may already have occurred in our adult severe malaria patients as indicated by the longer period of infection.
Materials and methods
Blood sampling and processingRequest a detailed protocol
EDTA blood samples (1–30 mL) were obtained from the adult patients. The plasma was separated by centrifugation and immediately stored at −20°C. Erythrocytes were isolated by Ficoll gradient centrifugation followed by filtration through Plasmodipur filters (EuroProxima) to clear the remaining granulocytes. An aliquot of erythrocytes (about 50–100 µl) was separated and further processed for gDNA purification. At least 400 µl of purified erythrocytes were rapidly lysed in five volumes of pre-warmed TRIzol (ThermoFisher Scientific) and stored at −80°C until further processing.
Luminex assayRequest a detailed protocol
The Luminex assay was conducted as previously described using the same plex of antigens tested (Bachmann et al., 2019). In brief, plasma samples from patients were screened for the individual recognition of 19 different CIDRα1, 12 CIDRα2–6, three CIDRδ1 domains, and a single CIDRγ3 domain as well as of the controls AMA1, MSP1, CSP, VAR2CSA (VAR2), tetanus toxin (TetTox), and bovine serum albumin (BSA). The data are shown as MFI, allowing comparison between different plasma samples, but not between different antigens. Alternatively, the breadth of antibody recognition (%) was calculated using MFI values from Danish controls plus two standard deviations (SD) as cut off.
Merozoite-triggered antibody-dependent respiratory burstRequest a detailed protocol
The assay to determine the mADRB activity of the patients was set up as described before (Kapelski et al., 2014). Polymorphonuclear neutrophil granulocytes (PMNs) from one healthy volunteer were isolated by a combination of dextran-sedimentation and Ficoll-gradient centrifugation. Meanwhile, 1.25 × 106 merozoites were incubated with 50 µl of 1:5 diluted plasma (decomplemented) from adult patients as well as from established negative and positive control donors for 2 hr. The opsonized merozoites were pelleted (20 min, 1500 g), re-suspended in 25 µl Hanks' Balanced Salt Solution (HBSS), and then transferred to a previously blocked well of an opaque 96-well high-binding plate (Greiner Bio-One). Chemiluminescence was detected in HBSS using 83.3 µM luminol and 1.5 × 105 PMNs at 37°C for 1 hr to characterize the PMN response, with readings taken at 2-min intervals using a multiplate reader (CLARIOstar, BMG Labtech). PMNs were added in the dark, immediately before readings were initiated.
ELISARequest a detailed protocol
Antibody reactivity against PEMS was estimated by ELISA. PEMS were isolated as described before (Llewellyn et al., 2015). For ELISA, 0.625 × 105 PEMS were coated on the ELISA plates in PBS. Plates were blocked using 1% Casein (Thermo Scientific #37528) and incubated for 2 hr at 37°C. After washing using phosphate-buffered saline (PBS)/0.1% Tween, plasma samples from patients and control donors were added at two-fold dilutions of 1:200 to 1:12,800 in PBS/0.1% casein. The samples were incubated for 2 hr at room temperature (RT). IgG was quantified using HRP-conjugated goat anti-human IgG at a dilution of 1:20,000 and incubated for 1 hr. For the colour reaction, 50 µl of TMB substrate was used and stopped by adding 1 M HCl after 20 min. Absorbance was quantified at 450 nm using a multiplate reader (CLARIOstar, BMG Labtech).
Protein microarrayRequest a detailed protocol
Microarrays were produced at the University of California Irvine, Irvine, California, USA (Doolan et al., 2008). In total, 262 P. falciparum proteins representing 228 unique antigens were expressed using an Escherichia coli (E. coli) lysate in vitro expression system and spotted on a 16-pad ONCYTE AVID slide. The selected P. falciparum antigens are known to frequently provide a positive signal when tested with sera from individuals with sterile and naturally acquired immunity against the parasite (Obiero et al., 2019; Dent et al., 2016; Doolan et al., 2008; Felgner et al., 2013). For the detection of binding antibodies, secondary IgG antibody (goat anti-human IgG QDot800; Grace Bio-Labs #110635), secondary IgM antibody (biotin-SP-conjugated goat anti-human IgM; Jackson ImmunoResearch #109-065-043), and Qdot585 Streptavidin Conjugate (Invitrogen #Q10111MP) were used (Taghavian et al., 2018).
Study serum samples as well as the positive and European control sera were diluted 1:50 in 0.05X Super G Blocking Buffer (Grace Bio-Labs, Inc) containing 10% E. coli lysate (GenScript, Piscataway, NJ) and incubated for 30 min on a shaker at RT. Meanwhile, microarray slides were rehydrated using 0.05X Super G Blocking buffer at RT. Rehydration buffer was subsequently removed and samples added onto the slides. Arrays were incubated overnight at 4°C on a shaker (180 rpm). Serum samples were removed the following day and microarrays were washed using 1X TBST buffer (Grace Bio-Labs, Inc). Secondary antibodies were then applied at a dilution of 1:200 and incubated for 2 hr at RT on the shaker, followed by another washing step and a 1 hr incubation in a 1:250 dilution of Qdot585 Streptavidin Conjugate. After a final washing step, slides were dried by centrifugation at 500 g for 10 min. Slide images were taken using the ArrayCAM Imaging System (Grace Bio-Labs) and the ArrayCAM 400 s Microarray Imager Software.
Microarray data were analyzed in R statistical software package version 3.6.2. All images were manually checked for any noise signal. Each antigen spot signal was corrected for local background reactivity by applying a normal-exponential convolution model (McGee and Chen, 2006) using the RMA-75 algorithm for parameter estimation (available in the LIMMA package v3.28.14) (Silver et al., 2009). Data was log2-transformed and further normalized by subtraction of the median signal intensity of mock expression spots on the particular array to correct for background activity of antibodies binding to E. coli lysate. After log2 transformation, data approached normal distribution. Differential antibody levels (protein array signal) in the different patient groups were determined by Welch-corrected Student’s t-test. Antigens with p<0.05 and a fold change >2 of mean signal intensities were defined as differentially recognized between the tested sample groups. Volcano plots were generated using the PAA package (Turewicz et al., 2016) and GraphPad Prism 8. Individual antibody breadths were defined as the number of seropositive features with signal intensities exceeding an antigen-specific threshold set at six standard deviations above the mean intensity in negative control samples.
Unsupervised random forest modelRequest a detailed protocol
An unsupervised random forest (RF) model, a machine learning method based on multiple classification and regression trees, was calculated to estimate proximity between patients. Variable importance was calculated, which shows the decrease in prediction accuracy if values of a variable are permuted randomly. The k-medoids clustering method was applied on the proximity matrix to group patients according to their serological profile. Input data for random forest were Luminex measurements for MSP1, AMA1, and CSP reduced by principal component analysis (PCA; first principal component selected), mADRB, and ELISA, and antibody breadths of IgG and IgM determined by protein microarray were used to fit the RF model. Multidimensional scaling was used to display patient cluster. All analyses were done with R (4.02) using the packages randomForest (4.6–14) to run RF models and cluster (2.1.0) for k-medoids clustering.
Patient classification according to severityRequest a detailed protocol
Severity was defined in line with the WHO criteria for severe malaria in adults (World Health Organization (WHO), 2014). Patients were considered as having severe malaria if they showed signs of impaired organ function (e.g., jaundice, renal failure, and cerebral manifestations) or had extremely high parasitaemia (>10%). In addition, patients #1 and #26 were included into the severe group due to circulating schizonts indicative of a very high sequestering parasite biomass associated with severity (Bernabeu et al., 2016; Supplementary file 2).
DNA purification and MSP1 genotypingRequest a detailed protocol
Genomic DNA was isolated using the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer’s protocol. To assess the number of P. falciparum genotypes present in the patient isolates, MSP1 genotyping was carried out as described elsewhere (Robert et al., 1996).
RNA extraction, RNA-seq library preparation, and sequencingRequest a detailed protocol
TRIzol samples were thawed, mixed rigorously with 0.2 volumes of cold chloroform,and incubated for 3 min at RT. After centrifugation for 30 min at 4°C and the maximum speed, the supernatant was carefully transferred to a new tube and mixed with an equal volume of 70% ethanol. Subsequently, the manufacturer’s instructions from the RNeasy MinElute Kit (Qiagen) were followed with DNase digestion (DNase I, Qiagen) for 30 min on column. Elution of the RNA was carried out in 14 µl. Human globin mRNA was depleted from all samples except from samples #1 and #2 using the GLOBINclear kit (ThermoFisher Scientific). The quality of the RNA was assessed using the Agilent 6000 Pico kit with Bioanalyzer 2100 (Agilent), and the RNA quantity was assessed using the Qubit RNA HA assay kit and a Qubit 3.0 fluorometer (ThermoFisher Scientific). Upon arrival at BGI Genomics Co (Hong Kong), the RNA quality of each sample was double-checked before sequencing. The median RIN value over all ex vivo samples was 6.75 (IQR: 5.93–7.40) (Figure 2—figure supplement 4), although this measurement has only limited significance for samples containing RNA of two species. Customized library construction in accordance with Tonkin-Hill et al., 2018, including amplification with KAPA polymerase and HiSeq 2500 100 bp paired-end sequencing, was also performed by BGI Genomics Co (Hong Kong).
RNA-seq read mapping and data analysis
Differential expression of core genesRequest a detailed protocol
Differential gene expression analysis of P. falciparum core genes was done in accordance with Tonkin-Hill et al., 2018), using the scripts available in the GitHub repository (https://github.com/gtonkinhill/falciparum_transcriptome_manuscript/tree/master/all_gene_analysis). In brief, subread-align v1.4.6 (Liao et al., 2013) was used to align the reads to the Homo sapiens (H. sapiens) and P. falciparum reference genomes. Read counts for each gene were obtained with FeatureCounts v1.20.2 (Liao et al., 2014). To account for parasite life cycle, each sample was considered as a composition of six parasite life cycle stages, excluding the ookinete stage (López-Barragán et al., 2011). Unwanted variations were determined with the 'RUV' (Remove Unwanted Variation) algorithm implemented in the R package ruv v0.9.6 (Gagnon-Bartsch and Speed, 2012) adjusting for systematic errors of unknown origin by using the genes with the 1009 lowest p-values as controls as described in Vignali et al., 2011. The gene counts and estimated ring-stage factor, and factors of unwanted variation were then used as inputs for the Limma/Voom (Law et al., 2014; Smyth, 2005) differential analysis pipeline.
Functional enrichment analysis of differentially expressed core genesRequest a detailed protocol
Genes that were identified as significantly differentially expressed (defined as −1 < logFC > 1, p<0.05) during prior differential gene expression analysis were used for functional enrichment analysis using the R package gprofiler2 (Kolberg et al., 2020). Enrichment analysis was performed on multiple input lists containing genes expressed significantly higher (logFC >1, p<0.05) and lower (logFC <-1, p<0.05) between different patient cohorts. All var genes were excluded from the enrichment analysis. For custom visualization of results, gene set data sources available for P. falciparum were downloaded from gprofiler (Raudvere et al., 2019). Pathway data available in the KEGG database (https://www.kegg.jp/kegg/) was accessed via the KEGG API using KEGGREST (Tenenbaum, 2020) to supplement gprofiler data sources and build a custom data source in Gene Matrix Transposed file format (*.gmt) for subsequent visualization. Functional enrichment results were then output to a Generic Enrichment Map (GEM) for visualization using the Cytoscape EnrichmentMap app (Merico et al., 2010) and RCy3 (Gustavsen et al., 2019). Bar plots of differential gene expression values for genes of selected KEGG pathways were generated using ggplot2 (Wickham, 2016) and enriched KEGG pathways were visualized using KEGGprofile (Zhao et al., 2020).
Var gene assemblyRequest a detailed protocol
Samples from patient #1 and #2 not subjected to globin-mRNA depletion due to their low RNA content after multiple rounds of DNase treatment showed low percentages of P. falciparum-specific reads (12.4% and 15.68%) (Supplementary file 2). Consequently, less than one million P. falciparum reads were obtained for each of these samples, and they were omitted from assembly due to low coverage, but included in the differential gene expression analysis.
Var genes were assembled using the pipeline described in Tonkin-Hill et al., 2018. The separate assembly approach was chosen since it reduces the risk for generating false chimeric genes and results in longer contigs compared to the combined all sample assembly approach. Briefly, non-var reads were first filtered out by removing reads that aligned to H. sapiens, P. vivax, or non-var P. falciparum. Assembly of the remaining reads was then performed using a pipeline combining SOAPdenovo-Trans and Cap3 (Xie et al., 2014; Huang and Madan, 1999; Liao et al., 2013). Finally, contaminants were removed from the resulting contigs and they were then translated into the correct reading frame. Reads were mapped to the contigs using BWA-MEM (Li, 2013) and RPKM values were calculated for each var transcript to compare individual transcript levels in each patient. Although transcripts might be differentially covered by RNA-seq due to their variable GC content, this seems not to be an issue between var genes (Tonkin-Hill et al., 2018).
Var transcript differential expressionRequest a detailed protocol
Expression of the assembled var genes was quantified using Salmon v0.14.1 (Patro et al., 2017) for 531 transcripts with five read counts in at least three patient isolates. Both the naive and pre-exposed groups as well as the severe and non-severe groups were compared. The combined set of all de novo assembled transcripts was used as a reference. As the RNA-seq reads from each sample were assembled independently it is possible for a highly similar transcript to be present multiple times in the combined set of transcripts from all samples. The Salmon algorithm identifies equivalence sets between transcripts, allowing a single read to support the expression of multiple transcripts. As a result, Salmon accounts for the redundancy present in our whole set of var gene contigs from all separate sample-specific assemblies. To confirm the suitability of this approach, we also ran the Corset algorithm as used in Tonkin-Hill et al., 2018 and Davidson and Oshlack, 2014. Unlike Salmon, which attempts to quantify the expression of transcripts themselves, Corset copes with the redundancy present in de novo transcriptome assemblies by clustering similar transcripts together using both the sequence identity of the transcripts as well as multi-mapping read alignments. Of the transcripts identified using the Salmon analysis, 5/15 in the naive versus pre-exposed and 4/13 in the severe versus non-severe were identified in the significant clusters produced using Corset. As the two algorithms take very different approaches and as Salmon is quantifying transcripts rather than the ‘gene’ like clusters of Corset, this represents a fairly reasonable level of concordance between the two methods. However, due to the high diversity in var genes, both of these approaches are only able to identify significant associations between transcripts and phenotypes when there is sufficient similarity within the associated sequences. In both the Salmon and Corset pipelines, differential expression analysis of the resulting var expression values was performed using DESeq2 v1.26 (Love et al., 2014). The Benjamini-Hochberg method was used to control for multiple testing (Benjamini and Hochberg, 1995).
To check the differential expression of the conserved var gene variants var1-3D7, var1-IT, and var2csa, raw reads were mapped with BWA-MEM (AS score >110) to the reference genes from the 3D7 and the IT strains. The mapped raw read counts (bam files) were normalized with the number of 3D7 mappable reads in each isolate using bamCoverage by introducing a scaling factor to generate bigwig files displayed in Artemis (Carver et al., 2012).
Var domain and homology block differential expressionRequest a detailed protocol
Differential expression analysis at the domain and homology level was performed using a similar approach to that described previously (Tonkin-Hill et al., 2018). Initially, the domain families and homology blocks defined in Rask et al. were annotated to the assembled transcripts using HMMER v3.1b2 (Rask et al., 2010; Eddy, 2011). Domains and homology blocks previously identified to be significantly associated with severe disease in Tonkin-Hill et al., 2018) were also annotated by single pairwise comparison in the assembled transcripts using USEARCH v11.0.667 (Tonkin-Hill et al., 2018; Edgar, 2010). Overall, 336 contigs (5.22% of all contigs > 500 bp) possess partial domains in an unusual order, for example, an NTS in an internal region or a tandem arrangement of two DBLα or CIDRα domains. This might be caused by de novo assembly errors, which is challenging from transcriptome data. Therefore, in both cases, the domain or homology block with the most significant alignment was taken as the best annotation for each region of the assembled var transcripts (E-value cut off of 1e-8). The expression at each of these annotations was then quantified using featureCounts v1.6.4 before the counts were aggregated to give a total for each domain and homology block family in each sample. Finally, similar to the transcript level analysis, DESeq2 was used to test for differences in expression levels of both domain and homology block families in the naive versus pre-exposed groups as well as the severe versus non-severe groups. Again, more than five read counts in at least three patient isolates were required for inclusion into differential expression analysis.
DBLα-tag sequencingRequest a detailed protocol
For DBLα-tag PCR, the forward primer varF_dg2 (5’-tcgtcggcagcgtcagatgtgtataagagacagGCAMGMAGTTTYGCNGATATWGG-3’) and the reverse primer brlong2 (5’-gtctcgtgggctcggagatgtgtataagagacagTCTTCDSYCCATTCVTCRAACCA-3’) were used resulting in an amplicon size of 350–500 bp (median 422 bp) plus the 67 bp overhang (small type). Template cDNA (1 µl) was mixed with 5x KAPA HiFi buffer, 0.3 µM of each dNTP, 2 µM of each primer and 0.5 U KAPA HiFi Hotstart Polymerase in a final reaction volume of 25 µl. Reaction mixtures were incubated at 95°C for 2 min and then subjected to 35 cycles of 98°C for 20 s, 54°C for 30 s, and 68°C for 75 s with a final elongation step at 72°C for 2 min. For the first five cycles, cooling from denaturation temperature was performed to 65°C at a maximal ramp of 3°C per second, then cooled to 54°C with a 0.5°C per second ramp. Heating from annealing temperature to elongation temperature was performed with 1°C per second, all other steps with a ramp of 3°C per second. Agarose gel images taken afterwards showed clean amplicons. The DBLα-tag primers contain an overhang, which was used to conduct a second indexing PCR reaction using sample-specific indexing primers as described in Nag et al., 2017. The overhang sequence also serves as annealing site for Illumina sequencing primers, and indexing primers include individual 8-base combinations and adapter sequences that will allow the final PCR product to bind in MiSeq Illumina sequencing flow cells. Indexing PCR reactions were performed with a final primer concentration of 0.065 μM and 1 μl of first PCR amplicon in a final volume of 20 μl using the following steps: heat activation at 95°C, 15 min, 20 cycles of 95°C for 20 s, 60°C for 1 min and 72°C for 1 min, and one final elongation step at 72°C for 10 min. Indexing PCR amplicons were pooled (4 µl of each) and purified using AMPure XP beads (Beckman Coulter, California, United States) according to the manufacturer’s protocol, using 200 µl pooled PCR product and 0.6 x PCR-pool volume of beads, to eliminate primer dimers. The purified PCR pools were analyzed on agarose gels and Agilent 2100 Bioanalyser to verify the elimination of primer dimers and correct amplicon sizes. Concentration of purified PCR pools was measured by Nanodrop2000 (Thermo Fisher Scientific, Waltham, MA, USA), and an aliquot adjusted to 4 nM concentration was pooled with another unrelated DNA material and added to an Illumina MiSeq instrument for paired-end 300 bp reads using a MiSeq v3 flow cell.
DBLα-tag sequence analysisRequest a detailed protocol
The paired-end DBLα-tag sequences were identified and partitioned into the correct sample origin based on unique index sequences. Each indexed raw sequence-pair was then processed through the Galaxy webtool (usegalaxy.eu). Read quality checks were first performed with FastQC to ensure a good NGS run (sufficient base quality, read length, duplication etc.). Next, the sequences were trimmed by the Trimmomatic application, with a four-base sliding window approach and a Phred quality score above 20 to ensure high sequence quality output. The trimmed sequences were then paired and converted, following analysis using the Varia tool for quantification and prediction of the domain composition of the full-length var sequences from which the DBLα-tag originated (Mackenzie et al., 2020). In brief, Varia clusters DBLα-tags with 99% sequence identity using Vsearch program (v2.14.2), and each unique tag is used to search a database consisting of roughly 235,000 annotated var genes for near identical var sequences (95% identity over 200 nucleotides). The domain composition of all ‘hit’ sequences is checked for conflicting annotations, and the most likely domain composition is retuned. The tool validation indicated prediction of correct domain compositions for around 85% of randomly selected var tags, with a higher hit rate and accuracy of the N-terminal domains. An average of 2,223.70 reads per patient sample was obtained, and clusters consisting of less than 10 reads were excluded from the analysis. The raw Varia output file is given in Supplementary file 10. The proportion of transcripts encoding a given PfEMP1 domain type or subtype was calculated for each patient. These expression levels were used to first test the hypothesis that N-terminal domain types associated with EPCR are found more frequently in first-time infections or upon severity of disease, while those associated with non-EPCR binding were associated with pre-exposed or mild cases. Secondly, quantile regression was used to calculate median differences (with 95% confidence intervals) in expression levels for all main domain classes and subtypes between severity and exposure groups. All analyses were done with R (4.02) using the package quantreg (5.73) for quantile regression.
For the comparison of both approaches, DBLα-tag sequencing and RNA-seq, only RNA-seq contigs spanning the whole DBLα-tag region were considered. All conserved variants, the subfamilies var1, var2csa, and var3, detected by RNA-seq, were omitted from analysis since they were not properly amplified by the DBLα-tag primers. To scan for the occurrence of DBLα-tag sequences within the contigs assembled from the RNA-seq data, we applied BLAST (basic local alignment search tool) v2.9.0 software (Altschul et al., 1990). Therefore, we created a BLAST database from the RNA-seq assemblies and screened for the occurrence of those DBLα-tag sequence with more than 97% percent sequence identity using the ‘megablast option’.
Calculation of the proportion of RNA-seq data covered by DBLα-tag was done with the upper 75th percentile based on total RPKM values determined for each patient. Vice versa, only DBLα-tag clusters with more than 10 reads were considered and percent coverage of reads and clusters calculated for each individual patient.
For all samples, the agreement between the two molecular methods DBLα-tag sequencing and RNA-seq was analyzed with a Bland-Altman plot, each individually, and summarized. The ratios between %-transformed measurements are plotted on the y-axis and the mean of the respective DBLα-tag and RNA-seq results is plotted on the x-axis. The bias and the 95% limits of agreement were calculated using GraphPad Prism 8.4.2.
Sequencing data have been deposited at NCBI under the BioProject accession number PRJNA679547.
NCBI BioProjectID PRJNA679547. Data from: Common virulence gene expression in adult first-time infected patients and severe cases.
Basic local alignment search toolJournal of Molecular Biology 215:403–410.https://doi.org/10.1016/S0022-2836(05)80360-2
Controlling the false discovery rate: a practical and powerful approach to multiple testingJournal of the Royal Statistical Society: Series B 57:289–300.https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
Sequential, ordered acquisition of antibodies to plasmodium falciparum erythrocyte membrane protein 1 domainsThe Journal of Immunology 183:3356–3363.https://doi.org/10.4049/jimmunol.0901331
Contrasting patterns of serologic and functional antibody dynamics to plasmodium falciparum antigens in a kenyan birth cohortClinical and Vaccine Immunology 23:104–116.https://doi.org/10.1128/CVI.00452-15
The relationship between age and the manifestations of and mortality associated with severe malariaClinical Infectious Diseases 47:151–157.https://doi.org/10.1086/589287
Transcribed var genes associated with placental malaria in malawian womenInfection and Immunity 74:4875–4883.https://doi.org/10.1128/IAI.01978-05
Accelerated profile HMM searchesPLOS Computational Biology 7:1002195.https://doi.org/10.1371/journal.pcbi.1002195
Search and clustering orders of magnitude faster than BLASTBioinformatics 26:2460–2461.https://doi.org/10.1093/bioinformatics/btq461
CAP3: a DNA sequence assembly programGenome Research 9:868–877.https://doi.org/10.1101/gr.9.9.868
Assessment of the neutrophilic antibody-dependent respiratory burst (ADRB) response to plasmodium falciparumJournal of Leukocyte Biology 96:1131–1142.https://doi.org/10.1189/jlb.4A0614-283RR
Linking EPCR-Binding PfEMP1 to brain swelling in pediatric cerebral malariaCell Host & Microbe 22:601–614.https://doi.org/10.1016/j.chom.2017.09.009
The subread aligner: fast, accurate and scalable read mapping by seed-and-voteNucleic Acids Research 41:e108.https://doi.org/10.1093/nar/gkt214
Parameter estimation for the exponential-normal convolution model for background correction of affymetrix GeneChip dataStatistical Applications in Genetics and Molecular Biology 5:Article24.https://doi.org/10.2202/1544-6115.1237
Plasmodium falciparum erythrocyte membrane protein 1 diversity in seven genomes--divide and conquerPLOS Computational Biology 6:e1000933.https://doi.org/10.1371/journal.pcbi.1000933
G:profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update)Nucleic Acids Research 47:W191–W198.https://doi.org/10.1093/nar/gkz369
Extensive genetic diversity of plasmodium falciparum isolates collected from patients with severe malaria in Dakar, SenegalTransactions of the Royal Society of Tropical Medicine and Hygiene 90:704–711.https://doi.org/10.1016/S0035-9203(96)90446-0
Short report: positive correlation between rosetting and parasitemia in plasmodium falciparum clinical isolatesThe American Journal of Tropical Medicine and Hygiene 66:458–460.https://doi.org/10.4269/ajtmh.2002.66.458
Adhesion of plasmodium falciparum-infected erythrocytes to human cells: molecular mechanisms and therapeutic implicationsExpert Reviews in Molecular Medicine 11:e16.https://doi.org/10.1017/S1462399409001082
Evidence for the involvement of VAR2CSA in pregnancy-associated malariaJournal of Experimental Medicine 200:1197–1203.https://doi.org/10.1084/jem.20041579
Age as a risk factor for severe plasmodium falciparum malaria in nonimmune patientsClinical Infectious Diseases 33:1774–1777.https://doi.org/10.1086/322522
Booklimma: Linear Models for Microarray DataIn: Irizarry R. A, Dudoit S, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer-Verlag. pp. 397–420.https://doi.org/10.1007/0-387-29362-0_23
Cerebral malaria is associated with differential cytoadherence to brain endothelial cellsEMBO Molecular Medicine 11:e9164.https://doi.org/10.15252/emmm.201809164
Plasmodium falciparum complicated malaria: modulation and connectivity between exportome and variant surface antigen gene familiesMolecular and Biochemical Parasitology 201:31–46.https://doi.org/10.1016/j.molbiopara.2015.05.005
An immunohistochemical study of the pathology of fatal malaria: evidence for widespread endothelial activation and a potential role for intercellular adhesion molecule-1 in cerebral sequestrationThe American Journal of Pathology 145:1057–1069.
IgG antibodies to endothelial protein C receptor-binding cysteine-rich interdomain region domains of plasmodium falciparum erythrocyte membrane protein 1 are acquired early in life in individuals exposed to malariaInfection and Immunity 83:3096–3103.https://doi.org/10.1128/IAI.00271-15
NSR-seq transcriptional profiling enables identification of a gene signature of plasmodium falciparum parasites infecting childrenJournal of Clinical Investigation 121:1119–1129.https://doi.org/10.1172/JCI43457
Prognostic indicators of life-threatening malaria are associated with distinct parasite variant antigen profilesScience Translational Medicine 4:129ra45.https://doi.org/10.1126/scitranslmed.3003247
The 3d7var5.2 (var COMMON) type var gene family is commonly expressed in non-placental Plasmodium falciparum malariaMolecular and Biochemical Parasitology 127:179–191.https://doi.org/10.1016/S0166-6851(03)00004-5
Severe malariaTropical Medicine & International Health 19:7–131.https://doi.org/10.1111/tmi.12313_2
Urszula KrzychReviewing Editor; Walter Reed Army Institute of Research, United States
Dominique Soldati-FavreSenior Editor; University of Geneva, Switzerland
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
[Editors' note: this paper was reviewed by Review Commons.]
This study describes a detailed analysis of gene expression in Plasmodium falciparum, malaria causing parasite, isolated from a group of adult travelers returning to Germany. The authors developed new approaches that allowed them to confirm the presence of particular parasite variants, whereby one is associated with severe malaria and the other with non-severe malaria infection. This is a comprehensive study with an interesting dataset and the findings highlight the key knowledge gap about variant antigenic expression in adult severe malaria.https://doi.org/10.7554/eLife.69040.sa1
Summary: The submission from Wichers and colleagues describes a detailed analysis of gene expression in P. falciparum malaria parasites isolated from a cohort of patients in Germany who recently traveled in regions of the world with significant malaria transmission. The authors analyzed gene expression in parasites isolated from 32 patients and associated specific expression profiles with disease severity and pre-exposure of the patient to malaria. In particular, the authors focused on var/PfEMP1 expression since this gene family has long been associated with disease severity. The overall findings of the study are largely supportive of previous conclusions from other researchers and reinforce the hypothesis that PfEMP1s that bind to EPCR are associated with severe disease while those binding to CD36 are typically expressed in non-severe disease. The researchers also report a shift in the "age" of circulating parasites between severe and non-severe disease as well as a curious association with expression of specific var1 alleles.
1. The authors detected a shift in "age" of parasites obtained from non-severe infections compared to severe malaria patients, which they thought was important enough to include the in the abstract of the paper. However, in the Results section, the authors refer to this as a "small bias" and it seems that the difference (as displayed in Figure 2) does not reach statistical significance. The shift in "age" might indeed be important and therefore is worth pointing out to the readers, however the authors should explicitly state that the difference does not reach statistical significance. Given that it is mentioned prominently in the abstract, it is important not to mislead readers.
We decided to omit the statement about parasite age in the abstract and moved figure 2 into the supplements (Figure 2 —figure supplement 1). Additionally, we included a statement in the manuscript that the observed difference in age did not reach statistical significance (Line 215ff): “However, a small, statistically not significant bias (p=0.17) towards younger parasites in the severe cases was observed with a median age of 8.2 hpi (IQR: 8.0–9.8) in comparison to 9.8 hpi in the non-severe cases (IQR: 8.2–11.4) (Table 1, Figure 2 —figure supplement 1D).”
2. The authors performed differential expression analysis using their RNA-seq datasets and identified genes that change in expression according to disease severity and the patient's exposure status. However, given that the authors also detected a shift in "age" (or more specifically stage of the asexual cycle) between the different sample subsets, these changes in gene expression could simply reflect the shifted cycle. A reanalysis of the gene sets that shift up or down in the different samples should enable the authors to determine if these shifts can be explained simply by a shift in timing. If the change in timing explains many or most of the changes in gene expression, this provides additional support for the authors' conclusion that parasite "age" correlates with disease severity. In contrast, genes in which the change in expression does not correlate with the shift in "age" might be directly involved in disease severity.
We currently do control for the parasite age in the differential gene expression analysis by including the inferred parasite stage proportions as covariates in differential expression regression analysis. However, we realized this could be made clearer in the text. We have now added the following sentence to clarify that the differential expression analysis accounts for the lifecycle stage of the parasite within each sample (Line 220ff): “The estimated proportions were used to control for differences in parasite stage between samples by including them as covariates in the regression analysis of differential gene expression (Figure 2A, Figure 2 —figure supplement 2).” Moreover, we have inserted an overview of the methods used in Figure 2A and Figure 2 —figure supplement 2 showing the control for parasite stages and other unwanted factors of variation in the analysis of core gene expression.
3. The authors refer to the "age" of the parasites in their samples, by which they mean how far into the 48 hour asexual cycle the parasites have progressed. This is fine, however readers not intimately familiar with this concept might find this initially confusing. It would be good to define what the authors mean by age when they first mention this term.
Thanks for pointing this out. We included a definition of ‘parasite age’ in Line 209ff: “Variation in parasite ages – defined as the progression of the intraerythrocytic development cycle measured in hours post invasion – in the different patient samples was analyzed with a mixture model in accordance to Tonkin-Hill et al. (Tonkin-Hill et al., 2018) using published data from López-Barragán et al. as a reference (López-Barragán et al., 2011).”
4. The authors focus somewhat on patient #21 in analysis of the data shown in Figure 1. It might be helpful to mark on the panels in the figure which of the circles represent this specific patient. If I looked close, I could figure it out, but it would be more intuitive if it were marked, or example with an arrow or asterisk.
We have marked patient #21 and #26 in all panels of Figure 1 as indicated in the figure legend (Line 1171f): “Patient #21 is shown as filled circle in grey with a cross, patient #26 is represented by an open circle with cross.”
5. Also in Figure 1, the authors assay for previous exposure to P. falciparum by measuring several immune responses. I was a bit surprised that they didn't include a few samples from uninfected individuals who have never traveled to regions of malaria transmission to serve as negative controls.
We actually included negative controls in all serological assays and positive controls in mADRB, ELISA and protein microarray. This data is provided in the Supplementary table S1. Furthermore, we now explicitly state that also in the Material and Method section:
– For the mADRB assay (Line 546ff): “Meanwhile, 1.25 x 106 merozoites were incubated with 50 µl of 1:5 diluted plasma (decomplemented) from adult patients as well as from established negative and positive control donors for 2 h.”
– For ELISA (Line 559f): “After washing using PBS/0.1% Tween, plasma samples from patients and control donors were added at two-fold dilutions of 1:200 to 1:12800 in PBS/0.1% Casein.”
– For protein microarray (Line 578ff): “Study serum samples as well as the positive and European control sera were diluted 1:50 in 0.05X Super G Blocking Buffer (Grace Bio-Labs, Inc.) containing 10% E. coli lysate (GenScript, Piscataway, NJ) and incubated for 30 minutes on a shaker at RT.”
We also would like to point out that a Danish cohort was also used to determine the background MFI in the Luminex assay and served as a cut-off (plus 2 STD) for seropositivity in Figure 7C. The data set for the whole danish cohort as well as the calculation of the threshold for seropositivity can be found in Supplementary Table S1. This is mentioned in the text line 374ff: “The data was also analyzed using the average MFI reactivity (plus two standard deviations) of an Danish control cohort as a cut off for seropositivity to calculate the coverage of antigen recognition (Table S1) (Cham et al., 2010).” And in Line 538ff: “Alternatively, the breadth of antibody recognition (%) was calculated using MFI values from Danish controls plus two standard deviations
(SD) as cut off.”
We feel that addition of more sample groups in the plots of Figure 1 would make this already data-dense figure even more complex. Additionally, since the individuals used for control samples differed between the serological assays (in dependence of the location the assays were performed), the controls cannot be integrated into the random forest approach.
6. In the analysis shown in Figure 4, the authors note that var1, var2csa and var3 are expressed at higher frequencies in first-time infected patients. It would be good to mark the contigs representing these genes on the figure.
We have grouped and marked contigs of the conserved var gene variants var1, var2csa and var3 in Figure 4 and 5 to make the figures more readable. See also major comment 7 and minor comments of Reviewer 2.
7. The authors refer to var1 when describing the conserved gene found within one of the subtelomeric regions of chromosome 5. Since this gene was originally named "var1csa", it would be good for the authors to mention the original name when first describing the gene to avoid confusion. I support the idea of renaming this gene to simply var1 since the "csa" part of the name is misleading, however explicitly stating that they are referring to this previously named gene will be helpful.
We referred to the original name of the var1 variant by placing “previously known as var1csa” into brackets in Line 94ff when we first mention the gene.
8. The tables listed at the end of the manuscript (Tables 1-8) were not available. I suspect this was an error in how the manuscript files were uploaded.
Sorry for that. Yes, that was indeed an error during the upload process, which has already been corrected as recognized. Since we removed Table 1 and 3-8 and included the data into the main figures, the updated version contains only a single table (previous Table 2).
Significance: Overall this is a comprehensive and detailed study that researchers studying malaria pathogenesis will find valuable. Some of the specifics, for example the nuances of the different combinations of PfEMP1 domains, might appeal to a somewhat narrow readership, although the methods and approaches that the authors employed are broadly valuable.
My expertise: Gene expression analysis of malaria parasites.
Referees cross-commenting: Similar to reviewer #3, I was also happy to see that the three reviews are in agreement that no additional experiments are requested. The authors should be able to modify the manuscript with the reviewers’ comments in mind and make the paper suitable for publication.
Summary: The var gene family is a major virulence determinant for P. falciparum, but it is extremely challenging to study because of the large size of genes and the diversity of var genes between parasite genotypes. This study developed new approaches to analyze and classify var genes expressed by patient isolates. The technical approaches are state of the art and provide the most complete picture to date of the types of parasite variants that are circulating in patients with different malaria symptoms. The authors provide strong evidence that the ratio of different subsets of parasite variants (EPCR and CD36 binding) are skewed in patient subsets with elevated EPCR-binding var transcripts in malaria naïve and severe malaria patients and elevated CD36-binding variants in non-severe infections. The new approaches for var gene profiling and classification of predicted binding phenotypes are a major technical advance. Conversely, there is no direct evidence that the EPCR-binding phenotype confers more efficient sequestration and alternative hypotheses have not been considered. Overall, the new methodology is a tour de force, but the manuscript has not been written in a way that will be easily understood by a broad readership. I have some suggestions to improve its clarity.
1. A strength of the manuscript is the detailed characterization of parasite var transcripts by both NGS analysis and deep sequencing of DBLa amplicons. These two approaches were leveraged by the powerful bioinformatic classification into different functional groupings. The conclusion about skewed expression of EPCR-binding var transcripts in severe malaria and CD36-binding var transcripts in non-severe patients is strongly supported by the combined var profiling approaches.
We are grateful for this comment.
2. At the same time, it is worthwhile to point out that all patients are infected by a mixture of parasite variants and that CD36-binding var transcripts appear to be the dominant var transcripts in most patients, median ~60% in firsttime infected and ~80% in pre-exposed (Figure 7). I think that the literature has been more focused on the association of EPCR with severity, but less attention has been placed on the proportion of different var subsets in all patients.
Indeed, most studies – including this one – are looking at differentially expressed genes since measurements of total abundances of the different var groups are even more challenging. Previous studies tried to estimate transcript abundance encoding CD36-binding variants were hampered by a lower coverage or poor sensitivity of qPCR primer pairs targeting these N-terminal head structure domains (DBLa0, CIDRa2-6). Vice versa, DBLatag approaches are slightly biased towards CD36-binding variants (recognize more DBLa0 than DBLa1/2) and fail to detect most of the conserved var variants. Therefore, the abundance of certain transcripts may have been underestimated in the different approaches. However, there is cumulative evidence also from other studies, that CIDRa1 containing transcripts are dominating the infection in children with severe malaria, severe anemia and cerebral malaria. In contrast, transcripts with CIDRa2-6 domains are most abundantly expressed during uncomplicated malaria (Jespersen et al., 2016; Duffy et al., 2019; Warimwe et al., 2009 & 2012). Our adult patient cohort differs not only in age of the patients, but also in terms of severity, so maybe our firsttime infected and less sick adult patients have less dominant expression of EPCR-binding variants. This is also in concordance with the studies by Bernabeu et al. (2016) and Subudhi et.al. (2015) who analyzed var expression in adult Indian cohorts. Using either qPCR primers targeting mainly domain cassettes or a custom cross-strain microarray they also found an elevated expression of domains with EPCR, rosetting and CD36 binding PfEMP1 in severe adult cases. This may indicate that parasite binding to CD36 is enhanced in adult severe malaria cases compared to children, which is interesting due to their different disease symptomatology.
To address this in the manuscript the following paragraph was inserted into the Discussion section (Line 457ff): “Measurements of the total abundances of the different var groups are challenging (e.g., due differences in the coverage or sensitivity of qPCR or DBLα-tag primer pairs targeting the different var groups), but there is cumulative evidence that CIDRα1 containing transcripts are dominating the infection in children with severe malaria, severe anemia and cerebral malaria and transcripts with CIDRα2-6 domains are most abundantly expressed during uncomplicated malaria (Jespersen et al., 2016; Duffy et al., 2019; Warimwe et al., 2009, 2012). Although the median expression of CIDRα2-6 is lower in first-time infected and severe cases compared to preexposed and non-severe cases, in most of our adult patients CD36-binding var transcripts appear to dominate the expression pattern. This is in concordance with all three other adult studies also indicating a substantial expression of B- and C-type variants associated with binding of CD36 (Argy et al., 2017; Bernabeu et al., 2016; Subudhi et al., 2015) and Subudhi et al. even show an association with complicated adult malaria (Subudhi et al., 2015). Maybe parasite binding to CD36 is specifically enhanced in adult severe malaria cases compared to children, which is interesting due to their different disease symptomatology (Dondorp et al., 2008; Schwartz et al., 2001). Alternatively, our adult cohort differs not only in age but also in terms of disease severity from pedatric cohorts, and less sick patients may simply have a less dominant expression of EPCR-binding variants. However, for the parasite’s survival and transmission, it may be highly beneficial to express more of the less virulent PfEMP1 variants able to bind CD36. This interaction may not, or is less likely to, result in obstruction of blood flow, inflammation and organ failure at least of the brain, where CD36 is nearly absent (Turner et al., 1994).”
3. There is no direct evidence that patients with first-time infection were selected "for parasites with more efficient sequestration" (Line 44) or "that the EPCR-binding phenotype confers more efficient sequestration of infected erythrocytes" (Line 127). Indeed, predicted CD36 var transcripts are highly expressed in all patients, so differences in circulating parasite ages are difficult to attribute to any given cytoadhesion trait. Moreover, alternative hypothesis for earlier sequestration have not been considered, such as febrile temperature increases cytoadherence of ring-infected P. falciparum-infected erythrocytes (Udomsanpetch et al. PNAS 2002, PMID 12177447) or that parasites may respond to host factors, like lactate levels, by increasing other transcripts that may modify the IE rigidity or adhesion strength (e.g. Lee et al. Nat. Micro. 2019, PMID 29950443 or their own previous study Tonkin-Hill Plos Biology 2018). Although the age of circulating parasites was earlier in severe cases, it did not reach significance (Figure 2D). Unless there is more direct evidence for this conclusion, it should be removed from the abstract and qualified, as well as consider other hypotheses that could contribute to the strength of ringIE cytoadherence.
We agree, that no direct evidence is given and hence we modified/deleted the statements mentioned by the reviewer in the abstract (Line 45ff) and in the last paragraph of the introduction (Line 141ff) accordingly. The paragraph concerning the earlier sequestration upon severity has been removed from the discussion and we modified the last paragraph (Line 496ff). Figure 2 showing the stage distribution has been moved to the Supplement (Figure 2 —figure supplement 1).
– Abstract (Line 45ff): “First-time infected adults were more likely to develop severe symptoms and tended to be infected for a longer period. Thus, parasites with more pathogenic PfEMP1 variants are more common in patients with a naïve immune status and/or adverse inflammatory host responses to first infections favors growth of EPCR-binding parasites.”
– Introduction (Line 141ff): “Interestingly, severe complications occurred only in the group of first-time infected patients who tended to be infected for a longer period, indicating that severity of infection in adults is dependent on duration of infection, host immunity and parasite virulence gene expression.”
– Discussion (Line 496ff): “In summary, our data show a significant increase in transcripts encoding EPCRbinding and other A-type variants in parasites from severe and first-time infected patients, conversely transcripts of CD36-binding variants are found more frequently in parasites from non-severe and preexposed patients. Since CD36-binding variants are still overrepresented in all groups of adult malaria patients we postulate that the parasite population in first-time infected individuals may have broad binding potential after liver release as there is no pre-existing immunity to clear previously experienced PfEMP1 variants. During the blood stage infection selection towards EPCR-binding and other A-type variants, which may confer a parasite growth advantage and also increase the risk for severe malaria, may already have occurred in our adult severe malaria patients indicated by the longer period of infection.”
4. I don't think that additional experiments are needed to support the paper's claims. However, the clarity of the manuscript needs to be improved, so it can be understood by a broader readership.
Introduction: The introduction is heavy on the var nomenclature. Although I appreciate that authors need to explain the var terminology, the introduction is very light on the specific research question. Moreover, they have not really placed the work in context to other attempts at var profiling. It is difficult for the reader to appreciate why it is difficult to profile var transcripts and why these genes are usually ignored in NGS approaches. This is one of the few studies that have attempted to assemble var reads from NGS. The reality is that NGS provides only a partial view of the var transcriptome. Most of the expressed var transcripts cannot be assembled into full-length contigs and contigs may represent different parts of the same gene. Here the authors have developed innovative bioinformatic approaches to classify the assembled var contigs (partial or otherwise) into different var groups, domains, and homology blocks by comparison to an annotated database of 2,400 sequenced parasite genomes and 235,000 annotated var genes. Also, there is almost nothing about the cohort in the introduction or why adult travelers are interesting to study. I think the manuscript will be much clearer if there is less var nomenclature in the introduction and more explanation of how the authors tackled this challenge of var profiling.
We are thankful for the detailed suggestions and have modified the manuscript accordingly:
Introduction: We included more information on adult severe malaria and our cohort as well as on the challenge of studying var gene expression in patient isolates:
– Line 58ff: “In particular, children under five years of age and pregnant women suffer from severe disease, but adults from areas of lower endemicity and non-immune travelers are also vulnerable to severe malaria. Both, children and adults are affected by cerebral malaria, but the prevalence of different features of severe malaria differs with increasing age. Anemia and convulsions are more frequent in children, jaundice indicative of hepatic dysfunction and oliguric renal failure are the dominant manifestations in adults (Dondorp et al., 2008; World Health Organization (WHO), 2014). Moreover, the mortality increases with age (Dondorp et al., 2008) and was previously determined as a risk factor for severe malaria and fatal outcome in non-immune patients, but the causing factors are largely unknown (Schwartz et al., 2001).”
– Line 114ff: “Due to the sequence diversity of var genes, studies of var expression in patients have relied on analysis of DBLα expressed sequence tags (EST) (Warimwe et al., 2009, 2012) informing on relative distribution of different var transcripts and qPCR primer sets covering some but not all subsets of DBL and CIDR domains (Lavstsen et al., 2012; Mkumbaye et al., 2017). So far, only very few studies (Tonkin-
Hill et al., 2018; Andrade et al., 2020; Duffy et al., 2016; Kamaliddin et al., 2019) have used the RNA-seq technology to quantify assembled var transcripts in vivo. Moreover, most studies have focused on the role of PfEMP1 in severe pediatric malaria. Consensus from these studies is, that severe malaria in children is associated with expression of PfEMP1 with EPCR-binding CIDRα1 domains (Jespersen et al., 2016; Kessler et al., 2017; Storm et al., 2019; Shabani et al., 2017; Mkumbaye et al., 2017; MagallónTejada et al., 2016), but elevated expression of dual EPCR and ICAM-1-binding PfEMP1 (Lennartz et al., 2017) and the group A associated DC5 and DC6 have also been associated with severe disease outcome (Magallón-Tejada et al., 2016; Avril et al., 2013, 2012; Claessens et al., 2012; Lavstsen et al., 2012; Duffy et al., 2019). Less effort has been put into understanding the role of PfEMP1 in relation to severe disease in adults, and its different symptomatology and higher fatality rate. Two gene expression studies from regions of unstable transmission in India showed elevated expression of EPCR-binding variants (DC8, DC13) and DC6 (Bernabeu et al., 2016; Subudhi et al., 2015), but also of transcripts encoding B- and Ctype PfEMP1 in severe cases (Subudhi et al., 2015).”
Discussion: Two additional paragraphs about our cohort and severe adult malaria (Line 388ff) as well as on var gene expression data from other adult malaria cohorts in relation to our data (Line 443ff) were integrated.
– Line 391ff: “Non-immune travelers and adults from areas of unstable malaria transmission are prone to severe malaria. Currently, scarce information on the PfEMP1-mediated pathogenicity responsible for the different symptomatology in comparison to pediatric severe malaria and the higher fatality rate in adults is available. Here we present the first in-depth gene expression analysis of 32 ex vivo blood samples from adult travelers using RNA-seq and expressed sequence tag analyses. Despite the relatively low number of patient samples recruited in 5 years, our data confirmed previously reported associations between transcripts encoding type A and B EPCR-binding PfEMP1 and infections in naïve hosts and disease severity (Table S6, S7)(Duffy et al., 2019; Tonkin-Hill et al., 2018; Kessler et al., 2017; Bernabeu et al., 2016; Jespersen et al., 2016; Lavstsen et al., 2012). Our results further suggests that parasite interaction with EPCR is linked to severe disease in children as well as in adults. However, since CIDRα1containing PfEMP1s possess multiple binding traits (Lennartz et al., 2017; Magallón-Tejada et al., 2016), co-interaction with other receptors may further increase the risk for severe malaria.”
– Line 446ff: “Data from adult cohorts are rather limited restricting our comparison mainly to three var gene expression studies based on qPCR or a custom cross-strain microarray (Argy et al., 2017; Bernabeu et al., 2016; Subudhi et al., 2015), but the majority of cases from the Indonesian RNA-seq study are also adults (Tonkin-Hill et al., 2018). A high expression of A- and B-type var genes and an association of DC4, 8 and 13 with disease severity has been reported for malaria cases imported to France (Argy et al., 2017) and parasites from severe Indian adults show an elevated expression of DC6 and 8 (Bernabeu et al., 2016) and DC13 (Subudhi et al., 2015). In the severe cases from Indonesia mainly the DCs 4, 8 and 11 were found on an elevated level (Tonkin-Hill et al., 2018). All studies are in agreement with the expression data from our cohort of adult travelers, although we here in addition found a higher expression of DC15 (EPCR binding) and DC16 (putative rosetting variants) var genes in malaria-naïve patients.“
5. Add Supplemental Overview Figure of methodology: Because of read depth, it is not possible to get the full-length sequence of most var genes and contigs may represent different parts of the same gene. A major technological advance here was leveraging the vast resource of sequenced and annotated var genes from thousands of parasite genomes to classify var gene fragments into domains and homology blocks and then make predictions about binding properties. It would be useful to provide a supplemental overview figure about how the two var profiling approaches (NGS and DBL amplicons) were fed into the system for bioinformatic classification.
We added two methodology figures. A general overview of both analysis approaches (NGS and EST=DBLa-tag) is now shown in Figure 2A. This will give the reader an overview about the concept of the different layers of var gene analysis used in this study. Furthermore, a more detailed Figure for the RNA-seq analysis including all the bioinformatic approaches used is presented in the Figure 2 —figure supplement 2 and we provide a Venn diagram of our patient cohort in Figure 1G.
6. The Figures and Tables relating to the var gene analysis are many and difficult to interpret without knowing the significance of the different domain types and homology blocks. Whereas analysis of domains and homology blocks are shown in Figures 4- 7, it is not until you come to summary Figure 8 that it shows how different domain types and homology blocks map to PfEMP1 subsets/adhesion types. Without this information, it is challenging to see how different var subsets differ before patient groups. The Figures needs to be more interpretable on their own without having to reference accompanying Tables. Consider moving the summary Figure 8 before Figures 4 to 7. Alternatively, introduce a var/PfEMP1 schematic in Figure 4. The schematic should introduce the concept of different var groups/PfEMP1 head structures/binding phenotypes/homology blocks. I recommend that you dispense with the terminology "segment level" from Tonkin-Hill 2018. There is already so much nomenclature for the reader to follow, plus many Figures and Tables to make sense of (9 Figures, 8 Tables, 10 Supplemental Tables).
To address this comment, we have moved Figure 8 (new Figure 3) showing a simplified version of the PfEMP1 scheme and a summary of the results from expression analysis to the beginning of the var expression analysis result section. The figure and the detailed legend will help the readers to understand the overall concept of var genes, their further subclassification and binding properties of the encoded PfEMP1 proteins. We have replaced the term “segment” by homology block in the entire manuscript (Materials and methods section ‘var domain and homology block differential expression’, Line 732ff).
7. Main Figures: In addition, it would help immensely if you would organize Figures 4-7, so that domains or homology blocks are arranged by predicted functional categories (e.g., DC8-EPCR, Group A-EPCR, Group A-nonEPCR, Group B or C-CD36). While this information is present in the accompanying Table, this information should be evident from the Figures, too. For instance, Figure 5 – Rearrange by var groups. Put the three DC8 domains together (DBLa2, CIDRa1.1, DBLb12), so it is easier to do a side-by-side comparison. For example, arrange by DC8-EPCR (DBLa2, CIDRa1.1, DBLb12), Group A-EPCR (e.g. DBLa1.7, CIDRa1.4, CIDRa1.7, etc), Group B or C-CD36. This will make it much clearer whether specific var subsets differ in the patient comparisons. Additionally, it would be more intuitive if you underline domains that target the same var group (DC8-EPCR, Group A-EPCR, Group A-nonEPCR, Group B or C-CD36) in panels B, D, E, and F. Having the information within the Figure will make it so much easier to follow. Figure 6 – Likewise, homology blocks should be arranged by var groups and add a label underneath the panels (e.g. block 584 = DC8, block 155 = NTSA). Figure 7 – Likewise, rearrange and label functional groups. To understand how you classified DBLa amplicons, I think that this figure also requires a schematic illustration at the top. As I understand it, you amplified a small tag from the DBLa domain and BLASTed against 235,000 annotated var genes to assign the tag to a PfEMP1/var group (DBLa0 = Group B or C, DBLa1 = group A, DBLa2 = DC8) and to predict the flanking NTS and CIDR domain types. Thus, this approach allows you to predict the PfEMP1 head structure from a small DBLa tag (not the full domain). In panel A, this approach was used to distinguish between Group A head structures that either bind EPCR (CIDRa1) or do not (CIDRb, y, or d). Furthermore, based on the DBLa amplicon read density, you are classifying the proportion of different var transcripts of different var groups/head structures. This approach would be more evident if you added a schematic at the top of Figure 7 showing the different types of PfEMP1 head structures, plus what was amplified and what was predicted by BLAST classification.
We have reorganized the data in all panels of the figures 4-7 as suggested (see also reviewer 1, minor comment #6). First, to make a better comparison between the different levels of var gene analysis, we have merged results according to the patient groups analyzed leading to the new Figures 4 (first-time infected versus pre-exposed) and 5 (severe versus non-severe). For all figures we have introduced a color code for (i) var gene groups (A: red, A-var1: dark red, A-var3: orange, B: blue, C: green, E-var2csa: yellow) and (ii) associated binding phenotypes (EPCR: beige, CD36: turquois, brown: unknown A-type, pink: ICAM-1, salmon: PECAM1, pale red: gC1qR). The color code enables the uniform labelling of transcripts, domains and homology blocks in the most correct way and accounts for the complexity of var genes/PfEMP1. We have also integrated the domain composition of the assembled contigs as well as the p-values into the Figures 4 and 5. Since all data from the tables 3-8 is now shown in the figures or can be easily found in the Supplementary Tables S9 and S10, we omitted the tables to avoid redundancy. Furthermore, we have grouped contigs of the conserved var variants var1, var2csa and var3 and marked them with the color code scheme in the analysis on transcript level (Figure 4&5A,B). On domain level we labelled the associated var gene groups and binding phenotypes and arranged the domains in accordance to their DC. Moreover, a scheme showing the domain cassettes (DC) 1, 8, 13, 15, 16 was added showing how many domains of the DCs were found enriched in the different groups of parasite isolates. On homology block level we have indicated the associated var groups, binding phenotype and PfEMP1 domains either by using the color code and by extending the labelling.
Regarding the DBLa-tag approach: The limitation of this approach is, that it can only predicts the domain composition if a hit of the DBLa-tag is found in the database (requirements: at least 95% sequence identity over 200 bp). So, Varia can predict as much as the data allows, in our case about 85% of the DBLa-tags. This is now also explained in the methodology overview (Figure 2A) and therefore a scheme as suggested by the reviewer was skipped for this figure. But we also reorganized the figure in terms of introducing the color codes mentioned for the RNA-seq domain level analysis.
1. Other Table comments
Tables 1 & 2 and cohort description:
– More information is needed in the Methods and the Results sections about this patient cohort. In particular, the Methods should explain how patients were classified as severe and non-severe and whether WHO severity criteria were applied.
In line with the WHO criteria of severe malaria in adults we classified patients as having severe malaria if they showed clinical or laboratory signs of organ dysfunction or had extremely high parasitemia of over 10% or circulating schizonts upon hospitalization. Hyperparasitemia (>5%) per se was not sufficient for inclusion into the severe subgroup. We have clarified this in the Materials and methods section by inserting a whole paragraph on ‘patient classification according to severity’ (Line 618ff): “Severity was defined in line with the WHO criteria for severe malaria in adults (World Health Organization (WHO), 2014). Patients were considered as having severe malaria if they showed signs of impaired organ function (e.g., jaundice, renal failure, cerebral manifestations) or had extremely high parasitemia (>10%). In addition, patients #1 and #26 were included into the severe group due to circulating schizonts indicative of a very high sequestering parasite biomass associated with severity (Bernabeu et al., 2016) (Table S2).”
We refer to this paragraph in the Results section (Line 192ff): "Eight patients from the malaria-naïve group were considered as having severe malaria based on the predefined criteria. The remaining 24 cases were assigned to the non-severe malaria group (Figure 1G, Table S2)."
– In addition, the terms "cerebral dysfunction" and liver dysfunction" should be defined in the Methods. For instance, patient 9 in Table 1 has "liver dysfunction with highly increased transaminases with two upward arrows" but is labeled "non-severe". Is this symptom different from jaundice?
Thank you for pointing out this inaccuracy. We have reviewed the clinical information and altered the table (now Table S2) to include only criteria that qualified the patients for the classification of severe malaria. We have removed mention of the transaminases as these are not a criterium for severe malaria. The specific patient mentioned here had elevated transaminases but no jaundice or elevated bilirubin and was clinically relatively well. Therefore, he was not classified as having severe malaria.
– More information is needed in the Results section about the characteristics of this cohort. For instance, it is noteworthy that the cohort was all adult cases (Table 2) and all patients had febrile malaria (Table 1). Thus, they were all symptomatic. This information could go in the beginning of the Results (line 130 – Cohort characterization).
We totally agree and now described the cohort in more detail as suggested (Line 149f): “This study is based on a cohort of 32 adult malaria patients hospitalized in Hamburg, Germany. All patients had fever indicative of symptomatic malaria.”
– Explanation is needed for MSP-1 groups 1 to 4.
We changed the heading of the line in Table 1 into “Number of MSP1 genotypes [n (%)]” for explanation.
– Site of infection: Is it correct that one patient was infected in Germany?
Yes, actually this patient was infected via needlestick accident in a German hospital. To avoid confusion, we have now changed the term into “geographical origin of parasite isolates”. Since the naturally infected patient returned from Sub-Saharan Afrika, we have simplified our list underneath Table 1 stating now: “Geographic origin of the parasite isolates: Ghana (n=10), Nigeria (n=6), other Sub-Saharan African countries (n=15), unknown (n=1)”.
– Salmon RNAseq pipeline (lines 243-267, Figure 4 & Table 3)
– This section is very confusing. As most var genes are not conserved across parasite genotypes it is important to explain to the reader that there are a few strain-transcendent variants in the parasite repertoire. Indeed, 8 of 15 contigs in the Salmon analysis belonged to strain-transcendent variants (var1, var2csa, var3). It would help to label the strain-transcendent variants in Figure 4.
To address this (please also refer to comment #6 from reviewer 1) we have now labelled conserved var variants in the Figures 4 and 5 and explicitly state in the section ‘Differential var transcript levels’ (Line 262ff): “We first looked for highly similar transcripts present in multiple samples. The Salmon RNA-seq quantification pipeline (Patro et al., 2017), which identifies equivalence classes allowing reads to contribute to the expression estimates of multiple transcripts, was used to estimate expression levels for each transcript. Due to the high diversity in var genes, mainly assembled transcripts of the strain-transcendent variants var1, var2csa and var3 were found to be differentially expressed.”
– Additionally, it would be good to say something about the 7 contigs that are not known as strain transcendent (e.g., NS_L1_88_Contig5431, Contig4767). What are these variants?
We already tried to address this in line 270ff of the result section: “Beyond the conserved variants, several var fragments from B- or C-type var genes were associated with a naïve immune status and three transcripts from A, DC8 and B-type var genes as well as var2csa were linked to severe malaria patients (Figure 4, 5, Table S9, S10).” Furthermore, the domain composition and associated var gene group and binding phenotype is now shown in the Figures 4 and 5 and can be found in the Supplement Tables S6. Results from BLAST of the contigs against the var database (varDB) and PacBio-assembled genomes can be found in addition to domain and homology block composition in Table S7.
– It is interesting that the two var1 alleles differed between patient subsets. Var1 has been considered a pseudogene with delayed transcription relative to other var genes, and less work has been done it.
Yes, we agree that this is a noteworthy observation which is already discussed in a whole paragraph (Line 477ff): “To the best of our knowledge, this study is the first description of expression differences between the two var1 variants, 3D7 and IT. The var1-IT variant was found enriched in parasites from first-time infected patients, whereas several transcripts of the var1-3D7 variant were increased in pre-exposed and non-severely ill patients. Expression of the var1 gene was previously observed to be elevated in malaria cases imported to France with an uncomplicated disease phenotype (Argy et al., 2017). In general, the var1 subfamily is ubiquitously transcribed (Winter et al., 2003; Duffy et al., 2006), atypically late in the cell cycle after transcription of var genes encoding the adhesion phenotype (Kyes et al., 2003; Duffy et al., 2002) and is annotated as a pseudogene in 3D7 due to a premature stop codon. Similarly, numerous isolates display frame-shift mutations often in exon 2 in the full gene sequences (Rask et al., 2010). However, none of these studies addressed differences in the two var1 variants that were recently identified by comparing var gene sequences from 714 P. falciparum genomes (Otto et al., 2019), and to date it is still unclear if both variants fulfill the same function or have the same characteristics previously described. Overall, the var1 gene – and the first 3.2 kb of the 3D7 variant in particular – seems to be under high evolutionary pressure (Otto et al., 2019) and both variants can be traced back before the split of P. reichenowi from P. praefalciparum and P. falciparum (Otto et al., 2018b). Our data indicate that the two variants, VAR13D7 and VAR1-IT, may have different roles during disease, however, this remains to be determined in future studies.”
– It also appears that mostly the 5' portion of var1 was transcribed (Table 1 and Figure S2). Do the authors have evidence if the full var1 transcript is being made or are these partial transcripts?
A common sequencing issue is the GC-bias leading to a higher coverage of GC-rich regions. We have already tried to address this by using the KAPA polymerase for amplification of the library, which is more or less dealing with that problem (Oyola et al., 2012). Mapping of the RNA-seq reads to the two different var1 variants shown in Figure 3 —figure supplement 1 indicates, that these transcripts are full-length, but with a better coverage at the 5’-end. Due the scaling the low coverage at the 3’end is difficult to display. Of note, for the var2csa gene coverage at the 3’-end was better than at the 5’-end.
– What is the difference between Contig2160 and Contig1987 in Table 3? Both include the NTSA-DBLa1.4 region of var1-3D7.
The contigs were de novo assembled from RNA-seq reads of different parasite isolates. The longer contig NS_L8_65_Contig2160 from patient isolate #29 and the shorter contig NS_L6_18_contig1987 from patient isolate #28. Both differ in length (7,253 bp and 548 bp) and pairwise sequence alignment reveals 7 mismatches (the sequences are provided in the Supplementary data S1). Since the var1-3D7 allele is particularly conserved in the 5’-region (Otto et.al., 2019), this may also be caused by sequencing errors. The Salmon expression quantification algorithm allows reads to contribute to the expression estimates of multiple transcripts and is this able to account for this redundancy in the set of var transcripts.
– Why is Contig36393 assigned "var1" and not to the 3D7 or IT allele?
The contig contains only C-terminal domains of the var1 gene, where both variants – 3D7 and IT – do not differ in domain composition. But pair-wise sequence alignment and BLAST reveal a substantial higher similarity to the var1-IT variant, so the contig was labelled accordingly in Figure 4
“NS_L1_80_Contig36393 γDBLγ8-DBLζ1-DBLε8 (var1-IT)”.
– Why does Table 4 have 5 contigs labeled "var1-3D7"? They appear to contain overlapping domains.
These contigs are assembled from separate samples. The Salmon expression quantification algorithm was used as it allows for redundancy in the reference database. Thus, multiple similar contigs originating from different samples can be called as differentially expressed. The results of the Salmon algorithm were compared with Corset, another algorithm that instead clusters similar transcripts and was found to give similar results. To make this clearer in the text we have added (Line 705ff): “As the RNA-seq reads from each sample were assembled independently it is possible for a highly similar transcript to be present multiple times in the combined set of transcripts from all samples. The Salmon algorithm identifies equivalence sets between transcripts allowing a single read to support the expression of multiple transcripts. As a result, Salmon accounts for the redundancy present in our whole set of var gene contigs from all separate sample-specific assemblies.”
2. Lines 145 and Lines 155-157: Specific patients are called out in the text "19-year old patient (#21)" and "patient #26", but these are not labeled in the figures.
This comment is in agreement with the minor comment 4 of reviewer 1. We have marked both patients in all panels of Figure 1.
3. Line 159 – This statement is confusing "the medical report showing that this patient from Jamaica was infected during his first trip to Africa". It is unclear when this patient was enrolled in your study. Are you saying patient #26 had prior malaria exposure on a previous African trip and was subsequently enrolled in this study from a new exposure to P. falciparum malaria?
We rephrased our sentence (Line 176ff): “The patient #26, positioned at the borderline to pre-exposed patients, was grouped into the naïve cluster in accordance with the Luminex data and the patient statement that this potentially pre-exposed patient returned from his first trip to Africa.”
4. MSP-1 genotyping was used to estimate the number of parasite genotypes. This should be mentioned in the Results section when referring to this genotype estimate.
We referred to this data shown in Table 1 by inserting the sentence (Line 150f): “MSP1 genotyping estimated a low number of different parasite genotypes present in the patients (Table 1).”
5. Line 167: Van Den Hoogen 2019 citation is missing in the references
Thanks for pointing this out. The reference is now correctly listed (Line 949ff).
6. Earlier work from your group and Lee et al. Nat Micro 2019 suggested that other non-var transcripts, such as KnobAssociated Histidine Rich Protein (KAHRP), were increased in severe malaria patients. Were any non-var transcripts involved in cytoadhesion and rigidity increased in severe malaria patients?
We already included expression data of genes involved in PfEMP1 biology (manually selected) in Supplementary Table S4 and S5 (“manual selection_PfEMP1 biology”), which is also referenced in the manuscript (Line 245ff): “In addition, we manually screened differentially expressed genes known to be involved in var gene regulation or correct display of PfEMP1 at the host cell surface (Table S4, S5).” In the first BioRxiv preprint version a detailed paragraph on this topic was included in the result and Discussion sections, but this was removed in order to streamline the manuscript. Nevertheless, we hope the interested reader might find our supplementary tables helpful. We didn’t observe a clear trend for KAHRP expression, even though we observed a slightly higher mean expression of KAHRP in the severe cases (logFC of 0.235), but this did not reach statistically significance. Instead, we observed significant higher expression of other PfEMP1 related genes, such as PTP1, PTP5 and SBP1.
Significance and Audience: This study is a technical tour de force and greatly extends the ability to investigate the P. falciparum var gene family in patient isolates, but it is not written in a manner that most readers will be able to interpret or appreciate the scientific advance because there is so much jargon. While the authors have done a heroic job of leveraging an enormous database of sequenced and annotated parasite genomes to classify assembled var contigs and DBLα amplicons into different functional groupings, the main take-aways will not be obvious to the general reader. I think this can be a highly influential study for its technical advance and because it more definitively implicates specific var subsets in malaria severity. As well, it addresses a relatively understudied adult travelers patient population. From a technical perspective, the tools and methods developed here will have great utility for future investigation of patient cohorts. There have been few attempts to profile var transcripts by NGS and the approaches developed here go much further beyond the author's previous work (Tonkin-Hill, Plos Biol, 2018). Put simply, it is extremely challenging to profile such a highly diverse gene family and this study offers a new and powerful approach to gain a more complete picture (still imperfect, but much improved).
Conversely, some of the study conclusions are not strongly supported. For instance, there is no direct evidence that patients with first-time infection were selected "for parasites with more efficient sequestration" (Line 44) or "that the EPCR-binding phenotype confers more efficient sequestration of infected erythrocytes" (Line 127). Indeed, predicted CD36 var transcripts are highly expressed in all patients, so there is no evidence that the differences in circulating parasite ages are attributable to any given cytoadhesion trait. Moreover, alternative hypothesis for earlier sequestration have not been considered. Thus, a more balanced discussion is needed and the figures and texts should be improved to explain this study to a broader readership.
Referees cross-commenting: Agree with the other two reviewers with the recommendation that no new experiments needed.
The investigators obtained malaria parasite samples from 32 adult travelers returning to Germany of which 10 had their first malaria episode, 9 had a history of malaria exposure while the malaria exposure status of 13 was unknown. They then generated anti-parasite antibody data using ELISA, luminex and protein microarray approach and then used the data to cluster the patients into malaria naïve and malaria exposed. They also used the clinical data to categorize the study participants into those with severe malaria and non-severe malaria. The parasite gene expression data was generated using whole genome transcriptome analysis approach and PCR amplification of var genes (DBLa-tag) and sequencing. The parasite gene expression data particularly var gene expression was related to host malaria pre-exposure and severity status. In summary, the study showed that parasites from naïve and severe cases express higher quantity of PfEMP1 containing the EPCR binding CIDRa1 domain compared to non-severe cases, while the pre-exposed and the non-severe cases expressed higher quantity of PfEMP1 containing domains known to mediate CD36 binding. In my opinion, the study methods and results have been described in details for expert in the field to follow but may be hard for non-experts because of the detailed analysis and data presented in the result section.
The methods have been adequately described and the paper is generally well written but has lots of detailed analysis that can only be followed by experts in the field which probably not a problem. Below are my comments which I hope the authors will find useful.
1. The gap in knowledge the study is trying to fill has not been captured well in the introduction
To address this comment that was also raised by reviewer #2 we modified our introduction, trying to emphasize more clearly why it is difficult to profile var gene transcription especially in patient isolates and the advantages of our approach (see also comment 4 of reviewer #2).
2. Linking all PfEMP1 proteins containing CIDRa1 to EPCR-binding, in my opinion, is an over simplification of PfEMP1's cytoadherence versatility. There is no doubt that PfEMP1 containing CIDRa1 domain play an important role in disease pathogenesis and several field and in vitro studies referenced in the manuscript have confirmed this, but I think the evidence that all CIDRa1 bind EPCR is not strong. Again, no clear conserved EPCR binding motif has been identified in CIDRa1 domains unlike ICAM-1 binding DBLb for which a conserved motif has been identified. To be on the caution side, I suggest the name "CIDRa1 domain containing PfEMP1" be retained without linking them to EPCR binding function.
We disagree on this point. The molecular requirements for EPCR binding have been clearly shown by structural resolution, test of many CIDRa1 domains and unambiguous bioinformatics (Lau et al., 2015; Rask et al., 2010). Sequence identity-wise it is very easy to identify an EPCR-binding CIDR domain (even by eye!). CIDR domains have a unique compositions of sequence motifs (= homology blocks, HBs) (Rask et al., 2010) and CIDRa1 domains with HB137 bind EPCR. The molecular basis of the few exceptions to this, the CIDRa1 variants not binding EPCR, are the var1-CIDRa1.2/3 and CIDRa1.5b. The molecular basis for these exceptions has been clarified and the reason is very clearly defined sequence wise. Apart from that, all parasites found to have HB137-containing CIDRa1 domains have bound EPCR. Only a single study found no EPCR binding of A types (Azasi et al., 2018), whereas other studies with same parasites have demonstrated EPCR binding (Turner et al., 2013; Lau et al., 2015; Bernabeu et al., 2016; Bernabeu et al., 2019, Kessler et al., 2017).
The mentioned ICAM1-binding DBLb motif is only specific for a subset (although the largest) of the group A type found DBLb domains. So not all A type ICAM1 binders have this motif and none of the group B type DBLb has this motif. Moreover, while EPCR-binding domains are rigid domains, DBLb domains change conformation in the ICAM1 binding, which may be related to the observation that no other trait of the DBLb domain can explain dome DBLb domains preference for ICAM1 or not.
3. I feel like there is no need to use the host immunity data to categorize the patients into malaria naive and preexposed. Instead, I suggest the antibody data, as a continuous variable, be correlated with the parasite gene expression to test the impact of host immunity on the phenotype (gene expression) of the infecting parasites. I think defining a binary variable (exposed and non-exposed group) using the host antibody data and then comparing their antibody response, as presented in Figure 1, is a circular analysis and not helpful.
Our rational for the categorization of patient samples into groups is based on the requirements for our differential gene expression analysis approach. The Luminex data were used for categorization of patients into first-time infected and pre-exposed, data from mADRB, ELISA and protein microarray validated our patient groups independently, but were mostly used to confirm patient #21 as first-time infected. The ultimate analysis, the random forest approach, was fed with data from all assays and clearly confirmed our subgrouping.
4. Severity of an infection is a factor of time (duration of infection), parasite virulence and host immunity. I think an analysis approach that allows testing of the relationship between the var expression (parasite virulence) and severity while adjusting for host immunity would have been better.
To address this and the previous point, we have additionally used the serological data as a continuous variable and plotted these PCR-reduced data from all assays against the expression of EPCR-binding CIDRa1 domains or CD36-binding CIDRa2-6 domains and calculated Spearman’s rank correlation coefficients (ρ). See Author response image 1. This analysis shows a weak negative correlation (ρ=-0.49) between host immunity and parasites expression of CIDRa1encoding transcripts; vice versa a weak positive correlation (ρ=0.36) was seen for host immunity and parasites expression of CD36 domains. However, this analysis has some limitations: (i) the severely ill patients were infected for a longer period and an already developed immune response during the acute infection may introduce a bias, (ii) a larger patient cohort would be necessary to see stronger correlations, but samples from travelers are rather rare (we sampled over 5 years only blood from 32 cases) and (iii) a higher expression of any PfEMP1 could also favor sequestration and severity, especially in adult cohorts with higher expression of CD36binding variants compared to children. To avoid confusions, we would suggest to leave this data point out of the manuscript.
1. What is the authors' interpretation of the fact that domains typical for CD36-binding PfEMP1 proteins are expressed at higher levels in malaria experienced while at the same time parasites from malaria experienced individuals also show elevated antibodies to the same PfEMP1 subset (Figure 9). Some few lines in the discussion will be helpful.
Since most PfEMP1 proteins have CIDRa2-6 domains capable to bind CD36, the parasite is able to express antigenically distant variants not recognized by the preformed immunity. We included a whole paragraph on CD36-binding in the discussion as also suggested by reviewer 2, major comment 2.
2. Any possibility of variation in var type expression over the asexual life cycle? And if yes, could the observed difference in parasite age between severe and non-severe cases be contributing to the observed variation in the expressed var subgroup between severe and non-severe?
Other studies have already shown that at least in NF54 parasites all var genes possess an almost identical expression pattern (parasite age of 5-25 hpi) (Dahlbäck et al., 2007). A transcription pattern different from the other var genes is only known for the var1 gene(s). In contrast to the expression seen in ring stages only for other var genes, var1 is expressed throughout the different life cycle stages (see Discussion section, Line 482ff). This may be explained by a deletion of the intron responsible for repression upon parasite maturation in some parasite lines. We checked for a correlation between parasite age and var1-3D7 expression in our sample set, but couldn’t find any. This is also clearly seen in the Figures, e.g., isolate #6 has a high abundance of early trophozoite stages (19 hpi) (Figure 2 —figure supplement 1), but we couldn’t detect any of the var1 variants.
3. Line 478-482: the statement seems not to adequately explain why there are circulating relatively older parasites in non-severe cases since trophozoite stage parasites are expected to exist in both severe and non-severe cases. May be revise the statement to capture what you wanted to state.
We deleted the whole paragraph.
4. In my opinion, the word (direct ex vivo) in the title is not necessary.
We have modified the title to ‘Gene expression profiling of adult malaria patients reveals common virulence gene expression in first-time infected patients and severe cases’
5. Line 112: the sentence beginning with "No other domain has been…" needs to be revised.
The whole paragraph was modified and this particular sentence deleted.
Significance: This a descriptive study and the data generated confirms results of previously published studies and will be useful to Malariologist and those working in the field of malaria pathogenesis. I think results that confirm published findings, such as this study, are as important as those presenting novel findings. Overall, the conclusions are supported by the results and the methods used have been adequately described and presented. New experiments have not been recommended, instead an alternative analysis approach have been suggested (see comments).
My expertise: I am a Malariologist with particular experience in the field of malaria pathogenesis.
Referees cross-commenting: Went through the comments of the two other reviewers and I am happy to see that our observations are very close
Article and author information
Deutsche Forschungsgemeinschaft (BA 5213/3-1)
- Jan Stephan Wichers
- Anna Bachmann
Danish Council for Independent Research (9039-00285B)
- Rasmus Weisel Jensen
- Louise Turner
- Thomas Lavstsen
Deutsches Zentrum für Infektionsforschung (TTU Malaria)
- Ralf Krumkamp
- Egbert Tannich
- Rolf Fendel
- Anna Bachmann
- Jan Strauss
- Tim Wolf Gilberger
- Judith Anna Marie Scholz
National Health and Medical Research Council
- Michael Duffy
Kirsten og Freddy Johansens Fond
- Rasmus Weisel Jensen
- Louise Turner
- Thomas Lavstsen
- Rasmus Weisel Jensen
- Louise Turner
- Thomas Lavstsen
Wellcome Trust (104111/Z/14/ZR)
- Thomas D Otto
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank all the patients who provided an extra blood sample for our research purposes. We would also like to thank the staff of the I Medical Department at the UKE and of the Bundeswehrkrankenhaus for identifying patients for the study and, specifically, Maria Mackroth, Julian Schulze zur Wiesch, Johannes Jochum, and Thierry Rolling for assisting in the recruitment of patients. Furthermore, we thank Jürgen May for critical reading of the manuscript and Tobias Spielmann for helpful discussions. We thank Marlene Danner Dalgaard, Kathrine Hald Langhoff, and Sif Ravn Søeborg for technical assistance with DBLα-tag sequencing.
Human subjects: The study was conducted according to the principles of the Declaration of Helsinki in its 6th revision as well as International Conference on Harmonization-Good Clinical Practice (ICH-GCP) guidelines. Blood samples for this analysis were collected after patients were informed about the aims and risks of the study and signed an informed consent form for voluntary blood draw (n=21). In the remaining cases, no designated blood samples were drawn, instead remains from diag-nostic blood samples were used (n=11). The study was approved by the relevant ethics committee (Ethical Review Board of the Medical Association of Hamburg, reference numbers PV3828 and PV4539).
- Dominique Soldati-Favre, University of Geneva, Switzerland
- Urszula Krzych, Walter Reed Army Institute of Research, United States
- Received: April 6, 2021
- Accepted: April 18, 2021
- Accepted Manuscript published: April 28, 2021 (version 1)
- Version of Record published: May 6, 2021 (version 2)
© 2021, Wichers et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
- Page views
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
- Cell Biology
- Chromosomes and Gene Expression
Most studies of cohesin function consider the Stromalin Antigen (STAG/SA) proteins as core complex members given their ubiquitous interaction with the cohesin ring. Here, we provide functional data to support the notion that the SA subunit is not a mere passenger in this structure, but instead plays a key role in the localization of cohesin to diverse biological processes and promotes loading of the complex at these sites. We show that in cells acutely depleted for RAD21, SA proteins remain bound to chromatin, cluster in 3D and interact with CTCF, as well as with a wide range of RNA binding proteins involved in multiple RNA processing mechanisms. Accordingly, SA proteins interact with RNA, and R-loops, even in the absence of cohesin. Our results place SA1 on chromatin upstream of the cohesin ring and reveal a role for SA1 in cohesin loading which is independent of NIPBL, the canonical cohesin loader. We propose that SA1 takes advantage of structural R-loop platforms to link cohesin loading and chromatin structure with diverse functions. Since SA proteins are pan-cancer targets, and R-loops play an increasingly prevalent role in cancer biology, our results have important implications for the mechanistic understanding of SA proteins in cancer and disease.
- Chromosomes and Gene Expression
- Genetics and Genomics
Degradation of most yeast mRNAs involves decapping by Dcp1/Dcp2. DEAD-box protein Dhh1 has been implicated as an activator of decapping, in coupling codon non-optimality to enhanced degradation, and as a translational repressor, but its functions in cells are incompletely understood. RNA-Seq analyses coupled with CAGE sequencing of all capped mRNAs revealed increased abundance of hundreds of mRNAs in dcp2Δ cells that appears to result directly from impaired decapping rather than elevated transcription. Interestingly, only a subset of mRNAs requires Dhh1 for targeting by Dcp2, and also generally requires the other decapping activators Pat1, Edc3 or Scd6; whereas most of the remaining transcripts utilize NMD factors for Dcp2-mediated turnover. Neither inefficient translation initiation nor stalled elongation appears to be a major driver of Dhh1-enhanced mRNA degradation. Surprisingly, ribosome profiling revealed that dcp2Δ confers widespread changes in relative translational efficiencies (TEs) that generally favor well-translated mRNAs. Because ribosome biogenesis is reduced while capped mRNA abundance is increased by dcp2&Delta, we propose that an increased ratio of mRNA to ribosomes increases competition among mRNAs for limiting ribosomes to favor efficiently translated mRNAs in dcp2Δ cells. Interestingly, genes involved in respiration or utilization of alternative carbon or nitrogen sources are up-regulated, and both mitochondrial function and cell filamentation are elevated in dcp2Δ cells, suggesting that decapping sculpts gene expression post-transcriptionally to fine-tune metabolic pathways and morphological transitions according to nutrient availability.