Research Advance

Microbiology and Infectious Disease

A novel computational pipeline for var gene expression augments the discovery of changes in the Plasmodium falciparum transcriptome during transition from in vivo to short-term in vitro culture

Section of Paediatric Infectious Disease, Department of Infectious Disease, Imperial College London, United Kingdom
Department of Life Sciences, Imperial College London, South Kensington, United Kingdom
Centre for Paediatrics and Child Health, Imperial College London, United Kingdom
Bernhard Nocht Institute for Tropical Medicine, Bernhard-Nocht-Strasse, Germany
Centre for Structural Systems Biology, Germany
Biology Department, University of Hamburg, Germany
Center for Medical Parasitology, Department of Immunology and Microbiology, University of Copenhagen, Denmark
Department of Infectious Diseases, Copenhagen University Hospital, Denmark
Department of Microbiology and Immunology, University of Melbourne, Australia
School of Biomedical Sciences, Faculty of Medicine & Health, UNSW, Kensington, United Kingdom
School of Infection & Immunity, MVLS, University of Glasgow, United Kingdom
German Center for Infection Research (DZIF), partner site Hamburg-Borstel-Lübeck-Riems, Germany

Jan 25, 2024

https://doi.org/10.7554/eLife.87726.3

Open access
Copyright information

eLife assessment

Focusing mainly on var genes, the investigators performed comprehensive computational analyses of gene expression in malaria parasites isolated from patients and assessed changes that occur as these parasites adapt to in vitro culture conditions. The study provides an improved computational pipeline for monitoring var gene expression, and importantly, the study documents changes in expression of the core genome and thus provides important insights into metabolic adaptations that parasites undergo while transitioning to culture conditions. The findings are important for their technical advances that are more rigorous than the current state-of-the-art. The solid data analyses, broadly support the claims with only minor weaknesses, tell us to be cautious when interpreting results obtained only from cultured parasites.

https://doi.org/10.7554/eLife.87726.3.sa0

Significance of the findings:

Important: Findings that have theoretical or practical implications beyond a single subfield

Landmark
Fundamental
Important
Valuable
Useful

Strength of evidence:

Solid: Methods, data and analyses broadly support the claims with only minor weaknesses

Exceptional
Compelling
Convincing
Solid
Incomplete
Inadequate

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate). Learn more about eLife Assessments

Abstract
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

The pathogenesis of severe Plasmodium falciparum malaria involves cytoadhesive microvascular sequestration of infected erythrocytes, mediated by P. falciparum erythrocyte membrane protein 1 (PfEMP1). PfEMP1 variants are encoded by the highly polymorphic family of var genes, the sequences of which are largely unknown in clinical samples. Previously, we published new approaches for var gene profiling and classification of predicted binding phenotypes in clinical P. falciparum isolates (Wichers et al., 2021), which represented a major technical advance. Building on this, we report here a novel method for var gene assembly and multidimensional quantification from RNA-sequencing that outperforms the earlier approach of Wichers et al., 2021, on both laboratory and clinical isolates across a combination of metrics. Importantly, the tool can interrogate the var transcriptome in context with the rest of the transcriptome and can be applied to enhance our understanding of the role of var genes in malaria pathogenesis. We applied this new method to investigate changes in var gene expression through early transition of parasite isolates to in vitro culture, using paired sets of ex vivo samples from our previous study, cultured for up to three generations. In parallel, changes in non-polymorphic core gene expression were investigated. Modest but unpredictable var gene switching and convergence towards var2csa were observed in culture, along with differential expression of 19% of the core transcriptome between paired ex vivo and generation 1 samples. Our results cast doubt on the validity of the common practice of using short-term cultured parasites to make inferences about in vivo phenotype and behaviour.

Introduction

Malaria is a parasitic life-threatening disease caused by species of the Plasmodium genus. In 2021, there were an estimated 619,000 deaths due to malaria, with children under 5 accounting for 77% of these (WHO, 2022). Plasmodium falciparum causes the greatest disease burden and most severe outcomes, but our efforts to combat the disease are challenged by its complex life cycle and its sophisticated immune evasion strategies. P. falciparum has several highly polymorphic variant surface antigens encoded by multi-gene families, with the best studied high molecular weight P. falciparum erythrocyte membrane protein 1 (PfEMP1) family of proteins known to play a major role in the pathogenesis of malaria (Leech et al., 1984; Wahlgren et al., 2017). About 60 polymorphic var genes per parasite genome encode different PfEMP1 variants, which are exported to the surface of parasite-infected erythrocytes, where they mediate cytoadherence to host endothelial cells (Leech et al., 1984; Su et al., 1995; Smith et al., 1995; Baruch et al., 1995; Rask et al., 2010). Var genes are expressed in a mutually exclusive pattern, resulting in each parasite expressing only one var gene, and therefore one PfEMP1 protein, at a time (Scherf et al., 1998). Due to the exposure of PfEMP1 proteins to the host immune system, switching expression between the approximately 60 var genes in the genome is an effective immune evasion strategy, which can result in selection and dominance of parasites expressing particular var genes within each host (Smith et al., 1995).

Despite their sequence polymorphism, var genes could be classified into four categories (A, B, C, and E) according to their chromosomal location, transcriptional direction, type of 5'-upstream sequence (UPSA–E), and encoded protein domains with associated binding phenotype (Figure 1; Lavstsen et al., 2003; Kraemer and Smith, 2003; Kyes et al., 2007; Rask et al., 2010). PfEMP1 proteins have up to 10 extracellular domains, with the N-terminal domains forming a semi-conserved head structure complex typically containing the N-terminal segment (NTS), a Duffy binding-like domain of class α (DBLα) coupled to a cysteine-rich interdomain region (CIDR). C-terminally to this head structure, PfEMP1 proteins exhibit a varying but semi-ordered composition of additional DBL and CIDR domains of different subtypes (Figure 1c). The PfEMP1 family divides into three main groups based on the receptor specificity of the N-terminal CIDR domain: (i) PfEMP1 proteins with CIDRα1 domains bind endothelial protein C receptor (EPCR), while (ii) PfEMP1 proteins with CIDRα2–6 domains bind CD36 and (iii) the atypical VAR2CSA PfEMP1 proteins bind placental chondroitin sulphate A (CSA) (Salanti et al., 2004). In addition to these, a subset of PfEMP1 proteins have N-terminal CIDRβ/γ/δ domains of unknown function. This functional diversification correlates with the genetic organisation of the var genes. Thus, UPSA var genes encode PfEMP1 proteins with domain sub-variants NTSA-DBLα1-CIDRα1/β/γ/δ, whereas UPSB and UPSC var genes encode PfEMP1 proteins with NTSB-DBLα0-CIDRα2–6. One exception to this rule is the B/A chimeric var genes, which encode NTSB-DBLα2-CIDRα1 domains. The different receptor binding specificities are associated with different clinical outcomes of infection. Pregnancy-associated malaria is linked to parasites expressing VAR2CSA, whereas parasites expressing EPCR-binding PfEMP1 are linked to severe malaria and parasites expressing CD36-binding PfEMP1 are linked to uncomplicated malaria (Turner et al., 2013; Lavstsen et al., 2012; Avril et al., 2012; Claessens et al., 2012; Tonkin-Hill et al., 2018; Wichers et al., 2021). The clinical relevance of PfEMP1 proteins with unknown binding phenotypes of the N-terminal head structure and C-terminal PfEMP1 domains is largely unknown, albeit specific interactions with endothelial receptors and plasma proteins have been described (Tuikue Ndam et al., 2017; Quintana et al., 2019; Stevenson et al., 2015). Each parasite genome carries a similar repertoire of var genes, which in addition to the described variants include a highly conserved var1 variant of either type 3D7 or IT, which in most genomes occurs with a truncated or absent exon 2. Also, most genomes carry the unusually small and highly conserved var3 genes, of unknown function (Figure 1c; Otto et al., 2019).

Figure 1

Download asset Open asset

Summary of the *var* chromosomal location, *var* gene, PfEMP1 protein structure, and PfEMP1-binding phenotypes.

(a) Chromosomal position and transcriptional direction (indicated by arrows) of the different *var* gene groups, designated by the respective type of upstream sequence (Kraemer and Smith, 2003; Lavstsen et al., 2003). (b) Structure of the *var* gene which encodes the PfEMP1 protein. The *var* gene is composed of two exons, the first, around 3–9.4 kb, encodes the highly variable extracellular region and the transmembrane region (TM) of PfEMP1. Exon 2 is shorter with about 1.2 kb and encodes a semi-conserved intracellular region (acidic terminal segment [ATS]). The PfEMP1 protein is composed of an N-terminal segment (NTS), followed by a variable number of Duffy binding-like (DBL) domains and cysteine-rich interdomain regions (CIDR) (Rask et al., 2010). (c) Summary of PfEMP1 proteins encoded in the parasite genome, their composition of domain subtypes, and associated N-terminal binding phenotype. Group A and some B proteins have an EPCR-binding phenotype; the vast majority of group B and C PfEMP1 proteins bind to CD36. Group A proteins also include those that bind a yet unknown receptor, as well as VAR1 and VAR3 variants with unknown function and binding phenotype. VAR2CSA (group E) binds placental chondroitin sulphate A (CSA).

Comprehensive characterisation and quantification of var gene expression in field samples have been complicated by biological and technical challenges. The extreme polymorphism of var genes precludes a reference var sequence. Var genes can be lowly expressed or not expressed at all, contain repetitive domains, and can have large duplications (Otto et al., 2019). Consequently, most studies relating var gene expression to severe malaria have relied on primers with restricted coverage of the var family, use of laboratory-adapted parasite strains, or have predicted the downstream sequence from DBLα domains (Sahu et al., 2021; Storm et al., 2019; Shabani et al., 2017; Mkumbaye et al., 2017; Kessler et al., 2017; Bernabeu et al., 2016; Jespersen et al., 2016; Lavstsen et al., 2012). This has resulted in incomplete var gene expression quantification and the inability to elucidate specific or detect atypical var sequences. RNA-sequencing has the potential to overcome these limitations and provide a better link between var expression and PfEMP1 phenotype in in vitro assays, co-expression with other genes or gene families and epigenetics. While approaches for var assembly and quantification based on RNA-sequencing have recently been proposed (Wichers et al., 2021; Stucke et al., 2021; Andrade et al., 2020; Tonkin-Hill et al., 2018; Duffy et al., 2016 ), these still produce inadequate assembly of the biologically important N-terminal domain region, have a relatively high number of misassemblies, and do not provide an adequate solution for handling the conserved var variants (Table 1).

Table 1

Comparison of previous var assembly approaches based on DNA- and RNA-sequencing.

Study	Assembler	k-mer	Transcript or gene assembly	Validation on reference strain(s)(Yes/No)	Validation on field strain(s)(Yes/No)	Validation across different expression levels (Yes/No)	Read length (Short/Long)	Read correction (Yes/No)	Scaffolding(Yes/No)	Var transcript filter approach	Assumption	Other limitations
Duffy et al., 2016	Oases		Transcript	No	No	No	Short	Yes	Yes	Aligned to 399 var genes with BLAST (e-value <10^–5)		– No quantification of misassemblies – Unable to recover full-length transcript assemblies
Dara et al., 2017	Sprai and Celera (no longer maintained)	71	Gene	Yes (strain NF54)	Yes (12 UM patient samples)	NA – only genome assemblies	Long and short	Method assumes combination of long- and short-read sequencing will identify errors	No	>500 bp and aligned to VarDom database	Assumes a whole genome assembly is available	– Require prior filtering of human DNA – Need a combination of short-read and long-read sequencing
Tonkin-Hill et al., 2018 ^*	SoapDeNovo-Trans	21, 31, 41, 52, and 61	Transcript	Yes (strain ITG)	No	No	Short	No	No	>500 bp and containing a sig. annotated var domain		– Unable to fully resolve N-terminus – No quantification of misassemblies
Otto et al., 2019	Masurca +post-assembly improvements	Default	Gene	Yes (clone 3D7)	Yes (15 Pf3k reference genomes)	No	Short	Yes	Yes		Whole genome dataset
Andrade et al., 2020	Velvet	41	Transcript	No	No	No	Short	No	No	Aligned to VarDom database		– No quantification of misassemblies
Stucke et al., 2021	rnaSPAdes	Default	Transcript	No	Yes (6 UM patient samples)	No – only the most expressed var gene	Short	Yes	Unclear	>500 bp and containing a sig. annotated var domain	Information about the true var annotation is available	– Performs de novo assembly on all non-human and P. falciparum mapping reads – Inconsistent results in three samples when comparing genomic and RNA-sequencing results for dominant var gene

*

Also used in Wichers et al., 2021; Guillochon et al., 2022.
UM: uncomplicated malaria.

Plasmodium parasites from human blood samples are often adapted to or expanded through in vitro culture to provide sufficient parasites for subsequent investigation of parasite biology and phenotype (Brown and Guler, 2020). This is also the case for several studies assessing the PfEMP1 phenotype of parasites isolated from malaria-infected donors (Pickford et al., 2021; Joste et al., 2020; Storm et al., 2019; Tuikue Ndam et al., 2017; Bruske et al., 2016; Claessens et al., 2012; Lavstsen et al., 2005; Jensen et al., 2004; Kirchgatter and Hernando del, 2002; Dimonte et al., 2016; Hoo et al., 2019). However, in vitro conditions are considerably different to those found in vivo, e.g., in terms of different nutrient availability and lack of a host immune response (Brown and Guler, 2020). Previous studies found inconsistent results in terms of whether var gene expression is impacted by culture and, if so, which var groups were the most affected (Zhang et al., 2011; Peters et al., 2007). Similar challenges apply to the understanding of changes in P. falciparum non-polymorphic core genes in culture, with the focus previously being on long-term laboratory-adapted parasites (Claessens et al., 2017; Mackinnon et al., 2009). Consequently, direct interpretation of a short-term cultured parasite’s transcriptome remains a challenge. It is fundamental to understand var genes in context with the parasite’s core transcriptome. This could provide insights into var gene regulation and phenomena such as the proposed lower level of var gene expression in asymptomatic individuals (Almelli et al., 2014; Andrade et al., 2020).

Here, we present an improved method for assembly, characterisation, and quantification of var gene expression from RNA-sequencing data. This new approach overcomes previous limitations and outperforms current methods, enabling a much greater understanding of the var transcriptome. We demonstrate the power of this new approach by evaluating changes in var gene expression of paired samples from clinical isolates of P. falciparum during their early transition to in vitro culture, across several generations. The use of paired samples, which are genetically identical and hence have the same var gene repertoire, allows validation of assembled transcripts and direct comparisons of expression. We complement this with a comparison of changes which occur in the non-polymorphic core transcriptome over the same transition into culture. We find a background of modest changes in var gene expression with unpredictable patterns of var gene switching, favouring an apparent convergence towards var2csa expression. More extensive changes were observed in the core transcriptome during the first cycle of culture, suggestive of a parasite stress response.

Results

To extend our ability to characterise var gene expression profiles and changes over time in clinical P. falciparum isolates, we set out to improve current assembly methods. Previous methods for assembling var transcripts have focussed on assembling whole transcripts (Tonkin-Hill et al., 2018; Wichers et al., 2021; Guillochon et al., 2022; Andrade et al., 2020). However, due to the diversity within PfEMP1 domains, their associations with disease severity, and the fact different domain types are not inherited together, a method focussing on domain assembly was first developed. In addition, a novel whole transcript approach, using a different de novo assembler, was developed and their performance compared to the method of Wichers et al. (hereafter termed ‘original approach’, Figure 2; Wichers et al., 2021). The new approaches made use of the MalariaGEN P. falciparum dataset, which led to the identification of additional multi-mapping non-core reads (a median of 3955 reads per sample) prior to var transcript assembly (Ahouidi et al., 2021). We incorporated read error correction and improved large scaffold construction with fewer misassemblies (see Materials and methods). We then applied this pipeline to paired ex vivo and short-term in vitro cultured parasites to enhance our understanding of the impact of short-term culturing on the var transcriptome (Figure 3). The var transcriptome was assessed at several complementary levels: first, changes in the dominantly expressed var gene and the homogeneity of the var expression profile in paired samples were investigated; second, changes in var domain expression through culture were assessed; and third, var group and global var gene expression changes were evaluated. All these analyses on var expression were accompanied by analysis of the core transcriptome at the transition to short-term culture.

Figure 2 with 4 supplements see all

Download asset Open asset

Overview of novel computational pipelines for assembling *var* transcripts.

The original approach (green) used SoapDeNovo-Trans (k=71) to perform whole *var* transcript assembly. The whole transcript approach (blue) focussed on assembling whole *var* transcripts from the non-core reads using rnaSPAdes (k=71). Contigs were then joined into longer transcripts using SSPACE. The domain approach (orange) assembled *var* domains first and then joined the domains into whole transcripts. Domains were assembled separately using three different de novo assemblers (SoapDeNovo2, Oases, and rnaSPAdes). Next, a reference of assembled domains was composed and cd-hit (at sequence identity = 99%) was used to remove redundant sequences. Cap3 was used to merge and extend domain assemblies. Finally, SSPACE was used to join domains together. Hidden Markov models (HMM) built on the Rask et al., 2010, dataset were used to annotate the assembled transcripts (Rask et al., 2010). The most significant alignment was taken as the best annotation for each region of the assembled transcript (significance ≤ 1e-5) identified using cath-resolve-hits0. Transcripts <500 nt were removed. A *var* transcript was selected if it contained at least one significantly annotated domain (in exon 1). *Var* transcripts that encoded only the more conserved exon 2 (acidic terminal segment [ATS] domain) were discarded. The three pipelines were run on the 32 malaria patient ex vivo samples from Wichers et al., 2021.

Figure 3 with 2 supplements see all

Download asset Open asset

Summary of analyses of *var* and core gene transcriptome changes in paired ex vivo and short-term in vitro cultured parasites.

From a total of 13 parasite isolates, the ex vivo samples (Wichers et al., 2021) and the corresponding in vitro cultured parasites of the first (n=13), second (n=10), and third (n=1) replication cycle were analysed by RNA-sequencing. The expression of non-polymorphic core genes and polymorphic *var* genes was determined in different analysis streams: (1) non-polymorphic core gene reads were mapped to the 3D7 reference genome, expression was quantified using HTSeq and differential expression analysis performed (orange); (2) non-core reads were identified, whole transcripts were assembled with rnaSPAdes, expression of both *var* transcripts (red) and domains (light green) was quantified, and *var* domain differential expression analysis was performed. ‘Per patient analysis’ (yellow) represents combining all assembled *var* transcripts for samples originating from the same ex vivo sample only. For each conserved *var* gene (*var1-3D7, var1-IT, var2csa,* and *var3*), all significantly assembled conserved *var* transcripts were identified and put into a combined reference (pink). The normalised counts for each conserved gene were summed. Non-core reads were mapped to this and DESeq2 normalisation performed. *Var* type (group A vs groups B and C) expression (purple) was quantified using the Duffy binding-like domain of class α (DBLα) and N-terminal segment (NTS) assembled sequences and differences across generations were assessed. Total *var* gene expression (turquoise) was quantified by assembling and quantifying the coverage over the highly conserved LARSFADIG motif, with the performance of assembly using Trinity and rnaSPAdes assessed. DBLα-tag data was used to confirm the results of the dominant *var* gene expression analysis and the *var* type analysis (dark green). *Var* expression homogeneity (VEH) was analysed at the patient level (α diversity curves). All differential expression analyses were performed using DESeq2.To ensure a fair comparison of samples, which may contain different proportions of life cycle stages, the performance of two different in silico approaches was evaluated by counting Giemsa-stained thin blood smears (blue).

Improving var transcript assembly, annotation, and quantification

A laboratory and a clinical dataset were used to assess the performance of the different var assembly pipelines (Figure 2). The laboratory dataset was a P. falciparum 3D7 time course RNA-sequencing dataset (European nucleotide archive [ENA]: PRJEB31535) (Wichers et al., 2019). The clinical dataset contained samples from 32 adult malaria patients, hospitalised in Hamburg, Germany (National Center for Biotechnology Information [NCBI] BioProject ID: PRJNA679547). Fifteen were malaria-naïve and 17 were previously exposed to malaria. Eight of the malaria-naïve patients went on to develop severe malaria and 24 had non-severe malaria (Wichers et al., 2021).

Our (i) new whole transcript approach, (ii) domain assembly approach, and (iii) modified version of the original approach (see Materials and methods) were first applied to a P. falciparum 3D7 time course RNA-sequencing dataset to benchmark their performance (Wichers et al., 2019; Figure 2—figure supplement 1). The whole transcript approach performed best, achieving near perfect alignment scores for the dominantly expressed var gene (Figure 2—figure supplement 1a). The domain and the original approach produced shorter contigs and required more contigs to assemble the var transcripts at the 8 and 16 hr post-invasion time points, when var gene expression is maximal (Figure 2—figure supplement 1c, f, g, and h). However, we found high accuracies (>0.95) across all approaches, meaning the sequences we assembled were correct (Figure 2—figure supplement 1b). The whole transcript approach also performed the best when assembling the lower expressed var genes (Figure 2—figure supplement 1e) and produced the fewest var chimeras compared to the original approach on P. falciparum 3D7. Fourteen misassemblies were observed with the whole transcript approach compared to 19 with the original approach (Supplementary file 1). This reduction in misassemblies was particularly apparent in the ring-stage samples.

Next, the assembled transcripts produced from the original approach of Wichers et al., 2021, were compared to those produced from our new whole transcript and domain assembly approaches for ex vivo samples from German travellers. Summary statistics are shown in Table 2. The whole transcript approach produced the fewest transcripts, but of greater length than the domain approach and the original approach (Figure 2—figure supplement 2). The whole transcript approach also returned the largest N50 score (more than doubling the N50 of the original approach), which means that it was the most contiguous assembly produced. Remarkably, with the new whole transcript method, we observed a significant decrease (2 vs 336) in clearly misassembled transcripts with, for example, an N-terminal domain at an internal position.

Table 2

Statistics for the different approaches used to assemble the var transcripts.

Var assembly approaches were applied to malaria patient ex vivo samples (n=32) from Wichers et al., 2021, and statistics determined. Given are the total number of assembled var transcripts longer than 500 nt containing at least one significantly annotated var domain, the maximum length of the longest assembled var transcript in nucleotides and the N50 value, respectively. The N50 is defined as the sequence length of the shortest var contig, with all var contigs greater than or equal to this length together accounting for 50% of the total length of concatenated var transcript assemblies. Misassemblies represent the number of misassemblies for each approach.

	Number of contigs ≥500 nt	Maximum length (nt)	Average contig length (nt)	N50	Number of misassemblies
Original approach	6441	10,412	1621	2302	336
Domain approach	4691	5003	954	1088	NA*
Whole transcript approach	3011	12,586	2771	5381	2

*

Number of misassemblies were not determined for the domain approach due to its poor performance in other metrics.

When genome sequencing is not available, concordance of different var profiling approaches can support the validation of an approach. Here, the same methods used in the original analysis were applied for quantifying the expression of the assembled var transcripts and domains. This suggests any concordance in expression estimates likely reflects concordance at the domain annotation level. The original approach and the new whole transcript approach gave similar results for domain expression in each sample with greater correlation in results observed between the highly expressed domains (Figure 2—figure supplement 3). As expected, comparable results were also seen for the differentially expressed transcripts identified in the original analysis between the naïve vs pre-exposed and severe vs non-severe comparisons, respectively (Figure 2—figure supplement 4).

Overall, the new whole transcript approach performed the best on the laboratory 3D7 dataset (ENA: PRJEB31535) (Wichers et al., 2019), had the greatest N50, the longest var transcripts, and produced concordant results with the original analysis on the clinical ex vivo samples (NCBI: PRJNA679547) (Wichers et al., 2021). Therefore, it was selected for all subsequent analyses unless specified otherwise.

Establishing characterisation of var transcripts from ex vivo and in vitro samples

Of the 32 clinical isolates of P. falciparum from the German traveller dataset, 13 underwent one replication cycle of in vitro culture, 10 of these underwent a second generation and one underwent a third generation (Table 3). Most (9/13, 69%) isolates entering culture had a single MSP1 genotype, indicative of monoclonal infections. All samples were sequenced with a high read depth, although the ex vivo samples had a greater read depth than the in vitro samples (Table 3). Figure 3 shows a summary of the analysis performed.

Table 3

Summary of the clinical dataset used to analyse the impact of parasite culturing on gene expression.

RNA-sequencing was performed on 32 malaria-infected German traveller samples (Wichers et al., 2021). The 32 ex vivo samples were used to compare the performance of the var assembly approaches. Parasites from 13 of these ex vivo samples underwent one cycle of in vitro replication, 10 parasite samples were also subjected to a second cycle of replication in vitro, and a single parasite isolate was also analysed after a third cycle of replication. For the ex vivo vs short-term in vitro cultivation analysis only paired samples were used. The number of assembled var contigs represents results per sample using the whole transcript approach, and shows either the number of assembled var contigs significantly annotated as var gene and ≥500 nt in length, or the number of assembled var transcripts identified with a length ≥ 1500 nt and containing at least three significantly annotated var domains*.

			Generation
		Ex vivo (n=32)		1 (n=13)	2 (n=10)	3 (n=1)
Malaria exposure (n)	Naïve		15	6	4	1
Malaria exposure (n)	Previously exposed		17	7	6	0
Malaria severity (n)	Severe		8	3	1	0
Malaria severity (n)	Non-severe		24	10	9	1
Number of MSP1 genotypes (number of samples)	1		22	9	7	1
	2		4	0	0	0
	3		5	3	0	0
	4		1	1	0	0
Number of P. falciparum PE* reads (non-var) (median, IQR) (million of reads)			34.6 (27.0–36.5)	17.1 (12.9–18.0)	17.2 (12.9–19.1)	15.1
Number of non-core P. falciparum PE* reads (median, IQR) (million of reads)			5.05 (3.62–6.60)	1.16 (1.07–1.40)	1.29 (1.04–1.58)	0.91
Number of assembled var contigs in a sample (≥500 nt) (whole transcript approach) (median, IQR)			53 (44–84)	61 (38–76)	71.5 (48.25–79.5)	75
Number of assembled var contigs in a sample (≥1500 nt and 3 sig. domain annotations) (whole transcript approach) (median, IQR)			20 (7–31)	15.5 (10–26)	15 (10.25–23.75)	18

*

PE, paired-end reads.

To account for differences in parasite developmental stage within each sample, which are known to impact gene expression levels (Bozdech et al., 2003), the proportions of life cycle stages were estimated using the mixture model approach of the original analysis (Tonkin-Hill et al., 2018; Wichers et al., 2021). As a complementary approach, single-cell differential composition analysis (SCDC) with the Malaria Cell Atlas as a reference was also used to determine parasite age (Dong et al., 2021; Howick et al., 2019). SCDC and the mixture model approaches produced concordant estimates that most parasites were at ring stage in all ex vivo and in vitro samples (Figure 3—figure supplement 1a and b). Whilst there was no significant difference in ring-stage proportions across the generations, we observed a slight increase in parasite age in the cultured samples. Overall, there were more rings and early trophozoites in the ex vivo samples compared to the cultured parasite samples and an increase of late trophozoite, schizont, and gametocyte proportions during the culturing process (Figure 3—figure supplement 1c). The estimates produced from the mixture model approach showed high concordance with those observed by counting Giemsa-stained blood smears (Figure 3—figure supplement 1d). Due to the potential confounding effect of differences in stage distribution on gene expression, we adjusted for developmental stage determined by the mixture model in all subsequent analyses.

Our new approach was applied to RNA-sequencing samples of ex vivo and short-term in vitro cultured parasites from German travellers (Wichers et al., 2021). Supplementary file 2 shows the assembled var transcripts on a per sample basis. Interestingly, we observed SSPACE did not provide improvement in terms of extending var assembled contigs in 9/37 samples. We observed a significant increase in the number of assembled var transcripts in generation 2 parasites compared to paired generation 1 parasites (p_adj = 0.04, paired Wilcoxon test). We observed no significant differences in the length of the assembled var transcripts across the generations. Three different filtering approaches were applied in comparison to maximise the likelihood that correct assemblies were taken forward for further analysis and to avoid the overinterpretation of lowly expressed partial var transcripts (Supplementary file 3). Filtering for var transcripts at least 1500 nt long and containing at least three significantly annotated var domains was the least restrictive, while the other approaches required the presence of a DBLα domain within the transcript. All three filtering approaches generated the same maximum length var transcript and similar N50 values. This suggests minimal differences in the three filtering approaches, whilst highlighting the importance of filtering assembled var transcripts.

In the original approach of Wichers et al., 2021, the non-core reads of each sample used for var assembly were mapped against a pooled reference of assembled var transcripts from all samples, as a preliminary step towards differential var transcript expression analysis. This approach returned a small number of var transcripts which were expressed across multiple patient samples (Figure 3—figure supplement 2a). As genome sequencing was not available, it was not possible to know whether there was truly overlap in var genomic repertoires of the different patient samples, but substantial overlap was not expected. Stricter mapping approaches (e.g. excluding transcripts shorter than 1500 nt) changed the resulting var expression profiles and produced more realistic scenarios where similar var expression profiles were generated across paired samples, whilst there was decreasing overlap across different patient samples (Figure 3—figure supplement 2b and c). Given this limitation, we used the paired samples to analyse var gene expression at an individual subject level, where we confirmed the MSP1 genotypes and alleles were still present after short-term in vitro cultivation. The per patient approach showed consistent expression of var transcripts within samples from each patient but no overlap of var expression profiles across different patients (Figure 3—figure supplement 2d). Taken together, the per patient approach was better suited for assessing var transcriptional changes in longitudinal samples. However, it has been hypothesised that more conserved var genes in field isolates increase parasite fitness during chronic infections, necessitating the need to correctly identify them (Dimonte et al., 2020, Otto et al., 2019). Accordingly, further work is needed to optimise the pooled sample approach to identify truly conserved var transcripts across different parasite isolates in cross-sectional studies.

Longitudinal analysis of var transcriptome from ex vivo to in vitro samples

To assess the changes in the var transcriptome induced by parasite culturing, we performed a series of analyses, all of which addressed different aspects: (i) changes in individual var gene expression pattern and var expression homogeneity (VEH) (‘per patient analysis’), (ii) changes in the expression of var variants conserved between strains, (iii) changes in the expression of PfEMP1 domains, (iv) changes in expression at the var group level, and (v) at the overall var expression level. We validated our results using the DBLα-tag approach and complemented the var-specific analysis by also examining changes in the core transcriptome.

To investigate whether dominant var gene expression changes through in vitro culture, rank analysis of var transcript expression was performed (Figure 4, Figure 4—figure supplement 1). In most cases a single dominant var transcript was detected. The dominant var gene did not change in most patient samples and the ranking of var gene expression remained similar. However, we observed a change in the dominant var gene being expressed through culture in isolates from 3 of 13 (23%) patients (#6, #17, and #26). Changes in the dominant var gene expression were also observed in the DBLα-tag data for these patients (described below). In parasites from three additional patients, #1, #7, and #14, the top expressed var gene remained the same, however we observed a change in the ranking of other highly expressed var genes in the cultured samples compared to the ex vivo sample. Interestingly, in patient #26 we observed a switch from a dominant group A var gene to a group B and C var gene. This finding was also observed in the DBLα-tag analysis (results below). A similar finding was seen in patient #7. In the ex vivo sample, the second most expressed var transcript was a group A transcript. However, in the cultured samples expression of this transcript was reduced and we observed an increase in the expression of group B and C var transcripts. A similar pattern was observed in the DBLα-tag analysis for patient #7, whereby the expression of a group A transcript was reduced during the first cycle of cultivation. Overall, the data suggest that some patient samples underwent a larger var transcriptional change when cultured compared to the other patient samples and that culturing parasites can lead to an unpredictable var transcriptional change.

Figure 4 with 3 supplements see all

Download asset Open asset

Rank *var* gene expression analysis.

For each patient, the paired ex vivo (n=13) and in vitro samples (generation 1: n=13, generation 2: n=10, generation 3: n=1) were analysed. The assembled *var* transcripts with at least 1500 nt and containing three significantly annotated *var* domains across all the generations for a patient were combined into a reference, redundancy was removed using cd-hit (at sequence identity = 99%), and expression was quantified using Salmon. *Var* transcript expression was ranked. Plots show the top 10 *var* gene expression rankings for each patient and their ex vivo and short-term in vitro cultured parasite samples. Group A *var* transcripts (red), group B or C *var* transcripts (blue), group E *var* transcripts (green), and transcripts of unknown *var* group (orange).

In line with these results, VEH on a per patient basis showed in some patients a clear change, with the ex vivo sample diversity curve distinct from those of in vitro generation 1 and generation 2 samples (patients #1, #2, #4) (Figure 4—figure supplement 2). Similarly, in other patient samples, we observed a clear difference in the curves of ex vivo and generation 1 samples (patients #25 and #26, both from first-time infected severe malaria patients). Some of these samples (#1 and #26, both from first-time infected severe malaria patients) also showed changes in their dominant var gene expression during culture, taken together indicating much greater var transcriptional changes in vitro compared to the other samples.

Expression of conserved var gene variants through short-term in vitro culture

Due to the relatively high level of conservation observed in var1, var2csa, and var3, they do not present with the same limitations as regular var genes. Therefore, changes in their expression through short-term culture was investigated across all samples together. We observed no significant differences in the expression of conserved var gene variants, var1-IT (p_adj = 0.61, paired Wilcoxon test), var1-3D7 (p_adj = 0.93, paired Wilcoxon test), and var2csa (p_adj = 0.54, paired Wilcoxon test) between paired ex vivo and generation 1 parasites, but var2csa was significantly differentially expressed between generation 1 and generation 2 parasites (p_adj = 0.029, paired Wilcoxon test) (Figure 4—figure supplement 3). However, var2csa expression previously appeared to have decreased in some paired samples during the first cycle of cultivation (Figure 4—figure supplement 3).

Differential expression of var domains from ex vivo to in vitro samples

There is overlap in PfEMP1 domain subtypes of different parasite isolates which can be associated with var gene groups and receptor binding phenotypes. This allows performing differential expression analysis on the level of encoded PfEMP1 domain subtypes, as done in previous studies (Tonkin-Hill et al., 2018; Wichers et al., 2021). PCA on var domain expression (Figure 5a) showed some patients’ ex vivo samples clustering away from their respective generation 1 sample (patients #1, #2, #4, #12, #17, #25), again indicating a greater var transcriptional change relative to the other samples during the first cycle of cultivation. However, in the pooled comparison of the generation 1 vs ex vivo of all isolates, a single domain was significantly differentially expressed, CIDRα2.5 associated with B-type PfEMP1 proteins and CD36 binding (Figure 5b). In the generation 2 vs ex vivo comparison, there were no domains significantly differentially expressed, however we observed large log₂FC values in similar domains to those changing most in the ex vivo vs generation 1 comparison (Figure 5c). No differentially expressed domains were found in the generation 1 vs generation 2 comparison. These results suggest that individual changes in var expression are not reflected in the pooled analysis and the per patient approach is more suitable.

Figure 5

Download asset Open asset

*Var* domain transcriptome analysis through short-term in vitro culture.

*Var* transcripts for paired ex vivo (n=13), generation 1 (n=13), generation 2 (n=10), and generation 3 (n=1) were de novo assembled using the whole transcript approach. *Var* transcripts were filtered for those ≥ 1500 nt in length and containing at least three significantly annotated *var* domains. Transcripts were annotated using hidden Markov models (HMM) built on the Rask et al., 2010, dataset (Rask et al., 2010). When annotating the whole transcript, the most significant alignment was taken as the best annotation for each region of the assembled transcript (e-value cut-off 1e-5). Multiple annotations were allowed on the transcript if they were not overlapping, determined using cath-resolve-hits. *Var* domain expression was quantified using FeatureCounts and the domain counts aggregated. (a) PCA plot of log₂ normalised read counts (adjusted for life cycle stage, derived from the mixture model approach). Points are coloured by their generation (ex vivo, purple; generation 1, red; generation 2, green; and generation 3, blue) and labelled by their patient identity. (b) Volcano plot showing extent and significance of up- or downregulation of *var* domain expression in ex vivo (n=13) compared with paired generation 1 cultured parasites (n=13) (red and blue, p<0.05 after Benjamini-Hochberg adjustment for FDR; red and green, absolute log₂ fold change [log₂FC] in expression ≥ 2). Domains with a log₂FC ≥ 2 represent those upregulated in generation 1 parasites. Domains with a log₂FC ≤ –2 represent those downregulated in generation 1 parasites. (c) Volcano plot showing extent and significance of up- or downregulation of *var* domain expression in ex vivo (n=10) compared with paired generation 2 cultured parasites (n=10) (green, absolute log₂ fold change [log₂FC] in expression ≥ 2). Domains with a log₂FC ≥ 2 represent those upregulated in generation 2 parasites. Domains with a log₂FC ≤ –2 represent those downregulated in generation 2 parasites. Differential expression analysis was performed using DESeq2 (adjusted for life cycle stage, derived from the mixture model approach).

Var group expression analysis

A previous study found group A var genes to have a rapid transcriptional decline in culture compared to group B var genes, however another study found a decrease in both group A and group B var genes in culture (Zhang et al., 2011; Peters et al., 2007). These studies were limited as the var type was determined by analysing the sequence diversity of DBLα domains, and by quantitative PCR (qPCR) methodology which restricts analysis to quantification of known/conserved sequences. Due to these results, the expression of group A var genes vs. group B and C var genes was investigated using a paired analysis on all the DBLα (DBLα1 vs DBLα0 and DBLα2) and NTS (NTSA vs NTSB) sequences assembled from ex vivo samples and across multiple generations in culture. A linear model was created with group A expression as the response variable, the generation and life cycle stage as independent variables, and the patient information included as a random effect. The same was performed using group B and C expression levels.

In both approaches, DBLα and NTS, we found no significant changes in total group A or group B and C var gene expression levels (Figure 6). We observed high levels of group B and C var gene expression compared to group A in all patients, both in the ex vivo samples and in the in vitro samples. In some patients we observed a decrease in group A var genes from ex vivo to generation 1 (patients #1, #2, #5, #6, #9, #12, #17, #26) (Figure 6a), however in all but four patients (patient #1, #2, #5, #6) the levels of group B and C var genes remained consistently high from ex vivo to generation 1 (Figure 6b). Interestingly, patients #6 and #17 also had a change in the dominant var gene expression through culture. Taken together with the preceding results, it appears that observed differences in var transcript expression occurring with transition to short-term culture are not due to modulation of recognised var classes, but due to differences in expression of particular var transcripts.

Figure 6 with 2 supplements see all

Download asset Open asset

*Var* group expression analysis through short-term in vitro culture.

The Duffy binding-like domain of class α (DBLα) domain sequence for each transcript was determined and for each patient a reference of all assembled DBLα domains combined. Group A *var* genes possess DBLα1 domains, some group B encode DBLα2 domains, and groups B and C encode DBLα0 domains. Domains were grouped by type and their expression summed. The relevant sample’s non-core reads were mapped to this using Salmon and DBLα expression quantified. DESeq2 normalisation was performed, with patient identity and life cycle stage proportions included as covariates. A similar approach was repeated for N-terminal segment (NTS) domains. Group A *var* genes encode NTSA compared to group B and C *var* genes which encode NTSB. Boxplots show log₂ normalised Salmon read counts for (a) group A *var* gene expression through cultured generations assessed using the DBLα domain sequences, (b) group B and C *var* gene expression through cultured generations assessed using the DBLα domain sequences, (c) group A *var* gene expression through cultured generations assessed using the NTS domain sequences, and (d) group B and C *var* gene expression through cultured generations assessed using the NTS domain sequences. Different coloured lines connect paired patient samples through the generations: ex vivo (n=13), generation 1 (n=13), generation 2 (n=10), and generation 3 (n=1). Axis shows different scaling.

Quantification of total var gene expression

We observed a trend of decreasing total var gene expression between generations irrespective of the assembler used in the analysis (Figure 6—figure supplement 1). A similar trend is seen with the LARSFADIG count, which is commonly used as a proxy for the number of different var genes expressed (Otto et al., 2019). A linear model was created (using only paired samples from ex vivo and generation 1) (Supplementary file 4) with proportion of total gene expression dedicated to var gene expression as the response variable, the generation and life cycle stage as independent variables, and the patient information included as a random effect. This model showed no significant differences between generations, suggesting that differences observed in the raw data may be a consequence of small changes in developmental stage distribution in culture.

Validation of var expression profiling by DBLα-tag sequencing

Deep sequencing of RT-PCR-amplified DBLα expressed sequence tags combined with prediction of the associated transcripts and their encoded domains using the Varia tool (Mackenzie et al., 2022) was performed to supplement the RNA-sequencing analysis. The raw Varia output file is given in Supplementary file 5. Overall, we found a high agreement between the detected DBLα-tag sequences and the de novo assembled var transcripts. A median of 96% (IQR: 93–100%) of all unique DBLα-tag sequences detected with >10 reads were found in the RNA-sequencing approach. This is a significant improvement on the original approach (p=0.0077, paired Wilcoxon test), in which a median of 83% (IQR: 79–96%) was found (Wichers et al., 2021). To allow for a fair comparison of the >10 reads threshold used in the DBLα-tag approach, the upper 75th percentile of the RNA-sequencing-assembled DBLα domains were analysed. A median of 77.4% (IQR: 61–88%) of the upper 75th percentile of the assembled DBLα domains were found in the DBLα-tag approach. This is a lower median percentage than the median of 81.3% (IQR: 73–98%) found in the original analysis (p=0.28, paired Wilcoxon test) and suggests the new assembly approach is better at capturing all expressed DBLα domains.

The new whole transcript assembly approach also had high consistency with the domain annotations predicted from Varia. Varia predicts var sequences and domain annotations based on short sequence tags, using a database of previously defined var sequences and annotations (Mackenzie et al., 2022). A median of 85% of the DBLα annotations and 73% of the DBLα-CIDR domain annotations, respectively, identified using the DBLα-tag approach were found in the RNA-sequencing approach. This further confirms the performance of the whole transcript approach and it was not restricted by the pooled approach of the original analysis. We also observed consistent results with the per patient analysis, in terms of changes in the dominant var gene expression (described above) (Supplementary file 5). In line with the RNA-sequencing data, the DBLα-tag approach revealed no significant differences in group A and groups B and C during short-term culture, further highlighting the agreement of both methods (Figure 6—figure supplement 2).

Differential expression analysis of the core transcriptome between ex vivo and in vitro samples

Given the modest changes in var gene expression repertoire upon culture, we wanted to investigate the extent of any accompanying changes in the core parasite transcriptome. PCA was performed on core gene (var, rif, stevor, surf, and rRNA genes removed) expression, adjusted for life cycle stage. We observed distinct clustering of ex vivo, generation 1, and generation 2 samples, with patient identity having much less influence (Figure 7a). There was also a change from the heterogeneity between the ex vivo samples to more uniform clustering of the generation 1 samples (Figure 7a), suggesting that during the first cycle of cultivation the core transcriptomes of different parasite isolates become more alike.

Figure 7 with 3 supplements see all

Download asset Open asset

Core gene transcriptome analysis of ex vivo and short-term in vitro cultured samples.

Core gene expression was assessed for paired ex vivo (n=13), generation 1 (n=13), generation 2 (n=10), and generation 3 (n=1) parasite samples. Subread align was used, as in the original analysis, to align the reads to the human genome and *P. falciparum* 3D7 genome, *with var, rif, stevor, surf,* and *rRNA* genes removed. HTSeq count was used to quantify gene counts. (a) PCA plot of log₂ normalised read counts. Points are coloured by their generation (ex vivo: purple, generation 1: red, generation 2: green, and generation 3: blue) and labelled by their patient identity. (b) Volcano plot showing extent and significance of up- or downregulation of core gene expression in ex vivo (n=13) compared with paired generation 1 cultured parasites (n=13) and (c) in ex vivo (n=10) compared with paired generation 2 cultured parasites (n=10). Dots in red and blue represent those genes with p<0.05 after Benjamini-Hochberg adjustment for FDR, red and green dots label genes with absolute log₂ fold change (log₂FC) in expression ≥ 2. Accordingly, genes with a log₂FC ≥ 2 represent those upregulated in generation 1 parasites and genes with a log₂FC ≤ –2 represent those downregulated in generation 1 parasites. Normalised read counts of the core gene analysis were adjusted for life cycle stage, derived from the mixture model approach.

In total, 920 core genes (19% of the core transcriptome) were found to be differentially expressed after adjusting for life cycle stages using the mixture model approach between ex vivo and generation 1 samples (Supplementary file 6). The majority were upregulated, indicating a substantial transcriptional change during the first cycle of in vitro cultivation (Figure 7b). 74 genes were found to be upregulated in generation 2 when compared to the ex vivo samples, many with log₂FC greater than those in the ex vivo vs generation 1 comparison (Figure 7c). No genes were found to be significantly differentially expressed between generation 1 and generation 2. However, five genes had a log₂FC ≥ 2 and were all upregulated in generation 2 compared to generation 1. Interestingly, the gene with the greatest fold change, encoding ROM3 (PF3D7_0828000), was also found to be significantly downregulated in generation 1 parasites in the ex vivo vs generation 1 analysis. The other four genes were also found to be non-significantly downregulated in generation 1 parasites in the ex vivo vs generation 1 analysis. This suggests that changes in gene expression during the first cycle of cultivation are the greatest compared to the other cycles.

The most significantly upregulated genes (in terms of fold change) in generation 1 contained several small nuclear RNAs, splicesomal RNAs, and non-coding RNAs (ncRNAs). 16 ncRNAs were found upregulated in generation 1, with several RNA-associated proteins having large fold changes (log₂FC >7). Significant gene ontology (GO) terms and Kyoto encyclopedia of genes and genomes (KEGG) pathways for the core genes upregulated in generation 1 included ‘entry into host’, ‘movement into host’, and ‘cytoskeletal organisation’ suggesting the parasites undergo a change in invasion efficiency, which is connected to the cytoskeleton, during their first cycle of in vitro cultivation (Figure 7—figure supplement 1). We observed eight AP2 transcription factors upregulated in generation 1 (PF3D7_0404100/AP2-SP2, PF3D7_0604100/SIP2, PF3D7_0611200/AP2-EXP2, PF3D7_0613800, PF3D7_0802100/AP2-LT, PF3D7_1143100/AP2-O, PF3D7_1239200, PF3D7_1456000/AP2-HC) with no AP2 transcription factors found to be downregulated in generation 1. To confirm the core gene expression changes identified were not due to the increase in parasite age during culture, as indicated by upregulation of many schizont-related genes, core gene differential expression analysis was performed on paired ex vivo and generation 1 samples that contained no schizonts or gametocytes in generation 1. The same genes were identified as significantly differentially expressed with a Spearman’s rank correlation of 0.99 for the log₂FC correlation between this restricted sample approach and those produced using all samples (Figure 7—figure supplement 2).

Cultured parasites as surrogates for assessing the in vivo core gene transcriptome

In the original analysis of ex vivo samples, hundreds of core genes were identified as significantly differentially expressed between pre-exposed and naïve malaria patients. We investigated whether these differences persisted after in vitro cultivation. We performed differential expression analysis comparing parasite isolates from naïve (n=6) vs pre-exposed (n=7) patients, first between their ex vivo samples, and then between the corresponding generation 1 samples. Interestingly, when using the ex vivo samples, we observed 206 core genes significantly upregulated in naïve patients compared to pre-exposed patients (Figure 7—figure supplement 3a). Conversely, we observed no differentially expressed genes in the naïve vs pre-exposed analysis of the paired generation 1 samples (Figure 7—figure supplement 3b). Taken together with the preceding findings, this suggests one cycle of cultivation shifts the core transcriptomes of parasites to be more alike each other, diminishing inferences about parasite biology in vivo.

A summary describing the rationale, results, and interpretation of each approach in our analysis pipeline can be found in Table 4.

Table 4

Summary of the different levels of analysis performed to assess the effect of short-term parasite culturing on var and core gene expression, their rational, method, results, and interpretation.

Analysis level	Analysis	Rationale	Method	Results	Interpretation
var transcript	Per patient expression ranking	Relative quantification of var transcripts over consecutive generations of parasites originating from the same patient to reveal var gene switching events	Combine assembled var transcripts for each patient into a reference and quantify expression, validated with DBLα-tag analysis	46% of the patient samples had a change in the dominant or top 3 highest expressed var gene	Modest changes in most samples, but unpredictable var gene switching during culture in some samples
	Per patient var expression homogeneity (VEH)	Determine the overall diversity of var gene expression (number of different variants expressed and their abundance) to assess impact of culturing on the overall var gene expression pattern	Comparison of diversity curves based on per patient quantification of the var transcriptome	39% of ex vivo samples diversity curves distinct from in vitro samples	Some patient samples underwent a much greater var transcriptional change compared to others
	Conserved var variants	Assessing and comparing the expression levels of strain-transcendent var gene variants (var1, var2csa, var3) between samples	Reference of all assembled transcripts for each conserved var gene and quantify expression	var2csa expression increases in second in vitro generation	Parasites converge to var2csa during short-term in vitro culture
var-encoded PfEMP1 domains	Differential expression of PfEMP1 domains	Identification, quantification, and comparison of expression levels of different var gene-encoded PfEMP1 domains associated with different disease manifestations	Pool all assembled var transcripts into a reference and quantify expression of each domain	46% of the ex vivo samples cluster away from their in vitro samples in PCA plots, distinct clustering by in vitro generation was not observed; CIDRα2.5 significantly differentially expressed between ex vivo and generation 1	Transition to culture results in modest modulation of particular var domains
var group	Expression of NTS (NTSA vs NTSB) and DBLα (DBLα1 vs DBLα0+DBLα2)	Quantification and comparison of expression levels of different var gene groups (group A vs. groups B and C)	Create a reference of all assembled DBLα and NTS domains for each patient and quantify expression. Validated with DBLα-tag analysis	No significant changes	No preferential up- or downregulation of certain var groups during transition to culture
Global var expression	LARSFADIG coverage	Assessing the overall var gene expression level (excluding var2csa)	Assemble the LARSFADIG motif and map non-core reads to quantify coverage	Trend for decrease in global var expression during culture, but no significant changes	Subtle reduction in global var gene expression may reflect increase in parasite age during culture
Core genes	Differential gene expression (DGE)	Assessing the impact of cultivation on the parasite core gene transcriptome	Differential expression analysis of core genes (P. falciparum 3D7 used as reference)	19% of the core transcriptome significantly differentially expressed between paired ex vivo and generation 1 in vitro samples; distinct clustering by parasite generation observed; upregulation of invasion and replication-related genes in vitro	Parasites core gene expression changes substantially upon entering culture

Discussion

Multiple lines of evidence point to PfEMP1 as a major determinant of malaria pathogenesis, but previous approaches for characterising var expression profiles in field samples have limited in vivo studies of PfEMP1 function, regulation, and association with clinical symptoms (Tarr et al., 2018; Lee et al., 2018; Warimwe et al., 2013; Rorick et al., 2013; Zhang et al., 2011; Taylor et al., 2002). A more recent approach, based on RNA-sequencing, overcame many of the limitations imposed by the previous primer-based methods (Tonkin-Hill et al., 2018; Wichers et al., 2021). However, depending on the expression level and sequencing depth, var transcripts were found to be fragmented and only a partial reconstruction of the var transcriptome was achieved (Tonkin-Hill et al., 2018; Wichers et al., 2021; Andrade et al., 2020; Guillochon et al., 2022; Yamagishi et al., 2014). The present study developed a novel approach for var gene assembly and quantification that overcomes many of these limitations.

Our new approach used the most geographically diverse reference of var gene sequences to date, which improved the identification of reads derived from var transcripts. This is crucial when analysing patient samples with low parasitaemia where var transcripts are hard to assemble due to their low abundancy (Guillochon et al., 2022). Our approach has wide utility due to stable performance on both laboratory-adapted and clinical samples. Concordance in the different var expression profiling approaches (RNA-sequencing and DBLα-tag) on ex vivo samples increased using the new approach by 13%, when compared to the original approach (96% in the whole transcript approach compared to 83% in Wichers et al., 2021). This suggests the new approach provides a more accurate method for characterising var genes, especially in samples collected directly from patients. Ultimately, this will allow a deeper understanding of relationships between var gene expression and clinical manifestations of malaria.

Having a low number of long contigs is desirable in any de novo assembly. This reflects a continuous assembly, as opposed to a highly fragmented one where polymorphic and repeat regions could not be resolved (Lischer and Shimizu, 2017). An excessive number of contigs cannot be reasonably handled computationally and results from a high level of ambiguity in the assembly (Yang et al., 2012). We observed a greater than 50% reduction in the number of contigs produced in our new approach, which also had a 21% increase in the maximum length of the assembled var transcripts, when compared to the original approach. It doubled the assembly continuity and assembled an average of 13% more of the var transcripts. This was particularly apparent in the N-terminal region, which has often been poorly characterised by existing approaches. The original approach failed to assemble the N-terminal region in 58% of the samples, compared to just 4% in the new approach with assembly consistently achieved with an accuracy >90%. This is important because the N-terminal region is known to contribute to the adhesion phenotypes of most PfEMP1 proteins.

The new approach allows for var transcript reconstruction across a range of expression levels, which is required when characterising var transcripts from multi-clonal infections. Assembly completeness of the lowly expressed var genes increased fivefold using the new approach. Biases towards certain parasite stages have been observed in non-severe and severe malaria cases, so it is valuable to assemble the var transcripts from different life cycle stages (Tonkin-Hill et al., 2018). Our new approach is not limited by parasite stage. It was able to assemble the whole var transcript, in a single contig, at later stages in the P. falciparum 3D7 intra-erythrocytic cycle, something previously unachievable. The new approach allows for a more accurate and complete picture of the var transcriptome. It provides new perspectives for relating var expression to regulation, co-expression, epigenetics, and malaria pathogenesis. It can be applied, for example, in the analysis of patient samples with different clinical outcomes and longitudinal tracking of infections in vivo. It represents a crucial improvement for quantifying the var transcriptome. In this work, the improved approach for var gene assembly and quantification was used to characterise var gene expression during transition from in vivo to short-term culture.

This study had substantial power through the use of paired samples. However, many var gene expression studies do not have longitudinal sampling. Future work should focus on identifying the best approach for analysing the var transcripts in cross-sectional samples. Higher-level var classification systems, such as the PfEMP1 predicted binding phenotype or domain cassettes, could be applied to test for over-representation of different var gene features in different groups of interest, because the assumption of overlapping var repertoires at these levels of classification would be more realistic. This was briefly explored in our analysis through var domain differential expression analysis, which found minimal changes in var domain expression through short-term culture, supporting the per patient analysis results. This could be further improved by advancing the classifications of domain subtypes. This has recently been studied using MEME to identify short nucleotide motifs that are representative of domain subtypes (Otto et al., 2019). Other research could investigate clustering var transcripts based on sequence identity and testing for clusters associated with specific malaria disease groups.

Studies have been performed investigating differences between long-term laboratory-adapted clones and clinical isolates, with hundreds of genes found to be differentially expressed (Hoo et al., 2019; Tarr et al., 2018; Mackinnon et al., 2009). Surprisingly, studies investigating the impact of short-term culture on parasites are extremely limited, despite it being commonly undertaken for making inferences about the in vivo transcriptome (Vignali et al., 2011). Using the new var assembly approach, we found that var gene expression remains relatively stable during transition to culture. However, the conserved var2csa had increased expression from generation 1 to generation 2. It has previously been suggested that long-term cultured parasites converge to expressing var2csa, but our findings suggest this begins within two cycles of cultivation (Zhang et al., 2022; Mok et al., 2008). Switching to var2csa has been shown to be favourable and is suggested to be the default var gene upon perturbation to var-specific heterochromatin (Ukaegbu et al., 2015). These studies also suggested that var2csa has a unique role in var gene switching and our results are consistent with the role of var2csa as the dominant ‘sink node’ (Zhang et al., 2022; Ukaegbu et al., 2015; Ukaegbu et al., 2014; Mok et al., 2008). A previous study suggested that in vitro cultivation of controlled human malaria infection samples resulted in dramatic changes in var gene expression (Lavstsen et al., 2005; Peters et al., 2007). Almost a quarter of samples in our analysis showed more pronounced and unpredictable changes. In these individuals, the dominant var gene being expressed changed within one cycle of cultivation. This implies that short-term culture can result in unpredictable var gene expression as observed previously using a semi-quantitative RT-PCR approach (Bachmann et al., 2011) and that one would need to confirm in vivo expression matches in vitro expression. This can be achieved using the assembly approach described here.

We observed no generalised pattern of up- or downregulation of specific var groups following transition to culture. This implies that there is probably not a selection event occurring during culture but may represent a loss of selection that is present in vivo. A global downregulation of certain var groups might only occur as a selective process over many cycles in extended culture. Determining changes in var group expression levels are difficult using degenerate qPCR primers bias and previous studies have found conflicting results in terms of changes of expression of var groups through cultivation. Zhang et al., 2011, found a rapid transcriptional decline of group A and group B var genes, however Peters et al., 2007, found group A var genes to have a high rate of downregulation, when compared to group B var genes. These studies differed in the stage distribution of the parasites and were limited in measuring enough variants through their use of primers. Our new approach allowed for the identification of more sequences, with 26.6% of assembled DBLα domains not found via the DBLα-tag approach. This better coverage of the expressed var diversity was not possible in these previous studies and may explain discrepancies observed.

Generally, there was a high consensus between all levels of var gene analysis and changes observed during short-term in vitro cultivation. However, the impact of short-term culture was the most apparent at the var transcript level and became less clear at the var domain, var type and global var gene expression level. This highlights the need for accurate characterisation of full-length var transcripts and analysis of the var transcriptome at different levels, both of which can be achieved with the new approach developed here.

We saw striking changes in the core gene transcriptomes between ex vivo and generation 1 parasites with 19% of the core genome being differentially expressed. A previous study showed that expression of 18% of core genes were significantly altered after ~50 cycles through culture (Mackinnon et al., 2009), but our data suggest that much of this change occurs early in the transition to culture. We observed that genes with functions unrelated to ring-stage parasites were among those most significantly expressed in the generation 1 vs ex vivo analysis, suggesting the culture conditions may temporarily dysregulate stage-specific expression patterns or result in the parasites undergoing a rapid adaptation response (Andreadaki et al., 2020; Beeson et al., 2016). Several AP2 transcription factors (AP2-SP2, AP2-EXP2, AP2-LT, AP2-O, and AP2-HC) were upregulated in generation 1. AP2-HC has been shown to be expressed in asexual parasites (Carrington et al., 2021). AP2-O is thought to be specific for the ookinete stage and AP2-SP2 plays a key role in sporozoite stage-specific gene expression (Kaneko et al., 2015; Yuda et al., 2010). Our findings are consistent with another study investigating the impact of long-term culture (Mackinnon et al., 2009) which also found genes like merozoite surface proteins differentially expressed, however they were downregulated in long-term cultured parasites, whereas we found them upregulated in generation 1. This suggests that short-term cultured parasites might be transcriptionally different from long-term cultured parasites, especially in their invasion capabilities, something previously unobserved. Several genes involved in the stress response of parasites were upregulated in generation 1, e.g., DnaJ proteins, serine proteases, and ATP-dependent CLP proteases (Oakley et al., 2007). The similarity of the core transcriptomes of the in vitro samples compared to the heterogeneity seen in the ex vivo samples could be explained by a stress response upon entry to culture. Studies investigating whether the dysregulation of stage-specific expression and the expression of stress-associated genes persist in long-term culture are required to understand whether they are important for growth in culture. Critically, the marked differences presented here suggest that the impact of short-term culture can override differences observed in both the in vivo core and var transcriptomes of different disease manifestations.

In summary, we present an enhanced approach for var transcript assembly which allows for var gene expression to be studied in connection to P. falciparum’s core transcriptome through RNA-sequencing. This will be useful for expanding our understanding of var gene regulation and function in in vivo samples. As an example of the capabilities of the new approach, the method was used to quantify differences in gene expression upon short-term culture adaptation. This revealed that inferences from clinical isolates of P. falciparum put into short-term culture must be made with a degree of caution. Whilst var gene expression is often maintained, unpredictable switching does occur, necessitating that the similarity of in vivo and in vitro expression should be confirmed. The more extreme changes in the core transcriptome could have much bigger implications for understanding other aspects of parasite biology such as growth rates and drug susceptibility and raise a need for additional caution. Further work is needed to examine var and core transcriptome changes during longer-term culture on a larger sample size. Understanding the ground truth of the var expression repertoire of Plasmodium field isolates still presents a unique challenge and this work expands the database of var sequences globally. The increase in long-read sequencing and the growing size of var gene databases containing isolates from across the globe will help overcome this issue in future studies.

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Biological sample (Plasmodium falciparum)	Isolate #1	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #2	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #3	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #4	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #5	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #6	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #7	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #9	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #10	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #11	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #12	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #13	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #14	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #15	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #16	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #17	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #18	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #19	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #20	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #21	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #22	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #23	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #24	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #25	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #26	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #27	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #28	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #29	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #30	Patient isolate Wichers et al., 2021		Freshly isolated from a naïve, severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #31	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #32	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Biological sample (Plasmodium falciparum)	Isolate #33	Patient isolate Wichers et al., 2021		Freshly isolated from a pre-exposed, non-severely ill malaria patient
Others	Plasmodipur filter	EuroProxima	Transia Cat. #: 8011Filter25u	Filter to remove residual granulocytes from erythrocyte pellet after Ficoll gradient centrifugation (see Materials and methods section ‘Blood sampling, processing, and in vitro cultivation of P. falciparum’)
Others	TRIzol	ThermoFisher	Cat. #: 15596026	Commercial reagent
Commercial assay or kit	RNeasy MinElute Cleanup Kit	QIAGEN	Cat. #: 74204
Commercial assay or kit	GLOBINclear kit	ThermoFisher	Cat. #: AM1980
Commercial assay or kit	NEBNextPoly(A) mRNA Magnetic Isolation Module	NEB	Cat. #: E7490L
Commercial assay or kit	NEBNext Ultra RNA Library Prep Kit for Illumina	NEB	Cat. #: E7530L
Commercial assay or kit	NEBNext Multiplex Oligos for Illumina (Index Primers Set 1)	NEB	Cat. #: E7335L
Commercial assay or kit	KAPA HiFi plus dNTPs	Roche	Cat. #: 7958846001
Software, algorithm	SoapDeNovo-Trans	SoapDeNovo-Trans	RRID:SCR_013268
Software, algorithm	rnaSPAdes	rnaSPAdes	RRID:SCR_016992
Software, algorithm	Oases	Oases	RRID:SCR_011896
Software, algorithm	DESeq2	DESeq2	RRID:SCR_015687
Software, algorithm	Salmon	Salmon	RRID:SCR_017036
Software, algorithm	CAP3	CAP3	RRID:SCR_007250
Software, algorithm	SSPACE	SSPACE	RRID:SCR_005056
Software, algorithm	featureCounts	featureCounts	RRID:SCR_012919

Share this article

Cite this article

Summary of the var chromosomal location, var gene, PfEMP1 protein structure, and PfEMP1-binding phenotypes.

Comparison of previous var assembly approaches based on DNA- and RNA-sequencing.

Overview of novel computational pipelines for assembling var transcripts.

Summary of analyses of var and core gene transcriptome changes in paired ex vivo and short-term in vitro cultured parasites.

Statistics for the different approaches used to assemble the var transcripts.

Summary of the clinical dataset used to analyse the impact of parasite culturing on gene expression.

Rank var gene expression analysis.

Var domain transcriptome analysis through short-term in vitro culture.

Var group expression analysis through short-term in vitro culture.

Core gene transcriptome analysis of ex vivo and short-term in vitro cultured samples.

Summary of the different levels of analysis performed to assess the effect of short-term parasite culturing on var and core gene expression, their rational, method, results, and interpretation.

Author details

Clare Andradi-Brown

Contribution

Competing interests

Jan Stephan Wichers-Misterek

Contribution

Competing interests

Heidrun von Thien

Contribution

Competing interests

Yannick D Höppner

Contribution

Competing interests

Judith AM Scholz

Contribution

Competing interests

Helle Hansson

Contribution

Competing interests

Emma Filtenborg Hocke

Contribution

Competing interests

Tim Wolf Gilberger

Contribution

Competing interests

Michael F Duffy

Contribution

Competing interests

Thomas Lavstsen

Contribution

Competing interests

Jake Baum

Contribution

Competing interests

Thomas D Otto

Contribution

Contributed equally with

Competing interests

Aubrey J Cunnington

Contribution

Contributed equally with

For correspondence

Competing interests

Anna Bachmann

Contribution

Contributed equally with

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism

Further reading