Summary of the var chromosomal location, var gene, PfEMP1 protein structure, and PfEMP1 binding phenotypes.

a) Chromosomal position and transcriptional direction (indicated by arrows) of the different var gene groups, designated by the respective type of upstream sequence (Kraemer and Smith et al., 2003, Lavstsen et al., 2003). b) Structure of the var gene which encodes the PfEMP1 protein. The var gene is composed of two exons, the first, around 3–9.4 kb, encodes the highly variable extracellular region and the transmembrane region (TM) of PfEMP1. Exon 2 is shorter with about 1.2 kb and encodes a semi-conserved intracellular region (acidic terminal segment, ATS). The PfEMP1 protein is composed of an N-terminal segment (NTS), followed by a variable number of Duffy binding-like (DBL) domains and cysteine-rich interdomain regions (CIDR) (Rask et al., 2010). c) Summary of PfEMP1 proteins encoded in the parasite genome, their composition of domain subtypes and associated N-terminal binding phenotype. Group A and some B proteins have an EPCR-binding phenotype; the vast majority of group B and C PfEMP1 proteins bind to CD36. Group A proteins also include those that bind a yet unknown receptor, as well as VAR1 and VAR3 variants with unknown function and binding phenotype. VAR2CSA (group E) binds placental CSA.

Overview of novel computational pipelines for assembling var transcripts.

The original approach (green) used SoapDeNovo-Trans (k=71) to perform whole var transcript assembly. The whole transcript approach (blue) focused on assembling whole var transcripts from the non-core reads using rnaSPAdes (k = 71). Contigs were then joined into longer transcripts using SSPACE. The domain approach (orange) assembled var domains first and then joined the domains into whole transcripts. Domains were assembled separately using three different de novo assemblers (SoapDeNovo2, Oases and rnaSPAdes). Next, a reference of assembled domains was composed and cd-hit (at sequence identity = 99%) was used to remove redundant sequences. Cap3 was used to merge and extend domain assemblies. Finally, SSPACE was used to join domains together. HMM models built on the Rask et al., 2010 dataset were used to annotate the assembled transcripts (Rask et al., 2010). The most significant alignment was taken as the best annotation for each region of the assembled transcript (significance <= 1e-5) identified using cath-resolve-hits0. Transcripts < 500nt were removed. A var transcript was selected if it contained at least one significantly annotated domain (in exon 1). Var transcripts that encoded only the more conserved exon 2 (ATS domain) were discarded. The three pipelines were run on the 32 malaria patient ex vivo samples from Wichers et al., 2021 (Wichers et al., 2021).

Summary of analyses of var and core gene transcriptome changes in paired ex vivo and short-term in vitro cultured parasites.

From a total of 13 parasite isolates, the ex vivo samples (Wichers et al., 2021) and the corresponding in vitro-cultured parasites of the first (n=13), second (n=10) and third (n=1) replication cycle were analysed by RNA sequencing. The expression of non-polymorphic core genes and polymorphic var genes was determined in different analysis streams: (1) Non-polymorphic core gene reads were mapped to the 3D7 reference genome, expression was quantified using HTSeq and differential expression analysis performed (orange); (2) Non-core reads were identified, whole transcripts were assembled with rnaSPAdes, expression of both var transcripts (red) and domains (light green) was quantified, and var domain differential expression analysis was performed. “Per patient analysis” (yellow) represents combining all assembled var transcripts for samples originating from the same ex vivo sample only. For each conserved var gene (var1-3D7, var1-IT, var2csa and var3) all significantly assembled conserved var transcripts were identified and put into a combined reference (pink). The normalised counts for each conserved gene were summed. Non-core reads were mapped to this and DESeq2 normalisation performed. Var type (group A vs group B and C) expression (purple) was quantified using the DBLα and NTS assembled sequences and differences across generations were assessed. Total var gene expression (turquoise) was quantified by assembling and quantifying the coverage over the highly conserved LARSFADIG motif, with the performance of assembly using Trinity and rnaSPAdes assessed. DBLα-tag data was used to confirm the results of the dominant var gene expression analysis and the var type analysis (dark green). Var expression homogeneity (VEH) was analysed at the patient level (α diversity curves). All differential expression analyses were performed using DESeq2.To ensure a fair comparison of samples, which may contain different proportions of life cycle stages, the performance of two different in silico approaches was evaluated by counting Giemsa-stained thin blood smears (blue).

Statistics for the different approaches used to assemble the var transcripts.

Var assembly approaches were applied to malaria patient ex vivo samples (n=32) from (Wichers et al., 2021) and statistics determined. Given are the total number of assembled var transcripts longer than 500 nt containing at least one significantly annotated var domain, the maximum length of the longest assembled var transcript in nucleotides and the N50 value, respectively. The N50 is defined as the sequence length of the shortest var contig, with all var contigs greater than or equal to this length together accounting for 50% of the total length of concatenated var transcript assemblies. Misassemblies represents the number of misassemblies for each approach. **Number of misassemblies were not determined for the domain approach due to its poor performance in other metrics.

Summary of the clinical dataset used to analyze the impact of parasite culturing on gene expression.

RNA-sequencing was performed on 32 malaria infected German traveler samples (Wichers et al., 2021). The 32 ex vivo samples were used to compare the performance of the var assembly approaches. Parasites from 13 of these ex vivo samples underwent one cycle of in vitro replication, 10 parasite samples were also subjected to a second cycle of replication in vitro, and a single parasite isolate was also analyzed after a third cycle of replication. For the ex vivo vs short-term in vitro cultivation analysis only paired samples were used. The number of assembled var contigs represents results per sample using the whole transcript approach, and shows either the number of assembled var contigs significantly annotated as var gene and > =500nt in length, or the number of assembled var transcripts identified with a length >= 1500nt and containing at least 3 significantly annotated var domains. *PE; paired-end reads.

Rank var gene expression analysis.

For each patient, the paired ex vivo (n=13) and in vitro samples (generation 1: n=13, generation 2: n=10, generation 3: n=1) were analysed. The assembled var transcripts with at least 1500nt and containing 3 significantly annotated var domains across all the generations for a patient were combined into a reference, redundancy was removed using cd-hit (at sequence identity = 99%), and expression was quantified using Salmon. Var transcript expression was ranked. Plots show the top 10 var gene expression rankings for each patient and their ex vivo and short-term in vitro cultured parasite samples. Group A var transcripts (red), group B or C var transcripts (blue), group E var transcripts (green) and transcripts of unknown var group (orange).

Var domain transcriptome analysis through short-term in vitro culture.

Var transcripts for paired ex vivo (n=13), generation 1 (n=13), generation 2 (n=10) and generation 3 (n=1) were de novo assembled using the whole transcript approach. Var transcripts were filtered for those >= 1500nt in length and containing at least 3 significantly annotated var domains. Transcripts were annotated using HMM models built on the Rask et al., 2010 dataset (Rask et al., 2010). When annotating the whole transcript, the most significant alignment was taken as the best annotation for each region of the assembled transcript (e-value cut off 1e-5). Multiple annotations were allowed on the transcript if they were not overlapping, determined using cath-resolve-hits. Var domain expression was quantified using FeatureCounts and the domain counts aggregated a) PCA plot of log2 normalized read counts (adjusted for life cycle stage, derived from the mixture model approach). Points are coloured by their generation (ex vivo; purple, generation 1; red, generation 2; green and generation 3; blue) and labelled by their patient identity b) Volcano plot showing extent and significance of up-or down-regulation of var domain expression in ex vivo (n=13) compared with paired generation 1 cultured parasites (n=13) (red and blue, P < 0.05 after Benjamini-Hochberg adjustment for FDR; red and green, absolute log2 fold change log2FC in expression >= 2). Domains with a log2FC > = 2 represent those upregulated in generation 1 parasites. Domains with a log2FC <= –2 represent those downregulated in generation 1 parasites. c) Volcano plot showing extent and significance of up-or down-regulation of var domain expression in ex vivo (n=10) compared with paired generation 2 cultured parasites (n=10) (green, absolute log2 fold change log2FC in expression >= 2). Domains with a log2FC >= 2 represent those upregulated in generation 2 parasites. Domains with a log2FC <= –2 represent those downregulated in generation 2 parasites. Differential expression analysis was performed using DESeq2 (adjusted for life cycle stage, derived from the mixture model approach).

Var group expression analysis through short-term in vitro culture.

The DBLα domain sequence for each transcript was determined and for each patient a reference of all assembled DBLα domains combined. Group A var genes possess DBLα1 domains, some group B encode DBLα2 domains and groups B and C encode DBLα0 domains. Domains were grouped by type and their expression summed. The relevant sample’s non-core reads were mapped to this using Salmon and DBLα expression quantified. DESeq2 normalisation was performed, with patient identity and life cycle stage proportions included as covariates. A similar approach was repeated for NTS domains. Group A var genes encode NTSA compared to group B and C var genes which encode NTSB. Boxplots show log2 normalised Salmon read counts for a) group A var gene expression through cultured generations assessed using the DBLα domain sequences, b) group B and C var gene expression through cultured generations assessed using the DBLα domain sequences, c) group A var gene expression through cultured generations assessed using the NTS domain sequences, and d) group B and C var gene expression through cultured generations assessed using the NTS domain sequences. Different coloured lines connect paired patient samples through the generations: ex vivo (n=13), generation 1 (n=13), generation 2 (n=10) and generation 3 (n=1). Axis shows different scaling.

Core gene transcriptome analysis of ex vivo and short-term in vitro cultured samples.

Core gene expression was assessed for paired ex vivo (n=13), generation 1 (n=13), generation 2 (n=10) and generation 3 (n=1) parasite samples. Subread align was used, as in the original analysis, to align the reads to the human genome and P. falciparum 3D7 genome, with var, rif, stevor, surf and rRNA genes removed. HTSeq count was used to quantify gene counts. a) PCA plot of log2 normalized read counts. Points are coloured by their generation (ex vivo: purple, generation 1: red, generation 2: green, and generation 3: blue) and labelled by their patient identity. b) Volcano plot showing extent and significance of up-or down-regulation of core gene expression in ex vivo (n=13) compared with paired generation 1 cultured parasites (n=13) and c) in ex vivo (n=10) compared with paired generation 2 cultured parasites (n=10). Dots in red and blue represent those genes with P < 0.05 after Benjamini-Hochberg adjustment for FDR, red and green dots label genes with absolute log2 fold change log2FC in expression >= 2). Accordingly, genes with a log2FC > = 2 represent those upregulated in generation 1 parasites and genes with a log2FC <= –2 represent those downregulated in generation 1 parasites. Normalized read counts of the core gene analysis were adjusted for life cycle stage, derived from the mixture model approach.

Summary of the different levels of analysis performed to assess the effect of short-term parasite culturing on var and core gene expression, their rational, method, results, and interpretation.