(a) Numbers and types of mutations in the core population compared to genetically distinct strains LV9 (MHOM/ET/1967/HU3) and JPCM5 (MCAN/ES/1998/LLM-877), as well as the ISC1/2. N/S ratio denotes the number of nonsynonymous SNPs divided by the synonymous ones.
The 'LV9+JPCM5-191 lineage' refers to homozygous SNPs occurring in the core population ancestral branch only since its divergence with LV9 and JPCM5. The 'ISC1-191 lineage' refers to fixed homozygous SNPs between all of the core population and all of ISC1. The 'ISC2-183 lineage' refers to fixed homozygous SNPs between ISC2 (Bangladesh) and ISC3/4/5/6/7/8/9/10 and ungrouped isolates (India or Nepal). (b) Number of SNPs unique to each population or lineage. SNPs were determined as unique where their intraspecific frequency was >0.95 and <0.05 in all other samples (excluding ungrouped samples). The number of samples did not reflect the number of SNPs or haplotypes: haplotype diversity was >0.99 for all groups except ISC7 (0.34). (c) Genes ordered by ID with two or more SNPs and π/kb>1. The non-synonymous variant V185M in a NADH-flavin oxidoreductase/NADH oxidase gene (LdBPK_120730) was present here in the ISC1/2/3 as well as previously being observed in Kenyan L. donovani (Downing et al., 2012). 956 genes had at least one SNP, of which 566 had SNPs that were not singletons. Only 63 genes contain ≥2 SNPs and 12 genes contain ≥3 SNPs. (d) Mean numbers of SNP differences within and between groups. Rows and columns denote comparisons: within populations, the mean number of SNPs between strain pairs (π) is shown; and between populations, the mean number of SNPs between samples from each population is given. Within the core population, there was an excess of singletons (Tajima’s D=-2.2, p<0.01, Fu and Li's D=-0.78, Fu and Li’s F=-1.87: the values were the same with either JPCM5 or LV9 as the outgroup). 62.6% of SNPs occurred in one sample only and the correlation of the number of homozygous and heterozygous SNPs per sample was small (r2=0.005). 291 distinct haplotypes out of a maximum possible of 382 were resolved in the phased SNPs, with a resulting haplotype diversity (Hd) of 0.994. Nepalese samples were on average more diverse compared to the Indian ones (π=84.3 per Mb vs 66.5, Hd=0.996 vs 0.982). ISC6 was restricted to Nepal (π=12.2 per Mb). (e) Branch distances between groups using the 2 population (f2) statistic. The scaled correlation in allele frequencies were computed for each reference group (top row) and groups (first column). ISC9 and ISC10 were examined together as one group. The distribution of allele sharing across ISC3/4/5/6 in subsets of trios was compared to that for ISC2 using F-statistics. The relative pairwise correlation in allele frequencies between populations using the f2 test indicated ISC2 as the most consistently divergent population and also highlighted distinctive allele frequencies in the recently emerged Indian ISC7 group. In contrast, groups with possible admixture (ISC9 and ISC10) had much shorter branch lengths. (f) f4 ratios for candidate admixed populations for mixture from ISC5, ISC6 and ISC7.Each ratio was computed as with A and B as either ISC6 or ISC7. Populations X were the 6 samples identified as possible mixtures of the ancestors of ISC5 and ISC6/7: BHU815/0, BHU764A1, BHU274/0, BHU574cl4, BHU581cl2, and BHU572cl3. The gene flow from ISC5 was the f4 ratio and that from B (ISC6 or ISC7) was 1-f4. As expected, if ISC6 and 7 grouped together relative to other groups (f4>0). f4 statistics were considered to support a particular relationship between four individual isolates if 0.05<f4<0.05. ISC2 clustered with ISC3 much more frequently (66.7%) than any other (ISC4, ISC5, ISC6 all 22.2% each). ISC3 had drift patterns more similar to ISC4 (44.4%) than ISC6 (22.2%) or ISC5 (0%). ISC4 had greater correlation with ISC5 (44.4%) than ISC6 (22.2%). ISC5 and ISC6 had more similar drift in 66.7% of tests. These values are consistent with the relationship (ISC2,(ISC3,(ISC4,(ISC5,ISC6))))) shown by the phylogenetic analysis. Admixture levels were measured for the six ISC5-ISC6/7 hybrids as a single population using f4 ratios with ISC2, ISC3 and ISC4 as outgroups. The six hybrids were BHU815/0, BHU764A1, BHU274/0, BHU574cl4, BHU581cl2 and BHU572cl3. The fraction of ancestry attributed to ISC5 was 54–56% and to ISC6 was 44–46% for ISC5-ISC6 mixing, and the alternative ISC5-ISC7 mixture had ISC5 at 59–61% and ISC7 at 39–41%. (g) Diagnostic SNPs for ISC7. 20 mutations were unique to ISC7 compared to the other genetically homogeneous populations (ISC2/3/4/6/7/8/9/10) with a ISC7 frequency >0.98 and a non-ISC7 frequency <0.02. Six were nonsynonymous. These SNPs coincided with a distinct chromosome copy number pattern in ISC7. SNPs were assigned to the 5’ and 3’ UTR if they are within 1 Kb of the start or end of the gene (respectively). ISC7 was a recent radiation restricted to India (π=1.8 per Mb). (h) Diagnostic SNPs for ISC5. 32 mutations were unique to ISC5 compared to the other genetically homogeneous populations (ISC2/3/4/6/7/8/9/10), with ISC5 frequency >0.95 and a non-ISC5 frequency <0.05. Eight changes were nonsynonymous. *Nine were previously associated with antimonial resistance, including four samples from this ISC5 dataset (Downing et al., 2011): six of these would change the amino acid sequence. SNPs were assigned to the 5’ and 3’ UTR if they are within 1 Kb of the start or end of the gene (respectively). (i) Diagnostic SNPs for disomic clade within ISC5. 16 diagnostic SNPs unique to a subset of the ISC5 associated with an expansion of disomic strains from polysomic ancestors (BHU1216/0, BHU239/0, BHU271/0, BHU267/0, BHU800/1, BBU269/0, and BHU273/0). Each SNP had a read-depth allele frequency >0.978 within this group and a frequency <0.04 in the other strains not in this group. These SNPs coincided with a distinct chromosome copy number pattern in this group. Ref is the reference genome BPK282/0cl4 allele and Var is the observed derived one. (j) Linkage disequilibrium (LD) and SNP density across chromosomes in the core population. LD was computed as the r2 values between SNP pairs. SD is the standard deviation. The number of homozygous (hom) and heterozygous (het) mutations per chromosome across all 191 samples is shown. LD was markedly higher for chromosome 31, and lower for 1, 6, 7, 8 and 10. The number of recombinant pairs was largest on chr31 (469), which had the highest somy level and was the most SNP-dense chromosome. (k) A genomic map of SNPs in the core population whose ancestry could be assigned to a population. Using the 191x6 ChromoPainter model, the 610 ancestry-informative SNPs were examined to highlight those with probabilities of being donated given ISC group was >0.4: ISC2 (gold), ISC3 (grey), ISC4 (red), ISC5 (green), ISC6 (blue) and ISC7 (light blue). The representative samples used for the 191x6 ChromoPainter model were BD09 (ISC2), NEP123/6 (ISC3), BPK087/0cl1 (ISC4), BPK275/0cl18 (ISC5), BPK282/0cl4 (ISC6) and BHU200/0 (ISC7). SNPs with probabilities <0.4 are uncoloured. For chr32 and chr35, most recombinant pairs could be attributed to the eight ungrouped hybrid isolates. (l) Number and heterozygosity of ancestry-informative SNPs per isolate. The number of homozygous (hom) and heterozygous (het) SNPs distinguishing each core sample (191 total) from six representative samples from each genetically distinct ISC population: ISC2 (BD09), ISC3 (NEP123/6), ISC4 (BPK087/0cl1), ISC5 (BPK275/0cl18), ISC6 (BPK282/0cl4) and ISC7 (BHU200/0). There were 610 informative SNPs with an ancestry probability threshold >0.4. The relative ratio of homozygous and heterozygous SNPs denoted the genetic relatedness of the strains. Samples possessing recent common ancestry with the representative strains had few homozygous SNPs but the same rate of heterozygous SNPs compared to other representative samples. Similarly, samples possessing recent common ancestry with multiple representative samples had fewer homozygous SNPs and no change in the heterozygous SNPs. This was apparent for highlighted ungrouped samples BHU815/0, BHU764A1, BHU274/0, BHU574cl4, BHU581cl2, BHU572cl3 (all ISC5/6/7: green and blue); and BHU744/0, BHU774/0 (ISC6/7: blue). In many groups haplotype segments were shared across isolates widely separated in sampling date. For example, in ISC3, 35 haplotype segments were shared between NEP123/6 (2004), BPK507/0 (December 2009), BPK515/0, BPK518/0 and BPK519/0 (all February 2010); 53 haplotype segments were shared between BPK157/0cl5 (October 2002) and BPK602/0 (February 2011) (both ISC9); 44 between BPK280/0 (August 2003) and BPK615/0 (March 2011; both ungrouped); 31 between BHU274/0 (ungrouped, August 2007) with BHU1199/0 (ISC9, November 2010). (m) Distribution of copying probabilities of SNPs in two representative isolates. Expected copying probabilities from Chromopainter assigning ancestry to SNPs from BHU581/0 (Ungrouped, left) and BPK275/0cl18 (ISC5, right) using probabilities derived from a model assigning SNP ancestry to representative samples from ISC2/3/4/5/6/7. Each column shows the percentage of SNPs assigned to each percentage probability of being donated from each population (eg ISC2) for a particular sample (eg BHU581/0). If many SNPs had high probabilities of being donated from a particular population, this implied that sample shared a recent ancestor with that population. BHU581/0 was a putative ISC5/6/7 hybrid, and had 9.4% of its SNPs with ancestry from ISC6 or ISC7, 7.2% from ISC7, and 10.9% from ISC5. In contrast, BPK275/0cl18 was the representative sample for ISC5, and 44.3% of its SNPs had confident ancestry from ISC5. No SNPs from any comparison were assigned probabilities between 50–95%, hence these rows are omitted from the table. These ancestry patterns indicated that BPK035/0cl1-BPK043/0cl2, BHU1011/0-BHU770/0cl1, BPK280/0-BPK615/0 and BPK158/0cl9 represented ISC4-related isolates but not mixtures. BHU774/0 and BHU744/0, showed evidence of both ISC6- and ISC7-derived segments. The six ungrouped samples (BHU274/0, BHU815/0, BHU764/0cl1, BHU574cl4, BHU572cl3 and BHU581cl2) were sampled between February 2007 and February 2010 and had high numbers of heterozygous SNPs. High heterozygosity could be symptomatic of genetically distinct parents in a hybrid. For example, if BHU572cl3 and BHU581cl2 (both May 2009) were mixes of ISC5 (BHU573/0, May 2009) and ISC7 (BHU200/0, July 2006) then only 16 out of 280 genotypes (BHU572cl3) and 15 out of 278 genotypes (BHU581cl2) would be unexplained. This was evaluated for different haplotype sharing options and matched the timing of samples (BHU572cl3, BHU574cl4 and BHU581cl2 in relation to BHU573/0; BHU815/0 in relation to BHU816/0). SNPs assigned to populations with a high probability in the 191x6 model provided a framework for predicting strain membership as well as admixture. Comparing the relative difference in distributions of homozygous and heterozygous SNPs between candidate parents within the sample set was sufficient to identify shared ancestry where related lineages were sampled, and visible in the aligned genotypes of the six strains with segments of both ISC5 and ISC6 ancestry. (n) Percentage of haplotype segment copies shared within and between groups. Recipients are shown as columns and donors as rows. Groups with complex ancestries received more segments than they donated: ISC8 from ISC5, ISC6 and ISC7; ISC3, ISC9 and ISC10 from ISC5. These 'recipient' groups were also associated with long external branch lengths in phylogenies. The expected probability per SNP in each population for the 191x191 co-ancestry analysis indicated that strains assigned to populations by Structure were assigned the same one with FineStructure (ISC2/3/4/5/6/7): ISC2 (copying probability >99%), ISC3 (>97%), ISC4 (>98%), ISC5 (>99%), ISC6 (>90%) and ISC7 (>97%). Genetically heterogeneous groups showed a higher rate of segments were received than donated from specific populations: ISC8 from ISC5 (11.3% vs 1.7%), ISC6 (16.6% vs 4.7%) and ISC7 (16.5% vs 6.2%). ISC9 (10.0% vs 0.8%) and ISC10 also received more from ISC5 (11.1% vs 2.5%). ISC5 also donated more to ISC3 (11.0% v 1.9%) suggesting possible ancestral genetic exchange, which may be associated with the long branch lengths leading to ISC3. (o) Four-gamete test for recombination. Four gamete tests showed evidence of recombination in the core population and ISC1. 2043 from 117,055 (1.75%) SNP pairs had a recombination signature in the core population. For the subset of 2,015 SNPs with no missing data R=0.7/Mb. The 8 hybrids were BHU815/0, BHU764A1, BHU274/0, BHU574cl4, BHU581cl2, BHU572cl3 (all ISC5+ISC6/7), BHU744/0 and BHU774/0 (both ISC6/7 only). Recombination was evident at 25% of SNPs in ISC1 (184,827 out of 739,581 SNP pairs) based on the four gamete test. This signature varied over 13-fold from chr31 (54.2% of pairs) to chr23 (4%). (p) SNP and indel association with SbV resistance. Fisher exact tests evaluated the association of SbV-resistance with SNPs (coded as the number of derived mutations: 0, 1 or 2) and indels (coded as the diploid number of inserted or deleted (-) basepairs). (q) SNP association with in-vitro SSG phenotype. SNPs with derived alleles in multiple ISC groups with 6+ non-reference genotypes for which the absolute difference in SSG-R and SSG-S allele frequency was >0.1 with the OR (odds ratio) >4 and Benjamini-Hochberg-Yekutieli corrected t-test p value <0.005. SNPs are ordered by chromosome and position. The SSG phenotypes were assessed as ln(EC50) values for the derived (D) and reference (R) alleles. To avoid arbitrarily categorising the sample in vitro SbV phenotypes as R and S using a given threshold EC50 value, we compared the log-scaled EC50 values of each allele pair at the 57 SNPs identified using t-tests (p<0.005): the SbV-R alleles had log-scaled EC50 values of 1.13–1.84 compared to S ones of 0.01–0.50 (Supplementary file 1). Consequently, the 57 SNPs together have power to differentiate SbV-R and SbV-S phenotypes. (r) SNP association with clinical outcome from SSG treatment. SNPs with derived alleles in multiple ISC groups with 6+ non-reference genotypes for which the absolute difference in SSG-cure and SSG-failure allele frequency was >0.1 with the OR (odds ratio) >4. The SNPs are ordered by chromosome and position. The SSG phenotypes were assessed for the derived (D) and reference (R) alleles. We examined lines for which the patient was treated with SSG and was either cured (n=21) or treatment failed (n=14: death, relapse, no response to treatment) to test a null hypothesis that SSG outcome and SNPs were unlinked. An alternative hypothesis was that SSG-cure patients could be infected with SbV-S parasites, and SSG-fail with SbV-R (Supplementary file 2). 32 SNPs were associated with SSG cure (OR>4), including 11 nonsynonymous SNPs, but none at a significant level due to the small sample size. (s) Relative strength of purifying selection between ISC populations. Relative strength of purifying selection between ISC populations (columns 1 and 2) as ascertained as the relative rate of accumulation of derived alleles computed as R for all (top), nonsynonymous (middle) and synonymous (bottom) sites. We tested for differential selective pressures among the ISC groups where observed values of R statistics deviated by at least four standard errors (SE) from the expected value of 1 (Do et al 2015). The genome-wide values are shown along with the weighted block jackknife confidence intervals (lower, median, upper). Derived alleles were more frequent in ISC4 than ISC5, suggesting stronger purifying selection in ISC5. Comparing ISC2 (Bangladesh) to ISC3-7 (Nepal/India), we found a genome-wide (RG) excess of derived alleles had accumulated in the latter (RG=0.74, 6.15*SE). This was not directly attributable to selection because the difference was present in both nonsynonymous (RN=0.59, 4.42*SE) or synonymous sites (RS=0.66, 10.82*SE). ISC3/4/5/6 shared genetic ancestors dating to approximately the same origin, so a comparison of their rates should indicate variation in the relative accrual of derived alleles. Despite possessing the largest sample size (n=52) of all ISC populations, ISC5 gained significantly fewer derived alleles genome-wide compared to ISC3 (RG=0.75, 11.39*SE), ISC4 (RG=0.65, 7.37*SE) and to a lesser extent, ISC6 (RG=0.82, 3.38*SE). This trend was present in both nonsynonymous and synonymous SNPs but not significant for most comparisons (an exception was comparing synonymous changes in ISC5 vs ISC4 (RS=0.37, 7.80*SE). Similarly, the rate of accumulation of homozygous derived SNPs (R2 values) was higher in ISC4 relative to ISC5 (R2G=0.90, 3.87*SE). Antimonial in vitro resistance was highest in ISC5 (ln[EC50]=1.38 ± 1.45) and lowest in ISC4 (0.13 ± 1.12). Consequently, the relative difference in the rate of accumulation of derived alleles in ISC5 that was predominantly isolated in India may reflect different exposure to drug selection compared to ISC4 (mainly from Nepal). Further work is required to quantify this in relation to variable antimonial resistance levels among samples in each population. ISC7 was a genetic subgroup of ISC6 that may be a recent radiation or have undergone a substantial population bottleneck. No genome-wide difference was evident in terms of selction, but fewer derived synonymous SNPs were found in ISC6 (RS=2.18, 4.44*SE), suggestive stronger hitchhiking of alleles in ISC7. (t) Multi-allelic sites. Sites with multiple derived alleles compared to reference genome sample BPK282/0cl4 in the core population (7, top) and ISC1 (10, bottom).