Genome-wide association study in quinoa reveals selection pattern typical for crops with a short breeding history

Abstract
Editor's evaluation
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Quinoa germplasm preserves useful and substantial genetic variation, yet it remains untapped due to a lack of implementation of modern breeding tools. We have integrated field and sequence data to characterize a large diversity panel of quinoa. Whole-genome sequencing of 310 accessions revealed 2.9 million polymorphic high confidence single nucleotide polymorphism (SNP) loci. Highland and Lowland quinoa were clustered into two main groups, with F_ST divergence of 0.36 and linkage disequilibrium (LD) decay of 6.5 and 49.8 kb, respectively. A genome-wide association study using multi-year phenotyping trials uncovered 600 SNPs stably associated with 17 traits. Two candidate genes are associated with thousand seed weight, and a resistance gene analog is associated with downy mildew resistance. We also identified pleiotropically acting loci for four agronomic traits important for adaptation. This work demonstrates the use of re-sequencing data of an orphan crop, which is partially domesticated to rapidly identify marker-trait association and provides the underpinning elements for genomics-enabled quinoa breeding.

Editor's evaluation

This is a comprehensive study of genomic and phenotypic diversity in the orphan crop quinoa. Based on whole genome resequencing of 310 accessions and field phenotyping of the same set of accessions for two years, the study identified the genetic basis of agronomically important traits. Based on this promising work, there will likely be scope for quick improvement of this orphan crop through breeding.

https://doi.org/10.7554/eLife.66873.sa0

eLife digest

As human populations grow and climate change tightens its grip, developing nutritious crops which can thrive on poor soil and under difficult conditions will become a priority. Quinoa, a harvest currently overlooked by agricultural research, could be an interesting candidate in this effort.

With its high nutritional value and its ability to tolerate drought, frost and high concentrations of salt in the soil, this hardy crop has been cultivated in the Andes for the last 5,000 to 7,000 years. Today its commercial production is mainly limited to Peru, Bolivia, and Ecuador. Pinpointing the genetic regions that control traits such as yields or flowering time would help agronomists to create new varieties better suited to life under northern latitudes and mechanical farming.

To identify these genes, Patiranage et al. grew 310 varieties of quinoa from all over the world under the same conditions; the genomes of these plants were also examined in great detail. Analyses were then performed to link specific genetic variations with traits relevant to agriculture, helping to pinpoint changes in the genetic code linked to differences in how the plants grew, resisted disease, or produced seeds of varying quality. Candidate genes likely to control these traits were then put forward.

The study by Patiranage et al. provides a genetic map where genes of agronomical importance have been precisely located and their effects measured. This resource will help to select genetic profiles which could be used to create new quinoa breeds better adapted to a changing world.

Introduction

Climate change poses a great threat to crop production worldwide. In temperate climates of the world, higher temperatures and extended drought periods are expected. Moreover, crop production in industrialized countries depends on only a few major crops resulting in narrow crop rotations. Therefore, rapid transfer of wild species into crops using genetic modification and targeted mutagenesis is currently discussed (Li et al., 2018; Stetter et al., 2017). Alternatively, orphan crops with a long tradition of cultivation but low breeding intensity can be genetically improved by genomics-assisted selection methods. Quinoa (Chenopodium quinoa Willd.) is a pseudocereal crop species with a long history of cultivation. It was first domesticated about 5000–7000 years ago in the Andean region. Quinoa was a staple food during the pre-Columbian era, and the cultivation declined after the introduction of crops like wheat and barley by the Spanish rulers. Owing to diversity, biotic and abiotic stress tolerance, and ecological plasticity, quinoa can adapt to a broad range of agroecological regions (González et al., 2015; Ruiz et al., 2013). Due to its high seed protein content and favorable amino acid composition, the nutritional value is even higher than beef, fish, and other major cereals (Abugoch James, 2009; Vega-Gálvez et al., 2010). These favorable characteristics contributed to the increasing worldwide popularity of quinoa among consumers and farmers.

A spontaneous hybridization event between two diploid species between 3.3 and 6.3 million years ago gave rise to the allotetraploid species quinoa (2n = 4x = 36) with a genome size of 1.45–1.5 Gb (nuclear DNA content 1C=1.49 pg) (Kolano et al., 2011; Palomino et al., 2008). A reference genome of the coastal Chilean quinoa accession PI 614886 has been published with 44,776 predicted gene models and whole-genome re-sequencing of Chenopodium pallidicaule and Chenopodium suecicum species, close relatives of the A and B subgenome donor species, respectively (Jarvis et al., 2017). The organellar genomes are originated from the A-genome ancestor (Maughan et al., 2019).

Quinoa belongs to the Amaranthaceae, together with some other economically important crops like sugar beet, red beet, spinach, and amaranth. It reproduces sexually after self-pollination. Quinoa accessions are typically homozygous inbred lines. Nonetheless, heterozygosity in some accessions has been reported, indicating cross-pollination (Christensen et al., 2007). The inflorescences are panicles, which are often highly branched. Florets are tiny, which is a significant obstacle for hand-crossing. However, routine protocols for F₁ seed production in combination with marker-assisted selection have been developed recently (Emrani et al., 2020; Peterson et al., 2015).

Systematic breeding of quinoa is still in its infancy compared to major crops. Until recently, breeding has been mainly limited to Bolivia (Gandarillas, 1979) and Peru (Bonifacio et al., 2013), the major growing quinoa areas. Therefore, quinoa can be regarded as a partially domesticated crop. Many accessions suffer from seed shattering, branching, and non-appropriate plant height (PH), typical domestication traits. The major breeding objectives are apart from these characters: grain yield and seed size, downy mildew resistance, synchronized maturity, stalk strength, and low saponin content (Gomez‐Pando, 2015). In the past years, activities have been intensified to breed quinoa genotypes adapted to temperate environments, for example, Europe, North America, and China (Murphy, 2018). Here, the major problem is adapting to long-day conditions because quinoa is predominantly a short-day plant due to its origin from regions near the equator.

There are only a few studies about the genetic diversity of quinoa. They were mainly based on phenotypic observations (Gandarillas et al., 1979; Wilson, 1988) and low throughput marker systems like random amplified polymorphic DNA (Ruas et al., 1999), amplification fragment length polymorphisms (Rodríguez and Isla, 2009), and microsatellites (Mason et al., 2005). Maughan et al., 2012, used five bi-parental populations to identify ca. 14,000 single nucleotide polymorphisms (SNPs), from which 511 KASP markers were developed. Genotyping 119 quinoa accessions gave the first insight into the population structure of this species (Maughan et al., 2012). Now, the availability of a reference genome enables genome-wide genotyping. Jarvis et al., 2017, re-sequenced 15 accessions and identified ca. 7.8 million SNPs. In another study, 11 quinoa accessions were re-sequenced, and 8 million SNPs and ca. 842,000 indels were identified (Zhang et al., 2017).

Our study aimed to analyze the population structure of quinoa and patterns of variation by re-sequencing a diversity panel encompassing germplasm from all over the world. Using millions of markers, we performed a genome-wide association study using multiple year field data. Here, we identified QTLs (quantitative trait loci) that control agronomically important traits important for breeding cultivars to be grown under long-day conditions. Our results provide information for further understanding the genetic basis of agronomically important traits in quinoa and will be instrumental for future breeding.

Results

Re-sequencing 310 quinoa accessions reveal high sequence variation

We assembled a diversity panel made of 310 quinoa accessions representing regions of major geographical distributions of quinoa (Figure 1—figure supplement 1). The diversity panel comprises accessions with different breeding histories (Supplementary file 1a). We included 14 accessions from a previous study, of which 7 are wild relatives (Jarvis et al., 2017). The mean mapped read depth ranged from 4.07 to 14.55, with an average of 7.78, indicating an adequate mapping quality required for accurate SNP calling despite the relatively modest sequencing depth. We mapped sequence reads to the reference genome V2 (CoGe id60716). Using mapping reads, we identified 45,330,710 unfiltered SNPs.

After filtering the initial set of SNPs, we identified 4.5 million SNPs in total for the base SNP set. We further filtered the SNPs for MAF >5% (HCSNPs). We obtained 2.9 million high confidence SNPs for subsequent analysis (Supplementary file 1b). Across the whole genome, the average SNP density was 2.39 SNPs/kb. However, SNP densities were highly variable between genomic regions and ranged from 0 to 122 SNPs/kb (Figure 1—figure supplement 2). We did not observe significant differences in SNP density between the two subgenomes (a subgenome 2.43 SNPs/kb; B subgenome 2.35 SNPs/kb). Moreover, we did not see any correlation between sequencing depth and heterozygosity (Figure 1—figure supplement 2b), which indicates an adequate mapping quality required for accurate SNP calling. In an additional analysis, we divided the filtered SNPs into homozygous and heterozygous SNPs for each sample. Then, we calculated the mean read depth (DP) and genotype quality (GQ) of each sample separately for the homozygous and heterozygous fraction of the genome (Figure 1—figure supplement 3). Mean GQ of the heterozygous SNP calls was 61.34, whereas the mean GQ of homozygous SNP calls was 21.19, indicating that a higher stringency was used for the heterozygous SNP calls. We also compared the DP with the GQ for both filtered and unfiltered SNPs. The results indicated that higher GQ values were used for low DP regions in order to ensure correct genotype calls. Then, we split the SNPs by their functional effects as determined by SnpEff (Cingolani et al., 2012a). Among SNPs located in non-coding regions, 598,383 and 617,699 SNPs were located upstream (within 5 kb from the transcript start site) and downstream (within 5 kb from the stop site) of a gene, whereas 114,654 and 251,481 SNPs were located within exon and intron sequences, respectively (Table 1). We further searched for SNPs within coding regions. We found 70,604 missense SNPs and 41,914 synonymous SNPs within coding regions of 53,042 predicted gene models.

Table 1

Summary statistics of genome-wide single nucleotide polymorphisms identified in 303 quinoa accessions.

Parameter	Type	All genotypes(quinoa only)	Highland population	Lowland population
SNP	Total	2,872,935	2,590,907	1,938,225
	Population-specific SNPs		1,512,301	859,619
	Intergenic	2,452,347	2,227,952	1,649,310
	Introns	251,481	101,546	172,692
	Exons	114,654	214,945	78,248
Nucleotide diversity			5.78 × 10^–4	3.56 × 10^–4
Tajima’s D			0.884	–0.384
Population divergences	F_ST (weighted average)		0.36

Linkage disequilibrium and population structure of the quinoa diversity panel

Across the whole genome, linkage disequilibrium (LD) decay between SNPs averaged 32.4 kb (at r²=0.2). We did not observe substantial LD differences between subgenome A (r²=0.2 at 31.9 kb) and subgenome B (r²=0.2 at 30.7 kb) (Figure 1—figure supplement 4). The magnitude of LD decay among chromosomes did not vary drastically except for chromosome Cq6B, which exhibited a substantially slower LD decay (Figure 1—figure supplement 4a b).

Then, we unraveled the population structure of the diversity panel. We performed principal component (PCA_(SNP)), population structure, and phylogenetic analyses. PCA_(SNP) showed two main clusters consistent with previous studies (Christensen et al., 2007). The first and second principal components (PC1_(SNP) and PC2_(SNP)) explained 23.35% and 9.45% of the variation, respectively (Figure 1a); 202 (66.67%) accessions were assigned to subpopulation 1 (SP1) and 101 (33.33%) to subpopulation 2 (SP2). SP1 comprised mostly Highland accessions, whereas Lowland accessions were found in SP2. PCA demonstrated a higher genetic diversity of the Highland population (Figure 1a). We also calculated PCs for each chromosome separately. For 16 chromosomes, the same clustering as for the whole genome was calculated. Nevertheless, two chromosomes, Cq6B and Cq8B, showed three distinct clusters (Figure 1—figure supplement 5). This is due to the split of the Lowland population into two clusters. We reasoned that gene introgressions on these two chromosomes from another interfertile group might have caused these differences. This is also supported by a slower LD decay on chromosome Cq6B (Figure 1—figure supplement 4b). This discrepancy also might arise due to the Lowland reference genome used for mapping the reads in this study (CoGe id60716), which may have structural differences compared to the genomes of Highland accessions.

Figure 1 with 9 supplements see all

Download asset Open asset

Genetic diversity and population structure of the quinoa diversity panel.

(a) Principal component analysis (PCA) of 303 quinoa accessions. PC1 and PC2 represent the first two analysis components, accounting for 23.35% and 9.45% of the total variation, respectively. The colors of dots represent the origin of accessions. Two populations are highlighted by different colors: Highland (light blue) and Lowland (pink). (b) Subpopulation-wise linkage disequilibrium (LD) decay in Highland (blue) and Lowland population (red). (c) Population structure is based on 10 subsets of SNPs, each containing 50,000 single nucleotide polymorphisms (SNPs) from the whole-genome SNP data. Model-based clustering was done in ADMIXTURE with different numbers of ancestral kinships (K=2 and K=8). K=8 was identified as the optimum number of populations. Left: Each vertical bar represents an accession, and color proportions on the bar correspond to the genetic ancestry. Right: Unrooted phylogenetic tree of the diversity panel. Colors correspond to the subpopulation.

Figure 1—source data 1 Principal component analysis (PCA) of 303 quinoa accessions.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig1-data1-v2.xlsx
Download elife-66873-fig1-data1-v2.xlsx
Figure 1—source data 2 Subpopulation-wise linkage disequilibrium (LD) decay in Highland and Lowland population.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig1-data2-v2.xlsx
Download elife-66873-fig1-data2-v2.xlsx
Figure 1—source data 3 Population structure is based on 10 subsets of single nucleotide polymorphisms (SNPs), each containing 50,000 SNPs from the whole-genome SNP data. Model-based clustering was done in ADMIXTURE with different numbers of ancestral kinships.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig1-data3-v2.xlsx
Download elife-66873-fig1-data3-v2.xlsx

We also performed a population structure analysis with the ADMIXTURE software. We used 10 independent sets of 50,000 randomly chosen SNPs. Then, we performed ADMIXTURE analysis for each subset separately with a predefined number of genetic clusters K from 2 to 10 and different random seeds with 1000 bootstraps. The Q-matrices obtained were aligned with the greedy algorithm in the CLUMPP software (Jakobsson and Rosenberg, 2007). We used cross-validation to estimate the most suitable number of populations. Cross-validation error decreased as the K value increased, and we observed that after K=5, cross-validation error reached a plateau (Figure 1—figure supplement 6b). We observed allelic admixtures in some accessions, likely owing to their breeding history. The wild accessions were also clearly separated at the smallest cross-validation error of K=8, except two Chenopodium hircinum accessions (Figure 1c). This could be because C. hircinum is the closest crop wild relative; it also may have outcrossed with quinoa. The Highland population was structured into five groups, while the Lowland accessions were split into two subpopulations. The broad agro-climatic diversity of the Andean Highland germplasm might have caused a higher number of subpopulations.

For clustering accessions based on sequence polymorphism, we combined 10 subsets created for ADMIXTURE analysis and removed redundant SNPs among subsets. We analyzed the relationships between quinoa accessions using 434,077 SNPs. Constructing a maximum likelihood (ML) tree gave rise to five clades (Figure 2). We found that the placement of the wild quinoa species as distant outgroups was concordant with the previous reports confirming that quinoa was domesticated from C. hircinum (Jarvis et al., 2017). However, we found that the C. hircinum accession BYU 566 (from Chile) was placed at the base of both Lowland and Highland clades, which contrasts to Jarvis et al., 2017, where this accession was placed at the base of Lowland (coastal) quinoa. As expected, accessions from the USA and Chile are closely related because the USDA germplasm had been collected at these geographical regions.

Figure 2 with 1 supplement see all

Download asset Open asset

Maximum likelihood tree of 303 quinoa and 7 wild *Chenopodium* accessions from the diversity panel.

Colors depict the geographical origin of accessions.

Figure 2—source data 1 Maximum likelihood tree of 303 quinoa and 7 wild Chenopodium accessions from the diversity panel.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig2-data1-v2.xlsx
Download elife-66873-fig2-data1-v2.xlsx

Genomic patterns of variations between Highland and Lowland quinoa

We were interested in patterns of variation in response to geographical diversification. We used PCA derived clusters and phylogenetic analysis to define two diverged quinoa populations (namely Highland and Lowland). These divergent groups are highly correlated with Highland and Lowland geographical origin. We used the base SNP set to analyze diversity statistics. To detect genomic regions affected by the population differentiation, we measured the level of nucleotide diversity using 10 kb non-overlapping windows (Varshney et al., 2017b; Figure 1—figure supplement 7). Then, we calculated the whole genome-wide LD decay across the two populations (Highland vs. Lowland); LD decayed more rapidly in Highland quinoa (6.5 kb vs. 49.8 kb, at r²=0.2) (Figure 1b). To measure nucleotide diversity, we scanned the quinoa genome with non-overlapping windows of 10 kb in length in both populations separately. The nucleotide diversity of the Highland population (5.78 × 10^–4) was 1.62-fold higher compared to the Lowland population (3.56 × 10^–4) (Table 1 and Figure 1—figure supplement 7). We observed left-skewed distribution and negative Tajima’s D value (–0.3883) in the Lowland populations indicating recent population growth (Table 1 and Figure 1—figure supplement 8). Genomic regions involved in adaptation to the Highlands should have much lower diversity in the Highland population than in the Lowland population, and genomic regions involved in adaptation to the Lowlands should have lower diversity in the Lowland population compared to the Highland population. Therefore, we calculated the nucleotide diversity ratios between Highland and Lowland to identify major genomic regions underlying the population differentiation (Figure 1—figure supplement 7). The F_ST value between populations was estimated to be 0.36, illustrating strong population differentiation. Concerning the regions of variants, exonic SNPs are substantially higher in the Highland population (Table 1 and Figure 1—figure supplement 7).

We conducted a local PCA to identify genomic regions with a strong population structure. The genome was divided into 50 kb non-overlapping windows, and PCA was calculated for each window using the lostruct program (Li and Ralph, 2019), which calculates a similarity score by comparing PCs obtained from each window. Similarity scores were then stored as a matrix and visualized using multidimensional scaling (MDS) transformation. Strong indications of the population structure are represented in the corners of the MDS analysis; usually, it follows a triangle, providing three corners (corner 1, corner 2, and corner 3) (Figure 1—figure supplement 9a). Candidate genomic regions were defined as the 1% of the MDS coordinates closest to each of the corners. They consist of the windows with the strongest genetic differentiation across the genome (Figure 1—figure supplement 9b). Then, we selected candidate genomic regions from each corner and calculated the PCs using SNPs present in those regions. SNPs from the candidate genomic regions of corner 1 structured the diversity panel into two clusters (Figure 1—figure supplement 9c). Corner 2 also resulted in two clusters, but clustering was not as strong as corner 1 regions (Figure 1—figure supplement 9d). Corner 3 separated accessions into three clusters similar to the PCA using the candidate genomic regions obtained from the nucleotide diversity ratio analysis (Figure 1—figure supplement 9f and g).

Then, we located the genes within candidate regions obtained from all three analyses. We identified 936, 953, and 546 candidate genes located within critical regions from the nucleotide diversity ratio (π (Highland/Lowland)), F_ST, and local PCA corner 1 (Figure 1—figure supplement 9h). Of these, only four genes were shared among all analyses, and 30 genes were shared between F_ST and genomic regions in corner 1 of the local PCA plot. Genomic regions in corner 3 of the local PCA plot and with a high nucleotide diversity ratio shared 102 genes (Figure 1—figure supplement 9i).

Mapping agronomically important trait loci in the quinoa genome

We evaluated 13 qualitative and 4 dichotomous traits on 350 accessions across 2 different years. At the time of the final harvest, 254 accessions did not yet reach maturity (senescence). However, all accessions produced seeds, therefore they could be investigated for seed-related traits. For all traits, substantial phenotypic variation among accessions was found. High heritabilities were calculated for all quantitative traits except for the number of branches (NoB) and stem lying (STL), which indicates that the phenotypic variation between the accessions is caused mainly by genetic variation (Supplementary file 1c). Trait correlations between years were also high (Figure 3—figure supplement 2), which is following the heritability estimates. We found the strongest positive correlation between days to maturity (DTM) and panicle length (PL), and PH, whereas the strongest negative correlation was found between DTM and thousand seed weight (TSW) (Figure 3—figure supplement 3). Then, a PCA was performed based on 12 quantitative traits (PCA_(PHEN)) to explore the phenotypic relationship among quinoa accessions. The first two PCs explained 62.12% of the phenotypic variation between the accessions. The score plot of the PCs showed a similar clustering pattern as the SNP-based PCA (PCA_(SNP)) (Figure 1 and Figure 3—figure supplement 4a). PCA_(PHEN) variables factor map indicated that most Lowland accessions were high yielding with high TSW and dense panicles. Moreover, these accessions are early flowering and early maturing, and they are short (Figure 3—figure supplement 4b). Phenotype-based PCA_(PHEN) also showed that the Lowland accessions are better adapted/selected for cultivation in long-day photoperiods than the Highland accessions. These results are in accordance with LD, nucleotide diversity, and Tajima’s D estimations, implying the Lowland accessions underwent a stronger selection during breeding.

Then, we calculated the best linear unbiased estimates (BLUE) of the traits investigated. In total, 294 accessions shared the re-sequencing information and phenotypes out of 350 phenotypically evaluated accessions. For GWAS analysis, we used ~2.9 million high confidence SNPs. We considered pairwise kinship value distribution to determine that all accessions could be used for GWAS analysis without conducting subpopulation-wise analyses (Figure 3—source data 4). In total, we identified 1480 significant (suggestive threshold: p<9.41e-7) SNP-trait associations (marker-trait association [MTAs]) for 17 traits (Figure 3—source data 4). The number of MTAs ranged from 4 (STL) to 674 (DTM) (Supplementary file 1d). In agreement with previous reports, we defined an MTA as ‘consistent’ when detected in both years (Varshney et al., 2019). We identified 600 consistent MTAs across 11 traits. TSW and DTM showed the highest number of ‘consistent’ associations. Among these, 143 MTAs are located within a gene, and 22 SNPs resulted in a missense mutation (Supplementary file 1e). MTAs for the duration from bolting to flowering (days to bolting to days to flowering), number of branching, seed yield, STL, and growth type (GT) were not ‘consistent’ between years (Figure 3—source data 4). This is also reflected by the low estimates of heritability for these traits, indicating considerably higher genotype × environment interactions. Using the SNPs not located in the repetitive regions, we identified 619 MTA across 11 traits, of which 291 MTAs are common between both analyses (Figure 3—figure supplement 7). Unique associations of 476 and 328 were identified from whole-genome SNPs and repeat masked SNPs, respectively. However, the comparison of GWAS results of PCs between whole-genome SNP set and repeat masked SNP set showed that highly significant associations could be identified even if the repetitive regions were excluded from the analysis (Figure 3—figure supplement 7b and c).

Candidate genes for agronomically important traits

First, we tested the resolution of our mapping study. We searched for candidate genes 50 kb down- and upstream of significant SNPs for two qualitative traits in quinoa, flower color and seed saponin content. To define candidate genes, we considered homologous genes that have already been functionally characterized in other species. We identified highly significant MTAs for stem color on chromosome Cq1B (69.72–69.76 Mb). Two genes (CqCYP76AD1 and CqDODA1) in this region exhibit high sequence homology to betalain synthesis pathway genes BvCYP76AD1 (Hatlestad et al., 2012) and BvDODA1 (Bean et al., 2018) from sugar beet (Figure 4—figure supplement 1). A significant MTA for saponin content on chromosome Cq5B between 8.85 and 9.2 Mb included 29 genes, of which the two BHLH25 were in LD with the significantly associated SNPs. BHLH25 genes were reported to control saponin content in quinoa (Jarvis et al., 2017; Figure 4—figure supplement 1b). This demonstrates that the marker density is high enough to narrow down to causative genes underlying a trait.

Then, we examined four quantitative traits. We obtained seven MTAs on chromosome Cq2A with the traits days to flowering, DTM, PH, and PL indicating pleiotropic gene action (Figure 3a and Supplementary file 1f). To further investigate genes that are pleiotropically active on different traits, we followed a multivariate approach (Solovieff et al., 2013). First, we performed a PCA of the four phenotypes (cross-phenotypes; genetically correlated traits). We found 89.94% of the variation could be explained by the first two PCs of the cross-phenotypes (PCA_(CP)) (Figure 3—figure supplement 5), which suggests that PCA _(CP) is suitable to reduce dimensions for a GWAS of cross-phenotypes. Since the PCA_(CP) revealed a similar clustering as PCA_(SNP), these analyses results provide preliminary indications that in quinoa, days to flowering, DTM, PH, and PL are highly associated with population structure and may reflect adaptation to diverse environments. Then, we performed a GWAS analysis using the first three PCs as traits (PC-GWAS) (Figure 3—figure supplement 5c). We identified strong associations on chromosomes Cq2A, Cq7B (PC1), and Cq8B (PC2) (Figure 3—figure supplement 6). Of 468 MTAs (PC1: 426 and PC2: 42) across the whole genome, 222 (PC1: 211 and PC2: 11) are located within 95 annotated genes. We found 14 SNPs that changed the amino acid sequence in 12 predicted protein sequences of associated genes (Supplementary file 1e). In the next step, we searched genes located 50 kb flanking to an MTA, considering a threshold that is below the genome-wide LD of the Lowland population. Altogether, 605 genes were identified (PC1: 520 and PC2: 85) (Supplementary file 1g).

Figure 3 with 7 supplements see all

Download asset Open asset

Genomic regions associated with important agronomic traits.

(a) Significant marker-trait associations (MTAs) for days to flowering (DTF), days to maturity (DTM), plant height (PH), and panicle density on chromosome Cq2A. Red color arrows indicate the single nucleotide polymorphism (SNP) loci pleiotropically acting on all four traits. (b) Boxplots showing the average performance for four traits over 2 years, depending on single nucleotide variation (C or G allele) within locus Cq2A_ 8093547. The P-values written above the boxplot are from Wilcoxon mean comparisons test (unpaired) between C and G allele. (c) Local Manhattan plot from region 8.04–8.14 Mb on chromosome Cq2A associated with PC1 of the DTF, DTM, PH, and panicle length (PL), and local linkage disequilibrium (LD) heatmap (bottom). The triangle below is the LD heatmap and the colors represent the pairwise correlation (r²) between individual SNPs. On top of the triangle, gene models are represented. Green color dots represent the strongest MTA (Cq2A_ 8093547).

Figure 3—source data 1 Genomic regions associated with important agronomic traits.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig3-data1-v2.xlsx
Download elife-66873-fig3-data1-v2.xlsx
Figure 3—source data 2 Boxplots showing the average performance for four traits over 2 years.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig3-data2-v2.xlsx
Download elife-66873-fig3-data2-v2.xlsx
Figure 3—source data 3 Local Manhattan plot from region 8.04–8.14 Mb on chromosome Cq2A associated with PC1 of the days to flowering, days to maturity, plant height, and panicle length.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig3-data3-v2.xlsx
Download elife-66873-fig3-data3-v2.xlsx
Figure 3—source data 4 Manhattan plots from GWAS with data from 2018 (left), 2019 (center), and the mean of both years (right): The blue horizontal line indicates the suggestive threshold -log₁₀ (8.98E-7). The red horizontal line indicates the significant threshold (Bonferroni correction) -log₁₀(1.67e-8).: https://cdn.elifesciences.org/articles/66873/elife-66873-fig3-data4-v2.pdf
Download elife-66873-fig3-data4-v2.pdf
Figure 3—source data 5 Quantile-quantile plots of GWAS in 2 years, 2018 (left) and 2019 (center), and best linear unbiased estimates (right).: https://cdn.elifesciences.org/articles/66873/elife-66873-fig3-data5-v2.pdf
Download elife-66873-fig3-data5-v2.pdf

We found the region 8.05–8.15 Mb on chromosome Cq2A to be of special interest because it displays stable pleiotropic MTAs for days to flowering, DTM, PH, and PL. We identified five genes within this region and three of these were without known functions. The most significant SNP is located within the CqGLX2-2 gene, which encodes an enzyme of the glyoxalase family (Figure 3). The allele carrying a cytosine at the position with the most significant SNP is associated with early flowering, maturing, and short panicles and plants (Figure 3b). These traits are essential for the adaptation to long-day conditions.

TSW is an important yield component. We found a strong MTA between 63.2 and 64.87 Mb on chromosome Cq8B. Significantly associated SNPs were localized within two genes (Figure 4). One gene displays homology to PP2C, encoding a member of the phosphatase-2C (PP2C) family protein and the second gene encodes a member of the RING-type E3 ubiquitin ligase family. These genes were found to be involved in controlling seed size in soybean, maize, rice, soybean, and Arabidopsis (Li et al., 2019). We then checked haplotype variation and identified five and seven haplotypes for CqPP2C and CqRING genes, respectively. Accessions carrying PP2C_hap3 and RING_hap7 displayed larger seeds in both years (Figure 4 and Figure 4—figure supplement 2).

Figure 4 with 2 supplements see all

Download asset Open asset

Identification of candidate genes for thousand seed weight.

(a) Manhattan plot from chromosome Cq8B. Green and blue dots depict the *CqPP2C5* and the *CqRING* gene, respectively. (b) Top: Local Manhattan plot in the neighborhood of the *CqPP2C* gene. Bottom: Linkage disequilibrium (LD) heatmap. (c) Top: Local Manhattan plot in the neighborhood of the *CqRING* gene. Bottom: LD heatmap. Differences in thousand seed weight between five *CqPP2C* (d) and seven *CqRING* haplotypes (e). ANOVA was performed for each year separately to determine significance among haplotypes and grouping of pairwise multiple comparison was obtained using Tukey’s test. (blue and black colour letters are for different years).

Figure 4—source data 1 Manhattan plot from chromosome Cq8B. Green and blue dots are depicting the CqPP2C5 and CqRING.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig4-data1-v2.xlsx
Download elife-66873-fig4-data1-v2.xlsx
Figure 4—source data 2 Manhattan plot from chromosome Cq8B. Green and blue dots are depicting the CqPP2C5.: https://cdn.elifesciences.org/articles/66873/elife-66873-fig4-data2-v2.xlsx
Download elife-66873-fig4-data2-v2.xlsx
Figure 4—source data 3 Differences in thousand seed weight between five CqPP2C (d) and seven CqRING haplotypes (e).: https://cdn.elifesciences.org/articles/66873/elife-66873-fig4-data3-v2.xlsx
Download elife-66873-fig4-data3-v2.xlsx

Downy mildew is one of the major diseases in quinoa, which causes massive yield damage. Notably, our GWAS identified strong MTA for resistance against this disease. The most significant SNPs are located in subgenome A (Figure 3—source data 4). Thus, the A-genome progenitor seems to be the donor of downy mildew resistance. In addition, we identified a candidate gene within a region 38.99–39.03 Mb on chromosome Cq2A, which showed the highest significant association (Figure 3—source data 4). This gene encodes a protein with an NBS-LRR (nucleotide-binding site leucine-rich repeat) domain often found in resistance gene analogs with a function against mildew infection (Zhang et al., 2019a).

Discussion

We assembled a diversity set of 303 quinoa accessions and 7 accessions from wild relatives. Plants were grown under northern European conditions, and agronomically important traits were studied. In total, 2.9 million SNPs were found after re-sequencing. We found substantial phenotypic and genetic variation. Our diversity set was structured into two highly diverged populations, and genomic regions associated with this diversity were localized. Due to a high marker density, candidate genes controlling qualitative and quantitative traits were identified. The high genetic diversity and rapid LD breakdown reflect the short breeding history of this crop.

We were aiming to assemble the first diversity set, which represents the genetic variation of this species. Therefore, we established a permanent resource that is genotypically and phenotypically characterized. We believe that this collection is important for future studies for the following reasons: we observed substantial phenotypic variation for all traits and high homogeneity within accessions. Moreover, low or absent phenotypic variation within accessions demonstrates homogeneity as expected for a self-pollinating species. Therefore, the sequence of one plant is representative of the whole accession, which is important for the power of the GWAS.

Today, over 16,000 accessions of quinoa are stored ex situ in seed banks in more than 30 countries (Rojas et al., 2015). Despite the enormous diversity, only a few accessions have been genotyped with molecular markers. We found a clear differentiation into Highland and Lowland quinoa. In previous studies, five ecotypes had been distinguished: Valley type, Altiplano type, Salar type, Sea level type, and Subtropical type (Murphy, 2018). Adaptation to different altitudes, tolerance to abiotic stresses such as drought and salt, and photoperiodic responses are the major factors determining ecotypes (Gomez‐Pando, 2015). In our study, we could further allocate the quinoa accessions to five Highland and two Lowland subpopulations. This demonstrates the power of high-density SNP mapping to identity finer divisions at higher K. The origin of accessions and ecotype differentiation could be meaningfully interpreted by combining the information from phylogenetic data and population structure. As we expected, North American accessions (accessions obtained from USDA), clustered with Chilean accessions, suggesting sequence-based characterization of ecotypes would be more informative and reproducible. Moreover, high-density SNP genotyping unveiled the origin of unknown or falsely labeled gene bank accessions, as recently proposed by Milner et al., 2019. The geographical origin of 52 accessions from our panel was unknown. We suggest using phylogenic data and admixture results to complement the available passport data. For instance, two accessions with origin recorded as Chile are closely related to Peruvian and Bolivian accessions, which suggests that they also originate from Highland quinoa.

What can we learn about the domestication of quinoa and its breeding history by comparing our results with data from other crops? LD decay is one parameter reflecting the intensity of breeding. LD decay in quinoa (32.4 kb) is faster than in most studies with major crop species, for example, rapeseed (465.5 kb) (Wu et al., 2019), foxtail millet (Setaria italica, 100 kb) (Jia et al., 2013), pigeon pea (Cajanus cajan, 70 kb) (Varshney et al., 2017a), soybean (150 kb) (Zhou et al., 2015), and rice (200 kb) (Mather et al., 2007). Although comparisons must be regarded with care due to different numbers of markers and accessions, different types of reproduction, and the selection intensity, the rapid LD decay in quinoa reflects its short breeding history and low selection intensity. Moreover, quinoa is a self-pollinating species where larger linkage blocks could be expected. However, cross-pollination rates in some accessions can be up to 17.36% (Silvestri and Gil, 2000), which is exploited by small Andean farmers who grow mixed quinoa accessions to ensure harvest under different biotic and abiotic stresses. This may facilitate a certain degree of cross-pollination and admixture providing opportunities for cultivar improvement.

Interestingly, the LD structure between Highland and Lowland populations is highly contrasting (6.5 vs. 49.8 kb), indicating larger LD blocks in the Lowland population. Low nucleotide diversity and negative Tajima’s D were also observed in the Lowland population compared to Highland quinoa. The population differentiation index and LD differences have been used to test the hypothesis of multiple domestication events. As an example, different domestication bottlenecks have been reported for japonica (LD decay: 65 kb) and indica rice (LD decay: 200 kb) (Xu et al., 2011). The estimated F_ST value from this study (0.36) is in the similar range of F_ST estimates in rice subspecies indica and japonica (0.55) (Huang et al., 2010) and melon (Cucumis melo) subspecies melo and agrestis (0.46) (Zhao et al., 2019). Two hypotheses have been proposed for the domestication of quinoa from C. hircinum: (1) one event that gave rise to Highland quinoa and subsequently to Lowland quinoa and (2) two separate domestication events giving rise to Highland and Lowland quinoa independently (Jarvis et al., 2017). However, our study does not follow the second hypothesis because C. hircinum accession BYU 566 was basal to both clades of the phylogenetic tree (Highland and Lowland). Moreover, our wild Chenopodium germplasm does not represent enough diversity for in-depth analysis of domestication events. Therefore, we propose three possible scenarios to explain significant differences in LD structure, nucleotide diversity, Tajima’s D and F_ST: (1) two independent domestication events with a substantial bottleneck on Lowland populations, (2) single domestication but strong population growth after adaptation of Lowland quinoa, or (3) strong adaptive selection after domestication. To understand the history and genetics of domestication, it will be necessary to sequence a large representative set of outgroup species such as C. berlandieri, C. hircinum, C. pallidicaule, and C. suecicum.

Apart from marker density and sample size, the power of GWAS depends on the quality of the phenotypic data. Plants were grown in Northern Europe. Therefore, the MTAs are, first of all, relevant for temperate long-day climates. The share of genetic variances and, thus, the heritabilities were high across environments. We expect higher genotype × environment interaction for flowering time, DTM, PH, and PL if short-day environments will be included because many accessions have a strong day-length response (data not shown). Differentiation of germplasm from different environments may be very strong and therefore bias the GWAS results. Nonetheless, all accessions reached the flowering stage and produced seeds. Therefore, bias toward maturity is a consequence of environmental adaptation or lack of adaptation to long-day conditions.

Furthermore, the genes controlling Mendelian traits precisely coincided or were in LD with significant SNP positions, as exemplified by the genes associated with saponin content and flower color. Hence, the diversity panel provides sufficient power to identify SNP-trait associations for important agronomic traits such as TSW and downy mildew tolerance. In different plant species, seed size is controlled by six different pathways (Li et al., 2019). We found two important genes associated with seed size from the Brassinosteroid (CqPP2C) and the ubiquitin-proteasome (CqRING) pathway. The non-functional allele of soybean PP2C1 is associated with small seeds (Lu et al., 2017). We detected a superior haplotype (PP2C_hap3), which is associated with larger seeds. CqRING encodes an E3 ubiquitin ligase protein. There are two RING-type E3 ubiquitins known as DA1 and DA2, involved in the seed size controlling pathway. They were found in Arabidopsis rice, maize, and wheat. Downy mildew is the most acute disease for quinoa, caused by the fungus Peronospora variabili (Choi et al., 2010). A recent study attempted the identification of genes based on a GWAS analysis in quinoa. However, no significant associations were found, probably due to the lack of power because of the low number of accessions used (61 and 88) (Colque-Little et al., 2021); nonetheless, the most significant SNPs of that study were comparable to the findings of the present study. In our study, a strong MTA provides preliminary indications that the NBS-LRR gene on chromosome Cq2A contributes to downy mildew resistance in quinoa. We propose using this sequence for marker-assisted selection in segregating F₂ populations produced during pedigree breeding, after the confirmation of its role in downy mildew resistance by functional analysis in quinoa.

In this study, the advantage of multivariate analysis of cross-phenotype association became obvious. We identified candidate genes with pleiotropic effects on days to flowering, DTM, PH, and PL. Genes involved in the flowering time pathway have been shown to have pleiotropic effects (Auge et al., 2019). However, flowering time is a major developmental stage that can influence other agronomical traits like PH and PL. Therefore, the correlation among traits may have led to the association of flowering time genes with multiple traits and should be further investigated. Interestingly, the most significant and consistent SNP association was residing within a putative GLX-2 ortholog. GLX genes, among other functions, have impacted cell division and proliferation in Amaranthus paniculatus (Chakravarty and Sopory, 1998). Therefore, the CqGLX-2 gene is one putative candidate for controlling day-length response. Furthermore, the genes identified within these regions that encode proteins with unknown functions should be subjected to future studies.

This study also has a major breeding perspective. We aimed to elucidate the potential of quinoa for cultivation in temperate climates. Evidently, many accessions are not adapted to northern European climate and photoperiod conditions because they flowered too late and did not reach maturity before October. Nevertheless, 48 accessions are attractive as crossing partners for breeding programs because they are insensitive to photoperiod or long-day responsive. Moreover, they are attractive due to their short PH, low tillering capacity, favorable inflorescence architecture, and high TSW. These are important characters for mechanical crop cultivation and combine harvesting. The MTA found in this study offers a perspective to use parents with superior phenotypes in crossing programs. We suggest a genotype-building strategy by pyramiding favorable alleles (haplotypes). In this way, also accessions from our diversity set, which are not adapted to long-day conditions but with favorable agronomic characters, will be considered. Then, favorable genotypes will be identified from offspring generations by marker-assisted selection using markers in LD with significant SNPs. Furthermore, the MTA from this study will be valid for allele mining in quinoa germplasm collections to identify yet unexploited genetic variation.

Materials and methods

Plant materials and growth conditions

Request a detailed protocol

We selected 350 quinoa accessions for phenotyping, and of these, 296 were re-sequenced in this study. Re-sequencing data of 14 additional accessions that had already been published (Jarvis et al., 2017) were also included in the study, together with the wild relatives (C. belandieri and C. hircinum) (Jarvis et al., 2017). These accessions represent different geographical regions of quinoa cultivation (Supplementary file 1a). Plants were grown in the field in Kiel, Northern Germany, in 2018 and 2019. (The local weather data is provided in Supplementary file 1h.) Seeds were sown in the second week of April in 35× multi-tray pots. Then, plants were transplanted to the field in the first week of May as single-row plots in a randomized complete block design with three blocks. The distances between rows and between plants were set to 60 and 20 cm, respectively. Each row plot contained seven plants per accession.

We recorded days to bolting (days to bolting) as BBCH51 and days to flowering as BBCH60 twice a week during the growth period. DTM was determined when plants reached complete senescence (BBCH94)(Stanschewski et al., 2021). If plants did not reach this stage, DTM was set as 250 days. In both years, plants were harvested in the second week of October. PH, PL, and the NoB were phenotyped at harvest. STL (Figure 3—figure supplement 1) was scored on a scale from 1 to 5, where score 1 indicates no STL. Similarly, panicle density was recorded on a scale from 1 to 7, where density 1 represents lax panicles, and panicle density 7 represents highly dense panicles. Flower color and stem color were determined by visual observation. Pigmented and non-pigmented plants were scored as 1 and 0, respectively. GT was classified into two categories and analyzed as a dichotomous trait as well. We observed severe mildew infection in 2019. Therefore, we scored mildew infection on a scale from 1 to 3, where 1 equals no infection, and 3 equals severe infection.

Statistical analysis

Request a detailed protocol

We calculated the BLUE of the traits across years by fitting a linear mixed model using the lme4 R package (Bates et al., 2015). We used the following model:

Genome sequencing and identification of genomic variations

Request a detailed protocol

For DNA extraction, two plants per genotype were grown in a greenhouse at the University of Hohenheim, and two leaves from a single 2-month-old plant were collected and frozen immediately. DNA was subsequently extracted using the AX Gravity DNA extraction kit (A&A Biotechnology, Gdynia, Poland) following the manufacturer’s instructions. The purity and quality of DNA were controlled by agarose gel electrophoresis and the concentration was determined with a Qubit instrument using SYBR green staining. Whole-genome sequencing was performed for 312 accessions at Novogene (China) using short-reads Illumina NovaSeq S4 Flowcell technology and yielded an average of 10 Gb of paired-end (PE) 2 × 150 bp reads with quality Q>30 Phred score per sample, which is equivalent to ~7× coverage of the haploid quinoa genome (~1.45 Gb). We then used an automated pipeline compiled based on the Genome Analysis Toolkit (Abrouk et al., 2020). Raw sequence reads were filtered with trimmomatic-v0.38 (Bolger et al., 2014) using the following criteria: LEADING:20; TRAILING:20; SLIDINGWINDOW:5:20; MINLEN:50. The filtered paired-end reads were then individually mapped for each sample against an improved version of the QQ74 quinoa reference genome (CoGe id60716) using BWA-MEM (v-0.7.17) (Li and Durbin, 2010) followed by sorting and indexing using samtools (v1.8) (Li et al., 2009). Duplicated reads were marked, and read groups were assigned using the Picard tools (http://broadinstitute.github.io/picard/). Variants were identified with GATK (v4.0.1.1) (McKenna et al., 2010; Van der Auwera et al., 2013) using the ‘--emitRefConfidence’ function of the HaplotypeCaller algorithm and ‘—heterozygosity’ value set at 0.005 to call SNPs and InDels for each accession. Individual g.vcf files for each sample were then compressed and indexed with tabix (v-0.2.6) (Li, 2011) and combined into chromosome g.vcf using GenomicsDBImport function of GATK. Joint genotyping was then performed for each chromosome using the function GenotypeGVCFs of GATK. To obtain high confidence variants, we excluded SNPs with the VariantFiltration function of GATK with the criteria: QD < 2.0; FS > 60.0; MQ < 40.0; MQRankSum < −12.5; ReadPosRankSum < −8.0 and SOR > 3.0. Then, SNP loci which contained more than 70% missing data, were filtered by VCFtools (Danecek et al., 2011) (v0.1.5), which resulted in our initial set of ~45 M SNPs for all the 332 accessions, including 20 previously re-sequenced accessions (Jarvis et al., 2017). All re-sequencing data are submitted to SRA under project id BioProject PRJNA673789.

Y_{i k j} = μ + {A c c e s s i o n}_{i} + {B l o c k}_{i} + {Y e a r}_{i} + (A c c e s s i o n \times B l o c k)_{i j} + (A c c e s s i o n \times Y e a r)_{i k} + {E r r o r}_{i j k}

where µ is the mean, Accession_i is the genotype effect of the ith accession, Block_j is the effect of the jth Block, Year_k is the effect of the kth year, (Accession × Block)_ij is the Accession-Block interaction effect, Accession × Year_ik is the accession-year interaction effect, Error_ijk is the error of the jth block in the kth year. We treated all items as random effects for heritability estimation, and for BLUE, accessions were treated as fixed effects. We analyzed the PCs of phenotypes using the R package FactoMineR (Lê et al., 2008).

In our panel, we had three triplicates for quality checking and nine duplicates between Jarvis et al., 2017, and 312 newly re-sequenced accessions. In order to remove duplicates, as a preliminary analysis, we removed SNP loci with a minimum mean depth < 5 across samples and SNP loci with more than 5% missing data. Then, we filtered SNPs with a minor allele frequency lower than 0.05 (MAF < 0.05). After these filtering steps, we obtained a VCF file that contained 229,017 SNPs. Then, we construct a ML tree. First, we used the modelFinder (Kalyaanamoorthy et al., 2017) in IQ-TREE v1.6.619 (Nguyen et al., 2015) to determine the best model for ML tree construction. We selected GTR + F + R8 (GTR: General time-reversible, F: Empirical base frequencies, R8: FreeRate model) as the best fitting model according to the Bayesian information criterion (BIC) estimated by the software. Next, we used 1000 replicates with ultrafast bootstrapping (UFboots) (Hoang et al., 2018) to check the reliability of the phylogenetic tree. To visualize the phylogenetic tree, we used the Interactive Tree Of Life tool (https://itol.embl.de/) (Letunic and Bork, 2016). Then, based on the phylogenetic tree, we removed duplicate accessions and accessions with unclear identities. After the quality control, we retained 310 accessions (303 quinoa accessions and 7 wild Chenopodium accessions).

Then, we used the initial SNP set and defined two subsets using the following criteria: (1) A base SNP set of 5,817,159 biallelic SNPs obtained by removing SNPs with more than 50% missing genotype data, minimum mean depth less than five, and minor allele frequency less than 1%. (2) A high confidence (HCSNP) set of 2,872,935 SNPs from the base SNP set was created by removing SNPs with a minor allele frequency of less than 5%. We used custom scripts for dividing homozygous and heterozygous SNP regions of each sample. Then, using custom bash scripts, mean DP and GQ calculations were carried out. The base SNP set was used for the diversity statistics, and the high confidence SNPs set was used for GWAS analysis. To obtain the number of population-specific SNPs, we applied 5% MAF separately for each population.

We annotated the high confidence SNP using SnpEff 4.3T (Cingolani et al., 2012a) and a custom database (Cingolani et al., 2012b) based on the QQ74 reference genome and annotation (CoGe id60716). Afterward, we extracted the SNP annotations using SnpSift (Cingolani et al., 2012a). Based on the annotations, SNPs were mainly categorized into five groups: (1) upstream of the transcript start site (5 kb), (2) downstream of the transcript stop site (5 kb), (3) coding sequence (CDS), (4) intergenic, and (5) intronic. We used SnpEff to categorize SNPs in coding regions based on their effects: synonymous, missense, splice acceptor, splice donor, splice region, start lost, start gained, stop lost, and spot retained.

Phylogenetic analysis and population structure analysis

Request a detailed protocol

For population structure analysis, we employed SNP subsets, as demonstrated in previous studies, to reduce the computational time (Wang et al., 2018). We created 10 randomized SNP sets, each containing 50,000 SNPs. First, the base SNP set was split into 5000 subsets of an equal number of SNPs to create subsets. Then, 10 SNPs from each subset were randomly selected, providing a total of 50,000 SNPs in a randomized set (randomized 50k set). We then repeated this procedure for nine more times and finally obtained 10 randomized 50k sets. Population structure analysis was conducted using ADMIXTURE (version 1.3) (Alexander et al., 2009). We ran ADMIXTURE for each subset separately with a predefined number of genetic clusters K from 2 to 10 and varying random seeds with 1000 bootstraps. Also, we performed the cross-validation procedure for each run. Obtained Q matrices were aligned using the greedy algorithm in the CLUMPP software (Jakobsson and Rosenberg, 2007). Population structure plots were created using custom R scripts. We then combined SNP from the 10 subsets to create a single SNP set of 434,077 unique SNPs for the phylogenetic analysis. We used the same method mentioned above to create the phylogenetic tree. Here, we selected the model GTR + F + R6 based on the BIC estimates. For the PCA, we used the high confidence SNP set and analysis was done in R package SNPrelate (Zheng et al., 2012). We estimated the top 10 PCs. The first (PC1) and second (PC2) were plotted using custom R scripts.

Genomic patterns of variation

Request a detailed protocol

Using the base SNP set, we calculated nucleotide diversity (π) for subpopulations and π ratios for Highland and Lowland population regions with the top 1% ratios of π _Highland/ π _Lowland candidate regions for population divergence. We also estimated Tajima’s D values for both populations to check the influence of selection on populations. F_ST values were calculated between Highland and Lowland populations using the 10 kb non-overlapping window approach. Nucleotide diversity, Tajima’s D, and F_ST calculations were carried out in VCFtools (v0.1.5) (Danecek et al., 2011). We also performed a local PCA using the template of the lostruct program (Li and Ralph, 2019) to identify genomic regions with strong population structures. The genome was divided into 50 kb non-overlapping windows, and a PCA was conducted for each window using lostruct. We obtained candidate regions based on the local PCA (top 1% threshold). Genes located in these regions were considered as candidate genes underlying Highland and Lowland population divergence.

LD analysis

Request a detailed protocol

First, we calculated LD in each population separately (Highland and Lowland). Then, LD was calculated in the whole population, excluding wild accessions. For LD calculations, we further filtered the high confidence SNP set by removing SNPs with >80% missing data (Varshney et al., 2019). Using a set of 2,513,717 SNPs, we calculated the correlation coefficient (r²) between SNPs up to 300 kb apart by setting -MaxDist 300 and default parameters in the PopLDdecay software (Zhang et al., 2019b). Pairwise distance at r²=0.20 is defined as the threshold for genome-wide LD decay. LD decay was plotted using custom R scripts based on the ggplot2 package.

Genome-wide association study

Request a detailed protocol

We used the BLUE of traits and high confidence SNPs for the GWAS analysis. Morphological traits were treated as dichotomous traits and analyzed using generalized mixed linear models with the lme4 R software package (Bates et al., 2015). We used population structure and genetic relationships among accessions to minimize false-positive associations. Population structure represented by the PC was estimated with the SNPrelate software (Zheng et al., 2012). Genetic relationships between accessions were represented by a kinship matrix calculated with the efficient mixed-model association expedited (EMMAX) software (Kang et al., 2010) using high confidence SNPs (Figure 2—figure supplement 1). Then, we performed an association analysis using the mixed linear model for the whole population, including K and P matrices in EMMAX. We estimated the effective number of SNPs (n=1,062,716) using the Genetic type I Error Calculator (GEC) (Li et al., 2012) and calculated quantile-quantile plots (Figure 3—source data 5). We set the significant p-value threshold (Bonferroni correction, 0.05 /n, -log₁₀(4.7e-08)=7.32) and suggestive significant threshold (1 /n, -log₁₀(9.41e-7)=6.02) to identify significant loci underlying traits. Next, we checked how repetitive sequences influence the GWAS analysis. Here, we used repeat masker to exclude all the SNPs that are located in repetitive regions of the genome. After removing SNPs on repetitive sequences, we obtained 1,906,734 SNPs and GWAS was carried out following the same method explained previously. We plotted SNP p-values on Manhattan plots using the qqman R package (Turner, 2014).

Data availability

The raw sequencing data have been submitted to the NCBI Sequence Read Archive (SRA) under the BioProject PRJNA673789. Quinoa reference genome version 2 is available at CoGe database under genome id 53523. Phenotype data and ready-use genotype data (vcf file) are available at https://doi.org/10.5061/dryad.zgmsbcc9m. A detailed description of the germplasm, phenotyping methods, and phenotypes are available at https://quinoa.kaust.edu.sa/#/ (Stanschewski et al., 2021). Seeds are available from the public gene banks such as IPK Gatersleben (https://www.ipk-gatersleben.de/en/genebank/) and the U.S. National Plant Germplasm System (https://npgsweb.ars-grin.gov/gringlobal/search). Custom scripts used for SNP calling are available on GitHub: https://github.com/IBEXCluster/IBEX-SNPcaller/blob/master/workflow.sh (copy archived at swh:1:rev:dad4095515fdb80f390bee8893bb6f4453e12a73). Additional information of other custom scripts used for making plots are available at https://github.com/DilanSarange/quinoaDPgwas, (copy archived at swh:1:rev:4e68ba3c64f3e67d156883c6c40dbd4d3701839c).

The following data sets were generated

1. Patiranage DSR
2. Rey E
3. Emrani N
4. Wellman G
5. Schmid K
6. Schmöckel SM
7. Tester M
8. Jung C
(2021) Dryad Digital Repository
Genome-wide association study in quinoa reveals selection pattern typical for crops with a short breeding history.

https://doi.org/10.5061/dryad.zgmsbcc9m
1. Jung C
(2022) NCBI BioProject
ID PRJNA673789. Chenopodium quinoa diversity genome re-sequencing.

https://www.ncbi.nlm.nih.gov/bioproject/PRJNA673789

The following previously published data sets were used

1. Jarvis DE
2. Ho YS
3. Lightfoot DJ
4. Schmöckel SM
5. Li B
6. Borm TJA
7. Ohyanagi H
8. Mineta K
9. Michell CT
10. Saber N
11. Kharbatia NM
12. Rupper RR
13. Sharp AR
14. Dally N
15. Boughton BA
16. Woo YH
17. Gao G
18. Schijlen E
19. Guo X
20. Momin AA
21. Negrão S
22. Al-Babili S
23. Gehring C
24. Roessner Ute
25. Jung C
26. Murphy K
27. Arold ST
28. Gojobori T
29. Gerard van der Linden C
30. van Loo EN
31. Jellen EN
32. Maughan PJ
33. Tester M
(2017) Phytozome database
ID chenopodiumdb. The genome of Chenopodium quinoa.

http://www.cbrc.kaust.edu.sa/chenopodiumdb/

References

1. Abrouk M
2. Ahmed HI
3. Cubry P
4. Šimoníková D
5. Cauet S
6. Pailles Y
7. Bettgenhaeuser J
8. Gapa L
9. Scarcelli N
10. Couderc M
11. Zekraoui L
12. Kathiresan N
13. Čížková J
14. Hřibová E
15. Doležel J
16. Arribat S
17. Bergès H
18. Wieringa JJ
19. Gueye M
20. Kane NA
21. Leclerc C
22. Causse S
23. Vancoppenolle S
24. Billot C
25. Wicker T
26. Vigouroux Y
27. Barnaud A
28. Krattinger SG
(2020) Fonio millet genome unlocks African orphan crop diversity for agriculture in a changing climate
Nature Communications 11:4488.

https://doi.org/10.1038/s41467-020-18329-4
- PubMed
- Google Scholar
1. Abugoch James LE
(2009) Quinoa (Chenopodium quinoa Willd.): composition, chemistry, nutritional, and functional properties
Advances in Food and Nutrition Research 58:1–31.

https://doi.org/10.1016/S1043-4526(09)58001-1
- PubMed
- Google Scholar
(2009) Fast model-based estimation of ancestry in unrelated individuals
Genome Research 19:1655–1664.

https://doi.org/10.1101/gr.094052.109
- PubMed
- Google Scholar
(2019) Pleiotropy in developmental regulation by flowering-pathway genes: is it an evolutionary constraint?
The New Phytologist 224:55–70.

https://doi.org/10.1111/nph.15901
- PubMed
- Google Scholar
1. Bates D
2. Mächler M
3. Bolker B
4. Walker S
(2015) Fitting linear mixed-effects models using lme4
J Stat Soft 67:48.

https://doi.org/10.18637/jss.v067.i01
- Google Scholar
1. Bean A
2. Sunnadeniya R
3. Akhavan N
4. Campbell A
5. Brown M
6. Lloyd A
(2018) Gain-of-function mutations in beet DODA2 identify key residues for betalain pigment evolution
The New Phytologist 219:287–296.

https://doi.org/10.1111/nph.15159
- PubMed
- Google Scholar
(2014) Trimmomatic: a flexible trimmer for Illumina sequence data
Bioinformatics 30:2114–2120.

https://doi.org/10.1093/bioinformatics/btu170
- PubMed
- Google Scholar
Report
(2013)
Quinoa breeding and modern variety development

State of the Art Report on Quinoa Around the World.
- Google Scholar
1. Chakravarty TN
2. Sopory SK
(1998) Blue light stimulation of cell proliferation and glyoxalase I activity in callus cultures of Amaranthus paniculatus
Plant Science 132:63–69.

https://doi.org/10.1016/S0168-9452(97)00264-1
- Google Scholar
1. Choi YJ
2. Danielsen S
3. Lübeck M
4. Hong SB
5. Delhey R
6. Shin HD
(2010) Morphological and molecular characterization of the causal agent of downy mildew on Quinoa (Chenopodium quinoa
Mycopathologia 169:403–412.

https://doi.org/10.1007/s11046-010-9272-y
- PubMed
- Google Scholar
1. Christensen SA
2. Pratt DB
3. Pratt C
4. Nelson PT
5. Stevens MR
6. Jellen EN
7. Coleman CE
8. Fairbanks DJ
9. Bonifacio A
10. Maughan PJ
(2007) Assessment of genetic diversity in the USDA and CIP-FAO international nursery collections of quinoa (Chenopodium quinoa Willd.) using microsatellite markers
Plant Genetic Resources 5:82–95.

https://doi.org/10.1017/S1479262107672293
- Google Scholar
1. Cingolani P
2. Patel VM
3. Coon M
4. Nguyen T
5. Land SJ
6. Ruden DM
7. Lu X
(2012a) Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift
Frontiers in Genetics 3:35.

https://doi.org/10.3389/fgene.2012.00035
- PubMed
- Google Scholar
1. Cingolani P
2. Platts A
3. Wang LL
4. Coon M
5. Nguyen T
6. Wang L
7. Land SJ
8. Lu X
9. Ruden DM
(2012b) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff
Fly 6:80–92.

https://doi.org/10.4161/fly.19695
- Google Scholar
(2021) Genetic variation for tolerance to the downy mildew pathogen Peronospora variabilis in genetic resources of quinoa (Chenopodium quinoa
BMC Plant Biology 21:1–19.

https://doi.org/10.1186/s12870-020-02804-7
- PubMed
- Google Scholar
(2011) The variant call format and VCFtools
Bioinformatics 27:2156–2158.

https://doi.org/10.1093/bioinformatics/btr330
- PubMed
- Google Scholar
1. Dong SS
2. He WM
3. Ji JJ
4. Zhang C
5. Guo Y
6. Yang TL
(2021) LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files
Briefings in Bioinformatics 22:bbaa227.

https://doi.org/10.1093/bib/bbaa227
- PubMed
- Google Scholar
1. Emrani N
2. Hasler M
3. Patiranage DSR
4. Nathaly M
5. Rey E
6. Jung C
7. Ordon F
(2020) An efficient method to produce segregating populations in quinoa ( Chenopodium quinoa
Plant Breeding 139:1190–1200.

https://doi.org/10.1111/pbr.12873
- Google Scholar
Book
1. Gandarillas H
(1979)
Botánica, Quinua y Kañiwa: Cultivos Andinos

Bogotá CIID.
- Google Scholar
Book
(1979)
Qinua y Kaniwa Cultivos Andinos

Instituto Interamericano de Ciencias Agrıcolas.
- Google Scholar
Book
1. Gomez‐Pando L
(2015)
Improvement and Sustainable Production

Quinoa breeding.
- Google Scholar
Book
1. González JA
2. Eisa S
3. Hussin S
4. Prado FE
(2015)
Quinoa: An Incan Crop to Face Global Changes in Agriculture

Quinoa.
- Google Scholar
(2012) The beet R locus encodes a new cytochrome P450 required for red betalain production
Nature Genetics 44:816–820.

https://doi.org/10.1038/ng.2297
- PubMed
- Google Scholar
(2018) UFBoot2: improving the ultrafast bootstrap approximation
Molecular Biology and Evolution 35:518–522.

https://doi.org/10.1093/molbev/msx281
- PubMed
- Google Scholar
1. Huang X
2. Wei X
3. Sang T
4. Zhao Q
5. Feng Q
6. Zhao Y
7. Li C
8. Zhu C
9. Lu T
10. Zhang Z
11. Li M
12. Fan D
13. Guo Y
14. Wang A
15. Wang L
16. Deng L
17. Li W
18. Lu Y
19. Weng Q
20. Liu K
21. Huang T
22. Zhou T
23. Jing Y
24. Li W
25. Lin Z
26. Buckler ES
27. Qian Q
28. Zhang Q-F
29. Li J
30. Han B
(2010) Genome-wide association studies of 14 agronomic traits in rice landraces
Nature Genetics 42:961–967.

https://doi.org/10.1038/ng.695
- PubMed
- Google Scholar
1. Jakobsson M
2. Rosenberg NA
(2007) CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure
Bioinformatics 23:1801–1806.

https://doi.org/10.1093/bioinformatics/btm233
- PubMed
- Google Scholar
1. Jarvis DE
2. Ho YS
3. Lightfoot DJ
4. Schmöckel SM
5. Li B
6. Borm TJA
7. Ohyanagi H
8. Mineta K
9. Michell CT
10. Saber N
11. Kharbatia NM
12. Rupper RR
13. Sharp AR
14. Dally N
15. Boughton BA
16. Woo YH
17. Gao G
18. Schijlen E
19. Guo X
20. Momin AA
21. Negrão S
22. Al-Babili S
23. Gehring C
24. Roessner U
25. Jung C
26. Murphy K
27. Arold ST
28. Gojobori T
29. Linden CG
30. van Loo EN
31. Jellen EN
32. Maughan PJ
33. Tester M
(2017) The genome of Chenopodium quinoa
Nature 542:307–312.

https://doi.org/10.1038/nature21370
- PubMed
- Google Scholar
1. Jia G
2. Huang X
3. Zhi H
4. Zhao Y
5. Zhao Q
6. Li W
7. Chai Y
8. Yang L
9. Liu K
10. Lu H
11. Zhu C
12. Lu Y
13. Zhou C
14. Fan D
15. Weng Q
16. Guo Y
17. Huang T
18. Zhang L
19. Lu T
20. Feng Q
21. Hao H
22. Liu H
23. Lu P
24. Zhang N
25. Li Y
26. Guo E
27. Wang S
28. Wang S
29. Liu J
30. Zhang W
31. Chen G
32. Zhang B
33. Li W
34. Wang Y
35. Li H
36. Zhao B
37. Li J
38. Diao X
39. Han B
(2013) A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica
Nature Genetics 45:957–961.

https://doi.org/10.1038/ng.2673
- PubMed
- Google Scholar
(2017) ModelFinder: fast model selection for accurate phylogenetic estimates
Nature Methods 14:587–589.

https://doi.org/10.1038/nmeth.4285
- PubMed
- Google Scholar
1. Kang HM
2. Sul JH
3. Service SK
4. Zaitlen NA
5. Kong SY
6. Freimer NB
7. Sabatti C
8. Eskin E
(2010) Variance component model to account for sample structure in genome-wide association studies
Nature Genetics 42:348–354.

https://doi.org/10.1038/ng.548
- PubMed
- Google Scholar
(2011) Genome size variation in Chenopodium quinoa (Chenopodiaceae
Plant Systematics and Evolution 298:251–255.

https://doi.org/10.1007/s00606-011-0534-z
- Google Scholar
1. Lê S
2. Josse J
3. Husson F
(2008) FactoMineR: an R package for multivariate analysis
Journal of Statistical Software 25:1–18.

https://doi.org/10.18637/jss.v025.i01
- Google Scholar
1. Letunic I
2. Bork P
(2016) Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees
Nucleic Acids Research 44:W242–W245.

https://doi.org/10.1093/nar/gkw290
- PubMed
- Google Scholar
(2009) The sequence alignment/map format and samtools
Bioinformatics 25:2078–2079.

https://doi.org/10.1093/bioinformatics/btp352
- PubMed
- Google Scholar
1. Li H
2. Durbin R
(2010) Fast and accurate long-read alignment with Burrows-Wheeler transform
Bioinformatics 26:589–595.

https://doi.org/10.1093/bioinformatics/btp698
- PubMed
- Google Scholar
1. Li H
(2011) Tabix: fast retrieval of sequence features from generic TAB-delimited files
Bioinformatics 27:718–719.

https://doi.org/10.1093/bioinformatics/btq671
- PubMed
- Google Scholar
1. Li M-X
2. Yeung JMY
3. Cherny SS
4. Sham PC
(2012) Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets
Human Genetics 131:747–756.

https://doi.org/10.1007/s00439-011-1118-2
- PubMed
- Google Scholar
1. Li T
2. Yang X
3. Yu Y
4. Si X
5. Zhai X
6. Zhang H
7. Dong W
8. Gao C
9. Xu C
(2018) Domestication of wild tomato is accelerated by genome editing
Nature Biotechnology 36:1160–1163.

https://doi.org/10.1038/nbt.4273
- Google Scholar
1. Li H
2. Ralph P
(2019) Local pca shows how the effect of population structure differs along the genome
Genetics 211:289–304.

https://doi.org/10.1534/genetics.118.301747
- PubMed
- Google Scholar
1. Li N
2. Xu R
3. Li Y
(2019) Molecular networks of seed size control in plants
Annual Review of Plant Biology 70:435–463.

https://doi.org/10.1146/annurev-arplant-050718-095851
- PubMed
- Google Scholar
1. Lu X
2. Xiong Q
3. Cheng T
4. Li Q-T
5. Liu X-L
6. Bi Y-D
7. Li W
8. Zhang W-K
9. Ma B
10. Lai Y-C
11. Du W-G
12. Man W-Q
13. Chen S-Y
14. Zhang J-S
(2017) A pp2c-1 allele underlying a quantitative trait locus enhances soybean 100-seed weight
Molecular Plant 10:670–684.

https://doi.org/10.1016/j.molp.2017.03.006
- PubMed
- Google Scholar
(2005) Development and use of microsatellite markers for germplasm characterization in quinoa ( Chenopodium quinoa Willd)
Crop Science 45:1618–1630.

https://doi.org/10.2135/cropsci2004.0295
- Google Scholar
(2007) The extent of linkage disequilibrium in rice (Oryza sativa L)
Genetics 177:2223–2232.

https://doi.org/10.1534/genetics.107.079616
- PubMed
- Google Scholar
(2012) Single nucleotide polymorphism identification, characterization, and linkage mapping in quinoa
The Plant Genome 5:114.

https://doi.org/10.3835/plantgenome2012.06.0011
- Google Scholar
1. Maughan PJ
2. Chaney L
3. Lightfoot DJ
4. Cox BJ
5. Tester M
6. Jellen EN
7. Jarvis DE
(2019) Mitochondrial and chloroplast genomes provide insights into the evolutionary origins of quinoa (Chenopodium quinoa Willd
Scientific Reports 9:185.

https://doi.org/10.1038/s41598-018-36693-6
- PubMed
- Google Scholar
1. McKenna A
2. Hanna M
3. Banks E
4. Sivachenko A
5. Cibulskis K
6. Kernytsky A
7. Garimella K
8. Altshuler D
9. Gabriel S
10. Daly M
11. DePristo MA
(2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
Genome Research 20:1297–1303.

https://doi.org/10.1101/gr.107524.110
- PubMed
- Google Scholar
1. Milner SG
2. Jost M
3. Taketa S
4. Mazón ER
5. Himmelbach A
6. Oppermann M
7. Weise S
8. Knüpffer H
9. Basterrechea M
10. König P
11. Schüler D
12. Sharma R
13. Pasam RK
14. Rutten T
15. Guo G
16. Xu D
17. Zhang J
18. Herren G
19. Müller T
20. Krattinger SG
21. Keller B
22. Jiang Y
23. González MY
24. Zhao Y
25. Habekuß A
26. Färber S
27. Ordon F
28. Lange M
29. Börner A
30. Graner A
31. Reif JC
32. Scholz U
33. Mascher M
34. Stein N
(2019) Genebank genomics highlights the diversity of a global barley collection
Nature Genetics 51:319–326.

https://doi.org/10.1038/s41588-018-0266-x
- PubMed
- Google Scholar
1. Murphy KM
(2018)
Quinoa breeding and genomics

Plant Breeding Reviews 42:257–320.
- Google Scholar
(2015) IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies
Molecular Biology and Evolution 32:268–274.

https://doi.org/10.1093/molbev/msu300
- Google Scholar
(2008) Nuclear genome size and chromosome analysis in Chenopodium quinoa and C - berlandieri subsp: nuttalliae
Euphytica: Netherlands Journal of Plant Breeding 164:221.

https://doi.org/10.1007/s10681-008-9711-8
- Google Scholar
(2015) A crossing method for quinoa
Sustainability 7:3230–3243.

https://doi.org/10.3390/su7033230
- Google Scholar
1. Rodríguez LA
2. Isla MT
(2009)
Comparative analysis of genetic and morphologic diversity among quinoa accessions (Chenopodium quinoa Willd.) of the South of Chile and highland accessions

Journal of Plant Breeding and Crop Science 1:210–216.
- Google Scholar
Book
1. Rojas W
2. Pinto M
3. Alanoca C
4. Gómez Pando L
5. León-Lobos P
6. Alercia A
7. Diulgheroff S
8. Padulosi S
9. Bazile D
(2015)
Quinoa Genetic Resources and Ex Situ Conservation

FAO/CIRAD.
- Google Scholar
1. Ruas PM
2. Ruas PM
3. Ruas PM
4. Bonifacio A
5. Bonifacio A
6. Ruas CF
7. Ruas CF
8. Fairbanks DJ
9. Andersen WR
(1999) Genetic relationship among 19 accessions of six species of Chenopodium L., by Random Amplified Polymorphic DNA fragments (RAPD
Euphytica 105:25–32.

https://doi.org/10.1023/A:1003480414735
- Google Scholar
(2013) Quinoa biodiversity and sustainability for food security under climate change: A review
Agronomy for Sustainable Development 34:349–359.

https://doi.org/10.1007/s13593-013-0195-0
- Google Scholar
Software
1. Silvestri V
2. Gil F
(2000)
Alogamia en quinua

Revista de La Facultad de Ciencias Agrarias.
(2013) Pleiotropy in complex traits: challenges and strategies
Nature Reviews. Genetics 14:483–495.

https://doi.org/10.1038/nrg3461
- PubMed
- Google Scholar
1. Stanschewski CS
2. Rey E
3. Fiene G
4. Craine EB
5. Wellman G
6. Melino VJ
7. S R Patiranage D
8. Johansen K
9. Schmöckel SM
10. Bertero D
11. Oakey H
12. Colque-Little C
13. Afzal I
14. Raubach S
15. Miller N
16. Streich J
17. Amby DB
18. Emrani N
19. Warmington M
20. Mousa MAA
21. Wu D
22. Jacobson D
23. Andreasen C
24. Jung C
25. Murphy K
26. Bazile D
27. Tester M
(2021) Quinoa phenotyping methodologies: An international consensus
Plants 10:1759.

https://doi.org/10.3390/plants10091759
- PubMed
- Google Scholar
(2017) How to make a domesticate
Current Biology 27:R896–R900.

https://doi.org/10.1016/j.cub.2017.06.048
- PubMed
- Google Scholar
1. Tang Y
2. Liu X
3. Wang J
4. Li M
5. Wang Q
6. Tian F
7. Su Z
8. Pan Y
9. Liu D
10. Lipka AE
11. Buckler ES
12. Zhang Z
(2016) Gapit version 2: An enhanced integrated tool for genomic association and prediction
The Plant Genome 9:120.

https://doi.org/10.3835/plantgenome2015.11.0120
- PubMed
- Google Scholar
Preprint
1. Turner SD
(2014) Qqman: An R Package for Visualizing GWAS Results Using Q-Q and Manhattan Plots
bioRxiv.

https://doi.org/10.1101/005165
- Google Scholar
1. Van der Auwera GA
2. Carneiro MO
3. Hartl C
4. Poplin R
5. del Angel G
6. Levy‐Moonshine A
7. Jordan T
8. Shakir K
9. Roazen D
10. Thibault J
11. Banks E
12. Garimella KV
13. Altshuler D
14. Gabriel S
15. DePristo MA
(2013) From fastq data to high‐confidence variant calls: The genome analysis toolkit best practices pipeline
Current Protocols in Bioinformatics 43:11–33.

https://doi.org/10.1002/0471250953.bi1110s43
- Google Scholar
1. Varshney RK
2. Saxena RK
3. Upadhyaya HD
4. Khan AW
5. Yu Y
6. Kim C
7. Rathore A
8. Kim D
9. Kim J
10. An S
11. Kumar V
12. Anuradha G
13. Yamini KN
14. Zhang W
15. Muniswamy S
16. Kim JS
17. Penmetsa RV
18. von Wettberg E
19. Datta SK
(2017a) Whole-genome resequencing of 292 pigeonpea accessions identifies genomic regions associated with domestication and agronomic traits
Nature Genetics 49:1082–1088.

https://doi.org/10.1038/ng.3872
- Google Scholar
1. Varshney RK
2. Shi C
3. Thudi M
4. Mariac C
5. Wallace J
6. Qi P
7. Zhang H
8. Zhao Y
9. Wang X
10. Rathore A
11. Srivastava RK
12. Chitikineni A
13. Fan G
14. Bajaj P
15. Punnuri S
16. Gupta SK
17. Wang H
18. Jiang Y
19. Couderc M
20. Katta M
21. Paudel DR
22. Mungra KD
23. Chen W
24. Harris-Shultz KR
25. Garg V
26. Desai N
27. Doddamani D
28. Kane NA
29. Conner JA
30. Ghatak A
31. Chaturvedi P
32. Subramaniam S
33. Yadav OP
34. Berthouly-Salazar C
35. Hamidou F
36. Wang J
37. Liang X
38. Clotault J
39. Upadhyaya HD
40. Cubry P
41. Rhoné B
42. Gueye MC
43. Sunkar R
44. Dupuy C
45. Sparvoli F
46. Cheng S
47. Mahala RS
48. Singh B
49. Yadav RS
50. Lyons E
51. Datta SK
52. Hash CT
53. Devos KM
54. Buckler E
55. Bennetzen JL
56. Paterson AH
57. Ozias-Akins P
58. Grando S
59. Wang J
60. Mohapatra T
61. Weckwerth W
62. Reif JC
63. Liu X
64. Vigouroux Y
65. Xu X
(2017b) Pearl millet genome sequence provides a resource to improve agronomic traits in arid environments
Nature Biotechnology 35:969–976.

https://doi.org/10.1038/nbt.3943
- Google Scholar
1. Varshney RK
2. Thudi M
3. Roorkiwal M
4. He W
5. Upadhyaya HD
6. Yang W
7. Bajaj P
8. Cubry P
9. Rathore A
10. Jian J
11. Doddamani D
12. Khan AW
13. Garg V
14. Chitikineni A
15. Xu D
16. Gaur PM
17. Singh NP
18. Chaturvedi SK
19. Nadigatla GVPR
20. Krishnamurthy L
21. Dixit GP
22. Fikre A
23. Kimurto PK
24. Sreeman SM
25. Bharadwaj C
26. Tripathi S
27. Wang J
28. Lee S-H
29. Edwards D
30. Polavarapu KKB
31. Penmetsa RV
32. Crossa J
33. Nguyen HT
34. Siddique KHM
35. Colmer TD
36. Sutton T
37. von Wettberg E
38. Vigouroux Y
39. Xu X
40. Liu X
(2019) Resequencing of 429 chickpea accessions from 45 countries provides insights into genome diversity, domestication and agronomic traits
Nature Genetics 51:857–864.

https://doi.org/10.1038/s41588-019-0401-3
- PubMed
- Google Scholar
(2010) Nutrition facts and functional potential of quinoa (Chenopodium quinoa willd.), an ancient Andean grain: a review
Journal of the Science of Food and Agriculture 90:2541–2547.

https://doi.org/10.1002/jsfa.4158
- PubMed
- Google Scholar
1. Wang W
2. Mauleon R
3. Hu Z
4. Chebotarov D
5. Tai S
6. Wu Z
7. Li M
8. Zheng T
9. Fuentes RR
10. Zhang F
11. Mansueto L
12. Copetti D
13. Sanciangco M
14. Palis KC
15. Xu J
16. Sun C
17. Fu B
18. Zhang H
19. Gao Y
20. Zhao X
21. Shen F
22. Cui X
23. Yu H
24. Li Z
25. Chen M
26. Detras J
27. Zhou Y
28. Zhang X
29. Zhao Y
30. Kudrna D
31. Wang C
32. Li R
33. Jia B
34. Lu J
35. He X
36. Dong Z
37. Xu J
38. Li Y
39. Wang M
40. Shi J
41. Li J
42. Zhang D
43. Lee S
44. Hu W
45. Poliakov A
46. Dubchak I
47. Ulat VJ
48. Borja FN
49. Mendoza JR
50. Ali J
51. Li J
52. Gao Q
53. Niu Y
54. Yue Z
55. Naredo MEB
56. Talag J
57. Wang X
58. Li J
59. Fang X
60. Yin Y
61. Glaszmann J-C
62. Zhang J
63. Li J
64. Hamilton RS
65. Wing RA
66. Ruan J
67. Zhang G
68. Wei C
69. Alexandrov N
70. McNally KL
71. Li Z
72. Leung H
(2018) Genomic variation in 3,010 diverse accessions of Asian cultivated rice
Nature 557:43–49.

https://doi.org/10.1038/s41586-018-0063-9
- PubMed
- Google Scholar
1. Wilson HD
(1988) Allozyme variation and morphological relationships of chenopodium hircinum (s.L
Systematic Botany 13:215.

https://doi.org/10.2307/2419100
- Google Scholar
1. Wu D
2. Liang Z
3. Yan T
4. Xu Y
5. Xuan L
6. Tang J
7. Zhou G
8. Lohwasser U
9. Hua S
10. Wang H
11. Chen X
12. Wang Q
13. Zhu L
14. Maodzeka A
15. Hussain N
16. Li Z
17. Li X
18. Shamsi IH
19. Jilani G
20. Wu L
21. Zheng H
22. Zhang G
23. Chalhoub B
24. Shen L
25. Yu H
26. Jiang L
(2019) Whole-genome resequencing of a worldwide collection of rapeseed accessions reveals the genetic basis of ecotype divergence
Molecular Plant 12:30–43.

https://doi.org/10.1016/j.molp.2018.11.007
- PubMed
- Google Scholar
1. Xu X
2. Liu X
3. Ge S
4. Jensen JD
5. Hu F
6. Li X
7. Dong Y
8. Gutenkunst RN
9. Fang L
10. Huang L
11. Li J
12. He W
13. Zhang G
14. Zheng X
15. Zhang F
16. Li Y
17. Yu C
18. Kristiansen K
19. Zhang X
20. Wang J
21. Wright M
22. McCouch S
23. Nielsen R
24. Wang J
25. Wang W
(2011) Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes
Nature Biotechnology 30:105–111.

https://doi.org/10.1038/nbt.2050
- Google Scholar
1. Zhang T
2. Gu M
3. Liu Y
4. Lv Y
5. Zhou L
6. Lu H
7. Liang S
8. Bao H
9. Zhao H
(2017) Development of novel InDel markers and genetic diversity in Chenopodium quinoa through whole-genome re-sequencing
BMC Genomics 18:685.

https://doi.org/10.1186/s12864-017-4093-8
- PubMed
- Google Scholar
1. Zhang C
2. Dong SS
3. Xu JY
4. He WM
5. Yang TL
6. Schwartz R
(2019a) PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files
Bioinformatics 35:1786–1788.

https://doi.org/10.1093/bioinformatics/bty875
- Google Scholar
1. Zhang R
2. Zheng F
3. Wei S
4. Zhang S
5. Li G
6. Cao P
7. Zhao S
(2019b) Evolution of disease defense genes and their regulators in plants
International Journal of Molecular Sciences 20:E335.

https://doi.org/10.3390/ijms20020335
- PubMed
- Google Scholar
1. Zhao G
2. Lian Q
3. Zhang Z
4. Fu Q
5. He Y
6. Ma S
7. Ruggieri V
8. Monforte AJ
9. Wang P
10. Julca I
11. Wang H
12. Liu J
13. Xu Y
14. Wang R
15. Ji J
16. Xu Z
17. Kong W
18. Zhong Y
19. Shang J
20. Pereira L
21. Argyris J
22. Zhang J
23. Mayobre C
24. Pujol M
25. Oren E
26. Ou D
27. Wang J
28. Sun D
29. Zhao S
30. Zhu Y
31. Li N
32. Katzir N
33. Gur A
34. Dogimont C
35. Schaefer H
36. Fan W
37. Bendahmane A
38. Fei Z
39. Pitrat M
40. Gabaldón T
41. Lin T
42. Garcia-Mas J
43. Xu Y
44. Huang S
(2019) A comprehensive genome variation map of melon identifies multiple domestication events and loci influencing agronomic traits
Nature Genetics 51:1607–1615.

https://doi.org/10.1038/s41588-019-0522-8
- PubMed
- Google Scholar
1. Zheng X
2. Levine D
3. Shen J
4. Gogarten SM
5. Laurie C
6. Weir BS
(2012) A high-performance computing toolset for relatedness and principal component analysis of SNP data
Bioinformatics 28:3326–3328.

https://doi.org/10.1093/bioinformatics/bts606
- PubMed
- Google Scholar
1. Zhou Z
2. Jiang Y
3. Wang Z
4. Gou Z
5. Lyu J
6. Li W
7. Yu Y
8. Shu L
9. Zhao Y
10. Ma Y
11. Fang C
12. Shen Y
13. Liu T
14. Li C
15. Li Q
16. Wu M
17. Wang M
18. Wu Y
19. Dong Y
20. Wan W
21. Wang X
22. Ding Z
23. Gao Y
24. Xiang H
25. Zhu B
26. Lee S-H
27. Wang W
28. Tian Z
(2015) Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean
Nature Biotechnology 33:408–414.

https://doi.org/10.1038/nbt.3096
- PubMed
- Google Scholar

Article and author information

Author details

Dilan SR Patiranage

Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany

Contribution
Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-6308-1838
Elodie Rey

King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences & Engineering Division (BESE), Thuwal, Saudi Arabia

Contribution
Resources, Data curation, Formal analysis, Validation, Visualization, Writing – review and editing

Competing interests
No competing interests declared
Nazgol Emrani

Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany

Contribution
Conceptualization, Supervision, Investigation, Visualization, Writing – review and editing

For correspondence
n.emrani@plantbreeding.uni-kiel.de

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-5673-3957
Gordon Wellman

King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences & Engineering Division (BESE), Thuwal, Saudi Arabia

Contribution
Resources, Writing – review and editing

Competing interests
No competing interests declared
Karl Schmid

Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Stuttgart, Germany

Contribution
Resources, Visualization, Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-5129-895X
Sandra M Schmöckel

Department of Physiology of Yield Stability, University of Hohenheim, Stuttgart, Germany

Contribution
Resources, Writing – review and editing

Competing interests
No competing interests declared
Mark Tester

King Abdullah University of Science and Technology (KAUST), Biological and Environmental Sciences & Engineering Division (BESE), Thuwal, Saudi Arabia

Contribution
Conceptualization, Resources, Supervision, Funding acquisition, Visualization, Project administration, Writing – review and editing

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0002-5085-8801
Christian Jung

Plant Breeding Institute, Christian-Albrechts-University of Kiel, Kiel, Germany

Contribution
Conceptualization, Supervision, Funding acquisition, Visualization, Writing – review and editing

For correspondence
c.jung@plantbreeding.uni-kiel.de

Competing interests
No competing interests declared

"This ORCID iD identifies the author of this article:" 0000-0001-8149-7976

Funding

King Abdullah University of Science and Technology (OSR-2016-CRG5- 466 2966-02)

Dilan SR Patiranage

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank David Jarvis for providing the updated version of the quinoa reference genome. We thank Monika Bruisch, Brigitte Neidhardt-Olf, Elisabeth Kokai-Kota, Verena Kowalewski, and Gabriele Fiene for technical assistance. The financial support of this work was provided by the Competitive Research Grant (Grant No. OSR-2016-CRG5-466 2966-02) of the King Abdullah University of Science and Technology, Saudi Arabia, and baseline funding from KAUST to Mark Tester. We acknowledge financial support by DFG within the funding programme "Open Access Publikationskosten" to cover the article processing charges.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.