Data-driven, participatory characterization of farmer varieties discloses teff breeding potential under current and future climates

  1. Aemiro Bezabih Woldeyohannes
  2. Sessen Daniel Iohannes
  3. Mara Miculan
  4. Leonardo Caproni
  5. Jemal Seid Ahmed
  6. Kauê de Sousa
  7. Ermias Abate Desta
  8. Carlo Fadda
  9. Mario Enrico Pè
  10. Matteo Dell'Acqua  Is a corresponding author
  1. Center of Plant Sciences, Scuola Superiore Sant'Anna, Italy
  2. Amhara Regional Agricultural Research Institute, Ethiopia
  3. Digital Inclusion, Bioversity International, France
  4. Department of Agricultural Sciences, Inland Norway University of Applied Sciences, Norway
  5. Biodiversity for Food and Agriculture, Bioversity International, Kenya

Abstract

In smallholder farming systems, traditional farmer varieties of neglected and underutilized species (NUS) support the livelihoods of millions of growers and consumers. NUS combine cultural and agronomic value with local adaptation, and transdisciplinary methods are needed to fully evaluate their breeding potential. Here, we assembled and characterized the genetic diversity of a representative collection of 366 Ethiopian teff (Eragrostis tef) farmer varieties and breeding materials, describing their phylogenetic relations and local adaptation on the Ethiopian landscape. We phenotyped the collection for its agronomic performance, involving local teff farmers in a participatory variety evaluation. Our analyses revealed environmental patterns of teff genetic diversity and allowed us to identify 10 genetic clusters associated with climate variation and with uneven spatial distribution. A genome-wide association study was used to identify loci and candidate genes related to phenology, yield, local adaptation, and farmers’ appreciation. The estimated teff genomic offset under climate change scenarios highlighted an area around lake Tana where teff cropping may be most vulnerable to climate change. Our results show that transdisciplinary approaches may efficiently propel untapped NUS farmer varieties into modern breeding to foster more resilient and sustainable cropping systems.

Editor's evaluation

Teff (Eragrostis tef), a small-market domesticate native and commonly grown in Ethiopia and the Horn of Africa, is comprehensively characterized for genetic, ecological and phenotypic variation in this ambitious and interdisciplinary publication. Integration of small holder farmers in phenotyping the collection, with an emphasis on gender considerations, elevates the characterization of Ethiopian teff. This paper provides a solid foundation to accelerate teff breeding for a changing climate, and provides an excellent model for the characterization of novel and underused crops.

https://doi.org/10.7554/eLife.80009.sa0

eLife digest

Small farms support the livelihoods of about two billion people worldwide. Smallholder farmers often rely on local varieties of crops and use less irrigation and fertilizer than large producers. But smallholdings can be vulnerable to weather events and climate change. Data-driven research approaches may help to identify the needs of farmers, taking into account traditional knowledge and cultural practices to enhance the sustainability of certain crops.

Teff is a cereal crop that plays a critical role in the culture and diets of Ethiopian communities. It is also a super food appreciated on international markets for its nutritional value. Rural smallholder farmers in Ethiopia rely on the crop for subsistence and income and make up the bulk of the country’s agricultural system. Many grow local varieties with tremendous genetic diversity. Scientists, in collaboration with farmers, could tap that diversity to produce more productive or climate-resilient types of teff, both for national and international markets.

Woldeyohannes, Iohannes et al. produced the first large-scale genetic, agronomic and climatic study of traditional teff varieties. In the experiments, Woldeyohannes and Iohannes et al. sequenced the genomes of 366 Ethiopian teff varieties and evaluated their agronomic value in common gardens. The team collaborated with 35 local farmers to understand their preference of varieties and traits. They then conducted a genome-wide association study to assess the crops’ productivity and their adaptations to local growing conditions and farmer preferences. Genetic changes that speed up teff maturation and flowering time could meet small farmers’ needs to secure teff harvest. Woldeyohannes, Iohannes et al. also identified a region in Ethiopia, where local teff varieties may struggle to adapt to climate change. Genetic modifications may help the crop to adapt to frequent droughts that may be a common characteristic of future climates.

The experiments reveal the importance of incorporating traditional knowledge from smallholder farmers into data-driven crop improvement efforts considering genetics and climate science. This multidisciplinary approach may help to improve food security and protect local genetic diversity on small farms. It may also help to ensure that agricultural advances fairly and equitably benefit small farmers.

Introduction

Large-scale, high-yielding cropping systems rely on a remarkable small set of crops. Approximately half of the global farming land is devoted to maize, wheat, rice, and soybean (FAOSTAT, 2021) and the overall composition of worldwide food systems is rather uniform (Khoury et al., 2014). Yet, hundreds of neglected or underutilized species (NUS) are still actively cultivated in highly diversified, small-scale cropping systems, where they support the livelihoods of millions of people (Jamnadass et al., 2020). NUS benefited of scant research and breeding improvement. Their diversity is not only a proxy of pedoclimatic diversification of cropping systems, but also reflects socioeconomic diversity and cultural heritage of local farmers (Tadele, 2019). Rich NUS agrobiodiversity is still conserved in situ in smallholder farming systems, where the selection and cultivation conducted by local growers resulted in untapped local varieties that could provide useful adaptive traits (Iragaba et al., 2020). A comprehensive, transdisciplinary characterization of NUS farmer varieties that takes into consideration diversity, adaptation, and farmer–consumer preferences may unlock the potential of NUS for the sustainable intensification of farming systems (Dawson et al., 2018).

Crop scientists and breeders can now leverage the big data revolution to bridge the gap between NUS and 21st century agriculture (Dawson et al., 2019). Genomic tools enable a rapid and cost-effective characterization of large germplasm collections to unlock agrobiodiversity for breeding (Poland, 2015), identify genetic factors responsible for traits of agronomic interest (Mascher et al., 2019), and accelerate genetic gains (Juliana et al., 2019). Genomic data can be combined with climatic data to gain insights into locus-specific adaptation (Lasky et al., 2015) and to estimate genomic vulnerability under climate change scenarios (Aguirre-Liguori et al., 2021). Data-driven methods can also be applied to characterize the socioeconomic contexts in which crops are grown (van Etten et al., 2019a; van Etten et al., 2019b), generating insights that are critical to understand cropping dynamics in smallholder farming systems (Gomez y Paloma et al., 2020; Iragaba et al., 2020; Terlau et al., 2019). The agrobiodiversity maintained by farmers is the result of interactions between genetic, environmental, and societal factors, and has a large potential for varietal innovation when farmer preferences are factored into the selection process (Mokuwa et al., 2014). Participatory varietal selection (PVS) approaches have been developed to directly involve farmers in the evaluation of breeding materials (Ceccarelli and Grando, 2007). Previous studies showed that data-driven PVS can be combined with genomic data to identify genomic loci responsible for farmers’ appreciation (Kidane et al., 2017) and model local crop performance in farmer fields (de Sousa et al., 2021).

The Ethiopian highlands are a paradigm of challenging agricultural ecosystems where NUS farmer varieties are widely cultivated. In Ethiopia, 85 million people live in rural areas; most of them are subsistence-based smallholder farmers who are responsible for about 90% of the cultivated land and agricultural output of the country (Bachewe and Taffesse, 2018). Teff (Eragrostis tef) is an annual, self-pollinating, and allotetraploid grass of the Chloridoideae subfamily (Ketema, 1997). It is widely grown in Ethiopia as a staple crop, where it is valued for its nutritional and health benefits, resilience in marginal and semi-arid environments and cultural importance (D’Andrea, 2008; Ketema, 1997). Teff was likely domesticated in the northern Ethiopian Highlands from the allotetraploid Eragrostis pilosa, but the timing of its initial cultivation and the identity of its diploid ancestors remain poorly understood (D’Andrea, 2008; Ingram and Doyle, 2003; VanBuren et al., 2020). Nowadays, hundreds of local landraces are grown in Ethiopia in a wide range of agroecologies, displaying broad environmental adaptation and phenotypic diversity (Woldeyohannes et al., 2020). Breeding efforts on teff have been underway for decades, and segregant and mutagenized populations are available (Cannarozzi et al., 2018). Yet, teff yields today remain much lower than potentially attainable and substantially lower than those of other cereals grown in the region, and the full breeding potential of teff farmer varieties is still undisclosed (Girma et al., 2014; Woldeyohannes et al., 2020). Research in teff is rapidly evolving; a draft genome sequence (Cannarozzi et al., 2014) and a high-quality genome sequence (VanBuren et al., 2020) recently brought this species into the international genomics research spotlight. A data-driven research approach may unlock the full potential of teff agrobiodiversity and propel breeding for new improved varieties that meet local needs.

Here, we report a transdisciplinary data-driven approach to characterize NUS genetic, agronomic, and climatic diversity, using teff as a case study. We selected 321 teff farmer varieties derived from landraces and 45 teff improved lines, and we genotyped them with genome-wide molecular markers. Concurrently, we characterized their agronomic performance at two locations in Ethiopia. Experienced teff farmers (15 women and 20 men) were asked to evaluate the teff genotypes, providing qualitative information that enabled us to prioritize better adapted genetic materials. Additionally, we derived current and projected climate data at the sampling locations of teff accessions and used them to estimate genomic offset under climate change scenarios. We combined all these data in a genome-wide association study (GWAS) framework to identify genomic loci and candidate genes with relevance for adaptation, performance, and farmers’ preferences. We discuss the potential of data-driven participatory approaches to characterize NUS diversity and to disclose their potential for the sustainable intensification of farming systems.

Results and discussion

Teff farmer varieties harness broad genetic diversity

We assembled a representative collection of teff cultivated in Ethiopia, hereafter named Ethiopian Teff Diversity Panel (EtDP). The EtDP comprises 321 farmer varieties spanning the entire geographical and agroecological range of teff cultivation in Ethiopia, from the submoist lowlands of Tigray in the North to the moist lowlands of Oromia in the South, and from the subhumid lowlands of Benishangul and Gumuz in the West to the subhumid mid-highlands of Oromia in the East (Figure 1—figure supplement 1, Supplementary file 1A; MoA, 2000). The EtDP also includes 45 improved varieties released since the first breeding efforts on teff and up to the moment of the EtDP assembly. A selection of seven Eragrostis spp., putative wild relatives of teff, was included as an outgroup. The genomic diversity of the EtDP was assessed starting from 12,153 high-quality, genome-wide single-nucleotide polymorphisms (SNPs) that were derived from double digest restriction site-associated sequencing (ddRAD-seq) of individual accessions, and were pruned for linkage disequilibrium (LD). The EtDP genomes could be grouped in 1240 haplotype blocks (Supplementary file 1B). EtDP chromosomes showed consistently higher pericentromeric LD, with localized LD peaks in telomeric regions (Figure 1—figure supplement 2). The A and B subgenomes showed comparable yet different LD profiles, possibly due to their specificity in terms of dominance, transposable elements content, and overall limited homoeologous exchange (VanBuren et al., 2020).

Population structure analysesshowed that EtDP accessions displayed varying degrees of genetic admixture (Figure 1A) and could be best summarized by 10 discriminant analysis of principal component (DAPC) clusters (Figure 1—figure supplement 3). About 15% of the genetic variability in the EtDP collection could be explained by the first three principal components (PCs) of LD-pruned SNP data (Figure 1B, C). Teff accessions from Tigray, in Northern Ethiopia, were markedly separated from the rest, and mainly belonged to cluster 7 (Figure 1D). Teff breeding lines showed a narrower genetic base compared to farmer varieties and could not summarize the broad diversity available within the EtDP (Figure 1—figure supplement 4). They belonged predominantly to clusters 2, 8, and 10 (Figure 2D, E). Genetic clusters are a proxy of teff landraces diversity and may support breeding efforts by improving the identification of parent lines for genomic selection to counter the depletion of allelic diversity (Heffner et al., 2009), or establishing breeding groups to explore heterotic potential in teff (Boeven et al., 2016). The preservation and valorization of teff genetic clusters are instrumental to respond to future breeding needs in light of changing climates and dynamic consumer preferences (Woldeyohannes et al., 2020; Araya et al., 2011).

Figure 1 with 6 supplements see all
Genetic diversity of teff in Ethiopia.

(A) ADMIXTURE results for the pruned single-nucleotide polymorphisms (SNPs) dataset. Each vertical bar represents an individual, colored according to one of the 20 groups reported by the analysis. Bars are ordered according to the 10 genetic clusters identified by discriminant analysis of principal component (DAPC), as reported by numbers on the x-axis. (B, C) Principal component analysis of genome-wide SNPs. Taxa are colored according to their DAPC genetic cluster, as indicated in the legend. About 10.98% of the genetic diversity in the panel can be explained by the first two principal components, which clearly separate cluster 7 from clusters 2 and 4. Open and close circles represent farmer varieties and improved varieties, respectively. (D) Distribution of Ethiopian Teff Diversity Panel (EtDP) georeferenced landraces (N = 314) across the altitudinal map of Ethiopia, color coded as in panel (B). (E) Altitude distribution across the DAPC genetic clusters,, with letters on top of boxplots denoting significance levels based on a pairwise Wilcoxon rank sum test and Bonferroni correction for multiple testing. (F) Distribution of genetic clusters across agroecological zones of Ethiopia, with color coding as in panel B. SM2, warm submoist lowlands; SM3, tepid submoist mid-highlands; SM4, cool submoist mid-highlands; M2, warm moist lowlands; M3, tepid moist mid-highlands; M4, cool moist mid-highlands; SH2, warm subhumid lowlands; SH3, tepid suphumid mid-highlands; SH4, cool subhumid mid-highlands; H3, tepid humid mid-highlands; H4, cool humid mid-highlands. This figure has six figure supplements.

Figure 2 with 2 supplements see all
Teff diversity on the landscape.

(A) Principal component analysis of bioclimatic diversity in the Ethiopian Teff Diversity Panel (EtDP). Dots represent teff farmer varieties belonging to genetic clusters, colored according to legend. Vectors represent the scale, verse, and direction of bioclimatic drivers of teff differentiation. (B) Linear regression of Fst values in relation to geographic distance of accessions in the EtDP. Accessions were grouped by local district of sampling. (C) Linear regression of Fst values in relation to environmental distance of accessions, also grouped by district of sampling. (D) Pairwise Fst values between teff accessions grouped by local districts of sampling, as in (C) and (D). Local districts, that is subregional groups, are ordered by administrative regions according to legend. This figure has two figure supplements.

The distribution of teff genetic variation is associated with geographic and environmental factors

Landraces evolve at the interface of natural and anthropogenic selection (Casañas et al., 2017). Hence, we hypothesized that teff genetic clusters might be associated with local environmental conditions. The EtDP showed limited genetic stratification, as evidenced by the high levels of admixture and low genetic variance explained by the principal components analysis (PCA) (Figure 1—figure supplement 5; Figure 1B, C). Yet, some genetic clusters could be associated to agroecological zones characterized by different temperature and precipitation conditions (Figure 1—figure supplement 6; Figure 3A, B; Figure 3—figure supplement 1). Cluster 4 and 7 are cultivated in warmer sampling locations, while cluster 6 comes from colder and wetter areas (Figure 3—figure supplement 1). Extant landraces diversity is not only contributed by climate, but also by seed circulation. Studies have shown that cultural and social dynamics, such as belonging to similar or different ethnolinguistic groups, are key factors in shaping seed exchange networks (Labeyrie et al., 2016; Samberg et al., 2007). In Ethiopia, regional districts are markers of cultural and historical diversity, and smallholder farmers have limited capacity for long-distance travel. Thus, we hypothesized that regional distinctions could impact seed exchanges and spatial patterns of teff genetic diversity.

Figure 3 with 2 supplements see all
Phenotypic diversity in the Ethiopian Teff Diversity Panel (EtDP).

(A) Pearson’s correlations between agronomic traits (y-axis) and farmer preference traits, by gender (x-axis). Correlation values are expressed in color shades, as indicated in the the legend. (B, C) Top ranking genotypes (90th percentile of the OA distribution) selected by men (blue) and women (purple) farmers, overlaid to the principal component analysis (PCA) of the EtDP genetic diversity (B) and phenotypic diversity (C). Genotypes that were not selected are reported as gray dots. When men and women select the same genotype, the corresponding point appears in dark violet. Trait distribution across discriminant analysis of principal component (DAPC) clusters, for overall appreciation (D) and panicle length (E). Farmer varieties are represented by open circles, improved lines are represented by full circles. (F) Alluvial plot reporting the consistency of farmers’ choice by quartiles of the OA distribution. Each vertical bar represents a combination of location (Adet, Akaki) and gender (M, W). EtDP accessions are ordered on the y-axis according to their OA score in each combination. Alluvial flows are colored according to OA quartiles combined across gender and across location according to the legend (q1, q2, q3, and q4). (G) Venn diagram reporting farmer varieties having values superior to the 75th percentile of the trait distribution of improved varieties (lower than the 25th percentile in the case of GFP). Each area of the Venn diagram reports the corresponding number of farmer varieties, as in the legend. DH, days to heading; DM, days to maturity; PH, plant height; PL, panicle length; PBPM, number of primary branches per main shoot panicle; TT, total tillers; CLF, first culm length; CDF, first culm diameter; PW, panicle weight; PY, panicle yield; GY, grain yield; BY, biomass yield; HI, harvest index; GFP, grain filling period; GFR, grain filling rate; BPR, biomass production rate; OA, overall appreciation; PA, panicle appreciation. This figure has two figure supplements.

The district of sampling of teff landraces in the EtDP was used to aggregate accessions and calculate the fixation index (Fst) as a measure of genetic distances. Although our experimental design does not allow to fully untangle the effect of geography, environment, and social factors, we found a pattern of genetic variation distribution that could be associated to an isolation by distance and adaptation process, where Fst values are significantly associated with geographic distance (Mantel r = 0.31; p = 9e−04) and environmental distance (Mantel r = 0.352; p = 0.0137; Figure 3C, D). Accessions from East Tigray (Misraquawi) showed the highest separation from the collection, followed by East Oromia (Misraq Harerge) and West Amhara (Agew Awi; Figure 3E). Tigray is believed to be the center of teff domestication, as earlier reports identified there much of its diversity (Costanza et al., 2007) and archaeobotanical excavations reported its cultivation there since the Pre Aksumite period (before I Century CE; D’Andrea, 2008). However, limited archeological information is available from other parts of the country, which may conceal other sites of early domestication of teff. When integrating the putative teff wild relatives Eragrostis curvula and E. pilosa in our teff landraces’ phylogeny, we found that they did not group with any specific DAPC cluster (Figure 3—figure supplement 2).

Participatory evaluation of the teff diversity prioritizes genetic materials for breeding

The EtDP was phenotyped for agronomic and farmer appreciation traits during the main cropping season in two locations in Ethiopia (Figure 1D) representing areas with high potential for teff cultivation. Experienced teff farmers including 15 women and 20 men provided their evaluations on overall appreciation (OA) and panicle appreciation (PA) across the EtDP. Women and men farmers were selected because they were both teff growers, and their responses were analyzed separately to highlight distinct trait evaluation criteria. Previous studies conducted in Ethiopia showed that agricultural traits relevant to marketability, food and drink preparations, animal feed, and construction are all perceived differently by women and men farmers (Assefa et al., 2014; Mancini et al., 2017). Regardless of gender, farmer evaluations were highly heritable and fully comparable to agronomic traits commonly targeted by breeding (Supplementary file 1C). OA provided by farmers across genders and across locations had a broad-sense heritability (H2) of 0.81 (Table 1). H2 of grain yield combined across the same two locations was 0.42 (Table 2). A significant portion of the variance for production traits could be explained by differences in location, including biomass (75%) and yield (67%). Conversely, phenology was mostly explained by genotype: this is the case of days to heading (92%) and days to maturity (86%; Supplementary file 1C). Depending on the trait, we found different effects of genetic background, location, and gender, a proxy of genotype by environment (G × E) interactions in determining teff performance and appreciation in tested locations. Genetic clusters had a significant effect on all traits, and location was important for yield and yield components (Supplementary file 1D). PA, plant height, panicle length, culm diameter, and panicle weight were affected by the interaction of location and genetic cluster. Gender and gender by location interactions always had significant effects for PVS traits (Supplementary file 1D).

Table 1
Heritabilities (H2) for farmers’ participatory variety selection traits.

H2 values are given for each trait, type, and location combination. PA, panicle appreciation; OA, overall appreciation; M, men; W, women; ALL, measures combined by either type or location.

TraitTypeLocationH2
PAALLALL0.48
PAALLAdet0.50
PAALLAkaki0.65
PAMALL0.43
PAWALL0.45
PAMAdet0.35
PAWAdet0.40
PAMAkaki0.25
PAWAkaki0.34
OAALLALL0.81
OAALLAdet0.71
OAALLAkaki0.87
OAMALL0.74
OAWALL0.74
OAMAdet0.53
OAWAdet0.66
OAMAkaki0.88
OAWAkaki0.62
Table 2
Heritabilities (H2) for agronomic traits.

H2 values are given for each trait, and location combination. DH, days to heading; DM, days to maturity; PH, plant height; PL, panicle length; PBPM, number of primary branches per main shoot panicle; TT, total tillers; CLF, first culm length; CDF, first culm diameter; PW, panicle weight; PY, panicle yield; GY, grain yield; BY, biomass yield; HI, harvest index; GFP, grain filling period; GFR, grain filling rate; BPR, biomass production rate; ALL, measures combined by location.

TraitLocationH2
DHALL0.99
DHAdet0.96
DHAkaki0.97
DMALL0.98
DMAdet0.90
DMAkaki0.89
PHALL0.16
PHAdet0.92
PHAkaki0.90
PLALL0.37
PLAdet0.88
PLAkaki0.81
PBPMALL0.64
PBPMAdet0.83
PBPMAkaki0.83
TTALL0.25
TTAdet0.65
TTAkaki0.60
CLFALL0.13
CLFAdet0.88
CLFAkaki0.85
CDFALL0.15
CDFAdet0.95
CDFAkaki0.96
PWALL0.27
PWAdet0.90
PWAkaki0.88
PYALL0.25
PYAdet0.92
PYAkaki0.91
GYALL0.42
GYAdet0.92
GYAkaki0.93
BYALL0.34
BYAdet0.84
BYAkaki0.80
HIALL0.78
HIAdet0.68
HIAkaki0.76
GFPALL0.96
GFPAdet0.79
GFPAkaki0.83
GFRALL0.52
GFRAdet0.90
GFRAkaki0.92
BPRALL0.32
BPRAdet0.80
BPRAkaki0.77

The high H2 achieved by farmers’ OA may be because farmers, in providing their overall evaluation, not only consider yield but also yield components with high heritability (Table 3). Farmers’ appreciation of teff genotypes was strongly associated with yield (p < 0.001) and its components regardless of gender, most notably plant height (p < 0.001) and grain filling period (GFP) (p < 0.05) (Figure 2A, Figure 2—figure supplement 1, Table 3). Plant height and GFP were very important for men, while women OA was highly correlated with biomass yield and panicle weight (Table 3). Different dynamics of trait prioritization by men and women are expected, and may reflect gender roles in the value chain (Weltzien et al., 2019). Studies across smallholder farming systems in Africa showed that gender-differentiated roles in agriculture may result in different trait preferences. In Ethiopian wheat, men are mostly endowed with field work, while women are concerned with marketability of grains (Mancini et al., 2017). Both in cassava (Teeken et al., 2018) and maize (Voss et al., 2021), women are mostly concerned with use traits, while men prioritize agronomic trats. The underlying causes of gender differentiation in teff trait preference need to be further characterized, also in relation to the multipurpose teff cultivation as food and feed. Still, men and women participating in the PVS were all expert teff growers, as reflected by the consistency of their evaluations (Figure 2). Previous studies showed that farmers’ appreciation has a genetic basis in the evaluated crop, and may be used to perform genomic prediction (de Sousa et al., 2021) and identify genome-wide associations (Kidane et al., 2017).

Table 3
Plackett–Luce estimates from farmer’s overall appreciation (OA) of genotypes associated with genotypes’ agronomic metrics, DM, days to maturity; PH, plant height; PW, panicle weight; GY, grain yield; BY, biomass yield; GFP, grain filling period.

The rankings were analyzed for the whole group (All) and in subsets among gender to assess differences in traits linkages within men and women farmers. * 0.05 < p < 0.01, ** 0.01 < p < 0.001, *** p < 0.001.

GroupEstimateStd. errorz valuePr(>|z|)
All(Intercept)−7.09
GY0.0007640.0001355.661.51E−08***
DM0.006250.004021.560.12
PH−0.01230.00275−4.468.17E−06***
PW−0.05570.0716−0.7770.437
BY−3.6E−052.72E−05−1.320.186
GFP0.0110.004842.270.0233*
Men(Intercept)−7.04
GY0.0007650.0001754.361.29E−05***
DM0.005420.005311.020.308
PH−0.01140.00361−3.170.00155**
PW−0.08180.0932−0.8780.38
BY−2E−050.000036−0.5620.574
GFP0.009190.006371.440.149
Women(Intercept)15.6
GY−0.000840.00028−3.010.00262**
DM−0.1860.0101−18.5<2e−16***
PH−0.1390.00661−20.9<2e−16***
PW4.40.19522.6<2e−16***
BY0.0003465.43E−056.381.74E−10***
GFP−0.1360.0104−13.1<2e−16***

The top ranking teff accessions according to men and women farmers captured different genetic backgrounds (Figure 2B), but the same trait combinations (Figure 2C). This suggested that farmers were consistently preferring the same teff ideotype regardless of their gender and regardless of the genetic background and geographic provenance of the accession. High yielding, high biomass, and fast maturing varieties were preferred. Individually, teff improved varieties showed high OA and high panicle length, which is a key component of OA (Figure 2D, E). This is supported by the fact that the evaluation fields were high-potential areas for teff cultivation. Genetic cluster 2, that is representative of most breeding materials, is associated with longer days to heading and maturity, higher plant heights and panicle lengths, greater number of total tillers and higher yields (Figure 2—figure supplement 2). Still, several traditional farmer varieties from different genetic backgrounds recorded comparable, at times superior, performance than improved lines (Figure 2G). By selecting farmer varieties that outperform improved varieties’ performances in target breeding traits, it may be possible to prioritize landrace accessions for teff improvement (Figure 2D, E) or even immediately make them available to farmers, as suggested by previous experiences in wheat (Fadda et al., 2020). In areas exposed to terminal drought, common in Ethiopia, short maturation time is paramount to achieve harvestable yield (Mengistu and Mekonnen, 2012), and it is expectedly a major component of farmers’ OA (Figure 2A). The time of maturation is therefore an obvious target trait for teff breeding, although challenging to be combined with higher attainable yield. We found that many farmer varieties had a shorter GFP than most improved lines, and that genotype EBI 9551 combined this trait with higher yield and farmers’ appreciation (Figure 2G).

Participatory, climatic, and agronomic information to support teff breeding

The data deriving from the transdisciplinary characterization of the EtDP was integrated in a GWAS framework to identify genomic loci associated with agronomic performance, local adaptation, and farmer preferences. These loci could be then used in marker assisted selection, genomic selection, or new breeding technologies to support teff improvement. The GWAS led to the identification of a total of 91 unique quantitative trait nucleotides (QTNs) (Supplementary file 1E). Of these, 43 QTNs were associated with bioclimatic variables at sampling sites. Given the low-density genotyping, we did not expect causal loci in the SNP set, yet it is possible to leverage local LD to target candidate genes in the vicinity of associations. Both in the case of trait values and bioclimatic variables, associations may be confounded by underlying LD structure driven by drift processes contributing to teff landraces’ differentiation. Teff genetic clusters (Figure 1) show different distributions in regards to bioclimatic variables and traits values, including phenology, suggesting that the expression of adaptive traits may be confounded by underlying structure (Figure 3—figure supplement 2, Figure 2—figure supplement 2). The GWAS analysis was run with varying numbers of genetic covariates, so to optimize model fit in consideration of trait-genotype covariance. Visual assessment of QQ plots was used as a guidance to interpret significant associations, and a stringent threshold was employed so to minimize Type II errors (Figure 5—figure supplement 2).

We then focused on LD blocks associated to QTNs to identify homology of predicted teff proteins with protein sequences of Arabidopsis thaliana and Zea mays (Supplementary file 1F). Among others, we identified a locus on chromosome 2A that was associated with grain yield, grain filling rate and OA (lcl|2A-14415768) (Figure 5; Figure 5—figure supplement 2). The LD region targeted by this QTN harbors 39 gene models including two 60s ribosomal subunits and several homologs of maize genes with suggestive function in relation to yield determination. Et_2A_015515, in this block, is a homolog of a maize serine/threonine protein kinase 3, belonging to a broad class of proteins that was associated with inflorescence development (McSteen et al., 2007) and grain yield (Jia et al., 2020) in maize. These findings, although preliminary, may support the development of markers for teff improvement as well as directing local and international research toward loci of relevance for teff improvement.

We further explored the EtDP data using a gradient forest (GF) machine-learning algorithm to calculate the contribution of environmental gradients to genomic variation in teff, and to estimate its genomic offset, or vulnerability, under projected climates. We found that the turnover of allele frequencies across the cropping area was best predicted by geographic variables (Moran’s eigenvector map variables, MEM), confirming the importance of nonadaptive processes in teff differentiation and the resulting relevance of structure in interpreting associations. MEMs were sided by precipitation indicators, particularly precipitation of the coldest quarter (bio19) and precipitation of the wettest month (bio13; Figure 4—figure supplement 1). The GF allowed us to model climate-driven genomic variation patterns across the landscape (Figure 4A, B). Approximately a quarter of the SNPs (3,049 in 521 LD blocks) were predictive of the GF model: of these, 176 showed Fst values in the 99th percentile of the distribution (Supplementary file 1G).

Figure 4 with 7 supplements see all
Teff genomic offset.

(A) Geographic distribution of climate-driven allelic variation under current climates across the teff cropping area, with colors representing the three principal component (PC) dimensions reported in panel (B). (C) Genomic vulnerability across the teff cropping area based on the RCP8.5 climate projections. The color scale indicates the magnitude of the mismatch between current and projected climate-driven turnover in allele frequencies according to legend. Phenotyping locations are shown with yellow diamonds. This figure has seven figure supplements.

Figure 5 with 2 supplements see all
Manhattan plots reporting the genome-wide association study (GWAS) result for.

(A) Overall appreciation, (B), panicle length, (C) grain yield, and (D) grain filling period. On the x-axis, the genomic position of markers. The y-axis reports the strength of the association signal. Single-nucleotide polymorphisms (SNPs) are ordered by physical position and grouped by chromosome. Quantitative trait nucleotides (QTNs), for example SNPs surpassing a threshold based on a false discovery rate of 0.05, are highlighted in red. A strong signal on chromosome 2A matches across participatory varietal selection (PVS) and metric traits. This figure has two supplements.

The GF outcome was then linked to the GWAS to identify genomic loci associated to climate, agronomic performance, farmers’ preference, and adaptive potential of teff (Figure 4—figure supplement 2). We found that phenology QTNs were relevant in supporting GF prediction, although they could not suffice in explaining the geographic distribution of teff diversity (Figure 4—figure supplement 3). The importance of phenology in teff adaptation was already reported (Woldeyohannes et al., 2020), although it may be confounded by underlying population structure. On chromosome 1A at 32.3 Mb, precipitation of driest month (bio14) and seasonality of precipitations (bio15) identify a QTN (Figure 4—figure supplement 1; Figure 5—figure supplement 2), whose local LD block harbors 176 gene models, among which four with predicted proteins with high homology with maize and Arabidopsis proteins (Supplementary file 1F). The product of Et_1A_007229 is predicted to be homologous to a phosphatidylinositol kinase that is involved in flower development and has been shown to influence floral transition in condition of abiotic stress (Akhter et al., 2016). This locus is in the vicinity of a QTN for days to maturity, reinforcing this interpretation. Days to maturity was also associated with a LD block at 20.8 Mb on chromosome 6A (Figure 4—figure supplement 1; Figure 4—figure supplement 2), in the vicinity of associations for days to heading and for bioclimatic diversity (Supplementary file 1E). In this block, nine gene products share homology with Arabidopsis and maize (Supplementary file 1F). The protein encoded by Et_6A_046800 is homologous to an ETO1-like protein involved in the regulation of ethylene synthesis in Arabidopsis (Wang et al., 2004). Ethylene is a key plant hormone that has been shown to be related to spike development and senescence (Valluru et al., 2017). These candidate genes have not been validated yet, and their evaluation must be cautious especially for those signals that may be confounded by background genetic structure (Figure 5—figure supplement 2). With an increasing refinement of teff gene annotations, corroborated by reverse genetic approaches (Zhu et al., 2012), it will be possible to validate genes underlying traits of interest. Teff breeding could then fully benefit from targeted editing (Lemmon et al., 2018) to speed up the development of new varieties with improved yield, local adaptation and adherence to local preferences.

The genome-wide teff adaptive potential across the landscape varied in magnitude and distribution according to different predicted climate scenarios for 2070 (Figure 4—figure supplement 4). We used these data to compute the genomic-adaptive offset between current and future climate scenarios to identify vulnerable areas (Figure 4C, Figure 4—figure supplement 5). In all representative concentration pathways (RCPs), the highest offset was predicted in the north-western highlands of the Amhara region, south of lake Tana (Figure 4C, Figure 4—figure supplement 4). Compared to other regions of the country, we found a decreasing trend of rainfall change in this region across all emission scenarios (Figure 4—figure supplement 5; Figure 4—figure supplement 6). In this area, hot nights are projected to increase more quickly than hot days, with the most marked increases expected to be experienced in the July, August, September season (Figure 4—figure supplement 7). Decreasing trends of rainfall during the main growing season are predicted in all projected scenarios, suggesting that seasonality might critically impact teff development stages (Figure 4—figure supplement 6). In light of these results, a valid adaptation strategy could be the assisted migration of teff genetic backgrounds from areas of different vulnerability (Rhoné et al., 2020). However, crop migration and varietal replacement strategies will need to take into account ecological and socioeconomic factors, including the impacts on existing ecosystems and on farmers’ adoption of migrated varieties (Sloat et al., 2020).

Conclusion

A comprehensive interpretation of crop performance is key to a sustainable intensification that embraces cultural and agricultural diversity of cropping systems. While significant successes and even a plateau might have been reached in optimal growing environments where most common crops are cultivated, there is ample opportunity to enhance productivity in marginal growing environments (Godfray et al., 2010). The success of crop varieties is not only determined by yield performance, but also by adaptation to local environments and cultural needs (Weltzien et al., 2019). The integration of genomic, climatic, and phenotyping diversity in a participatory framework may help tailor varietal development for local adaptation. The involvement of farmers in varietal evaluation is increasingly utilized in a quantitative framework to guide breeding choices in combination with genomic data (Annicchiarico et al., 2019; de Sousa et al., 2021).

Transdisciplinary methods may support the integration of smallholder farmers in modern breeding and agricultural value chains. Modern data-driven research may then efficiently harness the genetic diversity generated in farmers’ fields to project NUS, such as teff, into modern breeding. Genebank genomics may be systematically used to map its large agrobiodiversity (Woldeyohannes et al., 2020), fully disclosing the opportunity to use next generation breeding technologies (Varshney et al., 2021) and large-scale genomic selection (Poland, 2015). Multilocation trials are needed to capture the range of G × E interactions that influence agronomic performance of teff, which we could only partially characterize here. Decentralized varietal evaluation approaches can be used to scale up the testing of teff genetic resources (de Sousa et al., 2021; van Etten et al., 2019a) to better capture G × E, produce new varieties with higher local adaptation, and resulting in higher farmers’ varietal adoption. An extensive socioeconomic characterization of teff cropping, including extensive farmers interviews (Labeyrie et al., 2016), may further unveil the dynamics of teff seed exchange and associated flow of information and knowledge related to local cropping (Occelli et al., 2021). Public sector breeding beyond NUS may further enhance the combination of data-driven research with participatory approaches to improve customer and product profiling to achieve social impact as cost effective as possible.

The Intergovernmental Panel on Climate Change (IPCC) reports indicate that East Africa will experience an increase in aridity and agricultural droughts, with a substantially higher frequency of hot days and nights (IPCC, 2017). Temperature increases are also expected to result in more intense heat waves and higher evapotranspiration rates, which, coupled with the altered rainfall patterns, may affect agricultural productivity. Enhancing NUS farmer varieties offers promising opportunities to tackle food insecurity resulting from climate change in smallholder farming settings and beyond. NUS have enormous untapped potential for improvement that is hampered by lack of tools and knowledge (Yerima and Achigan-Dako, 2021), but our analyses show that their characterization is at hand. As NUS proceed toward mainstream breeding, the collaborative effort of scientists, breeders, and farmers will unlock their full potential for sustainable intensification of farming systems.

Materials and methods

Plant materials

Request a detailed protocol

The EtDP used in this study was derived from a larger teff collection of 3850 accessions held at the Ethiopian Biodiversity Institute (EBI; Addis Abeba, Ethiopia), which represents the world’s largest active teff collection, and was amplified and characterized by Woldeyohannes et al., 2020. Criteria for selecting the EtDP from the larger EBI collection were the following: (1) visible morphological variation for panicle traits, (2) geographical and agroecological representativeness, (3) apparent grain yield potential, (4) presence of different maturity groups, and (5) presence of associated traditional names of specific landraces. The EtDP includes 321 farmer varieties, sided by all 45 E. teff improved varieties released since the beginning of teff breeding program in Ethiopia and until the assembly of the EtDP. Improved varieties were obtained by Ethiopian agricultural research centers. Seven accessions of teff wild relatives E. pilosa and E. curvula were also included in the collection (Supplementary file 1A). Landraces were purified selecting and reproducing one single panicle representative of each accession. The ITPGRFA defines a crop variety as ‘[…] defined by the reproducible expression of its distinguishing and other genetic characteristics’ (Ho, 2011). For the scope of this paper, we define farmer varieties as uniform genotypes derived from the purification of ex situ accessions (i.e., landraces) collected from local farmers. Farmer varieties are therefore a proxy of landraces originally collected in farmer fields and are discussed as such.

Sequencing and variant calling

Request a detailed protocol

Seeds of the EtDP were germinated in pots at the EBI in 2018, and at least three seedlings were harvested and pooled per accession. Genomic DNA was extracted from pooled seedlings at the EBI laboratories using the GenElute Plant Genomic DNA Miniprep Kit (Sigma-Aldrich, St. Louis, MO, USA) following the manufacturer’s instructions. DNA quality was checked by electrophoresis on 1% agarose gel and using a NanoDrop ND-1000 spectrophotometer and sent to IGATech (Udine, Italy) for sequencing. Genomic libraries were produced using SphI and MboI restriction enzymes in a custom protocol for the production of double digestion restriction site-associated DNA markers (ddRAD; Peterson et al., 2012). In short, ddRAD is based on a double restriction of the target DNA, typically using a rare cutter and a frequent cutter, followed by sequencing of restriction fragments to reduce the complexity of the target genome. ddRAD libraries were sequenced with V4 chemistry on Illumina HiSeq2500 sequencer (Illumina, San Diego, CA) with 125 cycles in a paired-end mode. Reads were demultiplexed using the process_radtags utility included in Stacks v2.0 (Catchen et al., 2013) and analyzed for quality control with the FastQC tool (v.0.11.5). High-quality paired-end reads of each individual were mapped against the E. tef reference genome (version 3, available from CoGe under ID 50954; VanBuren et al., 2020) with BWA (Burrows-Weeler-Aligner v.0.7.12) using the MEM algorithm with standard parameters (Li, 2013). Alignments were sorted and indexed with PicardTools (http://broadinstitute.github.io/picard/) and samtools (Li et al., 2009). Single-nucleotide variants were identified with GATK (McKenna et al., 2010) HaplotypeCaller algorithm (version 4.2.0), run in per-sample mode followed by a joint genotyping step completed by GenotypeVCFs tool. Raw variants were filtered out using the VariantFiltration and SelectVariants GATK functions with the following criteria: monomorphic or multiallelic sites, QUAL <30; QD <2.0; MQ <40.0; AF <0.01; DP <580; SNP clusters defined as three or more variants located within windows of 5 bp. For each accession, SNPs with a total read count of <3 were set to NA. Variants were discarded if located on unanchored contigs, InDels, missing data >20%, heterozygosity >15%.

Bioclimatic characterization

Request a detailed protocol

GPS coordinates of EtDP teff landraces were derived from EBI passport data and projected onto the map of Ethiopia using R/raster (Hijmans and Etten, 2012). Teff landraces were projected onto the agroecological zones map of Ethiopia provided by the Ethiopian Institute of Agricultural Research (EIAR; MoA, 2000), which subdivides Ethiopia into different zones according to altitudinal ranges and temperature and rainfall patterns. Altitudes were assigned to each landrace based on GPS coordinates, using the CGIAR SRTM database at 90 m resolution (Reuter et al., 2007). Current climate data (1970–2000 averages) relative to teff landraces’ sampling sites were retrieved from the WorldClim 2 database of global interpolated climate data (Fick and Hijmans, 2017) at the highest available spatial resolution of 2.5′ (approximately 4 × 4 km). Collinearity among historical bioclimatic variables was previously checked with the ensemble.VIF() function in R/BiodiversityR (Kindt and Coe, 2005). Only variables with a variation inflation factor (VIF) below 10 were retained, namely bio2, bio3, bio4, bio9, bio13, bio14, bio15, bio18, and bio19. The Hadley Centre Global Environmental Model 2-Earth System (HadGEM2-ES; Jones et al., 2011) under the fifth phase of the Coupled Model Intercomparison Project (CMIP5) protocols simulations was used to retrieve future climate scenarios at the following RCPs (RCP2.6, RCP4.5, RCP6.0, and RCP8.5).

Phenotyping and participatory variety selection

Request a detailed protocol

The EtDP was phenotyped in common garden experiments in two high-potential teff growing locations in Ethiopia, Adet (Amhara, 11°16′32″ N, 37°29′30″ E) and Akaki (Oromia, 8°50′07.6″ N, 38°49′58.3″ E), under rainfed conditions during the main cropping season of 2018 (July–November). Adet is the main experimental site of the Amhara Regional Agricultural Research Institute, at an altitude of 2240 meter above sea level. Soil type is vertisol, climate is moist cool, and average annual rainfall is 1250 mm. Akaki is a subsite of the Debre Zeit Agricultural Research Center, with an altitude of 2200 meter above sea level. Also in this case, soil type is vertisol, climate is moist cool and average annual rainfall is 1055 mm. Accessions were planted in two replications per site using an alpha lattice design, in plots consisting of three rows of 1 m in length and 0.2 m interrow distance. Three phenological traits, days to 50% heading (DH), days to 90% maturity (DM), and GFP were recorded on whole plots in each environment. Morphology and agronomic traits were recorded from five randomly selected teff plants per plot: plant height (PH, cm), panicle length (PL, in cm), number of primary branches per main shoot panicle (PBPM), number of total tillers (TT), first culm length (CLF, in cm), first culm internode diameter (CDF, in mm), panicle weight (PW, in g), panicle yield (PY, in g), grain yield (GY, ton/ha), biomass yield (BY, ton/ha), harvest index (HI), grain yield filling rate (GFR, kg/ha/day), and biomass production rate (BPR, kg/ha/days). Qualitative data were sourced from the characterization performed by Woldeyohannes et al., 2020.

A PVS was conducted in the two locations, involving 35 experienced teff farmers: 15 men and 10 women in Adet, five men and five women in Akaki. PVS was conducted close to physiological maturity in each location so to maximize variation between plots. Participating farmers were selected with the help of agricultural officers. They all had experience on teff production and were recognized as a local agricultural expert in teff cultivation. Being local farmers, they spoke different languages (Amharic and Oromo) and local interpreters were employed. Farmers were engaged in focus group discussions prior to the PVS to discuss most relevant traits in teff cultivation and to attend training on how to perform the PVS. During the evaluation, farmers were divided into gender-homogeneous groups with five people each. Groups were conducted across the field from random entry points and asked to evaluate two traits: PA and OA. PVS traits were given on a Likert scale from 1 (poor) to 5 (excellent), in a way answering a question in the form of: ‘how much do you like the [panicle appearance/overall appearance] of this plot from one to five?’. Farmers provided their scores simultaneously so that within-group scoring bias was reduced. Each farmers’ score was recorded individually. We used Plackett–Luce model (Turner et al., 2019) to analyze farmers’ OA. Agronomic traits from tested genotypes were linked to farmer’s OA with a linear model using an Alternating Directions Method of Multipliers (ADMM) algorithm proposed by Yıldız et al., 2020.

Phenotypic data analyses

Request a detailed protocol

We used an analysis of variance (ANOVA) to describe trait differences conditional to teff genetic groups, locations, and, in the case of PVS, to gender with a procedure similar to that used in Mokuwa et al., 2013. Best linear unbiased predictions (BLUPs) of agronomic and PVS traits were computed with R/ASReml (Gilmour et al., 2014). BLUPs for agronomic traits were derived from the general model in Equation S1:

(S1) yik=μ+gi+lk+glik+e

where the observed phenotypic value is yik , µ is the overall mean of the population, gi is the random effect for the ith genotype g, lk is the fixed effect for the kth location, glik is the random effect interaction between genotype and location, and e is the error. For calculation of BLUPs with a single location, the data were subset by location and the model in Equation (S1) was simplified accordingly. Broad-sense heritability (H2) of agronomic traits was derived from the variance component estimates deriving from Equation (S1) as follows:

(S2) H2=σgσg+σglnloc+σenrep*nloc

In Equation (S2), σg is the variance component of genotypes, σgl is the genotype by location variance, and σe is the error variance. nlocnrep are the number of locations and replications, respectively. For calculation of H2 within locations (i.e., repeatability), Equation (S2) was simplified accordingly.

The derivation of PVS BLUPs and H2 was like that used for agronomic traits except for the fact that gender of farmers was considered. BLUPs for PVS were obtained from the model in Equation (S3):

(S3) yikm=μ+gi+lk+pm+glik+gpim+plmk+e

where yikm is the observed PVS score, and µ, gi , lk , and glik are as in Equation (S1) and pm is the random effect for farmer gender. Accordingly, gpim is the random effect of the interaction between genotype and gender and plmk is the random interaction between gender m and the kth location. For calculation of BLUPs specific for gender, location, and gender by locations, Equation (S3) was simplified accordingly. H2 for PVS traits was derived from the following formula:

(S4) H2=σgσg+σglnloc+σgmngender+σenrep*nloc*ngender*nfarmer

In Equation (S4), σg is the variance component of genotypes, σgl is the genotype by location variance, σgm is the genotype by gender variance, and σe is the error variance. nloc , ngender , and nrep are the number of locations, genders, and replications, respectively. For calculation of H2 (i.e., repeatability) by gender and by location, Equation (S4) was simplified accordingly. The 90th percentile of the OA distribution was considered to identify top ranking accessions for men and women. Farmer varieties were benchmarked with the fourth quartile of the distribution of improved lines for all scored traits.

Genetic diversity

Request a detailed protocol

Phylogenetic relationships in the EtDP were assessed on a pruned set of SNP markers with MAF >0.05. Pruning was performed with the PLINK (Purcell et al., 2007) indep-pairwise function on a 100 SNPs window moving in 10 SNP steps with an LD r2 threshold of 0.3. Pairwise identity by descent (IBS) was calculated by PLINK and visualized with custom scripts in R (R Development Core Team, 2018). PCA and DAPCs were performed with R/adegenet (Jombart, 2008). The optimal number of clusters (K) for the DAPC was identified using find.cluster(). Bayesian information criterion (BIC) statistics were computed at increasing values of K to measure the goodness of fit at each K using 365 PCs and default settings. ADMIXTURE (Alexander and Lange, 2011) was run testing 2–25 K clusters using the default termination criterion. Each iteration was run using different random seeds, and parameter standard errors were estimated using 2000 bootstrap replicates. A fivefold cross-validation procedure was used to identify the most likely value of K. The correlation of the residual difference between the true genotypes and the genotypes predicted by the model was estimated using EvalAdmix (Garcia-Erill and Albrechtsen, 2020). A neighbor-joining (NJ) tree was developed computing genetic distances using the Tajima–Nei method (Tajima and Nei, 1984) and performing 500 bootstrap resampling, using MEGA X (Kumar et al., 2018). Different NJ tree visualizations were produced using R/ggtree (Yu et al., 2017). Genotypic data of putative teff wild relatives E. curvula and E. pilosa were integrated in the NJ phylogeny. A set of putative SNPs shared between wild relatives and cultivated teff was derived as described for the EtDP.

LD analyses were performed on SNPs with MAF >0.05. Average pairwise r2 for all markers within a window of ±5 Mb was estimated using R/LDheatmap (Shin et al., 2007). The LD was plotted against physical positions, averaging pairwise r2 values for each chromosome over sliding window considering portions equal to 5% of each chromosome’s physical length. LD decay was then estimated for each of the chromosomes (Hill and Weir, 1988) using a threshold of r2 = 0.3. Haplotype blocks were estimated using the PLINK-blocks function with default settings and following the interpretation of Gabriel et al., 2002.

Climatic diversity and GF

Request a detailed protocol

Agroecological and bioclimatic variation analyses were performed on georeferenced materials of the EtDP. The distribution of the DAPC genetic clusters across agroecological zones was mapped via R/raster. After aggregating teff georeferenced accessions in Ethiopian administrative regions at the second level (districts), pairwise Fst (Weir and Cockerham, 1984) was calculated across all SNP markers for all areas accounting at least five individuals. Centroid coordinates of the accessions within each district were used to estimate geographic distances, while environmental distances were calculated by averaging the value of noncorrelated historical bioclimatic variables and altitude. A measure of environmental distance between each accession was thus calculated as pairwise Euclidean differences between locations. A Mantel test with a Monte Carlo method (9999 replications) was implemented in R/ade4 (Dray and Dufour, 2007) to check associations between linearized Fst (Fst/1 − Fst) and geographic and environmental distances.

The teff cropping area was defined by the union of all polygons representing agroecological zones in which at least two teff landraces were sampled. Significant associations between genetic clusters and agroecological zones and administrative regions were assessed using Pearson’s chi-squared test of independence. Pairwise Wilcoxon rank sum test was used to test the significance (p < 0.05) of differences in bioclimatic variables among DAPC clusters. A GF machine-learning approach implemented in R/gradientForest (Ellis et al., 2012; Fitzpatrick and Keller, 2015) was used to map the turnover in allele frequencies using nonlinear functions of environmental gradients with historical and projected climates. The GF was developed using historical noncollinear bioclimatic variables and MEM variables representing climatic and geographic diversity in the sample, respectively. MEM variables were derived from geographic coordinates at sampling locations of the landraces in the EtDP (Dray et al., 2006; Griffith and Peres-Neto, 2006) and were calculated with (dbmem) in R/adespatial (Stéphane Dray et al., 2021). A function was built for each response variable (SNPs) using 500 regression trees. An aggregate function was created for all SNPs, whereas the bioclimatic variables and MEMs were used as predictors. The model was then run to predict teff genetic–geographic–climatic distribution on the teff cultivation range in Ethiopia. The GF model was also run using and projected climate data under different RCP scenarios.

Climate projections for areas of interest were analyzed to assess trends in rainfall and temperature. The 12 models best performing in the East Africa region according to IPCC (IPCC, 2017) were used to develop and ensemble projection of rainfall and temperature indices with Climate Data Operators (CDO) (Schulzweida, 2017) and custom R scripts. Projected data were compared with historical data to derive indices change in the interannual variability for the regions of interest.

Genome-wide association studies

Request a detailed protocol

QTNs were mapped in a GWAS. GWAS was performed with R/rMVP (Yin et al., 2019) using the Fixed and random model Circulating Probability Unification (FarmCPU) method (Liu et al., 2016) that incorporates corrections for population cryptic relatedness (Kinship). The first 10 genetic PCs were used as covariates to account for population structure. The Kinship was estimated using the method implemented by VanRaden, 2008. Both kinship and PCA were calculated using the subset of LD-pruned markers used for population genetics analysis. GWAS was run on bioclimatic variables, agronomic traits, and PVS traits. After the first round of mapping with 10 PCs, individual QQ plot was visually surveyed for inflation in the p value distributions, as these could be caused by suboptimal correction for population structure. When inflation was detected, the corresponding GWAS scan was run again using 25 genetic PCs as covariates. QTN was called when association surpassed a multiple testing correction with false discovery rate of 5% using R/q value (Storey et al., 2021). QTNs were assigned to the previously defined haplotype blocks. Blocks were extended by the chromosome-specific LD decay distance upstream and downstream and used as windows to search for candidate genes. The LD blocks thus obtained were combined with Fst and GF results to identify intersections across methods.

Teff gene annotations were retrieved from CoGe under id50954 (VanBuren et al., 2020). Nucleotide sequences of putative candidate genes were translated into the corresponding proteins and used as queries against Araport11 (Cheng et al., 2017) and the Maize reference proteome, available from UniProt (https://www.uniprot.org/) under the ID UP000007305. E value of 10−20 and percentage of identity of 50% were used as threshold to retain blast hits on Arabidopsis and maize.

Data availability

Teff accessions are available upon request from the Ethiopian Biodiversity Institute (EBI, http://www.ebi.gov.et/). Raw DNA sequencing reads are available on the Short Read Archive (https://www.ncbi.nlm.nih.gov/sra/) under BioProject accession number PRJNA758057. All scripts used for data analysis are available on GitHub at https://github.com/mdellh2o/TeffDiversityPanel, (copy archived at swh:1:rev:4935b0ef76f4c1631d323d39777ec375912e2f82).

The following data sets were generated
    1. Dell'acqua M
    (2021) NCBI BioProject
    ID PRJNA758057. Genetic characterization of Ethiopian Teff diversity panel.

References

  1. Book
    1. Gomez y Paloma S
    2. Riesgo L
    3. Louhichi K
    4. Fan S
    5. Rue C
    (2020) Nutrition
    In: Gomez y Paloma S, editors. The Role of Smallholder Farms in Food and Nutrition Security. Cham: Springer International Publishing. pp. 13–28.
    https://doi.org/10.1007/978-3-030-42148-9
  2. Book
    1. Ketema S
    (1997)
    Eragrostis Tef Zucc. Trotter: Promoting the Conservation and Use of Underutilized and Neglected Crops 12
    Italy: International Plant Genetic Resources Institute.
  3. Software
    1. R Development Core Team
    (2018) R: A language and environment for statistical computing
    R Foundation for Statistical Computing, Vienna, Austria.
  4. Website
    1. Schulzweida U
    (2017) CDO User’ s Guide
    Accessed August 25, 2021.

Decision letter

  1. Kelly Swarts
    Reviewing Editor; Gregor Mendel Institute of Molecular Plant Biology, Austria
  2. Detlef Weigel
    Senior Editor; Max Planck Institute for Biology Tübingen, Germany
  3. Bela Teeken
    Reviewer; CGIAR, Nigeria

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting the paper "Data-driven, participatory characterization of farmer varieties discloses teff breeding potential under current and future climates" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Bela Teeken (Reviewer #2); Laura Morales (Reviewer #3).

Comments to the Authors:

We are sorry to say that, after consultation with the reviewers, we have decided that this work will not be considered further for publication by eLife.

There was a lot of enthusiasm for the work in principle, but the common judgment was that revision would be considerably more extensive than commonly expected for eLife. Nevertheless, while we cannot formally invite revision, we remain very interested in the work. If you resubmit a suitably revised version, it would be treated as a new submission, but we would aim to recruit the same editors and reviewers to critique the work.

All of the reviewers were enthusiastic about the comprehensive scope of the manuscript but they raised concerns regarding a number of the analyses. The reviewers make specific recommendations regarding these concerns that should be addressed in a future submission. Critically, two of the reviewers highlight methodological issues with the choice of 6 clusters that were used for the diversity analyses that underlie a number of the subsequent analyses.

Reviewer #1:

Overall summary

"Data-driven, participatory characterization of farmer varieties discloses teff breeding potential under current and future climates" by Woldeyohannes and colleagues is a fascinating, interdisciplinary approach for understanding genetic, ecological and phenotypic variation in teff (Eragrostis tef), a small-market domesticate native and commonly grown in Ethiopia and the Horn of Africa. They assess genetic variation in traditional (landrace) and improved breeding varieties from a germplasm collection representing the diverse ecological zones of Ethiopia, attempting to link genetic with environmental variation. The collection was evaluated in two diverse locations by the authors for breeder phenotypes but also by smallholder teff farmers. Heritability for farmer traits is high and also highly correlated with breeder traits. Interestingly, men and women, while they mostly agree on the best varieties, have slightly different preferences. The authors conduct a GWAS but present very few results, mostly based on a few randomly chosen QTN. Finally, the authors present an interesting analysis using a gradient forest predictive model to understand the impact of environmental gradients of genomic variation, which will help identify vulnerable areas under climate change. This is an important topic that is outside of my field.

While the data collection, methodology and aims of the paper are very compelling, it suffers from some important interpretation and analytical issues in the characterization of genetic diversity. Specifically, clustering analysis used throughout the paper was not appropriate based on the data presented. The authors present some tantalizing results associating genetic diversity of environmental diversity, but these analyses could be clarified, tightened and streamlined. The authors attempt characterize genetic diversity in terms of social interactions but I did not find the proxy used for social interaction convincing. The phenotypic analysis, especially with respect to farmer selections was very clear and compelling but it would have been nice to see more follow-up on the GWAS analyses associated with these traits.

Introduction

The authors do a good job of introducing the market importance of teff and its role in small-holder agriculture but I would have liked more information on teff biology. Line 89 states that it's a tetraploid, but is it auto or allo? Is the ancestor also tetraploid? What was it domesticated from? Naturally outcrossing or inbreeding? What were historical effective population sizes (in the landraces and the breeding material)?

Results

Teff farmer varieties harness broad genetic diversity

This section showcases the genetic relationships between accessions and is associated with figure 1. The genetic material includes landraces, breeding lines and outgroup wild relatives on RADseq markers, which is only clear from looking at the associated methods and a summary here would be welcome. While the Admixture and PCA analyses are performed correctly, there are serious issues with the interpretation of outputs.

Admixture results (K = 2-10) are presented as well as a PCA plot colored based on the DAPC clustering for the same PCA. The DAPC clustering results are then shown in geographic space, according to altitude and to agroecological zone.

Buried in supplementary material 4 are the metrics for evaluating model fit for the admixture as well as the DAPC clustering. The lowest cross-validated error rate (best predictive accuracy in leave-one-out) is for K = 20, but the authors state the K = 6 is the "best" K. Likewise, the lowest BIC (model fit) value for the DAPC clustering is 10, not 6 as the authors state.

The authors decision to promote six clusters is not supported and the main figure 1 and all subsequent uses will need to be updated to reflect the best fit values (DAPC 10; admixture 20). It would also be informative to contrast breeding material with landraces as this is one of the main aims of the study.

The distribution of teff genetic variation is associated with geographic and environmental factors.

This section, associated with Figure 2, seeks to contextualize teff genetic variation within environmental and social variation. Overall, the aim of contextualizing genetic variation ecologically and socially is a good one and the ecological associations are interesting and appropriate but I don't find the social proxies convincing and some of the analyses – the neighbor joining plot with outgroups in S2, and perhaps the FST analysis in 2E – are out of place.

2A is a PCA of bioclimatic variation from interpolated weather data assigned to each landrace and there are interesting patterns with respect to DAPC cluster groups (which will need to up updated), which are highlighted in the text. 2-C and D show FST values, presumably based on populations assigned by regional political district as in part E but this is not explained, against geographic and environmental (the average of orthogonal bioclimatic variables and altitude) difference. The regional districts are supposed to be a proxy for social connections and markets (lines 185-187) but there is no citation or presented evidence for why this is a reasonable proxy.

The main conclusion (lines 188-192), that both isolation by geographic distance and environmental distance contribute to genetic differentiation, is not surprising to since the populations used for FST were defined geographically and the geographical and environmental distance are likely fairly highly correlated. 2E shows the F¬ST matrix, which shows that samples from Tigray are the most genetically isolated, while those from Amhara and Oromia are relatively more admixed. F2-S1 are boxplots of the bioclimatic values clustered by the DAPC groups, which, with updated DAPC clustering, are an interesting look at genetic and spatial differentiation, especially the bottom row with the values for the bioclimatic PCs.

Participatory evaluation of the teff diversity prioritizes genetic materials for breeding

This section, associated with figure 3, explores the participatory evaluation results, comparing scores between men and women and between the farmer and breeder evaluations. There was generally very high agreement between breeders and farmers and some intriguing insight into gender-based differences in evaluation criteria. The phenotypic models were appropriate and data presented and textual discussion was overall clear and well-supported.

Participatory, climatic, and agronomic diversity identify candidate loci for teff breeding

The approaches used for GWAS analysis are appropriate based on that reported in the methods but there are no supporting figures. At the least, I would expect Q-Q plots to assess population structure correction and Manhattan plots in the supplemental. For the specific genes that the authors focus on it would be good to also see the local Manhattan and LD information. However, the litanies of linked gene models could also probably be dropped or minimized in the text (it may still be an important resource for future studies to have in the public record). I think the authors could have done more with the GWAS results.

Teff cultivation is vulnerable to climate change

This section, associated with figure 4, uses a gradient forest predictive model to understand the impact of environmental gradients of genomic variation, which will help identify vulnerable areas under climate change. These are not analyses I am familiar with and I do not feel qualified to review this section in detail.

Conclusion

The conclusions are pertinent and help motivate future directions.

Methods

The methods are generally well described. As the methods are situated after the results, it would be useful to reiterate some of the key points from the methods under results to help situate the reader.

Specific comments and suggestions

It would be good to give the whole manuscript a copy edit. There were a number of small grammatical and word choice issues I noted, a few of which are below, along with additional suggestions.

127 what is the difference between moist and humid? Citation?

132 what kind of libraries? There should be a short statement of library choice and associated biases.

163-165 This statement is probably true if you're deeply familiar with the germplasm and the environments, but it's not clear to someone with no prior knowledge based on the figure. It's actually better supported by figure 2A.

174-175 Figure S1-5 does not support this statement. Beyond the fact that the K was chosen at random, as were the DAPC clusters, admixture is not quite the right evidence. A better line of support for this statement would be something along the lines of "Limited population stratification, as evidenced by less than 15% of the total genetic variance explained by the first three principal components (Figure 1-B-C)…".

200-204 You can refer back to other figures to connect dots – it would make more sense for the pop gen to be presented together.

207 What environments are these trial locations in? More description.

218-220 Define shorthand phenotypes here (the ones used in figure 3)

Figure 1: It is not completely clear from the legend in Figure 1, but I believe these figures include the landraces and breeding material (except panel D – this must be clarified). I would suggest that rather than the current panels it might be more informative to just see admixture (K = 20) ordered by breeding vs landrace, altitude or agroecological zone rather than based on unsupervised clustering (admixture). Likewise PCA plots colored based on independent values (altitude, improvement status, etc) may be more informative than using the DAPC cluster assignments (K = 10), although I like Figure S1-6 and the boxplots in 1E are probably more informative than just coloring based on altitude would be. There is nothing about improvement status in this figure – it would also be interesting to highlight the genetic relationships between the breeding and landrace material (S3 suggests that much of the breeding material clusters)

Figure 2: I would rethink the social proxy (market analyses? Ethnicity?) and clarify the analyses performed for the bioclimatic associations. I think S1 (the last row) would be a good analysis to include in the main figure. I would put S2 with Figure 1, maybe I'm missing something but I'm not sure why it's there. I also don't understand why the correlations between the climate variables and the PCs are presented in B – this is clearly visible from the eigenvectors in A – but if it is informative I would suggest more discussion.

Figure 3: Very interesting. It might be interesting to see the correlation plot in A separated by men and women.

What happened to the GWAS analysis?

Methods:

Plant Materials: seems reasonable and sufficiently described.

Sequencing and Variant Calling: I'm not very familiar with RADseq calling pipelines but it seems reasonable. This is a well-established approach though and if the authors are following a previously published processing pipeline it would be good to cite.

Spatial and bioclimatic characterization: It's not clear to me when the different datasets are used, and why you would use one over the other.

470 Why use the highest resolution for something like a landrace that has a km rather than meter resolution?

More information on the grow-out locations would be welcome.

Reviewer #2:

In the context of teff production and consumption in Ethiopia the authors advocate breeding initiatives to strengthen this crop that is important for the livelihoods of the many small holders in Ethiopia by highlighting the limited yields, the limited breeding efforts made in this crop and the anticipated impactful climate change. By using genetic, climatic, geographic and participatory variety selection, gene pools are identified that would best inform parents to use for future breeding initiatives. This goes even further and shows the possibility to identify crucial genetic markers needed for breeding focus.

With convincing strength, the authors employ genetic analysis of landraces collected all over the country and link this to the different agro-ecological regions and niches as well as the different ethnic traditions and to the climatic data including a forecast of climatic factors based on an extrapolation of historic climate data. Although overwhelming at first, the large amount of figures well illustrate the argument and well satisfy if one wants to get some deeper understanding (supplementary figures). A great innovative strength is the proposition of a standard toolbox to combine the different data to provide product profiles for teff breeding.

The authors rightly state that the suitability of varieties does not only depend on ecological, climatic and product data but also on socio-cultural variables and they take this into account by stating that the regions coincide with the different ethnic groups in the country. These ethnic groups could however be named and their differences with regard to teff related practices and preferences highlighted and related to the two different groups of farmers chosen for the participatory variety selection. The selection strategy of the farmers could also be provided.

A little more discussion could be provided on the representativeness of the two PVS and phenotyping sites and the farmers evaluating those. In the conclusion the authors cite the TRICOT approach as a way forward to decentralize PVS and get better systematic and representative testing of the genetic resources. The limitation of the only two sites in which the genetic collections were grown and the possible G x E influence on the phenotyping of the varieties could also be highlighted. However, the fact that with the two locations such great results were obtained in linking all the datasets shows the potential of the approach taken.

321 landraces from 3850 Provide were selected for the study. It will be important to provide the selection strategy used to get to the 321 varieties. This selection strategy could also be highlighted outside of the methodology. The selection strategy might also be related to the time of collection of the varieties. Are these 321 chosen in a way to simply represent the largest genetic diversity in the collection or is the time of collection of the samples also included: to represent the varieties that are currently still cultivated?

With regards to choosing men and women and comparing them separately there is not discussion on why one would expect differences. Only in the conclusion Weltzien et al. is cited in this respect but this needs more attention and reference to some more literature, especially in relation to the Ethiopian context. What is especially needed is an explanation of the different tasks (roles) of men and women in Ethiopia with regards to teff cultivation and possibly processing. This should be provided in general and in relation to the two study sites. This could also provide the background to a short discussion to interpret the similarities and the difference observed (appreciation being more related to biomass yield and panicle weight for women while for men it was plant height and grain filling period, could this be explained by women using plant parts for animal feed or any other activity in which they more feature?). I know that this is not the main focus of the paper but the fact that men and women's preferences were analyzed and discussed separately demands a discussion with regards to the results and the strategy used to select the men and women.

This work is not only relevant to under NUS crops but also for breeding in general and especially for public sector breeding. In this respect the authors could open up this approach to be more generally applied to other crops and could highlight the importance of their approach for customer and product profiling and especially for customer profiling with which especially public breeding is tasked to achieve social impact as cost effective as possible.

Figures

With regards to figures 1 and 2: it is great to have all these figures close to each other for good cross referencing, however in that case it will be very important to have extremely good resolution. A solution could be to arrange the figures in a way that the now smaller figures in Figure 1 become bigger and the larger figure (Figure 1 D) become a little smaller so that the PCs are better visible, this is minor work to arrange. Also, the legenda with the colored dots indicating the DAPC clusters could be made much bigger for easier determination of the colors. The same counts for figure 2.

Please verify the Y axe on figure 3E. Should this not be PA instead of PL? It was panicle appreciation that was evaluated with the farmers not panicle length?

Reviewer #3:

The authors have conducted an interdisciplinary characterization of a valuable teff diversity panel using well-supported methods, except for the selection of genetic clusters. Although the methodology itself is not novel, the interdisciplinary nature of the study sheds light on a wide range of Ethiopian teff germplasm characteristics, from environmental adaption to gender preference.

Strengths:

The incorporation of farmer's knowledge and gender preferences in the genotypic and phenotypic analysis of the germplasm was well done and nicely presented. This kind of information is absolutely necessary for the improvement and intensification of NUS, especially in smallholder farmer systems. Few germplasm characterization studies, both on NUS and staple crops, have reported such a broad range of characteristics.

Weaknesses:

Although many of the analyses rely on the differentiation of genetic clusters, the authors have inappropriately selected the optimal number of clusters (K). The authors conducted a cross-validated error analysis of various ADMIXTURE K values and a similar analysis of BIC statistics of various DAPC K values. Figure 1—figure supplement 4 clearly demonstrates that K=20 for ADMIXTURE and K=10 for DAPC, as these values of K have the lowest cross-validated error and BIC, respectively. However, the authors claim to have chosen K=6 because (a) there is a slight flattening of the ADMIXTURE error curve after K=6 and (b) K=6 has the lowest DAPC BIC statistic, neither of which are supported by the results.

Although the authors reported high broad-sense heritability (H2) for several agronomically important traits, variance components for the terms (genotype, environment, GxE, gender, error, etc.) used to estimate H2 were not shown. A table including H2, variance components, and the numbers of levels and observations for each term is necessary to assess the validity of these results. This information would also shed light on the relative influence of genetics, environment, gender, etc. on trait variation.

As stated above the methods used to select K=6 as the optimal number of clusters was statistically inappropriate. Although K=20 had the lowest cross-validated ADMIXTURE error, this value seems too high. I would suggest using K=10, as this value of K had the lowest DAPC BIC, showed a more clear flattening in the ADMIXTURE error curve, and is closer to your desired value of K=6. Did K=6 correspond to some prior knowledge on the diversity panel, for example from breeders' knowledge? If so, perhaps this could be stated or incorporated in some way to support your choice of K=6.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Data-driven, participatory characterization of farmer varieties discloses teff breeding potential under current and future climates" for further consideration by eLife. Your revised article has been evaluated by Detlef Weigel (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Essential revisions:

1) Further explicate the relative contributions of genotype, environment and GxE for the traits evaluated, per the more detailed suggestions of reviewers 1 and 3. This may be supported by the Oryza research suggested by reviewer 2.

2) Reassess the discussion of the GWAS results in the text and supporting material, particularly DM and PC2_bio. Per comments by reviewer 1, the QQ plots show that these traits are highly impacted by residual population structure and thus subject to an excess of false positives.

3) Provide a bit more discussion of gendered roles in agricultural production, taking into account comments by reviewer 2.

Reviewer #1:

The authors in their responses have addressed most of my concerns and I think the paper reads much more clearly now. I only have two additional comments:

With respect to the GWAS results, these can now be evaluated with the inclusion of the QQ plots. All traits except bio14 are somewhat confounded with population structure (genome-wide deviation from expectation of no association, or deviation from the line in the QQ plot) and DM and PC2_bio are very correlated with structure and the QQ plot suggests that neither of these traits have truly associated loci in this study. The authors discuss results from especially DM a great deal and I think the discussion of these results should be reevaluated.

The authors also state in the conclusion that "Multi-location trials are needed to capture the range of genotype by environment (G x E) interactions that influence agronomic performance of teff, which we could only partially characterize" but they do have two locations and are able to model GxE (although, more locations would certainly lead to better estimates). For a genomic resource paper, it would be very beneficial to report the impact of GxE directly (and how the landraces and improved varieties may differ, as has been found in other crops).

Reviewer #2:

The authors have done a great job in revising the manuscript, which is much easier to read and figures and tables. The organisation has also improved. As far as my expertise allows the authors have also well approved the technical analysis part of the manuscript.

The sampling of the PVS participants has been described satisfactorily and the reason why to include gender has been articulated. I can imagine that the authors did not find any literature or resources to explain the difference observed between women and men. I highlighted that a reason could be that women are tasked with animal feeding and look at the plant's vegetative parts also as a resource of animal food while men would not focus on that so much because it might be out of their gendered roles. The authors could quickly consider this issue and see if there is any probability in this being the case or any other reason that could be mentioned to explain the differences observed between women and men. Any little clue or hypothetical phrase would give the gender dimension just a little more depth and clue for further investigation, rather than just saying that differences are to be expected.

The authors nicely made the link to general breeding and the development of customer and product profiles in (public) breeding that are more and more expected to deliver on social impact in line with the sustainable development goals

The issue I raised with regards to the representativeness of the PVS trial locations has also been resolved satisfactorily.

This is now a really strong paper and needs to be published so I will be able to share it as a good example of transdisciplinary data driven research.

Please correct on line 161-162: reference is made to two figures (as the word 'and' is written) but only figure 6 is mentioned.

Furthermore, I would like to point the authors to the following publications based on work on the under researched African rice (Oryza glaberrima) and how farmers long trajectories of selection resulted in selecting 'robust' varieties that are adapted to change and dynamics (social as well as climatic) rather than narrow local localities. Any reference to this body of work could be made, although the manuscript stands strong as it is without referring to this body of work.

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0034801

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0085953

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007335

https://link.springer.com/article/10.1007/s10745-012-9528-x

Reviewer #3:The authors have done a nice job revising the manuscript using 10 genetic clusters, following the recommendations of the reviewers.I would suggest incorporating the information in Supplementary file 1C into Tables 1 and 2. For example, columns for variance explained by genotype, location, rep, error, etc. could be added to these tables.

I also recommend discussing the results in Supplementary file 1C (or Tables 1-2, if the authors choose to merge this information) further. The experimental design appears to have allowed for high estimation of heritability, which the authors have already stated in the first and latest versions of the manuscript. However, as a plant breeder, I would like to know more about the relative contributions of genotype, environment, GxE on the different traits measured in this germplasm. For example, it is interesting that (a) for BPR, location explains 75% of the variance, while genotype explains a 18% of the variance and there is no GxE effect, which contrasts to (b) CDF, which has large GxE variance (66%) and relatively smaller proportions of the variance are explained by genotype (7%) and location (19%). There are likely some trends/comparisons among traits that can be discussed, such as which traits tend to have high GxE vs traits that have high location effects vs traits with high genetic variance.

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Data-driven, participatory characterization of farmer varieties discloses teff breeding potential under current and future climates" for further consideration by eLife. Your revised article has been evaluated by Detlef Weigel (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

1) Both reviewers 1 and 3 are still concerned with the interpretation of the GWAS. Please see especially reviewer 1's lengthy explanation to help clarify interpretation. Please take these concerns into account when interpreting GWAS results and reporting significant loci.

2) All of the reviewers have made small, specific comments that should be addressed.

3) A final copy-edit after changes have been made.

Reviewer #1:

The paper reads much better and I am satisfied with the incorporation of GxE analysis. I still have concerns with the interpretation of the GWAS results and would recommend a final copy edit to fix the grammatical errors introduced in the editing process.

Statistically significant p-values are detected as deviations from the expectation of no-association based on statistical linkage between the causal association and the tested SNP (there are likely no causal loci tested in this limited marker set). The foundational assumption in GWAS is that these associations will be in local LD based on limited recombination in the region surrounding the causal polymorphism, allowing the researcher to zoom in on the loci underlying trait variation. This is visualized on an QQ plot as most dots on the diagonal of no-association with a small deviation associated with only the most significant SNPs (bio14 is a good example of this). On a Manhattan plot, these will for a localized peak around causal loci, characterized by the extent of the haplotype block associated with the causal polymorphism and the sampled marker density.

However, linkage can also exist between causal polymorphism and loci across the genome, even on other chromosomes, because selection is not random or evenly applied across a species or population. When populations differ due to drift processes, and selection is nested within randomly varying genetic structure, GWAS tests for the trait under selection will also find associations with the variants associated with differentiation between populations. For example, one population lives in a cold, highland environment and one lives in a tropical, lowland context. The highland population has a shorter growing season and consequently flowers earlier to ensure that the grain can mature in time before frost. The populations were already a bit different due to distance and chance, but now that the flowering window is non-overlapping for these populations it ensures that gene flow is basically stopped and that the populations will move further apart. If one were to do a GWAS for days to flowering or maturity (or for temperature or growing season variation) across these two populations the resulting QQ and Manhattan plot would look like the ones in the analysis for PC_bio2 or for DM; an early a persistent deviation from the expectation of no association in the QQ that localizes all over the genome in a Manhattan plot. This is not to say that there are not real associations (the peak on 6A is likely associated with a real causal locus for DM), but they are confounded with underlying structure and "statistically significant" associations are no longer a good way to identify truly associated loci. A better way to think of GWAS with these qualities is that the top SNPs are enriched for linked causal associations but any given association is suspect.

With this in mind, I would suggest a bit more discussion of confounded structure. Perhaps a way to frame it is in the context of local adaptation, tying in the GF models and variance analysis of GxE for the various traits.

331-336: The accepted associations for DM and PC2_bio are still very lax and based on the QQ plots heavily confounded with underlying population structure. This underlying structure is almost certainly driving association with the gradient forest model and I think that the conclusion that this association "support the importance of phenology in teff adaptation and geographic distribution" is reasonable, but not because of genetics underlying days to maturity per se. If the authors were to include a statement to this effect the overall interpretation is reasonable.

341-343: Again, when two traits are both heavily confounded with underlying structure it is not surprising that they would colocalize (via the third variable of structure), and I would be especially sceptical of colocalized regions between these traits.

349-355: Agree that this peak is likely real because of the very strong local LD. Not convinced of anything else

Reviewer #3:

The authors have done a nice job including more information about GxE, genetic variance, etc.

I am still not convinced of the GWA results for DM and PC2_bio, which show a very large deviation from the expected p-value distribution. The abundance of false positives cannot merely be dismissed with the authors' statement that "some of the associated traits…showed some statistical inflation on QQ plots likely contributed by residual population structure". A large proportion of all SNPs tested were deemed significant (assuming that the significance cut-off was approximately -log10p = 3.5) for DM and PC2_bio. I would suggest that the authors use a secondary threshold based on a visual assessment of the QQ plots. For DM, I would suspect that the ~5 SNPs with -log10p > 4.8 are truly significant (QTL on chromosome 6A). Similarly for PC2_bio, the 2 SNPs at the tail of the distribution are likely significant.

With respect to the gender aspects of this manuscript, I would recommend including references from similar work done by the NextGen Cassava project. Here are two examples:

https://doi.org/10.1007/s12231-018-9421-7

https://doi.org/10.1002/csc2.20152

https://doi.org/10.7554/eLife.80009.sa1

Author response

[Editors’ note: the authors resubmitted a revised version of the paper for consideration. What follows is the authors’ response to the first round of review.]

Reviewer #1:

Introduction

The authors do a good job of introducing the market importance of teff and its role in small-holder agriculture but I would have liked more information on teff biology. Line 89 states that it's a tetraploid, but is it auto or allo? Is the ancestor also tetraploid? What was it domesticated from? Naturally outcrossing or inbreeding? What were historical effective population sizes (in the landraces and the breeding material)?

We added more details about teff biology in the Introduction section, adding two new references (see below). Being an autogamous species, the effective population size Ne can be computed as half the number of individuals.

“Teff (Eragrostis tef) is an annual, self-pollinating, and allotetraploid grass of the Chloridoideae subfamily (Ketema, 1997). It is widely grown in Ethiopia as a staple crop, where it is valued for its nutritional and health benefits, resilience in marginal and semi-arid environments and cultural importance (D’Andrea, 2008; Ketema, 1997). Teff was likely domesticated in the northern Ethiopian Highlands from the allotetraploid Eragrostis pilosa, but the timing of its initial cultivation and the identity of its diploid ancestors remain poorly understood (D’Andrea, 2008; Ingram and Doyle, 2003; VanBuren et al., 2020).”

Results

Teff farmer varieties harness broad genetic diversity

This section showcases the genetic relationships between accessions and is associated with figure 1. The genetic material includes landraces, breeding lines and outgroup wild relatives on RADseq markers, which is only clear from looking at the associated methods and a summary here would be welcome. While the Admixture and PCA analyses are performed correctly, there are serious issues with the interpretation of outputs.

We revised and clarified this section so to aid readability:

“The EtDP comprises 321 farmer varieties spanning the entire geographical and agroecological range of teff cultivation in Ethiopia, from the sub-moist lowlands of Tigray in the North to the moist lowlands of Oromia in the South, and from the sub-humid lowlands of Benishangul and Gumuz in the West to the sub-humid mid-highlands of Oromia in the East (Figure 1 —figure supplement 1, Supplementary file 1A) (MoA, 2000). The EtDP also includes 45 improved varieties released since the first breeding efforts on teff and up to the moment of the EtDP creation. A selection of seven Eragrostis spp., putative wild relatives of teff, was included as outgroup. The genomic diversity of the EtDP was assessed starting from 12,153 high-quality, genome wide single nucleotide polymorphisms (SNPs) derived from double digest restriction-site associated sequencing (ddRAD-seq) of individual accessions, followed by filtering for variant call quality and linkage disequilibrium (LD) pruning.”

We revised Figure 1, Figure 3 and all Figures were relevant so to highlight the separation between farmer varieties and breeding materials with different symbols.

Admixture results (K = 2-10) are presented as well as a PCA plot colored based on the DAPC clustering for the same PCA. The DAPC clustering results are then shown in geographic space, according to altitude and to agroecological zone.

Buried in supplementary material 4 are the metrics for evaluating model fit for the admixture as well as the DAPC clustering. The lowest cross-validated error rate (best predictive accuracy in leave-one-out) is for K = 20, but the authors state the K = 6 is the "best" K. Likewise, the lowest BIC (model fit) value for the DAPC clustering is 10, not 6 as the authors state.

The authors decision to promote six clusters is not supported and the main figure 1 and all subsequent uses will need to be updated to reflect the best fit values (DAPC 10; admixture 20).

The reviewer is right in that the best interpretation for the DAPC clustering is 10. The reason why we initially settled on DAPC 6 is that this was the most parsimonious interpretation that could be provided to the clustering procedure. However, we take the point, and we have changed our clustering interpretation throughout the whole manuscript, discussing the existence 10 clusters. This resulted in update figures and updated text, as well as in a change in interpretation of the results. All changes, that are too many to be reported in this file, are highlighted in the attached revised manuscript text

It would also be informative to contrast breeding material with landraces as this is one of the main aims of the study.

Indeed. To address this point, we added a more thorough discussion of the clustering outcome for improved varieties and landraces and changed the figures to better depict the type of genetic materials. Changes are widespread in text and include:

“Breeding lines showed a narrower genetic base compared to the diversity maintained by Ethiopian farmers; these materials could not represent the broad diversity available within the EtDP (Figure 1 —figure supplement 4), and predominantly belonged to clusters 2, 8 and 10 (Figure 3D-E).”

The distribution of teff genetic variation is associated with geographic and environmental factors.

This section, associated with Figure 2, seeks to contextualize teff genetic variation within environmental and social variation. Overall, the aim of contextualizing genetic variation ecologically and socially is a good one and the ecological associations are interesting and appropriate but I don't find the social proxies convincing and some of the analyses – the neighbor joining plot with outgroups in S2, and perhaps the FST analysis in 2E – are out of place.

We agree that a clear-cut social interpretation of teff diversity is not supported by the available, and we removed the outgroups in the NJ plot in S2. We maintained however the Fst plot, as it may be useful for audience knowledgeable of the Ethiopian geography (e.g. teff breeders) to support prioritization of genetic materials by local breeders. The point that the reviewer raises is critical: genetic variation in smallholder farming systems must be contextualized in ecologic and social dimensions. Our approach is speculative, but supported by previous literature now cited and extensively discussed:

“Extant landraces diversity is not only contributed by climate, but also by seed circulation. Studies have shown that cultural and social dynamics, such as belonging to similar or different ethnolinguistic groups, are key factors in shaping seed exchange networks (Labeyrie et al., 2016; Samberg et al., 2013). In Ethiopia, regional districts are markers of cultural and historical diversity, and smallholder farmers have limited capacity for long-distance travel. Thus, we hypothesized that regional distinctions could impact seed exchanges and spatial patterns of teff genetic diversity. The district of sampling of teff landraces in the EtDP was therefore used to aggregate accessions and calculate genetic distances as a measure of fixation index (Fst). Although our experimental design does not allow to fully untangle the effect of geography, environment, and social factors, we found a pattern of genetic variation distribution that can be associated to an isolation by distance and adaptation process, where Fst values are significantly associated with geographic distance (Mantel r = 0.31; p = 9e-04) and environmental distance (Mantel r = 0.352; p = 0.0137) (Figure 2C-D).”

We rephrased the Conclusion paragraph to better discuss perspectives in teff improvement, stressing the need for a socioeconomic characterization of seed systems

“An extensive socioeconomic characterization of teff cropping, including extensive farmers interviews (Labeyrie et al., 2016), may further unveil the dynamics of teff seed exchange and associated flow of information and knowledge related to local cropping (Occelli et al., 2021). Public sector breeding beyond NUS may further enhance the combination of data-driven research with participatory approaches to improve customer and product profiling to achieve social impact as cost effective as possible.”

2A is a PCA of bioclimatic variation from interpolated weather data assigned to each landrace and there are interesting patterns with respect to DAPC cluster groups (which will need to up updated), which are highlighted in the text. 2-C and D show FST values, presumably based on populations assigned by regional political district as in part E but this is not explained, against geographic and environmental (the average of orthogonal bioclimatic variables and altitude) difference.

Figure 2A has been updated with the new cluster interpretation. It is now clarified in text and legend how the Fst was computed for 2C and 2D, that is aggregating samples by local political district. The Figure legend was updated as follows:

“Figure 2. Teff diversity on the landscape. (A) Principal Component Analysis of bioclimatic diversity in the EtDP. Dots represent teff farmer varietiesf belonging to genetic clusters, colored according to legend. Vectors represent the scale, verse, and direction of bioclimatic drivers of teff differentiation. (B) Linear regression of Fst values in relation to geographic distance of accessions in the EtDP. Accessions were grouped by local district of sampling. (C) Linear regression of Fst values in relation to environmental distance of accessions, also grouped by district of sampling. (E) Pairwise Fst values between teff accessions grouped by local districts of sampling, as in (C) and (E). Local districts, i.e. sub-regional groups, are ordered by administrative regions according to legend. This figure has two figure supplements.”

The regional districts are supposed to be a proxy for social connections and markets (lines 185-187) but there is no citation or presented evidence for why this is a reasonable proxy.

This is a very good point and worth expanding in the text. We now provide a further explanation of the concept based on Labeyrie et al. (2016), who report how the belonging of ethnolinguistic groups impact seed exchange. In Ethiopia, studies shown that these same patterns involve different crops (Samberg et al. 2013) and we believe it is reasonable to assume that local culture, combined with limited capacity for long-distance travel, may affect local agrobiodiversity. This is expanded in text, as indicated above.

The main conclusion (lines 188-192), that both isolation by geographic distance and environmental distance contribute to genetic differentiation, is not surprising to since the populations used for FST were defined geographically and the geographical and environmental distance are likely fairly highly correlated.

It is true that the Fst was computed on populations that were geographically defined, but the patterns of gene flow are not always obvious. As discussed in other studies, there might be different gene flow scenarios, including IBD, IBE, counter-gradient gene flow, unrestricted gene flow, and limited gene flow (e.g. in Sexton, J. P., Hangartner, S. B., & Hoffmann, A. A. (2014) http://www.jstor.org/stable/24032844). We expanded this discussion in text, highlighting the reviewer criticism:

“Although our experimental design does not allow to fully untangle the effect of geography, environment, and social factors, we found a pattern of genetic variation distribution that can be associated to an isolation by distance and adaptation process, where Fst values are significantly associated with geographic distance (Mantel r = 0.31; p = 9e-04) and environmental distance (Mantel r = 0.352; p = 0.0137) (Figure 2C-D). Accessions from East Tigray (Misraquawi) showed the highest separation from the collection, followed by East Oromia (Misraq Harerge) and West Amhara (Agew Awi) (Figure 2E).”

2E shows the F¬ST matrix, which shows that samples from Tigray are the most genetically isolated, while those from Amhara and Oromia are relatively more admixed. F2-S1 are boxplots of the bioclimatic values clustered by the DAPC groups, which, with updated DAPC clustering, are an interesting look at genetic and spatial differentiation, especially the bottom row with the values for the bioclimatic PCs.

All figures, including the ones mentioned here, were updated considering 10 DAPC clusters

Participatory evaluation of the teff diversity prioritizes genetic materials for breeding

This section, associated with figure 3, explores the participatory evaluation results, comparing scores between men and women and between the farmer and breeder evaluations. There was generally very high agreement between breeders and farmers and some intriguing insight into gender-based differences in evaluation criteria. The phenotypic models were appropriate and data presented and textual discussion was overall clear and well-supported.

In this section, we updated figures and text to respond to reviewers’ remarks, specifically in relation to cluster number and relevance of improved varieties VS farmer varieties.

Participatory, climatic, and agronomic diversity identify candidate loci for teff breeding

The approaches used for GWAS analysis are appropriate based on that reported in the methods but there are no supporting figures. At the least, I would expect Q-Q plots to assess population structure correction and Manhattan plots in the supplemental. For the specific genes that the authors focus on it would be good to also see the local Manhattan and LD information. However, the litanies of linked gene models could also probably be dropped or minimized in the text (it may still be an important resource for future studies to have in the public record). I think the authors could have done more with the GWAS results.

Indeed, we performed a full GWAS analysis, with all outputs including Q-Q plots and Manhattan plots. In the first version of the manuscript, we left it aside as not to overload the items associated to the manuscript, however we accept the reviewer criticism and therefore we added new items and text describing the GWAS output in detail. We added a figure in the main text reporting a GWAS scan across PVS traits and metric traits, showing the co-mapping of a QTN for OA, yield and flowering time (now Figure 5). Associated to Figure 5, we included two supplementary figures, one depicting a GWAS scan focusing on environmental data, and one reporting Q-Q plots for reported associations.

For what concerns the description of QTNs in the main text: while the QTN that we focus on are a subset of all the QTNs detected, the selection is not made at random. Rather, we selected those QTNs that are more relevant for teff breeding in the light of previous studies on similar crops. We believe that this description may be useful for further studies exploring this and other teff gene pools, and thus we would prefer to keep it in the text. However, we revised the text to shorten and streamline the description of these gene models as suggested by the reviewer.

Teff cultivation is vulnerable to climate change

This section, associated with figure 4, uses a gradient forest predictive model to understand the impact of environmental gradients of genomic variation, which will help identify vulnerable areas under climate change. These are not analyses I am familiar with and I do not feel qualified to review this section in detail.

Conclusion

The conclusions are pertinent and help motivate future directions.

Methods

The methods are generally well described. As the methods are situated after the results, it would be useful to reiterate some of the key points from the methods under results to help situate the reader.

The text was revised to aid readability, reiterating relevant points in the Results section. The modifications are widespread in text and cannot be entirely reported here.

Specific comments and suggestions

It would be good to give the whole manuscript a copy edit. There were a number of small grammatical and word choice issues I noted, a few of which are below, along with additional suggestions.

The manuscript was copy edited before resubmission.

127 what is the difference between moist and humid? Citation?

“Moist” has higher rainfall than “humid” and is found at higher altitudes. These are standard definition of agroecological zonation of Ethiopia, and all info is found in the cited document (Agro-Ecological zones of Ethiopia, 1998)

132 what kind of libraries? There should be a short statement of library choice and associated biases.

Fixed.

163-165 This statement is probably true if you're deeply familiar with the germplasm and the environments, but it's not clear to someone with no prior knowledge based on the figure. It's actually better supported by figure 2A.

The description for germplasm and environments have been improved

174-175 Figure S1-5 does not support this statement. Beyond the fact that the K was chosen at random, as were the DAPC clusters, admixture is not quite the right evidence. A better line of support for this statement would be something along the lines of "Limited population stratification, as evidenced by less than 15% of the total genetic variance explained by the first three principal components (Figure 1-B-C)…".

Fixed.

200-204 You can refer back to other figures to connect dots – it would make more sense for the pop gen to be presented together.

Fixed increasing cross references to figures.

207 What environments are these trial locations in? More description.

fixed adding a full description of the experimental sites in the Materials and methods section:

“The EtDP was phenotyped in common garden experiments in two high potential teff growing locations in Ethiopia, Adet (Amhara, 11° 16′32″ N, 37° 29′30″ E) and Akaki (Oromia, 8°50'07.6"N, 38°49'58.3"E), under rainfed conditions during the main cropping season of 2018 (July-November). Adet is the main experimental site of the Amhara Regional Agricultural Research Institute, at an altitude of 2,240 meter above sea level. Soil type is vertisol, climate is moist cool, and average annual rainfall is 1,250 mm. Akaki is a sub site of the Debre Zeit Agricultural Research Center, with an altitude of 2,200 meter above sea level. Also in this case, soil type is vertisol, climate is moist cool and average annual rainfall is 1,055 mm.”

218-220 Define shorthand phenotypes here (the ones used in figure 3)

Fixed.

Figure 1: It is not completely clear from the legend in Figure 1, but I believe these figures include the landraces and breeding material (except panel D – this must be clarified). I would suggest that rather than the current panels it might be more informative to just see admixture (K = 20) ordered by breeding vs landrace, altitude or agroecological zone rather than based on unsupervised clustering (admixture). Likewise PCA plots colored based on independent values (altitude, improvement status, etc) may be more informative than using the DAPC cluster assignments (K = 10), although I like Figure S1-6 and the boxplots in 1E are probably more informative than just coloring based on altitude would be. There is nothing about improvement status in this figure – it would also be interesting to highlight the genetic relationships between the breeding and landrace material (S3 suggests that much of the breeding material clusters)

We revised figure 1 according to these comments. The updated figure features the following: (A) ADMIXTURE results for the pruned SNPs dataset at values of K = 20 and now grouped as from the result of DAPC (10 DAPC clusters); In this figure the order of samples refers to results of complete linkage agglomerative clustering, based on pairwise identity-by-state (IBS) distance. (B) and (C) the PCA is now colored according to the DAPC cluster assignments (K = 10), varieties and landraces are represented with different shapes and the relationships between breeding materials and landraces can easily be observed. (D), (E) and (F) were amended and represented with 10 genetic clusters.

Figure 2: I would rethink the social proxy (market analyses? Ethnicity?) and clarify the analyses performed for the bioclimatic associations. I think S1 (the last row) would be a good analysis to include in the main figure. I would put S2 with Figure 1, maybe I'm missing something but I'm not sure why it's there. I also don't understand why the correlations between the climate variables and the PCs are presented in B – this is clearly visible from the eigenvectors in A – but if it is informative I would suggest more discussion.

We removed panel B as suggested by the reviewer. The revision and clarification of associated text was described above in more detail

Figure 3: Very interesting. It might be interesting to see the correlation plot in A separated by men and women.

The updated figure features the following: (A) the correlation plot is now separated by men and women; panel (D) and (E) were amended and now grouped into 10 DAPC clusters.

What happened to the GWAS analysis?

We included a new figure for GWAS as described above

Methods:

Plant Materials: seems reasonable and sufficiently described.

Sequencing and Variant Calling: I'm not very familiar with RADseq calling pipelines but it seems reasonable. This is a well-established approach though and if the authors are following a previously published processing pipeline it would be good to cite.

We now expanded the RADseq description in the Materials and methods:

“In short, ddRAD is based on a double restriction of the target DNA, typically using a rare cutter and a frequent cutter, followed by sequencing of restriction fragments to reduce the complexity of the target genome.”

Spatial and bioclimatic characterization: It's not clear to me when the different datasets are used, and why you would use one over the other.

We now realize that it was a bad choice to refer to MEMs as spatial characterization and we revised this definition. The section “Spatial and bioclimatic characterization” has been renamed “Bioclimatic characterization". For “spatial” characterization we meant characterization at the sampling locations in terms of geographical positions, from which we derived Moran Eigenvector Map’s, which are non-correlated (distance-based) geographical features of the collection. This set of eigenvectors was then used to build the GF function along with non-collinear bioclimatic variables. The manuscript has been amended accordingly throughout.

470 Why use the highest resolution for something like a landrace that has a km rather than meter resolution?

The climatic resolution used is 2.5 minutes, meaning that we used cells that roughly measure 16 square kilometers (4x4 km) resulting from downscaling of climate models. This is now clarified in text:

“Current climate data (1970-2000 averages) relative to teff landraces’ sampling sites were retrieved from the WorldClim 2 database of global interpolated climate data (Fick and Hijmans, 2017) at the highest available spatial resolution of 2.5’ (approximately 4 x 4 km). “

To use a lower resolution (bigger cells) would have affected our power to distinguish geographically close locations characterized by different bioclimatic features.

More information on the grow-out locations would be welcome.

More information about the test locations were included in the Materials and methods section.

Reviewer #2:

In the context of teff production and consumption in Ethiopia the authors advocate breeding initiatives to strengthen this crop that is important for the livelihoods of the many small holders in Ethiopia by highlighting the limited yields, the limited breeding efforts made in this crop and the anticipated impactful climate change. By using genetic, climatic, geographic and participatory variety selection, gene pools are identified that would best inform parents to use for future breeding initiatives. This goes even further and shows the possibility to identify crucial genetic markers needed for breeding focus.

With convincing strength, the authors employ genetic analysis of landraces collected all over the country and link this to the different agro-ecological regions and niches as well as the different ethnic traditions and to the climatic data including a forecast of climatic factors based on an extrapolation of historic climate data. Although overwhelming at first, the large amount of figures well illustrate the argument and well satisfy if one wants to get some deeper understanding (supplementary figures). A great innovative strength is the proposition of a standard toolbox to combine the different data to provide product profiles for teff breeding.

The authors rightly state that the suitability of varieties does not only depend on ecological, climatic and product data but also on socio-cultural variables and they take this into account by stating that the regions coincide with the different ethnic groups in the country. These ethnic groups could however be named and their differences with regard to teff related practices and preferences highlighted and related to the two different groups of farmers chosen for the participatory variety selection. The selection strategy of the farmers could also be provided.

We revised the text to streamline the definition of the socioeconomic implications on teff distribution. We cannot go as far as defining the ethnicities of the farmers conducting the PVS, although they come from different regions and speak different languages. We tried to convey the point that we do not expect to have differences in evaluation of teff varieties (as shown by the collinearity of farmer evaluations across locations), but rather reduced seed exchange across regions due to social dynamics (local markets) and limited travel capacity. See answer to reviewer#1 for a detailed description of the rationale of grouping accessions by local administrative areas. The description of the farmer selection strategy was included:

“Participating farmers were selected with the help of agricultural officers. They all had experience on teff production and were recognized as a local agricultural expert in teff cultivation. Being local farmers, they spoke different languages (Amharic and Oromo) and local interpreters were employed. Farmers were engaged in focus group discussions prior to the PVS to discuss most relevant traits in teff cultivation and to attend training on how to perform the PVS.”

A little more discussion could be provided on the representativeness of the two PVS and phenotyping sites and the farmers evaluating those.

More information about the sites and their representativeness was given in the Materials and methods section (see answer above)

In the conclusion the authors cite the TRICOT approach as a way forward to decentralize PVS and get better systematic and representative testing of the genetic resources. The limitation of the only two sites in which the genetic collections were grown and the possible G x E influence on the phenotyping of the varieties could also be highlighted. However, the fact that with the two locations such great results were obtained in linking all the datasets shows the potential of the approach taken.

This point is well taken, and we rephrased the conclusion to acknowledge it (including the limitation on GxE):

“Transdisciplinary methods may support the integration of smallholder farmers in modern breeding and agricultural value chains. Modern data-driven research may then efficiently harness the genetic diversity generated in farmers’ fields. Teff is rapidly emerging from the NUS status and is projected towards modern breeding. Genebank genomics may be systematically used to map its large agrobiodiversity (Woldeyohannes et al., 2020), fully disclosing the opportunity to use next generation breeding technologies (Varshney et al., 2021) and large-scale genomic selection (Poland, 2015). Multi-location trials are needed to capture the range of genotype by environment (G x E) interactions that influence agronomic performance of teff, which we could only partially characterize. Decentralized varietal evaluation approaches can be used to scale up the testing of teff genetic resources (de Sousa et al., 2021; van Etten et al., 2019) to better capture G x E, produce new varieties with higher local adaptation, and resulting in higher farmers’ varietal adoption. An extensive socioeconomic characterization of teff cropping, including extensive farmers interviews (Labeyrie et al., 2016), may further unveil the dynamics of teff seed exchange and associated flow of information and knowledge related to local cropping (Occelli et al., 2021). Public sector breeding beyond NUS may further enhance the combination of data-driven research with participatory approaches to improve customer and product profiling to achieve social impact as cost effective as possible.”

321 landraces from 3850 Provide were selected for the study. It will be important to provide the selection strategy used to get to the 321 varieties. This selection strategy could also be highlighted outside of the methodology. The selection strategy might also be related to the time of collection of the varieties. Are these 321 chosen in a way to simply represent the largest genetic diversity in the collection or is the time of collection of the samples also included: to represent the varieties that are currently still cultivated?

An updated discussion of the relevance of breeding materials / landraces was added in text (see previous responses to reviewers). The selection of the core collection was better detailed in the Materials and methods:

“The E. teff diversity panel (EtDP) used in this study was derived from a larger teff collection of 3,850 accessions held at the Ethiopian Biodiversity Institute (EBI) (Addis Abeba, Ethiopia), which represents the world's largest active teff collection and was amplified and characterized by Woldeyohannes and collaborators (Woldeyohannes et al., 2020). Criteria for selecting the EtDP from the larger EBI collection were the following: (i) visible morphological variation for panicle traits, (ii) geographical and agroecological representativeness, (iii) apparent grain yield potential, (iv) presence of different maturity groups, and (v) presence of associated traditional names of specific landraces (e.g. Bunninye, Murri, Fesho). The EtDP includes 321 farmer varieties, sided by all 45 E. teff improved varieties released since the beginning of teff breeding program and until the selection of the EtDP. Improved varieties were obtained by a selection conducted by Ethiopian agricultural research centers. Seven accessions of teff wild relatives E. pilosa and E. curvula were also included in the collection (Supplementary file 1A).”

With regards to choosing men and women and comparing them separately there is not discussion on why one would expect differences. Only in the conclusion Weltzien et al. is cited in this respect but this needs more attention and reference to some more literature, especially in relation to the Ethiopian context. What is especially needed is an explanation of the different tasks (roles) of men and women in Ethiopia with regards to teff cultivation and possibly processing. This should be provided in general and in relation to the two study sites. This could also provide the background to a short discussion to interpret the similarities and the difference observed (appreciation being more related to biomass yield and panicle weight for women while for men it was plant height and grain filling period, could this be explained by women using plant parts for animal feed or any other activity in which they more feature?). I know that this is not the main focus of the paper but the fact that men and women's preferences were analyzed and discussed separately demands a discussion with regards to the results and the strategy used to select the men and women.

An improved description of farmers’ selection was included in the Materials and methods section:

“A participatory variety selection (PVS) was conducted in the two locations, involving 35 experienced teff farmers: 15 men and 10 women in Adet, five men and five women in Akaki. PVS was conducted close to physiological maturity in each location so to maximize variation between plots. Participating farmers were selected with the help of agricultural officers. They all had experience on teff production and were recognized as a local agricultural expert in teff cultivation. Being local farmers, they spoke different languages (Amharic and Oromo) and local interpreters were employed.”

An explanation of gender difference in trait evaluations was added in the Results section, citing relevant literature:

“Both women and men farmers were selected because they were teff growers, and their responses were analyzed separately to highlight distinct criteria for evaluating traits. Previous studies conducted in Ethiopia showed that agricultural traits relevant to marketability, food and drink preparations, animal feed and construction are all perceived differently by women and men farmers (Assefa et al., 2014; Mancini et al., 2017).”

And:

“Different dynamics of trait prioritization by men and women are expected, and may reflect gender roles in the value chain (Weltzien et al., 2019): while men are mostly endowed with field work, women are concerned with marketability of grains (Mancini et al., 2017). However, men and women alike were expert teff growers, as reflected by the consistency of their evaluations (Figure 3).”

This work is not only relevant to under NUS crops but also for breeding in general and especially for public sector breeding. In this respect the authors could open up this approach to be more generally applied to other crops and could highlight the importance of their approach for customer and product profiling and especially for customer profiling with which especially public breeding is tasked to achieve social impact as cost effective as possible.

Thank you for your comment. We reshaped the conclusion section so to reflect a more general interpretation of our results:

“An extensive socioeconomic characterization of teff cropping, including extensive farmers interviews (Labeyrie et al., 2016), may further unveil the dynamics of teff seed exchange and associated flow of information and knowledge related to local cropping (Occelli et al., 2021). Public sector breeding beyond NUS may further enhance the combination of data-driven research with participatory approaches to improve customer and product profiling to achieve social impact as cost effective as possible.”

Figures

With regards to figures 1 and 2: it is great to have all these figures close to each other for good cross referencing, however in that case it will be very important to have extremely good resolution. A solution could be to arrange the figures in a way that the now smaller figures in Figure 1 become bigger and the larger figure (Figure 1 D) become a little smaller so that the PCs are better visible, this is minor work to arrange. Also, the legenda with the colored dots indicating the DAPC clusters could be made much bigger for easier determination of the colors. The same counts for figure 2.

Figure 1 and Figure 2 were amended as follows:

Figure 1: panel (A) ADMIXTURE results for the pruned SNPs dataset at values of K = 20 grouped according to DAPC clustering results and IBS-based hierarchical clustering (hclust()), Panel (B) and (C) are now larger and the shape of the dots allows to distinguish between teff landraces and varieties. (D) is smaller, while (E) and (F) were kept the same size. The figures were amended and grouped using 10 DAPC clusters instead of 6. The legend is now bigger, we think that the suggestions made the overall results much better summarized, thank you.

Please verify the Y axe on figure 3E. Should this not be PA instead of PL? It was panicle appreciation that was evaluated with the farmers not panicle length?

Thanks for the comment. Good point, it was not consistent with the result section, so now Panicle Appreciation (PA) is shown in 3A.

Reviewer #3:

The authors have conducted an interdisciplinary characterization of a valuable teff diversity panel using well-supported methods, except for the selection of genetic clusters. Although the methodology itself is not novel, the interdisciplinary nature of the study sheds light on a wide range of Ethiopian teff germplasm characteristics, from environmental adaption to gender preference.

Strengths:

The incorporation of farmer's knowledge and gender preferences in the genotypic and phenotypic analysis of the germplasm was well done and nicely presented. This kind of information is absolutely necessary for the improvement and intensification of NUS, especially in smallholder farmer systems. Few germplasm characterization studies, both on NUS and staple crops, have reported such a broad range of characteristics.

Weaknesses:

Although many of the analyses rely on the differentiation of genetic clusters, the authors have inappropriately selected the optimal number of clusters (K). The authors conducted a cross-validated error analysis of various ADMIXTURE K values and a similar analysis of BIC statistics of various DAPC K values. Figure 1—figure supplement 4 clearly demonstrates that K=20 for ADMIXTURE and K=10 for DAPC, as these values of K have the lowest cross-validated error and BIC, respectively. However, the authors claim to have chosen K=6 because (a) there is a slight flattening of the ADMIXTURE error curve after K=6 and (b) K=6 has the lowest DAPC BIC statistic, neither of which are supported by the results.

This criticism is well taken and was addressed by reconsidering results of the DAPC as well as of ADMIXTURE. Now results are interpreted according to K=10 for the DAPC and K=20 for ADMIXTURE; all figures, including those of the supplement, were amended accordingly

Although the authors reported high broad-sense heritability (H2) for several agronomically important traits, variance components for the terms (genotype, environment, GxE, gender, error, etc.) used to estimate H2 were not shown. A table including H2, variance components, and the numbers of levels and observations for each term is necessary to assess the validity of these results. This information would also shed light on the relative influence of genetics, environment, gender, etc. on trait variation.

This data was included as a separate supplementary table (Supplementary file 1C)

As stated above the methods used to select K=6 as the optimal number of clusters was statistically inappropriate. Although K=20 had the lowest cross-validated ADMIXTURE error, this value seems too high. I would suggest using K=10, as this value of K had the lowest DAPC BIC, showed a more clear flattening in the ADMIXTURE error curve, and is closer to your desired value of K=6. Did K=6 correspond to some prior knowledge on the diversity panel, for example from breeders' knowledge? If so, perhaps this could be stated or incorporated in some way to support your choice of K=6.

To address this and the other reviewer’s remark, we decided to focus on the best interpretation (10 clusters) in the main text and in all parts depending on this interpretation. Every figure in the main text as well as in the supplement were amended accordingly.

[Editors’ note: what follows is the authors’ response to the second round of review.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

Essential revisions:

1) Further explicate the relative contributions of genotype, environment and GxE for the traits evaluated, per the more detailed suggestions of reviewers 1 and 3. This may be supported by the Oryza research suggested by reviewer 2.

We expanded on the relative contribution of G, E, and G x E in traits determination. To this end, we revised a supplemental table, we added a new supplemental table, and expanded the text in the Results section. See response to reviewers for details

2) Reassess the discussion of the GWAS results in the text and supporting material, particularly DM and PC2_bio. Per comments by reviewer 1, the QQ plots show that these traits are highly impacted by residual population structure and thus subject to an excess of false positives.

We carefully reconsidered GWAS results. In some instances, we believe that residual population structure is not harming the interpretation of the results. We discuss these instances while adding notes of caution on the interpretation of the results. See response to reviewers for details

3) Provide a bit more discussion of gendered roles in agricultural production, taking into account comments by reviewer 2.

We expanded the discussion including new literature and citing the interpretation of the reviewer as speculation.

Reviewer #1:

The authors in their responses have addressed most of my concerns and I think the paper reads much more clearly now. I only have two additional comments:

With respect to the GWAS results, these can now be evaluated with the inclusion of the QQ plots. All traits except bio14 are somewhat confounded with population structure (genome-wide deviation from expectation of no association, or deviation from the line in the QQ plot) and DM and PC2_bio are very correlated with structure and the QQ plot suggests that neither of these traits have truly associated loci in this study. The authors discuss results from especially DM a great deal and I think the discussion of these results should be reevaluated.

It is true that some residual structure is present in most GWAS scan and that as a result there is an increased chance of type II errors. This is now clarified at the beginning of the GWAS result paragraph:

“Some of the associated traits, notably days to maturity and PC2 of bioclimatic variables, showed some statistical inflation on QQ plots likely contributed by residual population structure (Figure 5—figure supplement 2).”

We tried to compensate this using a stringent multiple testing-corrected statistical threshold, which considers p-values distributions specific for each trait. In the following discussion, we argue that this structure does not hinder the significance of our results, which focus on the highest associations to compensate for inflation. Indeed, p-value inflation is also determined by the local LD at associated loci, and this is the case of the QTL on chr 6A for DM. As visible in the corresponding Manhattan plot, high LD in the region in casing many SNPs to be associated to the QTN, and this contributes to the inflation in the QQ plot. However, the underlying cause is not genome-wide structure but rather localized LD. This is now clarified in text:

“Days to maturity were also associated with a LD block at 20.8 Mb on chromosome 6A that was predictive for the GF model (Figure 4 —figure supplement 2; Figure 4 —figure supplement 3). The corresponding peak, the higher for days to maturity, groups many SNPs (Figure 5—figure supplement 1) and contributes to the inflation observed in the QQ plot for this trait (Figure 5—figure supplement 2).”

When a similar interpretation is not supported, we included a more critical discussion of the associations, e.g. here:

“On chromosome 1A at 32.3 Mb, three QTNs for days to maturity colocalized with a large significance peak for precipitation of driest month (bio14) and PC2 of bioclimatic variables, representing seasonality (Figure 4 —figure supplement 2; Figure 5 —figure supplement 1; Figure 5 —figure supplement 2). Although both days to maturity and PC2 of bioclimatic variables show some statistical inflation, the co-mapping of this locus across different traits reinforces its significance.”

And here:

“These candidate genes have not been yet validated, and their evaluation must be cautious especially for those signals that may confound to background genetic structure (Figure 5—figure supplement 2).“

The authors also state in the conclusion that "Multi-location trials are needed to capture the range of genotype by environment (G x E) interactions that influence agronomic performance of teff, which we could only partially characterize" but they do have two locations and are able to model GxE (although, more locations would certainly lead to better estimates). For a genomic resource paper, it would be very beneficial to report the impact of GxE directly (and how the landraces and improved varieties may differ, as has been found in other crops).

In relation to this and to considerations by Reviewer#3, we decided to expand the discussion on GxE. To this end, we followed the approach suggested by the Editor and presented in one of the papers mentioned by Reviewer#2 and included a new supplementary material providing a synthetic representation of the effect of location (i.e. environment), genetics, and gender on trait values. Supplementary Table 1D reports p-values for two-way and three-way ANOVA exploring the interactions in each of the trait. We refer to this in the main text:

“Depending on the trait, we found different effects of genetic background, location, and gender, a proxy of genotype by environment (G x E) interactions in determining teff performance and appreciation in tested locations. Genetic clusters had a significant effect on all traits and location was important for yield and yield components (Supplementary file 1D). Panicle appreciation, plant height, panicle length, culm diameter and panicle weight were affected by the interaction of location and genetic cluster. Gender and gender by location interactions always had significant effects for PVS traits (Supplementary File 1D).”

We also modified Supplementary Table 1C to include a new column reporting the percent variance explained by each component of the model as per suggestion of Reviewer#3.

Reviewer #2:

The authors have done a great job in revising the manuscript, which is much easier to read and figures and tables. The organisation has also improved. As far as my expertise allows the authors have also well approved the technical analysis part of the manuscript.

The sampling of the PVS participants has been described satisfactorily and the reason why to include gender has been articulated. I can imagine that the authors did not find any literature or resources to explain the difference observed between women and men. I highlighted that a reason could be that women are tasked with animal feeding and look at the plant's vegetative parts also as a resource of animal food while men would not focus on that so much because it might be out of their gendered roles. The authors could quickly consider this issue and see if there is any probability in this being the case or any other reason that could be mentioned to explain the differences observed between women and men. Any little clue or hypothetical phrase would give the gender dimension just a little more depth and clue for further investigation, rather than just saying that differences are to be expected.

To our knowledge, there is no literature on gender-differentiated variety preferences in teff, however several studies explored this factor in other crops. We added a reference to a comprehensive review paper from Jill Cairn’s group at CIMMYT, focusing on maize. Although specific to that crop, it provides an interesting perspective to interpret our findings. The paper is referred here:

“In Ethiopian wheat, men are mostly endowed with field work, while women are concerned with marketability of grains (Mancini et al., 2017). In maize, studies across Africa revealed gender-differentiated trait preferences matching different roles in cropping: women are mostly concerned with use traits (e.g. shellability, milling characteristics, storability) (Voss et al., 2021). However, the underlying causes of gender differentiation in teff trait preference need to be further characterized, also in relation to the multi-purpose of teff cultivation as food and feed.”

The authors nicely made the link to general breeding and the development of customer and product profiles in (public) breeding that are more and more expected to deliver on social impact in line with the sustainable development goals

The issue I raised with regards to the representativeness of the PVS trial locations has also been resolved satisfactorily.

This is now a really strong paper and needs to be published so I will be able to share it as a good example of transdisciplinary data driven research.

Please correct on line 161-162: reference is made to two figures (as the word 'and' is written) but only figure 6 is mentioned.

Fixed.

Furthermore, I would like to point the authors to the following publications based on work on the under researched African rice (Oryza glaberrima) and how farmers long trajectories of selection resulted in selecting 'robust' varieties that are adapted to change and dynamics (social as well as climatic) rather than narrow local localities. Any reference to this body of work could be made, although the manuscript stands strong as it is without referring to this body of work.

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0034801

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0085953

http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0007335

https://link.springer.com/article/10.1007/s10745-012-9528-x

Thank you for suggesting these very valuable reads; it was our mistake to have missed them and we are glad to refer to this literature. We cited two works among those suggested, Mokuwa et al. 2013 and 2014. Tirst in the introduction:

“The agrobiodiversity that farmers maintain is the result of genetic, environmental and societal interactions and has large potential for varietal innovation when farmer preferences are factored in the selection process (Mokuwa et al., 2014).”

Then as methodological approach to answer to Reviewer#1 and Reviewer#2 request to expand on GxE interaction:

“We used an analysis of variance (ANOVA) to describe trait differences conditional to teff genetic groups, locations, and, in the case of PVS, to gender with a procedure similar to that used in Mokuwa et al. (2013).”

Reviewer #3:

The authors have done a nice job revising the manuscript using 10 genetic clusters, following the recommendations of the reviewers.

I would suggest incorporating the information in Supplementary file 1C into Tables 1 and 2. For example, columns for variance explained by genotype, location, rep, error, etc. could be added to these tables.

The information suggested by the reviewer is very interesting and we decided to enrich Supplementary file 1C with a column reporting percent variance explained by different components to the model. We had considered adding this information to Tables 1-2, but we realized that they would become too complex due to the many different models employed, especially for what concerns PVS data. Conversely, we believe that Supplementary file 1C in the new format allows a clear and comprehensive representation of this information.

I also recommend discussing the results in Supplementary file 1C (or Tables 1-2, if the authors choose to merge this information) further. The experimental design appears to have allowed for high estimation of heritability, which the authors have already stated in the first and latest versions of the manuscript. However, as a plant breeder, I would like to know more about the relative contributions of genotype, environment, GxE on the different traits measured in this germplasm. For example, it is interesting that (a) for BPR, location explains 75% of the variance, while genotype explains a 18% of the variance and there is no GxE effect, which contrasts to (b) CDF, which has large GxE variance (66%) and relatively smaller proportions of the variance are explained by genotype (7%) and location (19%). There are likely some trends/comparisons among traits that can be discussed, such as which traits tend to have high GxE vs traits that have high location effects vs traits with high genetic variance.

In response to this suggestion, we incorporated a new supplementary material (Supplementary file 1C), described in detail in the response to Reviewer#1. We also added a more thorough reference to variance components in the model:

“A significant portion of the variance for production traits could be explained by differences in location, including biomass (75%) and yield (67%). Conversely, phenology was mostly explained by genotype: this is the case of days to heading (92%) and days to maturity (86%) (Supplementary file 1C).”

[Editors’ note: what follows is the authors’ response to the third round of review.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

1) Both reviewers 1 and 3 are still concerned with the interpretation of the GWAS. Please see especially reviewer 1's lengthy explanation to help clarify interpretation. Please take these concerns into account when interpreting GWAS results and reporting significant loci.

In response to this suggestion, we incorporated a new supplementary material (Supplementary file 1C), described in detail in the response to Reviewer#1. We also added a more thorough reference to variance components in the model:

“A significant portion of the variance for production traits could be explained by differences in location, including biomass (75%) and yield (67%). Conversely, phenology was mostly explained by genotype: this is the case of days to heading (92%) and days to maturity (86%) (Supplementary file 1C).”

Reviewer #1:

The paper reads much better and I am satisfied with the incorporation of GxE analysis. I still have concerns with the interpretation of the GWAS results and would recommend a final copy edit to fix the grammatical errors introduced in the editing process.

Statistically significant p-values are detected as deviations from the expectation of no-association based on statistical linkage between the causal association and the tested SNP (there are likely no causal loci tested in this limited marker set). The foundational assumption in GWAS is that these associations will be in local LD based on limited recombination in the region surrounding the causal polymorphism, allowing the researcher to zoom in on the loci underlying trait variation. This is visualized on an QQ plot as most dots on the diagonal of no-association with a small deviation associated with only the most significant SNPs (bio14 is a good example of this). On a Manhattan plot, these will for a localized peak around causal loci, characterized by the extent of the haplotype block associated with the causal polymorphism and the sampled marker density.

However, linkage can also exist between causal polymorphism and loci across the genome, even on other chromosomes, because selection is not random or evenly applied across a species or population. When populations differ due to drift processes, and selection is nested within randomly varying genetic structure, GWAS tests for the trait under selection will also find associations with the variants associated with differentiation between populations. For example, one population lives in a cold, highland environment and one lives in a tropical, lowland context. The highland population has a shorter growing season and consequently flowers earlier to ensure that the grain can mature in time before frost. The populations were already a bit different due to distance and chance, but now that the flowering window is non-overlapping for these populations it ensures that gene flow is basically stopped and that the populations will move further apart. If one were to do a GWAS for days to flowering or maturity (or for temperature or growing season variation) across these two populations the resulting QQ and Manhattan plot would look like the ones in the analysis for PC_bio2 or for DM; an early a persistent deviation from the expectation of no association in the QQ that localizes all over the genome in a Manhattan plot. This is not to say that there are not real associations (the peak on 6A is likely associated with a real causal locus for DM), but they are confounded with underlying structure and "statistically significant" associations are no longer a good way to identify truly associated loci. A better way to think of GWAS with these qualities is that the top SNPs are enriched for linked causal associations but any given association is suspect.

With this in mind, I would suggest a bit more discussion of confounded structure. Perhaps a way to frame it is in the context of local adaptation, tying in the GF models and variance analysis of GxE for the various traits.

331-336: The accepted associations for DM and PC2_bio are still very lax and based on the QQ plots heavily confounded with underlying population structure. This underlying structure is almost certainly driving association with the gradient forest model and I think that the conclusion that this association "support the importance of phenology in teff adaptation and geographic distribution" is reasonable, but not because of genetics underlying days to maturity per se. If the authors were to include a statement to this effect the overall interpretation is reasonable.

341-343: Again, when two traits are both heavily confounded with underlying structure it is not surprising that they would colocalize (via the third variable of structure), and I would be especially sceptical of colocalized regions between these traits.

349-355: Agree that this peak is likely real because of the very strong local LD. Not convinced of anything else

Thank you for your constructive criticism, thorough explanation and dedication in this revision. Indeed, you have been completely right in pointing out the role of underlying structure in the interpretation of the results. In this revision, we worked to incorporate your comments to the fullest. Also in consideration of Rev#3 comments, we decided to conduct a new association scan for those phenotypes showing inflation of the p values based on a visual assessment of the QQ plots. These included DM and PC2_bio as correctly pointed out. To reduce confounding structure, we run again the GWAS with additional PC covariates and we found that 25 covariates (instead of 10 as in the case of the other traits) could successfully reduce inflation of p values. This can be seen on the QQ plots and Manhattan plots in the revised Figure 5 S1 and Figure 5 S2 attached to the resubmission. The procedure is now reported in Materials and methods:

“The first 10 genetic PCs were used as covariates to account for population structure. The Kinship was estimated using the method implemented by VanRaden (VanRaden, 2008). Both kinship and PCA were calculated using the subset of LD-pruned markers used for population genetics analysis. GWAS was run on bioclimatic variables, agronomic traits, and PVS traits. After the first round of mapping with 10 PCs, individual QQ plot were visually surveyed for inflation in the p-value distributions, as these could be caused by sub-optimal correction for population structure. When inflation was detected, the corresponding GWAS scan was run again using 25 genetic PCs as covariates. QTN were called when association surpassed a multiple testing correction with False Discovery Rate (FDR) of 5% using R/q-value (Storey et al., 2021).”

As you correctly suspected, PC2_bio could not yield “true” associations and was excluded from the results. DM still showed all the relevant QTN but this time without the inflation on Chr 6A. The new correction applied to the models of the problematic traits resulted in a reduction of the number of QTNs. However, the remaining QTN are more solid and relevant, and add value to the manuscript. For this we need to tank you once again for your careful revision. The new GWAS scan results are reported in Table 1E, in this version featuring a new column reporting the number of PC covariates used for each trait. The text was updated to report and discuss the new results, as well as to acknowledge the critical points raised in the revision. We expanded the discussion of the role of confounding structure in the interpretation of the associations, we reframed the discussion of GWAS results in combination with the GF and opted for a more cautious interpretation of the results. Here below I report the most relevant text added to respond to the comments above. All changes are highlighted in text with track change.

In Results and Discussion when introducing GWAS results:

“The GWAS led to the identification of a total of 91 unique quantitative trait nucleotides (QTNs) (Supplementary file 1E). Of these, 43 QTNs were associated with bioclimatic variables at sampling sites. Given the low-density genotyping, we did not expect causal loci in the SNP set, yet it is possible to leverage local LD to target candidate genes in the vicinity of associations. Both in the case of trait values and bioclimatic variables, associations may be confounded by underlying LD structure driven by drift processes contributing to teff landraces’ distribution. Teff genetic clusters (Figure 1) show different distributions in regards to bioclimatic variables and traits values, including phenology, suggesting that the expression of adaptive traits may be confounded by underlying structure (Figure 2—figure supplement 2, Figure 3—figure supplement 2). The GWAS analysis was run with varying numbers of genetic covariates, so to optimize model fit in consideration of trait-genotype covariance. Visual assessment of QQ plots was used as a guidance to interpret significant associations, and a stringent threshold was employed so to minimize Type II errors (Figure 5—figure supplement 2).”

In Results and Discussion when discussing GF:

“We found that the turnover of allele frequencies across the cropping area was best predicted by geographic variables (Moran’s eigenvector map variables, MEM), confirming the importance of non-adaptive processes in teff differentiation and the resulting relevance of structure in interpreting associations.

In Results and Discussion when discussing loci of interest:

“The GF outcome was then linked to GWAS to identify genomic loci associated to climate, agronomic performance, farmers’ preference, and adaptive potential of teff (Figure 4 —figure supplement 2). We found that phenology QTNs were relevant in supporting GF prediction, although they could not suffice in explaining the geographic distribution of teff diversity (Figure 4 —figure supplement 3). The importance of phenology in teff adaptation was already reported (Woldeyohannes et al., 2020), although it may be confounded by underlying population structure. On chromosome 1A at 32.3 Mb, precipitation of driest month (bio14) and seasonality of precipitations (bio15) identify a QTN (Figure 5 —figure supplement 1; Figure 5 —figure supplement 2), whose local LD block harbors 176 gene models, among which four with predicted proteins with high homology with maize and Arabidopsis proteins (Supplementary file 1F).”

Reviewer #3:

The authors have done a nice job including more information about GxE, genetic variance, etc.

I am still not convinced of the GWA results for DM and PC2_bio, which show a very large deviation from the expected p-value distribution. The abundance of false positives cannot merely be dismissed with the authors' statement that "some of the associated traits…showed some statistical inflation on QQ plots likely contributed by residual population structure". A large proportion of all SNPs tested were deemed significant (assuming that the significance cut-off was approximately -log10p = 3.5) for DM and PC2_bio. I would suggest that the authors use a secondary threshold based on a visual assessment of the QQ plots. For DM, I would suspect that the ~5 SNPs with -log10p > 4.8 are truly significant (QTL on chromosome 6A). Similarly for PC2_bio, the 2 SNPs at the tail of the distribution are likely significant.

Thank you for rising this issue. We used your valuable suggestion of assessing QQ plots visually and decided to re-run GWAS with more covariates for traits showing inflation. We changed the manuscript accordingly, updating supplementary materials thanks to novel analyses as detailed in the response to Rev#1.

With respect to the gender aspects of this manuscript, I would recommend including references from similar work done by the NextGen Cassava project. Here are two examples:

https://doi.org/10.1007/s12231-018-9421-7

https://doi.org/10.1002/csc2.20152

Thank you for suggesting these papers which strongly relate to our work, we have included both in the manuscript text here:

“Rich NUS agrobiodiversity is still conserved in situ in smallholder agriculture systems, where the selection and cultivation conducted by local farmers resulted in the stabilization of farmer varieties largely untapped by breeding which could provide useful adaptation traits (Iragaba et al., 2020).”

and here:

“Studies across smallholder farming systems in Africa showed that gender-differentiated roles result in different trait preferences: both in cassava (Teeken et al., 2018) and in maize (Voss et al., 2021) women are mostly concerned with use traits, while men prioritize agronomic trats.”

https://doi.org/10.7554/eLife.80009.sa2

Article and author information

Author details

  1. Aemiro Bezabih Woldeyohannes

    1. Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    2. Amhara Regional Agricultural Research Institute, Bahir Dar, Ethiopia
    Contribution
    Resources, Data curation, Formal analysis, Investigation, Writing – original draft
    Contributed equally with
    Sessen Daniel Iohannes
    Competing interests
    No competing interests declared
  2. Sessen Daniel Iohannes

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Contribution
    Resources, Data curation, Software, Formal analysis, Visualization, Writing – original draft
    Contributed equally with
    Aemiro Bezabih Woldeyohannes
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7831-8997
  3. Mara Miculan

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Contribution
    Resources, Data curation, Software, Formal analysis, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9884-5727
  4. Leonardo Caproni

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Contribution
    Formal analysis, Investigation, Visualization, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7129-8575
  5. Jemal Seid Ahmed

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Contribution
    Software, Formal analysis, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Kauê de Sousa

    1. Digital Inclusion, Bioversity International, Montpellier, France
    2. Department of Agricultural Sciences, Inland Norway University of Applied Sciences, Hamar, Norway
    Contribution
    Data curation, Investigation, Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7571-7845
  7. Ermias Abate Desta

    Amhara Regional Agricultural Research Institute, Bahir Dar, Ethiopia
    Contribution
    Conceptualization, Supervision, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Carlo Fadda

    Biodiversity for Food and Agriculture, Bioversity International, Nairobi, Kenya
    Contribution
    Conceptualization, Supervision, Funding acquisition, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  9. Mario Enrico Pè

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Contribution
    Conceptualization, Supervision, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  10. Matteo Dell'Acqua

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Contribution
    Conceptualization, Data curation, Supervision, Validation, Visualization, Methodology, Writing – original draft, Project administration
    For correspondence
    m.dellacqua@santannapisa.it
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5703-8382

Funding

Scuola Superiore Sant'Anna

  • Aemiro Bezabih Woldeyohannes

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.

Acknowledgements

We wish to thank the men and women farmers who took part to the research, evaluating hundreds of genetic materials with great enthusiasm and competence in Adet and Akaki. These are their names in alphabetical order: Abebech, Adisse Abera, Adugawu Kere, Alebachew, Angoachi Fente, Askale, Ayana, Belachew Adimasu, Belay, Birtukan Ayele, Birhanu, Genanaw Dejene, Getasil, Habtamu Belay, Mame Seyum, Mamo, Misganaw Dagnaw, Regassa, Sefiwu Kassahun, Shashe, Tenagne Wubet, Tesfa Kefyalew, Tihune Dires, Tirusew, Tsehay Desie, Workinesh, Workineh Tsega, Wubet, Wubitu, Yechale Ayalneh, Yekoye, Yeniguse Worku, Yeniguse Wuletaw, Zebu Kebede, and Zina Yitay. We thank the Ethiopian Biodiversity Institute (EBI) for providing seed material and laboratory space to perform the DNA extraction in Ethiopia. Many thanks to Dr. Eleni Shiferaw, Dr. Basazen Fantahun Lakew, and Dr. Yosef Gebrehawairat Kidane for coordinating local support at EBI. We are grateful to the reviewers for constructive criticism and help in improving the manuscript. Funding was provided by the Doctoral School in Agrobiodiversity at Scuola Superiore Sant’Anna.

Senior Editor

  1. Detlef Weigel, Max Planck Institute for Biology Tübingen, Germany

Reviewing Editor

  1. Kelly Swarts, Gregor Mendel Institute of Molecular Plant Biology, Austria

Reviewer

  1. Bela Teeken, CGIAR, Nigeria

Publication history

  1. Preprint posted: August 28, 2021 (view preprint)
  2. Received: May 5, 2022
  3. Accepted: August 8, 2022
  4. Version of Record published: September 2, 2022 (version 1)

Copyright

© 2022, Woldeyohannes, Iohannes et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 536
    Page views
  • 111
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Aemiro Bezabih Woldeyohannes
  2. Sessen Daniel Iohannes
  3. Mara Miculan
  4. Leonardo Caproni
  5. Jemal Seid Ahmed
  6. Kauê de Sousa
  7. Ermias Abate Desta
  8. Carlo Fadda
  9. Mario Enrico Pè
  10. Matteo Dell'Acqua
(2022)
Data-driven, participatory characterization of farmer varieties discloses teff breeding potential under current and future climates
eLife 11:e80009.
https://doi.org/10.7554/eLife.80009
  1. Further reading

Further reading

    1. Genetics and Genomics
    Magnus Hallas-Møller, Katja S Johansen
    Insight

    How does a protein at the cell wall determine if a newly encountered fungus is safe to fuse with?

    1. Genetics and Genomics
    Courtney J Smith, Nasa Sinnott-Armstrong ... Jonathan K Pritchard
    Research Article Updated

    Pleiotropy and genetic correlation are widespread features in genome-wide association studies (GWAS), but they are often difficult to interpret at the molecular level. Here, we perform GWAS of 16 metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism in a subset of UK Biobank. We utilize the well-documented biochemistry jointly impacting these metabolites to analyze pleiotropic effects in the context of their pathways. Among the 213 lead GWAS hits, we find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. We demonstrate that the effect directions of variants acting on biology between metabolite pairs often contrast with those of upstream or downstream variants as well as the polygenic background. Thus, we find that these outlier variants often reflect biology local to the traits. Finally, we explore the implications for interpreting disease GWAS, underscoring the potential of unifying biochemistry with dense metabolomics data to understand the molecular basis of pleiotropy in complex traits and diseases.