Data-driven, participatory characterization of farmer varieties discloses teff breeding potential under current and future climates

  1. Aemiro Bezabih Woldeyohannes
  2. Sessen Daniel Iohannes
  3. Mara Miculan
  4. Leonardo Caproni
  5. Jemal Seid Ahmed
  6. Kauê de Sousa
  7. Ermias Abate Desta
  8. Carlo Fadda
  9. Mario Enrico Pè
  10. Matteo Dell'Acqua  Is a corresponding author
  1. Center of Plant Sciences, Scuola Superiore Sant'Anna, Italy
  2. Amhara Regional Agricultural Research Institute, Ethiopia
  3. Digital Inclusion, Bioversity International, France
  4. Department of Agricultural Sciences, Inland Norway University of Applied Sciences, Norway
  5. Biodiversity for Food and Agriculture, Bioversity International, Kenya


In smallholder farming systems, traditional farmer varieties of neglected and underutilized species (NUS) support the livelihoods of millions of growers and consumers. NUS combine cultural and agronomic value with local adaptation, and transdisciplinary methods are needed to fully evaluate their breeding potential. Here, we assembled and characterized the genetic diversity of a representative collection of 366 Ethiopian teff (Eragrostis tef) farmer varieties and breeding materials, describing their phylogenetic relations and local adaptation on the Ethiopian landscape. We phenotyped the collection for its agronomic performance, involving local teff farmers in a participatory variety evaluation. Our analyses revealed environmental patterns of teff genetic diversity and allowed us to identify 10 genetic clusters associated with climate variation and with uneven spatial distribution. A genome-wide association study was used to identify loci and candidate genes related to phenology, yield, local adaptation, and farmers’ appreciation. The estimated teff genomic offset under climate change scenarios highlighted an area around lake Tana where teff cropping may be most vulnerable to climate change. Our results show that transdisciplinary approaches may efficiently propel untapped NUS farmer varieties into modern breeding to foster more resilient and sustainable cropping systems.

Editor's evaluation

Teff (Eragrostis tef), a small-market domesticate native and commonly grown in Ethiopia and the Horn of Africa, is comprehensively characterized for genetic, ecological and phenotypic variation in this ambitious and interdisciplinary publication. Integration of small holder farmers in phenotyping the collection, with an emphasis on gender considerations, elevates the characterization of Ethiopian teff. This paper provides a solid foundation to accelerate teff breeding for a changing climate, and provides an excellent model for the characterization of novel and underused crops.

eLife digest

Small farms support the livelihoods of about two billion people worldwide. Smallholder farmers often rely on local varieties of crops and use less irrigation and fertilizer than large producers. But smallholdings can be vulnerable to weather events and climate change. Data-driven research approaches may help to identify the needs of farmers, taking into account traditional knowledge and cultural practices to enhance the sustainability of certain crops.

Teff is a cereal crop that plays a critical role in the culture and diets of Ethiopian communities. It is also a super food appreciated on international markets for its nutritional value. Rural smallholder farmers in Ethiopia rely on the crop for subsistence and income and make up the bulk of the country’s agricultural system. Many grow local varieties with tremendous genetic diversity. Scientists, in collaboration with farmers, could tap that diversity to produce more productive or climate-resilient types of teff, both for national and international markets.

Woldeyohannes, Iohannes et al. produced the first large-scale genetic, agronomic and climatic study of traditional teff varieties. In the experiments, Woldeyohannes and Iohannes et al. sequenced the genomes of 366 Ethiopian teff varieties and evaluated their agronomic value in common gardens. The team collaborated with 35 local farmers to understand their preference of varieties and traits. They then conducted a genome-wide association study to assess the crops’ productivity and their adaptations to local growing conditions and farmer preferences. Genetic changes that speed up teff maturation and flowering time could meet small farmers’ needs to secure teff harvest. Woldeyohannes, Iohannes et al. also identified a region in Ethiopia, where local teff varieties may struggle to adapt to climate change. Genetic modifications may help the crop to adapt to frequent droughts that may be a common characteristic of future climates.

The experiments reveal the importance of incorporating traditional knowledge from smallholder farmers into data-driven crop improvement efforts considering genetics and climate science. This multidisciplinary approach may help to improve food security and protect local genetic diversity on small farms. It may also help to ensure that agricultural advances fairly and equitably benefit small farmers.


Large-scale, high-yielding cropping systems rely on a remarkable small set of crops. Approximately half of the global farming land is devoted to maize, wheat, rice, and soybean (FAOSTAT, 2021) and the overall composition of worldwide food systems is rather uniform (Khoury et al., 2014). Yet, hundreds of neglected or underutilized species (NUS) are still actively cultivated in highly diversified, small-scale cropping systems, where they support the livelihoods of millions of people (Jamnadass et al., 2020). NUS benefited of scant research and breeding improvement. Their diversity is not only a proxy of pedoclimatic diversification of cropping systems, but also reflects socioeconomic diversity and cultural heritage of local farmers (Tadele, 2019). Rich NUS agrobiodiversity is still conserved in situ in smallholder farming systems, where the selection and cultivation conducted by local growers resulted in untapped local varieties that could provide useful adaptive traits (Iragaba et al., 2020). A comprehensive, transdisciplinary characterization of NUS farmer varieties that takes into consideration diversity, adaptation, and farmer–consumer preferences may unlock the potential of NUS for the sustainable intensification of farming systems (Dawson et al., 2018).

Crop scientists and breeders can now leverage the big data revolution to bridge the gap between NUS and 21st century agriculture (Dawson et al., 2019). Genomic tools enable a rapid and cost-effective characterization of large germplasm collections to unlock agrobiodiversity for breeding (Poland, 2015), identify genetic factors responsible for traits of agronomic interest (Mascher et al., 2019), and accelerate genetic gains (Juliana et al., 2019). Genomic data can be combined with climatic data to gain insights into locus-specific adaptation (Lasky et al., 2015) and to estimate genomic vulnerability under climate change scenarios (Aguirre-Liguori et al., 2021). Data-driven methods can also be applied to characterize the socioeconomic contexts in which crops are grown (van Etten et al., 2019a; van Etten et al., 2019b), generating insights that are critical to understand cropping dynamics in smallholder farming systems (Gomez y Paloma et al., 2020; Iragaba et al., 2020; Terlau et al., 2019). The agrobiodiversity maintained by farmers is the result of interactions between genetic, environmental, and societal factors, and has a large potential for varietal innovation when farmer preferences are factored into the selection process (Mokuwa et al., 2014). Participatory varietal selection (PVS) approaches have been developed to directly involve farmers in the evaluation of breeding materials (Ceccarelli and Grando, 2007). Previous studies showed that data-driven PVS can be combined with genomic data to identify genomic loci responsible for farmers’ appreciation (Kidane et al., 2017) and model local crop performance in farmer fields (de Sousa et al., 2021).

The Ethiopian highlands are a paradigm of challenging agricultural ecosystems where NUS farmer varieties are widely cultivated. In Ethiopia, 85 million people live in rural areas; most of them are subsistence-based smallholder farmers who are responsible for about 90% of the cultivated land and agricultural output of the country (Bachewe and Taffesse, 2018). Teff (Eragrostis tef) is an annual, self-pollinating, and allotetraploid grass of the Chloridoideae subfamily (Ketema, 1997). It is widely grown in Ethiopia as a staple crop, where it is valued for its nutritional and health benefits, resilience in marginal and semi-arid environments and cultural importance (D’Andrea, 2008; Ketema, 1997). Teff was likely domesticated in the northern Ethiopian Highlands from the allotetraploid Eragrostis pilosa, but the timing of its initial cultivation and the identity of its diploid ancestors remain poorly understood (D’Andrea, 2008; Ingram and Doyle, 2003; VanBuren et al., 2020). Nowadays, hundreds of local landraces are grown in Ethiopia in a wide range of agroecologies, displaying broad environmental adaptation and phenotypic diversity (Woldeyohannes et al., 2020). Breeding efforts on teff have been underway for decades, and segregant and mutagenized populations are available (Cannarozzi et al., 2018). Yet, teff yields today remain much lower than potentially attainable and substantially lower than those of other cereals grown in the region, and the full breeding potential of teff farmer varieties is still undisclosed (Girma et al., 2014; Woldeyohannes et al., 2020). Research in teff is rapidly evolving; a draft genome sequence (Cannarozzi et al., 2014) and a high-quality genome sequence (VanBuren et al., 2020) recently brought this species into the international genomics research spotlight. A data-driven research approach may unlock the full potential of teff agrobiodiversity and propel breeding for new improved varieties that meet local needs.

Here, we report a transdisciplinary data-driven approach to characterize NUS genetic, agronomic, and climatic diversity, using teff as a case study. We selected 321 teff farmer varieties derived from landraces and 45 teff improved lines, and we genotyped them with genome-wide molecular markers. Concurrently, we characterized their agronomic performance at two locations in Ethiopia. Experienced teff farmers (15 women and 20 men) were asked to evaluate the teff genotypes, providing qualitative information that enabled us to prioritize better adapted genetic materials. Additionally, we derived current and projected climate data at the sampling locations of teff accessions and used them to estimate genomic offset under climate change scenarios. We combined all these data in a genome-wide association study (GWAS) framework to identify genomic loci and candidate genes with relevance for adaptation, performance, and farmers’ preferences. We discuss the potential of data-driven participatory approaches to characterize NUS diversity and to disclose their potential for the sustainable intensification of farming systems.

Results and discussion

Teff farmer varieties harness broad genetic diversity

We assembled a representative collection of teff cultivated in Ethiopia, hereafter named Ethiopian Teff Diversity Panel (EtDP). The EtDP comprises 321 farmer varieties spanning the entire geographical and agroecological range of teff cultivation in Ethiopia, from the submoist lowlands of Tigray in the North to the moist lowlands of Oromia in the South, and from the subhumid lowlands of Benishangul and Gumuz in the West to the subhumid mid-highlands of Oromia in the East (Figure 1—figure supplement 1, Supplementary file 1A; MoA, 2000). The EtDP also includes 45 improved varieties released since the first breeding efforts on teff and up to the moment of the EtDP assembly. A selection of seven Eragrostis spp., putative wild relatives of teff, was included as an outgroup. The genomic diversity of the EtDP was assessed starting from 12,153 high-quality, genome-wide single-nucleotide polymorphisms (SNPs) that were derived from double digest restriction site-associated sequencing (ddRAD-seq) of individual accessions, and were pruned for linkage disequilibrium (LD). The EtDP genomes could be grouped in 1240 haplotype blocks (Supplementary file 1B). EtDP chromosomes showed consistently higher pericentromeric LD, with localized LD peaks in telomeric regions (Figure 1—figure supplement 2). The A and B subgenomes showed comparable yet different LD profiles, possibly due to their specificity in terms of dominance, transposable elements content, and overall limited homoeologous exchange (VanBuren et al., 2020).

Population structure analysesshowed that EtDP accessions displayed varying degrees of genetic admixture (Figure 1A) and could be best summarized by 10 discriminant analysis of principal component (DAPC) clusters (Figure 1—figure supplement 3). About 15% of the genetic variability in the EtDP collection could be explained by the first three principal components (PCs) of LD-pruned SNP data (Figure 1B, C). Teff accessions from Tigray, in Northern Ethiopia, were markedly separated from the rest, and mainly belonged to cluster 7 (Figure 1D). Teff breeding lines showed a narrower genetic base compared to farmer varieties and could not summarize the broad diversity available within the EtDP (Figure 1—figure supplement 4). They belonged predominantly to clusters 2, 8, and 10 (Figure 2D, E). Genetic clusters are a proxy of teff landraces diversity and may support breeding efforts by improving the identification of parent lines for genomic selection to counter the depletion of allelic diversity (Heffner et al., 2009), or establishing breeding groups to explore heterotic potential in teff (Boeven et al., 2016). The preservation and valorization of teff genetic clusters are instrumental to respond to future breeding needs in light of changing climates and dynamic consumer preferences (Woldeyohannes et al., 2020; Araya et al., 2011).

Figure 1 with 6 supplements see all
Genetic diversity of teff in Ethiopia.

(A) ADMIXTURE results for the pruned single-nucleotide polymorphisms (SNPs) dataset. Each vertical bar represents an individual, colored according to one of the 20 groups reported by the analysis. Bars are ordered according to the 10 genetic clusters identified by discriminant analysis of principal component (DAPC), as reported by numbers on the x-axis. (B, C) Principal component analysis of genome-wide SNPs. Taxa are colored according to their DAPC genetic cluster, as indicated in the legend. About 10.98% of the genetic diversity in the panel can be explained by the first two principal components, which clearly separate cluster 7 from clusters 2 and 4. Open and close circles represent farmer varieties and improved varieties, respectively. (D) Distribution of Ethiopian Teff Diversity Panel (EtDP) georeferenced landraces (N = 314) across the altitudinal map of Ethiopia, color coded as in panel (B). (E) Altitude distribution across the DAPC genetic clusters,, with letters on top of boxplots denoting significance levels based on a pairwise Wilcoxon rank sum test and Bonferroni correction for multiple testing. (F) Distribution of genetic clusters across agroecological zones of Ethiopia, with color coding as in panel B. SM2, warm submoist lowlands; SM3, tepid submoist mid-highlands; SM4, cool submoist mid-highlands; M2, warm moist lowlands; M3, tepid moist mid-highlands; M4, cool moist mid-highlands; SH2, warm subhumid lowlands; SH3, tepid suphumid mid-highlands; SH4, cool subhumid mid-highlands; H3, tepid humid mid-highlands; H4, cool humid mid-highlands. This figure has six figure supplements.

Figure 2 with 2 supplements see all
Teff diversity on the landscape.

(A) Principal component analysis of bioclimatic diversity in the Ethiopian Teff Diversity Panel (EtDP). Dots represent teff farmer varieties belonging to genetic clusters, colored according to legend. Vectors represent the scale, verse, and direction of bioclimatic drivers of teff differentiation. (B) Linear regression of Fst values in relation to geographic distance of accessions in the EtDP. Accessions were grouped by local district of sampling. (C) Linear regression of Fst values in relation to environmental distance of accessions, also grouped by district of sampling. (D) Pairwise Fst values between teff accessions grouped by local districts of sampling, as in (C) and (D). Local districts, that is subregional groups, are ordered by administrative regions according to legend. This figure has two figure supplements.

The distribution of teff genetic variation is associated with geographic and environmental factors

Landraces evolve at the interface of natural and anthropogenic selection (Casañas et al., 2017). Hence, we hypothesized that teff genetic clusters might be associated with local environmental conditions. The EtDP showed limited genetic stratification, as evidenced by the high levels of admixture and low genetic variance explained by the principal components analysis (PCA) (Figure 1—figure supplement 5; Figure 1B, C). Yet, some genetic clusters could be associated to agroecological zones characterized by different temperature and precipitation conditions (Figure 1—figure supplement 6; Figure 3A, B; Figure 3—figure supplement 1). Cluster 4 and 7 are cultivated in warmer sampling locations, while cluster 6 comes from colder and wetter areas (Figure 3—figure supplement 1). Extant landraces diversity is not only contributed by climate, but also by seed circulation. Studies have shown that cultural and social dynamics, such as belonging to similar or different ethnolinguistic groups, are key factors in shaping seed exchange networks (Labeyrie et al., 2016; Samberg et al., 2007). In Ethiopia, regional districts are markers of cultural and historical diversity, and smallholder farmers have limited capacity for long-distance travel. Thus, we hypothesized that regional distinctions could impact seed exchanges and spatial patterns of teff genetic diversity.

Figure 3 with 2 supplements see all
Phenotypic diversity in the Ethiopian Teff Diversity Panel (EtDP).

(A) Pearson’s correlations between agronomic traits (y-axis) and farmer preference traits, by gender (x-axis). Correlation values are expressed in color shades, as indicated in the the legend. (B, C) Top ranking genotypes (90th percentile of the OA distribution) selected by men (blue) and women (purple) farmers, overlaid to the principal component analysis (PCA) of the EtDP genetic diversity (B) and phenotypic diversity (C). Genotypes that were not selected are reported as gray dots. When men and women select the same genotype, the corresponding point appears in dark violet. Trait distribution across discriminant analysis of principal component (DAPC) clusters, for overall appreciation (D) and panicle length (E). Farmer varieties are represented by open circles, improved lines are represented by full circles. (F) Alluvial plot reporting the consistency of farmers’ choice by quartiles of the OA distribution. Each vertical bar represents a combination of location (Adet, Akaki) and gender (M, W). EtDP accessions are ordered on the y-axis according to their OA score in each combination. Alluvial flows are colored according to OA quartiles combined across gender and across location according to the legend (q1, q2, q3, and q4). (G) Venn diagram reporting farmer varieties having values superior to the 75th percentile of the trait distribution of improved varieties (lower than the 25th percentile in the case of GFP). Each area of the Venn diagram reports the corresponding number of farmer varieties, as in the legend. DH, days to heading; DM, days to maturity; PH, plant height; PL, panicle length; PBPM, number of primary branches per main shoot panicle; TT, total tillers; CLF, first culm length; CDF, first culm diameter; PW, panicle weight; PY, panicle yield; GY, grain yield; BY, biomass yield; HI, harvest index; GFP, grain filling period; GFR, grain filling rate; BPR, biomass production rate; OA, overall appreciation; PA, panicle appreciation. This figure has two figure supplements.

The district of sampling of teff landraces in the EtDP was used to aggregate accessions and calculate the fixation index (Fst) as a measure of genetic distances. Although our experimental design does not allow to fully untangle the effect of geography, environment, and social factors, we found a pattern of genetic variation distribution that could be associated to an isolation by distance and adaptation process, where Fst values are significantly associated with geographic distance (Mantel r = 0.31; p = 9e−04) and environmental distance (Mantel r = 0.352; p = 0.0137; Figure 3C, D). Accessions from East Tigray (Misraquawi) showed the highest separation from the collection, followed by East Oromia (Misraq Harerge) and West Amhara (Agew Awi; Figure 3E). Tigray is believed to be the center of teff domestication, as earlier reports identified there much of its diversity (Costanza et al., 2007) and archaeobotanical excavations reported its cultivation there since the Pre Aksumite period (before I Century CE; D’Andrea, 2008). However, limited archeological information is available from other parts of the country, which may conceal other sites of early domestication of teff. When integrating the putative teff wild relatives Eragrostis curvula and E. pilosa in our teff landraces’ phylogeny, we found that they did not group with any specific DAPC cluster (Figure 3—figure supplement 2).

Participatory evaluation of the teff diversity prioritizes genetic materials for breeding

The EtDP was phenotyped for agronomic and farmer appreciation traits during the main cropping season in two locations in Ethiopia (Figure 1D) representing areas with high potential for teff cultivation. Experienced teff farmers including 15 women and 20 men provided their evaluations on overall appreciation (OA) and panicle appreciation (PA) across the EtDP. Women and men farmers were selected because they were both teff growers, and their responses were analyzed separately to highlight distinct trait evaluation criteria. Previous studies conducted in Ethiopia showed that agricultural traits relevant to marketability, food and drink preparations, animal feed, and construction are all perceived differently by women and men farmers (Assefa et al., 2014; Mancini et al., 2017). Regardless of gender, farmer evaluations were highly heritable and fully comparable to agronomic traits commonly targeted by breeding (Supplementary file 1C). OA provided by farmers across genders and across locations had a broad-sense heritability (H2) of 0.81 (Table 1). H2 of grain yield combined across the same two locations was 0.42 (Table 2). A significant portion of the variance for production traits could be explained by differences in location, including biomass (75%) and yield (67%). Conversely, phenology was mostly explained by genotype: this is the case of days to heading (92%) and days to maturity (86%; Supplementary file 1C). Depending on the trait, we found different effects of genetic background, location, and gender, a proxy of genotype by environment (G × E) interactions in determining teff performance and appreciation in tested locations. Genetic clusters had a significant effect on all traits, and location was important for yield and yield components (Supplementary file 1D). PA, plant height, panicle length, culm diameter, and panicle weight were affected by the interaction of location and genetic cluster. Gender and gender by location interactions always had significant effects for PVS traits (Supplementary file 1D).

Table 1
Heritabilities (H2) for farmers’ participatory variety selection traits.

H2 values are given for each trait, type, and location combination. PA, panicle appreciation; OA, overall appreciation; M, men; W, women; ALL, measures combined by either type or location.

Table 2
Heritabilities (H2) for agronomic traits.

H2 values are given for each trait, and location combination. DH, days to heading; DM, days to maturity; PH, plant height; PL, panicle length; PBPM, number of primary branches per main shoot panicle; TT, total tillers; CLF, first culm length; CDF, first culm diameter; PW, panicle weight; PY, panicle yield; GY, grain yield; BY, biomass yield; HI, harvest index; GFP, grain filling period; GFR, grain filling rate; BPR, biomass production rate; ALL, measures combined by location.


The high H2 achieved by farmers’ OA may be because farmers, in providing their overall evaluation, not only consider yield but also yield components with high heritability (Table 3). Farmers’ appreciation of teff genotypes was strongly associated with yield (p < 0.001) and its components regardless of gender, most notably plant height (p < 0.001) and grain filling period (GFP) (p < 0.05) (Figure 2A, Figure 2—figure supplement 1, Table 3). Plant height and GFP were very important for men, while women OA was highly correlated with biomass yield and panicle weight (Table 3). Different dynamics of trait prioritization by men and women are expected, and may reflect gender roles in the value chain (Weltzien et al., 2019). Studies across smallholder farming systems in Africa showed that gender-differentiated roles in agriculture may result in different trait preferences. In Ethiopian wheat, men are mostly endowed with field work, while women are concerned with marketability of grains (Mancini et al., 2017). Both in cassava (Teeken et al., 2018) and maize (Voss et al., 2021), women are mostly concerned with use traits, while men prioritize agronomic trats. The underlying causes of gender differentiation in teff trait preference need to be further characterized, also in relation to the multipurpose teff cultivation as food and feed. Still, men and women participating in the PVS were all expert teff growers, as reflected by the consistency of their evaluations (Figure 2). Previous studies showed that farmers’ appreciation has a genetic basis in the evaluated crop, and may be used to perform genomic prediction (de Sousa et al., 2021) and identify genome-wide associations (Kidane et al., 2017).

Table 3
Plackett–Luce estimates from farmer’s overall appreciation (OA) of genotypes associated with genotypes’ agronomic metrics, DM, days to maturity; PH, plant height; PW, panicle weight; GY, grain yield; BY, biomass yield; GFP, grain filling period.

The rankings were analyzed for the whole group (All) and in subsets among gender to assess differences in traits linkages within men and women farmers. * 0.05 < p < 0.01, ** 0.01 < p < 0.001, *** p < 0.001.

GroupEstimateStd. errorz valuePr(>|z|)

The top ranking teff accessions according to men and women farmers captured different genetic backgrounds (Figure 2B), but the same trait combinations (Figure 2C). This suggested that farmers were consistently preferring the same teff ideotype regardless of their gender and regardless of the genetic background and geographic provenance of the accession. High yielding, high biomass, and fast maturing varieties were preferred. Individually, teff improved varieties showed high OA and high panicle length, which is a key component of OA (Figure 2D, E). This is supported by the fact that the evaluation fields were high-potential areas for teff cultivation. Genetic cluster 2, that is representative of most breeding materials, is associated with longer days to heading and maturity, higher plant heights and panicle lengths, greater number of total tillers and higher yields (Figure 2—figure supplement 2). Still, several traditional farmer varieties from different genetic backgrounds recorded comparable, at times superior, performance than improved lines (Figure 2G). By selecting farmer varieties that outperform improved varieties’ performances in target breeding traits, it may be possible to prioritize landrace accessions for teff improvement (Figure 2D, E) or even immediately make them available to farmers, as suggested by previous experiences in wheat (Fadda et al., 2020). In areas exposed to terminal drought, common in Ethiopia, short maturation time is paramount to achieve harvestable yield (Mengistu and Mekonnen, 2012), and it is expectedly a major component of farmers’ OA (Figure 2A). The time of maturation is therefore an obvious target trait for teff breeding, although challenging to be combined with higher attainable yield. We found that many farmer varieties had a shorter GFP than most improved lines, and that genotype EBI 9551 combined this trait with higher yield and farmers’ appreciation (Figure 2G).

Participatory, climatic, and agronomic information to support teff breeding

The data deriving from the transdisciplinary characterization of the EtDP was integrated in a GWAS framework to identify genomic loci associated with agronomic performance, local adaptation, and farmer preferences. These loci could be then used in marker assisted selection, genomic selection, or new breeding technologies to support teff improvement. The GWAS led to the identification of a total of 91 unique quantitative trait nucleotides (QTNs) (Supplementary file 1E). Of these, 43 QTNs were associated with bioclimatic variables at sampling sites. Given the low-density genotyping, we did not expect causal loci in the SNP set, yet it is possible to leverage local LD to target candidate genes in the vicinity of associations. Both in the case of trait values and bioclimatic variables, associations may be confounded by underlying LD structure driven by drift processes contributing to teff landraces’ differentiation. Teff genetic clusters (Figure 1) show different distributions in regards to bioclimatic variables and traits values, including phenology, suggesting that the expression of adaptive traits may be confounded by underlying structure (Figure 3—figure supplement 2, Figure 2—figure supplement 2). The GWAS analysis was run with varying numbers of genetic covariates, so to optimize model fit in consideration of trait-genotype covariance. Visual assessment of QQ plots was used as a guidance to interpret significant associations, and a stringent threshold was employed so to minimize Type II errors (Figure 5—figure supplement 2).

We then focused on LD blocks associated to QTNs to identify homology of predicted teff proteins with protein sequences of Arabidopsis thaliana and Zea mays (Supplementary file 1F). Among others, we identified a locus on chromosome 2A that was associated with grain yield, grain filling rate and OA (lcl|2A-14415768) (Figure 5; Figure 5—figure supplement 2). The LD region targeted by this QTN harbors 39 gene models including two 60s ribosomal subunits and several homologs of maize genes with suggestive function in relation to yield determination. Et_2A_015515, in this block, is a homolog of a maize serine/threonine protein kinase 3, belonging to a broad class of proteins that was associated with inflorescence development (McSteen et al., 2007) and grain yield (Jia et al., 2020) in maize. These findings, although preliminary, may support the development of markers for teff improvement as well as directing local and international research toward loci of relevance for teff improvement.

We further explored the EtDP data using a gradient forest (GF) machine-learning algorithm to calculate the contribution of environmental gradients to genomic variation in teff, and to estimate its genomic offset, or vulnerability, under projected climates. We found that the turnover of allele frequencies across the cropping area was best predicted by geographic variables (Moran’s eigenvector map variables, MEM), confirming the importance of nonadaptive processes in teff differentiation and the resulting relevance of structure in interpreting associations. MEMs were sided by precipitation indicators, particularly precipitation of the coldest quarter (bio19) and precipitation of the wettest month (bio13; Figure 4—figure supplement 1). The GF allowed us to model climate-driven genomic variation patterns across the landscape (Figure 4A, B). Approximately a quarter of the SNPs (3,049 in 521 LD blocks) were predictive of the GF model: of these, 176 showed Fst values in the 99th percentile of the distribution (Supplementary file 1G).

Figure 4 with 7 supplements see all
Teff genomic offset.

(A) Geographic distribution of climate-driven allelic variation under current climates across the teff cropping area, with colors representing the three principal component (PC) dimensions reported in panel (B). (C) Genomic vulnerability across the teff cropping area based on the RCP8.5 climate projections. The color scale indicates the magnitude of the mismatch between current and projected climate-driven turnover in allele frequencies according to legend. Phenotyping locations are shown with yellow diamonds. This figure has seven figure supplements.

Figure 5 with 2 supplements see all
Manhattan plots reporting the genome-wide association study (GWAS) result for.

(A) Overall appreciation, (B), panicle length, (C) grain yield, and (D) grain filling period. On the x-axis, the genomic position of markers. The y-axis reports the strength of the association signal. Single-nucleotide polymorphisms (SNPs) are ordered by physical position and grouped by chromosome. Quantitative trait nucleotides (QTNs), for example SNPs surpassing a threshold based on a false discovery rate of 0.05, are highlighted in red. A strong signal on chromosome 2A matches across participatory varietal selection (PVS) and metric traits. This figure has two supplements.

The GF outcome was then linked to the GWAS to identify genomic loci associated to climate, agronomic performance, farmers’ preference, and adaptive potential of teff (Figure 4—figure supplement 2). We found that phenology QTNs were relevant in supporting GF prediction, although they could not suffice in explaining the geographic distribution of teff diversity (Figure 4—figure supplement 3). The importance of phenology in teff adaptation was already reported (Woldeyohannes et al., 2020), although it may be confounded by underlying population structure. On chromosome 1A at 32.3 Mb, precipitation of driest month (bio14) and seasonality of precipitations (bio15) identify a QTN (Figure 4—figure supplement 1; Figure 5—figure supplement 2), whose local LD block harbors 176 gene models, among which four with predicted proteins with high homology with maize and Arabidopsis proteins (Supplementary file 1F). The product of Et_1A_007229 is predicted to be homologous to a phosphatidylinositol kinase that is involved in flower development and has been shown to influence floral transition in condition of abiotic stress (Akhter et al., 2016). This locus is in the vicinity of a QTN for days to maturity, reinforcing this interpretation. Days to maturity was also associated with a LD block at 20.8 Mb on chromosome 6A (Figure 4—figure supplement 1; Figure 4—figure supplement 2), in the vicinity of associations for days to heading and for bioclimatic diversity (Supplementary file 1E). In this block, nine gene products share homology with Arabidopsis and maize (Supplementary file 1F). The protein encoded by Et_6A_046800 is homologous to an ETO1-like protein involved in the regulation of ethylene synthesis in Arabidopsis (Wang et al., 2004). Ethylene is a key plant hormone that has been shown to be related to spike development and senescence (Valluru et al., 2017). These candidate genes have not been validated yet, and their evaluation must be cautious especially for those signals that may be confounded by background genetic structure (Figure 5—figure supplement 2). With an increasing refinement of teff gene annotations, corroborated by reverse genetic approaches (Zhu et al., 2012), it will be possible to validate genes underlying traits of interest. Teff breeding could then fully benefit from targeted editing (Lemmon et al., 2018) to speed up the development of new varieties with improved yield, local adaptation and adherence to local preferences.

The genome-wide teff adaptive potential across the landscape varied in magnitude and distribution according to different predicted climate scenarios for 2070 (Figure 4—figure supplement 4). We used these data to compute the genomic-adaptive offset between current and future climate scenarios to identify vulnerable areas (Figure 4C, Figure 4—figure supplement 5). In all representative concentration pathways (RCPs), the highest offset was predicted in the north-western highlands of the Amhara region, south of lake Tana (Figure 4C, Figure 4—figure supplement 4). Compared to other regions of the country, we found a decreasing trend of rainfall change in this region across all emission scenarios (Figure 4—figure supplement 5; Figure 4—figure supplement 6). In this area, hot nights are projected to increase more quickly than hot days, with the most marked increases expected to be experienced in the July, August, September season (Figure 4—figure supplement 7). Decreasing trends of rainfall during the main growing season are predicted in all projected scenarios, suggesting that seasonality might critically impact teff development stages (Figure 4—figure supplement 6). In light of these results, a valid adaptation strategy could be the assisted migration of teff genetic backgrounds from areas of different vulnerability (Rhoné et al., 2020). However, crop migration and varietal replacement strategies will need to take into account ecological and socioeconomic factors, including the impacts on existing ecosystems and on farmers’ adoption of migrated varieties (Sloat et al., 2020).


A comprehensive interpretation of crop performance is key to a sustainable intensification that embraces cultural and agricultural diversity of cropping systems. While significant successes and even a plateau might have been reached in optimal growing environments where most common crops are cultivated, there is ample opportunity to enhance productivity in marginal growing environments (Godfray et al., 2010). The success of crop varieties is not only determined by yield performance, but also by adaptation to local environments and cultural needs (Weltzien et al., 2019). The integration of genomic, climatic, and phenotyping diversity in a participatory framework may help tailor varietal development for local adaptation. The involvement of farmers in varietal evaluation is increasingly utilized in a quantitative framework to guide breeding choices in combination with genomic data (Annicchiarico et al., 2019; de Sousa et al., 2021).

Transdisciplinary methods may support the integration of smallholder farmers in modern breeding and agricultural value chains. Modern data-driven research may then efficiently harness the genetic diversity generated in farmers’ fields to project NUS, such as teff, into modern breeding. Genebank genomics may be systematically used to map its large agrobiodiversity (Woldeyohannes et al., 2020), fully disclosing the opportunity to use next generation breeding technologies (Varshney et al., 2021) and large-scale genomic selection (Poland, 2015). Multilocation trials are needed to capture the range of G × E interactions that influence agronomic performance of teff, which we could only partially characterize here. Decentralized varietal evaluation approaches can be used to scale up the testing of teff genetic resources (de Sousa et al., 2021; van Etten et al., 2019a) to better capture G × E, produce new varieties with higher local adaptation, and resulting in higher farmers’ varietal adoption. An extensive socioeconomic characterization of teff cropping, including extensive farmers interviews (Labeyrie et al., 2016), may further unveil the dynamics of teff seed exchange and associated flow of information and knowledge related to local cropping (Occelli et al., 2021). Public sector breeding beyond NUS may further enhance the combination of data-driven research with participatory approaches to improve customer and product profiling to achieve social impact as cost effective as possible.

The Intergovernmental Panel on Climate Change (IPCC) reports indicate that East Africa will experience an increase in aridity and agricultural droughts, with a substantially higher frequency of hot days and nights (IPCC, 2017). Temperature increases are also expected to result in more intense heat waves and higher evapotranspiration rates, which, coupled with the altered rainfall patterns, may affect agricultural productivity. Enhancing NUS farmer varieties offers promising opportunities to tackle food insecurity resulting from climate change in smallholder farming settings and beyond. NUS have enormous untapped potential for improvement that is hampered by lack of tools and knowledge (Yerima and Achigan-Dako, 2021), but our analyses show that their characterization is at hand. As NUS proceed toward mainstream breeding, the collaborative effort of scientists, breeders, and farmers will unlock their full potential for sustainable intensification of farming systems.

Materials and methods

Plant materials

Request a detailed protocol

The EtDP used in this study was derived from a larger teff collection of 3850 accessions held at the Ethiopian Biodiversity Institute (EBI; Addis Abeba, Ethiopia), which represents the world’s largest active teff collection, and was amplified and characterized by Woldeyohannes et al., 2020. Criteria for selecting the EtDP from the larger EBI collection were the following: (1) visible morphological variation for panicle traits, (2) geographical and agroecological representativeness, (3) apparent grain yield potential, (4) presence of different maturity groups, and (5) presence of associated traditional names of specific landraces. The EtDP includes 321 farmer varieties, sided by all 45 E. teff improved varieties released since the beginning of teff breeding program in Ethiopia and until the assembly of the EtDP. Improved varieties were obtained by Ethiopian agricultural research centers. Seven accessions of teff wild relatives E. pilosa and E. curvula were also included in the collection (Supplementary file 1A). Landraces were purified selecting and reproducing one single panicle representative of each accession. The ITPGRFA defines a crop variety as ‘[…] defined by the reproducible expression of its distinguishing and other genetic characteristics’ (Ho, 2011). For the scope of this paper, we define farmer varieties as uniform genotypes derived from the purification of ex situ accessions (i.e., landraces) collected from local farmers. Farmer varieties are therefore a proxy of landraces originally collected in farmer fields and are discussed as such.

Sequencing and variant calling

Request a detailed protocol

Seeds of the EtDP were germinated in pots at the EBI in 2018, and at least three seedlings were harvested and pooled per accession. Genomic DNA was extracted from pooled seedlings at the EBI laboratories using the GenElute Plant Genomic DNA Miniprep Kit (Sigma-Aldrich, St. Louis, MO, USA) following the manufacturer’s instructions. DNA quality was checked by electrophoresis on 1% agarose gel and using a NanoDrop ND-1000 spectrophotometer and sent to IGATech (Udine, Italy) for sequencing. Genomic libraries were produced using SphI and MboI restriction enzymes in a custom protocol for the production of double digestion restriction site-associated DNA markers (ddRAD; Peterson et al., 2012). In short, ddRAD is based on a double restriction of the target DNA, typically using a rare cutter and a frequent cutter, followed by sequencing of restriction fragments to reduce the complexity of the target genome. ddRAD libraries were sequenced with V4 chemistry on Illumina HiSeq2500 sequencer (Illumina, San Diego, CA) with 125 cycles in a paired-end mode. Reads were demultiplexed using the process_radtags utility included in Stacks v2.0 (Catchen et al., 2013) and analyzed for quality control with the FastQC tool (v.0.11.5). High-quality paired-end reads of each individual were mapped against the E. tef reference genome (version 3, available from CoGe under ID 50954; VanBuren et al., 2020) with BWA (Burrows-Weeler-Aligner v.0.7.12) using the MEM algorithm with standard parameters (Li, 2013). Alignments were sorted and indexed with PicardTools ( and samtools (Li et al., 2009). Single-nucleotide variants were identified with GATK (McKenna et al., 2010) HaplotypeCaller algorithm (version 4.2.0), run in per-sample mode followed by a joint genotyping step completed by GenotypeVCFs tool. Raw variants were filtered out using the VariantFiltration and SelectVariants GATK functions with the following criteria: monomorphic or multiallelic sites, QUAL <30; QD <2.0; MQ <40.0; AF <0.01; DP <580; SNP clusters defined as three or more variants located within windows of 5 bp. For each accession, SNPs with a total read count of <3 were set to NA. Variants were discarded if located on unanchored contigs, InDels, missing data >20%, heterozygosity >15%.

Bioclimatic characterization

Request a detailed protocol

GPS coordinates of EtDP teff landraces were derived from EBI passport data and projected onto the map of Ethiopia using R/raster (Hijmans and Etten, 2012). Teff landraces were projected onto the agroecological zones map of Ethiopia provided by the Ethiopian Institute of Agricultural Research (EIAR; MoA, 2000), which subdivides Ethiopia into different zones according to altitudinal ranges and temperature and rainfall patterns. Altitudes were assigned to each landrace based on GPS coordinates, using the CGIAR SRTM database at 90 m resolution (Reuter et al., 2007). Current climate data (1970–2000 averages) relative to teff landraces’ sampling sites were retrieved from the WorldClim 2 database of global interpolated climate data (Fick and Hijmans, 2017) at the highest available spatial resolution of 2.5′ (approximately 4 × 4 km). Collinearity among historical bioclimatic variables was previously checked with the ensemble.VIF() function in R/BiodiversityR (Kindt and Coe, 2005). Only variables with a variation inflation factor (VIF) below 10 were retained, namely bio2, bio3, bio4, bio9, bio13, bio14, bio15, bio18, and bio19. The Hadley Centre Global Environmental Model 2-Earth System (HadGEM2-ES; Jones et al., 2011) under the fifth phase of the Coupled Model Intercomparison Project (CMIP5) protocols simulations was used to retrieve future climate scenarios at the following RCPs (RCP2.6, RCP4.5, RCP6.0, and RCP8.5).

Phenotyping and participatory variety selection

Request a detailed protocol

The EtDP was phenotyped in common garden experiments in two high-potential teff growing locations in Ethiopia, Adet (Amhara, 11°16′32″ N, 37°29′30″ E) and Akaki (Oromia, 8°50′07.6″ N, 38°49′58.3″ E), under rainfed conditions during the main cropping season of 2018 (July–November). Adet is the main experimental site of the Amhara Regional Agricultural Research Institute, at an altitude of 2240 meter above sea level. Soil type is vertisol, climate is moist cool, and average annual rainfall is 1250 mm. Akaki is a subsite of the Debre Zeit Agricultural Research Center, with an altitude of 2200 meter above sea level. Also in this case, soil type is vertisol, climate is moist cool and average annual rainfall is 1055 mm. Accessions were planted in two replications per site using an alpha lattice design, in plots consisting of three rows of 1 m in length and 0.2 m interrow distance. Three phenological traits, days to 50% heading (DH), days to 90% maturity (DM), and GFP were recorded on whole plots in each environment. Morphology and agronomic traits were recorded from five randomly selected teff plants per plot: plant height (PH, cm), panicle length (PL, in cm), number of primary branches per main shoot panicle (PBPM), number of total tillers (TT), first culm length (CLF, in cm), first culm internode diameter (CDF, in mm), panicle weight (PW, in g), panicle yield (PY, in g), grain yield (GY, ton/ha), biomass yield (BY, ton/ha), harvest index (HI), grain yield filling rate (GFR, kg/ha/day), and biomass production rate (BPR, kg/ha/days). Qualitative data were sourced from the characterization performed by Woldeyohannes et al., 2020.

A PVS was conducted in the two locations, involving 35 experienced teff farmers: 15 men and 10 women in Adet, five men and five women in Akaki. PVS was conducted close to physiological maturity in each location so to maximize variation between plots. Participating farmers were selected with the help of agricultural officers. They all had experience on teff production and were recognized as a local agricultural expert in teff cultivation. Being local farmers, they spoke different languages (Amharic and Oromo) and local interpreters were employed. Farmers were engaged in focus group discussions prior to the PVS to discuss most relevant traits in teff cultivation and to attend training on how to perform the PVS. During the evaluation, farmers were divided into gender-homogeneous groups with five people each. Groups were conducted across the field from random entry points and asked to evaluate two traits: PA and OA. PVS traits were given on a Likert scale from 1 (poor) to 5 (excellent), in a way answering a question in the form of: ‘how much do you like the [panicle appearance/overall appearance] of this plot from one to five?’. Farmers provided their scores simultaneously so that within-group scoring bias was reduced. Each farmers’ score was recorded individually. We used Plackett–Luce model (Turner et al., 2019) to analyze farmers’ OA. Agronomic traits from tested genotypes were linked to farmer’s OA with a linear model using an Alternating Directions Method of Multipliers (ADMM) algorithm proposed by Yıldız et al., 2020.

Phenotypic data analyses

Request a detailed protocol

We used an analysis of variance (ANOVA) to describe trait differences conditional to teff genetic groups, locations, and, in the case of PVS, to gender with a procedure similar to that used in Mokuwa et al., 2013. Best linear unbiased predictions (BLUPs) of agronomic and PVS traits were computed with R/ASReml (Gilmour et al., 2014). BLUPs for agronomic traits were derived from the general model in Equation S1:

(S1) yik=μ+gi+lk+glik+e

where the observed phenotypic value is yik , µ is the overall mean of the population, gi is the random effect for the ith genotype g, lk is the fixed effect for the kth location, glik is the random effect interaction between genotype and location, and e is the error. For calculation of BLUPs with a single location, the data were subset by location and the model in Equation (S1) was simplified accordingly. Broad-sense heritability (H2) of agronomic traits was derived from the variance component estimates deriving from Equation (S1) as follows:

(S2) H2=σgσg+σglnloc+σenrep*nloc

In Equation (S2), σg is the variance component of genotypes, σgl is the genotype by location variance, and σe is the error variance. nlocnrep are the number of locations and replications, respectively. For calculation of H2 within locations (i.e., repeatability), Equation (S2) was simplified accordingly.

The derivation of PVS BLUPs and H2 was like that used for agronomic traits except for the fact that gender of farmers was considered. BLUPs for PVS were obtained from the model in Equation (S3):

(S3) yikm=μ+gi+lk+pm+glik+gpim+plmk+e

where yikm is the observed PVS score, and µ, gi , lk , and glik are as in Equation (S1) and pm is the random effect for farmer gender. Accordingly, gpim is the random effect of the interaction between genotype and gender and plmk is the random interaction between gender m and the kth location. For calculation of BLUPs specific for gender, location, and gender by locations, Equation (S3) was simplified accordingly. H2 for PVS traits was derived from the following formula:

(S4) H2=σgσg+σglnloc+σgmngender+σenrep*nloc*ngender*nfarmer

In Equation (S4), σg is the variance component of genotypes, σgl is the genotype by location variance, σgm is the genotype by gender variance, and σe is the error variance. nloc , ngender , and nrep are the number of locations, genders, and replications, respectively. For calculation of H2 (i.e., repeatability) by gender and by location, Equation (S4) was simplified accordingly. The 90th percentile of the OA distribution was considered to identify top ranking accessions for men and women. Farmer varieties were benchmarked with the fourth quartile of the distribution of improved lines for all scored traits.

Genetic diversity

Request a detailed protocol

Phylogenetic relationships in the EtDP were assessed on a pruned set of SNP markers with MAF >0.05. Pruning was performed with the PLINK (Purcell et al., 2007) indep-pairwise function on a 100 SNPs window moving in 10 SNP steps with an LD r2 threshold of 0.3. Pairwise identity by descent (IBS) was calculated by PLINK and visualized with custom scripts in R (R Development Core Team, 2018). PCA and DAPCs were performed with R/adegenet (Jombart, 2008). The optimal number of clusters (K) for the DAPC was identified using find.cluster(). Bayesian information criterion (BIC) statistics were computed at increasing values of K to measure the goodness of fit at each K using 365 PCs and default settings. ADMIXTURE (Alexander and Lange, 2011) was run testing 2–25 K clusters using the default termination criterion. Each iteration was run using different random seeds, and parameter standard errors were estimated using 2000 bootstrap replicates. A fivefold cross-validation procedure was used to identify the most likely value of K. The correlation of the residual difference between the true genotypes and the genotypes predicted by the model was estimated using EvalAdmix (Garcia-Erill and Albrechtsen, 2020). A neighbor-joining (NJ) tree was developed computing genetic distances using the Tajima–Nei method (Tajima and Nei, 1984) and performing 500 bootstrap resampling, using MEGA X (Kumar et al., 2018). Different NJ tree visualizations were produced using R/ggtree (Yu et al., 2017). Genotypic data of putative teff wild relatives E. curvula and E. pilosa were integrated in the NJ phylogeny. A set of putative SNPs shared between wild relatives and cultivated teff was derived as described for the EtDP.

LD analyses were performed on SNPs with MAF >0.05. Average pairwise r2 for all markers within a window of ±5 Mb was estimated using R/LDheatmap (Shin et al., 2007). The LD was plotted against physical positions, averaging pairwise r2 values for each chromosome over sliding window considering portions equal to 5% of each chromosome’s physical length. LD decay was then estimated for each of the chromosomes (Hill and Weir, 1988) using a threshold of r2 = 0.3. Haplotype blocks were estimated using the PLINK-blocks function with default settings and following the interpretation of Gabriel et al., 2002.

Climatic diversity and GF

Request a detailed protocol

Agroecological and bioclimatic variation analyses were performed on georeferenced materials of the EtDP. The distribution of the DAPC genetic clusters across agroecological zones was mapped via R/raster. After aggregating teff georeferenced accessions in Ethiopian administrative regions at the second level (districts), pairwise Fst (Weir and Cockerham, 1984) was calculated across all SNP markers for all areas accounting at least five individuals. Centroid coordinates of the accessions within each district were used to estimate geographic distances, while environmental distances were calculated by averaging the value of noncorrelated historical bioclimatic variables and altitude. A measure of environmental distance between each accession was thus calculated as pairwise Euclidean differences between locations. A Mantel test with a Monte Carlo method (9999 replications) was implemented in R/ade4 (Dray and Dufour, 2007) to check associations between linearized Fst (Fst/1 − Fst) and geographic and environmental distances.

The teff cropping area was defined by the union of all polygons representing agroecological zones in which at least two teff landraces were sampled. Significant associations between genetic clusters and agroecological zones and administrative regions were assessed using Pearson’s chi-squared test of independence. Pairwise Wilcoxon rank sum test was used to test the significance (p < 0.05) of differences in bioclimatic variables among DAPC clusters. A GF machine-learning approach implemented in R/gradientForest (Ellis et al., 2012; Fitzpatrick and Keller, 2015) was used to map the turnover in allele frequencies using nonlinear functions of environmental gradients with historical and projected climates. The GF was developed using historical noncollinear bioclimatic variables and MEM variables representing climatic and geographic diversity in the sample, respectively. MEM variables were derived from geographic coordinates at sampling locations of the landraces in the EtDP (Dray et al., 2006; Griffith and Peres-Neto, 2006) and were calculated with (dbmem) in R/adespatial (Stéphane Dray et al., 2021). A function was built for each response variable (SNPs) using 500 regression trees. An aggregate function was created for all SNPs, whereas the bioclimatic variables and MEMs were used as predictors. The model was then run to predict teff genetic–geographic–climatic distribution on the teff cultivation range in Ethiopia. The GF model was also run using and projected climate data under different RCP scenarios.

Climate projections for areas of interest were analyzed to assess trends in rainfall and temperature. The 12 models best performing in the East Africa region according to IPCC (IPCC, 2017) were used to develop and ensemble projection of rainfall and temperature indices with Climate Data Operators (CDO) (Schulzweida, 2017) and custom R scripts. Projected data were compared with historical data to derive indices change in the interannual variability for the regions of interest.

Genome-wide association studies

Request a detailed protocol

QTNs were mapped in a GWAS. GWAS was performed with R/rMVP (Yin et al., 2019) using the Fixed and random model Circulating Probability Unification (FarmCPU) method (Liu et al., 2016) that incorporates corrections for population cryptic relatedness (Kinship). The first 10 genetic PCs were used as covariates to account for population structure. The Kinship was estimated using the method implemented by VanRaden, 2008. Both kinship and PCA were calculated using the subset of LD-pruned markers used for population genetics analysis. GWAS was run on bioclimatic variables, agronomic traits, and PVS traits. After the first round of mapping with 10 PCs, individual QQ plot was visually surveyed for inflation in the p value distributions, as these could be caused by suboptimal correction for population structure. When inflation was detected, the corresponding GWAS scan was run again using 25 genetic PCs as covariates. QTN was called when association surpassed a multiple testing correction with false discovery rate of 5% using R/q value (Storey et al., 2021). QTNs were assigned to the previously defined haplotype blocks. Blocks were extended by the chromosome-specific LD decay distance upstream and downstream and used as windows to search for candidate genes. The LD blocks thus obtained were combined with Fst and GF results to identify intersections across methods.

Teff gene annotations were retrieved from CoGe under id50954 (VanBuren et al., 2020). Nucleotide sequences of putative candidate genes were translated into the corresponding proteins and used as queries against Araport11 (Cheng et al., 2017) and the Maize reference proteome, available from UniProt ( under the ID UP000007305. E value of 10−20 and percentage of identity of 50% were used as threshold to retain blast hits on Arabidopsis and maize.

Data availability

Teff accessions are available upon request from the Ethiopian Biodiversity Institute (EBI, Raw DNA sequencing reads are available on the Short Read Archive ( under BioProject accession number PRJNA758057. All scripts used for data analysis are available on GitHub at, (copy archived at swh:1:rev:4935b0ef76f4c1631d323d39777ec375912e2f82).

The following data sets were generated
    1. Dell'acqua M
    (2021) NCBI BioProject
    ID PRJNA758057. Genetic characterization of Ethiopian Teff diversity panel.


  1. Book
    1. Gomez y Paloma S
    2. Riesgo L
    3. Louhichi K
    4. Fan S
    5. Rue C
    (2020) Nutrition
    In: Gomez y Paloma S, editors. The Role of Smallholder Farms in Food and Nutrition Security. Cham: Springer International Publishing. pp. 13–28.
  2. Book
    1. Ketema S
    Eragrostis Tef Zucc. Trotter: Promoting the Conservation and Use of Underutilized and Neglected Crops 12
    Italy: International Plant Genetic Resources Institute.
  3. Software
    1. R Development Core Team
    (2018) R: A language and environment for statistical computing
    R Foundation for Statistical Computing, Vienna, Austria.
  4. Website
    1. Schulzweida U
    (2017) CDO User’ s Guide
    Accessed August 25, 2021.

Article and author information

Author details

  1. Aemiro Bezabih Woldeyohannes

    1. Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    2. Amhara Regional Agricultural Research Institute, Bahir Dar, Ethiopia
    Resources, Data curation, Formal analysis, Investigation, Writing – original draft
    Contributed equally with
    Sessen Daniel Iohannes
    Competing interests
    No competing interests declared
  2. Sessen Daniel Iohannes

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Resources, Data curation, Software, Formal analysis, Visualization, Writing – original draft
    Contributed equally with
    Aemiro Bezabih Woldeyohannes
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7831-8997
  3. Mara Miculan

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Resources, Data curation, Software, Formal analysis, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9884-5727
  4. Leonardo Caproni

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Formal analysis, Investigation, Visualization, Methodology, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7129-8575
  5. Jemal Seid Ahmed

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Software, Formal analysis, Validation, Writing – review and editing
    Competing interests
    No competing interests declared
  6. Kauê de Sousa

    1. Digital Inclusion, Bioversity International, Montpellier, France
    2. Department of Agricultural Sciences, Inland Norway University of Applied Sciences, Hamar, Norway
    Data curation, Investigation, Visualization, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7571-7845
  7. Ermias Abate Desta

    Amhara Regional Agricultural Research Institute, Bahir Dar, Ethiopia
    Conceptualization, Supervision, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Carlo Fadda

    Biodiversity for Food and Agriculture, Bioversity International, Nairobi, Kenya
    Conceptualization, Supervision, Funding acquisition, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  9. Mario Enrico Pè

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Conceptualization, Supervision, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  10. Matteo Dell'Acqua

    Center of Plant Sciences, Scuola Superiore Sant'Anna, Pisa, Italy
    Conceptualization, Data curation, Supervision, Validation, Visualization, Methodology, Writing – original draft, Project administration
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5703-8382


Scuola Superiore Sant'Anna

  • Aemiro Bezabih Woldeyohannes

The funders had no role in study design, data collection, and interpretation, or the decision to submit the work for publication.


We wish to thank the men and women farmers who took part to the research, evaluating hundreds of genetic materials with great enthusiasm and competence in Adet and Akaki. These are their names in alphabetical order: Abebech, Adisse Abera, Adugawu Kere, Alebachew, Angoachi Fente, Askale, Ayana, Belachew Adimasu, Belay, Birtukan Ayele, Birhanu, Genanaw Dejene, Getasil, Habtamu Belay, Mame Seyum, Mamo, Misganaw Dagnaw, Regassa, Sefiwu Kassahun, Shashe, Tenagne Wubet, Tesfa Kefyalew, Tihune Dires, Tirusew, Tsehay Desie, Workinesh, Workineh Tsega, Wubet, Wubitu, Yechale Ayalneh, Yekoye, Yeniguse Worku, Yeniguse Wuletaw, Zebu Kebede, and Zina Yitay. We thank the Ethiopian Biodiversity Institute (EBI) for providing seed material and laboratory space to perform the DNA extraction in Ethiopia. Many thanks to Dr. Eleni Shiferaw, Dr. Basazen Fantahun Lakew, and Dr. Yosef Gebrehawairat Kidane for coordinating local support at EBI. We are grateful to the reviewers for constructive criticism and help in improving the manuscript. Funding was provided by the Doctoral School in Agrobiodiversity at Scuola Superiore Sant’Anna.

Version history

  1. Preprint posted: August 28, 2021 (view preprint)
  2. Received: May 5, 2022
  3. Accepted: August 8, 2022
  4. Version of Record published: September 2, 2022 (version 1)


© 2022, Woldeyohannes, Iohannes et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  • 1,426
  • 245
  • 9

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Aemiro Bezabih Woldeyohannes
  2. Sessen Daniel Iohannes
  3. Mara Miculan
  4. Leonardo Caproni
  5. Jemal Seid Ahmed
  6. Kauê de Sousa
  7. Ermias Abate Desta
  8. Carlo Fadda
  9. Mario Enrico Pè
  10. Matteo Dell'Acqua
Data-driven, participatory characterization of farmer varieties discloses teff breeding potential under current and future climates
eLife 11:e80009.

Share this article

Further reading

    1. Genetics and Genomics
    2. Neuroscience
    Céline Petitgas, Laurent Seugnet ... Serge Birman
    Research Article

    Adenine phosphoribosyltransferase (APRT) and hypoxanthine-guanine phosphoribosyltransferase (HGPRT) are two structurally related enzymes involved in purine recycling in humans. Inherited mutations that suppress HGPRT activity are associated with Lesch–Nyhan disease (LND), a rare X-linked metabolic and neurological disorder in children, characterized by hyperuricemia, dystonia, and compulsive self-injury. To date, no treatment is available for these neurological defects and no animal model recapitulates all symptoms of LND patients. Here, we studied LND-related mechanisms in the fruit fly. By combining enzymatic assays and phylogenetic analysis, we confirm that no HGPRT activity is expressed in Drosophila melanogaster, making the APRT homolog (Aprt) the only purine-recycling enzyme in this organism. Whereas APRT deficiency does not trigger neurological defects in humans, we observed that Drosophila Aprt mutants show both metabolic and neurobehavioral disturbances, including increased uric acid levels, locomotor impairments, sleep alterations, seizure-like behavior, reduced lifespan, and reduction of adenosine signaling and content. Locomotor defects could be rescued by Aprt re-expression in neurons and reproduced by knocking down Aprt selectively in the protocerebral anterior medial (PAM) dopaminergic neurons, the mushroom bodies, or glia subsets. Ingestion of allopurinol rescued uric acid levels in Aprt-deficient mutants but not neurological defects, as is the case in LND patients, while feeding adenosine or N6-methyladenosine (m6A) during development fully rescued the epileptic behavior. Intriguingly, pan-neuronal expression of an LND-associated mutant form of human HGPRT (I42T), but not the wild-type enzyme, resulted in early locomotor defects and seizure in flies, similar to Aprt deficiency. Overall, our results suggest that Drosophila could be used in different ways to better understand LND and seek a cure for this dramatic disease.

    1. Genetics and Genomics
    Gbolahan Bamgbose, Guillaume Bordet ... Alexei Tulin
    Research Article

    PARP-1 is central to transcriptional regulation under both normal and stress conditions, with the governing mechanisms yet to be fully understood. Our biochemical and ChIP-seq-based analyses showed that PARP-1 binds specifically to active histone marks, particularly H4K20me1. We found that H4K20me1 plays a critical role in facilitating PARP-1 binding and the regulation of PARP-1-dependent loci during both development and heat shock stress. Here, we report that the sole H4K20 mono-methylase, pr-set7, and parp-1 Drosophila mutants undergo developmental arrest. RNA-seq analysis showed an absolute correlation between PR-SET7- and PARP-1-dependent loci expression, confirming co-regulation during developmental phases. PARP-1 and PR-SET7 are both essential for activating hsp70 and other heat shock genes during heat stress, with a notable increase of H4K20me1 at their gene body. Mutating pr-set7 disrupts monomethylation of H4K20 along heat shock loci and abolish PARP-1 binding there. These data strongly suggest that H4 monomethylation is a key triggering point in PARP-1 dependent processes in chromatin.