A roadmap for gene functional characterisation in crops with large genomes: Lessons from polyploid wheat

  1. Nikolai M Adamski
  2. Philippa Borrill
  3. Jemima Brinton
  4. Sophie A Harrington
  5. Clémence Marchal
  6. Alison R Bentley
  7. William D Bovill
  8. Luigi Cattivelli
  9. James Cockram
  10. Bruno Contreras-Moreira
  11. Brett Ford
  12. Sreya Ghosh
  13. Wendy Harwood
  14. Keywan Hassani-Pak
  15. Sadiye Hayta
  16. Lee T Hickey
  17. Kostya Kanyuka
  18. Julie King
  19. Marco Maccaferri
  20. Guy Naamati
  21. Curtis J Pozniak
  22. Ricardo H Ramirez-Gonzalez
  23. Carolina Sansaloni
  24. Ben Trevaskis
  25. Luzie U Wingen
  26. Brande BH Wulff
  27. Cristobal Uauy (corresponding author)
  1. John Innes Centre, Norwich Research Park, United Kingdom
  2. School of Biosciences, University of Birmingham, United Kingdom
  3. John Bingham Laboratory, United Kingdom
  4. Commonwealth Scientific and Industrial Research Organisation, Agriculture and Food (CSIRO), Australia
  5. Council for Agricultural Research and Economics, Research Centre for Genomics and Bioinformatics, Italy
  6. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, United Kingdom
  7. Rothamsted Research, United Kingdom
  8. Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, Australia
  9. Division of Plant and Crop Sciences, The University of Nottingham, Sutton Bonington Campus, United Kingdom
  10. Department of Agricultural and Food Sciences (DISTAL), Alma Mater Studiorum - Università di Bologna (University of Bologna), Italy
  11. Crop Development Centre, University of Saskatchewan, Canada
  12. International Maize and Wheat Improvement Center (CIMMYT), Mexico
6 figures and 3 tables

Figures

Figure 1. Gene homology within polyploid wheat.

As a result of hybridisation events (one in the origin of tetraploid wheat and a second in the origin of hexaploid wheat), genes in polyploid wheat are present in multiple copies called homoeologs, which usually occupy similar positions on their respective chromosomes. In the example of hexaploid bread wheat illustrated here, Gene X has homoeologs on chromosomes 1A, 1B and 1D. Duplicated genes, called paralogs (e.g. two copies of Gene Y on chromosome 7A), arose from duplications either within wheat or in one of its ancestral species. Most paralogs arise from intra-chromosomal duplications, although inter-chromosomal duplications can also occur.

Figure 2. The roadmap for gene characterisation in wheat.

Overview of a proposed strategy to take a gene from any plant species, identify the correct wheat ortholog(s) using Ensembl Plants (https://plants.ensembl.org), and determine gene expression using expression browsers and gene networks. Suggestions for functional characterisation are provided, including induced variation such as mutants, transgenics or Virus-Induced Gene Silencing (VIGS). In addition, publicly available populations incorporating natural variation can be used. Finally, steps for growing, genotyping and crossing plants are outlined. Links to detailed tutorials and further information (also available at www.wheat-training.com):
(1) www.wheat-training.com/wp-content/uploads/Genomic_resources/pdfs/EnsemblPlants-primer.pdf
(2) www.wheat-training.com/wp-content/uploads/Genomic_resources/pdfs/Finding-wheat-orthologs.pdf
(3) www.wheat-training.com/wp-content/uploads/Genomic_resources/pdfs/Genome_assemblies.pdf
(4) www.wheat-training.com/wp-content/uploads/Genomic_resources/pdfs/Gene-models.pdf
(5) www.wheat-training.com/wp-content/uploads/Genomic_resources/pdfs/Expression-browsers.pdf
(6) www.wheat-training.com/wp-content/uploads/Genomic_resources/pdfs/Gene-networks.pdf
(7) www.wheat-training.com/wp-content/uploads/Functional_studies/PDFs/Selecting-TILLING-mutants.pdf
(8) www.wheat-training.com/wp-content/uploads/Functional_studies/PDFs/Transgenics.pdf
(9) www.wheat-training.com/wp-content/uploads/Functional_studies/PDFs/Virus_Induced_Gene_Silencing.pdf
(10) www.wheat-training.com/wp-content/uploads/Functional_studies/PDFs/Populations.pdf
(11) www.wheat-training.com/wp-content/uploads/Genomic_resources/Variation-data.pdf
(12) www.wheat-training.com/wp-content/uploads/Wheat_growth/pdfs/Growing_Wheat_final.pdf
(13) www.wheat-training.com/wp-content/uploads/Wheat_growth/pdfs/Speed_breeding.pdf
(14) www.wheat-training.com/wp-content/uploads/Functional_studies/PDFs/Designing-genome-specific-primers.pdf
(15) https://www.biosearchtech.com/support/education/kasp-genotyping-reagents/running-kasp-genotyping-reactions
(16) http://www.wheat-training.com/wp-content/uploads/Wheat_growth/pdfs/How-to-cross-wheat-pdf.pdf
(17) www.wheat-training.com/wp-content/uploads/Functional_studies/PDFs/Designing-crossing-schemes.pdf
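The ortholog-identification step in this roadmap is carried out through the Ensembl Plants website. As a hedged sketch of a programmatic alternative, the Python snippet below builds a query for the public Ensembl REST API. The endpoint path (`homology/id/{species}/{id}`) and the parameters `compara=plants`, `type=orthologues` and `target_species` are assumptions to verify against the current Ensembl REST documentation, and `homology_url` and `fetch_json` are hypothetical helper names. Any orthologs returned should still be checked against the gene tree, as the roadmap recommends.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server


def homology_url(species: str, gene_id: str,
                 target_species: str = "triticum_aestivum") -> str:
    """Build a URL requesting orthologues of gene_id in target_species.

    Assumption: endpoint path and parameter names follow the Ensembl REST
    homology endpoint; check them against the live documentation.
    """
    params = {
        "compara": "plants",             # Ensembl Plants comparative database
        "type": "orthologues",           # exclude paralogues
        "target_species": target_species,
        "content-type": "application/json",
    }
    return (f"{ENSEMBL_REST}/homology/id/{quote(species)}/{quote(gene_id)}"
            f"?{urlencode(params)}")


def fetch_json(url: str, timeout: float = 30.0):
    """Download and decode a JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(homology_url("arabidopsis_thaliana", "AT1G01010"))` would request wheat orthologues of an arbitrary Arabidopsis gene; this needs network access and is not run here.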

Figure 3. Gene model ID nomenclature in the five available gene annotations for domesticated polyploid wheat.

Here, one gene is used as an example to highlight the differences in gene ID nomenclature. Fields represented in the nomenclature are shown at the top, with matching colours for the corresponding features in the gene names. The yellow background shows the CSS gene names, with dark grey arrows pointing to the corresponding fields in the TGAC gene annotation (TGACv1, green background). Blue backgrounds show the gene nomenclature for the RefSeqv1.0 and v1.1 annotations (as used in Ensembl Plants), while the lilac background shows the nomenclature for Svevo v1.0 (modern durum wheat). (1) Two annotation versions are available for the RefSeqv1.0 genome assembly: RefSeqv1.0 (release annotation) and RefSeqv1.1 (improved annotation). These are distinguished by the annotation version number: '01' for RefSeqv1.0 and '02' for RefSeqv1.1. Otherwise, the annotations follow the same rules. (2) In the RefSeq and Svevo annotations, the biotype is represented by an additional identifier, where G = gene. (3) In the RefSeqv1.0 and v1.1 annotations, identifiers are progressive numbers in steps of 100 reflecting the relative position of gene models. For example, gene TraesCS5B02G236400 would be adjacent to gene TraesCS5B02G236500. However, the relative positions of genes may change in future genome releases as the assembly is improved, for example if scaffolds are rearranged, in which case the gene order implied by the identifiers would no longer hold. In the gene annotation for the tetraploid durum wheat cv. Svevo, the species name is TRITD (TRITicum Durum) and gene identifiers increase in steps of 10, rather than in steps of 100 as in the RefSeq hexaploid wheat annotation. Note that the RefSeqv1.0 and v1.1 annotations comprise High Confidence (HC) and Low Confidence (LC) gene models. Low Confidence gene models are flagged by 'LC' at the end of the identifier (not shown). HC and LC genes that otherwise share the same identifier are not the same locus and are not in sequential order.
Hence, TraesCS5B02G236400 and TraesCS5B02G236400LC are both located on chromosome 5B, but are neither the same gene nor physically adjacent. Similarly, genes on homoeologous chromosomes with the same numeric identifier are not necessarily homoeologous. For example, TraesCS5A02G236400, TraesCS5B02G236400 and TraesCS5D02G236400 are not homoeologous genes.
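The RefSeqv1.0/v1.1 rules in this caption are regular enough to check programmatically. The Python sketch below is a minimal illustration, assuming IDs of the form `Traes` + `CS` + chromosome (e.g. `5B`) + two-digit annotation version (`01` or `02`) + `G` + numeric identifier, with an optional `LC` suffix. It ignores CSS, TGACv1 and Svevo IDs as well as unplaced (ChrU) genes, and the helper names (`parse_refseq_id`, `numbered_as_neighbours`) are hypothetical. Following the caveats above, it only compares IDs within the same chromosome, confidence class and annotation version, and it never infers homoeology from the numbers.

```python
import re
from dataclasses import dataclass

# Assumed pattern for RefSeqv1.x IDs on the 21 assembled chromosomes,
# e.g. TraesCS5B02G236400 or TraesCS5B02G236400LC (ChrU genes not handled)
REFSEQ_ID = re.compile(
    r"^TraesCS(?P<chromosome>[1-7][ABD])(?P<version>0[12])"
    r"G(?P<number>\d+)(?P<lc>LC)?$"
)
ANNOTATION = {"01": "RefSeqv1.0", "02": "RefSeqv1.1"}


@dataclass(frozen=True)
class WheatGeneID:
    chromosome: str   # e.g. "5B": homoeologous group 5, B genome
    annotation: str   # "RefSeqv1.0" or "RefSeqv1.1"
    number: int       # progressive identifier (steps of 100)
    confidence: str   # "HC" or "LC"


def parse_refseq_id(gene_id: str) -> WheatGeneID:
    """Split a RefSeqv1.x gene ID into its nomenclature fields."""
    m = REFSEQ_ID.match(gene_id)
    if m is None:
        raise ValueError(f"not a RefSeqv1.x chromosome gene ID: {gene_id}")
    return WheatGeneID(
        chromosome=m["chromosome"],
        annotation=ANNOTATION[m["version"]],
        number=int(m["number"]),
        confidence="LC" if m["lc"] else "HC",
    )


def numbered_as_neighbours(a: str, b: str) -> bool:
    """True if two IDs are consecutive under the steps-of-100 convention.

    Only compares IDs on the same chromosome, in the same confidence class
    and annotation version; says nothing about homoeology, and the order
    may change in future assembly releases.
    """
    ga, gb = parse_refseq_id(a), parse_refseq_id(b)
    return (
        ga.chromosome == gb.chromosome
        and ga.confidence == gb.confidence
        and ga.annotation == gb.annotation
        and abs(ga.number - gb.number) == 100
    )
```

For instance, `numbered_as_neighbours("TraesCS5B02G236400", "TraesCS5B02G236500")` returns `True`, whereas pairing TraesCS5B02G236400 with its `LC` namesake or with TraesCS5A02G236400 returns `False`.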

Figure 4. Crossing scheme to combine TILLING or CRISPR/Cas9 single mutants in wheat.

In tetraploid wheat, mutations in the A and B genome homoeologs can be combined through a single cross. The F1 plants are self-pollinated to produce a segregating F2 population that contains homozygous double and single mutants, as well as wild type plants (screening with molecular markers is required; only four genotypes are shown). These F2 progeny can then be characterised for the phenotype of interest. The use of 'speed breeding' (Watson et al., 2018) reduces the time taken to reach this phenotyping stage from 12 months (yellow) to 7.5 months (green). In hexaploid wheat, a second round of crossing is required to combine the mutant alleles from all three homoeologs. The F2 progeny segregating for the three mutant alleles can be genotyped using molecular markers to select the required combination of mutant alleles (only five genotypes are shown; all factorial combinations are possible). Speed breeding reduces the time taken to generate triple homozygous mutants for phenotyping to 10 months (green), compared to 16 months under conventional conditions (yellow). Self-pollination is represented by an X inside a circle. Combinations of wild type alleles from the A (AA), B (BB) and D (DD) genomes, as well as the mutant alleles from each genome (aa, bb and dd, respectively), are indicated.
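The screening effort behind this crossing scheme follows from Mendelian segregation. Assuming the homoeologous loci assort independently (they lie on different chromosomes) and that all genotypes are equally viable, the Python sketch below enumerates F2 genotype frequencies from selfing an F1 heterozygous at two (tetraploid) or three (hexaploid) loci, and estimates how many F2 plants to genotype to recover at least one fully homozygous mutant. The helper names are hypothetical and the 95% confidence level is an illustrative choice, not a recommendation from the source.

```python
import math
from fractions import Fraction
from itertools import product


def f2_genotype_frequencies(n_loci: int) -> dict:
    """Genotype frequencies in the F2 from selfing a plant heterozygous at n_loci.

    Assumes independent assortment and equal viability; "+" is the
    wild type allele and "m" the mutant allele.
    """
    # Four equally likely gamete combinations per locus: ++, +m, m+, mm
    per_locus = ["".join(pair) for pair in product("+m", repeat=2)]
    weight = Fraction(1, 4 ** n_loci)
    freqs = {}
    for combo in product(per_locus, repeat=n_loci):
        # Sort alleles so "m+" and "+m" count as the same heterozygote
        genotype = tuple("".join(sorted(locus)) for locus in combo)
        freqs[genotype] = freqs.get(genotype, 0) + weight
    return freqs


def homozygous_mutant_fraction(n_loci: int) -> Fraction:
    """Expected frequency of plants homozygous mutant at every locus."""
    return f2_genotype_frequencies(n_loci).get(("mm",) * n_loci, Fraction(0))


def plants_to_screen(p, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one plant of frequency p) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - float(p)))
```

Under these assumptions, double homozygous mutants are expected in 1/16 of tetraploid F2 plants and triple homozygous mutants in 1/64 of hexaploid F2 plants, so `plants_to_screen` returns 47 and 191 plants, respectively, for a 95% chance of recovering at least one.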

Figure 5. Case study exemplifying use of available resources for gene functional characterisation in wheat.

(A) The Ensembl Plants Gene Tree illustrates the identification of the wheat triad (green bar) most closely related to AtHSFB1 (shown in purple). (B) Using Os09g0456800 (the rice ortholog of AtHSFB1) as a BLASTp query against wheat predicted proteins independently identifies the same wheat triad. (C) RNA expression data from www.wheat-expression.com show that the wheat triad is most highly expressed in the spike, with differential expression under abiotic and disease stress conditions. Samples are identified by tissue of origin (spike, green; grain, purple; leaves/shoots, orange; roots, yellow) and stress (none, light blue; abiotic, green; disease, dark blue), as on the website. (D) After identification of suitable wheat TILLING mutants, the A and B genome homoeologs are combined via this example crossing scheme, which shows the four crosses required to combine each of the two selected mutations in the A genome homoeolog with each of the two selected mutations in the B genome homoeolog. Note that the functional validation proposed in (D) is carried out using the tetraploid mutant population.
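Panel (B) relies on a BLASTp search. As a minimal sketch, assuming the search was run with BLAST+ tabular output (`-outfmt 6`, whose 12 default columns are listed in the code) against RefSeq-style wheat protein IDs, the Python helpers below keep the best-scoring hit in each subgenome and check whether the three best hits fall in the same homoeologous group, as expected for a triad. This best-hit heuristic is a simplification rather than orthology inference and is no substitute for the gene tree in panel (A); the function names and the e-value cut-off are hypothetical choices.

```python
import re

# The 12 default columns of BLAST+ tabular output (-outfmt 6)
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")
# Homoeologous group (1-7) and subgenome (A/B/D) from RefSeq-style IDs
WHEAT_ID = re.compile(r"^TraesCS([1-7])([ABD])")


def best_hit_per_subgenome(blast_tab: str, max_evalue: float = 1e-10) -> dict:
    """Keep the highest-bitscore wheat hit in each subgenome.

    Returns {"A": (sseqid, bitscore), ...}. Hits whose IDs do not match
    the assumed pattern (e.g. unplaced genes) or whose e-value exceeds
    max_evalue are ignored.
    """
    best = {}
    for line in blast_tab.strip().splitlines():
        hit = dict(zip(FIELDS, line.split("\t")))
        match = WHEAT_ID.match(hit["sseqid"])
        if match is None or float(hit["evalue"]) > max_evalue:
            continue
        genome, score = match.group(2), float(hit["bitscore"])
        if genome not in best or score > best[genome][1]:
            best[genome] = (hit["sseqid"], score)
    return best


def looks_like_triad(best: dict) -> bool:
    """True if there is a best hit in A, B and D, all in one homoeologous group."""
    if set(best) != {"A", "B", "D"}:
        return False
    groups = {WHEAT_ID.match(sseqid).group(1) for sseqid, _ in best.values()}
    return len(groups) == 1
```

Applied to the tabular output of a query such as Os09g0456800, `looks_like_triad(best_hit_per_subgenome(text))` returns `True` only when the A, B and D best hits share a homoeologous group number; those IDs would then be compared with the gene tree.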

Figure 6. The KnetMiner network illustrates the putative role of the wheat TBF1 orthologs in responding to abiotic stress.

The wheat orthologs of the Arabidopsis gene TBF1, depicted here as three copies of the gene HSFB1 (light blue triangles), fall in expression module three (brown arrow; WGCNA module 3). The genes in this module are enriched for GO terms such as 'Response to Stress' and 'Response to Abiotic Stimulus' (dark green pentagons). The HSFB1 homoeologs are predicted to regulate other genes (blue triangles) in the GENIE3 network (purple connecting arrows) that are associated with drought tolerance trait ontology terms (light green pentagon). Premature termination codon (PTC) mutations are available for all three HSFB1 homoeologs (dark green stars connected to the STOP GAINED SNP effect) in the Cadenza TILLING population.
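The caption reports that module 3 is enriched for stress-related GO terms but does not state how enrichment was tested. A common way to assess such over-representation is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test), sketched below in Python under that assumption. The function names are hypothetical, and this is not necessarily how KnetMiner or the WGCNA analysis computed enrichment.

```python
from math import comb


def go_enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value for over-representation of a GO term.

    N: genes in the background (e.g. all genes in the network)
    K: background genes annotated with the GO term
    n: genes in the module
    k: module genes annotated with the GO term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent gene counts")
    # Sum the probabilities of observing k or more annotated genes in the module
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Ratio of the term's frequency in the module to its background frequency."""
    return (k / n) / (K / N)
```

In practice the test is repeated for every GO term, so the resulting p-values should be adjusted for multiple testing (for example with the Benjamini-Hochberg procedure).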

Tables

Table 1
Comparison of annotated genome assemblies in hexaploid and tetraploid wheat.
Feature | CSS | TGACv1 | RefSeqv1.0 | Durum wheat | Wild emmer wheat
Publication | Mayer et al., 2014 | Clavijo et al., 2017 | The International Wheat Genome Sequencing Consortium (IWGSC) et al., 2018 | Maccaferri et al., 2019 | Avni et al., 2017
Contigs/Chromosomes | >1 million | 735,943 | 21 chromosomes + ChrU | 14 chromosomes + ChrU | 14 chromosomes + ChrU
Mean scaffold size | 7.7 kbp | 88.7 kbp | Chromosomes | Chromosomes | Chromosomes
Assembly size | 10.2 Gbp | 13.4 Gbp | 14.6 Gbp | 10.5 Gbp | 10.5 Gbp
Order | Synteny/genetic order | Large bins | Physical order | Physical order | Physical order
Coding genes* | 133,090 HC; 88,998 LC | 104,091 HC; 103,660 LC | 107,891 HC; 161,537 LC | 66,559 HC; 303,404 LC | 67,182 HC; 271,179 LC
Assembly-related resources | Archive Ensembl Plants; TILLING mutants; expVIP, wheatExp | Archive Ensembl Plants; expVIP | Ensembl Plants, GrainGenes, URGI; TILLING mutants; expVIP, eFP | Ensembl Plants, GrainGenes | Ensembl Plants, GrainGenes
Accession | Chinese Spring | Chinese Spring | Chinese Spring | Svevo | Zavitan
  1. *Number of high confidence (HC) and low confidence (LC) genes which are defined based on multiple criteria outlined in the published papers. Care must be taken when interpreting their nomenclature (see Figure 3).

    Chromosome arm assignment was derived from chromosome flow-sorting, while approximate intra-chromosomal ordering was established using synteny derived from grasses (GenomeZipper) and genetic mapping (POPSEQ) (Mascher et al., 2013; Mayer et al., 2014).

Table 2
Natural variation resources available in wheat.
Collection | Short description | Number of accessions | Genotyping | Data/seed availability | More information/Reference
Wild wheat relatives and progenitor species
Seeds of Discovery | Wheat and wild relative accessions held by ICARDA and CIMMYT | 80,000 accessions: 56,342 domesticated hexaploid (eight taxa); 18,946 domesticated tetraploid (eight taxa); 3,903 crop wild relatives, including all 27 known wild species of the Aegilops-Triticum species complex and 11 genomic constitutions | DArT-seq | CIMMYT Dataverse http://hdl.handle.net/11529/10548030; Germinate data warehouse http://germinate.cimmyt.org/wheat. Records for all germplasm accessions can also be accessed at https://ssl.fao.org/glis/ | https://seedsofdiscovery.org/
Open Wild Wheat | Accessions of Aegilops tauschii (D genome progenitor) | 265 accessions | Whole genome shotgun sequenced (10-30x) | Sequencing: https://opendata.earlham.ac.uk/wheat/under_license/toronto/; Seed: https://www.seedstor.ac.uk/search-browseaccessions.php?idCollection=38 | www.openwildwheat.org; Arora et al., 2019
Wild wheat introgression lines | Introgression lines from Aegilops caudata, Aegilops speltoides, Amblyopyrum muticum, Thinopyrum bessarabicum, Thinopyrum elongatum, Thinopyrum intermedium, Thinopyrum ponticum, Triticum timopheevii, Triticum urartu, rye and wheat cultivars (Chinese Spring, Highbury, Paragon, Pavon 76) | 153 stable homozygous introgression lines available | 35K Axiom Wheat Relative Genotyping array + 710 KASP markers (Grewal et al., 2020) | Genotype: https://www.nottingham.ac.uk/wrc/germplasm-resources/genotyping.aspx; Seed: https://www.seedstor.ac.uk/ (accessions WR0001-WR0155) | www.nottingham.ac.uk/WISP; Grewal et al., 2018a; Grewal et al., 2018b; King et al., 2018; King et al., 2017
Synthetic hexaploid wheat
Synthetic hexaploid wheat | Synthetic hexaploid wheats generated using Aegilops tauschii (DD) + European tetraploid (AABB) wheat | 50 synthetic hexaploid wheats + pre-breeding accessions; backcross populations with Robigus and Paragon also available | 35K Axiom breeders array | Genotype: https://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/axiom_download.php; Seed: https://www.seedstor.ac.uk/ (store codes WS0001-WS0232) | https://www.niab.com/research/research-projects/designing-future-wheat
Wheat diversity panels
Watkins historic collection of landrace wheats | World collection of wheat landraces grown as farmer-saved seed before the 1930s. Genetically stable collection developed by two generations of single seed descent | 829 accessions (a core set of 119 represents the majority of assayed genotypic variation). F4:5 mapping populations against Paragon, mainly for the core set | 35K Axiom breeders array (Allen et al., 2017); subset exome sequenced (Gardiner et al., 2018) | Genotype: https://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/axiom_download.php; Seed: https://www.seedstor.ac.uk/ (store codes WATDE0001-WATDE1063) | http://wisplandracepillar.jic.ac.uk/results_resources.htm; Wingen et al., 2014; Wingen et al., 2017
GEDIFLUX (Genetic Diversity Flux) collection | Western European winter wheat varieties that individually occupied over 5% of national acreage from 1945 to 2000. Bi-parental populations with Paragon (ongoing) | 479 accessions | 35K Axiom breeders array | Genotype: https://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/axiom_download.php; Seed: https://www.seedstor.ac.uk/ (store codes WGED0001-WGED0729) | http://wisplandracepillar.jic.ac.uk/results_resources.htm; Wingen et al., 2014
NIAB wheat association mapping panel | Bread wheat varieties released between 1916 and 2007. Predominantly UK varieties (68%), with others from north-western European countries, e.g. France (10%) and Germany (8%) | 480 accessions | 90K SNP array | Seed, genotype and pedigree: https://www.niab.com/research/research-projects/resources | Fradgley et al., 2019
OzWheat diversity panel | Genetic diversity in Australian wheat breeding (colonial landraces 1860s, first Australian-bred cultivars 1890s, CIMMYT-derived semi-dwarfs 1960s, post-2000 wheat) | 285 accessions | 90K SNP array + additional 26K SNPs from transcriptome data | Seed and genotype: contact Shannon Dillon at CSIRO (Shannon.Dillon@csiro.au) | -
Vavilov wheat collection | Hexaploid wheat accessions including landraces, historic breeding lines and cultivars. Pure lines generated by single seed descent | 295 accessions | DArT-seq (34,311 polymorphic markers) | Genotype: Lee Hickey at The University of Queensland (l.hickey@uq.edu.au); Seed: Australian Grains Genebank (sally.norton@ecodev.vic.gov.au) | Riaz et al., 2017
WHEALBI wheat panel | Worldwide wheat accessions including diploid and tetraploid wild relatives, old hexaploid landraces and modern elite cultivars | 487 accessions | Exome capture (~600,000 genetic variants in ~40,000 genes; 12,000 genes identified as putative presence/absence variation compared to RefSeqv1.0) | Genotype: https://urgi.versailles.inra.fr/download/iwgsc/IWGSC_RefSeq_Annotations/v1.0/iwgsc_refseqv1.0_Whealbi_GWAS.zip; Seed: https://www.gbif.org/dataset/a52ca10a-136a-4072-a6de-3ec6e7852365 | Pont et al., 2019
Global Durum Wheat (GDP) panel | Diversity used in durum wheat breeding programs globally, including landraces and modern varieties | 1,056 accessions | 90K SNP array | Genotype: manuscript in preparation; Seed: ICARDA genebank http://indms.icarda.org | Filippo Bassi, F.Bassi@cgiar.org
Tetraploid wheat Global Collection (TGC) | Wild emmer wheat, domesticated emmer, durum wheat landraces and other tetraploid wheat sub-species (Triticum aethiopicum, Triticum carthlicum, Triticum polonicum, Triticum turanicum, Triticum turgidum, Triticum karamyschevii and Triticum petropavlovsky) | 1,856 accessions | 90K SNP array | Genotype: GrainGenes; Seed: on request for non-commercial use from University of Bologna (marco.maccaferri@unibo.it and roberto.tuberosa@unibo.it) | Maccaferri et al., 2019
MAGIC populations
CSIRO, Australia | 4-way (parents Baxter, Chara, Westonia, Yitpi); 8-way (parents Baxter, Westonia, Yitpi, AC Barrie (Canada), Xiaoya54 (China), Volcani (Israel), Pastor (Mexico), Alsen (USA)) | 1,500 (4-way) and 3,000 (8-way) RILs | 90K SNP array, microsatellite and DArT markers; >20,000 SNPs mapped in each population | Seed and genotype: on request from CSIRO (Bill.Bovill@csiro.au) | Huang et al., 2012; Shah et al., 2019
NIAB, UK | 8-way (parents Alchemy, Brompton, Claire, Hereward, Rialto, Robigus, Xi19, Soissons); 16-way (Banco, Bersee, Brigadier, Copain, Cordiale, Flamingo, Gladiator, Holdfast, Kloka, Maris Fundin, Robigus, Slejpner, Soissons, Spark, Steadfast, Stetson) | NIAB 8-way MAGIC: >1,000 RILs; NIAB 16-way MAGIC: ~600 RILs | 35K Axiom breeders array. Genome sequence (Claire, Robigus, others underway). Exome capture sequence of 16-way parents. Skim-seq of all RILs underway | Claire and Robigus genomes: https://opendata.earlham.ac.uk/opendata/data/Triticum_aestivum/EI/v1.1/; Genotyping and seed: https://www.niab.com/research/research-projects/resources | Mackay et al., 2014; Gardner et al., 2016
Germany | 8-way (Event, Format, BAYP4535, Potenzial, Ambition, Bussard, Firl3565, Julius) | 394 F6:8 RILs | 5,435 SNPs from SNP array | Genotype and pedigree: http://doi.org/10.14459/2018mp1435172 (click the 'open attachment browser' link); Seed: Bavarian State Research Centre for Agriculture (Freising, Germany) | Stadlmeier et al., 2018
Germany | WM-800, 8-way (Patras, Meister, Linus, JB Asano, Tobak, Bernstein, Safari, Julius) | 910 F4:6 RILs | 15K Infinium iSelect SNP array | Genotype and pedigree: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6069784; Seed: on request from Martin Luther University, Germany (klaus.pillen@landw.uni-halle.de) | Sannemann et al., 2018
Durum | 4-way (Claudio (Italy), Colosseo (Italy), Neodur (France), Rascon/2*Tarro (advanced CIMMYT line)) | 334 F7:8 RILs | 90K SNP array | Genotype and pedigree: https://onlinelibrary.wiley.com/doi/full/10.1111/pbi.12424; Seed: on request for non-commercial use from University of Bologna (marco.maccaferri@unibo.it and roberto.tuberosa@unibo.it) | Milner et al., 2016

Table 3
Tetraploid and hexaploid wheat genome assemblies that are currently available, in addition to the Chinese Spring reference hexaploid genome.
Variety | Habit | Origin | Availability†
Hexaploid wheat
CDC Landmark | spring | Canada | 10+ Wheat Genomes Project
CDC Stanley | spring | Canada | 10+ Wheat Genomes Project
Paragon | spring | UK | 10+ Wheat Genomes Project
Cadenza | spring | UK | 10+ Wheat Genomes Project
LongReach Lancer | spring | Australia | 10+ Wheat Genomes Project
Mace | spring | Australia | 10+ Wheat Genomes Project
Synthetic W7984 | spring | Mexico | Chapman et al. (2015)
Weebill | spring | Mexico | 10+ Wheat Genomes Project
ArinaLrFor | winter | Switzerland | 10+ Wheat Genomes Project
Julius | winter | Germany | 10+ Wheat Genomes Project
Jagger | winter | US | 10+ Wheat Genomes Project
Robigus | winter | UK | 10+ Wheat Genomes Project
Claire | winter | UK | 10+ Wheat Genomes Project
Norin61 | winter | Japan | 10+ Wheat Genomes Project
SY Mattis | winter | France | 10+ Wheat Genomes Project
Spelt (PI190962) | winter | Europe | 10+ Wheat Genomes Project
Tetraploid wheat
Zavitan* | - | Israel | Avni et al. (2017)
Svevo | spring | Italy | Maccaferri et al. (2019)
Kronos | spring | US | 10+ Wheat Genomes Project
  1. *‘Zavitan’ is a tetraploid wild emmer (T. dicoccoides) accession.

    † Varieties included in the 10+ Wheat Genomes Project can be accessed through the Earlham Grassroots Genomics portal (https://wheatis.tgac.ac.uk/grassroots-portal/blast) and the 10+ Wheat Genomes Project portal (http://webblast.ipk-gatersleben.de/wheat_ten_genomes); each portal hosts a subset of the varieties. The 'Svevo' genome can be accessed through https://www.interomics.eu/durum-wheat-genome and Ensembl Plants. 'Synthetic W7984' and 'Zavitan' can be accessed through Grassroots Genomics and Ensembl Plants, respectively.

Cite this article

Adamski NM, Borrill P, Brinton J, Harrington SA, Marchal C, Bentley AR, Bovill WD, Cattivelli L, Cockram J, Contreras-Moreira B, Ford B, Ghosh S, Harwood W, Hassani-Pak K, Hayta S, Hickey LT, Kanyuka K, King J, Maccaferri M, Naamati G, Pozniak CJ, Ramirez-Gonzalez RH, Sansaloni C, Trevaskis B, Wingen LU, Wulff BBH, Uauy C (2020) A roadmap for gene functional characterisation in crops with large genomes: Lessons from polyploid wheat. eLife 9:e55646. https://doi.org/10.7554/eLife.55646