Tiled-ClickSeq for targeted sequencing of complete coronavirus genomes with simultaneous capture of RNA recombination and minority variants

  1. Elizabeth Jaworski
  2. Rose M Langsjoen
  3. Brooke Mitchell
  4. Barbara Judy
  5. Patrick Newman
  6. Jessica A Plante
  7. Kenneth S Plante
  8. Aaron L Miller
  9. Yiyang Zhou
  10. Daniele Swetnam
  11. Stephanea Sotcheff
  12. Victoria Morris
  13. Nehad Saada
  14. Rafael RG Machado
  15. Allan McConnell
  16. Steven G Widen
  17. Jill Thompson
  18. Jianli Dong
  19. Ping Ren
  20. Rick B Pyles
  21. Thomas G Ksiazek
  22. Vineet D Menachery
  23. Scott C Weaver
  24. Andrew L Routh (corresponding author)
  1. Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, United States
  2. ClickSeq Technologies LLC, United States
  3. World Reference Center for Emerging Viruses and Arboviruses, University of Texas Medical Branch, United States
  4. Department of Microbiology and Immunology, The University of Texas Medical Branch, United States
  5. Department of Pediatrics, University of Texas Medical Branch, United States
  6. Department of Pathology, University of Texas Medical Branch, United States
  7. Institute for Human Infections and Immunity, University of Texas Medical Branch, United States
  8. Next-Generation Sequencing Core, The University of Texas Medical Branch, United States
  9. Sealy Centre for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, United States
9 figures, 4 tables and 4 additional files

Figures

Schematic of Tiled-ClickSeq and computational pipeline: (A) Schematic of SARS-CoV-2 genome with two examples of sub-genomic mRNAs.

(B) Paired-primer approaches typically generate short amplicons flanked by upstream and downstream primers that are PCR amplified in non-overlapping pools. (C) Tiled-ClickSeq uses a single pool of primers at the reverse-transcription step with the upstream site generated by stochastic termination by azido-nucleotides. (D) 3’-Azido-blocked single-stranded cDNA fragments are ‘click-ligated’ using copper-catalyzed azide alkyne cycloaddition (CuAAC) to hexynyl functionalized Illumina i5 sequencing adaptors. Triazole-linked ssDNA is PCR amplified to generate a final cDNA library. (E) The structure of the final cDNA is illustrated indicating the presence of the i5 and i7 adaptors, the 12 N unique molecular identifier (UMI), the expected location of the triazole linkage, and the origins of the cDNA in the reads including the tiled primer-derived DNA, which is captured using paired-end sequencing. (F) The hypothetical read coverage over a viral genome is indicated in red, yielding overlapping ‘saw-tooth’ patterns of sequencing coverage. Longer fragment lengths with more extensive overlapping can be obtained using decreased AzNTP:dNTP ratios. (G) Final cDNA libraries are analyzed and size-selected by gel electrophoresis (2% agarose gel). Duplicates of libraries synthesized from 8, 80, and 800 ng of input SARS-CoV-2 RNA are shown. (H) Flowchart of the data processing and bioinformatic pipeline. Input data are in blue, output data are in green, and scripts/processes are in purple.
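The relationship in panel F between the AzNTP:dNTP ratio and fragment length can be illustrated with a simple model. The sketch below assumes, purely for illustration and not from the source, that azido-nucleotides and dNTPs are incorporated with equal efficiency, so that each extension step terminates independently with a probability equal to the AzNTP molar fraction. Fragment lengths are then geometrically distributed. The function names and the example ratios are hypothetical.

```python
import random


def termination_probability(az_to_dntp_ratio):
    """Per-nucleotide chance of incorporating a chain-terminating AzNTP.

    Assumption (not from the source): AzNTPs and dNTPs are incorporated
    with equal efficiency, so the chance equals the AzNTP molar fraction.
    """
    return az_to_dntp_ratio / (1.0 + az_to_dntp_ratio)


def expected_fragment_length(az_to_dntp_ratio):
    """Mean extension length (nt) under the geometric termination model."""
    return 1.0 / termination_probability(az_to_dntp_ratio)


def simulate_fragment_lengths(az_to_dntp_ratio, n_fragments, seed=0):
    """Draw fragment lengths by extending until an AzNTP is incorporated."""
    rng = random.Random(seed)
    p = termination_probability(az_to_dntp_ratio)
    lengths = []
    for _ in range(n_fragments):
        length = 1
        # Keep extending while the incorporated nucleotide is a normal dNTP.
        while rng.random() >= p:
            length += 1
        lengths.append(length)
    return lengths
```

Under this model, halving the ratio from 1:20 to 1:40 roughly doubles the mean extension length (21 nt versus 41 nt), which matches the qualitative statement that decreased AzNTP:dNTP ratios yield longer, more extensively overlapping fragments. Real polymerases discriminate between the two substrates, so absolute lengths from this sketch should not be taken literally.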

Figure 2 with 2 supplements
Read coverage over the SARS-CoV-2 genome using Tiled-ClickSeq.

(A) Read coverage obtained from Tiled-ClickSeq over the whole viral genome is depicted when sequencing using an Illumina MiSeq (orange) or on an Oxford Nanopore Technologies MinION device (blue). A ‘saw-tooth’ pattern of coverage is observed with ‘teeth’ upstream of tiled-primers, indicated at the bottom of the plot by short black lines. (B) Zoomed in read coverage of nts 1–2400 of the SARS-CoV-2 genome with coverage of Illumina MiSeq reads from five individual primers coloured to illustrate coverage from downstream amplicons overlapping the primer-binding sites of upstream tiled-primers (Blue: Read coverage from primer 1; Orange: coverage from primer 2; Green: coverage from primer 3; Red: coverage from primer 4; Purple: coverage from primer 5).

Figure 2—figure supplement 1
Read coverage of tiled nanopore data over 12 SARS-CoV-2 isolates.

(A) Read coverage obtained from Tiled-ClickSeq over the whole viral genome for 12 World Reference Center for Emerging Viruses and Arboviruses (WRCEVA) isolates is depicted when using an Oxford Nanopore Technologies MinION device. Tiled-primers (v1) are indicated at the bottom of the plot by short blue lines. (B) Read count mapping statistics for each isolate are shown in the table.

Figure 2—figure supplement 2
Read coverage of tiled ARTIC data over 12 SARS-CoV-2 isolates: (A) Read coverage obtained from ARTIC sequencing protocol over the whole viral genome for 12 WRCEVA isolates is depicted when sequenced on an Illumina NextSeq.
Genome Reconstruction of 12 SARS-CoV-2 isolates deposited at the World Reference Center for Emerging Viruses and Arboviruses (WRCEVA).

(A) Read coverage is depicted over the 5’ UTR of the SARS-CoV-2 genome for each isolate revealing capture of this region. The 5’-most primer from the ARTICv3 protocol at nts 30–54 is illustrated. (B) Snapshot of read data from Tiled-ClickSeq is depicted using the Tablet Sequencing Viewer from WRCEVA_000508 over the same region of the 5’UTR as A. (C) The most common single-nucleotide variants (SNVs) found in complete genome reconstructions from all 12 isolates are illustrated and colour-coded to depict the underlying viral protein. (D) Phylogenetic tree of 12 WRCEVA isolates with their corresponding clade indicated.

Additional tiled-primers improve read coverage and allow identification of minority variants.

(A) Read coverage obtained from Tiled-ClickSeq over the whole viral genome is depicted using an Illumina MiSeq when using the original primers as in Figure 2 (v1 - blue) or with an additional 326 tiled-primers (v3 - pink). Tiled-primers are indicated at the bottom of the plot by short blue (v1) or pink (v3) lines. (B) The rates of mismatching nucleotides found in mapped NGS reads are depicted across the SARS-CoV-2 genome for isolate WRCEVA_000508 prior to trimming the tiled primers from forward/‘R1’ reads and without PCR deduplication. (C) The rates of mismatching are also depicted after data quality processing to remove PCR duplicates and primer-derived nucleotides in the reads, revealing three minority variants in this sample with frequencies > 2%.
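The PCR-deduplication step mentioned in panel C can be sketched in a generic form. The example below is a minimal illustration, not the pipeline used in the study: it assumes each mapped read has already been annotated with its UMI, mapping start and strand (the dictionary keys `umi`, `start` and `strand` are hypothetical), and it keeps the first read seen for each such combination.

```python
def deduplicate_reads(reads):
    """Keep one read per (UMI, start, strand) combination.

    Assumption: reads sharing the same UMI and the same mapping
    coordinates are treated as PCR copies of one original cDNA molecule.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read["umi"], read["start"], read["strand"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


# Illustrative reads; the UMIs and coordinates are made up.
example_reads = [
    {"name": "r1", "umi": "ACGTACGTACGT", "start": 100, "strand": "+"},
    {"name": "r2", "umi": "ACGTACGTACGT", "start": 100, "strand": "+"},
    {"name": "r3", "umi": "TTTTGGGGCCCC", "start": 100, "strand": "+"},
]
kept = deduplicate_reads(example_reads)
```

Here `r2` is discarded as a duplicate of `r1`, while `r3` is retained because its UMI differs. Exact-match grouping like this ignores sequencing errors within the UMI itself, which dedicated tools typically handle by clustering similar UMIs.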

Figure 4—source data 1

The frequency of all mapped nucleotides at each genome coordinate for each WRCEVA isolate is provided.

The reference genome, nucleotide coordinate and expected reference nucleotide are provided. Total read coverage and the numbers of each non-reference nucleotide are also shown. Finally, the mismatch/error rate at each site is provided, which reveals minority variants in each isolate.

https://cdn.elifesciences.org/articles/68479/elife-68479-fig4-data1-v1.xlsx
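The per-site mismatch rate described for this source data can be computed from nucleotide counts along the following lines. This is a minimal sketch: the function names are hypothetical, the 2% cut-off is the threshold quoted in the figure and table captions, and the example counts are read from the first row of Table 2 (WRCEVA_000501, nt 12,049).

```python
def mismatch_rate(reference_nuc, counts):
    """Fraction of mapped nucleotides at a site that differ from the reference."""
    depth = sum(counts.values())
    if depth == 0:
        return 0.0
    return (depth - counts.get(reference_nuc, 0)) / depth


def is_minority_variant(reference_nuc, counts, threshold=0.02):
    """Flag a site whose mismatch rate exceeds the threshold (2% by default)."""
    return mismatch_rate(reference_nuc, counts) > threshold


# Counts read from the first row of Table 2: reference C, read depth 2,116.
site_counts = {"A": 0, "U": 95, "G": 1, "C": 2020}
rate = mismatch_rate("C", site_counts)
```

For this site the rate is 96/2,116, which rounds to the 4.5% reported in the table. A production analysis would also apply a minimum read depth and check strand bias, neither of which is modelled here.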
Figure 5 with 1 supplement
Tiled-ClickSeq identifies sub-genomic mRNAs, structural variants, and defective-RNAs.

(A) A table of the most common RNA recombination events found using Tiled-ClickSeq to study ‘World Reference Center for Emerging Viruses and Arboviruses’ (WRCEVA) isolates. The recombination junctions are indicated on the left of the table, with their relative frequencies indicated in the table and colour-matched for each sample analyzed. All canonical sgmRNAs are found with their open-reading frame (ORF) indicated, in addition to one non-canonical sgmRNA (*). Three common structural variants including two deletions in spike protein and a deletion in ORF7a were also detected. (B) Unique RNA recombination events are plotted for 16 WRCEVA isolates as a scatter plot whereby the upstream ‘donor’ site is plotted on the y-axis and a downstream ‘acceptor’ site is plotted on the x-axis. The read count for each unique RNA recombination event is indicated by the size of the point, while the number of samples in which each RNA recombination event is found is indicated by the colour-bar. Insertions/duplication/back-splicing events are found above the x = y line, while deletions and RNA recombination events yielding sgmRNAs are found below.
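The geometric reading of panel B can be written as a simple rule on the donor and acceptor coordinates. The sketch below is illustrative only: the function name is hypothetical and the coordinates in the usage lines are invented rather than taken from the data.

```python
def classify_junction(donor, acceptor):
    """Classify a recombination junction by its position relative to x = y.

    The donor site is plotted on the y-axis and the acceptor on the x-axis,
    so a point lies above the x = y line when donor > acceptor.
    """
    if donor > acceptor:
        return "insertion/duplication/back-splicing"
    if donor < acceptor:
        return "deletion or sgmRNA"
    return "on the diagonal"


# Invented coordinates: a junction from near the 5' end to far downstream.
forward_jump = classify_junction(donor=70, acceptor=26000)
# Invented coordinates: a junction that maps back upstream of its donor.
backward_jump = classify_junction(donor=5000, acceptor=4900)
```

The rule separates the two halves of the scatter plot but does not distinguish a deletion from an sgmRNA; that distinction depends on where the junction falls in the genome.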

Figure 5—source data 1

Snapshot of Tiled-ClickSeq reads from icSARS-CoV-2 delta-PRRA.

https://cdn.elifesciences.org/articles/68479/elife-68479-fig5-data1-v1.zip
Figure 5—figure supplement 1
Integrative Genomics Viewer (IGV) snapshot of Tiled-ClickSeq data over icSARS-CoV-2 delta PRRA: A full-view of the SARS-CoV-2 genome with mapped Tiled-ClickSeq reads is depicted using Integrative Genomics Viewer.

Individual reads are illustrated with short grey lines. Recombination events mapped by ViReMa are illustrated by light-blue lines. Common variants engineered into the icSARS clone are indicated by vertical coloured striations.

Figure 6 with 1 supplement
Tiled-ClickSeq for surveillance of SARS-CoV-2 from nasopharyngeal (NP) swabs collected during routine diagnostics of COVID-19 at UTMB.

(A) The percent genome coverage with greater than 10 reads is plotted (y-axis) as a function of the measured CT value (x-axis) for each clinical sample sequenced. Each point is colour-coded according to the batch of NGS libraries synthesized. (B) Sequence reads for one of the SARS-CoV-2 samples from the B.1.1.7 lineage are illustrated using Tablet sequence viewer to indicate the U to C transition at nt 2 (U2C) of the SARS-CoV-2 genome. (C) Sequence alignments of the 5’UTR of the consensus genomes of 18 clinical samples assayed illustrate the U2C SNV found in each B.1.1.7 variant as well as U13C and C21U in two other B.1.2 variants. (D) The structure of the first 35 nts of the SARS-CoV-2 5’UTR, which contains Stem Loop 1, is illustrated. The three SNVs identified in the consensus genomes of clinical samples are indicated.
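The metric plotted in panel A can be computed from a per-position depth vector along these lines. This is a minimal sketch with a hypothetical function name; it assumes depth has already been tabulated for every genome position, including positions with zero reads.

```python
def percent_genome_covered(depths, min_reads=10):
    """Percentage of genome positions covered by more than `min_reads` reads."""
    if not depths:
        return 0.0
    covered = sum(1 for depth in depths if depth > min_reads)
    return 100.0 * covered / len(depths)


# Toy five-position "genome"; only the last two positions exceed 10 reads.
toy_depths = [0, 5, 10, 11, 50]
toy_percent = percent_genome_covered(toy_depths)
```

The comparison is strict, following the caption's "greater than 10 reads", so a position with exactly 10 reads does not count as covered.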

Figure 6—figure supplement 1
Read coverage of Tiled-ClickSeq data over 60 SARS-CoV-2 clinical specimens: Read coverage obtained from Tiled-ClickSeq over the whole viral genome for 60 SARS-CoV-2 clinical specimens is depicted when sequenced on an Illumina NextSeq.

Samples are batched according to their measured CT value (< 17.5, 17.5–20, 20–25, 25–30, > 30, and unmeasured: ‘NA’) and plotted separately in a colour-coded manner.

Figure 7 with 1 supplement
Tiled-ClickSeq identifies sub-genomic mRNAs, structural variants and Defective-RNAs in clinical samples of SARS-CoV-2.

Similarly to Figure 5B, unique RNA recombination events are plotted for 36 clinical samples as a scatter plot whereby the upstream ‘donor’ site is plotted on the y-axis and a downstream ‘acceptor’ site is plotted on the x-axis using the WA-1 reference coordinates for each sample. The read count for each unique RNA recombination event is indicated by the size of the point, while the number of samples in which each RNA recombination event is found is indicated by the colour-bar. Insertions/duplication/back-splicing events are found above the x = y line, while deletions and RNA recombination events yielding sgmRNAs are found below.

Figure 7—source data 1

BED files of RNA recombination events detected by ViReMa in the Tiled-ClickSeq data from each WRCEVA isolate and clinical sample.

https://cdn.elifesciences.org/articles/68479/elife-68479-fig7-data1-v1.zip
Figure 7—figure supplement 1
Tiled-ClickSeq identifies sub-genomic mRNAs and structural variants in clinical samples of SARS-CoV-2.

A table of the most common RNA recombination events found using Tiled-ClickSeq in this study, analogous to Figure 5A. The recombination junctions are indicated on the left of the table, with their relative frequencies indicated in the table and colour-matched for each sample analyzed. All canonical sgmRNAs are found with their open-reading frame (ORF) indicated, in addition to one non-canonical sgmRNA (*). Three common structural variants including two deletions in spike protein and a deletion in ORF7a are also shown.

Author response image 1
Similarly to Figure 5B in the manuscript, unique RNA recombination events are plotted for WRCEVA samples as a scatter plot whereby the upstream ‘donor’ site is plotted on the y-axis and a downstream ‘acceptor’ site is plotted on the x-axis.

The read count for each unique RNA recombination event is indicated by the size of the point, while the number of samples in which each RNA recombination event is found is indicated by the colour-bar. Insertions/duplication/back-splicing events are found above the x = y line, while deletions and RNA recombination events yielding sgmRNAs are found below.

Author response image 2
Same as Author response image 1, except that the read data used in the analysis have been trimmed by 25 nts to remove any potential primer-derived sequences that may give rise to artifactual chimeric reads.

Tables

Table 1
Read counts and mapping rates for random-primed versus Tiled-ClickSeq approaches.
| Sample | CT | ClickSeq reads | ClickSeq virus mapped | ClickSeq % viral reads | Tiled v1 reads | Tiled v1 virus mapped | Tiled v1 % viral reads |
|---|---|---|---|---|---|---|---|
| WRCEVA_00501 | 12.9 | 4,665,869 | 116,036 | 2.5% | 2,359,795 | 2,204,750 | 93.4% |
| WRCEVA_00502 | 12.9 | 4,989,513 | 118,260 | 2.4% | 1,962,581 | 1,820,925 | 92.8% |
| WRCEVA_00505 | 12.7 | 3,894,325 | 71,809 | 1.8% | 2,779,672 | 2,482,854 | 89.3% |
| WRCEVA_00506 | 12.5 | 4,979,989 | 108,532 | 2.2% | 2,395,750 | 2,148,256 | 89.7% |
| WRCEVA_00507 | 12.9 | 5,659,073 | 161,059 | 2.8% | 2,056,670 | 1,867,012 | 90.8% |
| WRCEVA_00508 | 16.8 | 3,987,009 | 91,452 | 2.3% | 1,787,418 | 1,433,005 | 80.2% |
| WRCEVA_00509 | 17.1 | 4,057,928 | 57,424 | 1.4% | 2,202,661 | 1,856,633 | 84.3% |
| WRCEVA_00510 | 16.2 | 5,328,829 | 65,281 | 1.2% | 2,040,332 | 1,601,544 | 78.5% |
| WRCEVA_00513 | 16.0 | 4,391,175 | 69,169 | 1.6% | 1,641,213 | 1,455,991 | 88.7% |
| WRCEVA_00514 | 12.9 | 4,340,084 | 84,211 | 1.9% | 2,089,241 | 1,902,748 | 91.1% |
| WRCEVA_00515 | 15.7 | 5,416,853 | 102,179 | 1.9% | 2,205,166 | 1,915,129 | 86.8% |
| WRCEVA_00516 | 17.4 | 4,290,929 | 61,017 | 1.4% | 1,988,939 | 1,715,448 | 86.2% |
Table 2
Minority variants and rates ( > 2%) found across 16 WRCEVA isolates.
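| Sample | Nt | Nuc | ReadDepth | A | U | G | C | VariantRate | Location | Result |
|---|---|---|---|---|---|---|---|---|---|---|
| WRCEVA_000501 | 12,049 | C | 2,116 | 0 | 95 | 1 | 2,020 | 4.5% | ORF1ab | N3928K |
| WRCEVA_000502 | 10,207 | C | 2,240 | 0 | 118 | 0 | 2,122 | 5.3% | - | - |
| WRCEVA_000502 | 16,050 | U | 3,853 | 0 | 3,322 | 0 | 531 | 13.8% | - | - |
| WRCEVA_000502 | 17,489 | A | 4,597 | 4,433 | 162 | 1 | 1 | 3.6% | ORF1ab | E5742V |
| WRCEVA_000502 | 21,526 | A | 8,749 | 6,508 | 0 | 2,240 | 1 | 25.6% | ORF1ab | I7088V |
| WRCEVA_000503 | 14,220 | C | 1,638 | 1 | 463 | 0 | 1,174 | 28.3% | - | - |
| WRCEVA_000504 | 1,556 | A | 2,828 | 2,499 | 0 | 328 | 1 | 11.6% | ORF1ab | I431V |
| WRCEVA_000504 | 27,925 | C | 2,857 | 0 | 134 | 0 | 2,723 | 4.7% | ORF8 | T11I |
| WRCEVA_000507 | 19,515 | A | 2,393 | 2,295 | 1 | 97 | 0 | 4.1% | - | - |
| WRCEVA_000508 | 9,756 | G | 1,376 | 28 | 0 | 1,348 | 0 | 2.1% | ORF1ab | R3164H |
| WRCEVA_000508 | 26,056 | G | 2,092 | 0 | 86 | 2,006 | 0 | 4.1% | ORF3a | D222Y |
| WRCEVA_000508 | 27,556 | G | 2,066 | 128 | 0 | 1,938 | 0 | 6.2% | ORF7a | A55T |
| WRCEVA_000509 | 11,956 | C | 1,962 | 0 | 199 | 0 | 1,763 | 10.1% | - | - |
| WRCEVA_000509 | 17,245 | C | 4,062 | 2 | 470 | 0 | 3,590 | 11.6% | ORF1ab | R5661C |
| WRCEVA_000509 | 18,005 | U | 5,408 | 1 | 4,949 | 458 | 0 | 8.5% | ORF1ab | L5915R |
| WRCEVA_000509 | 25,569 | U | 3,448 | 4 | 3,326 | 113 | 5 | 3.5% | - | - |
| WRCEVA_000509 | 27,919 | U | 839 | 0 | 809 | 0 | 30 | 3.6% | ORF8 | I9T |
| WRCEVA_000509 | 28,767 | C | 2,011 | 0 | 109 | 0 | 1,902 | 5.4% | N | T165I |
| WRCEVA_000511 | 3,003 | U | 2,880 | 79 | 2,787 | 1 | 13 | 2.7% | ORF1ab | V913E |
| WRCEVA_000511 | 10,738 | U | 4,580 | 0 | 4,440 | 0 | 140 | 3.1% | - | - |
| WRCEVA_000511 | 25,892 | U | 133 | 0 | 130 | 0 | 3 | 2.3% | ORF3a | I167T |
| WRCEVA_000511 | 28,001 | G | 1,414 | 1 | 29 | 1,384 | 0 | 2.1% | - | - |
| WRCEVA_000513 | 27,046 | C | 5,539 | 0 | 138 | 0 | 5,401 | 2.5% | M | T175M |
| WRCEVA_000514 | 11,603 | A | 5,405 | 5,075 | 0 | 330 | 0 | 6.1% | ORF1ab | M3780V |
| WRCEVA_000514 | 26,526 | G | 525 | 0 | 20 | 505 | 0 | 3.8% | M | A2S |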
Table 3
Micro-indels and rates ( > 2%) found across 16 WRCEVA isolates.
| Sample | MicroInDel | Nucs | VariantRate | Location | Result |
|---|---|---|---|---|---|
| WRCEVA_000502 | Δ519^523 | UGGUU | 2.2% | ORF1ab | Frameshift |
| WRCEVA_000504 | Δ29686^29693 | CAGUGUGU | 3.5% | 3’UTR | - |
| WRCEVA_000505 | Δ519^523 | UGGUU | 2.9% | ORF1ab | Frameshift |
| WRCEVA_000506 | Δ519^523 | UGGUU | 3.8% | ORF1ab | Frameshift |
| WRCEVA_000509 | Δ1237^1239 | UCA | 2.9% | ORF1ab | ΔH325 |
| WRCEVA_000510 | Δ686^694 | AAGUCAUUU | 5.1% | ORF1ab | ΔLSF141-143 |
| WRCEVA_000511 | Δ519^523 | UGGUU | 3.7% | ORF1ab | Frameshift |
| WRCEVA_000511 | Δ10811^10813 | CUU | 3.1% | ORF1ab | ΔL3516 |
| WRCEVA_000512 | Δ29750^29759 | GAUCGAGUG | 10.0% | 3’UTR | - |
Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Chemical compound, drug | AzNTPs | BaseClick | BCT-25, BCT-26, BCT-27, BCT-28 | |
| Sequence-based reagent | 5’-hexynyl-functionalized i5 oligo | IDT | DNA oligo | |
| Sequence-based reagent | TCSv3 | This Paper | PCR Primers | Source data 1 |
| Software, algorithm | Bash and python3 text files | This Paper | .txt and .py files | Source data 2 |

Additional files

Transparent reporting form
https://cdn.elifesciences.org/articles/68479/elife-68479-transrepform1-v1.docx
Source data 1

Annotations and sequences of tiled-primers used in this manuscript are provided in BED format.

https://cdn.elifesciences.org/articles/68479/elife-68479-supp1-v1.zip
Source data 2

Batch scripts listing all computational tools and parameters used, together with the python3 scripts used in this study, are provided.

https://cdn.elifesciences.org/articles/68479/elife-68479-supp2-v1.zip
Source data 3

A summary of all single-nucleotide variants (SNVs) detected for all samples sequenced in this study is provided.

Each unique sample/isolate is listed, together with the SNVs relative to the WA-1 (NC_045512.2) strain in different NGS library preparation methods and sequencing platforms. The accession number for each reconstructed genome deposited in GenBank is also indicated.

https://cdn.elifesciences.org/articles/68479/elife-68479-supp3-v1.xlsx

Elizabeth Jaworski, Rose M Langsjoen, Brooke Mitchell, Barbara Judy, Patrick Newman, Jessica A Plante, Kenneth S Plante, Aaron L Miller, Yiyang Zhou, Daniele Swetnam, Stephanea Sotcheff, Victoria Morris, Nehad Saada, Rafael RG Machado, Allan McConnell, Steven G Widen, Jill Thompson, Jianli Dong, Ping Ren, Rick B Pyles, Thomas G Ksiazek, Vineet D Menachery, Scott C Weaver, Andrew L Routh
(2021)
Tiled-ClickSeq for targeted sequencing of complete coronavirus genomes with simultaneous capture of RNA recombination and minority variants
eLife 10:e68479.
https://doi.org/10.7554/eLife.68479