
The genome of the Hi5 germ cell line from Trichoplusia ni, an agricultural pest and novel model for small RNA biology

  1. Yu Fu
  2. Yujing Yang
  3. Han Zhang
  4. Gwen Farley
  5. Junling Wang
  6. Kaycee A Quarles
  7. Zhiping Weng (corresponding author)
  8. Phillip D Zamore (corresponding author)
  1. Boston University, United States
  2. University of Massachusetts Medical School, United States
Research Article
Cite as: eLife 2018;7:e31628 doi: 10.7554/eLife.31628
8 figures, 2 tables, 3 data sets and 11 additional files


Figure 1 with 3 supplements
Chromosomes and genes in the T. ni genome based on data from the Hi5 cell line.

(A) Genome assembly and annotation workflow. (B) An example of a DAPI-stained spread of Hi5 cell mitotic chromosomes used to determine the karyotype. (C) Phylogenetic tree and orthology assignment of T. ni with 18 arthropod and two mammalian genomes. Colors denote gene categories. The category 1:1:1 represents universal single-copy orthologs, allowing absence and/or duplication in one genome. N:N:N orthologs include orthologs with variable copy numbers across species, allowing absence in one genome or two genomes from different orders. Lepidoptera-specific genes are present in at least three of the four lepidopteran genomes; Hymenoptera-specific genes are present in at least one wasp or bee genome and at least one ant genome. Coleoptera-specific genes are present in both coleopteran genomes; Diptera-specific genes are present in at least one fly genome and one mosquito genome. Insect indicates other insect-specific genes. Mammal-specific genes are present in both mammalian genomes. The phylogenetic tree is based on the alignment of 1:1:1 orthologs.

Figure 1—figure supplement 1
Hi5 cell karyotyping.

Thirty images showing the numbers of chromosomes (N) in Hi5 cells. N ranged from 103 to 122; mean ± S.D. = 111.7 ± 5.45. Since lepidopteran cell lines are typically tetraploid, the haploid genome likely contains 28 chromosomes (mean ± S.D. = 27.9 ± 1.36).
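The haploid estimate in this legend follows from dividing the observed counts by the ploidy. A minimal sketch of that arithmetic, assuming (as the legend does) that Hi5 cells are tetraploid; the function and variable names are illustrative only:

```python
# Values reported in the legend for chromosome counts per Hi5 cell spread.
mean_count = 111.7  # mean number of chromosomes per cell
sd_count = 5.45     # standard deviation of the counts
ploidy = 4          # assumption taken from the legend: tetraploid cell line


def per_haploid_set(mean, sd, ploidy):
    """Divide a mean and its S.D. by the ploidy.

    Dividing every observation by a constant divides both the mean
    and the standard deviation by that constant.
    """
    return mean / ploidy, sd / ploidy


haploid_mean, haploid_sd = per_haploid_set(mean_count, sd_count, ploidy)
print(f"{haploid_mean:.1f} ± {haploid_sd:.2f}")  # prints "27.9 ± 1.36"
```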

Figure 1—figure supplement 2
Phylogenetic tree of 21 species showing the scale, branch lengths and bootstrap support.

Strict 1:1:1 orthologs were used to compute the phylogenetic tree using the maximum likelihood method. Black, branch length; red, bootstrap support.

Figure 1—figure supplement 3
Opsins in insects.
Figure 2 with 2 supplements
T. ni males are ZZ and females are ZW.

(A) Normalized contig coverage in males and females. (B) Relative repeat content, gene density, transcript abundance (female and male thoraces), and piRNA density (ovary) of autosomal, Z-linked, and W-linked contigs. (C) Multiple sequence alignment of the conserved region of the sex-determining gene masc among the lepidopteran species.

Figure 2—figure supplement 1
T. ni sex determination and dosage compensation.

(A) Genomic coverage comparison of Z-linked, W-linked and autosomal contigs. Contig coverage was shuffled 1,000,000 times to calculate the coverage ratio. Outliers are not shown. (B) Autosomal, Z-linked and W-linked transcript abundance in Hi5 cells and T. ni tissues. (C) Transcript abundance ratios of autosomal, Z-linked, and W-linked genes in Hi5 cells and T. ni tissues. Error bars represent 95% confidence intervals estimated from 1000 bootstrap replicates. (D) Sex-specific splicing of T. ni doublesex pre-mRNA.
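A bootstrap confidence interval of the kind mentioned for panel (C) can be sketched as follows. This is only an illustration: the legend does not say whether medians or means were compared or which bootstrap variant was used, so the median ratio and the percentile method here are assumptions, and the abundance values are made up.

```python
import random
import statistics


def bootstrap_ratio_ci(numer, denom, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for median(numer) / median(denom).

    Each replicate resamples both groups with replacement.
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        n_sample = rng.choices(numer, k=len(numer))
        d_sample = rng.choices(denom, k=len(denom))
        ratios.append(statistics.median(n_sample) / statistics.median(d_sample))
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical transcript abundances (arbitrary units), not data from the paper.
z_linked = [8.1, 9.4, 10.2, 7.7, 11.0, 9.9, 8.8, 10.5]
autosomal = [9.0, 10.1, 9.6, 8.7, 10.8, 9.3, 10.0, 9.8]

ci_low, ci_high = bootstrap_ratio_ci(z_linked, autosomal)
print(f"95% CI for Z:A ratio: {ci_low:.2f} to {ci_high:.2f}")
```

Fixing the seed makes the interval reproducible between runs.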

Figure 2—figure supplement 2
CpG ratios and transposons.

(A) Distribution of observed-to-expected CpG ratios in protein-coding genes (left panel) and in 500 bp genomic windows (right panel) in A. mellifera, B. mori, D. plexippus, D. melanogaster, P. xylostella, T. castaneum, and T. ni. (B) Proportion of the genome occupied by transposons versus transposon sequence divergence. Sequence divergence was calculated by comparing individual transposon copies with the corresponding consensus sequence (see Materials and methods). (C) Repeat content in lepidopteran genomes.
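The observed-to-expected CpG ratio in panel (A) is commonly computed as the CpG count times the sequence length, divided by the product of the C and G counts. The sketch below uses that common definition; the exact formula and windowing used by the authors are described in their Materials and methods, so both the formula and the non-overlapping windows are assumptions here.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Returns None when the sequence has no C or no G.
    """
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return None
    return seq.count("CG") * len(seq) / (c * g)


def windowed_cpg_obs_exp(seq, size=500):
    """Ratio for each full, non-overlapping window of `size` bp."""
    return [
        cpg_obs_exp(seq[i:i + size])
        for i in range(0, len(seq) - size + 1, size)
    ]


print(cpg_obs_exp("ACGT"))  # one CpG, one C, one G, length 4 -> 4.0
```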

Figure 3
miRNA expression in T. ni.

(A) Comparison of miRNA abundance in male and female T. ni thoraces. Solid circles, miRNAs with FDR < 0.1 and fold change > 2. Outlined circles, all other miRNAs. (B) Comparison of the tissue distribution of the 44 most abundant miRNAs among T. ni ovaries, testes, and Hi5 cells. (C) Heat map showing the abundance of the miRNAs in (B). miRNAs are ordered according to abundance in ovary. Conservation status uses the same color scheme as in (A).

Figure 4 with 2 supplements

(A) Distribution of siRNAs mapping to TNCL virus in the genomic (blue) and anti-genomic orientation (red). Inset: length distribution of TNCL virus-mapping small RNAs. (B) Distance between the 3′ and 5′ ends of siRNAs on opposite viral strands. (C) Distance between the 3′ and 5′ ends of siRNAs on the same viral strand. (D) Length distribution of small RNAs from unoxidized and oxidized small RNA-seq libraries. (E) Lepidopteran siRNAs are not 2′-O-methylated. The box plots display the ratio of abundance (as a fraction of all small RNAs sequenced) for each siRNA in oxidized versus unoxidized small RNA-seq libraries. The tree shows the phylogenetic relationships of the analyzed insects. Outliers are not shown.

Figure 4—figure supplement 1
T. ni siRNAs.

(A) siRNA length distributions for multiple insects in oxidized and unoxidized small RNA-seq libraries. (B) Length distribution of fully matched and tailed TNCL virus-siRNAs.

Figure 4—figure supplement 2
Loading asymmetry of siRNAs mapping to TNCL RNA1 (A) and RNA2 (B).

For each single-stranded siRNA species, we searched for siRNAs on the other strand that when paired produce a typical siRNA duplex with two-nucleotide overhanging 3′ ends.
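The partner search described above can be sketched with simple coordinate arithmetic. For a read covering positions [start, end] on the plus strand, a minus-strand read covering [start - 2, end - 2] pairs with it so that both 3′ ends overhang by two nucleotides. The data structure, function names, and read counts below are hypothetical, and the sketch does not reproduce the authors' asymmetry measure.

```python
OVERHANG = 2  # two-nucleotide 3' overhangs, as stated in the legend


def partner(start, end, strand, overhang=OVERHANG):
    """Coordinates of the opposite-strand read that forms the duplex.

    Coordinates are inclusive positions on the plus strand.
    """
    if strand == "+":
        return (start - overhang, end - overhang, "-")
    return (start + overhang, end + overhang, "+")


def find_duplexes(read_counts):
    """Pair each plus-strand read with its minus-strand partner, if present.

    read_counts maps (start, end, strand) -> number of reads.
    """
    pairs = []
    for (start, end, strand), count in read_counts.items():
        if strand != "+":
            continue
        mate = partner(start, end, strand)
        if mate in read_counts:
            pairs.append(((start, end, strand), mate, count, read_counts[mate]))
    return pairs


# Hypothetical read counts, not data from the paper.
reads = {(101, 121, "+"): 30, (99, 119, "-"): 5, (150, 170, "+"): 8}
duplexes = find_duplexes(reads)
print(duplexes)
```

Comparing the two counts in each returned pair gives one simple view of which strand of a duplex is more abundant.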

Figure 5 with 2 supplements
piRNAs and miRNAs in the T. ni genome.

(A) Abundance of mRNAs encoding piRNA pathway proteins in Hi5 cells, ovary, testis, and thorax. (B) Ideogram displaying the positions of miRNA genes (arrowheads) and piRNA clusters in the T. ni genome. Color-coding reports tissue expression for Hi5 cells, ovary, testis, and thorax. Contigs that cannot be placed onto chromosome-length scaffolds are arbitrarily concatenated and are marked ‘Un.’ (C) Distribution of piRNAs among the autosomes, Z, and W chromosomes in Hi5 cells, ovary, testis, and female and male thorax, compared with the fraction of the genome corresponding to autosomes, W, and Z chromosomes.

Figure 5—figure supplement 1
piRNA abundance (ppm) along the most productive piRNA cluster.

Top, fixed scale (some data clipped); bottom, auto-scaled.

Figure 5—figure supplement 2
T. ni piRNAs.

(A) piRNA clusters tend to produce piRNAs that are antisense to transposons. The x-axis represents the ratio of piRNAs from the plus strand to piRNAs from the minus strand, with the dotted lines indicating a twofold difference. The y-axis indicates the ratio of transposon length on the plus strand to transposon length on the minus strand. The solid line indicates the LOWESS regression line, and shading indicates the 95% confidence interval. The box plot shows the fraction of antisense transposons (i.e., transposons inserted opposite to the direction of piRNA precursor transcription) in dual- and uni-strand clusters. Outliers are not shown. Wilcoxon rank-sum test. (B) piRNA densities on autosomal, Z-linked and W-linked contigs in Hi5 cells, ovary, testis, and female and male thorax. (C) Abundance of piRNAs from putative W-linked genes.

Figure 6 with 1 supplement
T. ni piRNAs.

(A) Hi5-specific piRNA clusters contain younger transposon copies. RC, rolling-circle transposon; LINE, long interspersed nuclear element; LTR, long terminal repeat retrotransposon; DNA, DNA transposon. (B) Comparison of piRNA abundance per cluster in female and male thorax. (C) piRNA precursors are rarely spliced. The number of introns supported by exon-exon junction-mapping reads is shown for protein-coding genes and for piRNA clusters for each tissue or cell type. (D) piRNA precursors are inefficiently spliced. Splicing efficiency is defined as the ratio of spliced to unspliced reads. Splice sites were categorized into those inside and outside piRNA clusters. Outliers are not shown.
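The splicing efficiency defined for panel (D) is a simple ratio per splice site. A minimal sketch, assuming per-site counts of spliced and unspliced reads are already available; the handling of sites with no unspliced reads and the example counts are assumptions, not taken from the paper.

```python
def splicing_efficiency(spliced_reads, unspliced_reads):
    """Ratio of spliced to unspliced reads at one splice site.

    Returns None when there are no unspliced reads, because the
    ratio is undefined (an assumption of this sketch).
    """
    if unspliced_reads == 0:
        return None
    return spliced_reads / unspliced_reads


# Hypothetical splice sites: (spliced, unspliced, inside_piRNA_cluster)
sites = [(90, 10, False), (2, 40, True), (5, 0, False)]

by_category = {"inside": [], "outside": []}
for spliced, unspliced, inside in sites:
    efficiency = splicing_efficiency(spliced, unspliced)
    if efficiency is not None:
        by_category["inside" if inside else "outside"].append(efficiency)

print(by_category)  # {'inside': [0.05], 'outside': [9.0]}
```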

Figure 6—figure supplement 1
T. ni piRNA clusters.

(A) Comparison of piRNA abundance (ppm) from ovary and Hi5 piRNA-producing loci and from ovary and testis piRNA-producing loci. (B) piRNA cluster lengths in T. ni ovary, testis, thorax, and Hi5 cells. (C) Motifs around intron boundaries of predicted protein-coding gene models within and outside of piRNA clusters.

Figure 7
Genome editing in Hi5 cells.

(A) Strategy for using Cas9/sgRNA RNPs to generate a loss-of-function TnPiwi deletion allele. Red, protospacer-adjacent motif (PAM); blue, protospacer sequence. Arrows indicate the diagnostic forward and reverse primers used in PCR to detect genomic deletions (Δ). Sanger sequencing of the ~1700 bp PCR products validated the TnPiwi deletions. (B) An example of PCR analysis of a TnPiwi deletion event. (C) Strategy for using Cas9/sgRNA RNPs and a single-stranded DNA homology donor to insert EGFP and an HA-tag in-frame with the vasa open reading frame. (D) An example of PCR analysis of a successful HDR event. DNA isolated from wild-type (WT) cells and from FACS-sorted, EGFP-expressing Hi5 cells (HDR) was used as template.

Figure 8
Hi5 cells contain nuage.

(A) Schematic of single-clone selection of genome-edited Hi5 cells using the strategy described in Figure 7C. (B) A representative field of Hi5 cells edited to express EGFP-HA-Vasa from the endogenous locus. (C) A representative image of a fixed, EGFP-HA-Vasa-expressing Hi5 cell stained with DAPI, anti-EGFP and anti-HA antibodies. EGFP and HA staining colocalize in a perinuclear structure consistent with Vasa localizing to nuage.



Table 1
Genome and gene set statistics for T. ni and B. mori (International Silkworm Genome Consortium, 2008).

Cytochrome P450s, glutathione S-transferases, carboxylesterases, and ATP-binding cassette transporters for B. mori were retrieved from (Yu et al., 2008; Yu et al., 2009; Ai et al., 2011; Liu et al., 2011b).

Statistic | T. ni | B. mori
Genome metrics
 Genome size (Mb) | 368.2 | 431.7
 Chromosome count | 28 | 28
 Scaffold N50 (Mb) | 14.2 | 3.7
 Contig N50 (kb) | 621.9 | 15.5
 Mitochondrial genome (kb) | 15.8 | 15.7
Quality control metrics
 BUSCO complete (%) | 97.5 | 95.5
 CRP genes (%) | 100 | 100
 OXPHOS genes (%) | 100 | 100
Genomic features
 Repeat content (%) | 20.5 | 43.6
 GC content (%) | 35.6 | 37.3
 CpG (O/E) | 1.07 | 1.13
 Coding (%) | 5.58 | 4.11
 Sex chromosomes | ZW | ZW
Gene statistics
 Protein-coding genes | 14,043 | 14,623
  with Pfam matches | 9295 | 9685
  with GO terms | 9790 | 10,148
 Cytochrome P450 proteins | 108 | 83
 Glutathione S-transferases | 34 | 23
 ATP-binding cassette transporters | 54 | 51
 Universal orthologs lost | 156 | 75
 Species-specific genes | 3098 | 2313
Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
gene (Trichoplusia ni) | vasa | this paper | TNI000568 |
gene (T. ni) | ciwi | this paper | TNI008009 |
biological sample (T. ni) | Somatic tissue | Benzon Research | | male pupa
biological sample (T. ni) | Somatic tissue | Benzon Research | | female pupa
biological sample (T. ni) | Thorax | Benzon Research | | male adult
biological sample (T. ni) | Testes | Benzon Research | | male adult
biological sample (T. ni) | Thorax | Benzon Research | | female adult
biological sample (T. ni) | Ovaries | Benzon Research | | female adult
cell line (T. ni) | High Five (BTI-TN-5B1-4) | Thermo Fisher | Thermo Fisher: B85502 | wild type cell line
cell line (T. ni) | EGFP-HA-Vasa | this paper | | polyclonal stable cell line
cell line (T. ni) | Ciwi-mCherry | this paper | | monoclonal stable cell line
recombinant protein | EnGen Cas9 NLS | New England Biolabs | New England Biolabs: M0646T |
antibody | anti-GFP (mouse monoclonal) | Developmental Studies Hybridoma Bank | DSHB: DSHB-GFP-1D2; RRID:AB_2617419 | (1:200)
antibody | anti-HA (rabbit monoclonal) | Cell Signaling Technology | Cell Signaling Technology: 3724; RRID:AB_1549585 | (1:200)
antibody | Alexa Fluor 488-labeled donkey anti-mouse | Thermo Fisher | Thermo Fisher: A-21202 | (1:500)
antibody | Alexa Fluor 680-labeled donkey anti-rabbit | Thermo Fisher | Thermo Fisher: A10043 | (1:500)
recombinant DNA reagent | EGFP-HA-Vasa (linear dsDNA) | this paper | | synthesized gBlock from Integrated DNA Technologies
sequence based reagents (DNA oligos) | GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGC | this paper | tracr RNA Core | Used as template for sgRNA in vitro transcription
sequence based reagents (DNA oligos) | CATTTTGTGTTTCTCAACACTGG | this paper | sgRNA1 | sgRNA target site for ciwi deletion (PAM)
sequence based reagents (DNA oligos) | GGTACGGTGAGAAGCTCTACCGG | this paper | sgRNA2 | sgRNA target site for ciwi deletion (PAM)
sequence based reagents (DNA oligos) | GCTCAGTAGTAATAGATTTATGG | this paper | sgRNA3 | sgRNA target site for EGFP-HA-vasa mutation (PAM)
sequence based reagents (DNA oligos) | GGATGATGGTGTCGGTGATGTGG | this paper | sgRNA4 | sgRNA target site for EGFP-HA-vasa mutation (PAM)
sequence based reagents (DNA oligos) | ATGCTGCAGCTCCGGCGCGTAGG | this paper | sgRNA5 | sgRNA target site for mCherry-ciwi knockout (PAM)
sequence based reagents (DNA oligos) | TTTTCAATAACCCAAACATATGG | this paper | sgRNA6 | sgRNA target site for mCherry-ciwi knockout (PAM)
sequence based reagents (DNA oligos) | CtaatacgactcactataGGCATTTTGTGTTTCTCAACACgttttagagct | this paper | T7-sgRNA1 forward primer | Forward primer for sgRNA in vitro transcription template generation
sequence based reagents (DNA oligos) | CtaatacgactcactataGGGGTACGGTGAGAAGCTCTACgttttagagct | this paper | T7-sgRNA2 forward primer | Forward primer for sgRNA in vitro transcription template generation
sequence based reagents (DNA oligos) | CtaatacgactcactataGGGCTCAGTAGTAATAGATTTAgttttagagct | this paper | T7-sgRNA3 forward primer | Forward primer for sgRNA in vitro transcription template generation
sequence based reagents (DNA oligos) | CtaatacgactcactataGGGGATGATGGTGTCGGTGATGgttttagagct | this paper | T7-sgRNA4 forward primer | Forward primer for sgRNA in vitro transcription template generation
sequence based reagents (DNA oligos) | CtaatacgactcactataGGATGCTGCAGCTCCGGCGCGTgttttagagct | this paper | T7-sgRNA5 forward primer | Forward primer for sgRNA in vitro transcription template generation
sequence based reagents (DNA oligos) | CtaatacgactcactataGGTTTTCAATAACCCAAACATAgttttagagct | this paper | T7-sgRNA6 forward primer | Forward primer for sgRNA in vitro transcription template generation
sequence based reagents (DNA oligos) | GCACCGACTCGGTGCCACT | this paper | sgRNA reverse primer | Reverse primer for sgRNA in vitro transcription template generation
sequence based reagents (DNA oligos) | /Biotin/CGAATCGAAATCTAAGGCAAG | this paper | vasa donor forward | Forward primer for vasa HDR donor amplification
sequence based reagents (DNA oligos) | ATCTTTGGTGTGAGCTCAAGC | this paper | vasa donor reverse | Reverse primer for vasa HDR donor amplification
sequence based reagents (DNA oligos) | GCTATTTACCTACACAAACCAATTT | this paper | ciwi deletion forward | Forward primer for ciwi deletion detection
sequence based reagents (DNA oligos) | ACCACGACGTGATCCA | this paper | ciwi deletion reverse | Reverse primer for ciwi deletion detection
sequence based reagents (DNA oligos) | TGACTTGTGAATCCTTGGTTAC | this paper | vasa HR forward | Forward primer for vasa HR detection
sequence based reagents (DNA oligos) | CATTTTCATAATCCCTTGGTTCTC | this paper | vasa HR reverse | Reverse primer for vasa HR detection
sequence based reagents (DNA oligos) | GCGATAAATTGTTGGAAAC | this paper | GFP-HA-Vasa N-Fw | Forward primer for vasa HR insertion junction sequencing
sequence based reagents (DNA oligos) | TCATCCATCCCGCTAC | this paper | GFP-HA-Vasa N-Rv | Reverse primer for vasa HR insertion junction sequencing
sequence based reagents (DNA oligos) | GTTTAGAAACATGgtgagcaagg | this paper | GFP-HA-Vasa C-Fw | Forward primer for vasa HR insertion junction sequencing
sequence based reagents (DNA oligos) | CATTTTCATAATCCCTTGGTTCTC | this paper | GFP-HA-Vasa C-Rv | Reverse primer for vasa HR insertion junction sequencing
sequence based reagents (DNA oligos) | GTAAAACGACGGCCAG | this paper | M13 (-20) Fw | Forward primer for colony PCR
sequence based reagents (DNA oligos) | CAGGAAACAGCTATGAC | this paper | M13 Rv | Reverse primer for colony PCR
commercial kit | Express Five Serum Free Medium | Thermo Fisher | Thermo Fisher: 10486025 | Supplemented with 16 mM L-glutamine
commercial kit | NextSeq 500/550 High Output v2 kit (150 cycles) | Illumina | Illumina: FC-404-2005 |
commercial kit | NextSeq 500/550 High Output v2 kit (75 cycles) | Illumina | Illumina: FC-404-2002 |
commercial kit | Nextera Mate Pair Sample Prep Kit | Illumina | Illumina: FC-132-1001 |
commercial kit | TruSeq DNA LT Sample Prep Kit | Illumina | Illumina: FC-121-2001 |
commercial kit | Qubit dsDNA HS Assay kit | Thermo Fisher | Thermo Fisher: Q32851 |
commercial kit | SMRTbell Template Prep Kit 1.0 SPv3 | Pacific Biosciences | Pacific Biosciences: 100-991-900 |
commercial kit | ProLong Gold Antifade Mountant with DAPI | Thermo Fisher | Thermo Fisher: P36931 |
commercial kit | MirVana miRNA isolation kit | Thermo Fisher | Thermo Fisher: AM1561 |
commercial kit | Ribo-Zero Gold kit (Human/Mouse/Rat) | Epicentre | Epicentre: MRZG12324 |
commercial kit | Trans-IT insect transfection reagent | Mirus Bio | Mirus Bio: MIR 6104 |
commercial kit | QIAquick Gel Extraction Kit | QIAGEN | QIAGEN: 28704 |
commercial kit | Zero Blunt TOPO PCR Cloning Kit | Thermo Fisher | Thermo Fisher: K280020 |
commercial kit | M-280 streptavidin Dynabeads | Thermo Fisher | Thermo Fisher: 11205D |
software | online CRISPR design tool | http://crispr.mit.edu/ | PMID: 23873081 |
chemical compound | proteinase K | Sigma Aldrich | Sigma Aldrich: RPROTK-RO |
chemical compound | phenol:chloroform:isoamyl alcohol | Sigma Aldrich | Sigma Aldrich: P2069 |
chemical compound | RNase A | Sigma Aldrich | Sigma Aldrich: R4642 |
chemical compound | KaryoMAX Colcemid Solution in PBS | Life Technologies | Life Technologies: 15212012 |
chemical compound | Triton X-100 | Thermo Fisher | Thermo Fisher: NC1365296 |
chemical compound | PBS | Life Technologies | Life Technologies: 10010049 |
chemical compound | 16% formaldehyde | Thermo Fisher | Thermo Fisher: 28908 |
chemical compound | Photoflo 200 | Detek Inc | Detek Inc: 1464510 |
other | 22 x 22 mm cover slips | Thermo Fisher | Thermo Fisher: 12541B |
other | 6-well plate | Corning | Corning: 351146 |
other | Transwell 96-well Receiver | Corning Life Sciences Plastic | Corning Life Sciences Plastic: 3382 |
software | Canu v1.3 | doi:10.1101/gr.215087.116 | |
software | BUSCO v3 | doi:10.1093/bioinformatics/btv351 | |
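
The sgRNA entries in the key resources table follow a regular pattern: each 23-nt target site is a 20-nt protospacer plus a 3-nt PAM, and each T7 forward primer is a fixed prefix, two G nucleotides, the protospacer, and the first 11 nt of the tracr RNA core. The sketch below reconstructs the primers from the target sites. The decomposition is inferred from the listed sequences rather than stated by the authors, and the function names are illustrative.

```python
T7_PREFIX = "Ctaatacgactcactata"  # prefix shared by all six listed T7 primers
SCAFFOLD_START = "gttttagagct"    # first 11 nt of the listed tracr RNA core


def split_target(site):
    """Split a 23-nt target site into a 20-nt protospacer and an NGG PAM."""
    protospacer, pam = site[:-3], site[-3:]
    if len(protospacer) != 20 or pam[1:].upper() != "GG":
        raise ValueError(f"not a 20-nt protospacer followed by NGG: {site}")
    return protospacer, pam


def t7_forward_primer(site):
    """Prefix + GG + protospacer + scaffold start, matching the table pattern."""
    protospacer, _pam = split_target(site)
    return T7_PREFIX + "GG" + protospacer + SCAFFOLD_START


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


sgrna1_site = "CATTTTGTGTTTCTCAACACTGG"  # sgRNA1 target site from the table
print(split_target(sgrna1_site))
print(t7_forward_primer(sgrna1_site))
```

The listed sgRNA reverse primer is likewise the reverse complement of the last 19 nt of the tracr RNA core, which can be checked with `reverse_complement`.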

Additional files

Supplementary file 1

T. ni genome statistics.

(A) BUSCO assessments of T. ni and six other genomes. (B) CRP genes. (C) Genes in the OXPHOS pathway. (D) Genome comparisons. Genomes assembled using paired-end DNA-seq data from male and female T. ni pupae are compared with the Hi5 genome as the reference. The dot plots show genome alignments for contigs ≥ 1 kb. (E) Numbers of genes in lepidopteran genomes. (F) Positions of telomeric repeats: position of (TTAGG)n longer than 100 nt. (G) Transposons in T. ni subtelomeric regions. (H) Repeat statistics for the T. ni genome. (I) Transposon family divergence rates. (J) Manual curation of W-linked protein-coding genes and miRNAs.

Supplementary file 2

Genes encoding small RNA pathway proteins.

(A) Genes encoding miRNA and siRNA pathway proteins. (B) Genes encoding piRNA pathway proteins (grouped by sequence orthology).

Supplementary file 3

T. ni miRNAs, siRNAs and piRNAs.

(A) miRNA annotation. (B) Mapping statistics for endogenous siRNAs in T. ni and D. melanogaster. (C) piRNA cluster lengths. piRNA cluster coordinates in Hi5 (D), ovary (E), testis (F), female thorax (G), and male thorax (H).

Supplementary file 4

miRDeep2 output for T. ni miRNAs.

Supplementary file 5

Genomes used in this study.

Supplementary file 6

T. ni detoxification-related genes.

(A) P450 gene counts by clade in T. ni and B. mori. (B) Sequences of P450 proteins. (C) Sequences of glutathione-S-transferase proteins. (D) Carboxylesterase gene counts by clade in T. ni and B. mori. (E) Sequences of carboxylesterase proteins. (F) ATP-binding cassette transporter gene counts by clade in T. ni and B. mori. (G) Sequences of ATP-binding cassette transporter proteins.

Supplementary file 7

T. ni chemoreception genes.

(A) Sequences of olfactory receptor proteins. (B) Sequences of gustatory receptor proteins. (C) Sequences of ionotropic receptor proteins.

Supplementary file 8

Genes in the juvenile hormone biosynthesis and degradation pathways.

Supplementary file 9

Genome-modified sequences.

Supplementary file 10

Single-stranded DNA donor purification.

Transparent reporting form
