(A, B) Volcano plots showing differentially expressed genes from Tg(kdrl:HRAS-mCherry)s896-positive and negative (kdrlpos and kdrlneg) cells identified using RNA-seq reads quantified with (A) RefSeq …
DESeq2 output for kdrlpos and kdrlneg RNA-seq quantified with RefSeq (GCF_000002035.6_GRCz11; worksheet 1) or Ensembl, v95 (worksheet 2).
Gene expression levels were quantified using RSEM. Median ratio normalized expression values are shown for each replicate, along with adjusted p-value and log2 fold change. These data were used to generate the plots in Figure 1A,B and are incorporated into the source data tables indicated below.
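For readers unfamiliar with median ratio normalization, the idea can be sketched in a few lines of pure Python. This is only an illustration of the median-of-ratios principle that DESeq2 implements in R, not the authors' code; the function names and the toy counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample (assumed layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples.
    # Genes with a zero count in any sample are skipped (marked None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor of a sample = median ratio of its counts to the geometric means.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's raw counts by that sample's size factor."""
    factors = size_factors(counts)
    return {s: [c / factors[s] for c in counts[s]] for s in counts}


# Toy example: sample "b" was sequenced twice as deeply as sample "a".
toy = {"a": [10, 20, 30, 0], "b": [20, 40, 60, 5]}
normalized = normalize(toy)
```

In the toy data the first three genes end up with equal normalized values in both samples, which is the intended effect of correcting for sequencing depth. The sketch raises an error if no gene is detected in every sample.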
Intersection of kdrlpos-enriched genes from RefSeq and Ens95 commonly annotated by NCBI ID.
Gene symbol, along with matching Ensembl gene ID and NCBI ID, as well as differential annotation (i.e., identified as differentially expressed only in RefSeq, Ens95, or both) are indicated. Expression data are derived from Figure 1—source data 1. Data used to generate plots in Figure 1C–G.
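The differential annotation described above amounts to a set comparison keyed on a shared identifier. A minimal sketch, assuming each enriched gene list has already been reduced to NCBI IDs; the function name, labels, and example IDs are hypothetical and do not come from the source data:

```python
def classify_enriched(refseq_ids, ens95_ids):
    """Label each NCBI ID by the annotation(s) in which it was called enriched."""
    refseq_ids, ens95_ids = set(refseq_ids), set(ens95_ids)
    labels = {}
    for gene_id in refseq_ids | ens95_ids:
        if gene_id in refseq_ids and gene_id in ens95_ids:
            labels[gene_id] = "both"
        elif gene_id in refseq_ids:
            labels[gene_id] = "RefSeq only"
        else:
            labels[gene_id] = "Ens95 only"
    return labels


# Hypothetical IDs, for illustration only.
labels = classify_enriched(refseq_ids=["101", "102", "103"], ens95_ids=["102", "103", "104"])
```

Genes lacking a common NCBI ID cannot be placed by this approach, which is why the intersection is restricted to commonly annotated genes.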
(A) Log10 average expression (n = 3) for kdrlpos-enriched genes as quantified by each indicated annotation. Each separate plot shows genes identified as kdrlpos-enriched only using or Ens95 or …
DESeq2 output for pdgfrbpos and pdgfrbneg RNA-seq quantified with RefSeq (GCF_000002035.6_GRCz11) or Ensembl, v95.
Gene expression levels were quantified using RSEM. Median ratio normalized expression values are shown for each replicate, along with adjusted p-value and log2 fold change. These data were used to generate the plots in Figure 1—figure supplement 1C–E and are incorporated into the source data tables indicated below.
Intersection of pdgfrbpos-enriched genes from RefSeq and Ens95 commonly annotated by NCBI ID.
Gene symbol, along with matching Ensembl gene ID and NCBI ID, as well as differential annotation (i.e., identified as differentially expressed only in RefSeq, Ens95, or both) are indicated. Expression data are derived from Figure 1—source data 1. Data used to generate plots in Figure 1—figure supplement 1B,F–H.
(A, B) Log10 average expression as quantified using indicated annotation for (A) kdrlpos- or (B) pdgfrbpos-enriched genes identified as such only in RefSeq and lacking an Ens95 3' UTR annotation. …
Missing 3' UTR annotations in RefSeq and Ens95.
This file includes lists of Ens95 (worksheet 1) and RefSeq (worksheet 2) genes indicating annotation as coding sequence (CDS) and whether there is an annotated stop codon and 3' UTR. RNA-seq-based quantification data for Ens95 genes missing a 3' UTR that is present in RefSeq are included for kdrlpos (worksheet 3), pdgfrbpos (worksheet 4), and Nr2f2pos (worksheet 5) cells. These data were used to generate Table 2 and graphs in Figure 2A,B; Figure 2—figure supplement 2I.
Reference gene set for 3' UTR comparisons.
Representative Ens95, RefSeq, and V4.3 transcript IDs, along with V4.3 gene symbols, are shown with their respective 3' UTR lengths (worksheet 1). Average median ratio normalized expression and log2 fold change (pos/neg) values quantified with Ens95, RefSeq, and V4.3 annotations from kdrlpos (worksheet 2), pdgfrbpos (worksheet 3), and Nr2f2pos (worksheet 4) RNA-seq for reference genes are included. These data were used directly to generate Figure 2D–G, Figure 2—figure supplement 2C–H, and Figure 3B–J, and are incorporated into source data as indicated below.
RNA-seq analysis of Nr2f2pos and Nr2f2neg cells.
Output from DESeq2 analysis comparing Nr2f2pos and Nr2f2neg RNA-seq from gene expression levels quantified using RSEM with Ens95 (worksheet 1) or RefSeq (worksheet 2). Median ratio normalized expression values are shown for each sample, along with adjusted p-value, p-value, log2 fold change, fold change, and log10 adjusted p-value. Worksheet 3 contains the intersection of gene sets identified as significantly enriched in Nr2f2pos cells using Ens95 or RefSeq.
Transcript-based comparison of RefSeq and Ensembl annotations.
Worksheet 1 is a list of Ens95 genes missing from RefSeq with Ensembl gene ID, matching ZFIN ID, and biotype annotation. Worksheet 2 is a list of RefSeq genes missing from Ensembl with NCBI gene ID, matching ZFIN ID, and coding sequence annotation. Transcript-level matching output from gffcompare is included using Ens95 (worksheet 3) or RefSeq (worksheet 4) as a reference. Worksheet 5 is a transcript-level comparison of Ens95 and Ens99. In this case, all transcripts exhibit a complete intron/exon chain match (designated by ‘=’ in the class code). Data used to generate Table 3.
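Checking that every transcript carries the ‘=’ class code is a simple tally. The sketch below assumes the gffcompare output has already been parsed into (transcript ID, class code) pairs; the parsing step, the function names, and the example pairs are hypothetical.

```python
from collections import Counter


def class_code_summary(pairs):
    """Count how many transcripts fall under each class code."""
    return Counter(code for _, code in pairs)


def all_exact_matches(pairs):
    """True if every transcript has a complete intron/exon chain match ('=')."""
    return all(code == "=" for _, code in pairs)


# Hypothetical transcript IDs and class codes, for illustration only.
example = [("tx1", "="), ("tx2", "="), ("tx3", "j")]
summary = class_code_summary(example)
exact = all_exact_matches(example)
```

A summary with a single ‘=’ key would correspond to the Ens95 versus Ens99 situation described for worksheet 5.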
(A, B) Plots showing 3' UTR length from matched reference genes from indicated annotation identified as enriched only in Ens95 or RefSeq. Mean 3' UTR length for each group is shown, error bars …
(A,B) Volcano plots of differentially expressed genes from Nr2f2-positive and -negative (Nr2f2pos and Nr2f2neg) endothelial cells identified using RNA-seq reads quantified with (A) RefSeq or (B) …
(A) Schematic outline for generating a new zebrafish transcriptome annotation. See Results and Materials and methods sections for details. (B) Pie charts showing the proportion of reference genes …
List of SRA accession numbers, stages, and read numbers from GSE32900 for associated RNA-seq datasets used in this study.
List of manually identified discrepancies in Ensembl gene annotation due to spurious fusion or overlapping transcripts.
Table includes Ens95 gene symbol, gene ID, and spurious transcript ID. Persistence of observed discrepancy in Ens99 is indicated, as is previous status of curation in ZFIN. All of these have been reported to ZFIN.
RefSeq (worksheet 1) and Ens99 (worksheet 2) genes missing from the V4.2 annotation.
Novel genes from V4.2 genome annotation.
This table includes information regarding blastx hits against zebrafish and human proteins, matches with lincRNAs, number of exons per gene, and whether the novel locus was included in the V4.3 annotation.
V4.3 gene information table, including unique LL ID numbers, associated Ens99 gene ID, NCBI ID, and ZFIN gene ID numbers, gene symbols, and gene names.
Annotation notes are also included regarding the relative strength of coordinate-based incorporation of NCBI (Entrez) and Ens99 gene identifiers.
Output from DESeq2 analysis comparing kdrlpos and kdrlneg RNA-seq.
Gene expression levels were quantified using RSEM with the V4.3 annotation. Median ratio normalized expression values are shown for each sample, along with adjusted p-value and log2 fold change. Matching Ensembl and NCBI gene IDs are included.
Output from DESeq2 analysis comparing pdgfrbpos and pdgfrbneg RNA-seq.
Gene expression levels were quantified using RSEM with the V4.3 annotation. Median ratio normalized expression values are shown for each replicate, along with adjusted p-value and log2 fold change. Matching Ensembl and NCBI gene IDs are included.
Worksheet 1 - Output from DESeq2 analysis comparing Nr2f2pos and Nr2f2neg RNA-seq.
Gene expression levels were quantified using RSEM with the V4.3 annotation. Median ratio normalized expression values are shown for each replicate, along with adjusted p-value and log2 fold change. Matching Ensembl and NCBI gene IDs are included. Worksheet 2 – Nr2f2pos-enriched genes with matched entries from reference gene set (Figure 2—source data 2) and associated 3' UTR lengths (Figure 2—source data 2).
(A, B) Annotated UCSC Genome Browser screenshots of (A) cenpq and mrpl39 loci and (B) tagln3b and abhd10b loci. RefSeq, Ens95, Ens99 and V4.3 transcript annotations are shown. Ensembl-annotated …
(A, B) Volcano plots of RNA-seq data from (A) kdrl-positive and negative and (B) pdgfrb-positive and negative cells quantified using V4.3. (A, B) Numbers of differentially expressed genes, along …
(A) Volcano plot of RNA-seq data from Nr2f2pos and Nr2f2neg cells quantified using V4.3. Numbers of differentially expressed genes are shown. Selected known venous endothelial genes are indicated …
(A, B) tSNE plots of cells from 5 day post fertilization (dpf) zebrafish embryos from the same mapped scRNA-seq reads quantified with (A) Ens95 or (B) V4.2. The total number of clusters and cells …
Metrics from CellRanger output for data quantified using Ens95 and V4.2.
Cluster-specific genes from whole embryo scRNA-seq at 5 days post fertilization (dpf) identified using Seurat from data quantified with Ens95.
The p-value, adjusted p-value, and average log2 fold change refer to comparison of indicated cluster with all other clusters. Cutoffs are adjp <0.05, log2 fold change >0.5. Pct.1 is proportion of cells within indicated cluster that express detectable levels of the indicated gene. Pct.2 is proportion of all other cells that express detectable levels of the indicated gene. Matching LL and Ensembl gene IDs are included.
Cluster-specific genes from whole embryo scRNA-seq at 5 days post fertilization (dpf) identified using Seurat from data quantified with V4.2.
The p-value, adjusted p-value, and average log2 fold change refer to comparison of indicated cluster with all other clusters. Cutoffs are adjp <0.05, log2 fold change >0.5. Pct.1 is proportion of cells within indicated cluster that express detectable levels of the indicated gene. Pct.2 is proportion of all other cells that express detectable levels of the indicated gene. Matching LL and Ensembl gene IDs are included.
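The cutoffs and the Pct.1/Pct.2 columns described for the cluster-specific gene lists can be illustrated as follows. This is a sketch rather than the Seurat implementation: the dictionary keys `adj_p` and `log2_fc` are hypothetical names, and "detectable" is assumed here to mean a count greater than zero.

```python
def passes_cutoffs(row, max_adj_p=0.05, min_log2_fc=0.5):
    """Apply the stated cutoffs: adjusted p-value < 0.05 and log2 fold change > 0.5."""
    return row["adj_p"] < max_adj_p and row["log2_fc"] > min_log2_fc


def detection_fraction(expression_values):
    """Proportion of cells with detectable expression (assumed: value > 0)."""
    if not expression_values:
        return 0.0
    detected = sum(1 for value in expression_values if value > 0)
    return detected / len(expression_values)


# Hypothetical rows and per-cell counts, for illustration only.
rows = [
    {"gene": "geneA", "adj_p": 0.001, "log2_fc": 1.2},
    {"gene": "geneB", "adj_p": 0.20, "log2_fc": 2.0},
    {"gene": "geneC", "adj_p": 0.01, "log2_fc": 0.3},
]
kept = [row["gene"] for row in rows if passes_cutoffs(row)]

pct_1 = detection_fraction([3, 0, 1, 2])  # cells inside the cluster
pct_2 = detection_fraction([0, 0, 0, 1])  # all other cells
```

Both conditions must hold for a gene to be retained, so a large fold change with a non-significant adjusted p-value is excluded.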
(A) Plots of standard deviation against the number of principal components using whole embryo scRNA-seq data quantified using indicated annotation. Standard deviation values for 75 principal …
(A) tSNE plots showing expression of cartilage markers mia, matn1, col2a1a, and fgfbp2 using clustering based on data quantified with Ens95 or V4.2, as indicated. (B) tSNE plots showing expression …
(A) Venn diagrams illustrating intersection by common LL gene ID of genes enriched in mia-positive cartilage cells and and1-positive epidermis cells by both indicated annotations. (B) 3' UTR lengths …
Cartilage-specific genes identified by scRNA-seq.
Worksheet one includes all cartilage genes identified using both Ens95 and V4.2 quantification, as indicated, with associated adjusted p-values and log2 fold change (comparison of cartilage cells to all other clusters). 3' UTR length from matching reference gene (from Figure 2—source data 2) is indicated. Worksheet two includes matched bulk RNA-seq quantification from Ens95 and V4.3 annotations for cartilage-specific genes. Data used to generate Figure 5A–C, Figure 5—figure supplement 1A,B.
Epidermis-specific genes identified by scRNA-seq.
Worksheet one includes all epidermis genes identified using both Ens95 and V4.2 quantification, as indicated, with associated adjusted p-values and log2 fold change. 3' UTR length from matching reference gene (from Figure 2—source data 2) is indicated. Worksheet two includes matched bulk RNA-seq quantification from Ens95 and V4.3 annotations for epidermis-specific genes. Data used to generate Figure 5A–C, Figure 5—figure supplement 1A,B.
(A, B) Left, plots showing values from bulk RNA-seq comparison of pdgfrbpos and pdgfrbneg cells quantified with Ens95 for (A) cartilage and (B) epidermis genes identified as such selectively in V4.2 …
(A) tSNE plot of all clusters from Ensembl (Ens95)-quantified scRNA-seq of zebrafish embryos at 5 days post fertilization with one erythroid and two leukocyte clusters indicated. (A–E, G) Circled …
List of genes from 3-way Venn diagram output shown in Figure 6H.
First column is gene symbol, second column is assigned class shown in the Venn diagram.
Gene set | Ens95, all | Ens95, w/NCBI ID | RefSeq, all | RefSeq, w/Ens ID |
---|---|---|---|---|
kdrlpos – all detected | 25704 | 21489 | 27516 | 21903 |
kdrlpos-enriched | 1632 | 1538 | 1780 | 1651 |
pdgfrbpos – all detected | 25699 | 21480 | 27598 | 21903 |
pdgfrbpos-enriched | 2186 | 2091 | 2323 | 2188 |
Nr2f2pos – all detected | 20516 | 17867 | 21788 | 18164 |
Nr2f2pos-enriched | 568 | 516 | 580 | 508 |
Gene category | Ens95 | RefSeq |
---|---|---|
All Genes | 32520 | 30445 |
annotated CDS | 25592 | 26120 |
CDS, missing annotated stop codon | 1585 | 269 |
CDS, missing annotated 3' UTR | 4703 | 1580 |
Annotation | Ensembl 95 | RefSeq | V4.2 | V4.3 |
---|---|---|---|---|
# genes | 32520 | 30445 | 39988 | 36351 |
# transcripts | 59876 | 55182 | 115496 | 111842 |
# exons | 335075 | 307538 | 414404 | 411330 |
# RefSeq genes missing | 3165 | - | 173 | 7 (c) |
# Ensembl genes missing (a) | - | 2116 | 1133 (b) | 957 (d) |
(a) RefSeq comparison with Ens95; V4 comparison with Ens99.
(b) 956/1133 classified as rRNA, snRNA, snoRNA, or sRNA.
(c) Left out from V4.2 add-back; see main text.
(d) 956/957 are rRNA, snRNA, snoRNA, sRNA, or miscRNA; the remaining protein-coding gene is a sequence duplicate.
Gene set | kdrlpos-enriched | pdgfrbpos-enriched | Nr2f2pos-enriched |
---|---|---|---|
Ens95 | 1632 | 2186 | 568 |
RefSeq | 1780 | 2323 | 580 |
V4.3 | 2141 | 2794 | 613 |
annotated in Ens95 and RefSeq | 1938 | 2612 | 523 |
not annotated in Ens95 | 144 | 131 | 67 |
not annotated in RefSeq | 119 | 113 | 54 |
only annotated in V4.3 | 60 | 62 | 31 |
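The lower four rows of this table appear to partition the V4.3-enriched genes. As a consistency check, and assuming that genes "only annotated in V4.3" are counted in both of the "not annotated" rows (an interpretation, not something stated in the table), the V4.3 totals can be reconstructed by inclusion-exclusion. The dictionary keys below are hypothetical labels for the table rows.

```python
# Values copied from the table; keys are shorthand labels for its rows.
table = {
    "kdrlpos": {"v43": 2141, "both": 1938, "not_ens95": 144, "not_refseq": 119, "v43_only": 60},
    "pdgfrbpos": {"v43": 2794, "both": 2612, "not_ens95": 131, "not_refseq": 113, "v43_only": 62},
    "Nr2f2pos": {"v43": 613, "both": 523, "not_ens95": 67, "not_refseq": 54, "v43_only": 31},
}


def reconstructed_total(row):
    """Inclusion-exclusion: the V4.3-only genes sit in both 'not annotated' rows."""
    return row["both"] + row["not_ens95"] + row["not_refseq"] - row["v43_only"]


consistent = {name: reconstructed_total(row) == row["v43"] for name, row in table.items()}
```

Under this reading all three columns reconcile with the V4.3 row, for example 1938 + 144 + 119 − 60 = 2141 for kdrlpos.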
List of R commands used to run Seurat for clustering of data quantified using Ens95 and V4.2.
Perl script used to add strand information and filter reads in BAM file output from GSNAP.
Python script to identify and remove spurious fusion transcripts.
Zebrafish transcriptome annotation V4.2.
Contains genomic annotation file (153.8 MB, md5sum: 44c87a2bdd19ccfd9f7cd526f9e21498) and gene information (as tab-delimited file and .xlsx file).
Zebrafish transcriptome annotation V4.3.1.
Contains genomic annotation file (152 MB, md5sum: 19759898187c47edfd9c216162851e31) and gene information (as tab-delimited file and .xlsx file).
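The listed md5sum values let a downloaded annotation file be checked for corruption. A minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and the file path in the usage comment is a placeholder because the actual file name is not given here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with the published value (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Usage (placeholder path; substitute the downloaded V4.3.1 annotation file):
# matches_checksum("path/to/annotation_file", "19759898187c47edfd9c216162851e31")
```

Reading in chunks keeps memory use low for files of this size (roughly 150 MB).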