Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes

  1. Yehudit Hasin-Brumshtein (corresponding author)
  2. Arshad H Khan
  3. Farhad Hormozdiari
  4. Calvin Pan
  5. Brian W Parks
  6. Vladislav A Petyuk
  7. Paul D Piehowski
  8. Anneke Brümmer
  9. Matteo Pellegrini
  10. Xinshu Xiao
  11. Eleazar Eskin
  12. Richard D Smith
  13. Aldons J Lusis
  14. Desmond J Smith
  1. University of California, Los Angeles, United States
  2. University of California, Los Angeles, United States
  3. Pacific Northwest National Laboratory, United States
8 figures, 4 tables and 3 additional files

Figures

Genomic properties of novel genes are similar to known non coding genes.

Novel genes and isoforms are defined by Cufflinks class codes 'u' and 'j', respectively. Distributions of transcript length (A) and maximal hypothetical peptide length (B) of novel genes (yellow), new isoforms (purple), known non coding transcripts (dashed line) and known coding transcripts (solid line). Transcriptional complexity (number of transcripts per locus, C) and splicing complexity (number of exons per transcript, D) of novel genes, novel isoforms, known coding and known non coding transcripts.

https://doi.org/10.7554/eLife.15614.004
Figure 2 with 1 supplement
Genetic regulation of expression in the mouse hypothalamus.

(A) Number of genes affected by trans (blue), local (yellow) or both (striped) variants as a function of statistical threshold. (B) Gene-level expression quantitative trait loci (eQTL, top), but not transcript-specific QTLs (isoQTL, bottom), show trans eQTL hotspots. Density shows the number of interactions at the lower statistical threshold (1E-6); red shows interactions passing the 1E-12 threshold. Yellow indicates cis-acting variants. (C) Genetic regulation occurs at every level, but gene-level regulation is more prevalent than transcript-specific regulation. The supplementary figure shows correlations between allelic expression in F1 and local eQTLs identified in the HMDP hypothalamus.

https://doi.org/10.7554/eLife.15614.006
Figure 2—figure supplement 1
Correlation between allele-specific expression in whole brain and local eQTLs.
https://doi.org/10.7554/eLife.15614.007
Alternative splicing in the mouse hypothalamus.

Alternative splicing events were classified (see Materials and methods) into 7 types: alternative 3’ splice site (A3, blue), alternative 5’ splice site (A5, purple), alternative first exon (AF, orange), alternative last exon (AL, brown), mutually exclusive exons (MX, black), retained intron (RI, green), and skipped exon (SE, dark red). All events were quantified in each sample as percent spliced in (PSI). (A) Number of alternative splicing events of each type (solid color), and number of genes affected by these events (light color). (B) Example of partial exon skipping in the Colq gene. DBA shows complete inclusion of the exon (therefore PSI = 1), while in C57BL/6 there is partial exon skipping (PSI = 0.78). (C) Number of alternative splicing events with and without a local QTL signal (solid and light color, respectively). (D) Alternative splicing QTLs map to the same chromosome as the event, for all types of events, indicating that most genetic regulation is by local (and likely cis-acting) variants. (E) Distribution of all PSI values of each event type in all samples. (F) Distance between the most significant SNP for each event and the gene start. The largest effect is typically within 1 Mb of the gene. (G) An example of a mutually exclusive exon event in the Nnat gene mapping to SNP rs32019082.

https://doi.org/10.7554/eLife.15614.008
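
The PSI values quoted in the caption above can be illustrated with a minimal sketch. It uses the generic definition of percent spliced in (inclusion reads divided by inclusion plus exclusion reads); the function name and the read counts are hypothetical, and the quantification actually used in the study is described in its Materials and methods, not here.

```python
def percent_spliced_in(inclusion_reads: float, exclusion_reads: float) -> float:
    """Generic PSI: fraction of reads supporting inclusion of the event.

    Returns NaN when the event has no supporting reads at all.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return float("nan")
    return inclusion_reads / total


# Hypothetical read counts chosen to mirror the two cases in the caption.
psi_complete_inclusion = percent_spliced_in(40, 0)   # exon always included
psi_partial_skipping = percent_spliced_in(78, 22)    # exon skipped in some reads

print(psi_complete_inclusion, psi_partial_skipping)
```

Under this definition, a PSI of 1 means every informative read includes the exon, and lower values indicate a growing share of reads that skip it.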
eQTL mapping suggests trans eQTL hotspots in the hypothalamus regulate expression of hundreds of genes.

(A) The mouse genome was divided into 100 kb bins. The plot presents genome-wide counts of genes whose expression is associated in trans with SNPs in each bin. (B) Zoom of the trans eQTL locus on chromosome 15. The peak SNP (associated with the most genes in trans, rs31703733) is shown in red; the color of other SNPs indicates r2 to rs31703733. The lower track shows the 10 genes whose expression is associated with rs31703733 locally. Panels C, D and E pertain to these 10 locally associated genes, which potentially mediate the trans effects. (C) Summary table for each gene. (D) Heatmap showing correlation of expression between genes associated with rs31703733 locally and genes associated with rs31703733 in trans. Color indicates the Pearson correlation coefficient. (E) Example correlation between a potential regulator (Rapgef3) and a trait (HDL levels).

https://doi.org/10.7554/eLife.15614.010
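
The binning described in panel A above can be sketched as follows. This is only an illustration under stated assumptions: the input format (a list of SNP chromosome, SNP position, gene identifier triples for trans associations), the function name and the example records are all hypothetical, and the study's own pipeline may differ.

```python
from collections import defaultdict

BIN_SIZE = 100_000  # 100 kb bins, as stated in the caption


def count_trans_genes_per_bin(associations, bin_size=BIN_SIZE):
    """Count distinct genes associated in trans with SNPs in each genomic bin.

    associations: iterable of (snp_chrom, snp_pos, gene_id) tuples (assumed format).
    Returns a dict mapping (chrom, bin_index) to the number of distinct genes.
    """
    genes_by_bin = defaultdict(set)
    for chrom, pos, gene in associations:
        genes_by_bin[(chrom, pos // bin_size)].add(gene)
    return {key: len(genes) for key, genes in genes_by_bin.items()}


# Hypothetical trans associations; positions and gene names are made up.
example = [
    ("chr15", 98_010_000, "geneA"),
    ("chr15", 98_050_000, "geneB"),
    ("chr15", 98_090_000, "geneA"),  # same gene, same bin: counted once
    ("chr15", 98_150_000, "geneC"),  # falls in the next bin
]
counts = count_trans_genes_per_bin(example)
print(counts)
```

Using a set per bin means a gene associated with several SNPs in the same bin is counted once, so bins with high counts correspond to candidate hotspots rather than to dense SNP coverage.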
Expression of long non coding RNAs in the hypothalamus is phenotypically relevant.

(A) Expression heatmap of known non coding RNAs and novel isoforms of these genes in the HMDP. Six lncRNAs (top cluster: Meg3, Gm26924, Snhg4, Miat, 6330403K07Rik, and Malat1) are highly expressed in almost all samples. (B) Novel isoforms of lncRNAs are expressed at levels similar to those of known ones. (C) Long non coding RNAs are associated with multiple phenotypes in the HMDP. (D) An example of association between a non coding RNA, C330006A16Rik, and average food intake.

https://doi.org/10.7554/eLife.15614.011
RNA editing is prevalent in the mouse hypothalamus at low levels.

(A) A total of 8462 editing sites were identified in the HMDP, with A to G accounting for >70% of the modifications. (B) Number of sites identified in each strain (color coding as in A). (C) Editing levels at the 90 sites that were detected in at least 70% of the samples were mapped. The heatmap shows variation in editing at these sites among the strains. (D) An example of an edited site in the Ociad1 gene, and its genome-wide mapping result.

https://doi.org/10.7554/eLife.15614.012
Groups of genes are associated with multiple related phenotypes in the HMDP, although they are not necessarily enriched for GO terms or specific pathways.

(A) Fraction of co-shared genes among the 500 genes most associated with a phenotype. (B) Enrichment analysis of the top 500 genes associated with each of the 150 phenotypes results in a small number of significant associations.

https://doi.org/10.7554/eLife.15614.013
RNA-Seq analysis framework.

General workflow used for analysis of RNA-Seq data in this study. Initial demultiplexed samples (fastq files) were aligned to the mouse genome with STAR and merged into one file per strain, and transcripts were assembled with Cufflinks. The resulting assembly files (one from each strain) were merged with the GENCODE M2 annotation into a unified assembly. The abundance of each transcript in the unified assembly was estimated in sample-specific alignment files.

https://doi.org/10.7554/eLife.15614.015

Tables

Table 1

Transcriptome assembly and filtering.

https://doi.org/10.7554/eLife.15614.003
                        None      Filter #1*   Filter #2   NR
Loci (genes)            40472     37591        14079       14079
Transcripts             383420    357066       51024       50347
 known (=) coding       99658     -            20721       -
 known (=) non coding   -         -            8584        -
 novel isoform (j)      259570    -            21234       -
 novel locus (u)        11753     -            485         -
 other status           12439     -            -           -
TSS                     100073    94417        32537       20013
CDS                     46242     46242        18687       7643
Total features          570207    535316       116327      92082
  1. *Filter #1: Expression values of <1e-6 were rounded to zero, and novel transcripts with all-zero values were removed from both the expression table and the merged file. In addition, all transcripts with a class code other than "=", "j" or "u" were removed.

  2. Filter #2: Detection and expression thresholds (detected in more than 100 samples and expressed (FPKM>1) in more than 50) were applied to each feature separately.

  3. Filter_NR: Non-redundant feature count (features that do not have a one-to-one correlation to a gene).
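
The Filter #2 thresholds in the footnotes above can be expressed as a short check. This is a sketch, not the authors' code: it assumes each feature comes with one FPKM value per sample, and it borrows the definition of "detected" (FPKM > 1e-6) from the description of Supplementary file 1. The function and constant names are hypothetical.

```python
DETECTION_FPKM = 1e-6   # "detected" threshold (assumed from Supplementary file 1)
EXPRESSION_FPKM = 1.0   # "expressed" threshold from the footnote (FPKM > 1)
MIN_DETECTED_SAMPLES = 100
MIN_EXPRESSED_SAMPLES = 50


def passes_filter2(fpkm_values):
    """Return True if a feature is detected in more than 100 samples
    and expressed (FPKM > 1) in more than 50 samples."""
    detected = sum(1 for v in fpkm_values if v > DETECTION_FPKM)
    expressed = sum(1 for v in fpkm_values if v > EXPRESSION_FPKM)
    return detected > MIN_DETECTED_SAMPLES and expressed > MIN_EXPRESSED_SAMPLES


# Hypothetical feature: 60 samples at FPKM 2.0, 50 at 0.5, 100 at zero.
feature = [2.0] * 60 + [0.5] * 50 + [0.0] * 100
print(passes_filter2(feature))
```

Both comparisons are strict because the footnote says "more than"; a feature expressed in exactly 50 samples would not pass under this reading.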

Table 2

Summary of peptide support for transcripts.

https://doi.org/10.7554/eLife.15614.005
Peptide matching*     All      Uniquely mapped
Known transcripts     9913     1016
 Protein_coding       9839     1002
 ncRNAs               74       14
Novel isoforms        401      94
Novel genes           29       24
Total number          10343    1134
  1. *Peptides are counted towards supporting novel isoforms only if they do not match any known transcript, and towards supporting novel genes if they do not match either known or novel isoform transcripts. All peptides matching known transcripts were also assigned the most likely transcript. Non coding status was assigned only to peptides that do not match any coding transcripts (for full details please see Materials and methods).

Table 3

Alternative splicing events.

https://doi.org/10.7554/eLife.15614.009
                                   All events     Cis QTL events
Alternative 3' splice site (A3)    316 (266)      155 (134)
Alternative 5' splice site (A5)    365 (304)      220 (189)
Alternative first exon (AF)        5288 (1874)    2776 (1278)
Alternative last exon (AL)         507 (320)      283 (208)
Mutually exclusive exons (MX)      44 (36)        29 (27)
Retained intron (RI)               645 (476)      356 (285)
Skipped exon (SE)                  399 (323)      221 (189)
Total                              7564 (3599)    4040 (2310)
  1. Numbers in parentheses indicate the number of distinct genes affected by the events.

Table 4

Expression of hypothalamic markers.

https://doi.org/10.7554/eLife.15614.014
Gene             Marker for          Mean expression    Local eQTL
AgRP             ARC neurons         119.040            ND
NPY              ARC neurons         18.162             ND
Foxo1            ARC neurons         4.736              ND
POMC             ARC neurons         271.954            2.618E-04
Hcrt (orexin)    LHA neurons         128.898            2.053E-05
Sf1              VMHvl neurons       65.333             ND
Nkx2-1           VMHvl neurons       6.067              1.100E-05
Tac-1            VMHvl neurons       36.373             1.624E-09
BDNF             VMHvl neurons       4.464              ND
Esr1             Multiple            2.658              ND
LEPR             Multiple            1.644              2.041E-04
INSR             Multiple            9.496              ND
CX3CR1           Microglia           9.790              1.400E-05
AIF-1            Microglia           98.598             ND
CD68             Microglia           4.162              ND
Itgam            Microglia           3.599              ND
MyD88            Microglia           2.369              1.942E-17
Aqp4             Astrocytes          30.041             ND
Slc1a3           Astrocytes          120.203            ND
Aldh1l1          Astrocytes          16.787             ND
Gfap             Astrocytes          72.502             2.140E-06
VEGF-A           Tanycytes           39.559             ND
CX3CL1           Neurons             99.411             9.836E-09
Mog              Oligodendrocytes    34.569             1.807E-04
Mbp              Oligodendrocytes    1266.574           ND
Plp1             Oligodendrocytes    569.620            ND
Car4             Endothelial         10.694             ND
Esam             Endothelial         11.623             3.184E-04
Flt1             Endothelial         9.538              ND
Cldn5            Endothelial         15.594             ND

Additional files

Supplementary file 1

Annotations of novel genes and samples.

(A) Annotation of novel genes and transcripts. Basic annotation data of all transcripts classified as 'new locus' (class code = u). Data include tracking ID of the transcript and gene, locus position, number of samples where the transcript is detected (FPKM>1e-6), number of samples where the transcript is expressed (FPKM>1), mean expression across all samples, and whether the gene arises from uniquely mapped reads. (B) Annotation of detected peptides. Basic annotation data of all peptides detected by LC-MS proteomics. Data include peptide sequence, tracking IDs of matching transcripts (known, novel isoforms and novel genes, based on Cufflinks class code classification), number and list of strains in which the peptide was detected, and number of transcripts the peptide may be attributed to. (C) Sample description. Technical metadata pertaining to samples used in this study, including NCBI SRA accession numbers, RNA-Seq QC data (number of reads, mapped reads, detected junctions), and mouse ID and strain.

https://doi.org/10.7554/eLife.15614.016
Supplementary file 2

Trans eQTL hotspots.

Includes counts of associated genes in 100 kb windows, and for each of the hotspots: a list of associated genes, interaction p values, list of local eQTLs for the top SNP and DAVID enrichment annotation of the genes associated in trans.

https://doi.org/10.7554/eLife.15614.017
Supplementary file 3

Trans eQTL hotspots - gene counts, functional enrichment and local QTLs.

(A) Gene-trait correlations. Top 500 known genes associated with each phenotype in the HMDP. Data are aggregated in table form crossing all genes with all traits, thus not all gene-trait pairs are significant. 'Inf' indicates non-significant interactions. Numeric values indicate the p value of association, with 1E-3 corresponding to the 5% FDR threshold based on permutations. (B) Novel genes are associated with metabolic traits. Data table with all detected associations between novel genes and the 150 phenotypes assessed in the HMDP.

https://doi.org/10.7554/eLife.15614.018
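
A table laid out as described for part (A) of this file could be read as in the sketch below. The in-memory layout (a nested dict of gene to trait to cell), the function name, the gene and trait labels, and the choice of a "less than or equal" comparison are all assumptions made for illustration; only the 'Inf' convention and the 1E-3 threshold come from the description.

```python
FDR_5PCT_P_VALUE = 1e-3  # p value stated to correspond to the 5% FDR threshold


def significant_pairs(table, threshold=FDR_5PCT_P_VALUE):
    """Return (gene, trait, p) triples at or below the threshold, sorted by p.

    table: {gene: {trait: cell}}, where each cell is a p value or the
    string 'Inf' marking a non-significant pair (assumed layout).
    """
    hits = []
    for gene, row in table.items():
        for trait, cell in row.items():
            p = float(cell)  # float('Inf') parses to positive infinity
            if p <= threshold:
                hits.append((gene, trait, p))
    return sorted(hits, key=lambda hit: hit[2])


# Hypothetical gene and trait labels with made-up p values.
example_table = {
    "geneA": {"trait1": "2.5E-05", "trait2": "Inf"},
    "geneB": {"trait1": "Inf", "trait2": "8.0E-04"},
}
print(significant_pairs(example_table))
```

Because 'Inf' converts to positive infinity, non-significant cells fail the comparison without needing a special case.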

Cite this article

Yehudit Hasin-Brumshtein, Arshad H Khan, Farhad Hormozdiari, Calvin Pan, Brian W Parks, Vladislav A Petyuk, Paul D Piehowski, Anneke Brümmer, Matteo Pellegrini, Xinshu Xiao, Eleazar Eskin, Richard D Smith, Aldons J Lusis, Desmond J Smith (2016)
Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes
eLife 5:e15614.
https://doi.org/10.7554/eLife.15614