1. Genomics and Evolutionary Biology
  2. Neuroscience
Download icon

Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes

  1. Yehudit Hasin-Brumshtein Is a corresponding author
  2. Arshad H Khan
  3. Farhad Hormozdiari
  4. Calvin Pan
  5. Brian W Parks
  6. Vladislav A Petyuk
  7. Paul D Piehowski
  8. Anneke Brümmer
  9. Matteo Pellegrini
  10. Xinshu Xiao
  11. Eleazar Eskin
  12. Richard D Smith
  13. Aldons J Lusis
  14. Desmond J Smith
  1. University of California, Los Angeles, United States
  2. University of California, Los Angeles, United states
  3. Pacific Northwest National Laboratory, United States
Tools and Resources
Cited
1
Views
1,051
Comments
0
Cite as: eLife 2016;5:e15614 doi: 10.7554/eLife.15614

Abstract

Previous studies had shown that the integration of genome wide expression profiles, in metabolic tissues, with genetic and phenotypic variance, provided valuable insight into the underlying molecular mechanisms. We used RNA-Seq to characterize hypothalamic transcriptome in 99 inbred strains of mice from the Hybrid Mouse Diversity Panel (HMDP), a reference resource population for cardiovascular and metabolic traits. We report numerous novel transcripts supported by proteomic analyses, as well as novel non coding RNAs. High resolution genetic mapping of transcript levels in HMDP, reveals both local and trans expression Quantitative Trait Loci (eQTLs) demonstrating 2 trans eQTL 'hotspots' associated with expression of hundreds of genes. We also report thousands of alternative splicing events regulated by genetic variants. Finally, comparison with about 150 metabolic and cardiovascular traits revealed many highly significant associations. Our data provide a rich resource for understanding the many physiologic functions mediated by the hypothalamus and their genetic regulation.

https://doi.org/10.7554/eLife.15614.001

eLife digest

Metabolism is a term that describes all the chemical reactions that are involved in keeping a living organism alive. Diseases related to metabolism – such as obesity, heart disease and diabetes – are a major health problem in the Western world. The causes of these diseases are complex and include both environmental factors, such as diet and exercise, and genetics. Indeed, many genetic variants that contribute to obesity have been uncovered in both humans and mice. However, it is only dimly understood how these genetic variants affect the underlying networks of interacting genes that cause metabolic disorders.

Measuring gene activity or expression, and tracing how genetic instructions are carried from DNA into RNA and proteins, can reliably identify groups of genes that correlate with metabolic traits in specific organs. This strategy was successfully used in previous studies to reveal new information about abnormalities linked to obesity in specific tissues such as the liver and fat tissues. It was also shown that this approach might suggest new molecules that could be targeted to treat metabolic disorders.

A brain region called the hypothalamus is key to the control of metabolism, including feeding behavior and obesity. Hasin-Brumshtein et al. set out to explore gene expression in the hypothalamus of 99 different strains of mice, in the hope that the data will help identify new connections between gene expression and metabolism.

This approach showed that thousands of new and known genes are expressed in the mouse hypothalamus, some of which coded for proteins, and some of which did not. Hasin-Brumshtein et al. uncovered two genetic variants that controlled the expression of hundreds of other genes. Further analysis then revealed thousands of genetic variants that regulated the expression of, and type of RNA (so-called "spliceforms") produced from neighboring genes. Also, the expression of many individual genes showed significant similarities with about 150 metabolic measurements that had been evaluated previously in the mice.

This new dataset is a unique resource that can be coupled with different approaches to test existing ideas and develop new ones about the role of particular genes or genetic mechanisms in obesity. Future studies will likely focus on new genes that show strong associations with attributes that are relevant to metabolic disorders, such as insulin levels, weight and fat mass.

https://doi.org/10.7554/eLife.15614.002

Introduction

The regulation of body weight and appetite are complex processes, in which hypothalamic nuclei play a pivotal role. Genome wide association studies have shown that DNA sequence variants significantly contribute to variation in metabolic traits both in humans and mice. However, in most cases the connection between genetic variant and final phenotype remains unknown (Suhre et al., 2011; Teslovich et al., 2010; Lappalainen et al., 2013). In an effort to better understand how genetic variation results in phenotypic differences, many projects in the last decade have focused on genome wide characterization of sequence variants regulating gene expression in different tissues and organisms (Lappalainen et al., 2013; Majewski and Pastinen, 2011; Grundberg et al., 2011, , 2012). These studies showed that up to 80% of genetic variants associated with complex traits likely act through the regulation of gene expression rather than changing protein sequence and function. Such genes, termed expression quantitative trait loci (eQTLs), offer useful insights into the mechanistic links between genotype and phenotype, providing the eQTLs are characterized with sufficient power and resolution in phenotypically relevant tissues and states (Pickrell et al., 2010; Min et al., 2012; Gaulton et al., 2015).

Mouse is the primary model organism for many cardiovascular traits, including atherosclerosis, metabolic syndrome, obesity, the neural control of metabolism, and diabetes (Lusis, 2000; Billon et al., 2010; Allayee et al., 2003). Dozens of loci contribute genetically to metabolic traits have been identified in mouse models (Parks et al., 2013; Lusis, 2012). Indeed, we and others have extensively characterized eQTLs in metabolically relevant tissues of mice, suggesting potential new genes related to obesity (van Nas et al., 2010; Keane et al., 2011; Huang et al., 2009; Aylor et al., 2011). Integration of transcriptomic data from liver and adipose with genetic mapping and phenotypes led to mechanistic insights into the complexity of metabolic phenotypes. Yet, hypothalamus, which is not a readily accessible tissue, lacks such high resolution expression data. In fact, only one previous study examined eQTLs in mouse hypothalamus, using mice from 39 inbred strains fed chow diet, using microarray data and 5000 SNPs (McClurg et al., 2007). The lack of extensive transcriptome data, which allows mapping of eQTLs and the linking of traits and transcripts, is a major impediment in integrating the hypothalamus with a systems biology of metabolism. In this study, we used RNA-Seq to characterize the hypothalamic transcriptome in 99 inbred strains of mice from the Hybrid Mouse Diversity Panel (HMDP, [Bennett et al., 2010; Ghazalpour et al., 2012]) fed a high fat, high sugar diet (Parks et al., 2013). The advantage of the HMDP is that these strains have all been densely genotyped and carefully phenotyped for about 150 metabolic traits, allowing high resolution genetic mapping of QTLs and eQTLs. We identified thousands of novel isoforms and hundreds of new genes that were not previously annotated, and that may represent variants or transcripts specific to the hypothalamus. The HMDP also allowed us to map QTLs with high resolution and power, identifying both local and trans acting variants. The RNA sequencing data permitted examination of a much broader spectrum of transcriptional features and facilitated analysis not only at the gene level, but also of genetic variants affecting specific isoforms, coding sequences or transcription start sites.

Results

In this study, we explored the transcriptional landscape of mouse hypothalamus using RNAseq from 282 mice, representing 99 inbred and recombinant inbred strains from the HMDP. We first focused on quantifying gene expression in a transcript specific manner, the discovery of novel expressed regions and isoforms, and the contribution of genetic factors to expression variance. We also sought qualitative support for translation of new isoforms and genes. We then examined and quantified RNA modifications, such as alternative splicing events and editing in our data and used the data to map genetic variants affecting these events. Finally, we used the extensive phenotyping available for the HMDP, to look for associations between the expression of genes and transcripts, and physiologic phenotypes. All of the expression data and transcriptome assembly are publicly available from the Gene Expression Omnibus database, accession number GSE79551.

Hypothalamic gene expression and proteomic data reveal multiple new isoforms and novel genes

Similar to other large-scale RNAseq studies (Mutz et al., 2013) we identified thousands of novel transcripts, with the vast majority of them only expressed at low levels in a small subset of samples (Table 1). Nevertheless, in the robust set of transcripts that are expressed at appreciable levels (FPKM >1 in 50+ samples and FPKM>0 in 100+ samples), we still identified 21,234 novel isoforms and 485 transcripts originating from 407 novel expressed genes (Supplementary file 1). Interestingly, the number of novel isoforms in our study is comparable to the number of previously annotated transcripts passing the same filtering criteria (n = 29,305, Table 1).

Table 1

Transcriptome assembly and filtering.

https://doi.org/10.7554/eLife.15614.003
NoneFilter #1*Filter #2NR
Loci (genes)40472375911407914079
Transcripts3834203570665102450347
 known (=) coding9965820721
 known (=) non coding8584
 novel isoform (j)25957021234
 novel locus (u)11753485
 other status12439
TSS100073944173253720013
CDS4624246242186877643
Total features57020753531611632792082
  1. *Filter #1: Expression values of <1e–6 were rounded to zero, and novel transcripts with all zero values were removed both from expression table and from merged file. Also, all transcripts with class code not "=", "j" or "u" were removed.

  2. Filter #2: Implementation of detection and expression thresholds (detected in more than 100 samples and expressed (fpkm>1) in more than 50) on each feature separately.

  3. Filter_NR: Non redundant features count (those that do not have 1_2_1 correlation to a gene).

There are several possible interpretations to as why the 407 genes could be missing from the genomic annotation. First, the 407 novel genes expressed in our data potentially constitute transcripts that are unique to the hypothalamus. Second, the GENCODE M2 annotation used in this study was the most recent available when we started to analyze the data. Since then, however, the annotation has been augmented based on more recently published datasets and improved prediction pipelines. Thus, while the 407 genes are novel relative to the M2 version, they may have been added later. To explore that possibility, we compared our assembly to the latest version of annotation released by GENCODE – M10 (released January 2016, http://www.gencodegenes.org/) and find that 193 out of 407 loci are still novel. Some of the genes may also represent genomic DNA contamination. However, we consider this possibility less likely since we used stringent filtering criteria based on the number of samples that the genes were expressed in.

In terms of genomic properties, such as putative open reading frame length, transcript length or splicing complexity, the novel genes seem to resemble known non-coding genes, suggesting that the majority of the novel genes likely belong to this class. On the other hand, the properties of novel isoforms are consistent with known coding transcripts (Figure 1). To explore the translational potential of newly identified isoforms and genes, we compared our RNA-Seq data to proteomic data generated from additional hypothalamus samples from the HMDP. Specifically, we translated all known and newly identified transcripts in 6 potential open reading frames, and compared this dataset to the hypothalamic peptide sequences. As expected, >95% of the identified peptides (Supplementary file 1) matched at least one known transcript with the vast majority of these (>99%) corresponding to known protein coding transcripts, suggesting high accuracy of the peptide data. In addition, 430 peptides matched either novel isoforms (n = 401), or novel genes (n = 29) exclusively (Table 2). Since the genomic properties of the novel genes hint that the majority of them are likely non coding, we do not find their low representation in peptide data surprising. Moreover, manual in-depth characterization of the 29 peptides matching novel genes, with UCSC and NCBI database revealed that almost all of them represent processed pseudogenes, rather than proteins with novel functions. This finding both supports the previously published reports of wide spread transcription of pseudogenes, and their translation (Kim et al., 2014; Tay et al., 2015), while strengthening the suggestion that the majority of novel genes identified in this work are not protein coding.

Genomic properties of novel genes are similar to known non coding genes.

Novel genes and isoforms are defined by Cufflinks class code 'u' and 'j' respectively. Distributions of transcript length (A), and maximal hypothetical peptide length (B) of novel genes (yellow), new isoforms (purple), known non coding transcripts (dashed line) and known coding transcripts (solid line). Transcriptional complexity (number of transcripts per locus, (C) and splicing complexity (number of exons per transcript, (D) of novel genes, novel isoforms, known coding and know non coding transcripts.

https://doi.org/10.7554/eLife.15614.004
Table 2

Summary of peptide support for transcripts.

https://doi.org/10.7554/eLife.15614.005
Peptide matching*AllUniquely mapped
Known transcripts99131016
 Protein_coding98391002
 ncRNAs7414
Novel isoforms40194
Novel genes2924
Total number103431134
  1. *Peptides are counted towards supporting novel isoforms only if they do not match any known transcript, and towards supporting novel genes if they do not match either known or novel isoform transcripts. All peptides matching known transcripts were also assigned the most likely transcript. Non coding status was assigned only to peptides that do not match any coding transcripts (for full details please see Materials and methods).

We further examined whether multi-mapping reads are a substantial contributor to the measured expression of the novel identified genes. For that purpose, we re-quantified 3 samples from C57BL/6J strain, using uniquely mapping reads only (mapping quality = 255), and compared the quantification of the novel genes between the two approaches. Not surprisingly, the proportion of uniquely mapped reads contributing to the expression of the genes matched very well between the 3 samples, suggesting that it is an intrinsic gene property and not unique to the individual samples. Importantly, while the low uniqueness of mapping reads may indicate false results, we also noted that among the 29 peptides matching to the novel genes exclusively and uniquely, 18 match genes that do not pass our threshold. In fact, many of gene families share sequence motifs and are homologues, and as such some of the reads would be multi-mapping between them. Thus we chose not to remove these genes from the annotation.

Genetic regulation of gene expression

We examined expression QTLs in terms of type of the affected features and identified SNPs affecting expression of all transcripts at a locus (eQTL), specific transcript isoforms (isoQTLs), transcription start sites (tssQTLs) or open reading frame (cdsQTLs) (Hasin-Brumshtein et al., 2016). Since the linkage disequilibrium (LD) is extensive in the mouse genome, we distinguish between three types of QTLs: local (within 2 Mb of the gene), distant (on the same chromosome as the gene, distance >2 Mb) and trans (SNP and gene reside on different chromosomes). The number of identified interactions, and genes affected by such regulation depends on the statistical cutoff of p-value for the interaction one chooses. We examined a set of thresholds ranging from very liberal to stringent. The liberal threshold was previously established by permutation from microarray expression data in the HMDP, i.e. p<1.4E–3 for local variants and p<6E–6 for distant and trans. The most stringent threshold used a Bonferroni corrected threshold (i.e 0.01 divided by the number of tests, p<1E-12). As expected, different thresholds resulted in different numbers of QTLs, with local and distant QTLs being more robust to threshold stringency than trans QTLs. However, 85–95% of the genes regulated by distant variants, were also regulated by local variants, suggesting that most of the distant QTLs reflect local signals emulated at a distance as a result of wide-ranging LD, rather than independent regulatory elements. In contrast, 70% of the trans interactions involved genes lacking local signal over the entire set of thresholds. Since distant regulation was mostly redundant to local, and it would be very difficult to determine whether that signal is a result of independent regulation versus extended LD, we chose to focus our analysis on trans and local interactions only, defining trans as trans-chromosomal interactions. If we look at regulation types over a wide range of thresholds, the local signal predominates at the more stringent thresholds reflecting the larger typical effect size in this group (Figure 2A). We identified local and trans QTLs for all expressed feature types, with the most common being eQTLs and isoQTLs (Figure 2B). Notably, while we used kinship matrix specific for our strains, still several genes in our dataset show extensive trans regulation (horizontal lines in Figure 2B) suggesting a residual influence of population structure on our mapping results. We also note that regulation of gene expression often occurs at the gene level, than at transcript specific levels (Figure 2C).

Figure 2 with 1 supplement see all
Genetic regulation of expression in the mouse hypothalamus.

(A) Number of genes affected by trans (blue), local (yellow) or both (striped) variants as a function of statistical threshold. (B) Gene level expression quantitative trait loci (eQTL, top) but not transcript specific (isoQTL, bottom) show trans eQTL hotspots. Density shows the number of interactions at lower statistical thresholds (1e–6), red shows interactions passing 1E-12 threshold. Yellow indicates cis acting variants. (C). Genetic regulation occurs on every level, but gene level regulation is more prevalent than transcript specific cases. Supplementary figure shows correlations between allele expression in F1 and local eQTLs identified in HMDP hypothalamus.

https://doi.org/10.7554/eLife.15614.006

Classical genome-wide eQTL studies used association analysis of total gene expression levels to map local eQTL, assuming that variation linked to proximal genetic variant indicates cis regulation. Recently, several studies have exploited the single base resolution of RNA-Seq to examine this assumption. RNA-Seq permits the identification allele expression (ASE), a hallmark of functional cis acting regulation, in heterozygotes, such as humans or in mouse F1s. Notably, these studies generally show poor concordance between ASE ratios and previously identified local eQTLs, which had been attributed mostly to different technical aspects of RNAseq and ASE analysis (Hasin-Brumshtein et al., 2014; Lappalainen et al., 2013; Grundberg et al., 2012). To examine the concordance between local eQTLs obtained from ASE and genome-wide association, we performed RNA-Seq on brain tissue from 20 mice representing two F1 crosses between three of the classical inbred strains used to construct the HMDP. The parental strains were C57Bl/6 (B), A/J (A) and C3H (H), and the F1 offspring were BxA and HxB. We compared the effect size of ~2600 local eQTLs, to the average ASE effect in both crosses. While local eQTLs were significantly positively correlated with ASE (p<2E-16) correlation estimates were modest (R= 0.2), and lower than between the ASEs in the two sets of F1 mice (R= 0.4, Figure 2—figure supplement 1).

RNA-Seq shows extensive alternative splicing in the hypothalamus, with complex regulation pattern

Using a previously established pipeline for analysis of alternative splicing events, SUPPA (see Materials and methods), we identified 7564 alternative splicing events affecting 3599 genes (Table 3). An alternative first exon was most common, accounting for the majority of alternative splicing events with several genes exhibiting multiple alternative first exons (Figure 3A). For other types of events, each most often affected one gene, with some of the genes exhibiting a combination of alternative splicing events. The extent of alternative splicing in each sample was quantified as a percent spliced in (PSI) of every event (Figure 3B). This quantification can be regarded as a quantitative trait; however, the distribution of PSI for every type of event suggests an excess of 0 or 1 values (never or always spliced in, respectively Figure 3E). This observation is consistent with alternative splicing being often bimodal rather than a normally distributed continuous measure (e.g. present or absent splice site), however, it also shows that quantitative regulation plays a significant role in ratio of isoforms that arise from the inclusion or exclusion of a splicing event.

Alternative splicing in the mouse hypothalamus.

Alternative splicing events were classified (see Materials and methods) into 7 types: alternative 3’ splice (A3, blue), alternative 5’ splice (A5, purple), alternative first exon (AF, orange), alternative last exon (AL, brown), mutually exclusive exons (MX, black), retained intron (RI, green), and skipped exon (SE, dark red). All events were quantified in each sample for percent spliced in (PSI). (A) Number of alternative splicing events of each type (solid color), and number of genes affected by these events (light color). (B) Example of partial exon skipping in Colq gene. DBA shows the complete inclusion of the exon (therefore PSI = 1), while in C57BL/6 there is partial exon skipping (PSI = 0.78). (C) Number of alternative splicing events with and without local QTL signal (solid and light color respectively). (D) Alternative splice QTLs are mapping to the same chromosome, for all types of events, indicating that most of genetic regulation is by local (and likely cis acting) variants. (F) Distance between most significant SNP for each event and gene start. The largest effect is typically within 1 Mb of the gene. (G) An example of mapping of mutually exclusive exon event in Nnat gene mapping to SNP rs32019082. (E) Distribution of all PSI values of each event type in all samples.

https://doi.org/10.7554/eLife.15614.008
Table 3

Alternative splicing events.

https://doi.org/10.7554/eLife.15614.009
All eventsCis QTL events
Alternative 3' splice site (A3)316 (266)155 (134)
Alternative 5' splice site (A5)365 (304)220 (189)
Alternative first exon (AF)5288 (1874)2776 (1278)
Alternative last exon (AL)507 (320)283 (208)
Mutually exclusive exons (MX)44 (36)29 (27)
Retained Intron (RI)645 (476)356 (285)
Skipped exon (SE)399 (323)221 (189)
Total7564 (3599)4040 (2310)
  1. Table 2: Number in parenthesis indicates number of distinct genes affected by the events.

Contrary to gene expression, all forms of alternative splicing were predominantly regulated by local variants (Figure 3C,D). This is not unexpected, since variation in alternative splicing is most likely to arise from particular sequence variants in the RNA itself. Across all examined categories of alternative splicing, 50–60% of the events were significantly associated with local variants, the strongest signal often residing within 1 Mb of the event (Figure 3F). We observe that for many of the genetically regulated splicing events the data show an excess of 0 or 1 PSI values that correlate with the genetic variant (e.g Figure 3G), rather than a shift in a continuous quantitative distribution. This observation suggests that many of the genetically regulated events in alternative splicing sequence play a deterministic role.

Trans eQTL hotspots

Trans eQTLs are not uniformly distributed across the genome, clustering into potential hotspots of genome wide regulation. Such hotspots have been observed in several datasets (Orozco et al., 2015; Orozco et al., 2012; Tian et al., 2015), and they are thought to represent cases where a cis acting variant affects a regulatory gene, e.g. transcription factor, subsequently affecting expression of multiple targets. To identify trans hotspots we divided the genome into bins of 100 kb, and counted the number of distinct genes which exhibit a trans interaction associated with any of the SNPs in the bin (Figure 4A). Importantly, the bins do not necessarily contain the same number of SNPs, and are an order of magnitude shorter than the typical LD blocks in the inbred mouse genome. Just counting the number of SNPs in a bin that are associated in trans would be confounded by SNP density. However, since LD essentially recapitulates the same interactions over and over again, counting the number of genes per bin, rather than the number of SNPs, is not significantly affected by LD.

eQTL mapping suggests trans eQTL hotspots in the hypothalamus regulate expression of hundreds of genes.

(A) Mouse genome was broken into 100 kb bins. The plot presents genome wide counts of genes which expression is associated with SNPs in that region, in trans. (B) Zoom of trans eQTL locus on chromosome 15. Peak SNP (associated with most genes in trans, rs31703733) is shown in red, color of other SNPs indicates r2 to rs31703733. Lower track shows the 10 genes which expression is associated with rs31703733 locally. C,D,E pertain to the 10 genes associated locally with rs31703733 and therefore potentially mediate the trans effects. (C) Summary table about each gene. (D) Heatmap showing correlation of expression between genes associated with rs31703733 locally, and genes associated with rs31703733 in trans. Color indicates Pearson correlation coefficient. (E) Example correlation between potential regulator (RApgef3) and trait (HDL levels).

https://doi.org/10.7554/eLife.15614.010

There are clearly two very strong trans acting hotspots on chromosome 1 and 15, which are observable for eQTLs, but not for isoQTLs, tssQTLs or cdsQTLs. Each of the two hotspots span 5–6 Mb and include >500 SNPs regulating >400 genes in trans (Figure 4B, Supplementary file 2). Functional enrichment analysis of gene targets of these hotspots suggests that the hotspot on chr1 regulates multiple genes involved in nucleotide binding, while genes regulated by chr15 locus are clearly associated with ion transport in synapse activity (Figure 4C, Supplementary file 2). Moreover, consistent with the hypothesis that trans regulation is mediated by local effects, the trans acting SNPs in these hotspots, are also associated with expression of several local genes (Figure 4B,C, Supplementary file 2).

To identify potential mediators of trans regulation, we examined the trans eQTL hotspot on chr15 in more detail. In this region a cluster of SNPs, which share high linkage disequilibrium, is associated with the expression of the majority of the genes mapping to this locus in trans (Figure 4B). We focused on the SNP with most trans associations (rs31703733), and found that while the hotspot itself contains dozens of genes, rs31703733 constitutes a local eQTL for only 10 of them (Figure 4B). Moreover, the expression levels of genes regulated in trans by the hotspot as well as 6 genes that were local eQTLs were closely correlated (Figure 4C), forming a coexpression module. Further, this module was significantly associated with triglycerides and cholesterol measurements in the mice. Interestingly, the two strongest local eQTL genes for this hotspot, Endou (an endonuclease) and Rapgef3 (also known as EPAC, cAMP activated guanine exchange factor involved in Ras signaling) were significantly associated with cholesterol and triglyceride measures in this study as well.

Genetic association does not necessarily imply that the associated allele is causative for the change in expression. In fact, the majority of the eQTL SNPs fall into regions with poor or no annotation, making it unlikely to be the actual causative variant. However, surprisingly, peak SNPs in the chr15 hotspot (SNPs rs31703733 and rs31780670) are located within a H3K4me1 histone methylation mark associated with enhancers, and which was detected in all neuronal tissues probed by ENCODE in adult mice (cortex, cerebellum, olfactory bulb), but not in other metabolic tissues such as liver, heart, intestine or lung (UCSC genome browser). Based on this preliminary and indirect evidence one may hypothesize that rs31703733 and rs31780670 potentially affect the enhancer activity and the expression of nearby genes. However, to reach any functional conclusion based on our data would require experimental validation in model systems.

Long non-coding RNAs in the mouse hypothalamus

Long non coding RNA (lncRNAs) are a generally poorly characterized class of RNA molecules, with sometimes unclear classification (St Laurent et al., 2015). Commonly used criteria for identification of lncRNAs are a transcript >200 bp long without an obvious potential open reading frame. In contrast to the relative paucity of data regarding the functionality of most lncRNAs, several specific lncRNAs (e.g. Xist, HOTAIR or H19) had been studied extensively and shown to play an essential role in cellular function (Quek et al., 2015). Recent large RNA-Seq studies and integrative projects such as ENCODE suggest that lncRNAs likely constitute up to 60% of the transcribed RNAs (Djebali et al., 2012). Moreover, in recent years an increasing number of functional studies have shown that lncRNAs play an important role in the regulation of transcription and translation, as well as in cell differentiation, signaling and other processes (Sun et al., 2013; Guttman et al., 2011; Ramos et al., 2015). Further, lncRNAs are enriched for genetic association signals in genome wide association studies (Iyer et al., 2015; Kumar et al., 2013).

GENCODE M2 annotation classifies 4540 lncRNA transcripts into at least 5 biotypes, based on their overlap with protein coding genes (http://www.gencodegenes.org/mouse_releases/2.html). The two most common biotypes - long intergenic non coding RNAs (lincRNAs) and antisense RNAs, account for 97% of all the annotated lncRNAs. Since our data were generated without retaining strand specificity, we focused on lincRNAs only. LincRNAs have been shown to be highly tissue specific (Cabili et al., 2011), and therefore it is not surprising that although there are 2417 annotated lincRNA transcripts, we found only 381 expressed in the mouse hypothalamus at our filtering criteria. The 381 transcripts represent 237 known and 144 novel isoforms of 181 lincRNA genes, with both novel and known isoforms showing similar expression levels (Figure 5A,B). We also used length and open reading frame criteria to examine the novel loci. Notably, the RNA length criteria of >200 bp is a commonly accepted parameter, while the length of minimal potential open reading frame varies between 30–100 aa in different studies. Based on these criteria we can identify up to 129 potentially novel lincRNAs expressed in the mouse hypothalamus, with 49 of them being spliced. Since lincRNAs have been shown to be often highly tissue specific, we consider the novel lincRNAs to be good candidates for hypothalamus specific transcripts.

Expression of long non coding RNAs in the hypothalamus is phenotypically relevant.

(A) Expression of heatmap known non coding RNAs and novel isoforms of these genes n the HMDP. Six lncRNAs (top cluster, Meg3, Gm26924, Snhg4, Miat, 6330403K07Rik, and Malat1) are highly expressed in almost all samples. (B) Novel isoforms of lncRNAs are expressed at a similar level of known ones. (C) Long non coding RNAs are associated with multiple phenotypes in the HMDP. (D) An example of association between a non coding RNA C330006A16Rik and average food intake.

https://doi.org/10.7554/eLife.15614.011

We used gene expression levels to explore the correlation of 310 lincRNAs (181 known and 129 novel genes) with metabolic traits in HMDP. Up to 35% of lincRNA transcripts significantly (p<1e–3) correlated to at least one phenotype in the HMDP, with a few lincRNAs associated with a multitude of related phenotypes (Figure 5C,D). The statistical threshold was determined by permutations in previous studies of gene expression in HMDP, is less stringent than a Bonferroni correction, and reflects interdependencies among phenotypes. Expression of >30% of lincRNAs is associated with a significant local eQTL, suggesting that a considerable number of lincRNAs in the hypothalamus are playing an important role in translation of genetic regulatory variance into physiologic phenotypes. Notably, six lincRNAs – Meg3, Gm26924, Snhg4, Miat, 6330403K07Rik, and Malat1– were highly expressed in most strains (Figure 5A upper cluster). Meg3 is a known imprinted tumor suppressor gene (Zhang et al., 2010) and was recently implicated in hepatic insulin resistance (Zhu et al., 2016). Miat and Malat1 are both part of nuclear bodies. Knockout models of Malat1 are all viable and normal (Zhang et al., 2012; Eißmann et al., 2012; Nakagawa et al., 2012), however their response to metabolic challenges, such as high fat diet, had not been assessed. Myocardial infarction associated transcript (Miat) was initially linked to myocardial infarction through a genetic association study(Ishii et al., 2006). Subsequently, Miat was shown to regulate development of neuronal progenitors, involved in schizophrenia pathogenesis and fear response(Aprea et al., 2013; Liao et al., 2016). Although all of the six highly expressed lincRNAs are not novel, none of them were previously reported to be expressed in the hypothalamus, or to play an established role in the metabolic or reproductive system.

Previous investigations have documented several examples of lincRNAs that code for short translated open reading frames (Anderson et al., 2015). In addition, recent work has shown that lincRNAs can be associated with ribosomes, and their occupancy is similar to that of protein coding transcripts (Ingolia et al., 2014; Ruiz-Orera et al., 2014). We detected 6 peptides which mapped uniquely to 3 different lincRNAs (Gm26825, Gm16295, and Gm26593), suggesting that translation of lincRNAs occurs rarely, and that their association with ribosomes is more likely to be in the context of translational regulation of other protein coding transcripts.

RNA editing in the mouse hypothalamus

RNAseq provides the opportunity to look at RNA sequence modifications in a quantitative manner. In this study, we examined RNA editing patterns of all possible substitution types over the genome. A recent study showed that genetic variation affects C to U editing in the intestine, both in site specific and genome wide manners in the mouse Diversity Outbred cross (Gu et al., 2015). We hypothesized that genetic variation may contribute to the extent of RNA editing either in a site specific (e.g. by altering editing sites in cis) or genome wide manner (e.g. by altering the expression or specificity of editing enzymes). Altogether we detected 8462 potential editing sites in 3319 genes. As expected, the majority of edits (>70%) were A-to-G modifications, and the total number of detected sites in each strain varied between several hundred to over a thousand (Figure 6A). A comparison of our findings to previously reported editing sites in two databases - DARNED and RADAR - suggests that 75% of these are novel. We then used the proportion of edited reads per site, or per substitution type, across strains as a quantitative trait for mapping genetic variants that contribute to editing. We did not detect any significant genetic association for genome-wide levels of editing, which is consistent with both lack of genetic variation in ADAR in our panel and its very stable expression level across samples (49 ± 0.27 FPKM).

RNA editing is prevalent at the mouse hypothalamus at low levels.

(A) A total of 8462 editing sites were identified in the HMDP, with A to G accounting for >70% of the modifications. (B) Number of sites identified in each strain (color coding as in A). (C) Editing level at 90 sites, that were detected in at least 70% of the samples, were mapped. Heatmap shows variation in editing in these sites among the strains. (D) An example of an edited site in Ociad1 gene, and its genome wide mapping result.

https://doi.org/10.7554/eLife.15614.012

Most editing sites were detected only in a small number of strains, which precluded meaningful mapping. For the site specific mapping we therefore focused on 90 sites that were detected in >70% of the strains (Figure 6C). We detected editing QTLs for 3 specific sites. For example, one of the editing sites in Ociad1 (on chr5 at 73312444) had a strong association (p<1e–8) with genetic variation residing on the same chromosome (Figure 6D). Altogether our data suggest that RNA editing occurs at low level in thousands of genes, however the impact of genetic variation on the editing level in mouse brain is limited.

Association of hypothalamic expression and phenotypes

There are 150 phenotypes available for the HMDP, reflecting many clinically relevant traits as well as various metabolic measures (Parks et al., 2013). We used several approaches to examine the potential role of hypothalamic expression in the various phenotypes. We began by identifying the top 500 transcripts that were significantly correlated with each of the phenotypes (correlation p values ranged from 4e–3 to <2e–16, Supplementary file 3). We then analyzed each of these expression sets for enrichment of potential pathways and functional annotation, using DAVID (Huang et al., 2009a, 2009b). Surprisingly, only a few of those expression sets showed moderate to strong enrichment (Figure 7B). For example, fat mass accumulation between 4 and 8 weeks, was distinctly associated with pathways related to oxidative phosphorylation and energy metabolism, while genes associated with food intake were enriched for ribosome related pathways. Clustering of phenotypes based on the proportion of shared genes associated with each trait (Figure 7A), clearly showed that related phenotypes also shared expression dependencies. For example, up to 70% of the genes most correlated with 'esterified cholesterol', were also correlated with 'total cholesterol', and more than 30% were shared with 'HDL'. Together these data validate the notion that related phenotypes share underlying molecular mechanisms, yet these shared genetic correlations may not necessarily correspond to specific readily identifiable pathways.

Groups of genes are associated with multiple related phenotypes in HMDP, although not necessarily enriched for GO ontology or specific pathways.

(A) Fraction of co-shared genes among the 500 genes most associated with a phenotype. (B) Enrichment analysis of the top 500 genes associated with each of the 150 phenotypes results in a small number of significant associations.

https://doi.org/10.7554/eLife.15614.013

We then analyzed whether the novel genes and isoforms we uncovered potentially contribute to phenotypic diversity. We found 232 novel transcripts associated with at least one phenotype (Supplementary file 3). For example TCONS_00279690, was strongly associated with several metabolic traits, such as total mass in response to a high fat, high carbohydrate dietary challenge, initial mouse insulin levels, as well as weight and fat mass (Supplementary file 3). Similar to known isoforms, expression of ~30% of the novel isoforms was significantly correlated with phenotypes. Correlation coefficients and p values were also similar between known and novel isoforms, suggesting that novel isoforms are as likely to contribute to mouse diversity as previously identified transcripts.

Expression of cell population markers in the hypothalamus

The hypothalamus is a heterogeneous brain region containing multiple nuclei that affect different aspects of metabolism and endocrine physiology. One of the drawbacks of our approach, dictated by practical considerations, is that we performed RNA-Seq on the entire hypothalamus rather than particular cell populations or nuclei. This approach likely results in dilution of signals from any one population of cells. Nevertheless, the sensitivity of RNA-seq allows examination of expression of particular cellular markers that are specific to certain cell types. To assess which cell populations are well represented in our data, we examined the expression, genetic regulation and association with phenotypes of some of the known markers of hypothalamic cell populations (Table 4). Our data show high expression of well-known hypothalamic neuronal markers, such as Agouti-related protein (Agrp), pro-opiomelanocortin (Pomc), hypocretin (orexin, Hcrt) and steroidogenic factor-1 (SF1). In addition, we detect high expression of oligodendrocyte markers and microglia, but only modest expression of epithelial markers.

Table 4

Expression of hypothalamic markers.

https://doi.org/10.7554/eLife.15614.014
GeneMarker forMean expressionlocal eQTL
AgRPARC neurons119.040ND
NPYARC neurons18.162ND
Foxo1ARC neurons4.736ND
POMCARC neurons271.9542.618E-04
Hcrt (orexin)LHA neurons128.8982.053E-05
Sf1VMHvl neurons65.333ND
Nkx2-1VMHvl neurons6.0671.100E-05
Tac-1VMHvl neurons36.3731.624E-09
BDNFVMHvl neurons4.464ND
Esr1Multiple2.658ND
LEPRMultiple1.6442.041E-04
INSRMultiple9.496ND
CX3CR1Microglia9.7901.400E-05
AIF-1Microglia98.598ND
CD68Microglia4.162ND
ItgamMicroglia3.599ND
MyD88Microglia2.3691.942E-17
Aqp4Astrocytes30.041ND
Slc1a3Astrocytes120.203ND
Aldh1l1Astrocytes16.787ND
GfapAstrocytes72.5022.140E-06
VEGF-ATanycytes39.559ND
CX3CL1Neurons99.4119.836E-09
MogOligodendrocytes34.5691.807E-04
MbpOligodendrocytes1266.574ND
Plp1Oligodendrocytes569.620ND
Car4Endothelial10.694ND
EsamEndothelial11.6233.184E-04
Flt1Endothelial9.538ND
Cldn5Endothelial15.594ND

Interestingly, we identified genetic variation affecting the expression of some key functional molecules for metabolic regulation and response to high fat diet. For example, myeloid differentiation primary response 88 factor (Myd88), is a Toll-like receptor (TLR) adaptor molecule. This protein mediates fatty acid induced inflammation and leads to leptin and insulin resistance in the central nervous system. Mice with CNS specific deletion of Myd88, are protected from high fat diet induced weight gain, and development of leptin resistance induced by acute central application of palmitate (Kleinridders et al., 2009). Our data show that there is a strong eQTL (p value 1.9e–17, Table 4) modulating expression of Myd88. In addition, we detect genetic variations that affect expression of key molecules such as leptin receptor (LepR) and Pomc. Together, these data suggest that the genetic background of inbred mice is an important factor in functional studies, and that the results of molecular perturbations of hypothalamic metabolic pathways can be modulated by genetics.

In addition to exploring the links between genetic variation and expression traits, we also looked into the association of transcript levels with phenotypes. We confirmed known relationships – for example the expression of orexigenic neuropeptide Y (NPY) was associated with total body mass (pvalue = 2.4e–6). Interestingly, one of the strongest correlations we observed were between fractalkine receptor Cx3cr1 and fat mass response (p = 9.62E–10). Fractalkine (Cx3cl1) is a chemokine, that was recently implicated in diet induced obesity, insulin regulation and promotion of hypothalamic inflammatory response to fatty acids (Shah et al., 2015). However, different models of Cx3cr1 knockout mice resulted in variable results on diet induced obesity. The correlation we found is consistent with previous reports that identify a central role for fractalkine receptor Cx3cr1 as a regulator of diet induced obesity and hypothlamic inflammation (Lee et al., 2013, Morari et al., 2014). Our results also indicate that the expression of Cx3cr1 is affected by genetic background, and suggest that one possible explanation for the heterogeneity in Cx3cr1 knockout results is the different genetic backgrounds used in those studies.

Discussion

In this work we present a comprehensive picture of the transcriptome of the mouse hypothalamus and its genetic variation and regulation. We identify thousands of new isoforms, and >400 new genes, and show independent support for these being translated into protein, which further validates our data. Notably, transcription of pseudogenes had been noticed previously, and likely plays a role in gene regulation. Recent shotgun proteomic studies of the human proteome strongly suggest that a sizable fraction of pseudogenes and lncRNAs is translated (Ji et al., 2015; Kim et al., 2014). The peptide data from our study supports low level translation of processed pseudogenes and is in line with these results.

The hypothalamus is a highly heterogeneous tissue with multiple nuclei and cell types acting in concert. A recent RNA-seq study of fed and food-deprived mice showed that cell type specific transcripts in hypothalamic Agrp and Pomc neurons exhibited specific co-expression networks associated with feeding (Henry et al., 2015). In contrast, due to the practical constrains of the study, our data were collected on the entire hypothalamus, and as such would be less sensitive to cell type specific signals. Nevertheless, expression of cell-specific markers and functional molecules showed that our approach recapitulates known correlations between genes and metabolic phenotypes, and identifies new ones. In addition, our data capture the various non neuronal cell types, such as microglia or astrocytes, which are often overlooked in the mostly neuron focused studies of the hypothalamus. These cells are important mediators of hypothalamic inflammation and other processes induced by a high fat diet. Regulation of gene expression in this population impacts every aspect of metabolism. While cell type specific transcriptomics is valuable for understanding cellular processes in hypothalamic neurons, our data provide a robust framework recapitulating transcriptional processes affecting multiple cell populations. Our approach is thus complementary to the cell type-specific transcriptomic efforts.

In this study, we showed extensive genetic regulation of transcription and alternative splicing in the hypothalamus and identified two loci which influence transcription of hundreds of genes in trans. In addition, our data indicate that a considerable proportion of the new isoforms and transcripts are significantly correlated to physiological phenotypes. While human studies generally lack the power to address trans regulation on a genome-wide scale, the HMDP provides a powerful resource for such analysis. Indeed, we identified two very strong trans acting hotspots that seem to harbor major regulators of gene expression in hypothalamus. We suggest that the trans effects of genetic variation in these regions are likely mediated by local interactions, which is consistent with previously observed cases (Small et al., 2011; Heinig et al., 2010). Enrichment analyses clearly suggest that each trans eQTLs hotspot regulates a set of functionally enriched genes (e.g. the hotspot on chromosome 15 is strongly associated with ion transport in synaptic areas) suggesting a new link between genetic variants in these loci and specific cellular function. We further showed that the genes associated with these hotspots correlate to physiological phenotypes, such as HDL and triglyceride levels providing insight into the mechanism behind correlation of these genotypes with complex traits. The connection between the associated genes and traits does not imply direct causality. For example, ion transport is regulated by circadian rhythm (Ko et al., 2009), which in turn is associated with many other aspects of metabolism (Tsuneki et al., 2016).

Notably, although several examples of trans eQTL hotspots were published and analyzed (Small et al., 2011, Heinig et al., 2010Tian et al., 2016), authenticity of such hotspots remains somewhat controversial, and trans eQTL hotspots may arise due to uncontrolled batch effects (Breitling et al., 2008; Kang et al., 2008; Joo et al., 2014), which are difficult to distinguish from real interactions. To minimize technical batch effects arising from sequencing, we used a round robin approach where each sample was sequenced as part of several pools (see details in Materials and methods). Batch effects may also arise from random environmental exposures, rather than technical sample preparation. In such case, it is unlikely that those effects would be tissue specific or affect only gene level analysis. We found no indication of these trans eQTL hotspots in the adipose or liver data from the same cohort of mice (Parks et al., 2013), nor were the hotspots detected for isoQTL. Thus, while we cannot exclude the possibility that trans eQTL hotspots described in this study may had arisen from unaccounted environmental or technical effects, we believe that this is unlikely, but further molecular studies are required to validate our results.

RNA-Seq allowed us to quantify transcripts at different levels of analysis- from the total expression of all isoforms in a locus, to transcript specific estimates and their combinations. Our analysis showed that regulation of the total number of transcripts from a gene is far more common than isoform specific regulation. Nevertheless, we were able to identify specific interactions at every level of detail we explored, namely eQTLs, isoQTLs, tssQTLs and cdsQTLs. In addition, we identified and quantified >7000 alternative splicing events affecting >3500 genes in the hypothalamus, and showed that these events are mostly affected by extensive local genetic regulation. In many cases of alternative splicing QTLs, the associated variant resulted in an excess of either 0 or 1 splice PSI values, suggesting that variants affecting splicing act as strong determinants rather than weak contributors to a complex trait.

RNA editing is a result of post transcriptional deamination processes whereby an adenosine is converted to inosine (A-to-I), or cytosine to uracil (C-to-U). Both types of RNA editing are mediated by specific enzymes members of the ADAR family facilitate the A-to-I editing which is most commonly observed in neuronal tissues, while Apobec1 mediates the C-to-U editing, and is primarily expressed in the intestine and liver. Editing is a tissue specific process which usually results in modification of only fraction of the transcripts, and therefore can be regarded as a quantitative trait. Previous studies showed that polymorphisms either in the editing enzymes or in the sequence proximal to editing site affect the extent of editing. Specifically, Gu et al recently reported a sequence variant in Apobec1 which affects in trans multiple C-to-U editing sites in the mouse liver (Gu et al., 2015). As expected from previous studies, A-to-I was by far the most common RNA editing event in our study. Further, our data show that RNA editing occurs at low levels in thousands of sites, but is highly variable. Less than <1% of the sites were consistently detected in the adult mouse brain across >75% HMDP strains. Consistent with the lack of coding genetic variation in the ADAR enzymes in our mouse panel, and with their invariable expression levels, we did not observe any trans editing QTL that would affect editing levels of multiple sites. However, we found a few local associations that seem to affect editing of specific sites. For example a strong local editing QTL (pOciad1 gene (also known as Asrij). This editing QTL is likely due to a sequence variant in or near the Ociad1 gene. Notably, Ociad1 is expressed in stem cells, where it regulates pleuropotency via the JAK/STAT pathway (Sinha et al., 2013). Editing of Ociad1 or its expression in neuronal tissue was not reported before.

Another significant aspect of our data relates to long non-coding RNAs. LincRNAs have been shown to play an important role in various cellar functions. Recent genome annotations include thousands of known lincRNAs, yet most of them remain functionally uncharacterized, and only a few studies have examined the genetic control of their expression in detail. Moreover, lincRNAs are often expressed in a tissue specific manner, and therefore are not readily identifiable from general expression datasets. In this study, we detected expression of 381 known and novel lincRNA isoforms, and also identified 129 novel, potentially hypothalamus- specific lincRNAs. Our data indicate that lincRNAs are subject to similar variation in expression, and exhibit similar overall genetic control as the coding genes. Specific lincRNAs have been implicated in a variety of phenotypes (Guttman et al., 2011; Huarte et al., 2010; Kumar et al., 2013), and our data indicate strong correlations between some of the hypothalamic lincRNAs and metabolic phenotypes, such as body weight.

The hypothalamus is a very heterogeneous tissue, and one of the major drawbacks of our analysis is that we used whole hypothalamus, rather than dissecting specific nuclei. This limitation stems from practical considerations – meaningful expression QTL analysis requires sacrifice of hundreds of mice, while the dissection of specific hypothalamic nuclei is delicate and time consuming and thus was not feasible within the constraints of this study. Still, this shortcoming is likely to limit our power to detect meaningful associations, rather than introducing spurious ones. Moreover, since our study mostly focuses on genetic regulation of transcription, which was shown to be largely shared among tissues and cell types, we believe that the data presented in this work are not substantially confounded by heterogeneity.

To summarize, our data fill a substantial gap and will be useful to the research community. The hypothalamus is composed of multiple nuclei, which are distinct in morphology and function, and many laboratories focus on disentangling the complex interactions that ultimately affect metabolism and behavior. However, genome wide transcriptome analysis of this tissue has not been published, and genetic regulation of transcription as well as tissue specific transcripts remains largely obscure. We believe that our study is complimentary to physiological studies and will facilitate research into the crosstalk between the brain and other metabolic tissues. All of the expression data described in this paper are publicly available from NCBI archives GEO (GSE79551) and SRA (project number PRJNA314533).

Materials and methods

Samples, library preparation and sequencing

Altogether we sequenced 285 samples, from 99 strains of the HMDP. A total of 87 strains had 3 samples per strain, 11 strains had 2 samples per strain, and 2 strains had 1 sample per strain (a detailed list is in Supplementary file 1). RNA was extracted using Qiazol followed by miRNAeasy kit from Qiagen (RRID:SCR_008539). Unstranded mRNA libraries were prepared by the UCLA Neuroscience Genomics Core with Illumina standard kits (TruSeq v3) according to standard protocols. All samples were barcoded, and sequenced with ~18 samples per lane, with HiSeq2000 using 50 bp paired-end sequencing protocol. A round robin design was implemented such that biological replicates were sequenced on different lanes, and each sample was part of more than one sequencing pool. Samples were demultiplexed by sequencing facility, forward and reverse read fastq files were supplied for each sample.

Data quality control (QC) and mapping

Read QC was done using FastQC (RRID:SCR_005539) in batch mode. The samples had excellent quality, with all bases exceeding median quality score of 28, and >98% of the sequences with a mean quality score >28. The average number of reads per sample was 26.3 M. All reads were passed to mapping as is, without trimming or filtering. Samples were mapped to the mm10 genome using STAR aligner version 2.3.1 (https://github.com/alexdobin/STAR/releases/tag/STAR_2.3.1z9). The mm10 sequence was downloaded from http://cufflinks.cbcb.umd.edu/igenomes.html (UCSC annotation files). Reference sequences were built using known splice junctions (–sjdbGTFfile option) from known genes annotation file. Mapping was performed allowing up to 3 mismatches per read (--outFilterMismatchNmax 3), removing non canonical un-annotated junctions (--outFilterIntronMotifs RemoveNoncanonicalUnannotated) and allowing up to 10 multiple mappings per read (--outFilterMultimapNmax 10). Alignment files from all samples of the same strain were merged into one alignment file per strain using the 'merge' function of the samtools package. On average 94.1% of the reads mapped, and of the mapped reads 96.4% mapped uniquely. STAR also detected on average 2.8 M splices per sample, with 98% of them previously annotated. Detailed mapping counts for each sample are in Supplementary file 1.

Transcript assembly and quantification

Our pipeline is summarized in Figure 8. For the purpose of transcript assembly, sample specific alignment files were pooled into unified BAM alignment files by strain which were then sorted and indexed using samtools. Transcript assembly was done on each strain specific alignment file with Cufflinks v2.2.0., with mouse genome version mm10, and gtf file of known transcripts as a reference guide (-g option, reference file downloaded from UCSC genome browser), together with bias and multimap option corrections (-b and –u respectively). This step resulted in 99 strain specific transcript assemblies. We then used Cuffmerge to compare those assemblies to GENCODE M2 annotation (http://www.gencodegenes.org/), and to consolidate them into one unified assembly file representing all transcripts from GENCODE M2 and from the strain specific assemblies (merged.gtf). To obtain FPKM expression values, we run Cuffquant and Cuffnorm v2.2.1 with default parameters, using the sample specific alignment files and unified assembly (merged.gtf) file.

RNA-Seq analysis framework.

General workflow used for analysis of RNA-Seq data in this study. Initial demultiplexed samples (fastq files) were aligned to the mouse genome with STAR, merged in one file per strain, and transcripts assembeled with cufflings. The resulting assembly files (one from each strain) were merged with GENECODE M2 annotation into unified assembly. The abundance of each transcript in the unified assembly was estimated in sample specific alignment files.

https://doi.org/10.7554/eLife.15614.015

Filtering of the expressed features for subsequent analyses

Cufflinks assembly and abundance estimation results in assessment of gene expression at multiple levels of genomic resolution - transcripts, transcription start sites, coding sequence and loci (genes). We implemented 3 filtering steps:

First, we removed transcripts that are likely to be an assembly or sequencing artifacts, based on following criteria:

Comparison to GENECODE M2 indicated that this is not a known transcript (class code '='), new isoform of a known gene (class code 'j') or novel intergenic transcript (class code 'u').

b. Novel isoforms and intergenic transcripts which expression was <1e-6 FPKM in all samples.

Loci and transcription start sites associated exclusively with transcripts removed in steps 1 and 2.

This filtering step removed 7% of the transcripts, with the other 93% (535,316 features) treated as potentially true.

II. While we consider that all these features may potentially represent true splicing and transcription events, only features that show sufficient variation and expressed at appreciable level are useful in eQTL analysis. Further, many of the potentially new features were expressed either at very low levels or in a small number of samples. Therefore, at the second filtering step we implemented detection and expression thresholds (FPKM>0 in >=100 samples, and FPKM>1 in >=50 samples) to focus on ~100,000 expressed features that are more likely to produce meaningful mapping results in the HMDP panel.

Third, expression values (FPKM) of transcription start sites, coding sequences and genes are a sum of expression values of transcripts associated with these features. Consequently, if a locus, transcription start site or coding sequence is associated with only one transcript, the expression information of that feature is identical and thus redundant to the respective transcript. Therefore, such features were removed from eQTL mapping.

Altogether, this filtering reduced the number of explored features to ~ 93,000.

Allele-specific expression and cis eQTL

All features that passed the described above filtering criteria, were analyzed for eQTLs using expression estimates in 282 individual mice representing 100 strains, 193,400 SNPs and the fastLMM algorithm using an appropriate kinship matrix that accounts for the HMDP population structure. Cis acting eQTLs were identified by allele expression (ASE) as described previously (Hasin-Brumshtein et al., 2014). Briefly, allele-specific counts at each exonic SNP, filtered for quality control criteria, were summed in a genespecific manner. The resulting counts were normalized and analyzed for differential expression between the alleles using the DEseq package.

Identification of RNA editing sites

To identify RNA editing sites, we re-aligned the reads to the mouse genome (mm10) and transcriptome using the single nucleotide variant-sensitive aligner RASER (Ahn and Xiao, 2015) with the parameters –m 0.06 and –b 0.03. We then detected likely mismatch positions considering the read sequence quality score at that position and its position within in a read. We further required potential editing sites to be covered by >=5 total and >=3 edited reads and excluded positions that overlap SNPs or are located in homopolymers, simple repeats, and within 4nts of splice junctions (Lee et al., 2013).

LC-MS proteomics

Proteomic data were collected on 110 samples representing the 99 strains used in this study, plus 10 C57Bl/6 samples. All samples were independent biological replicates of the strains used in this study fed on the high fat, high sugar diet, and not the same hypothalami used for RNA extraction.

Sample preparation

The preparation of 110 mouse hypothalamus samples for LC-MS/MS analysis was performed according to the protocol described before (Ghazalpour et al., 2011). Briefly, the samples were split evenly across two 96-well plates, and homogenized using TissueLyser (QIAGEN, Hilden, Germany) in (8 M urea, 50 mM NH4HCO3 pH 7.8 and 10 mM DTT) at 20 Hz for 2 min with two repetitions. Upon denaturation for 1 hr at 37°C the extracted protein concentrations were measured using coomassie assay and readjusted using aforementioned denaturation buffer to 1.5 mg/ml. Equal aliquots of 150 μg were taken further into cysteine alkylation and trypsin digestion steps. After purification of peptides by solid phase extraction using C18 columns the average recovered peptide amount was 68 μg. The resulting peptide solutions were normalized to 0.5 μg/μL for LC-MS/MS analysis

Instrumental analysis

Samples were analyzed on an Orbitrap Velos mass spectrometer (Thermo Fisher, Waltham, MA) that was interfaced with a 75 μm i.d.×65 cm long LC column packed with 3 μm Jupiter C18 particles (Phenomenex) and a 5 μL injection loop. The mobile phase solvents consisted of (A) 0.2% acetic acid and 0.05% TFA in water and (B) 0.1% TFA in 90% acetonitrile. An exponential gradient was used for the separation, which started with 100% A and gradually increased to 60% B over 100 min.

Data analysis

The acquired MS/MS spectra datasets were preprocessed with DeconMSn [18304935] and DtaRefinery [20019053] software followed by spectra interpretation using MSGFplus [25358478] software by matching against custom protein fasta file build based on RNA-Seq data. The fasta file included sequences of known genes, newly discovered isoforms and novel genes. The MSGFplus MS/MS search settings were as follows: tryptic peptides only, 10 ppm parent ion mass tolerance, static cysteine carbamylation and maximum charge state 4+. The resulting mzIdentML files were analyzed using MSnID Bioconductor package [http://www.bioconductor.org/packages/release/bioc/html/MSnID.html] to confidently isolate identifications of know genes, isoforms and novel genes. The prior probabilities that the know genes, isoforms and novel genes are real and exist in the protein form are substantially different. Therefore the corresponding identifications were considered separately employing a process effectively emulating the proposed earlier cascade search [26084232]. Briefly, in the first pass we considered peptide identifications matching only to the protein sequences of the known genes (highest prior probability). The filtering criteria on peptide-to-spectrum matching (Spectrum E-Value <9.4e-11 and parent ion mass measurement tolerance <2.4 ppm) were optimized to achieve maximum peptide identifications whilst not exceeding 1% FDR based on reverse sequence identifications. Next, the spectra that match confidently identified 9910 peptide sequences of the known genes were removed from further consideration. In the next round we applied filtering criteria (Spectrum E-Value <2.5e–13 and parent ion mass measurement tolerance <10 ppm) that allowed to confidently identify 236 unique peptide sequences matching to isoforms. After further removing peptides matching to isoforms and applying 1% FDR optimized criteria (Spectrum E-Value <2.3e–9 and parent ion mass measurement tolerance <0.2 ppm) we identified 20 peptide sequences matching novel genes.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
    Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci
    1. KJ Gaulton
    2. T Ferreira
    3. Y Lee
    4. A Raimondo
    5. R Mägi
    6. ME Reschen
    7. A Mahajan
    8. A Locke
    9. NW Rayner
    10. N Robertson
    11. RA Scott
    12. I Prokopenko
    13. LJ Scott
    14. T Green
    15. T Sparso
    16. D Thuillier
    17. L Yengo
    18. H Grallert
    19. S Wahl
    20. M Frånberg
    21. RJ Strawbridge
    22. H Kestler
    23. H Chheda
    24. L Eisele
    25. S Gustafsson
    26. V Steinthorsdottir
    27. G Thorleifsson
    28. L Qi
    29. LC Karssen
    30. EM van Leeuwen
    31. SM Willems
    32. M Li
    33. H Chen
    34. C Fuchsberger
    35. P Kwan
    36. C Ma
    37. M Linderman
    38. Y Lu
    39. SK Thomsen
    40. JK Rundle
    41. NL Beer
    42. M van de Bunt
    43. A Chalisey
    44. HM Kang
    45. BF Voight
    46. GR Abecasis
    47. P Almgren
    48. D Baldassarre
    49. B Balkau
    50. R Benediktsson
    51. M Blüher
    52. H Boeing
    53. LL Bonnycastle
    54. EP Bottinger
    55. NP Burtt
    56. J Carey
    57. G Charpentier
    58. PS Chines
    59. MC Cornelis
    60. DJ Couper
    61. AT Crenshaw
    62. RM van Dam
    63. AS Doney
    64. M Dorkhan
    65. S Edkins
    66. JG Eriksson
    67. T Esko
    68. E Eury
    69. J Fadista
    70. J Flannick
    71. P Fontanillas
    72. C Fox
    73. PW Franks
    74. K Gertow
    75. C Gieger
    76. B Gigante
    77. O Gottesman
    78. GB Grant
    79. N Grarup
    80. CJ Groves
    81. M Hassinen
    82. CT Have
    83. C Herder
    84. OL Holmen
    85. AB Hreidarsson
    86. SE Humphries
    87. DJ Hunter
    88. AU Jackson
    89. A Jonsson
    90. ME Jørgensen
    91. T Jørgensen
    92. WH Kao
    93. ND Kerrison
    94. L Kinnunen
    95. N Klopp
    96. A Kong
    97. P Kovacs
    98. P Kraft
    99. J Kravic
    100. C Langford
    101. K Leander
    102. L Liang
    103. P Lichtner
    104. CM Lindgren
    105. E Lindholm
    106. A Linneberg
    107. CT Liu
    108. S Lobbens
    109. J Luan
    110. V Lyssenko
    111. S Männistö
    112. O McLeod
    113. J Meyer
    114. E Mihailov
    115. G Mirza
    116. TW Mühleisen
    117. M Müller-Nurasyid
    118. C Navarro
    119. MM Nöthen
    120. NN Oskolkov
    121. KR Owen
    122. D Palli
    123. S Pechlivanis
    124. L Peltonen
    125. JR Perry
    126. CG Platou
    127. M Roden
    128. D Ruderfer
    129. D Rybin
    130. YT van der Schouw
    131. B Sennblad
    132. G Sigurðsson
    133. A Stančáková
    134. G Steinbach
    135. P Storm
    136. K Strauch
    137. HM Stringham
    138. Q Sun
    139. B Thorand
    140. E Tikkanen
    141. A Tonjes
    142. J Trakalo
    143. E Tremoli
    144. T Tuomi
    145. R Wennauer
    146. S Wiltshire
    147. AR Wood
    148. E Zeggini
    149. I Dunham
    150. E Birney
    151. L Pasquali
    152. J Ferrer
    153. RJ Loos
    154. J Dupuis
    155. JC Florez
    156. E Boerwinkle
    157. JS Pankow
    158. C van Duijn
    159. E Sijbrands
    160. JB Meigs
    161. FB Hu
    162. U Thorsteinsdottir
    163. K Stefansson
    164. TA Lakka
    165. R Rauramaa
    166. M Stumvoll
    167. NL Pedersen
    168. L Lind
    169. SM Keinanen-Kiukaanniemi
    170. E Korpi-Hyövälti
    171. TE Saaristo
    172. J Saltevo
    173. J Kuusisto
    174. M Laakso
    175. A Metspalu
    176. R Erbel
    177. KH Jöcke
    178. S Moebus
    179. S Ripatti
    180. V Salomaa
    181. E Ingelsson
    182. BO Boehm
    183. RN Bergman
    184. FS Collins
    185. KL Mohlke
    186. H Koistinen
    187. J Tuomilehto
    188. K Hveem
    189. I Njølstad
    190. P Deloukas
    191. PJ Donnelly
    192. TM Frayling
    193. AT Hattersley
    194. U de Faire
    195. A Hamsten
    196. T Illig
    197. A Peters
    198. S Cauchi
    199. R Sladek
    200. P Froguel
    201. T Hansen
    202. O Pedersen
    203. AD Morris
    204. CN Palmer
    205. S Kathiresan
    206. O Melander
    207. PM Nilsson
    208. LC Groop
    209. I Barroso
    210. C Langenberg
    211. NJ Wareham
    212. CA O'Callaghan
    213. AL Gloyn
    214. D Altshuler
    215. M Boehnke
    216. TM Teslovich
    217. MI McCarthy
    218. AP Morris
    219. N William Rayner
    220. Jian'an Luan
    221. T Jørgensen
    222. S Männistö
    223. A Stančáková
    224. KH Jöcke
    225. O Pedersen
    226. AP Morris
    227. DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium
    (2015)
    Nature Genetics 47:1415–1425.
    https://doi.org/10.1038/ng.3437
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
    Biological, clinical and population relevance of 95 loci for blood lipids
    1. TM Teslovich
    2. K Musunuru
    3. AV Smith
    4. AC Edmondson
    5. IM Stylianou
    6. M Koseki
    7. JP Pirruccello
    8. S Ripatti
    9. DI Chasman
    10. CJ Willer
    11. CT Johansen
    12. SW Fouchier
    13. A Isaacs
    14. GM Peloso
    15. M Barbalic
    16. SL Ricketts
    17. JC Bis
    18. YS Aulchenko
    19. G Thorleifsson
    20. MF Feitosa
    21. J Chambers
    22. M Orho-Melander
    23. O Melander
    24. T Johnson
    25. X Li
    26. X Guo
    27. M Li
    28. Y Shin Cho
    29. M Jin Go
    30. Y Jin Kim
    31. JY Lee
    32. T Park
    33. K Kim
    34. X Sim
    35. R Twee-Hee Ong
    36. DC Croteau-Chonka
    37. LA Lange
    38. JD Smith
    39. K Song
    40. J Hua Zhao
    41. X Yuan
    42. J Luan
    43. C Lamina
    44. A Ziegler
    45. W Zhang
    46. RY Zee
    47. AF Wright
    48. JC Witteman
    49. JF Wilson
    50. G Willemsen
    51. HE Wichmann
    52. JB Whitfield
    53. DM Waterworth
    54. NJ Wareham
    55. G Waeber
    56. P Vollenweider
    57. BF Voight
    58. V Vitart
    59. AG Uitterlinden
    60. M Uda
    61. J Tuomilehto
    62. JR Thompson
    63. T Tanaka
    64. I Surakka
    65. HM Stringham
    66. TD Spector
    67. N Soranzo
    68. JH Smit
    69. J Sinisalo
    70. K Silander
    71. EJ Sijbrands
    72. A Scuteri
    73. J Scott
    74. D Schlessinger
    75. S Sanna
    76. V Salomaa
    77. J Saharinen
    78. C Sabatti
    79. A Ruokonen
    80. I Rudan
    81. LM Rose
    82. R Roberts
    83. M Rieder
    84. BM Psaty
    85. PP Pramstaller
    86. I Pichler
    87. M Perola
    88. BW Penninx
    89. NL Pedersen
    90. C Pattaro
    91. AN Parker
    92. G Pare
    93. BA Oostra
    94. CJ O'Donnell
    95. MS Nieminen
    96. DA Nickerson
    97. GW Montgomery
    98. T Meitinger
    99. R McPherson
    100. MI McCarthy
    101. W McArdle
    102. D Masson
    103. NG Martin
    104. F Marroni
    105. M Mangino
    106. PK Magnusson
    107. G Lucas
    108. R Luben
    109. RJ Loos
    110. ML Lokki
    111. G Lettre
    112. C Langenberg
    113. LJ Launer
    114. EG Lakatta
    115. R Laaksonen
    116. KO Kyvik
    117. F Kronenberg
    118. IR König
    119. KT Khaw
    120. J Kaprio
    121. LM Kaplan
    122. A Johansson
    123. MR Jarvelin
    124. AC Janssens
    125. E Ingelsson
    126. W Igl
    127. G Kees Hovingh
    128. JJ Hottenga
    129. A Hofman
    130. AA Hicks
    131. C Hengstenberg
    132. IM Heid
    133. C Hayward
    134. AS Havulinna
    135. ND Hastie
    136. TB Harris
    137. T Haritunians
    138. AS Hall
    139. U Gyllensten
    140. C Guiducci
    141. LC Groop
    142. E Gonzalez
    143. C Gieger
    144. NB Freimer
    145. L Ferrucci
    146. J Erdmann
    147. P Elliott
    148. KG Ejebe
    149. A Döring
    150. AF Dominiczak
    151. S Demissie
    152. P Deloukas
    153. EJ de Geus
    154. U de Faire
    155. G Crawford
    156. FS Collins
    157. YD Chen
    158. MJ Caulfield
    159. H Campbell
    160. NP Burtt
    161. LL Bonnycastle
    162. DI Boomsma
    163. SM Boekholdt
    164. RN Bergman
    165. I Barroso
    166. S Bandinelli
    167. CM Ballantyne
    168. TL Assimes
    169. T Quertermous
    170. D Altshuler
    171. M Seielstad
    172. TY Wong
    173. ES Tai
    174. AB Feranil
    175. CW Kuzawa
    176. LS Adair
    177. HA Taylor
    178. IB Borecki
    179. SB Gabriel
    180. JG Wilson
    181. H Holm
    182. U Thorsteinsdottir
    183. V Gudnason
    184. RM Krauss
    185. KL Mohlke
    186. JM Ordovas
    187. PB Munroe
    188. JS Kooner
    189. AR Tall
    190. RA Hegele
    191. JJ Kastelein
    192. EE Schadt
    193. JI Rotter
    194. E Boerwinkle
    195. DP Strachan
    196. V Mooser
    197. K Stefansson
    198. MP Reilly
    199. NJ Samani
    200. H Schunkert
    201. LA Cupples
    202. MS Sandhu
    203. PM Ridker
    204. DJ Rader
    205. CM van Duijn
    206. L Peltonen
    207. GR Abecasis
    208. M Boehnke
    209. S Kathiresan
    (2010)
    Nature 466:707–713.
    https://doi.org/10.1038/nature09270
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70

Decision letter

  1. Joel K Elmquist
    Reviewing Editor; University of Texas Southwestern Medical Center, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: this article was originally rejected after discussions between the reviewers, but the authors were invited to resubmit after an appeal against the decision.]

Thank you for submitting your work entitled "Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Mark McCarthy as the Senior Editor. One of the three reviewers has agreed to reveal her identity: Penelope Bonnen (Reviewer #1).

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

While all of the reviewers found your data to be of potential interest, they all raised substantive concerns regarding both study design and data presentation. The consensus was that addressing all of these issues would entail substantial work. As you may know, one of the goals of eLife is to avoid protracted cycles of "long-loop" revision, and this means that we try to avoid any recommendations for revision that are likely to take longer than a few weeks.

Reviewer #1:

1) Trans eQTL hotspots analysis is nicely done and the authors relate two well-supported and interesting examples of trans eQTLs. Two comments for this section. Firstly, the method used to identify regions of trans eQTLs was blind to LD; a simple distance (100 kb) based bin was used and the number of trans acting SNPs counted. The authors relate that a 'hotspot' of trans acting SNPs contained several SNPs in LD with each other. This approach is valid, but may skew results to identify regions with a larger number of trans acting SNPs because of the underlying LD structure. In this case LD helps to identify these regions. However, there may be regions with less LD but still strong trans acting SNPs; using the method these regions will be de-emphasized, not due to the strength of trans acting elements but rather due to LD architecture of the genome. The authors may wish to mention this. Second, poorly annotated regions of the genome are not devoid of functional elements and it may be a bit of an overstatement to say the eQTL SNPs in these regions are unlikely to be causative.

2) Which lincRNAs were included in the association testing between lincRNAs and HMDP phenotypes? Were both the 381 plus 129 novel tested? In particular it would be interesting to clarify if any of the 129 newly identified lincRNAs associate with HMDP phenotypes. This reviewer found this sentence a little confusing 'none of these lincRNA were previously reported in the hypothalmus'; does this mean they all belonged to the group of 129 novel lincRNAs? Were these 129 totally novel or simply not previously known to be expressed in the hypothalamus? Some clarification would be helpful and would emphasize the findings that are particular to this study.

Reviewer #1 (Additional data files and statistical comments):

Please include the top 500 transcripts that associate with a phenotype in supplemental materials listed along with transcript ID, phenotype, and p-value. These are referred to in the first paragraph of the subsection “Association of hypothalamic expression and phenotypes” and are a valuable work product of this study. As such readers may benefit from this information.

Reviewer #2:

This manuscript focuses on a detailed analysis of hypothalamic transcriptomes from 99 mouse strains to among many things identify eQTL, spicing eQTL and novel non-coding genes and their relationship to metabolic and cardiovascular traits. Many of the aspects of the relevance or tissue-specific findings pertaining to hypothalamus were not well clarified leading to a concern with the overall study design. Many of the analyses suffer from potential interpretation or technical flaws limiting my enthusiasm.

A key feature of the manuscript is the identification of numerous novel isoforms and genes. It is an exceptional feature that the authors are able to confirm their data using peptide data. However, most of these are pseudogenes when manual characterized against NCBI and UCSC. Is this simply a deficiency in the GENCODE M2 annotation? Further, can the authors discount mismapped reads from highly homologous sequences to these pseudogenes. Key aspects of this pipeline are not well presented and I am very skeptical of the "novelty" of the set of genes they have found or even their relevance to the hypothalamus in specific let alone any hypothalamus specific function. Criteria for tissue-specificity should further be better described.

The distal eQTL information provides no extra information. Indeed, the authors themselves describe it as most likely a local signal.

The correlation of ASE and eQTL is a technical feature that has been reported in several other papers. It shouldn't be listed as a major feature of the manuscript and would be better as supplemental information.

The authors derive their own ASE approach using DEseq. This is not a conventional approach for this tool. Other published ASE-specific software exists. Can the authors explain why they did it this way?

How correlated are ASE sites within a gene? Does this support site aggregation for a gene-centric approach?

Trans-eQTLs can be spurious due to the chance correlation of a genetic marker with technical factors such as batch or biological factors such as sex or BMI. The authors do not provide enough detail/experimentation as to make these effects believable as true trans-eQTLs compared to spurious correlations.

Furthermore, for RNA-seq data many 1-1 trans-eQTLs can be the product of mismapping and are clearly identified when looking at the distribution of reads across a gene model in a tool like IGV.

The comment on rs31703733 is very speculative. The authors should prove that this is the case otherwise, I am not sure there is anything that can be definitely said about trans-eQTLs in this manuscript.

Some of the statements in the manuscript could benefit from more formally described multiple testing correction or an FDR. For instance "35% of lincRNA.… significantly (p<1e-3) correlate to at least one phenotype in the HMDP". If I do some rudimentary correction assuming 150 traits, I am not sure this is a significant finding. This is an issue for the trait analysis of the novel transcripts too.

How believable are the non A-to-G modification? Is 30% an expected number?

Reviewer #3:

In this manuscript, Hasin-Brumshtein and coauthors provide a systems-level analysis of hypothalamic gene expression, its regulation by genetic elements, and its association with metabolic phenotypes. The hypothalamus is composed of diverse and distributed neuronal subpopulations that control multiple aspects of metabolic homeostasis. Here, the authors test the hypothesis that differences in hypothalamic gene expression in a hybrid mouse diversity panel (HMDP) may influence the sensitivity of particular strains to phenotypes associated with diet-induced obesity. The authors provide two specific examples from their data of transcripts (1 annotated and 1 lincRNA) whose expression correlates with a specific metabolic endpoint (Figures 4E and 5D). To increase confidence in the analysis, the authors should validate the hypothalamic expression differences between strains for a handful of these transcripts by in situ hybridization.

Overall, the lack of functional validation of candidate genes with phenotypic outcomes severely limits this as a resource, and greatly diminishes the impact of this study. Nonetheless, this study does begin offering a detailed mapping of genomic regions underlying differential gene expression and RNA processing events in the hypothalamus, which with additional work could enhance our functional understanding of the hypothalamus. However, in its present form, it is not accessible to the research community working in the hypothalamus.

Hasin-Brumshtein et al. characterize the expression of both annotated and novel genes, transcript variants, and lincRNAs from the hypothalamus of mice representing 99 strains from the HMDP. The effort invested in obtaining and analyzing both genomic and proteomic data at this scale is impressive. However, profiling data presented would benefit from a more thorough description of mouse husbandry conditions prior to tissue collection. Were the mice raised on the same chow diet used by Parks et al. (2013) to establish the phenotypic responses of mice in the HMDP to high fat, high sucrose diet? In fact, the authors' description in the Introduction suggests that RNA-Seq was performed on "mice… fed a high fat high sugar diet." If so, it is unclear if the observed expression differences across the mouse strains drive their unique phenotypic responses to dietary challenge or if these differences are a response to such a challenge. Similarly, since RNA-Seq was performed on as few as 1 subject per strain, the authors should elaborate on steps taken to mitigate other environmental factors that might have influenced gene expression in the mice. Why not increase the power of this study by increasing the numbers on the most commonly used strains with the greatest divergence in metabolic outcomes?

The authors do acknowledge one major weakness which is the pan-hypothalamic approach, thus reducing their ability to identify genotype/phenotype associations with transcripts (Discussion, sixth paragraph). However, the statement at the end of that paragraph: "transcription… was shown to be largely shared among tissues and cell types" is confusing. If this is true, would not the small differences in expression between subpopulations of neurons be the most important in determining their functional specificity and in turn contribute to phenotypic differences? Such small differences are the least likely to be identified by analyzing the whole hypothalamus. Alternatively, the differences could arise from altered synaptic connectivity, which would also be difficult to resolve by expression profiling. This is an important point that is never discussed.

In addition, given that the hypothalamus is a critical region for many important sex-dependent metabolic and behavioral outcomes due to signaling from sexually dimorphic subsets of neurons, one wants to know how these data compare between male and females.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for resubmitting your work entitled "Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes" for further consideration at eLife. We were receptive to the basis of your appeal, and the revised article has been favorably evaluated by Mark McCarthy (Senior editor), the original Reviewing editor, and by the more critical of the previous sets of reviewers.

All of us are positive about your revision. However, there are a couple of issues that remain to be addressed before acceptance, as outlined below:

1) The caveat the authors have added to the text is acceptable but the authors should look at Gencode M4 and see if there are overlaps with their genes. This annotation has been available for 2.5 years and is the most current mouse annotation file. M2 dates back to 2013. The rationale that annotation changes too rapidly ("month-to-month") is not supported by the Gencode release dates and since the novelty of these transcripts is a major feature, it should be put in line with the latest (2.5y old) annotation. One could simply overlap novel locations with bed files of latest annotations at a minimum.

2) Another issue that remains is the trans-eQTL results as any SNP correlated with a batch effect can appear as a trans-eQTL hotspot. The trans-eqtl on chr15 relevant to ion transport might be reflecting circadian rhythm. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2819050/

The authors should explore this issue in more detail or properly caveat it too.

https://doi.org/10.7554/eLife.15614.023

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

While all of the reviewers found your data to be of potential interest, they all raised substantive concerns regarding both study design and data presentation. The consensus was that addressing all of these issues would entail substantial work. As you may know, one of the goals of eLife is to avoid protracted cycles of "long-loop" revision, and this means that we try to avoid any recommendations for revision that are likely to take longer than a few weeks.

Reviewer #1:

1) Trans eQTL hotspots analysis is nicely done and the authors relate two well-supported and interesting examples of trans eQTLs. Two comments for this section. Firstly, the method used to identify regions of trans eQTLs was blind to LD; a simple distance (100 kb) based bin was used and the number of trans acting SNPs counted. The authors relate that a 'hotspot' of trans acting SNPs contained several SNPs in LD with each other. This approach is valid, but may skew results to identify regions with a larger number of trans acting SNPs because of the underlying LD structure. In this case LD helps to identify these regions. However, there may be regions with less LD but still strong trans acting SNPs; using the method these regions will be de-emphasized, not due to the strength of trans acting elements but rather due to LD architecture of the genome. The authors may wish to mention this. Second, poorly annotated regions of the genome are not devoid of functional elements and it may be a bit of an overstatement to say the eQTL SNPs in these regions are unlikely to be causative.

We are glad that reviewer found our trans eQTL analysis interesting. Regarding the LD comments, we agree that stronger and longer LD will increase the number of SNPs associated in trans, however it is not likely to increase the number of genes associated with each bin, since essentially linked SNPs will be capturing the same interactions. In our analysis (Figure 4A) we counted the number of different genes associated with any SNP in the bin, and not (as reviewer seems to suggest) the number of trans acting SNPs, thus we believe LD does not significantly impact our identification of hotspots. We are sorry for this confusion, and rewrote the relevant sentences to better clarify our approach (subsection “Genetic regulation of gene expression”), and made it clearer in the Discussion as well.

2) Which lincRNAs were included in the association testing between lincRNAs and HMDP phenotypes? Were both the 381 plus 129 novel tested? In particular it would be interesting to clarify if any of the 129 newly identified lincRNAs associate with HMDP phenotypes. This reviewer found this sentence a little confusing 'none of these lincRNA were previously reported in the hypothalmus'; does this mean they all belonged to the group of 129 novel lincRNAs? Were these 129 totally novel or simply not previously known to be expressed in the hypothalamus? Some clarification would be helpful and would emphasize the findings that are particular to this study.

We thank the reviewer for the comments, and have rewritten the section to clarify these points (subsection “Long non-coding RNAs in the mouse hypothalamus”).

Reviewer #1 (Additional data files and statistical comments):

Please include the top 500 transcripts that associate with a phenotype in supplemental materials listed along with transcript ID, phenotype, and p-value. These are referred to in the first paragraph of the subsection “Association of hypothalamic expression and phenotypes” and are a valuable work product of this study. As such readers may benefit from this information.

We have added the requested file to the supplementary material (Supplementary file 5).

Reviewer #2:

This manuscript focuses on a detailed analysis of hypothalamic transcriptomes from 99 mouse strains to among many things identify eQTL, spicing eQTL and novel non-coding genes and their relationship to metabolic and cardiovascular traits. Many of the aspects of the relevance or tissue-specific findings pertaining to hypothalamus were not well clarified leading to a concern with the overall study design. Many of the analyses suffer from potential interpretation or technical flaws limiting my enthusiasm.

A key feature of the manuscript is the identification of numerous novel isoforms and genes. It is an exceptional feature that the authors are able to confirm their data using peptide data. However, most of these are pseudogenes when manual characterized against NCBI and UCSC. Is this simply a deficiency in the GENCODE M2 annotation? Further, can the authors discount mismapped reads from highly homologous sequences to these pseudogenes. Key aspects of this pipeline are not well presented and I am very skeptical of the "novelty" of the set of genes they have found or even their relevance to the hypothalamus in specific let alone any hypothalamus specific function. Criteria for tissue-specificity should further be better described.

We thank the reviewer for this comment. In the manuscript we present several types of analysis suggesting that the genomic properties of the novel genes, as identified compared to GENCODE M2, are similar to the genomic properties of known non coding transcripts. Specifically, distributions of their potential reading frame length and their splicing pattern indicate that for the most part the novel transcripts would be expected to be non coding and therefore not expected in the peptide data. Thus we did not find it surprising that the very few peptides matching exclusively novel genes can be attributed to pseudogenes. In our opinion this does not imply that the majority of novel transcribed genes are pseudogenes rather that they are probably not translated. We have made this point more clearly in the revised manuscript (subsection “Hypothalamic gene expression and proteomic data reveals multiple new isoforms and novel genes”, third paragraph).

Regarding the reviewers question whether this presents a deficiency in M2 annotation, we would like to add that unlike genome sequence versions, genome annotations tend to develop quite rapidly. For some time GENCODE has released a new set of annotations for the mouse genome every few months. It is impractical to reanalyze the entire dataset every time a new annotation is released. We therefore focused our analysis on M2, which was the annotation we used for our genome indexing and read mapping.

The reviewer raised a question as to whether the expression of novel genes can be attributed to multi-mapped reads. We addressed that question by re-estimating abundances in 3 C57BL/6 samples, using uniquely mapped reads only. Indeed, some of the novel genes (and known genes) show low uniqueness. However, the peptide data supports expression of both uniquely mapped genes and those that arise from multimapping. This observation indicates that a high percentage of multimapping does not necessarily equate to an artifact. We therefore added uniqueness of mapping as an annotation of novel genes and expanded the relevant text to include this analysis (subsection “Hypothalamic gene expression and proteomic data reveals multiple new isoforms and novel genes”, last paragraph and Supplementary file 1).

We agree that tissue specificity of these transcripts cannot be defined based on our data alone, and intend to suggest it as one of the possible explanations. We thank the reviewer for raising this concern, and have rephrased that part of the manuscript to better reflect other potential explanations (Results).

We also thank the reviewer for pointing out that the pipeline is not described in sufficient detail. We have now added a more detailed description, supplementing the already described software versions and specific command lines in the Materials and methods.

The distal eQTL information provides no extra information. Indeed, the authors themselves describe it as most likely a local signal.

We agree with the reviewer that distal eQTLs are redundant to local. Distal eQTLs are only mentioned in our manuscript for the sake of completeness (how we selected the eQTLs for analysis). The distal loci were completely excluded from any QTL analysis. We now made this point clearer (Results).

The correlation of ASE and eQTL is a technical feature that has been reported in several other papers. It shouldn't be listed as a major feature of the manuscript and would be better as supplemental information.

The authors derive their own ASE approach using DEseq. This is not a conventional approach for this tool. Other published ASE-specific software exists. Can the authors explain why they did it this way?

How correlated are ASE sites within a gene? Does this support site aggregation for a gene-centric approach?

We agree with the reviewer that ASE is secondary to our results and have moved the ASE analysis to supplementary material. Concerning our choice of method for ASE analysis, we would like to point out that while we indeed created this approach to ASE, the method was not developed for this study and is not novel. In fact, we published this approach in BMC Genomics (Hasin-Brumshtein et al. 2014), with full details and extensive analysis of concordance of ASE within biologically relevant units (exons, transcripts, and genes). The paper also showed that aggregation of ASE signal over haplotype improves specificity of the results (as compared to single SNP). Therefore, since this approach was already developed in our laboratory, published in peer reviewed journal and readily accessible, we chose to use it for the ASE analysis in this paper as well.

Trans-eQTLs can be spurious due to the chance correlation of a genetic marker with technical factors such as batch or biological factors such as sex or BMI. The authors do not provide enough detail/experimentation as to make these effects believable as true trans-eQTLs compared to spurious correlations.

Furthermore, for RNA-seq data many 1-1 trans-eQTLs can be the product of mismapping and are clearly identified when looking at the distribution of reads across a gene model in a tool like IGV.

The comment on rs31703733 is very speculative. The authors should prove that this is the case otherwise, I am not sure there is anything that can be definitely said about trans-eQTLs in this manuscript.

We concur that many of the 1-1 trans interactions may represent spurious results, even when stringent thresholds are used. However, trans eQTL hotspots are much less likely to represent a technical artifact, or a spurious association since that possibility would require the coincidence of multiple artifacts in non-random manner. Thus we only discuss in depth the two trans eQTL hotspots, and do not consider any 1-1 trans eQTLs.

Mapping artifacts are not generally random and might result in false positive signals. Indeed, as the reviewer suggested, it is possible to inspect any given gene for mapping artifacts in IGV or other visualization tools. However, we find it impractical to go manually through the 900 genes associated with one of the 2 hotspots in this way. Thus to address this issue, we assessed whether mapping artifacts or multi mappers significantly contribute to expression levels of the genes in trans eQTL hotspots. To this end we re-estimated the expression of all transcripts in 3 samples, using only high quality mapping scores (mapping quality =255, corresponding to the highest mapping score and unique mapping). Expression levels estimated by this procedure did not change much (correlation values were >0.96 for all genes) showing that mapping artifacts are not likely to account for the variation in expression levels of the trans eQTL hotspots.

The reviewer raises an important concern about potential confounding factors such as sex, BMI and batch effects. First we would like to point out that all of the mice used in this study were males, of the same age and treated with exactly the same protocol. Thus we consider sex age and treatment not to be major confounders of trans-eQTL mapping. Batch effects can arise from two sources in our data, one is the sequencing process, and the other is mice housing conditions. To minimize the batch effects of sequencing we used a round robin design, where we distributed every sample over 3-4 lanes of sequencing, and biological replicates were largely distributed over different lanes, such that no individual strain data was the result of a single lane sequencing. While ideally we would like to pool all samples into one library and sequence that library as many times as need to achieve desired coverage, this was not technically possible because the number of available barcodes is smaller than the number of samples. The other source of batch effects may arise from mouse housing conditions, although great care was taken to match these as well as possible. Still, mice are housed in cages by strain, and work load in a large study like this is invariably distributed over a period of time, thus not all cages are treated on exactly the same days. These factors typically are considered minor in mouse eQTL studies compared to the effects of genetic variance on gene expression, and cannot be fully addressed in a practical manner for eQTL mapping.

Finally the reviewer points out that location of rs31703733 in an enhancer specific to neuronal cells is not strong evidence for its potentially causative role in changes in gene expression. We agree that this is a weak evidence, however we believe that combining several lines of evidence is an important aspect of using our data as resource. We therefore did not remove that point, but rewrote the Results and Discussion in a more reserved manner to convey the issues more clearly (subsection “Trans eQTL hotspots”, last paragraph).

Some of the statements in the manuscript could benefit from more formally described multiple testing correction or an FDR. For instance "35% of lincRNA.… significantly (p<1e-3) correlate to at least one phenotype in the HMDP". If I do some rudimentary correction assuming 150 traits, I am not sure this is a significant finding. This is an issue for the trait analysis of the novel transcripts too.

We thank the reviewer for pointing out the need for clarification. Association analysis was performed using linear mixed models (Fast-LMM, Lippert et al., 2011), which corrects for the population structure in our data. Statistical thresholds used in this work were determined as 5% FDR by permutations and modeling in previous eQTL and association studies conducted within HMDP (Parks et al. 2015, Bennet et al. 2010). We now added this point explicitly to the Results section to better explain our choice. Importantly, since many of the metabolic phenotypes are not independent, and are significantly correlated, the rudimentary approach suggested by the reviewer would be overly stringent.

How believable are the non A-to-G modification? Is 30% an expected number?

A-to-G modifications and C-to-T modifications (another canonical type of editing catalyzed by APOBEC), together make up ~78% of the detected RNA-DNA differences (RDD) when requiring a minimum of 5 total and 3 edited reads covering the editing position.

Since the paired-end RNA-seq reads are non-strand specific and we used known gene annotations to assign strand, it is likely that some reads originating from un-annotated genes were assigned to a known gene located on the opposite strand. In this case, A-to-G editing events would be identified as T-to-C mismatches, which make up ~8% of the RDD sites and represent the second largest type of nucleotide conversions. Similarly, G-to-A modifications (~6% of sites) could potentially reflect C-to-T editing due to reads assigned to the opposite strand. Together, the four types of modifications (A-to-G, C-to-T, T-to-C, G-to-A) constitute ~92% of all detected RDDs.

However, we cannot exclude the possibility that a few of the non-A-to-G/C-to-T mismatches are indeed SNPs that were not captured in the genotyping data. About 60% of the non-A-to-G and non-C-to-T RDDs were identified in only 1 mouse strain, while only 16% were identified in more than 5 strains. As a comparison, ~44% of A-to-G editing sites were identified in only 1 and ~26% in more than 5 mouse strains.

Importantly, since many of the non-canonical editing sites are identified in only one strain, the ratio of A-to-G editing sites per strain is much larger than the overall mean ratio. Per strain it ranges between 76-87% and is on average 81%.

Generally, the fraction of A-to-G editing sites in mouse is observed to be lower than in humans (mouse: 52-56% Lagarrigue et al., Genetics 2013; mouse: ~60% Gu et al., PLoS One 2012; mouse 96% vs. human 95-99% Porath et al., Nat Commun 2014; mouse 15% vs. human 25% Eisenberg et al., Trends Genet 2005), which may partly be explained by the lower density of Alu elements in the mouse genome (Neeman et al. RNA 2006).

Altogether, we think that the fractions of A-to-G editing sites among all RDDs in our study are as expected and in agreement with previous observations.

Reviewer #3:

In this manuscript, Hasin-Brumshtein and coauthors provide a systems-level analysis of hypothalamic gene expression, its regulation by genetic elements, and its association with metabolic phenotypes. The hypothalamus is composed of diverse and distributed neuronal subpopulations that control multiple aspects of metabolic homeostasis. Here, the authors test the hypothesis that differences in hypothalamic gene expression in a hybrid mouse diversity panel (HMDP) may influence the sensitivity of particular strains to phenotypes associated with diet-induced obesity. The authors provide two specific examples from their data of transcripts (1 annotated and 1 lincRNA) whose expression correlates with a specific metabolic endpoint (Figures 4E and 5D). To increase confidence in the analysis, the authors should validate the hypothalamic expression differences between strains for a handful of these transcripts by in situ hybridization.

Overall, the lack of functional validation of candidate genes with phenotypic outcomes severely limits this as a resource, and greatly diminishes the impact of this study. Nonetheless, this study does begin offering a detailed mapping of genomic regions underlying differential gene expression and RNA processing events in the hypothalamus, which with additional work could enhance our functional understanding of the hypothalamus. However, in its present form, it is not accessible to the research community working in the hypothalamus.

Hasin-Brumshtein et al. characterize the expression of both annotated and novel genes, transcript variants, and lincRNAs from the hypothalamus of mice representing 99 strains from the HMDP. The effort invested in obtaining and analyzing both genomic and proteomic data at this scale is impressive. However, profiling data presented would benefit from a more thorough description of mouse husbandry conditions prior to tissue collection. Were the mice raised on the same chow diet used by Parks et al. (2013) to establish the phenotypic responses of mice in the HMDP to high fat, high sucrose diet? In fact, the authors' description in the Introduction suggests that RNA-Seq was performed on "mice… fed a high fat high sugar diet." If so, it is unclear if the observed expression differences across the mouse strains drive their unique phenotypic responses to dietary challenge or if these differences are a response to such a challenge. Similarly, since RNA-Seq was performed on as few as 1 subject per strain, the authors should elaborate on steps taken to mitigate other environmental factors that might have influenced gene expression in the mice. Why not increase the power of this study by increasing the numbers on the most commonly used strains with the greatest divergence in metabolic outcomes?

The authors do acknowledge one major weakness which is the pan-hypothalamic approach, thus reducing their ability to identify genotype/phenotype associations with transcripts (Discussion, sixth paragraph). However, the statement at the end of that paragraph: "transcription… was shown to be largely shared among tissues and cell types" is confusing. If this is true, would not the small differences in expression between subpopulations of neurons be the most important in determining their functional specificity and in turn contribute to phenotypic differences? Such small differences are the least likely to be identified by analyzing the whole hypothalamus. Alternatively, the differences could arise from altered synaptic connectivity, which would also be difficult to resolve by expression profiling. This is an important point that is never discussed.

In addition, given that the hypothalamus is a critical region for many important sex-dependent metabolic and behavioral outcomes due to signaling from sexually dimorphic subsets of neurons, one wants to know how these data compare between male and females.

We thank the reviewer for thoughtful and thorough assessment of our manuscript. This reviewer raised several important issues which we would like to address:

1) We respectfully want clarify what seems to be a conceptual difference between our and reviewer’s view of this work. Particularly, the reviewer seems to assume that the main focus of our effort should be to identify novel transcripts the expression of which is related to the metabolic phenotypes characterized in the HMDP. However, in our view, the main goal of the manuscript is to provide one of a kind resource for researchers that focus on molecular studies of the hypothalamus, as well as system genetics of metabolism. Our studies provide comprehensive characterization of the hypothalamic transcriptome and examples of how this resource can facilitate testing of existing hypothesis, or generate new ones. The study is aimed at exploratory analysis of hypothalamic transcriptome in the mouse – and we hope that it can provide the basis for forming the type of hypothesis the reviewer wishes us to test. We sincerely regret that it was not made clear enough in our manuscript, and have revised the introduction and discussion to reflect this point.

A major critique from reviewer 3 focuses on a lack of functional validation of our presented examples. We completely agree with the reviewer that an association of gene expression with a particular phenotype does not necessarily imply that those changes play any causative role in the development of the phenotype. Also as the reviewer points out, these expression changes could be, or are even likely to be, reactive. This is a general limitation of the many studies that associate gene expression with phenotypes, and is not unique to this work. Nevertheless, previous studies in humans and mice show that identifying expression changes which correlate with a phenotype can often lead to molecular insights and novel hypothesis that can then be tested in follow up experiments. The specific cases of gene-phenotype associations presented in our manuscript were intended as examples of the utility of our data, not as definitive evidence for novel genes regulating these two phenotypes. As such, we believe that the experimental confirmation of these associations is beyond the scope of this study and should be addressed separately. Importantly, since experimental validation is often time consuming and expensive, we suggest the use of multiple lines of evidence when choosing gene targets for experimental confirmation, rather than relying on the association of gene expression alone. To augment the usability of our results, we have added a section discussing the regulation and association of hypothalamic genes already experimentally implicated in metabolic phenotypes by previous studies in mice, or that are markers of particular processes. We identify the regulatory pathways of some of these genes, and show associations to traits that reflect known biology as well as potentially novel aspects.

2) The reviewer raises an important question of tissue heterogeneity in the hypothalamus. The hypothalamus contains many distinct nuclei, that are central to regulation of different phenotypes. Indeed, as stated in our Discussion, using the entire hypothalamus for gene expression analysis is one of the major limitations of our study and is dictated by the physical impracticality of dissecting specific nuclei or neurons in hundreds of mice. Yet, we recognize that this is an important limitation, and to better assess how heterogeneity is reflected in our data, we added a section discussing the expression of hypothalamic cell markers, and their relative abundance across strains. We show that well established markers of hypothalamic neurons (such as Agrp, Sf1, or Pomc genes), as well as glial markers are highly expressed in our data and the variation and genetic regulation of these genes is captured. We further show that despite using whole hypothalamus, we are able to recapitulate the expected relations between hypothalamic transcripts and metabolic traits.

3) Fourth, the reviewer raises a question about whether using only one mouse per strain in 2 of the 99 strains (AxB-12/PgnJ and I/LnJ), is likely to affect the accuracy of our results. We believe that having 2 strains with only one animal out of 99 is highly unlikely to bias our results. However, to address this concern, we used a subset of transcripts to remap eQTLs using only one expression value per strain. This downsampling did not significantly alter our eQTL results. The reviewer also suggests adding additional mice from specific phenotypically diverse inbred strains to enhance the power of the study. We appreciate this suggestion, however our analysis shows that inter-strain variability of gene expression is considerably larger than intra-strain variability. Therefore, increasing the number of individual mice per strain is not likely to increase the power of genetic mapping, as it would not increase the genetic diversity of our sample.

4) The reviewer asked for more details on mouse husbandry conditions, and the relation between our sample and mice used in the Parks et al., 2013 paper. We are grateful for this comment. The mice used in our study are a subset of mice used in Parks et al. study, although in our work we present analysis of a different tissue (hypothalamus). Therefore, the husbandry conditions, the diet and the obtained phenotypes are exactly as were described in Parks et al. 2013 study. To avoid confusion, we have repeated this point in the Materials and methods section, and also made it more clear in the Introduction.

[Editors’ note: the author responses to the re-review follow.]

Thank you for resubmitting your work entitled "Hypothalamic transcriptomes of 99 mouse strains reveal trans eQTL hotspots, splicing QTLs and novel non-coding genes" for further consideration at eLife. We were receptive to the basis of your appeal, and the revised article has been favorably evaluated by Mark McCarthy (Senior editor), the original Reviewing editor, and by the more critical of the previous sets of reviewers.

All of us are positive about your revision. However, there are a couple of issues that remain to be addressed before acceptance, as outlined below:

1) The caveat the authors have added to the text is acceptable but the authors should look at Gencode M4 and see if there are overlaps with their genes. This annotation has been available for 2.5 years and is the most current mouse annotation file. M2 dates back to 2013. The rationale that annotation changes too rapidly ("month-to-month") is not supported by the Gencode release dates and since the novelty of these transcripts is a major feature, it should be put in line with the latest (2.5y old) annotation. One could simply overlap novel locations with bed files of latest annotations at a minimum.

We agree with the reviewer that M2 is quite outdated, and as suggested compared the novelty of our data to the latest released annotation (GENECODE M10, released January 2016), and find that 50% of our novel genes are novel. Please find the relevant additions in the second paragraph of the subsection “Hypothalamic gene expression and proteomic data reveals multiple new isoforms and novel genes”.

2) Another issue that remains is the trans-eQTL results as any SNP correlated with a batch effect can appear as a trans-eQTL hotspot. The trans-eqtl on chr15 relevant to ion transport might be reflecting circadian rhythm. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2819050/

The authors should explore this issue in more detail or properly caveat it too.

We thank the reviewers for these points and agree that we are not able to exclude completely the possibility of unknown batch effects, especially if those would arise from random environmental rather than technical aspects and thus not recorded. However, in our opinion, such factors are unlikely to cause batch effects in a tissue specific manner – thus we revisited the liver and adipose eQTL mapping data, from Parks BW 2013, that used the same mice. We do not observe these trans eQTL hotspots in either adipose or liver data. Nonetheless, we now discuss the possibility of this caveat at length in the fourth paragraph of the Discussion section. We also completely agree that the trans effects acting on ion transport, do not necessarily directly cause the associated changes in metabolic profile. These associations are complex, and can be mediated by variety of physiological processes, and indeed circadian rhythm plays a major role in metabolic homeostasis. We thank the reviewer for this comment, and clarified this point in the Discussion (third paragraph).

https://doi.org/10.7554/eLife.15614.024

Article and author information

Author details

  1. Yehudit Hasin-Brumshtein

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
    3. Department of Microbiology, University of California, Los Angeles, Los Angeles, United states
    4. Department of Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, United States
    Contribution
    YH-B, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    1. yehudit.hasin@gmail.com
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0001-7528-603X
  2. Arshad H Khan

    1. Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    AHK, Conception and design, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  3. Farhad Hormozdiari

    1. Department of Computer Science, University of California, Los Angeles, Los Angeles, United States
    Contribution
    FH, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  4. Calvin Pan

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
    Contribution
    CP, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  5. Brian W Parks

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
    3. Department of Microbiology, University of California, Los Angeles, Los Angeles, United states
    4. Department of Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, United States
    Contribution
    BWP, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  6. Vladislav A Petyuk

    1. Biological Sciences Division, Pacific Northwest National Laboratory, Richland, United States
    Contribution
    VAP, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  7. Paul D Piehowski

    1. Biological Sciences Division, Pacific Northwest National Laboratory, Richland, United States
    Contribution
    PDP, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  8. Anneke Brümmer

    1. Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    AB, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  9. Matteo Pellegrini

    1. Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    MP, Conception and design
    Competing interests
    The authors declare that no competing interests exist.
  10. Xinshu Xiao

    1. Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    XX, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  11. Eleazar Eskin

    1. Department of Computer Science, University of California, Los Angeles, Los Angeles, United States
    Contribution
    EE, Conception and design, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  12. Richard D Smith

    1. Biological Sciences Division, Pacific Northwest National Laboratory, Richland, United States
    Contribution
    RDS, Conception and design, Acquisition of data
    Competing interests
    The authors declare that no competing interests exist.
  13. Aldons J Lusis

    1. Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
    2. David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, United States
    3. Department of Microbiology, University of California, Los Angeles, Los Angeles, United states
    4. Department of Immunology and Molecular Genetics, University of California, Los Angeles, Los Angeles, United States
    Contribution
    AJL, Conception and design, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  14. Desmond J Smith

    1. Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States
    Contribution
    DJS, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.

Funding

National Institutes of Health (R01GM098273)

  • Yehudit Hasin-Brumshtein
  • Arshad H Khan
  • Calvin Pan
  • Vladislav A Petyuk
  • Paul D Piehowski
  • Richard D Smith
  • Aldons J Lusis
  • Desmond J Smith

National Institute of General Medical Sciences (GM103493)

  • Vladislav A Petyuk
  • Paul D Piehowski
  • Richard D Smith

W.R. Wiley Environmental Molecular Science Laboratory (DE-AC05-76RLO-1830)

  • Vladislav A Petyuk
  • Paul D Piehowski
  • Richard D Smith

National Institutes of Health (R01HG006264)

  • Anneke Brümmer
  • Xinshu Xiao

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Animal experimentation: The animal protocol for the study was approved by the Institutional Animal Care and Use Committee (IACUC) at University of California, Los Angeles.

Reviewing Editor

  1. Joel K Elmquist, Reviewing Editor, University of Texas Southwestern Medical Center, United States

Publication history

  1. Received: February 27, 2016
  2. Accepted: September 12, 2016
  3. Accepted Manuscript published: September 13, 2016 (version 1)
  4. Version of Record published: October 6, 2016 (version 2)

Copyright

© 2016, Hasin-Brumshtein et al

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,051
    Page views
  • 301
    Downloads
  • 1
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Genomics and Evolutionary Biology
    2. Microbiology and Infectious Disease
    Karthik Hullahalli et al.
    Research Article