Selecting the most appropriate time points to profile in high-throughput studies

  1. Michael Kleyman
  2. Emre Sefer
  3. Teodora Nicola
  4. Celia Espinoza
  5. Divya Chhabra
  6. James S Hagood
  7. Naftali Kaminski
  8. Namasivayam Ambalavanan
  9. Ziv Bar-Joseph (corresponding author)
  1. School of Computer Science, Carnegie Mellon University, United States
  2. University of Alabama at Birmingham, United States
  3. University of California, United States
  4. Rady Children’s Hospital San Diego, United States
  5. School of Medicine, Yale University, United States
6 figures, 5 tables and 2 additional files


Figure 1 with 3 supplements
The TPS method.

Clockwise from top left. Given a dense sampling of a selected subset of genes (a), we select an initial set of points (b) using the initialization method described in the text. Next, we fit a spline to the selected points for each gene (c) and evaluate the error on all other points. A greedy search (d) iteratively removes and adds points to improve the fit to the held-out points, yielding the final set of points (e). Curves are then reconstructed for all genes (f), and the overall error is computed and compared with the theoretical limit (noise) to determine how well the selected number of points fits the data.
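The selection loop in panels (b) through (e) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TPS implementation: it uses a natural cubic spline written directly in NumPy, replaces the initialization methods described in the text with a hypothetical roughly uniform start, keeps the first and last time points fixed, and accepts a swap (remove one point, add another) only when it lowers the mean squared error on the held-out points. The function names `natural_cubic_spline`, `holdout_error`, and `select_time_points` are hypothetical, and the data are arranged with one row per time point and one column per gene.

```python
import numpy as np

def natural_cubic_spline(x, y, xq):
    """Evaluate a natural cubic spline through (x, y) at the query points xq.

    x: (n,) strictly increasing knots; y: (n, g) values, one column per gene.
    Returns an array of shape (len(xq), g).
    """
    n = len(x)
    h = np.diff(x)
    # Tridiagonal system for the second derivatives M; natural ends: M[0] = M[-1] = 0.
    A = np.zeros((n, n))
    rhs = np.zeros((n, y.shape[1]))
    A[0, 0] = A[-1, -1] = 1.0
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    # Interval index of each query point (clipped, so points outside the knots extrapolate).
    j = np.clip(np.searchsorted(x, xq) - 1, 0, n - 2)
    hj = h[j][:, None]
    a = (x[j + 1] - xq)[:, None]
    b = (xq - x[j])[:, None]
    return ((M[j] * a**3 + M[j + 1] * b**3) / (6 * hj)
            + (y[j] / hj - M[j] * hj / 6) * a
            + (y[j + 1] / hj - M[j + 1] * hj / 6) * b)

def holdout_error(times, data, selected):
    """MSE of the spline through the selected points, evaluated on all other points."""
    sel = sorted(selected)
    rest = [i for i in range(len(times)) if i not in selected]
    pred = natural_cubic_spline(times[sel], data[sel], times[rest])
    return float(np.mean((pred - data[rest]) ** 2))

def select_time_points(times, data, k, max_iter=100):
    """Greedy swap search for k time points; the first and last points stay fixed."""
    n = len(times)
    # Hypothetical initialization: k roughly evenly spaced indices.
    selected = frozenset(int(i) for i in np.linspace(0, n - 1, k).round())
    err = holdout_error(times, data, selected)
    for _ in range(max_iter):
        best = None
        # Try every swap of one interior selected point for one unselected point.
        for out in selected - {0, n - 1}:
            for new in set(range(n)) - selected:
                cand = (selected - {out}) | {new}
                cand_err = holdout_error(times, data, cand)
                if cand_err < err and (best is None or cand_err < best[1]):
                    best = (cand, cand_err)
        if best is None:  # no swap lowers the held-out error: local optimum
            break
        selected, err = best
    return sorted(selected), err

# Hypothetical example: 15 densely sampled days, 3 genes.
rng = np.random.default_rng(0)
times = np.linspace(0.5, 28.0, 15)
data = np.sin(times[:, None] / np.array([3.0, 5.0, 8.0])) + 0.05 * rng.standard_normal((15, 3))
points, err = select_time_points(times, data, k=6)
print(times[points], err)
```

Swapping rather than only adding points lets the search move away from a poor initial choice while keeping the number of points fixed; like any greedy search, it stops at a local optimum.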
Figure 1—figure supplement 1
Comparison of performance between TPS and a previous method (Singh et al., 2005).

The method of Singh et al. (2005) used active learning based on dynamic programming.
Figure 1—figure supplement 2
Comparison of initialization methods by their final error.

The points labeled metricA, metricB, and metricC all use the dynamic initialization approaches, while the max distance points use static initialization.
Figure 1—figure supplement 3
Final error of each initialization method compared with selecting random points.
Figure 2 with 4 supplements
Performance of TPS for different numbers of selected points.

Error of TPS variants compared with uniform selection of points and with noise. Absolute difference: greedy iterative addition with absolute difference initialization (Algorithm 1, Appendix Methods). Simulated annealing: iterative search using simulated annealing with absolute difference initialization. Weighted error: selection based on cluster errors rather than individual gene errors. See Appendix Methods for details.
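The simulated annealing variant can be sketched as follows. This is an illustration under assumptions, not the authors' code: for brevity it scores candidate point sets with per-gene linear interpolation (`np.interp`) instead of splines, starts from a hypothetical evenly spaced set rather than the absolute difference initialization, and the temperature schedule (`temp`, `cooling`), step count, and function names are arbitrary, hypothetical choices. It needs at least three selected points so that an interior point can be swapped, and fewer points than time points.

```python
import numpy as np

def linear_holdout_error(times, data, selected):
    """MSE on unselected time points of a per-gene linear interpolant.

    times: (n,) increasing; data: (n, g), one column per gene.
    """
    sel = sorted(selected)
    rest = [i for i in range(len(times)) if i not in selected]
    pred = np.column_stack([np.interp(times[rest], times[sel], data[sel, g])
                            for g in range(data.shape[1])])
    return float(np.mean((pred - data[rest]) ** 2))

def anneal_time_points(times, data, k, n_steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Simulated annealing over k-point subsets that always keep both endpoints."""
    rng = np.random.default_rng(seed)
    n = len(times)
    current = frozenset(int(i) for i in np.linspace(0, n - 1, k).round())
    current_err = linear_holdout_error(times, data, current)
    best, best_err = current, current_err
    for _ in range(n_steps):
        # Propose swapping one interior selected point for one unselected point.
        out = int(rng.choice(sorted(current - {0, n - 1})))
        new = int(rng.choice(sorted(set(range(n)) - current)))
        cand = (current - {out}) | {new}
        cand_err = linear_holdout_error(times, data, cand)
        # Metropolis rule: always accept improvements, sometimes accept worse sets.
        if cand_err < current_err or rng.random() < np.exp((current_err - cand_err) / temp):
            current, current_err = cand, cand_err
            if current_err < best_err:
                best, best_err = current, current_err
        temp *= cooling  # geometric cooling schedule
    return sorted(best), best_err

# Hypothetical example: 15 densely sampled days, 3 genes with transient peaks.
rng = np.random.default_rng(1)
times = np.linspace(0.5, 28.0, 15)
data = (np.exp(-((times[:, None] - np.array([4.0, 10.0, 20.0])) / 3.0) ** 2)
        + 0.02 * rng.standard_normal((15, 3)))
points, err = anneal_time_points(times, data, k=6)
print(times[points], err)
```

Unlike the greedy search, annealing occasionally accepts a worse set early on, which can help it escape local optima; the best set seen is returned.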
Figure 2—figure supplement 1
Average noise at each mRNA expression time point.
Figure 2—figure supplement 2
Comparison of error for the TPS algorithm on the full data, on 75% random data, and for random points chosen on the full data.

The 75% random data were created by replacing 75% of the gene time series with random time series drawn from a Gaussian distribution with mean 0 and standard deviation equal to the noise of the original data.
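A minimal sketch of how such a control dataset could be generated, assuming the data are a NumPy array with one row per gene and one column per time point, and that the noise level is supplied either as a single value or as one value per time point (how the noise is estimated from the original data is not reproduced here). The function name `replace_with_noise` and its `seed` parameter are hypothetical.

```python
import numpy as np

def replace_with_noise(data, noise_sd, frac=0.75, seed=0):
    """Replace a random fraction of genes (rows) with Gaussian noise time series.

    data: (n_genes, n_times); noise_sd: scalar or (n_times,) standard deviations.
    Returns the perturbed copy and the indices of the replaced rows.
    """
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)  # np.array copies, so the input is untouched
    n_genes, n_times = out.shape
    n_replace = int(round(frac * n_genes))
    rows = rng.choice(n_genes, size=n_replace, replace=False)
    # Mean 0, standard deviation equal to the noise level of the original data.
    out[rows] = rng.normal(0.0, noise_sd, size=(n_replace, n_times))
    return out, rows

# Hypothetical example: 100 genes x 8 time points, noise level 0.3.
data = np.ones((100, 8))
perturbed, replaced = replace_with_noise(data, noise_sd=0.3)
print(len(replaced))
```

If TPS selects informative points, its error on the perturbed data should degrade toward that of random point selection, which is what this control is meant to probe.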
Figure 2—figure supplement 3
Comparison of TPS and piecewise linear fitting over genes (a) Pdgfra, (b) Eln, (c) Lrat.
Figure 2—figure supplement 4
Comparison of the reconstruction error when using the points selected by TPS and when using the same number of random points from the overall set of sampled points.
Figure 3 with 2 supplements
Reconstructed expression profiles for selected genes.

(a) Pdgfra, (b) Eln, (c) Inmt.
Figure 3—figure supplement 1
Expression profiles over several genes (a) Esr2, (b) Nme3, (c) Polr2a.
Figure 3—figure supplement 2
Reconstructed expression profiles using eight points for genes (a) Pdgfra, (b) Eln, (c) Inmt.
Figure 4 with 3 supplements
Performance of TPS on the miRNA data.

(a) TPS reconstruction error when the mRNA data are used to select time points for the miRNA experiments. Results of random and uniform selection, as well as the repeat noise error, are shown for comparison. The TPS variants shown are the same two presented in Figure 2. (b) Error of splines with points selected by training TPS on the miRNA data itself, using the maximum absolute difference initialization.
Figure 4—figure supplement 1
Observed and reconstructed expression profiles for miRNAs (a) mmu-miR-100, (b) mmu-miR-136, (c) mmu-miR-152, (d) mmu-miR-219.
Figure 4—figure supplement 2
8 stable miRNA clusters.
Figure 4—figure supplement 3
TPS performance on the proteomics data using different numbers of time points.

(a) Comparison of the reconstruction error when using the points selected by TPS, uniform selection of points, and the same number of random points from the overall set of sampled points. (b) Error comparisons of TPS to noise and to the various search and initialization options discussed in Methods.
Figure 5 with 2 supplements
Comparison of gene expression and methylation data for selected genes.

(a) Akt1, (b) Cdh11, (c) Tnc.
Figure 5—figure supplement 1
Reconstructed methylation profiles over several loci (chromosome, position) with corresponding genes.
Figure 5—figure supplement 2
Bootstrap analysis of the Pearson correlation r between expression and methylation data over eight time points for each gene.

Red circles show the Pearson correlation over all eight points; blue triangles show the Pearson correlation for each subset of seven points (each subset omits one time point).
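Because every subset of seven out of eight points omits exactly one time point, this analysis amounts to eight leave-one-out (jackknife-style) correlations per gene. A minimal NumPy sketch, with the hypothetical helper `leave_one_out_correlations` and made-up example values:

```python
import numpy as np

def leave_one_out_correlations(expr, meth):
    """Pearson r over all points, plus r for every subset that omits one point."""
    expr = np.asarray(expr, dtype=float)
    meth = np.asarray(meth, dtype=float)
    full = float(np.corrcoef(expr, meth)[0, 1])
    keep = ~np.eye(len(expr), dtype=bool)  # row i keeps every point except point i
    subsets = [float(np.corrcoef(expr[m], meth[m])[0, 1]) for m in keep]
    return full, subsets

# Hypothetical values for one gene at eight time points.
expr = [1.0, 1.4, 2.1, 2.0, 2.8, 3.5, 3.1, 4.0]
meth = [0.9, 0.8, 0.7, 0.75, 0.6, 0.5, 0.55, 0.4]
full, subsets = leave_one_out_correlations(expr, meth)
print(round(full, 3), [round(r, 3) for r in subsets])
```

Comparing `full` with the spread of `subsets` shows how strongly any single time point influences r for that gene.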
Figure 6 with 1 supplement
Comparison of TPS with sampling rates used in previous studies.

Dark green curves are the reconstructed profiles based on the points profiled by prior studies. Light green and red curves are based on the points selected by TPS. Even when the same number of points is used, TPS identifies key events for some genes that are missed by the phenotype-based sampling rates. Subfigures a, b, and c are piecewise linear fits over points 0.5, 7.0, 14.0, and 28.0; subfigures d, e, and f over points 0.5, 2.0, 14.0, and 28.0; and subfigures g, h, and i over points 0.5, 4.0, 7.0, 14.0, and 28.0.
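The piecewise linear reconstructions for the literature-based sampling schemes can be sketched as below, assuming each gene has a densely sampled profile that includes every knot of the scheme. The knot sets are the ones listed in this caption; the dense time grid, the example profile, and the function name `piecewise_linear_mse` are hypothetical.

```python
import numpy as np

# Sampling schemes from the caption (days), keyed by Figure 6 subfigures.
SCHEMES = {
    "a-c": [0.5, 7.0, 14.0, 28.0],
    "d-f": [0.5, 2.0, 14.0, 28.0],
    "g-i": [0.5, 4.0, 7.0, 14.0, 28.0],
}

def piecewise_linear_mse(times, profile, knots):
    """MSE of a piecewise linear fit through `knots` against a dense profile.

    times: increasing sampled days; profile: one gene's values at `times`;
    knots: subset of `times` kept as the sparse design.
    """
    times = np.asarray(times, dtype=float)
    profile = np.asarray(profile, dtype=float)
    knots = np.asarray(knots, dtype=float)
    idx = np.searchsorted(times, knots)
    if np.any(idx >= len(times)) or not np.allclose(times[idx], knots):
        raise ValueError("every knot must be one of the sampled time points")
    # Linear interpolation between the observed values at the knots.
    recon = np.interp(times, knots, profile[idx])
    return float(np.mean((recon - profile) ** 2))

# Hypothetical dense sampling and a transient peak near day 4.
times = np.array([0.5, 1, 2, 3, 4, 5, 6, 7, 9.5, 12, 14, 19, 24, 28], dtype=float)
profile = np.exp(-((times - 4.0) / 2.0) ** 2)
for name, knots in SCHEMES.items():
    print(name, piecewise_linear_mse(times, profile, knots))
```

A short-lived event that falls between two knots is flattened by the linear interpolant, which is how fixed, phenotype-based schedules can miss key events.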
Figure 6—figure supplement 1
Comparison of gene expression and protein abundance for selected gene protein pairs.

(a) Eln/P54320, (b) F13a1/Q8BH61, (c) Chil1/Q61362.


Table 1

Summary of prior high-throughput lung development studies.
Reference | Data types | Selected time points (days)
[Bonner et al., 2003] | mRNA expression | E9, E4, E17, 0, 7, 14, 28
[Melén et al., 2011] | mRNA expression | E16, E18, 0, 7, 14, 28
[Bhaskaran et al., 2009] | microRNA expression | E16, E19, E21, 0, 6, 14, 60
[Dong et al., 2011] | mRNA and microRNA expression | E12, E14, E16, 0, 2, 10
[Cox et al., 2007] | Protein expression levels | E12, E14, E18, 2, 14, 56
[Schulz et al., 2013] | mRNA and miRNA expression | 0, 4, 7, 14, 42
[Cormack et al., 2010] | mRNA expression | 0, 7, 14, adult
[Mager et al., 2007] | mRNA expression | E15, E17, E19, E21, 1, 14, 84
[Mariani et al., 2002] | mRNA expression | E18, 1, 4, 7, 10, 14, 21, adult

Appendix 2—table 1

List of genes used for the NanoString analysis and the rationale for their inclusion.
Ensembl gene ID | Accession number | Gene name | Rationale
ENSMUSG00000024130 | NM_001039581.2 | Abca3 | Alveolar Type II cell marker
ENSMUSG00000031378 | NM_007435.1 | Abcd1 | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000029802 | NM_011920.3 | Abcg2 | Mesenchymal cell marker
ENSMUSG00000035783 | NM_007392.3 | Acta2 | Fibroblast cell marker
ENSMUSG00000029580 | NM_007393.1 | Actb | Common house-keeping gene
ENSMUSG00000036040 | NM_029981.1 | Adamtsl2 | Altered DNA methylation during septation
ENSMUSG00000015452 | NM_007425.2 | Ager | Alveolar Type I cell marker
ENSMUSG00000001729 | NM_001165894.1 | Akt1 | Altered DNA methylation during septation
ENSMUSG00000053279 | NM_013467.3 | Aldh1a1 | Important for septation
ENSMUSG00000013584 | NM_009022.3 | Aldh1a2 | Potentially important for septation
ENSMUSG00000022244 | NM_008537.4 | Amacr | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000044217 | NM_009701.4 | Aqp5 | Alveolar Type I cell marker
ENSMUSG00000026576 | NM_009721.5 | Atp1b1 | Lung fluid clearance
ENSMUSG00000060802 | NM_009735.3 | B2m | Common house-keeping gene
ENSMUSG00000102037 | NM_009742.3 | Bcl2a1a | Apoptosis regulator
ENSMUSG00000056216 | NM_009884.3 | Cebpg | Important for lung development
ENSMUSG00000029084 | NM_007646.4 | Cd38 | Airway smooth muscle cell functional responses
ENSMUSG00000018774 | NM_009853.1 | Cd68 | Monocyte cell marker
ENSMUSG00000031673 | NM_009866.4 | Cdh1 | Epithelial cell marker
ENSMUSG00000064246 | NM_007695.2 | Chil1 | Monocyte cell marker
ENSMUSG00000040809 | NM_009892.1 | Chil3 | Increased during septation
ENSMUSG00000022512 | NM_016674.3 | Cldn1 | Tight junction protein
ENSMUSG00000070473 | NM_009902.4 | Cldn3 | Tight junction protein (mostly epithelial)
ENSMUSG00000041378 | NM_013805.4 | Cldn5 | Tight junction protein
ENSMUSG00000018569 | NM_016887.6 | Cldn7 | Tight junction protein (mostly epithelial)
ENSMUSG00000001506 | NM_007742.3 | Col1a1 | Fibroblast cell marker
ENSMUSG00000063063 | NM_009819.2 | Ctnna2 | Altered DNA methylation during septation
ENSMUSG00000031360 | NM_001168571.1 | Ctps2 | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000040856 | NM_010052.4 | Dlk1 | Decreased during septation
ENSMUSG00000020661 | NM_007872.4 | Dnmt3a | Altered DNA methylation during septation
ENSMUSG00000046179 | NM_001013368.5 | E2f8 | Altered DNA methylation during septation
ENSMUSG00000000303 | NM_009864.2 | Cdh1 | Epithelial cell marker
ENSMUSG00000020122 | NM_207655.2 | Egfr | Important for lung development
ENSMUSG00000029675 | NM_007925.3 | Eln | Altered DNA methylation during septation
ENSMUSG00000045394 | NM_008532.2 | Epcam | Epithelial cell marker
ENSMUSG00000052504 | NM_010140.3 | Epha3 | Involved in lung development
ENSMUSG00000028289 | NM_001122889.1 | Epha7 | Involved in lung cancer, potential role in development
ENSMUSG00000021055 | NM_010157.3 | Esr2 | Important regulator of multiple processes
ENSMUSG00000061731 | NM_010162.2 | Ext1 | Altered DNA methylation during septation
ENSMUSG00000039109 | NM_001166391.1 | F13a1 | Involved in lung injury, cancer
ENSMUSG00000057967 | NM_008005.1 | Fgf18 | Important for septation
ENSMUSG00000030849 | NM_010207.2 | Fgfr2 | Important regulator of multiple processes
ENSMUSG00000078302 | NM_008242.2 | Foxd1 | Pericyte cell marker
ENSMUSG00000042812 | NM_010426.1 | Foxf1 | Involved in lung development
ENSMUSG00000038402 | NM_010225.1 | Foxf2 | Altered DNA methylation during fibrosis
ENSMUSG00000001020 | NM_011311.1 | S100a4 | Fibroblast cell marker
ENSMUSG00000057666 | NM_001001303.1 | Gapdh | Common house-keeping gene
ENSMUSG00000005836 | NM_010258.3 | Gata6 | Important regulator of multiple processes
ENSMUSG00000029992 | NM_013528.3 | Gfpt1 | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000041624 | NM_001033322.2 | Gucy1a2 | Important for septation
ENSMUSG00000025534 | NM_010368.1 | Gusb | Common house-keeping gene
ENSMUSG00000021109 | NM_010431.2 | Hif1a | Hypoxia signaling
ENSMUSG00000058773 | NM_020034.1 | Hist1h1b | Decreased during septation
ENSMUSG00000061615 | NM_175660.3 | Hist1h2ab | Decreased during septation
ENSMUSG00000032126 | NM_013551.2 | Hmbs | Common house-keeping gene
ENSMUSG00000029919 | NM_019455.4 | Hpgds | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000025630 | NM_013556.2 | Hprt | Common house-keeping gene
ENSMUSG00000020053 | NM_001111274.1 | Igf1 | Regulating miRNA altered during septation
ENSMUSG00000020427 | NM_008343.2 | Igfbp3 | Altered DNA methylation during septation, fibrosis
ENSMUSG00000003477 | NM_009349.3 | Inmt | Increased during septation
ENSMUSG00000026768 | NM_001001309.2 | Itga8 | Involved in lung development
ENSMUSG00000040029 | NM_001081113.1 | Ipo8 | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000030786 | NM_001082960.1 | Itgam | Monocyte cell marker
ENSMUSG00000030789 | NM_021334.2 | Itgax | Monocyte cell marker
ENSMUSG00000090122 | NM_021487.1 | Kcne1l | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000063142.10 | XM_006518608.1 | Kcnma1 | Altered DNA methylation during septation
ENSMUSG00000079852 | NM_010649.3 | Klra4 | Increased during septation
ENSMUSG00000023043 | NM_010664.2 | Krt18 | Epithelial cell marker
ENSMUSG00000061527 | NM_027011.2 | Krt5 | Basal cell marker
ENSMUSG00000029570 | NM_008494.3 | Lfng | Important for septation
ENSMUSG00000024529 | NM_010728.2 | Lox | Altered DNA methylation during fibrosis
ENSMUSG00000028003 | NM_023624.4 | Lrat | Increased during septation
ENSMUSG00000027070 | NM_001081088.1 | Lrp2 | Altered DNA methylation during septation
ENSMUSG00000061068 | NM_010779.2 | Mcpt4 | Decreased during septation
ENSMUSG00000026110 | NM_173870.2 | Mgat4a | Involved in acute lung injury
ENSMUSG00000043613 | NM_010809.1 | Mmp3 | Increased during septation
ENSMUSG00000018623 | NM_010810.4 | Mmp7 | Important in lung fibrosis
ENSMUSG00000066108 | XM_006508653.1 | Muc5b | Important in lung fibrosis
ENSMUSG00000037974 | NM_010844.1 | Muc5ac | Epithelial cell marker
ENSMUSG00000024304 | NM_007664.4 | Cdh2 | Tight junction/adhesion
ENSMUSG00000054008 | NM_008306.4 | Ndst1 | Involved in pathologic airway remodeling
ENSMUSG00000031902 | NM_010901.2 | Nfatc3 | Important for lung development
ENSMUSG00000073435 | NM_019730.2 | Nme3 | Apoptosis-related gene
ENSMUSG00000026575 | NM_138314.3 | Nme7 | Important for stem cell renewal
ENSMUSG00000014776 | NM_030152.4 | Nol3 | Regulating miRNA altered during septation
ENSMUSG00000051048 | NM_177161.4 | P4ha3 | Important in lung fibrosis
ENSMUSG00000068039 | NM_013686.3 | Tcp1 | Basal cell marker
ENSMUSG00000029998 | NM_025823.4 | Pcyox1 | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000029231 | NM_011058.2 | Pdgfra | Important for septation
ENSMUSG00000024620 | NM_008809.1 | Pdgfrb | Pericyte cell marker
ENSMUSG00000028583 | NM_010329.2 | Pdpn | Alveolar Type I cell marker
ENSMUSG00000062070 | NM_008828.2 | Pgk1 | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000053398 | NM_016966.3 | Phgdh | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000005198 | NM_009089.2 | Polr2a | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000071866 | NM_008907.1 | Ppia | Common house-keeping gene
ENSMUSG00000024997 | NM_007452.2 | Prdx3 | Mitochondrial oxidative stress regulator
ENSMUSG00000026134 | NM_008922.2 | Prim2 | Expressed in placenta and crucial for mammalian growth
ENSMUSG00000033491 | NM_178738.3 | Prss35 | Decreased during septation
ENSMUSG00000032487 | NM_011198.3 | Ptgs2 | Regulating miRNA altered during septation
ENSMUSG00000056458 | NM_011973.2 | Mok | Alveolar Type I cell marker
ENSMUSG00000037992 | NM_001177302.1 | Rara | Important for septation
ENSMUSG00000022883 | NM_019413.2 | Robo1 | Altered DNA methylation during septation
ENSMUSG00000066361 | NM_008458.2 | Serpina3c | Increased during septation
ENSMUSG00000022097 | NM_011359.1 | Sftpc | Alveolar Type II cell marker
ENSMUSG00000021795 | NM_009160.2 | Sftpd | Alveolar Type II cell marker
ENSMUSG00000050010 | NM_001033415.3 | Shisa3 | Altered DNA methylation during septation
ENSMUSG00000032402 | NM_016769.3 | Smad3 | Important for septation
ENSMUSG00000042821 | NM_011427.2 | Snai1 | Important for lung development and injury
ENSMUSG00000000567 | NM_011448.4 | Sox9 | Altered DNA methylation during septation
ENSMUSG00000027646 | NM_001025395.2 | Src | Altered DNA methylation during septation
ENSMUSG00000014767 | NM_013684.3 | Tbp | Common house-keeping gene, involved in multiple processes
ENSMUSG00000000094 | NM_172798.1 | Tbx4 | Altered DNA methylation during septation
ENSMUSG00000032228 | NM_011544.3 | Tcf12 | Involved in multiple developmental processes
ENSMUSG00000022797 | NM_011638.3 | Tfrc | Common house-keeping gene
ENSMUSG00000002603 | NM_011577.1 | Tgfb1 | Important for septation
ENSMUSG00000045691 | NM_153083.5 | Thtpa | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000032011 | NM_009382.3 | Thy1 | Fibroblast cell marker
ENSMUSG00000028364 | NM_011607.1 | Tnc | Altered DNA methylation during septation
ENSMUSG00000044986 | NM_009437.4 | Tst | Important in other processes (IPF, COPD, etc.)
ENSMUSG00000026803 | NM_009442.2 | Ttf1 | Important for lung development
ENSMUSG00000008348 | NM_019639.4 | Ubc | Common house-keeping gene
ENSMUSG00000023951 | NM_001025250.3 | Vegfa | Angiogenesis; altered DNA methylation during septation
ENSMUSG00000026728 | NM_011701.4 | Vim | Mesenchymal cell marker
ENSMUSG00000020218 | NM_011915.1 | Wif1 | Altered DNA methylation during septation
ENSMUSG00000022285 | NM_011740.2 | Ywhaz | Common house-keeping gene
Appendix 2—table 2

Summary of the methylation dataset.
Gene | Number of loci | Gene | Number of loci
Appendix 2—table 3

Target regions for each gene for methylation analysis.
Gene | Ensembl gene ID | Ensembl transcript ID | Assay ID | Target location | Fwd Tm (°C) | Rev Tm (°C) | % GC | Coordinates (GRCm38/mm10)
Akt1 | ENSMUSG00000001729 | ENSMUST00000001780 | ADS3333 | 3′ UTR | 68 | 65.5 | 31.5 | chr12:112654548–112654709
Akt1 | ENSMUSG00000001729 | ENSMUST00000001780 | ADS3332 | Intron 9/Exon 10 | 68.3 | 69.8 | 38.3 | chr12:112657120–112657273
Cdh11 | ENSMUSG00000031673 | ENSMUST00000075190 | ADS3308 | Intron 3 | 66.8 | 68.3 | 36.9 | chr8:102677609–102677766
Cdh11 | ENSMUSG00000031673 | ENSMUST00000075190 | ADS3318 | Intron 1 | 64.1 | 69.7 | 37 | chr8:102784569–102784722
Dnmt3a | ENSMUSG00000020661 | ENSMUST00000020991 | ADS632 | Intron 1 | 64 | 64.7 | 32.2 | chr12:3834382–3834592
Dnmt3a | ENSMUSG00000020661 | ENSMUST00000020991 | ADS3328 | Exon 6/Intron 6 | 64.7 | 64 | 31.8 | chr12:3901545–3901764
Dnmt3a | ENSMUSG00000020661 | ENSMUST00000020991 | ADS3329 | Intron 6 | 66.8 | 66.1 | 25.4 | chr12:3907514–3907765
Eln | ENSMUSG00000029675 | ENSMUST00000015138 | ADS3319 | Intron 16 | 67.1 | 67.9 | 47.8 | chr5:134721191–134721447
Eln | ENSMUSG00000029675 | ENSMUST00000015138 | ADS3309 | Intron 7/Exon 8/Intron 8 | 64.1 | 67.4 | 37.8 | chr5:134729221–134729526
Foxf2 | ENSMUSG00000038402 | ENSMUST00000042054 | ADS4505 | Promoter | 63.1 | 65 | 42.5 | chr13:31625470–31625556
Igfbp3 | ENSMUSG00000020427 | ENSMUST00000020702 | ADS3301 | Exon 4/Intron 4 | 70.5 | 70 | 33 | chr11:7208306–7208481
Igfbp3 | ENSMUSG00000020427 | ENSMUST00000020702 | ADS5133 | Intron 1 | 68.3 | 68.5 | 26.1 | chr11:7212803–7213043
Lox | ENSMUSG00000024529 | ENSMUST00000171470 | ADS4512 | Exon 2 | 69 | 70.9 | 31.3 | chr18:52529184–52529315
Lox | ENSMUSG00000024529 | ENSMUST00000171470 | ADS4513 | Exon 4 | 65.7 | 64.8 | 28.5 | chr18:52526887–52527023
Sox9 | ENSMUSG00000000567 | ENSMUST00000000579 | ADS3311 | Intron 1 | 69.7 | 68.5 | 34.7 | chr11:112783358–112783605
Sox9 | ENSMUSG00000000567 | ENSMUST00000000579 | ADS3310 | Exon 3 | 66.4 | 63.1 | 26.2 | chr11:112784760–112784885
Src | ENSMUSG00000027646 | ENSMUST00000109533 | ADS4514 | Intron 1 | 64.8 | 65.9 | 35.9 | chr2:157423925–157424027
Src | ENSMUSG00000027646 | ENSMUST00000109533 | ADS4515 | Intron 4 | 66.5 | 68.8 | 37.6 | chr2:157457351–157457520
Src | ENSMUSG00000027646 | ENSMUST00000109533 | ADS4516 | Exon 14 | 65.5 | 65.6 | 33.7 | chr2:157469741–157469912
Tnc | ENSMUSG00000028364 | ENSMUST00000107377 | ADS3324 | Intron 14 | 63.3 | 62.2 | 23 | chr4:63982645–63982818
Tnc | ENSMUSG00000028364 | ENSMUST00000107377 | ADS3325 | Intron 14 | 62.5 | 61.6 | 20.2 | chr4:63982799–63982986
Tnc | ENSMUSG00000028364 | ENSMUST00000107377 | ADS3323 | Exon 3 | 65 | 67.5 | 35.2 | chr4:64017478–64017721
Vegfa | ENSMUSG00000023951 | ENSMUST00000071648 | ADS3335 | Intron 2/Exon 3 | 64.6 | 63.8 | 29.5 | chr17:46025336–46025620
Wif1 | ENSMUSG00000020218 | ENSMUST00000020439 | ADS3303 | Intron 4/Exon 5/Intron 5 | 60.9 | 60.1 | 31.8 | chr10:121083800–121083997
Wif1 | ENSMUSG00000020218 | ENSMUST00000020439 | ADS3304 | Exon 10/3′ UTR | 66.6 | 67.5 | 24.8 | chr10:121099752–121099973
Zfp536 | ENSMUSG00000043456 | ENSMUST00000056338 | ADS4509 | Exon 4 | 68.4 | 69.9 | 35.4 | chr7:37567973–37568130
Appendix 2—table 4

Mean and standard deviation of the mean squared error over all 126 genes for TPS selecting five points and for piecewise linear fits over three sets of points identified heuristically in the literature.
Method | Mean | Std dev
TPS (0.5, 6, 9.5, 19 and 28) | 0.40306335962 | 0.2206665163
Piecewise linear over 0.5, 7, 14, 28 | 0.594072719494 | 0.399642079492
Piecewise linear over 0.5, 2, 14, 28 | 0.710967061349 | 0.721681860787
Piecewise linear over 0.5, 4, 7, 14, 28 | 0.560990230501 | 0.364739525724

Additional files

Supplementary file 1

Raw mRNA expression values for the 126 genes profiled using NanoString.
Supplementary file 2

Raw miRNA expression values from the NanoString analysis.


Michael Kleyman, Emre Sefer, Teodora Nicola, Celia Espinoza, Divya Chhabra, James S Hagood, Naftali Kaminski, Namasivayam Ambalavanan, Ziv Bar-Joseph. Selecting the most appropriate time points to profile in high-throughput studies. eLife 6:e18541.