Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE

Abstract

Transcription factors are crucial regulators of gene expression. Accurate quantitative definition of their intrinsic DNA binding preferences is critical to understanding their biological function. High-throughput in vitro technology has recently been used to deeply probe the DNA binding specificity of hundreds of eukaryotic transcription factors, yet algorithms for analyzing such data have not yet fully matured. Here we present a general framework (FeatureREDUCE) for building sequence-to-affinity models based on a biophysically interpretable and extensible model of protein-DNA interaction that can account for dependencies between nucleotides within the binding interface or multiple modes of binding. When training on protein binding microarray (PBM) data, we use robust regression and modeling of technology-specific biases to infer specificity models of unprecedented accuracy and precision. We provide quantitative validation of our results by comparing to gold-standard data when available.
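As an illustration of what a biophysically interpretable sequence-to-affinity model can look like, the following is a minimal sketch and not the FeatureREDUCE implementation. It assumes an additive binding free energy (one ΔΔG per base per position), one optional correction term for a dependency between two adjacent nucleotides, and a probe-level signal that sums relative affinities over all forward-strand windows. All numeric values, the names `DDG`, `PAIR_DDG`, `relative_affinity`, and `probe_affinity`, and the value of RT are hypothetical and chosen only for illustration.

```python
import math

RT = 0.593  # kcal/mol near room temperature (assumed value)

# Hypothetical per-position free-energy penalties (kcal/mol), relative to the
# best base at each position of a 3-bp site. Not fitted to any real data.
DDG = [
    {"A": 0.0, "C": 1.2, "G": 1.5, "T": 0.8},
    {"A": 1.0, "C": 0.0, "G": 1.3, "T": 1.1},
    {"A": 0.9, "C": 1.4, "G": 0.0, "T": 1.6},
]

# Hypothetical dependency feature: an extra free-energy term applied when a
# specific dinucleotide occurs at a given start position within the site.
PAIR_DDG = {(0, "CA"): -0.4}


def relative_affinity(site):
    """Relative affinity exp(-ddG/RT) of one site under the additive model
    plus any matching dinucleotide correction."""
    ddg = sum(DDG[i][base] for i, base in enumerate(site))
    for (start, pair), correction in PAIR_DDG.items():
        if site[start:start + 2] == pair:
            ddg += correction
    return math.exp(-ddg / RT)


def probe_affinity(probe):
    """Sum of relative affinities over every forward-strand window.
    The reverse strand is ignored here to keep the sketch short."""
    k = len(DDG)
    return sum(relative_affinity(probe[i:i + k])
               for i in range(len(probe) - k + 1))


if __name__ == "__main__":
    for seq in ("ACG", "CAG", "TTT"):
        print(seq, relative_affinity(seq))
    print("AACGT", probe_affinity("AACGT"))
```

In this sketch the reference site ACG has relative affinity 1 by construction, and every other site is scaled down by its total free-energy penalty. Fitting such parameters to measured probe intensities, with robust regression and bias terms as the abstract describes, is outside the scope of the example.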

Article and author information

Author details

Todd R Riley

Department of Biological Sciences, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Allan Lazarovici

Department of Biological Sciences, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Richard S Mann

Department of Biochemistry and Molecular Biophysics, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Harmen J Bussemaker

Department of Biological Sciences, Columbia University, New York, United States

For correspondence
hjb2004@columbia.edu

Competing interests
The authors declare that no competing interests exist.

Reviewing Editor

Nir Friedman, The Hebrew University of Jerusalem, Israel

Version history

Received: January 8, 2015
Accepted: December 20, 2015
Accepted Manuscript published: December 23, 2015 (version 1)
Version of Record published: February 9, 2016 (version 2)

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Metrics

Views: 1,695
Downloads: 376
Citations: 33

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Cite this article

Todd R Riley, Allan Lazarovici, Richard S Mann, Harmen J Bussemaker (2015) Building accurate sequence-to-affinity models from high-throughput in vitro protein-DNA binding data using FeatureREDUCE. eLife 4:e06397. https://doi.org/10.7554/eLife.06397

Categories and tags

1. Computational and Systems Biology
2. Developmental Biology