In embryonic stem cells (ESCs), a core transcription factor (TF) network establishes the gene expression program necessary for pluripotency. To address how interactions between four key TFs contribute to cis-regulation in mouse ESCs, we assayed two massively parallel reporter assay (MPRA) libraries composed of binding sites for SOX2, POU5F1 (OCT4), KLF4, and ESRRB. Comparisons between synthetic cis-regulatory elements and genomic sequences with comparable binding site configurations revealed some aspects of a regulatory grammar. The expression of synthetic elements is influenced by both the number and arrangement of binding sites. This grammar plays only a small role for genomic sequences, as the relative activities of genomic sequences are best explained by the predicted affinity of binding sites, regardless of binding site identity and positioning. Our results suggest that the effects of transcription factor binding sites (TFBS) are influenced by the order and orientation of sites, but that in the genome the overall occupancy of TFs is the primary determinant of activity.
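The abstract's conclusion is that genomic activity tracks overall predicted TF occupancy rather than site order or orientation. The sketch below illustrates what an "occupancy only" explanation means. It assumes independent, non-interacting sites, a single effective TF concentration, and relative affinities already derived from some motif model. The abstract specifies none of these, and the function names and numbers are hypothetical rather than the authors' model.

```python
def site_occupancy(relative_affinity: float, tf_concentration: float) -> float:
    """Fractional occupancy of one binding site under a simple independent
    (Langmuir-style) binding model: x / (1 + x), where x = concentration * affinity.
    Assumption for illustration only; not the model used in the study."""
    x = tf_concentration * relative_affinity
    return x / (1.0 + x)


def predicted_total_occupancy(affinities, tf_concentration: float = 1.0) -> float:
    """Sum of per-site occupancies for one element. Because it is a plain sum,
    it ignores site identity, order, and orientation by construction."""
    return sum(site_occupancy(a, tf_concentration) for a in affinities)


# Hypothetical elements: same sites, different order -> same predicted occupancy.
element_a = [0.9, 0.2, 0.5]
element_b = [0.5, 0.9, 0.2]
print(predicted_total_occupancy(element_a), predicted_total_occupancy(element_b))
```

Any grammar effect, such as an order or orientation dependence seen in the synthetic elements, would appear as activity that such an order-blind score cannot explain.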
Sequencing data has been deposited in GEO under accession code GSE120240. Any additional data generated during this study are included in the manuscript and supporting files.
Massively Parallel Reporter Assay for pluripotency factors in mESCs. NCBI Gene Expression Omnibus, GSE120240.
Mapping of transcription factor binding sites in mouse embryonic stem cells. NCBI Gene Expression Omnibus, GSE11431.
Genome-wide maps of REST and its cofactors in mouse E14 cells. NCBI Gene Expression Omnibus, GSE28233.
MTF2 recruits Polycomb Repressive Complex 2 by helical shape-selective DNA binding. NCBI Gene Expression Omnibus, GSE94300.
The landscape of accessible chromatin in mammalian pre-implantation embryos (ATAC-Seq). NCBI Gene Expression Omnibus, GSE66581.
The conserved organization of the human and mouse transcriptomes. NCBI Gene Expression Omnibus, GSE49417.
- Barak Cohen
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
- Patricia J Wittkopp, University of Michigan, United States
© 2020, King et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Predicting the dynamics and functions of microbiomes constructed from the bottom up is a key challenge in exploiting them to our benefit. Current models based on ecological theory fail to capture complex community behaviors arising from higher-order interactions. They also scale poorly as community complexity increases or when multiple functions must be considered. We develop and apply a long short-term memory (LSTM) framework to advance our understanding of community assembly and health-relevant metabolite production using a synthetic human gut community. The LSTM, a mainstay of recurrent neural networks, learns a high-dimensional, data-driven, nonlinear dynamical system model. We show that the LSTM model can outperform the widely used generalized Lotka-Volterra model, which is based on ecological theory. We build methods to decipher microbe-microbe and microbe-metabolite interactions from an otherwise black-box model. These methods highlight that Actinobacteria, Firmicutes and Proteobacteria are significant drivers of metabolite production, whereas Bacteroides shape community dynamics. We use the LSTM model to navigate a large multidimensional functional landscape and to design communities with unique health-relevant metabolite profiles and temporal behaviors. In sum, the accuracy of the LSTM model can be exploited for experimental planning and to guide the design of synthetic microbiomes with target dynamic functions.
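For reference, the generalized Lotka-Volterra (gLV) baseline named above has the standard form dx_i/dt = x_i (mu_i + sum_j a_ij x_j). Here x_i is the abundance of species i, mu_i its basal growth rate and a_ij the pairwise effect of species j on species i. The minimal sketch below integrates it with forward Euler. The step size, clipping at zero and function names are illustrative assumptions, not the authors' fitting or simulation code. Because every term is at most pairwise, gLV cannot represent higher-order interactions directly.

```python
def glv_derivative(x, mu, A):
    """Right-hand side of the gLV model: dx_i/dt = x_i * (mu_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    return [x[i] * (mu[i] + sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]


def simulate_glv(x0, mu, A, dt=0.01, steps=1000):
    """Forward-Euler integration (illustrative choice); abundances are clipped at
    zero so numerical overshoot cannot produce negative populations."""
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = glv_derivative(x, mu, A)
        x = [max(0.0, xi + dt * dxi) for xi, dxi in zip(x, dx)]
        trajectory.append(list(x))
    return trajectory


# One species with self-inhibition reduces to logistic growth toward
# carrying capacity -mu/a = 1.0.
traj = simulate_glv([0.1], [1.0], [[-1.0]], dt=0.01, steps=2000)
print(traj[-1])
```

An LSTM, by contrast, learns the update rule from time-series data instead of assuming this pairwise functional form.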
Deubiquitinating enzymes (DUBs), ~100 of which are found in human cells, are proteases that remove ubiquitin conjugates from proteins, thereby regulating protein turnover. They are involved in a wide range of cellular activities and are emerging therapeutic targets for cancer and other diseases. Drugs targeting USP1 and USP30 are in clinical development for cancer and kidney disease, respectively. However, the majority of substrates and pathways regulated by DUBs remain unknown, impeding efforts to prioritize specific enzymes for research and drug development. To assemble a knowledgebase of DUB activities, co-dependent genes, and substrates, we combined targeted experiments using CRISPR libraries and inhibitors with systematic mining of functional genomic databases. Analysis of the Dependency Map, Connectivity Map, Cancer Cell Line Encyclopedia, and multiple protein-protein interaction databases yielded specific hypotheses about DUB function, a subset of which were confirmed in follow-on experiments. The data in this paper are browsable online in a newly developed DUB Portal and promise to improve understanding of DUBs as a family, as well as the activities of incompletely characterized DUBs (e.g. USPL1 and USP32) and those already targeted with investigational cancer therapeutics (e.g. USP14, UCHL5, and USP7).
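"Co-dependent genes" in Dependency Map analyses are commonly identified by correlating gene-effect profiles across cell lines. The abstract does not state which statistic or thresholds were used. The sketch below assumes Pearson correlation as one common choice. The gene labels and scores are hypothetical toy values, not DepMap data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (e.g. gene-effect scores
    across the same ordered set of cell lines)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("constant profile has undefined correlation")
    return cov / math.sqrt(vx * vy)


def top_codependencies(query, profiles, k=3):
    """Rank all other genes by correlation with the query gene's profile."""
    q = profiles[query]
    scores = [(g, pearson(q, p)) for g, p in profiles.items() if g != query]
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:k]


# Hypothetical gene-effect scores in four cell lines (more negative = stronger dependency).
toy_profiles = {
    "DUB_X": [-0.2, -0.8, -0.1, -0.9],
    "PARTNER_A": [-0.3, -0.7, -0.2, -1.0],
    "UNRELATED_B": [-0.5, -0.4, -0.6, -0.45],
}
print(top_codependencies("DUB_X", toy_profiles, k=2))
```

Genes whose loss affects the same cell lines as a given DUB rank highest, which is one way such analyses generate hypotheses about shared pathways.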