Physics of Living Systems

Deciphering the regulatory genome of Escherichia coli, one hundred promoters at a time

  1. William T Ireland
  2. Suzannah M Beeler
  3. Emanuel Flores-Bautista
  4. Nicholas S McCarty
  5. Tom Röschinger
  6. Nathan M Belliveau
  7. Michael J Sweredoski
  8. Annie Moradian
  9. Justin B Kinney
  10. Rob Phillips (corresponding author)
  1. Department of Physics, California Institute of Technology, United States
  2. Division of Biology and Biological Engineering, California Institute of Technology, United States
  3. Division of Chemistry and Chemical Engineering, California Institute of Technology, United States
  4. Proteome Exploration Laboratory, Division of Biology and Biological Engineering, Beckman Institute, California Institute of Technology, United States
  5. Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, United States
Research Article
Cite this article as: eLife 2020;9:e55308 doi: 10.7554/eLife.55308
21 figures, 12 tables, 3 data sets and 10 additional files

Figures

The E. coli regulatory genome.

Illustration of the current ignorance about how genes are regulated in E. coli. Genes with previously annotated regulation (as reported in RegulonDB [Gama-Castro et al., 2016]) are denoted with blue ticks, and genes with no previously annotated regulation are denoted with red ticks. The 113 genes explored in this study are labeled in gray; their precise genomic locations can be found in Figure 1—source data 1.

Figure 1—source data 1

Locations of TSS for all promoters in Figure 1.

In Figure 1, the locations of all promoters studied in Reg-Seq are displayed along the E. coli genome. The source data contains the exact genomic coordinate of the '0' position (the TSS) of each mutagenized promoter region.

https://cdn.elifesciences.org/articles/55308/elife-55308-fig1-data1-v2.csv
Schematic of the Reg-Seq procedure as used to recover a repressor-binding site.

The process is as follows: After constructing a promoter library driving expression of a randomized barcode (an average of five barcodes for each promoter), RNA-Seq is conducted to determine the frequency of these mRNA barcodes across different growth conditions (list included in Appendix 1 Section 'Growth conditions'). By computing the mutual information between DNA sequence and mRNA barcode counts for each base pair in the promoter region, an 'information footprint' is constructed that yields a regulatory hypothesis for the putative binding sites (with the RNAP-binding region highlighted in blue and the repressor-binding site highlighted in red). Energy matrices, which describe the effect that any given mutation has on DNA-binding energy, as well as sequence logos, are inferred for the putative transcription-factor-binding sites. Next, we identify which transcription factor preferentially binds to the putative binding site via DNA-affinity chromatography followed by mass spectrometry. This procedure culminates in a coarse-grained, cartoon-level view of our regulatory hypothesis for how a given promoter is regulated.
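The information footprint described above assigns each base pair a mutual information value. A minimal sketch of one simple way to compute it follows, assuming each record is a (sequence, library count, mRNA count) tuple like the example dataset in Appendix 3, and reducing each position to a binary variable (wild-type base or mutated base) against read origin (μ=0 library DNA, μ=1 mRNA). The record format, binary encoding, and the lack of pseudocounts or bias correction are simplifying assumptions; the authors' pipeline may differ.

```python
import math
from typing import List, Tuple


def information_footprint(wild_type: str,
                          records: List[Tuple[str, int, int]]) -> List[float]:
    """Mutual information (bits) at each position between base identity
    (wild-type vs mutated) and read origin (mu=0 library, mu=1 mRNA).

    records: hypothetical (sequence, library_count, mrna_count) tuples.
    """
    footprint = []
    for i, wt_base in enumerate(wild_type):
        # joint[m][mu]: m = 0 for wild-type base, 1 for mutated base
        joint = [[0.0, 0.0], [0.0, 0.0]]
        for seq, lib_count, mrna_count in records:
            m = 0 if seq[i] == wt_base else 1
            joint[m][0] += lib_count
            joint[m][1] += mrna_count
        total = sum(sum(row) for row in joint)
        if total == 0:
            footprint.append(0.0)
            continue
        p = [[c / total for c in row] for row in joint]
        p_m = [sum(row) for row in p]               # marginal over base identity
        p_mu = [p[0][k] + p[1][k] for k in range(2)]  # marginal over read origin
        mi = 0.0
        for m in range(2):
            for mu in range(2):
                if p[m][mu] > 0:
                    mi += p[m][mu] * math.log2(p[m][mu] / (p_m[m] * p_mu[mu]))
        footprint.append(mi)
    return footprint


# Toy example: mutating position 0 abolishes mRNA; position 1 is never mutated.
print(information_footprint("AC", [("AC", 10, 10), ("TC", 10, 0)]))
```

Positions where mutations shift the balance between library and mRNA reads carry information; positions where mutation has no effect on expression score close to zero.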

A summary of four direct comparisons of measurements from Sort-Seq and Reg-Seq.

We show the identified regulatory regions as well as quantitative comparisons between inferred position weight matrices. (A) CRP binds upstream of RNAP in the lacZYA promoter. Despite the different measurement techniques used to infer the two position weight matrices, the CRP-binding sites have a Pearson correlation coefficient of r=0.98. (B) The dgoRKADT promoter is activated by CRP in the presence of galactonate and is repressed by DgoR. In both Sort-Seq and Reg-Seq, type II activator-binding sites can be identified from the signal in the information footprint in the area indicated in green. Additionally, the quantitative agreement between the CRP position weight matrices is strong, with r=0.9. (C) The relBE promoter is repressed by RelBE, as can be identified algorithmically in both Sort-Seq and Reg-Seq. The inferred logos for the two measurement methods have r=0.8. (D) The marRAB promoter is repressed by MarR. The inferred energy matrices (data not shown) and the sequence logos shown have r=0.78. The rightmost MarR site overlaps with a ribosome-binding site. This overlap obscures sequence specificity more strongly in the Sort-Seq measurement, which measures protein levels directly, than in the Reg-Seq measurement. Numeric values for the displayed data can be found in Figure 3—source data 1.
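The r values quoted above compare two inferred matrices for the same site. A minimal sketch, assuming the two matrices are already aligned to the same site and orientation and are compared entry by entry as flattened arrays (the caption does not specify which entries enter the correlation):

```python
import math
from typing import Sequence


def pearson_r(matrix_a: Sequence[Sequence[float]],
              matrix_b: Sequence[Sequence[float]]) -> float:
    """Pearson correlation between two matrices (positions x bases), flattened.

    Raises ValueError if the flattened matrices differ in length.
    """
    a = [x for row in matrix_a for x in row]
    b = [x for row in matrix_b for x in row]
    if len(a) != len(b):
        raise ValueError("matrices must contain the same number of entries")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical 2-position matrices (columns A, C, G, T); not data from the study.
sort_seq = [[0.1, -0.2, 0.0, 0.1], [0.3, 0.0, -0.1, -0.2]]
reg_seq = [[0.2, -0.3, 0.1, 0.0], [0.3, 0.1, -0.2, -0.2]]
print(round(pearson_r(sort_seq, reg_seq), 3))
```

Because the Pearson coefficient is invariant to scale and offset, it compares the shape of the two matrices even when they are in different arbitrary units.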

All regulatory architectures uncovered in this study.

For each regulated promoter, activators and their binding sites are labeled in green, repressors and their binding sites are labeled in red, and RNAP-binding sites are labeled in blue. All cartoons are displayed with the transcription direction to the right. Only one RNAP site is depicted per promoter. The transcription-factor-binding sites displayed have either been identified by the method described in the Section 'Automated putative binding site algorithm' or have additional evidence for their presence, as described in Table 2. Binding sites found for these promoters in the EcoCyc or RegulonDB databases are depicted in these cartoons only if they lie within the 160 bp mutagenized region studied and are detected by Reg-Seq.

Examples of the insight gained by Reg-Seq in the context of promoters with no previously known regulatory information.

Activator-binding regions are highlighted in green, repressor-binding regions in red, and RNAP-binding regions in blue. (A) From the information footprint of the ykgE promoter under different growth conditions, we can identify a repressor-binding site downstream of the RNAP-binding site. By comparing the enrichment of proteins bound to the putative repressor-binding sequence with that of proteins bound to a control sequence, we identify YieP as the transcription factor bound to this site, as it has a much higher enrichment ratio than any other protein. Lastly, the binding energy matrix for the repressor site and the corresponding sequence logo show that the wild-type sequence is the strongest possible binder and displays an imperfect inverted-repeat symmetry. (B) Illustration of a comparable dissection for the phnA promoter. Numeric values for the displayed data can be found in Figure 5—source data 1.
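The identification step above picks the protein whose enrichment ratio (target DNA versus control DNA) stands well above all others. A minimal sketch of that ranking, assuming the enrichment ratios have already been computed from the mass spectrometry data; the `min_fold_over_next` margin of 2 and the protein names are hypothetical, since the caption only says "much higher":

```python
from typing import Dict, Tuple


def top_enriched_protein(ratios: Dict[str, float],
                         min_fold_over_next: float = 2.0) -> Tuple[str, bool]:
    """Return the protein with the highest enrichment ratio and whether it
    exceeds the runner-up by at least min_fold_over_next (hypothetical margin).

    Requires at least one protein in `ratios`.
    """
    if not ratios:
        raise ValueError("no proteins supplied")
    ranked = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best_ratio = ranked[0]
    if len(ranked) == 1:
        return best_name, True
    runner_up_ratio = ranked[1][1]
    return best_name, best_ratio >= min_fold_over_next * runner_up_ratio


# Hypothetical enrichment ratios, not values from the study.
print(top_enriched_protein({"protA": 12.0, "protB": 1.5, "protC": 1.1}))
```

Requiring a clear margin over the runner-up, rather than just taking the maximum, guards against calling a binder when several proteins are similarly enriched.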

Figure 5—source data 1

Data for information footprints, energy matrices, PWMs, and mass spectrometry in Figure 5.

https://cdn.elifesciences.org/articles/55308/elife-55308-fig5-data1-v2.xlsx
A summary of regulatory architectures discovered in this study.

(A) The cartoons display a representative example of each type of architecture, along with the corresponding shorthand notation. (B) Counts of the different regulatory architectures discovered in this study. We exclude the 'gold-standard' promoters (listed in Appendix 2—table 1) unless new transcription factors are also discovered in the promoter. If, for example, one repressor was newly discovered and two activators were previously known, the architecture is still counted as a (2,1) architecture. (C) Distribution of positions of binding sites discovered in this study for activators and repressors. Only newly discovered binding sites are included in this panel. The positions of the transcription-factor-binding sites are calculated relative to the estimated TSS location, which is based on the location of the associated RNAP site. Numeric values for the binding locations can be found in Figure 6—source data 1.
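The shorthand (number of activators, number of repressors) counts binding sites by role, so previously known and newly discovered sites both contribute once a promoter is included. A minimal sketch of the classification and tally, assuming each promoter is described by a hypothetical list of site roles; treating any promoter without an RNAP site as "inactive" follows the Table 1 caption, but the input format is an assumption:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple, Union

Architecture = Union[Tuple[int, int], str]


def architecture(site_roles: Iterable[str]) -> Architecture:
    """Shorthand (number of activators, number of repressors) for one promoter.

    site_roles lists one role per binding site: "activator", "repressor" or "RNAP".
    A promoter with no RNAP site is reported as "inactive".
    """
    counts = Counter(site_roles)
    if counts["RNAP"] == 0:
        return "inactive"
    return counts["activator"], counts["repressor"]


def tally(promoters: Dict[str, List[str]]) -> Counter:
    """Count how many promoters fall into each architecture."""
    return Counter(architecture(roles) for roles in promoters.values())


# Hypothetical promoters, not entries from the study.
example = {
    "promoter_1": ["RNAP", "repressor"],
    "promoter_2": ["RNAP", "activator", "activator", "repressor"],
    "promoter_3": ["repressor"],
}
print(tally(example))
```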

GlpR as a widely acting regulator.

(A) Information footprints for the promoters that we found to be regulated by GlpR, all of which were previously unknown. Activator-binding regions are highlighted in green, repressor-binding regions in red, and RNAP-binding regions in blue. (B) GlpR was shown by mass spectrometry to bind to the rhlE promoter. (C) Sequence logos for GlpR-binding sites. The binding sites in the promoters of tff, tig, maoP, rhlE, and rapA have similar DNA-binding preferences, as seen in the sequence logos, and each site is bound strongly only in the presence of glucose (as shown in (A)). These similarities suggest that the same transcription factor binds to each site. To test this hypothesis, we knocked out GlpR and ran the Reg-Seq experiments for tff, tig, and maoP. In (A), we see that knocking out GlpR removes the binding signature of the transcription factor. Numeric values for the binding locations can be found in Figure 7—source data 1.

Figure 7—source data 1

Data for information footprints, PWMs, and mass spectrometry in Figure 7.

https://cdn.elifesciences.org/articles/55308/elife-55308-fig7-data1-v2.xlsx
FNR as a global regulator.

FNR is known to be upregulated in anaerobic growth, and here we found it to regulate a suite of six genes. In aerobic growth conditions, the putative FNR sites are weakened. (A) Information footprints for the six regulated promoters. Activator-binding regions are highlighted in green, repressor-binding regions in red, and RNAP-binding regions in blue. (B) Sequence logos for the FNR-binding sites displayed in (A). The sequence logos show that the six sites have similar DNA-binding preferences. Numeric values for the binding locations can be found in Figure 8—source data 1.

Inspection of a genetic circuit.

(A) Here, the information footprint of the arcA promoter is displayed along with the energy matrix describing the discovered FNR-binding site. (B) Intra-operon regulation of fdhE by both FNR and ArcA. The information footprint of fdhE is displayed. The discovered sites for FNR and ArcA are highlighted and the energy matrix for ArcA is displayed. A TOMTOM (Gupta et al., 2007) search of the binding motif found that ArcA was the most likely candidate for the transcription factor. The displayed information footprint from a knockout of ArcA demonstrates that the binding signature of the site, and its associated RNAP site, are no longer determinants of gene expression. (C) Sequence logos for FNR generated from both the sites cataloged in RegulonDB, as well as the discovered sites regulating arcA and fdhE. (D) Sequence logos for ArcA from sites contained in RegulonDB and the ArcA site regulating fdhE. Numeric values for the binding locations can be found in Figure 9—source data 1.

Figure 9—source data 1

Data for information footprints, energy matrices, and PWMs in Figure 9B.

https://cdn.elifesciences.org/articles/55308/elife-55308-fig9-data1-v2.xlsx
Representative view of the interactive figure that is available online.

This interactive figure captures the entirety of our dataset. The figure features drop-down menus of genes and growth conditions. For each gene and growth condition, there is a corresponding information footprint revealing putative binding sites, an energy matrix that shows the strength of binding of the relevant transcription factor to those binding sites, and a cartoon that schematizes the newly discovered regulatory architecture of that gene. Numeric values for the binding locations can be found in Figure 10—source data 1.

Figure 10—source data 1

Data for information footprints, energy matrices, and PWMs in Figure 10.

https://cdn.elifesciences.org/articles/55308/elife-55308-fig10-data1-v2.xlsx
Procedure to identify binding site regions automatically.

First, an information footprint is generated for the target region. Next, the information footprint is smoothed over a 15 bp sliding window, and a threshold of 2.5×10⁻⁴ bits is applied to identify regions of interest. RNAP-binding sites are identified first (in blue); the remaining regulatory regions are classified as repressor-binding sites (if mutations from wild type tend to increase expression) or activator-binding sites (if mutations tend to decrease expression).
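The procedure above can be sketched as smoothing, thresholding, and sign-based classification. This is a minimal sketch under stated assumptions: the 15 bp window and 2.5×10⁻⁴ bit threshold come from the caption, but the RNAP interval is passed in as a hypothetical `rnap` argument (the caption does not say how RNAP sites are found), `expression_shift` is a hypothetical per-position input whose sign is positive when mutation raises expression, and a region overlapping the RNAP interval is simply labeled RNAP.

```python
from typing import List, Tuple


def smooth(values: List[float], window: int = 15) -> List[float]:
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def find_regions(mask: List[bool]) -> List[Tuple[int, int]]:
    """Contiguous runs of True, as half-open (start, end) intervals."""
    regions, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(mask)))
    return regions


def call_sites(footprint: List[float], expression_shift: List[float],
               rnap: Tuple[int, int], window: int = 15,
               threshold: float = 2.5e-4) -> List[Tuple[int, int, str]]:
    """Label each above-threshold region as RNAP, repressor, or activator."""
    smoothed = smooth(footprint, window)
    calls = []
    for start, end in find_regions([v > threshold for v in smoothed]):
        if start < rnap[1] and end > rnap[0]:  # overlaps the given RNAP interval
            calls.append((start, end, "RNAP"))
            continue
        mean_shift = sum(expression_shift[start:end]) / (end - start)
        # Mutations that raise expression point to a repressor site.
        calls.append((start, end, "repressor" if mean_shift > 0 else "activator"))
    return calls


# Toy 60 bp footprint with two informative stretches (not data from the study).
fp = [1e-3 if 10 <= i < 20 or 40 <= i < 50 else 0.0 for i in range(60)]
shift = [1.0 if 10 <= i < 20 else 0.0 for i in range(60)]
print(call_sites(fp, shift, rnap=(40, 50)))
```

Smoothing before thresholding merges neighboring informative bases into contiguous regions, which is why the called regions extend a few bases beyond the raw signal.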

Appendix 1—figure 1
Schematic of the genetic construct used in this study.

Mutated DNA libraries for each regulatory region were expressed from a pSC101 plasmid with kanamycin resistance (kanR). Each mutated sequence is 160 bp in length, which includes 45 bp downstream and 115 bp upstream of a given TSS. Each mutated sequence is flanked by primer-binding sites to facilitate cloning. The genetic construct also contains a random barcode, a ribosome-binding site (RBS), a GFP gene, and a terminator labeled with a large 'T'.

Appendix 2—figure 1
Mock data comparing Sort-Seq and Reg-Seq sequence logo values.

These data have a Pearson correlation coefficient of r=0.997. This high correlation is also indicated by the data deviating little from the x=y line.

Appendix 2—figure 2
A visual comparison of the literature binding sites (left panel) and the extent of the binding sites discovered by our algorithmic approach (right panel).

RNAP-binding sites are also labeled in the right panel, but RNAP-binding sites are not included in the false positive analysis. Numeric values for the displayed data can be found in Appendix 2—figure 2—source data 1.

Appendix 2—figure 2—source data 1

Data for information footprints and identified regions in Appendix 2—figure 2.

https://cdn.elifesciences.org/articles/55308/elife-55308-app2-fig2-data1-v2.xlsx
Appendix 2—figure 3
A continuation of the visual comparison of the literature binding sites (left panel) and the binding sites discovered by our algorithmic approach (right panel) begun in Appendix 2—figure 2.
Appendix 2—figure 3—source data 1

Data for information footprints and identified regions in Appendix 2—figure 3.

https://cdn.elifesciences.org/articles/55308/elife-55308-app2-fig3-data1-v2.xlsx
Appendix 2—figure 4
A visual display of the results of the TOMTOM motif comparison between the discovered binding sites and known sequence motifs from RegulonDB and our prior Sort-Seq experiment (Belliveau et al., 2018).

Each dot in a given panel represents a comparison between the target position weight matrix (given in the figure title) and the position weight matrix of a given transcription factor. The p-value is calculated under the null hypothesis that both motifs are drawn independently from the same underlying probability distribution. The red dotted line is displayed at a p-value of 5×10⁻⁴. This line represents a p-value threshold of 0.05 corrected for multiple hypothesis testing using the Bonferroni correction (95 motifs were compared against the target, giving a per-comparison threshold of 0.05/95 ≈ 5×10⁻⁴). Numeric values for the displayed data can be found in Appendix 2—figure 4—source data 1.
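The Bonferroni threshold above is the family-wise significance level divided by the number of comparisons. A minimal sketch; the `significant_matches` helper and its motif names are hypothetical, and it assumes the number of comparisons equals the number of p-values supplied:

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def significant_matches(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Motifs whose p-value falls below the Bonferroni cutoff.

    The number of comparisons is taken to be len(p_values).
    """
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < cutoff)


print(f"{bonferroni_threshold(0.05, 95):.2e}")  # 5.26e-04
```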

Appendix 3—figure 1
Pearson correlation as a function of the number of unique DNA sequences (as explained in Appendix 2 Section 'Comparison between Reg-Seq by RNA-Seq and fluorescent sorting').

For seven different genes, we studied how the number of mutated DNA sequences affects the reproducibility of our MCMC inference models. As the number of unique sequences increases, so too does the Pearson correlation value, approaching 1.0. Numeric values for the displayed data can be found in Appendix 3—figure 1—source data 1.

Appendix 3—figure 2
Motif comparison using TOMTOM for the two PhoP-binding sites in the ybjX promoter.

Searching our energy-matrix motifs against the RegulonDB database using TOMTOM guided our transcription factor knockout experiments. Here, we show the sequence logos of the PhoP transcription factor from RegulonDB (top) and those generated from the ybjX promoter energy matrix. E-value = 0.01, using Euclidean distance as the similarity metric.

Appendix 4—figure 1
Two cases in which we found transcription-factor-binding sites that regulate both genes of a divergently transcribed pair.

(A) An information footprint and regulatory cartoon for the divergently transcribed bdcA and bdcR promoters. A single NsrR site regulates both promoters. (B) An information footprint and regulatory cartoon for the ilvC and ilvY promoters. Both promoters are repressed by IlvY when grown without acetolactate. Only the IlvY site is labeled on the information footprint.

Appendix 4—figure 2
A comparison of the types of architectures found in RegulonDB (Santos-Zavaleta et al., 2019) to the architectures with newly discovered binding sites found in the Reg-Seq study.

For each type of architecture, labeled as (number of activators, number of repressors), the fraction of all operons with that architecture is given for both the RegulonDB data and the Reg-Seq results. Numeric values for the displayed data can be found in Appendix 4—figure 2—source data 1.

Appendix 4—figure 2—source data 1

Source data for the percentage composition of regulatory architectures.

The source data file contains the percentage composition of each regulatory architecture displayed in the figure, including architectures more complicated than those displayed in the histogram.

https://cdn.elifesciences.org/articles/55308/elife-55308-app4-fig2-data1-v2.csv
Author response image 1
Figure 10 from Rydenfelt et al., 2014b.

Distribution of activating and repressing binding sites bound by global TFs and specific TFs, respectively. The y-axis shows the number of binding sites overlapping each nucleotide position, after aligning all promoters with respect to their transcription start site (TSS) for the different kinds of TFs.

Tables

Table 1
All promoters examined in this study, categorized according to type of regulatory architecture.

Those promoters which have no recognizable RNAP site are labeled as inactive rather than constitutively expressed (0, 0).

Architecture | Total number of promoters | Number of promoters with at least one newly discovered binding site
All Architectures | 113 | 48
(0,0) | 34 | 0
(0,1) | 26 | 21
(1,0) | 11 | 10
(1,1) | 4 | 3
(0,2) | 4 | 4
(2,0) | 3 | 2
(1,2) | 4 | 3
(2,1) | 2 | 2
(2,2) | 1 | 1
(3,0) | 3 | 1
(0,3) | 2 | 1
(0,4) | 1 | 0
inactive | 18 | 0
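The 'All Architectures' row should equal the sum of the rows beneath it. A short arithmetic check using the values of Table 1, which also prints each architecture's share among promoters with at least one newly discovered site (similar in spirit to, but not necessarily computed the same way as, the Reg-Seq fractions in Appendix 4):

```python
# Table 1: architecture -> (total promoters, promoters with >= 1 newly discovered site)
TABLE_1 = {
    "(0,0)": (34, 0),
    "(0,1)": (26, 21),
    "(1,0)": (11, 10),
    "(1,1)": (4, 3),
    "(0,2)": (4, 4),
    "(2,0)": (3, 2),
    "(1,2)": (4, 3),
    "(2,1)": (2, 2),
    "(2,2)": (1, 1),
    "(3,0)": (3, 1),
    "(0,3)": (2, 1),
    "(0,4)": (1, 0),
    "inactive": (18, 0),
}

total_promoters = sum(total for total, _ in TABLE_1.values())
total_with_new_sites = sum(new for _, new in TABLE_1.values())
print(total_promoters, total_with_new_sites)  # 113 48

# Share of each architecture among promoters with at least one new site.
for arch, (_, new) in TABLE_1.items():
    if new:
        print(f"{arch}: {new / total_with_new_sites:.1%}")
```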
Table 2
All genes investigated in this study categorized according to their regulatory architecture, given as (number of activators, number of repressors).

The regulatory architectures as listed reflect only the binding sites that could be recovered within our 160 bp constructs, but include both newly discovered and previously known binding sites. Where binding sites that appear in RegulonDB or EcoCyc are omitted from this tally, the Section 'Explanation of included binding sites' in Appendix 4 gives the reasoning for each relevant gene. The table also lists the number of newly discovered binding sites, previously known binding sites, and identified transcription factors. The evidence used for transcription factor identification is given in the final column. 'Bioinformatic' evidence means that discovered position weight matrices were compared with known transcription factor position weight matrices. The literature sites column contains only those sites that are expected to be observed in the Reg-Seq data and actually are observed.

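Architecture | Promoter | Newly discovered binding sites | Literature binding sites | Identified binding sites | Evidence
(0, 0) | acuI | 0 | 0 | 0 |
(0, 0) | aegA | 0 | 0 | 0 |
(0, 0) | arcB | 0 | 0 | 0 |
(0, 0) | cra | 0 | 0 | 0 |
(0, 0) | dnaE | 0 | 0 | 0 |
(0, 0) | ecnB | 0 | 0 | 0 |
(0, 0) | fdoH | 0 | 0 | 0 |
(0, 0) | holC | 0 | 0 | 0 |
(0, 0) | hslU | 0 | 0 | 0 |
(0, 0) | htrB | 0 | 0 | 0 |
(0, 0) | minC | 0 | 0 | 0 |
(0, 0) | modE | 0 | 0 | 0 |
(0, 0) | ycgB | 0 | 0 | 0 |
(0, 0) | mscL | 0 | 0 | 0 |
(0, 0) | pitA | 0 | 0 | 0 |
(0, 0) | poxB | 0 | 0 | 0 |
(0, 0) | rlmA | 0 | 0 | 0 |
(0, 0) | rumB | 0 | 0 | 0 |
(0, 0) | sbcB | 0 | 0 | 0 |
(0, 0) | sdaB | 0 | 0 | 0 |
(0, 0) | tar | 0 | 0 | 0 |
(0, 0) | ybdG | 0 | 0 | 0 |
(0, 0) | ybiP | 0 | 0 | 0 |
(0, 0) | ybjT | 0 | 0 | 0 |
(0, 0) | yehT | 0 | 0 | 0 |
(0, 0) | yfhG | 0 | 0 | 0 |
(0, 0) | ygdH | 0 | 0 | 0 |
(0, 0) | ygeR | 0 | 0 | 0 |
(0, 0) | yggW | 0 | 0 | 0 |
(0, 0) | ynaI | 0 | 0 | 0 |
(0, 0) | yqhC | 0 | 0 | 0 |
(0, 0) | zapB | 0 | 0 | 0 |
(0, 0) | zupT | 0 | 0 | 0 |
(0, 0) | amiC | 0 | 0 | 0 |
(0, 1) | araC | 0 | 1 | 0 |
(0, 1) | bdcR | 1 | 0 | 1 | Known binding location (NsrR) (Partridge et al., 2009)
(0, 1) | coaA | 1 | 0 | 0 |
(0, 1) | dicC | 0 | 1 | 0 |
(0, 1) | dinJ | 1 | 0 | 0 |
(0, 1) | ybeZ | 1 | 0 | 0 |
(0, 1) | idnK | 1 | 0 | 1 | Mass spectrometry (YgbI)
(0, 1) | leuABCD | 1 | 0 | 1 | Mass spectrometry (YgbI)
(0, 1) | mscM | 1 | 0 | 0 |
(0, 1) | yedK | 1 | 0 | 1 | Mass spectrometry (TreR)
(0, 1) | rapA | 1 | 0 | 1 | Growth condition Knockout (GlpR), Bioinformatic (GlpR)
(0, 1) | sdiA | 1 | 0 | 0 |
(0, 1) | tff-rpsB-tsf | 1 | 0 | 1 | Growth condition Knockout (GlpR), Bioinformatic (GlpR), Knockout (GlpR)
(0, 1) | thiM | 1 | 0 | 0 |
(0, 1) | tig | 1 | 0 | 1 | Growth condition Knockout (GlpR), Bioinformatic (GlpR), Knockout (GlpR)
(0, 1) | ybiO | 1 | 0 | 0 |
(0, 1) | ydjA | 1 | 0 | 0 |
(0, 1) | yedJ | 1 | 0 | 0 |
(0, 1) | phnA | 1 | 0 | 1 | Mass spectrometry (YciT)
(0, 1) | mutM | 1 | 0 | 0 |
(0, 1) | rhlE | 1 | 0 | 1 | Growth condition Knockout (GlpR), Bioinformatic (GlpR), Mass spectrometry (GlpR)
(0, 1) | uvrD | 1 | 0 | 1 | Bioinformatic (LexA)
(0, 1) | dusC | 1 | 0 | 0 |
(0, 1) | ftsK | 0 | 1 | 0 |
(0, 1) | znuA | 0 | 1 | 0 |
(0, 1) | znuCB | 0 | 1 | 0 |
(1, 0) | waaA-coaD | 1 | 0 | 0 |
(1, 0) | rcsF | 1 | 0 | 0 |
(1, 0) | groSL | 1 | 0 | 0 |
(1, 0) | mscS | 1 | 0 | 0 |
(1, 0) | thrLABC | 1 | 0 | 0 |
(1, 0) | yeiQ | 1 | 0 | 1 | Growth condition Knockout (FNR), Bioinformatic (FNR)
(1, 0) | ycbZ | 1 | 0 | 0 |
(1, 0) | ygjP | 1 | 0 | 0 |
(1, 0) | lac | 0 | 1 | 0 | Bioinformatic (CRP)
(1, 0) | yehS | 1 | 0 | 0 |
(1, 0) | yehU | 1 | 0 | 1 | Growth condition Knockout (FNR), Bioinformatic (FNR)
(0, 2) | pcm | 2 | 0 | 0 |
(0, 2) | yecE | 2 | 0 | 1 | Mass spectrometry (HU)
(0, 2) | yjjJ | 2 | 0 | 1 | Growth condition Knockout (MarA), Bioinformatic (MarA)
(0, 2) | dcm | 2 | 0 | 1 | Mass spectrometry (HNS)
(1, 1) | arcA | 2 | 0 | 2 | Growth condition Knockout (FNR), Bioinformatic (FNR), Mass spectrometry (FNR, CpxR)
(1, 1) | dgoR | 0 | 2 | 0 | Bioinformatic (CRP), Bioinformatic (DgoR)
(1, 1) | ykgE | 2 | 0 | 2 | Growth condition Knockout (FNR), Bioinformatic (FNR), Mass spectrometry (YieP), Knockout (YieP)
(1, 1) | ymgG | 2 | 0 | 0 |
(2, 0) | asnA | 2 | 0 | 0 |
(2, 0) | fdhE | 2 | 0 | 2 | Growth condition Knockout (FNR, ArcA), Bioinformatic (FNR, ArcA), Knockout (ArcA)
(2, 0) | xylF | 0 | 2 | 0 |
(1, 2) | marR | 0 | 3 | 0 | Mass spectrometry (MarR)
(1, 2) | aphA | 3 | 0 | 2 | Growth condition Knockout (FNR), Bioinformatic (FNR), Mass spectrometry (DeoR)
(1, 2) | iap | 3 | 0 | 0 |
(1, 2) | ilvC | 3 | 0 | 1 | Mass spectrometry (IlvY) (Rhee et al., 1998)
(2, 1) | maoP | 3 | 0 | 3 | Growth condition Knockout (GlpR), Bioinformatic (GlpR), Knockout (PhoP, HdfR, GlpR)
(2, 1) | rspA | 1 | 2 | 1 | Mass spectrometry (DeoR)
(2, 2) | ybjX | 4 | 0 | 4 | Bioinformatic (2 PhoP sites), Mass spectrometry (HNS, StpA)
(3, 0) | araAB | 0 | 3 | 0 |
(3, 0) | xylA | 0 | 3 | 0 |
(3, 0) | yicI | 3 | 0 | 0 |
(0, 3) | ompR | 0 | 3 | 0 |
(0, 3) | ybjL | 3 | 0 | 0 |
(0, 4) | relBE | 0 | 4 | 0 | Mass spectrometry (RelBE)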
Appendix 1—table 1
All growth conditions used in the Reg-Seq study.

Heat-shocked cells were exposed to 42°C for 5 min upon reaching OD 0.3, as this is known to induce transcription by σ32 (Arsène et al., 2000). Cells for the low-oxygen condition were grown in a flask sealed with parafilm to minimize oxygen; some oxygen was still present because no anaerobic chamber was used. This level of oxygen stress was nonetheless sufficient to activate FNR binding and thus anaerobic metabolism. For cells grown with iron, iron was added upon reaching OD 0.3, and cells were incubated for 10 min before harvesting RNA. Growth without cAMP was accomplished using the JK10 strain (Kinney et al., 2010), which does not maintain its cAMP levels.

Growth conditions
M9 with glucose (0.5%)
M9 with acetate (0.5%)
M9 with arabinose (0.5%)
M9 with xylose (0.5%) and arabinose (0.5%)
M9 with succinate (0.5%)
M9 with trehalose (0.5%)
M9 with glucose (0.5%) and 5 mM sodium salicylate
LB
heat shock in M9 with glucose (0.5%)
LB in low oxygen
zinc, 5 mM ZnCl₂ in M9 with glucose (0.5%)
iron, 5 mM FeCL in M9 with glucose (0.5%)
no cAMP in M9 with glucose (0.5%)
Appendix 2—table 1
A suite of experimentally validated and high-evidence binding sites used to test our automated binding site finding algorithm.

Specifically, this list of genes was used to test the false negative rate of our Reg-Seq method by examining what fraction of high-evidence sites were also identified with Reg-Seq.

Gene | Transcription factor | Transcription factor type
rspA | CRP | activator
rspA | YdfH | repressor
araAB | AraC (two sites) | activator
znuCB | Zur | repressor
xylA | CRP | activator
xylA | XylR (two sites) | activator
xylF | XylR (two sites) | activator
dicC | DicA | repressor
relBE | RelBE | repressor
ftsK | LexA | repressor
znuA | Zur | repressor
lac | CRP | activator
marR | Fis | activator
marR | MarA | activator
marR | MarR (two sites) | repressor
dgoR | CRP | activator
dgoR | DgoR (right site) | repressor
ompR | IHF (three sites) | repressor
ompR | CRP | repressor
dicA | DicA | repressor
araC | AraC (two sites) | repressor
araC | AraC (two sites) | activator
araC | CRP | activator
araC | XylR (two sites) | repressor
Appendix 2—table 2
The results of the comparison between experimentally verified, high-evidence binding sites and Reg-Seq-binding sites.

A visual illustration of the comparison can be found in Appendix 2—figures 2 and 3.

Gene | Transcription factor | Was the region classified correctly?
rspA | CRP | Yes
rspA | YdfH | Yes
araAB | AraC (two sites) | Yes
znuCB | Zur | Yes
xylA | CRP | Yes
xylA | XylR (two sites) | Yes
xylF | XylR (two sites) | Yes
dicC | DicA | Yes
relBE | RelBE | Yes
ftsK | LexA | Yes
znuA | Zur | Yes
lac | CRP | Yes
marR | Fis | No
marR | MarA | Yes
marR | MarR (two sites) | Yes
dgoR | CRP | Yes
dgoR | DgoR (right site) | No
ompR | IHF (three sites) | Yes
ompR | CRP | No
dicA | DicA | No
araC | AraC (four sites) | one site identified
araC | CRP | No
araC | XylR (two sites) | No
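The false negative analysis above reduces to counting outcomes in this comparison table. A minimal tally, assuming each table row counts once (several rows group two to four sites, so this is a per-row rather than a per-site count) and keeping the partially recovered araC/AraC row as its own category:

```python
from collections import Counter

# (gene, transcription factor, was the region classified correctly?)
RESULTS = [
    ("rspA", "CRP", "Yes"), ("rspA", "YdfH", "Yes"),
    ("araAB", "AraC (two sites)", "Yes"), ("znuCB", "Zur", "Yes"),
    ("xylA", "CRP", "Yes"), ("xylA", "XylR (two sites)", "Yes"),
    ("xylF", "XylR (two sites)", "Yes"), ("dicC", "DicA", "Yes"),
    ("relBE", "RelBE", "Yes"), ("ftsK", "LexA", "Yes"),
    ("znuA", "Zur", "Yes"), ("lac", "CRP", "Yes"),
    ("marR", "Fis", "No"), ("marR", "MarA", "Yes"),
    ("marR", "MarR (two sites)", "Yes"), ("dgoR", "CRP", "Yes"),
    ("dgoR", "DgoR (right site)", "No"), ("ompR", "IHF (three sites)", "Yes"),
    ("ompR", "CRP", "No"), ("dicA", "DicA", "No"),
    ("araC", "AraC (four sites)", "one site identified"),
    ("araC", "CRP", "No"), ("araC", "XylR (two sites)", "No"),
]

outcomes = Counter(outcome for _, _, outcome in RESULTS)
print(len(RESULTS), outcomes["Yes"], outcomes["No"])  # 23 16 6
print(f"rows fully recovered: {outcomes['Yes'] / len(RESULTS):.0%}")
```

Weighting rows by their number of sites instead would give a different rate; the table alone does not say which convention the authors used.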
Appendix 3—table 1
Example dataset of four nucleotide sequences, and the corresponding counts from the plasmid library and mRNAs.
SequenceLibrary sequencing countsmRNA counts
ACTA523
ATTA53
CCTG1111
TAGA123
GTGC20
CACA87
AGGC73
Appendix 3—table 2
Global, absolute quantification for most transcription factors identified in this study, as determined for E. coli K12 grown in both glucose (5 g/L in M9 minimal medium) and LB media.

The values in this table are reprinted from Schmidt et al., 2016 Supplemental Table S6.

Transcription factor nameGlucoseLB
FNR6091101
YieP158261
YciT82104
NsrR872136
LexA5601027
DeoR2634
CRP20483450
YdfH96154
ArcA33675464
Zur70130
GlpR75145
PhoP29673132
HNS2254147133
StpA68635241
DicA2025
YgbI26
XylR18
Appendix 3—table 3
Example energy matrix.

This matrix is in arbitrary units, and the process to obtain absolute units (in kBT) is described in Appendix 3 Section 'Inference of scaling factors for energy matrices'.

Pos | A | C | G | T
0 | −0.01 | −0.01 | −0.01 | 0.03
1 | 0.002 | 0.05 | −0.06 | 0.008
2 | −0.0002 | −0.04 | 0.008 | 0.03
3 | −0.02 | 0.02 | −0.01 | 0.01
Appendix 3—table 4
Example dataset with energy predictions.

Energy predictions are made by applying the example energy matrix in Appendix 3—table 3 to the example dataset in Appendix 3—table 1 according to Equation (26).

μ=0μ=1Energy (kBT)
5230.05
530.008
11110.09
123−0.03
200.03
87−0.07
73−0.04
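The energy predictions above come from applying the example energy matrix to each sequence. A minimal sketch, assuming Equation (26), which is not reproduced in this section, is the standard additive model that sums one matrix entry per position. With the rounded matrix values from Appendix 3, this reproduces the ACTA, ATTA, GTGC, CACA, and AGGC entries to the precision shown.

```python
from typing import Dict, List

# Example energy matrix from Appendix 3 (arbitrary units): ENERGY_MATRIX[base][position]
ENERGY_MATRIX: Dict[str, List[float]] = {
    "A": [-0.01, 0.002, -0.0002, -0.02],
    "C": [-0.01, 0.05, -0.04, 0.02],
    "G": [-0.01, -0.06, 0.008, -0.01],
    "T": [0.03, 0.008, 0.03, 0.01],
}


def predict_energy(sequence: str,
                   matrix: Dict[str, List[float]] = ENERGY_MATRIX) -> float:
    """Additive energy model: sum the matrix entry for the base at each position."""
    n_positions = len(next(iter(matrix.values())))
    if len(sequence) != n_positions:
        raise ValueError(f"expected a sequence of length {n_positions}")
    return sum(matrix[base][pos] for pos, base in enumerate(sequence))


for seq in ["ACTA", "ATTA", "GTGC", "CACA", "AGGC"]:
    print(seq, round(predict_energy(seq), 3))
```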
Appendix 3—table 5
A table showing scaling factors to convert arbitrary units to absolute units in kBT.

Growth conditions indicate the energy matrix and dataset used in the fit. Additional regulatory features are present in some growth conditions, so specifying the condition is important.

Gene | Growth condition | Scaling factor A
tff-rpsB-tsf | Heat shock | 8.1 kBT
tig | Heat shock | 26.3 kBT
yjjJ | Heat shock | 11.3 kBT
bdcR | Heat shock | 9.9 kBT
fdhE | Anaerobic growth | 6.34 kBT
ykgE | Arabinose | 12.1 kBT
dicC | Arabinose | 15.1 kBT
rspA | Arabinose | 5.5 kBT
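Because the scaling factor depends on both the gene and the growth condition used in the fit, a lookup should be keyed on the pair. A minimal sketch, assuming the conversion is a single multiplicative factor (energy in kBT = A × energy in arbitrary units), as the column name "Scaling factor A" suggests; the arbitrary-unit input in the usage line is illustrative only:

```python
from typing import Dict, Tuple

# Scaling factors from the table above: (gene, growth condition) -> A in kBT
SCALING_FACTORS_KBT: Dict[Tuple[str, str], float] = {
    ("tff-rpsB-tsf", "Heat shock"): 8.1,
    ("tig", "Heat shock"): 26.3,
    ("yjjJ", "Heat shock"): 11.3,
    ("bdcR", "Heat shock"): 9.9,
    ("fdhE", "Anaerobic growth"): 6.34,
    ("ykgE", "Arabinose"): 12.1,
    ("dicC", "Arabinose"): 15.1,
    ("rspA", "Arabinose"): 5.5,
}


def to_kbt(energy_au: float, gene: str, condition: str) -> float:
    """Convert an arbitrary-unit energy to kBT, assuming a multiplicative factor."""
    key = (gene, condition)
    if key not in SCALING_FACTORS_KBT:
        raise KeyError(f"no scaling factor for {gene} in {condition}")
    return SCALING_FACTORS_KBT[key] * energy_au


# Illustrative arbitrary-unit value, not a fitted energy from the study.
print(round(to_kbt(0.05, "ykgE", "Arabinose"), 3))
```

Raising an error for an unlisted gene and condition pair avoids silently reusing a factor fitted under a different condition.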
Appendix 5—key resources table
Reagent type (species)
or resource
DesignationSource or referenceIdentifiersAdditional information
Cell line (Escherichia coli)E. coli K12E. coli Stock Center
Cell line (Escherichia coli)E. coli ΔYiePE. coli Stock Center
Cell line (Escherichia coli)E. coli ΔGlpRE. coli Stock Center
Cell line (Escherichia coli)E. coli ΔArcAE. coli Stock Center
Cell line (Escherichia coli)E. coli ΔLrhAE. coli Stock Center
Cell line (Escherichia coli)E. coli ΔPhoPE. coli Stock Center
Cell line (Escherichia coli)E. coli ΔHdfRE. coli Stock Center
Strain, strain background
(Escherichia coli)
E. coli ΔGlpR
in K12 strain
This paperKnockout transferred to E. coli K12
Strain, strain background
(Escherichia coli)
E. coli ΔArcA
in K12 strain
This paperKnockout transferred to E. coli K12
Strain, strain background
(Escherichia coli)
E. coli ΔLrhA
in K12 strain
This paperKnockout transferred to E. coli K12
Strain, strain background
(Escherichia coli)
E. coli ΔPhoP
in K12 strain
This paperKnockout transferred to E. coli K12
Strain, strain background
(Escherichia coli)
E. coli ΔHdfR
in K12 strain
This paperKnockout transferred to E. coli K12
Chemical compound, drugQ5 PolymeraseQiagenCat. : M0491L
Chemical compound, drugqPCR master mixQuantaBioCat. : 101414–166
Chemical compound, drugLysyl EndopeptidaseWako ChemicalsCat. : 125–05061
Commercial assay or kitRNEasy Mini kitQiagenCat. : 74104
Chemical compound, drugRNAprotect
bacteria reagent
QiagenCat. : 76506
Software, algorithmmpathicKinney Lab Ireland and Kinney, 2016
Software, algorithmFastXHannon LabRRID:SCR_005534
Software, algorithmFLASHCBCBRRID:SCR_005531
OtherOligo PoolTwist Bioscience
Sequence-based reagentfwd oligo 101IDTTTCGTCTTCACCT CGAGCACGCTTATT CGTGCCGTGTTAT
Sequence-based reagentfwd oligo 102IDTTTCGTCTTCACCTC GAGCACTTTGCTT CAGTCAGATTCGC
Sequence-based reagentfwd oligo 103IDTTTCGTCTTCACCT CGAGCACGTCGAGT CCTATGTAACCGT
Sequence-based reagentfwd oligo 104IDTTTCGTCTTCACCT CGAGCACGTAAGAT GGAAGCCGGGATA
Sequence-based reagentfwd oligo 105IDTTTCGTCTTCACCT CGAGCACGGTGTCGC AACATGATCTAC
Sequence-based reagentfwd oligo 106IDTTTCGTCTTCACCT CGAGCACGTGCTAAG TCACACTGTTGG
Sequence-based reagentfwd oligo 107IDTTTCGTCTTCACCT CGAGCACTCTAAACA GTTAGGCCCAGG
Sequence-based reagentfwd oligo 108IDTTTCGTCTTCACCT CGAGCACGTCTTTAT ACTTGCCTGCCG
Sequence-based reagentfwd oligo 109IDTTTCGTCTTCACCT CGAGCACCACCGCGA TCAATACAACTT
Sequence-based reagentfwd oligo 110IDTTTCGTCTTCACCT CGAGCACTTCGGATA GACTCAGGAAGC
Sequence-based reagentfwd oligo 111IDTTTCGTCTTCACCT CGAGCACCCATTGAT AGATTCGCTCGC
Sequence-based reagentfwd oligo 112IDTTTCGTCTTCACCT CGAGCACTTTTCTAC TTTCCGGCTTGC
Sequence-based reagentfwd oligo 113IDTTTCGTCTTCACCT CGAGCACATGACTAT TGGGGTCGTACC
Sequence-based reagentfwd oligo 114IDTTTCGTCTTCACCT CGAGCACTCGACAAT AGTTGAGCCCTT
Sequence-based reagentfwd oligo 115IDTTTCGTCTTCACCT CGAGCACGAGCCATG TGAAATGTGTGT
Sequence-based reagentfwd oligo 116IDTTTCGTCTTCACCT CGAGCACCGTATACG TAAGGGTTCCGA
Sequence-based reagentfwd oligo 117IDTTTCGTCTTCACCT CGAGCACTTATGATG TCCGGATACCCG
Sequence-based reagentfwd oligo 118IDTTTCGTCTTCACCT CGAGCACTCTTAGAA ATCCACGGGTCC
Sequence-based reagentrev oligo 101IDTTGTAAAACGACGG CCAGTGACTAGCGC TGAGGAGAAGCCT AATAGGGCACAGC AATCAAAAGTA
Sequence-based reagentrev oligo 102IDTTGTAAAACGACG GCCAGTGAGGAGCGC TGAGGAGAAGCC TAATACCGGGATT CAGTGATTGAAC
Sequence-based reagentrev oligo 103IDTTGTAAAACGACG GCCAGTGAGTCCC GCTGAGGAGAAG CCTAATATGAAGAT ATGACGACCCCTG
Sequence-based reagentrev oligo 104IDTTGTAAAACGACGG CCAGTGACCGACGCT GAGGAGAAGCCTAA TATTCCACAGCTC TATGAGGTG
Sequence-based reagentrev oligo 105IDTTGTAAAACGACGG CCAGTGATTGGCGCT GAGGAGAAGCCTA ATAGCAAACATGA CTAGGAACCG
Sequence-based reagentrev oligo 106IDTTGTAAAACGACGG CCAGTGAGATACGC TGAGGAGAAGCC TAATACCGGGACG AGATTAGTACAA
Sequence-based reagentrev oligo 107IDTTGTAAAACGACGGC CAGTGAACTCCGCT GAGGAGAAGCCTA ATACACGCCAGTT GTGAACATAA
Sequence-based reagentrev oligo 108IDTTGTAAAACGACG GCCAGTGATACTCGC TGAGGAGAAGC CTAATACAAAGGC CAAATCAGTTCCA
Sequence-based reagentrev oligo 109IDTTGTAAAACGACGGC CAGTGACCAACGCT GAGGAGAAGCCT AATAGGTGCATGGG AGGAACTATA
Sequence-based reagentrev oligo 110IDTTGTAAAACGACG GCCAGTGAAGGCCGC TGAGGAGAAGCCT AATATGCATGGGT CTGTCTATTGT
Sequence-based reagent | rev oligo 111 | IDT | TGTAAAACGACGGCCAGTGAAATTCGCTGAGGAGAAGCCTAATACTCCTATGCTAGCTCGACTC
Sequence-based reagent | rev oligo 112 | IDT | TGTAAAACGACGGCCAGTGATTGTCGCTGAGGAGAAGCCTAATAATGGTAAGAAGCTCCCACAA
Sequence-based reagent | rev oligo 113 | IDT | TGTAAAACGACGGCCAGTGATTTACGCTGAGGAGAAGCCTAATACTATGGTCATTCCCGTACGA
Sequence-based reagent | rev oligo 114 | IDT | TGTAAAACGACGGCCAGTGAACCGCGCTGAGGAGAAGCCTAATATAATCGGCTACGTTGTGTCT
Sequence-based reagent | rev oligo 115 | IDT | TGTAAAACGACGGCCAGTGATGGCCGCTGAGGAGAAGCCTAATATGACTCGATCCTTTAGTCCG
Sequence-based reagent | rev oligo 116 | IDT | TGTAAAACGACGGCCAGTGAGGCCCGCTGAGGAGAAGCCTAATAACGCTTTGTGTTATCCGATG
Sequence-based reagent | rev oligo 117 | IDT | TGTAAAACGACGGCCAGTGAGGTGCGCTGAGGAGAAGCCTAATAACCACGGTGGAGTATACATC
Sequence-based reagent | rev oligo 118 | IDT | TGTAAAACGACGGCCAGTGACAATCGCTGAGGAGAAGCCTAATAGGCACCAGGTACATATCTCA
Sequence-based reagent | mRNA rev | IDT | GCAGGGGATAATATTGCCCA
Sequence-based reagent | fwd sequencing 94 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGACCTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 95 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCAGTTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 96 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCTATATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 97 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTAGAGTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 98 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGCATTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 99 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCTTATATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 100 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTAGCTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 101 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCAAGTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 102 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGTACTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 103 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTGAATATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 104 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGTTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 105 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTATGCTATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 106 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGTCATATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 107 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCTCATATTAGGCTTCTCCTCAGCG
Sequence-based reagent | fwd sequencing 108 | IDT | AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTAGTATATTAGGCTTCTCCTCAGCG
Sequence-based reagent | rev sequencing | IDT | AAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCTCAAAGCAGGGGATAATATTGCCCA
Other | Streptavidin-coated Dynabeads | Thermo Fisher | Cat.: 65601
Database | RegulonDB | RRID:SCR_003499
Database | EcoCyc | RRID:SCR_002433
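The forward sequencing primers listed above share a common 5′ region (the Illumina TruSeq universal adapter) and a common 3′ region, and differ only in a 4-nt segment between the two. The minimal sketch below, using only the Python standard library, recovers that variable segment from four of the primers by stripping the longest prefix and suffix they share. It only checks the primer layout; it is not part of the Reg-Seq analysis software, and the names `shared_flanks` and `variable_segment` are hypothetical.

```python
import os

# Full sequences of four forward sequencing primers, copied from the table above.
FWD_PRIMERS = {
    94: "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGACCTATTAGGCTTCTCCTCAGCG",
    95: "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCAGTTATTAGGCTTCTCCTCAGCG",
    96: "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTTCTATATTAGGCTTCTCCTCAGCG",
    97: "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTAGAGTATTAGGCTTCTCCTCAGCG",
}


def shared_flanks(seqs):
    """Return the longest prefix and the longest suffix common to all sequences."""
    seqs = list(seqs)
    prefix = os.path.commonprefix(seqs)
    # Reverse each sequence so that the common suffix becomes a common prefix.
    suffix = os.path.commonprefix([s[::-1] for s in seqs])[::-1]
    return prefix, suffix


def variable_segment(seq, prefix, suffix):
    """Strip the shared flanks from seq and return what lies between them."""
    if not (seq.startswith(prefix) and seq.endswith(suffix)):
        raise ValueError("sequence does not carry the shared flanks")
    return seq[len(prefix):len(seq) - len(suffix)]


PREFIX, SUFFIX = shared_flanks(FWD_PRIMERS.values())
INDEXES = {name: variable_segment(seq, PREFIX, SUFFIX) for name, seq in FWD_PRIMERS.items()}

if __name__ == "__main__":
    for name, segment in INDEXES.items():
        print(name, segment)
```

Deriving the flanks from the data only works when the chosen primers differ at the first and last base of the variable segment, as these four do; if every chosen primer happened to share such a base, the computed flank would absorb it, so a production script would fix the flank sequences explicitly.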
Author response table 1
pos | A | C | G | T
0 | -0.005387 | -0.011758 | -0.010176 | 0.027322
1 | 0.002338 | 0.049826 | -0.058030 | 0.005866
2 | -0.000259 | -0.037224 | 0.008021 | 0.029461
3 | -0.017494 | 0.015760 | -0.012184 | 0.013918
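Author response table 1 lists one value per base (A, C, G, T) at positions 0 to 3, and each row sums to zero within rounding (row 1, for example: 0.002338 + 0.049826 - 0.058030 + 0.005866 = 0). Assuming these are additive energy-matrix entries of the kind Reg-Seq infers for binding sites (an interpretation, since no units or caption accompany the table here), the minimal sketch below scores a 4-bp sequence by summing the entry selected by each base, and builds the lowest-scoring sequence position by position, which is valid only because the model is additive. Treating lower values as tighter binding is likewise an assumption; the function names are hypothetical.

```python
# Values copied from Author response table 1 (rows: positions 0-3; columns: A, C, G, T).
# Interpreting them as additive energy-matrix entries is an assumption for illustration.
BASES = "ACGT"
MATRIX = [
    [-0.005387, -0.011758, -0.010176, 0.027322],
    [0.002338, 0.049826, -0.058030, 0.005866],
    [-0.000259, -0.037224, 0.008021, 0.029461],
    [-0.017494, 0.015760, -0.012184, 0.013918],
]


def sequence_energy(seq, matrix=MATRIX):
    """Sum the matrix entry selected by each base of seq (additive model)."""
    seq = seq.upper()
    if len(seq) != len(matrix):
        raise ValueError(f"expected {len(matrix)} bases, got {len(seq)}")
    if any(base not in BASES for base in seq):
        raise ValueError("sequence may contain only A, C, G and T")
    return sum(row[BASES.index(base)] for row, base in zip(matrix, seq))


def lowest_energy_sequence(matrix=MATRIX):
    """Pick the minimum-valued base at each position independently."""
    return "".join(BASES[row.index(min(row))] for row in matrix)


def row_sums(matrix=MATRIX):
    """Per-position sums; values near zero indicate a zero-mean gauge."""
    return [sum(row) for row in matrix]


if __name__ == "__main__":
    print(row_sums())
    best = lowest_energy_sequence()
    print(best, sequence_energy(best))
```

Because every row is centred on zero, the score of a sequence is relative: it measures how much the chosen bases deviate from the per-position average rather than an absolute binding energy.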

Data availability

Sequencing data have been deposited in the SRA under BioProject accession nos. PRJNA599253 and PRJNA603368. Mass spectrometry data are deposited in the Caltech data repository at https://doi.org/10.22002/d1.1336. Model files and inferred information footprints are deposited in the Caltech data repository at https://doi.org/10.22002/D1.1331. Processed sequencing data sets and analysis software are available in the GitHub repository, archived on Zenodo at https://doi.org/10.5281/zenodo.3953312.

The following data sets were generated
    W Ireland, S Beeler, E Flores-Bautista, N Belliveau, M Sweredoski, A Moradian, J Kinney, R Phillips (2019) NCBI BioProject ID PRJNA603368. Sequencing Data for mapping mutated constructs.

Additional files

Supplementary file 1

This file contains the presumed location of the TSS for each promoter region in Reg-Seq.

It additionally contains the logic behind the choice of TSS when there are multiple options.

https://cdn.elifesciences.org/articles/55308/elife-55308-supp1-v2.xlsx
Supplementary file 2

This file contains all primers used in the Reg-Seq experiment.

Additionally, it contains the flanking sequences of the mutated inserts and the barcodes used to label the growth conditions in the Reg-Seq experiment.

https://cdn.elifesciences.org/articles/55308/elife-55308-supp2-v2.xlsx
Supplementary file 3

This file contains all transcription-factor-binding sites identified either by the automated binding-site algorithm or manually, the latter only where there is additional evidence for binding.

For each binding site, the file lists the starting and ending base pairs and whether the transcription factor acts as an activator or a repressor.

https://cdn.elifesciences.org/articles/55308/elife-55308-supp3-v2.xlsx
Source code 1

This file contains the custom Python scripts used to process and analyze the sequencing data.

https://cdn.elifesciences.org/articles/55308/elife-55308-code1-v2.tar.gz
Transparent reporting form
https://cdn.elifesciences.org/articles/55308/elife-55308-transrepform-v2.docx
Appendix 2—figure 2—source data 1

Data for information footprints and identified regions in Appendix 2—figure 2.

https://cdn.elifesciences.org/articles/55308/elife-55308-app2-fig2-data1-v2.xlsx
Appendix 2—figure 3—source data 1

Data for information footprints and identified regions in Appendix 2—figure 3.

https://cdn.elifesciences.org/articles/55308/elife-55308-app2-fig3-data1-v2.xlsx
Appendix 2—figure 4—source data 1

All p-values displayed in Appendix 2—figure 4.

https://cdn.elifesciences.org/articles/55308/elife-55308-app2-fig4-data1-v2.csv
Appendix 3—figure 1—source data 1

Pearson correlation values for Appendix 3—figure 1.

https://cdn.elifesciences.org/articles/55308/elife-55308-app3-fig1-data1-v2.txt
Appendix 4—figure 2—source data 1

Source data for the percentage composition of regulatory architectures.

The source data file contains the percentage composition of each regulatory architecture displayed in the figure, including architectures more complicated than those displayed in the histogram.

https://cdn.elifesciences.org/articles/55308/elife-55308-app4-fig2-data1-v2.csv
