The majority of promoters emerge and evolve within a subset of preexisting promoter motifs.
(A) We calculated the mutual information Ii(b, f) between nucleotide identity (b = A, T, C, G) and fluorescence scores rounded to the nearest whole number (f = 1, 2, 3, 4 a.u.) for each position i in a parent sequence. In essence, the calculation compares the probability pi(b) of a base b occurring at position i and the probability p(f) that a sequence has a fluorescence score f with the joint probability pi(b, f) that a sequence has base b at position i and fluorescence score f. The more the joint probability departs from the product of the individual probabilities, pi(b)p(f), which is the value expected if base and fluorescence were independent, the more important the base at this position is for promoter activity. See methods. (B) An example of how to interpret mutual information using position-weight matrix (PWM) scores of predicted -10 and -35 box motifs. Top: we plot the mutual information Ii(b, f) for P19’s top strand. P19 is an active promoter on both DNA strands. Solid line: mean mutual information. Shaded region: ± 1 standard deviation when the dataset is randomly split into three equally sized subsets (methods). Bottom: PWM predictions for the -10 box motifs (magenta trapezoids) and -35 box motifs (orange trapezoids) along the wild-type parent sequence. We define hotspots as mutual information peaks greater than or equal to the 90th percentile of total mutual information (methods), and highlight them with dashed rectangles. (C) Stacked bar plots depicting the percentage of hotspots overlapping with -10 box motifs only (magenta), -35 box motifs only (orange), both -10 and -35 box motifs (red), or neither (gray). We plot these percentages for both active parents (left) and inactive parents (right). (D) Analogous to (B) but for the bottom strand of P3. P3 is an inactive parent sequence. The hotspot overlaps with a -10 box motif. (E) Analogous to (B) but for the top strand of P18. P18 is an inactive parent sequence. Hotspots overlap with (from left to right) a -35 box motif, both a -10 and a -35 box motif, and neither (None). Fig S6 shows analogous mutual information plots for daughters derived from each parent sequence.
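
The per-position calculation described in (A) can be illustrated with a minimal sketch. It assumes the standard definition of mutual information, the sum over b and f of pi(b, f) log2[pi(b, f) / (pi(b) p(f))], with probabilities estimated as simple frequencies. The function name `mutual_information`, the base-2 logarithm, and the absence of any pseudocount or finite-sample correction are assumptions made for illustration; the methods may differ in these details.

```python
from collections import Counter
from math import log2


def mutual_information(seqs, scores):
    """Return one mutual information value (in bits) per sequence position.

    seqs   : equal-length DNA strings (mutant variants of one parent).
    scores : fluorescence scores already rounded to whole numbers,
             one per sequence.
    Illustrative sketch only: plain frequency estimates, log base 2.
    """
    n = len(seqs)
    if n == 0 or n != len(scores):
        raise ValueError("need exactly one score per sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length")

    # p(f): probability that a sequence has fluorescence score f
    p_f = {f: count / n for f, count in Counter(scores).items()}

    result = []
    for i in range(length):
        # counts behind p_i(b): base b at position i
        base_counts = Counter(s[i] for s in seqs)
        # counts behind p_i(b, f): base b at position i AND score f
        joint_counts = Counter((s[i], f) for s, f in zip(seqs, scores))

        mi = 0.0
        for (b, f), count in joint_counts.items():
            p_bf = count / n
            p_b = base_counts[b] / n
            # Zero when the joint equals the product (independence)
            mi += p_bf * log2(p_bf / (p_b * p_f[f]))
        result.append(mi)
    return result
```

For the toy input `mutual_information(["AA", "AT", "TA", "TT"], [1, 1, 2, 2])`, the first position fully determines the score and yields 1 bit, whereas the second position is independent of the score and yields 0 bits.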