Chromosomes and Gene Expression

Chromosome Structure I: Loop extrusion or boundary:boundary pairing?

Xinyang Bing
Wenfan Ke
Miki Fujioka
Amina Kurbidaeva
Sarah Levitt
Mike Levine
Paul Schedl
James B. Jaynes

Lewis-Sigler Institute, Princeton University, Princeton, NJ 08544
Present address: BlueRock Therapeutics, Cambridge, MA 02142
Department of Molecular Biology, Princeton University, Princeton, NJ 08544
Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA 19107
Present address: Center for Genomics and Systems Biology, New York University, New York, NY 10003

https://doi.org/10.7554/eLife.94070.2

Open access

Figures and data

Diagrams of possible loop topologies generated by loop extrusion and boundary:boundary pairing.
A) Cohesin embraces a loop at a loading site somewhere within the TAD-to-be and then extrudes chromatin at an equal rate from both sides of the loop. Extrusion continues until cohesin encounters boundary roadblocks on both sides. B) In one model, a roadblock halts extrusion only when it is in the appropriate orientation. As a consequence, the chromatin fiber will be organized into a series of stem-loops separated from each other by unanchored loops. C) If the presence of a boundary roadblock, but not its orientation, is sufficient to halt extrusion, the chromatin fiber will be organized into a series of linked stem-loops. The main axis of the chromosome will be defined by the cohesin:CTCF roadblocks. D) The pairing of two boundaries head-to-tail generates a stem-loop. E) If boundary pairing interactions are strictly pairwise, head-to-tail pairing will generate a series of stem-loops separated from each other by unanchored loops. F) If boundaries can engage in multiple head-to-tail pairing interactions, the chromosome will be organized into a series of linked stem-loops. The main axis of the chromosome will be defined by a series of paired boundaries. G) The pairing of two boundaries head-to-head generates a circle-loop. H) If boundary pairing interactions are strictly pairwise, head-to-head pairing will generate a series of circle-loops separated from each other by unanchored loops. I) If boundaries can engage in multiple head-to-head pairing interactions, the chromatin fiber will be organized into a series of circle-loops connected to each other at their base. Pairing interactions between boundaries #1 and #2 need not be in register with the pairing of boundaries #2 and #3. In this case, the main axis of the chromosome may bend and twist, which could affect the relative orientations of the circle-loops.
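The pairing rules in panels D and G can be written as a small truth table. The Python sketch below is a bookkeeping aid, not a physical model. It assumes, as in the transgene schematics (where head-to-head pairing between oppositely oriented copies of homie yields a stem-loop), that inverting one partner in the chromosome swaps the outcome. The names Pairing and loop_topology are hypothetical.

```python
from enum import Enum


class Pairing(Enum):
    HEAD_TO_HEAD = "head-to-head"
    HEAD_TO_TAIL = "head-to-tail"


def loop_topology(same_orientation: bool, pairing: Pairing) -> str:
    """Predict the loop formed when two boundaries on one chromosome pair.

    same_orientation: True if both boundaries point the same way along the
    chromosome (as nhomie and homie do in the endogenous eve locus).
    """
    # Co-oriented boundaries: head-to-tail -> stem-loop (panel D),
    # head-to-head -> circle-loop (panel G).
    # Assumption: inverting one partner in the chromosome swaps the outcomes.
    stem = same_orientation == (pairing is Pairing.HEAD_TO_TAIL)
    return "stem-loop" if stem else "circle-loop"


# Cases drawn in the eve figures:
print(loop_topology(True, Pairing.HEAD_TO_TAIL))   # eve nhomie:homie -> stem-loop
print(loop_topology(False, Pairing.HEAD_TO_HEAD))  # GeimohL homie:homie -> stem-loop
print(loop_topology(True, Pairing.HEAD_TO_HEAD))   # GhomieL homie:homie -> circle-loop
```

The remaining case, oppositely oriented boundaries pairing head-to-tail, is not drawn in these figures; its circle-loop outcome follows only from the assumed symmetry.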

TAD organization of the even-skipped locus and neighboring sequences.
A) The eve TAD is a volcano with a plume that is anchored by nhomie and homie. ChIP-seq data below the MicroC map indicate that many of the known fly chromosomal architectural proteins are associated with the two eve boundary elements in vivo (Cuartero et al. 2014; Duan et al. 2021; Gaskill et al. 2021; Li et al. 2015; Sun et al. 2015; Ueberschär et al. 2019; Zolotarev et al. 2016). B) The eve locus forms a stem-loop structure. In this illustration, nhomie pairs with homie head-to-tail, forming a stem-loop that brings sequences upstream of nhomie and downstream of homie into contact, as observed in insulator bypass assays (Kyrchanova et al. 2008a). The eve locus is shown assembled into a coiled “30 nm” chromatin fiber. C) MicroC contact pattern for the chromosomal region spanning the attP site at –142 kb and the eve locus. Like the eve volcano, this contact pattern was generated using aggregated, previously published NC14 data (Batut et al. 2022; Levo et al. 2022). A black arrow indicates the –142 kb locus where transgenes are integrated into the genome. A blue arrow indicates the position of the Etf-QO gene. Note the numerous TADs separating the –142 kb hebe locus from the eve “volcano TAD”. Individual TADs are labeled TA–TM. Black boxes indicate positions of sequences that, based on ChIP experiments, are bound by one or more known insulator proteins in vivo.

Schematics of boundary pairing and loop extrusion.
A) GeimohL and GhomieL transgenes and the eve locus on a linear map. The same color codes are used throughout. In the eve locus, nhomie (blue arrow) and homie (red arrow) are oriented, by convention, so that they point to the right. This convention is maintained in the two transgenes. In GeimohL (top), homie is in the opposite orientation from homie in the eve TAD and so points away from the eve TAD. In GhomieL (bottom), homie is in the same orientation as homie in the eve locus and so points toward the eve TAD. B) Predicted boundary pairing interactions between GeimohL and the eve TAD. homie in the transgene pairs with homie in the eve TAD head-to-head. Since homie in the transgene points in the opposite orientation from homie in the eve TAD, a stem-loop will be generated. C) If homie in the transgene also pairs head-to-tail with nhomie in the eve TAD, the loop structure shown in C will be generated. In this topology, eve-lacZ is in close proximity to the eve enhancers, and eve-gfp is in contact with the hebe enhancer. D) Loop-extrusion model for GeimohL. Transgene homie and endogenous homie determine the endpoints of the extruded eveMammoth (eveMa) loop. As in B, eveMa is a stem-loop. This topology brings eve-lacZ into close proximity to the eve enhancers, while eve-gfp is in another loop that contains the hebe enhancers. E) Predicted boundary pairing between GhomieL and the eve locus. The GhomieL transgene is inserted in the same chromosomal orientation as GeimohL; however, its homie boundary is inverted so that it is in the same orientation as the eve homie and points toward the eve TAD. homie in the transgene will pair with homie in the eve locus head-to-head, generating a circle-loop. F) If homie in the transgene also pairs with nhomie in the eve TAD, the loop structure shown in F will be generated. In this topology, eve-gfp will be activated by both the eve and hebe enhancers.
G) Loop-extrusion model for GhomieL. The cohesin complex bypasses transgenic homie and is stopped at an upstream boundary X, generating a novel eveGargantuan (eveGa) loop. Both eve-gfp and eve-lacZ are located within the same eveGa TAD and thus should interact with the eve enhancers. Since the hebe enhancers are also within this TAD, they should be able to activate both reporters as well.

The lacZ reporter is activated by eve enhancers in the GeimohL insert.
A) Expression patterns of endogenous eve in early (stage 4–5) and late (stage 14–16) embryos, visualized by smFISH. eve probes labeled with Atto 633 were hybridized to eve mRNA. eve is shown in yellow and DAPI in blue. N>3. B) Schematics of transgenes. C) Expression patterns of transgenic reporters in early-stage embryos. Top: GlambdaL; bottom: GeimohL. GFP is in green, lacZ is in orange, and DAPI is in blue, here and in E. D) Quantification of normalized stripe signals for the transgenes shown in C. The reporter smFISH intensity (signal to background), here and in F, was measured, normalized, and plotted as described in Methods. N>3. n=27 for GlambdaL and n=42 for GeimohL. E) Expression patterns of transgenic reporters in late-stage embryos. Top: GlambdaL; bottom: GeimohL. F) Quantification of normalized stripe signals for the transgenes shown in E. N>3. n=27 for GlambdaL, n=56 for GeimohL. Scale bars = 200 μm. N = number of independent biological replicates; n = number of embryos. The paired two-tailed t-test was used for statistical analysis. ****p < 0.0001; ns, not significant. Raw measurements are available in the Source Data files.

The GFP reporter is activated by both eve and hebe enhancers in the GhomieL insert.
A) Schematics of transgenes. B) Expression patterns of transgenic reporters in early-stage embryos. Top: GlambdaL; bottom: GhomieL. GFP is in green, lacZ is in orange, and DAPI is in blue, here and in D. C) Quantification of normalized stripe signals as represented in B. The reporter smFISH intensity (signal to background) was measured, normalized, and plotted as described in Methods, here and in E. N>3. n=27 for GlambdaL, n=46 for GhomieL. D) Expression patterns of transgenic reporters in late-stage embryos. Top: GlambdaL; bottom: GhomieL. E) Quantification of normalized stripe signals for the transgenes shown in D. N>3. n=27 for GlambdaL, n=59 for GhomieL. Scale bars = 200 μm. N = number of independent biological replicates; n = number of embryos. The paired two-tailed t-test was used for statistical analysis. ****p < 0.0001; ns, not significant. Raw measurements are available in the Source Data files.

Expression of reporters in the LeimohG and LhomieG inserts.
A) Schematics of GE transgenes. B) Boundary pairing model for long-range interactions of the LeimohG transgene. In this topology, GFP is in close proximity to the eve enhancers, and lacZ is in contact with the hebe enhancer region. C) Boundary pairing model for the LhomieG transgene. In this topology, lacZ is in close proximity to both the eve enhancers and the hebe enhancers. D) Loop-extrusion model for LeimohG. Transgenic homie and endogenous homie flank the eveMammoth loop. In this topology, GFP is in close proximity to the eve enhancers, and lacZ is close to the hebe enhancers. E) Loop-extrusion model for the LhomieG transgene. The cohesin complex bypasses transgenic homie and is stopped at an upstream boundary X, creating the eveGargantuan loop. In this topology, GFP and lacZ are at similar physical distances from the eve enhancers and the hebe enhancers. F) Expression patterns of transgenic reporters in early-stage embryos. Top: LeimohG; bottom: LhomieG. GFP is in green, lacZ is in orange, and DAPI is in blue. N=3, n=24 for both LeimohG and LhomieG. G) Expression patterns of transgenic reporters in late-stage embryos. Top: LeimohG; bottom: LhomieG. N=3, n=24 for both LeimohG and LhomieG. Scale bars = 200 μm. N = number of independent biological replicates; n = number of embryos. Statistical analysis is provided in Table Supplemental #1.

homie in the GeimohL and GhomieL transgenes mediates long-range physical interactions with the eve TAD.
Key to the diagrams in panels A, B, and C. Yellow box: 5’ end of the hebe transcription unit. Transgene: green box, eve-gfp reporter; orange box, eve-lacZ reporter; black box, lambda DNA; red block arrow, homie DNA. eve TAD: blue block arrow, endogenous nhomie; red block arrow, endogenous homie (the directions of the block arrows follow the established convention for nhomie and homie); gray boxes, eve enhancers; yellow box, eve transcription unit. A) MicroC contact profile of the control GlambdaL. The inset shows a blow-up of the contacts between the GlambdaL transgene and the eve TAD. Note the slight increase in interaction frequency (compare to Figure 1A). B, C) MicroC maps of GeimohL and GhomieL, respectively. The key difference between the two inserts lies in the contacts between the transgenes and sequences in the eve TAD; the pattern of interaction with the endogenous eve locus changes with the orientation switch. D) “Virtual 4C” maps obtained from the MicroC maps in B (top panel) and C (bottom panel). In both panels, viewpoints are taken from either the lacZ gene (orange) or the GFP gene (green).
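A “virtual 4C” profile is, in essence, one row (or a few summed rows) of a MicroC contact matrix. The sketch below assumes a dense, symmetric matrix of counts stored as a list of lists with a fixed bin size; the function name and parameters are hypothetical, and the authors' pipeline may bin, balance, or smooth the data differently.

```python
def virtual_4c(contacts, bin_size, region_start, viewpoint_start, viewpoint_end):
    """Return a 4C-like profile for one viewpoint from a binned contact matrix.

    contacts: square, symmetric list of lists; contacts[i][j] is the contact
        count between bins i and j.
    bin_size: bin width in bp; region_start: genomic coordinate of bin 0.
    viewpoint_start, viewpoint_end: half-open genomic interval of the
        viewpoint (for example, a reporter gene body).
    """
    n = len(contacts)
    first = (viewpoint_start - region_start) // bin_size
    last = (viewpoint_end - 1 - region_start) // bin_size
    if not (0 <= first <= last < n):
        raise ValueError("viewpoint lies outside the matrix")
    viewpoint_bins = range(first, last + 1)
    # Sum the rows of every bin that overlaps the viewpoint.
    return [sum(contacts[i][j] for i in viewpoint_bins) for j in range(n)]


matrix = [
    [5, 1, 0, 0],
    [1, 5, 2, 1],
    [0, 2, 5, 3],
    [0, 1, 3, 5],
]
print(virtual_4c(matrix, 1000, 0, 1000, 2000))  # one-bin viewpoint: [1, 5, 2, 1]
print(virtual_4c(matrix, 1000, 0, 1000, 3000))  # two-bin viewpoint: [1, 7, 7, 4]
```

Summing rather than averaging the viewpoint rows is a simplification; a real comparison between lacZ and GFP viewpoints of different lengths would likely need per-bin normalization.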

homie in the transgene interacts with eve homie and nhomie.
“Virtual 4C” maps of the GeimohL and LhomieG (A) or GhomieL and LeimohG (B) transgenes oriented so that lacZ (blue) or gfp (orange) is located on the eve side of the homie boundary. Viewpoints are taken from the transgenic homie sequence. Note that transgenic homie preferentially interacts with endogenous homie in both orientations.

Circle-loop and stem-loop meta-loops.
Distant boundary elements can find and pair with each other, forming large meta-loops. Panel A shows a circle-loop meta-loop formed by the head-to-head pairing of the blue (blue arrow) and purple (purple arrow) boundaries. As described in the text, the blue boundary separates a small TAD that contains the most distal Lar promoter (black arrowhead) from a larger TAD that contains part of the Lar transcription unit (and is upstream of an internal Lar promoter: green arrowhead). The purple boundary is located between two TADs, Pa and Pb, in a gene-poor region of the second chromosome approximately 600 kb downstream of the Lar gene. When the blue and purple boundaries pair, sequences in the small TAD upstream of the blue boundary come into contact with sequences in the Pa TAD upstream of the purple boundary. This interaction generates the dark rectangle on the upper left side of the interaction plot. Sequences downstream of the blue boundary, in the TAD containing part of the Lar transcription unit, are also brought into contact with sequences in the Pb TAD downstream of the purple boundary. This interaction generates the larger rectangular box on the bottom right side of the interaction plot. The diagram on the right illustrates that the head-to-head pairing of the blue and purple boundaries generates a ∼600 kb circle-loop. In the circle-loop configuration, sequences in the TADs upstream of the blue and purple boundaries come into contact with each other, as do sequences downstream of the blue and purple boundaries (blue double arrows). Panel B shows a stem-loop meta-loop formed by the head-to-tail pairing of a boundary (blue arrow) located upstream of the beat IV gene with another boundary (purple arrow) located ∼6 Mb away, between the Sid and snu genes.
The blue boundary separates a small TAD containing three UDP-glycosyltransferase genes from a larger TAD containing the CG10164 gene and the distal-most promoter of the beat IV gene. The purple boundary separates a ∼20 kb TAD containing Sid and four other genes from a small TAD that contains the most distal snu promoter (green arrowhead). When the blue and purple boundaries pair with each other head-to-tail (see diagram on right), sequences in the large TAD downstream of the blue boundary are brought into contact with sequences upstream of the purple boundary in the 20 kb Sid TAD, generating the large rectangular box to the lower left of the interaction map. The stem-loop configuration also brings sequences upstream of the blue boundary, in the small TAD containing the three UDP-glycosyltransferase genes, into contact with the TAD containing the distal-most snu promoter. This interaction gives a small rectangular box with a high contact density. Sequences in the TAD upstream of the UDP-glycosyltransferase TAD are also brought into contact with the small TAD containing the snu promoter. These two interacting “zones” are indicated by blue double arrows in the diagram on the right. In addition to boundary:boundary pairing, both meta-loops have a prominent dot in the larger rectangular boxes, indicated by the green (panel A) and blue (panel B) arrows. Based on the ChIP signature of the sequences giving rise to these dots, they likely correspond to elements bound by the GAF-containing LBC complex, as indicated in the figure (“LBC” in both panels). In other contexts, LBC elements function to bring enhancers into contact with promoters (Batut et al. 2022; Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Levo et al. 2022).
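The contact blocks described for the two meta-loops follow directly from their topology: a circle-loop juxtaposes upstream with upstream and downstream with downstream flanks, whereas a stem-loop juxtaposes the inner flanks with each other and the outer flanks with each other. The sketch below, with hypothetical names, returns the two off-diagonal rectangles expected to be enriched. It assumes equal-sized flanking windows on each side of both boundaries, whereas the real flanking TADs differ in size.

```python
def metaloop_contact_blocks(b1, b2, flank, topology):
    """Predict the off-diagonal blocks enriched when boundary b1 pairs with b2.

    b1 < b2 are boundary coordinates; flank is the window examined on each
    side of each boundary (a stand-in for the neighbouring TADs).
    Returns ((x_start, x_end), (y_start, y_end)) rectangles, x near b1, y near b2.
    """
    if not b1 < b2:
        raise ValueError("expected b1 < b2")
    up1, down1 = (b1 - flank, b1), (b1, b1 + flank)
    up2, down2 = (b2 - flank, b2), (b2, b2 + flank)
    if topology == "circle-loop":
        # Upstream meets upstream; downstream meets downstream.
        return [(up1, up2), (down1, down2)]
    if topology == "stem-loop":
        # Inside the loop: downstream of b1 meets upstream of b2.
        # Outside the loop: upstream of b1 meets downstream of b2.
        return [(down1, up2), (up1, down2)]
    raise ValueError(f"unknown topology: {topology!r}")


print(metaloop_contact_blocks(100, 700, 10, "circle-loop"))
print(metaloop_contact_blocks(100, 700, 10, "stem-loop"))
```

The same bookkeeping matches the transgene blow-ups: when transgenic homie pairs with endogenous homie as a circle-loop, downstream flanks interact, and when it pairs as a stem-loop, the transgene's upstream flank meets the region downstream of endogenous homie.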

TAD organization of the even-skipped gene and upstream region.
MicroC contact pattern for the chromosomal region spanning the attP site at –142 kb and the eve locus. Like the eve volcano, this contact pattern was generated using aggregated, previously published NC14 data (Batut et al. 2022; Levo et al. 2022). The large black arrow indicates the –142 kb locus where transgenes are integrated into the genome. LDC domains are labeled according to the interacting TADs: for example, H-I indicates crosslinking events generated by physical interactions between sequences in two neighboring TADs, TH and TI, while I-J indicates interactions between sequences in TI and TJ. Likewise, G-I indicates crosslinking events generated by physical interactions between sequences in TADs TG and TI, which are separated from each other by TAD TH. Note that, unlike the TADs to the left of eve, the eve TAD does not interact frequently with its neighbors. The small black arrow indicates an interaction “dot” between the boundary marking the border of TA and TB and the boundary marking the border of TI and TJ. Red, blue, and green arrowheads mark boundaries that may engage in partner switching (see text).

Trouble ahead, trouble behind.
A) Illustration of a broken zipper: zipping in front but not closing behind. B) Crosslinking pattern expected for cohesin loading at a site very close to the middle of the TAD-to-be. The vertical line of linked DNA is generated when the two chromatin fibers in the TAD-to-be are transiently linked together as cohesin passes. When cohesin bumps into the CTCF roadblocks (BE) at the boundaries of the TAD, the vertical line stops. C) The cohesin loading site is offset to the left of the center of the TAD-to-be. Crosslinking initiates at the loading site and extends vertically until the cohesin complex bumps into the CTCF roadblock on the left. At that point, extrusion of the left chromatin fiber halts, while the chromatin fiber on the right continues to be extruded. The line of crosslinked sequences then extends at a 45° angle until cohesin comes to a halt at the CTCF roadblock on the right. D and E) The loading site is at or close to the boundary on the left (D) or the right (E). In these cases, extrusion is unidirectional, and the DNAs crosslinked as cohesin passes form a line that extends at a 45° angle until cohesin comes to a halt at the CTCF roadblock. F) Cohesin is able to load at multiple sites in the TAD-to-be. In this case, there will be a series of vertical lines whose intensities correlate with the relative frequency with which each loading site is used. Each vertical line extends until cohesin comes to a halt at the closest CTCF roadblock. At that point, the line of crosslinked DNA extends at a 45° angle until that particular cohesin complex comes to a halt at the CTCF roadblock that is farther away. The 45° line of crosslinked DNA becomes progressively more intense as it approaches the apex of the triangle.
These vertical lines might follow other paths if loop extrusion does not proceed at equal rates on the two sides, or if its rate varies stochastically; even in these cases, however, the 45° lines are still expected to be more intense toward the top of the “peak”.
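The geometry in panels B to F can be reproduced with a few lines of bookkeeping. The sketch below assumes discrete bins, extrusion of one bin per step on each unblocked side, impermeable roadblocks, and the common plotting convention in which a contact (i, j) is drawn at x = (i + j)/2 and y = (j − i)/2, so that one-sided extrusion traces a 45° line. All names are hypothetical, and this is not a polymer simulation.

```python
from collections import Counter


def extrusion_trace(load, left_block, right_block):
    """Anchor pairs (left, right) visited by one cohesin loaded at `load`.

    Each step extrudes one bin on every side that is not yet blocked; a side
    stops when it reaches its roadblock. Each pair is a contact-map pixel
    linked as the complex passes.
    """
    if not left_block <= load <= right_block:
        raise ValueError("loading site must lie within the TAD-to-be")
    left = right = load
    trace = [(left, right)]
    while left > left_block or right < right_block:
        if left > left_block:
            left -= 1
        if right < right_block:
            right += 1
        trace.append((left, right))
    return trace


def to_triangle(pixel):
    """Map contact (i, j) to (x, y) in the rotated 'triangle' view."""
    i, j = pixel
    return ((i + j) / 2, (j - i) / 2)


def aggregate(loading_sites, left_block, right_block):
    """Sum traces over loading sites weighted by usage frequency (panel F)."""
    counts = Counter()
    for site, weight in loading_sites.items():
        for pixel in extrusion_trace(site, left_block, right_block):
            counts[pixel] += weight
    return counts


# Centered loading (panel B): x stays at 5.0, i.e. a vertical line.
print([to_triangle(p) for p in extrusion_trace(5, 0, 10)])
# Offset loading (panel C): vertical to (0, 6), then 45 degrees to (0, 10).
print(extrusion_trace(3, 0, 10))
```

With two loading sites, for example aggregate({2: 1, 3: 1}, 0, 10), the pixels from (0, 6) to the apex (0, 10) are visited by both traces while (0, 4) and (0, 5) are visited by only one, which is the apex-ward intensification described in panel F.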

Models for the formation of LDC domains.
Several different mechanisms could explain the LDC domains. A–D) In the loop-extrusion model, LDC domains arise because cohesin sometimes breaks through a CTCF roadblock. In A, there are three primary TADs: TAD1, TAD2, and TAD3. B and C) Occasional breakthrough events can generate a secondary TAD, LDC1:2 (TAD1+TAD2) or LDC2:3 (TAD2+TAD3). D) Even less frequent breakthrough events can generate a tripartite TAD (TAD1+TAD2+TAD3) that corresponds to LDC1:3. E–G) LDC domains arise because TADs sometimes bump into their neighbors. In E, TAD1 bumps into TAD2 to give LDC1:2, while in F, TAD2 bumps into TAD3 to give LDC2:3. G) Less frequently, TAD1 bumps into TAD3 to give LDC1:3. H–J) LDC domains arise because boundaries can switch partners. H and I) In H, boundary A (dark green arrow) skips boundary B (red arrow) and pairs with boundary C (tan arrow). This generates a new TAD, TAD1+TAD2, which corresponds to LDC domain 1:2. In I, boundary B (red arrow) skips boundary C (tan arrow) and pairs with D (dark blue arrow). This generates a new TAD, TAD2+TAD3, which corresponds to LDC domain 2:3. In J, boundary A (dark green arrow) skips both boundary B (red arrow) and boundary C (tan arrow), which do not pair with each other, and pairs with boundary D (dark blue arrow). This generates a new TAD, TAD1+TAD2+TAD3, which corresponds to LDC domain 1:3.
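In the partner-switching model (H to J), each LDC domain corresponds to a boundary that skips one or more neighbors and pairs with a more distant boundary. The sketch below enumerates those composite domains for a row of TADs. The function name and the max_skipped parameter are hypothetical, and the sketch ignores the relative frequencies of the events. Because the breakthrough and bumping models (A to G) yield the same set of composite domains, this listing alone does not distinguish among the three mechanisms.

```python
def ldc_domains(n_tads, max_skipped=1):
    """List composite domains formed when a boundary skips its nearest partner(s).

    Boundaries are numbered 0..n_tads (0 = A, 1 = B, ...), so TAD k lies
    between boundaries k-1 and k, matching the TAD1..TAD3 numbering.
    Returns labels such as 'LDC1:2' (TAD1+TAD2).
    """
    if n_tads < 1 or max_skipped < 1:
        raise ValueError("need at least one TAD and one skipped boundary")
    domains = []
    for left in range(n_tads):  # index of the boundary that switches partners
        for skipped in range(1, max_skipped + 1):
            right = left + skipped + 1  # partner boundary after skipping
            if right > n_tads:
                break
            # The new domain spans TADs left+1 through right.
            domains.append(f"LDC{left + 1}:{right}")
    return domains


print(ldc_domains(3, max_skipped=1))  # ['LDC1:2', 'LDC2:3'] (panels H and I)
print(ldc_domains(3, max_skipped=2))  # adds 'LDC1:3' (panel J)
```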

Loop extrusion: eveElephant.
A) Orientation-dependent loop-extrusion model for the LeimohG transgene, as sketched in Figure 3D. This configuration generates eveMa. B) Instead of running through nhomie, the cohesin complex comes to a halt at the nhomie boundary, generating eveElephant. A second cohesin complex generates the eve TAD by loading at a site within the eve locus. Both eveElephant and the eve TAD are stem-loops. However, while this double stem-loop configuration would explain why the transgenic homie contacts both nhomie and homie, the eve enhancers would be isolated in their own loop and would not interact with the two reporters in the transgene.

MicroC of LeimohG.
A) MicroC contact map of LeimohG. The diagram below the contact map shows the homie transgene and the eve TADs. Gray lines are regions masked because of repeat sequences. Interactions between the transgene and the eve TAD are indicated by the boxed region, which is also shown in the inset on the right. The inset shows that the physical interactions between LeimohG and sequences in the eve TAD are biased toward the eve-gfp reporter. B) “Virtual 4C” maps from the lacZ (brown line) and GFP (green line) gene-body viewpoints.

MicroC of LhomieG.
A) MicroC map of LhomieG. Diagram below the contact map shows the homie transgene and the eve TADs. Gray lines are masked regions due to repeat sequences. The interaction pattern is the opposite of that observed for LeimohG (Figure Sup. 5). Instead of interacting with the eve-gfp reporter, sequences in the eve TAD interact preferentially with the eve-lacZ reporter. The relevant region is boxed, and also shown in the inset. B) “Virtual 4C” map from either the lacZ (brown line) or GFP (green line) gene body viewpoints.

Blowup of interactions generated by the transgene with sequences neighboring –142 kb and the eve TAD.
The interaction pattern indicates that the loop topology changes along with the orientation of transgenic homie. Blow-up of off-diagonal interactions, focused on the interaction between the endogenous eve locus and the –142 kb transgenic insert with A) LhomieG, B) GhomieL, C) GeimohL, or D) LeimohG. Note the differences in preferential interaction between regions adjacent to transgenic homie and endogenous homie (indicated by the double arrow pairs). In the two cases where transgenic homie is in the same orientation in the genome as endogenous homie (A and B), regions downstream of the transgene preferentially interact with regions downstream of endogenous homie in the TER94 TAD. In contrast, when transgenic homie is in the opposite orientation to endogenous homie (C and D), sequences upstream of the transgene preferentially interact with sequences downstream of endogenous homie in the TER94 TAD.
