Massively parallel reporter assay for mapping gene-specific regulatory regions at single-nucleotide resolution
Figures
Overview of the locus-specific massively parallel reporter assay (LS-MPRA).
(A) Schematic representation of LS-MPRA. Bacterial artificial chromosomes (BACs) containing genomic regions of interest are enzymatically fragmented to generate a high-complexity DNA library. Size-selected fragments are cloned into a vector containing a minimal promoter that will drive expression of GFP and a barcode positioned within the 3′ untranslated region (UTR). The LS-MPRA library is then electroporated into retinal cells, where cis‐regulatory module (CRM) activity is inferred by quantifying barcode enrichment in transcribed mRNA. (B) Illustration of fragment-barcoding strategy and necessary elements to sequence, wherein each DNA fragment is uniquely barcoded. (C) Preparation and sequencing of the plasmid library, associating fragment-barcode pairs, and establishing a baseline for barcode abundance, which is used to normalize barcode counts after mapping of barcode-labeled fragments to genomic coordinates. ORF, open reading frame; BC, barcode. Partially created with BioRender.com.
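The normalization described in (C) can be illustrated with a minimal sketch: each barcode's share of the RNA reads is divided by its share of the plasmid-library reads, and the ratio is reported on a log₂ scale. The function name `barcode_enrichment`, the pseudocount, and the toy counts are assumptions made for illustration and are not taken from the authors' pipeline.

```python
import math

def barcode_enrichment(rna_counts, dna_counts, pseudocount=1.0):
    """Return log2(RNA fraction / plasmid fraction) per barcode.

    Hypothetical helper: the pseudocount and the use of library-size
    fractions are illustrative assumptions, not the published method.
    Only barcodes present in the plasmid baseline are scored.
    """
    rna_total = sum(rna_counts.get(bc, 0) + pseudocount for bc in dna_counts)
    dna_total = sum(n + pseudocount for n in dna_counts.values())
    enrichment = {}
    for bc, dna_n in dna_counts.items():
        rna_frac = (rna_counts.get(bc, 0) + pseudocount) / rna_total
        dna_frac = (dna_n + pseudocount) / dna_total
        enrichment[bc] = math.log2(rna_frac / dna_frac)
    return enrichment

# Toy example: two barcodes equally abundant in the plasmid library,
# one of which is over-represented in the RNA.
plasmid = {"AAA": 99, "CCC": 99}
rna = {"AAA": 299, "CCC": 99}
scores = barcode_enrichment(rna, plasmid)
```

In this toy case the barcode that is over-represented in RNA receives a positive score and the other a negative one; summing such scores over the fragments that cover a genomic position would give one possible form of enrichment track.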
Rho locus-specific massively parallel reporter assay (LS-MPRA) to identify cis‐regulatory modules (CRMs) in the neonatal mouse retina.
(A) LS-MPRA barcode enrichment plots for the Rho locus aligned to a genome browser track from in vivo and ex vivo experiments (N=3 experimental replicates, combined), and annotated with: (i) known Rho regulatory regions, (ii) the coverage of the barcode–fragment association library across the locus, (iii) log₂-transformed base conservation among 60 vertebrate and 40 placental mammal species, (iv) regions of open chromatin in the P8 mouse retina, and (v) RefSeq gene models for the locus. (B) Expanded view of a region of interest from the Rho LS-MPRA plot, showing peaks (i-v) that correspond to known regulatory regions (red bars), genomic conservation, and areas of open chromatin.
Grm6, Vsx2, and Cabp5 LS-MPRAs to identify cis‐regulatory modules (CRMs) in the neonatal mouse retina.
Barcode enrichment plots from the Grm6 (A), Vsx2 (B), and Cabp5 (C) locus-specific massively parallel reporter assays (LS-MPRAs), aligned with genome browser tracks and annotated with (i) known regulatory regions, (ii) coverage of the barcode-fragment association library across the locus, (iii) log₂ base conservation across 60 vertebrate or 40 placental mammal species, (iv) regions of open chromatin in the P8 mouse retina, and (v) RefSeq gene models. Expanded regions of interest (yellow dashed boxes) show peaks (i-iii) or regions that align with known regulatory elements for each gene.
Olig2 locus-specific massively parallel reporter assay (LS-MPRA) to identify candidate cis‐regulatory modules (CRMs) in the developing mouse retina.
(A) Barcode enrichment plot from the Olig2 LS-MPRA, aligned with a genome browser track and annotated with (i) previously described Olig2 regulatory regions identified in mouse embryonic stem cells, fertilized murine oocytes, a mouse lymphoma cell line, or in mouse ventral neural tube (Chen et al., 2008; Fan et al., 2023; Friedli et al., 2010; Sun et al., 2023), (ii) coverage of the barcode-fragment association library across the locus, (iii) log₂ base conservation across 60 vertebrate or 40 placental mammal species, (iv) regions of open chromatin in the E14 mouse retina, and (v) RefSeq gene models. (B) Two expanded regions of interest from the Olig2 LS-MPRA, showing peaks that align with known regulatory elements, genomic conservation, and open chromatin. Candidate CRMs are indicated by blue bars.
Dynamic cis‐regulatory module (CRM) activity and Olig2 expression following inhibition of Notch signaling.
(A) Bar plot showing relative endogenous Olig2 RNA expression in the retina over 0–24 hr of treatment with the γ-secretase inhibitor LY411575 or DMSO, normalized to the 0 hr control. Unpaired t-tests with Holm’s multiple comparisons correction were performed for each Control vs. Treated (n=3) timepoint. (B) Barcode enrichment plot from the Olig2 locus-specific massively parallel reporter assay (LS-MPRA), aligned with a genome browser track after 0–28 hr of LY411575 treatment. Notch inhibitor-responsive CRM candidate regions (NR1–3, blue boxes) displayed differential barcode enrichment across the treatment period. Statistical analysis was performed using one-way Welch’s ANOVA with Dunnett’s T3 multiple comparisons test on AUC values, normalized to the 0 hr timepoint. Error bars in (A) and (B) represent SEM. Experimental replicates (n=3): each pooled 8 retinas. Asterisks indicate statistical significance (*p≤0.05, **p≤0.01, ***p≤0.001).
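For readers unfamiliar with the Holm correction named in (A), the following is a minimal sketch of the step-down adjustment applied to a set of per-timepoint p-values. The function name `holm_adjust` and the example p-values are hypothetical; the analysis reported here was not necessarily computed this way.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the input order.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and forced to be non-decreasing along the sorted order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Made-up p-values for three Control vs. Treated comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

The smallest p-value carries the full Bonferroni penalty, while larger ones are penalized progressively less, which is why Holm's method is never less powerful than Bonferroni.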
Olig2 locus-specific massively parallel reporter assay (LS-MPRA) analysis using unique barcodes.
(A) Barcode enrichment plot from a single LS-MPRA replicate comparing analyses with uniquely mapped barcodes to all mapped barcodes, including barcode collisions, displayed with the same genome browser tracks and annotations shown in Figure 3.
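The distinction between uniquely mapped barcodes and barcode collisions can be sketched as follows, under the assumption that the association library is available as (barcode, fragment) pairs and that a collision means one barcode paired with more than one distinct fragment. The function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict

def split_unique_and_colliding(pairs):
    """Separate barcodes tied to one fragment from those tied to several.

    `pairs` is an iterable of (barcode, fragment_id) tuples; repeated
    observations of the same pair do not count as a collision.
    """
    fragments_by_barcode = defaultdict(set)
    for barcode, fragment in pairs:
        fragments_by_barcode[barcode].add(fragment)
    unique = {}
    collisions = {}
    for barcode, fragments in fragments_by_barcode.items():
        if len(fragments) == 1:
            unique[barcode] = next(iter(fragments))
        else:
            collisions[barcode] = sorted(fragments)
    return unique, collisions

# Toy association library: "CCCA" was observed with two different fragments.
pairs = [
    ("AAAC", "frag1"), ("AAAC", "frag1"),
    ("GGTT", "frag2"),
    ("CCCA", "frag1"), ("CCCA", "frag3"),
]
unique, collisions = split_unique_and_colliding(pairs)
```

An analysis restricted to `unique` corresponds to the "uniquely mapped barcodes" case, whereas including the entries in `collisions` corresponds to the "all mapped barcodes" case.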
Co-localization of GFP driven by Olig2 cis‐regulatory modules (CRMs) and Olig2 in retinal cells.
(A) Schematic of plasmid constructs containing one or all Notch inhibitor-responsive regions driving GFP expression with the endogenous Olig2 minimal promoter, alongside the experimental workflow. (B) Representative transverse sections of E14 retinas electroporated with these plasmids, incubated in vitro for 24 hr, and stained using immunohistochemistry (IHC) for GFP and Olig2. Merged and single-channel images show GFP colocalization with Olig2 (orange arrow) as well as GFP+ cells lacking detectable Olig2 expression (yellow arrow). (C) Quantification of GFP and Olig2 co-localization, represented as pie charts showing (i) the percentage of electroporated Olig2+ cells that express GFP protein, (ii) the percentage of GFP protein+ cells that express Olig2, and (iii) the percentage of electroporated Olig2- cells that express GFP protein in retinas electroporated at E14 with plasmids containing one or all Notch inhibitor-responsive regions driving GFP. Scale bar: 20 µm. Partially created with BioRender.com.
Activity of backbone plasmids containing the Olig2 minimal promoter and EGFP.
(A) Representative transverse sections of E14 retinas incubated in vitro for 24 hr show sparse GFP RNA and GFP fluorescence driven by control plasmids containing either EGFP alone or EGFP under the Olig2 minimal promoter. Pie charts show the percentage of electroporated cells with detectable GFP expression. Partially created with BioRender.com.
Co-localization of cis‐regulatory module (CRM)-directed RNA and Olig2 RNA or protein in retinal cells.
(A–C) Representative transverse sections of E14 retinas incubated in vitro for 24 hr for analysis of localization patterns of Olig2 and/or GFP. (A) Sections stained for Olig2 protein (cyan, yellow arrows) using immunohistochemistry (IHC) and Olig2 RNA (red, magenta arrows) using FISH. A 100% stacked column plot quantifies the overlap between Olig2 protein and Olig2 RNA expression. (B) Intrinsic GFP fluorescence and GFP RNA driven by Olig2-NR1 (yellow and magenta arrows). (C) In retinas electroporated with plasmids containing one or all Notch inhibitor-responsive regions, Olig2 RNA co-localized with GFP protein only (yellow arrow), GFP RNA only (magenta arrow), or both (orange arrow). (D) Pie charts quantifying (column 1) the percentage of electroporated Olig2 RNA+ cells that expressed GFP, (column 2) the percentage of GFP RNA+ cells that expressed Olig2 RNA, (column 3) the percentage of GFP protein+ cells that expressed Olig2 RNA, and (column 4) the percentage of Olig2 RNA- cells that expressed GFP in electroporated retinas. Scale bar: 20 µm.
Degenerate massively parallel reporter assays (MPRAs) to identify functional residues within candidate cis‐regulatory modules (CRMs).
(A) Schematic of d-MPRA library assembly. Point mutations were introduced into Olig2 CRM fragments via error-prone PCR, followed by intraplasmid duplication (IPD) to generate constructs with duplicated mutant CRMs flanking a minimal promoter and GFP ORF. The 3′ CRM copy, located in the 3′ untranslated region (UTR), served as a barcode, with a WPRE sequence included to potentially stabilize transcripts. (B) Conceptual diagram illustrating expected d-MPRA results, showing predicted changes in CRM activity upon disruption of enhancer or repressor binding sites. (C–E) d-MPRA plots displaying log₂ fold changes in mutational frequencies using a 5-base pair sliding window average, normalized across the Olig2-NR1 (C), Olig2-NR2 (D), and Olig2-NR3 (E) regions.
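The quantity plotted in (C–E) can be approximated with a short sketch: a per-position log₂ ratio of mutation frequency in RNA versus the plasmid library, smoothed with a 5-position window. The function names, the pseudocount, the centered window, and the truncation of the window at the sequence ends are all assumptions for illustration; the caption does not specify these details.

```python
import math

def log2_fold_change(rna_freq, dna_freq, pseudo=1e-6):
    """Per-position log2(RNA mutation frequency / plasmid mutation frequency)."""
    return [math.log2((r + pseudo) / (d + pseudo))
            for r, d in zip(rna_freq, dna_freq)]

def sliding_mean(values, window=5):
    """Centered moving average; the window is truncated at the ends (assumed)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

# Toy data: position 2 is depleted of mutations in RNA, as might be
# expected if mutating that base disrupted an activator site.
plasmid_freq = [0.01, 0.01, 0.01, 0.01, 0.01]
rna_freq = [0.01, 0.01, 0.0025, 0.01, 0.01]
track = sliding_mean(log2_fold_change(rna_freq, plasmid_freq, pseudo=0.0))
```

Under this convention, a dip in the smoothed track marks positions where mutations reduce reporter expression, and a rise marks positions where mutations increase it.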
Transcription factor (TF) binding sites within the Olig2-NR2 cis‐regulatory module (CRM).
(A) TF binding motifs predicted using HOMER identified within the Olig2-NR2 CRM candidate, aligned with the average degenerate massively parallel reporter assay (d-MPRA) plot for this region (from Figure 7). (B–D) Position frequency matrices of TF binding sites aligned to motifs predicted by HOMER in Olig2-NR2: Mybl1 to Motif 1 (B), Foxn4 and Pax6 to Motif 3 (C), and Otx2 to Motif 14 (D). (E) UMAP visualization of scRNA profiles from E14 mouse retinas, with cell types previously identified by marker genes by Clark et al., 2019. (F) Expression of Olig2 on UMAP. (G) Co-expression (pink) of Olig2 (blue) with Mybl1, Foxn4, Pax6, or Otx2 (red) on UMAP. Percentages of respective populations that co-express Olig2 are indicated (* denotes significance using Fisher’s exact test, p≤0.05). (H) TF occupancy track aligned to (i) Olig2 locus-specific massively parallel reporter assay (LS-MPRA) barcode enrichment after 12 hr LY411575 treatment (replicate of Figure 4B) and (ii) gene models. Occupancy peaks for Mybl1, Foxn4, and Pax6 align with the Olig2-NR2 CRM candidate (blue box).
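The co-expression test in (G) can be illustrated with a minimal two-sided Fisher's exact test on a 2×2 table of cell counts. The table layout (TF+/TF- cells by Olig2+/Olig2- cells) and the example counts are assumptions, since the caption does not state how the contingency table was built; the implementation below is a plain hypergeometric calculation, not the authors' code.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical counts: rows are TF+ / TF- cells, columns are Olig2+ / Olig2-.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With these toy counts the margins are too small for the association to reach significance; real single-cell datasets would supply far larger counts per cell population.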
Transcription factor (TF) binding sites in Olig2-NR1 and NR3 cis‐regulatory modules (CRMs).
(A, H) TF binding motifs identified within Olig2-NR1 (A) and Olig2-NR3 (H) CRM candidates, aligned with the average d-MPRA plot (from Figure 7). (B–F, I–K) Position frequency matrices of transcription factors aligning to Olig2-NR1: Sox4/11 to Motifs 13 and 16 (B), Lhx2 and Dlx2 to Motif 11 (C), Isl1 to Motif 8 (D), Foxp1 to Motif 18 (E), Mybl1 to Motif 2 (F), and Otx2 to Motif 12 (F); or Olig2-NR3: Ngn2 to Motifs 3 and 5 (I), Bhlhe22 to Motif 13 (J), and Lhx9 to Motif 15 (K). (G, L) Co-expression (pink) of Olig2 (blue) with Sox11, Sox4, Lhx2, Dlx2, Isl1, or Foxp1 (red) for Olig2-NR1 (G) and Ngn2, Bhlhe22, or Lhx9 (red) for Olig2-NR3 (L) on UMAP visualization of E14 mouse retinal gene expression. Percentages of respective populations that co-express Olig2 are indicated (* denotes significance using Fisher’s exact test, p≤0.05).
Ngn2 LS-MPRA to identify cis‐regulatory module (CRM) candidates active in mouse retinal cells expressing Ngn2.
(A) Ngn2 locus-specific massively parallel reporter assay (LS-MPRA) barcode enrichment plot aligned with a genome browser track, annotated with (i) known Ngn2 regulatory regions, (ii) coverage of the barcode-fragment association library, (iii) log₂ base conservation across 60 vertebrate or 40 placental mammal species, (iv) regions of open chromatin in E14 mouse retina, (v) gene models, and (vi) CRM candidate regions 1–4 (blue boxes). (B) Representative transverse sections of E14 retinas incubated for 24 hr in vitro, showing localization of Ngn2 RNA with GFP RNA only (magenta arrow) or both intrinsic GFP and GFP RNA (orange arrows) in retinas electroporated with plasmids containing one of the CRM1-4 regions. (C) Pie charts showing the percentage of Ngn2 RNA+ cells expressing GFP (column 1), GFP RNA+ cells expressing Ngn2 RNA (column 2), GFP protein+ cells expressing Ngn2 RNA (column 3), and Ngn2 RNA- cells expressing GFP (column 4) in retinas electroporated at E14 with plasmids containing CRM1-4 regions driving GFP. (D) Co-localization analysis of Ngn2-CRM3 plasmid and GFP following 16 hr incubation in vitro. Scale bar: 20 µm.
OLIG2 locus-specific massively parallel reporter assay (LS-MPRA) to identify cis‐regulatory module (CRM) candidates in chick embryos.
(A) OLIG2 LS-MPRA barcode enrichment plot following electroporation into E5 chick retinal explants, and E2 spinal cords and E4 spinal cords in ovo, aligned with a genome browser track and annotated with (i) coverage of the barcode-fragment association library, (ii) log₂ base conservation across 77 vertebrate species, (iii) gene models, and (iv) CRM candidate regions 1–3 (blue boxes). (B) Representative transverse sections of E5 chick retinas incubated for 24 hr in vitro, showing localization of chick OLIG2 RNA with GFP RNA only (magenta arrow) or both intrinsic GFP fluorescence and GFP RNA (orange arrows) driven by plasmids containing one of the CRM1-3 regions. (C) Pie charts showing the percentage of OLIG2 RNA+ cells expressing GFP (column 1), GFP RNA+ cells expressing OLIG2 RNA (column 2), GFP protein+ cells expressing OLIG2 RNA (column 3), and OLIG2 RNA- cells expressing GFP (column 4) in retinas electroporated at E5 with plasmids containing CRM1-3 regions driving GFP and visualized by FISH. Scale bar: 20 µm.
Activity of OLIG2 cis‐regulatory modules (CRMs) in OLIG2+ cells within embryonic chick spinal cords.
(A) Schematic of an E2 chick transverse spinal cord with the field of view (red outline) shown in B. (B) Representative transverse hemi-sections of ventral E2 chick spinal cords incubated for 24 hr in ovo, showing (i) OLIG2 RNA localized with GFP RNA (orange arrow), (ii) OLIG2 expression in GFP- cells (yellow arrows), and (iii) GFP RNA expression in OLIG2- cells (magenta arrows). GFP expression is driven by plasmids containing one of the CRM candidate regions (CRM1-3). Scale bar: 20 µm.
Activity of Olig2 cis‐regulatory module (CRM) candidates in postnatal retinal cells expressing Olig2.
(A) Representative transverse sections of postnatal retinas electroporated at P0 and incubated in vivo for 24 hr, stained with antibodies against Olig2 and GFP to identify co-localized expression (orange arrows). GFP expression was driven by plasmids containing one or more Notch-inhibitor responsive regions (NR1-3). (B) Analysis of GFP and Olig2 co-localization, shown with pie charts depicting the percent of Olig2+ electroporated cells expressing GFP (column 1), the percent of GFP+ cells expressing Olig2 (column 2), and the percent of Olig2- electroporated cells expressing GFP (column 3) in retinas electroporated at P0 with plasmids containing one or more Notch-inhibitor responsive regions driving GFP. Scale bar: 20 µm.
Tables
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Strain, strain background (Mus musculus) | CD-1 | Charles River Laboratories | RRID:IMSR_CRL:022 | |
| Strain, strain background (Gallus gallus) | SPF Eggs | AVS Bio | Material Number: 10100330 | |
| Strain, strain background (Escherichia coli) | DH10β Cells | Thermo Fisher | cat. #18290015 | Electrocompetent cells |
| Gene (Mus musculus) | Rho | NCBI GenBank | NM_145383.2 | |
| Recombinant DNA reagent | Rho BAC clone | BACPAC Resources | RP23-219M6 | |
| Gene (Mus musculus) | Grm6 | NCBI GenBank | NM_173372.2 | |
| Recombinant DNA reagent | Grm6 BAC clone | BACPAC Resources | RP23-417M10 | |
| Gene (Mus musculus) | Vsx2 | NCBI GenBank | NM_001301427.1 | |
| Recombinant DNA reagent | Vsx2 BAC clone | BACPAC Resources | RP23-127O21 | |
| Gene (Mus musculus) | Cabp5 | NCBI GenBank | NM_013877.4 | |
| Recombinant DNA reagent | Cabp5 BAC clone | BACPAC Resources | RP24-125H22 | |
| Gene (Mus musculus) | Olig2 | NCBI GenBank | NM_016967.2 | |
| Recombinant DNA reagent | Olig2 BAC clone | BACPAC Resources | CH29-613 | |
| Gene (Mus musculus) | Ngn2 | NCBI GenBank | NM_009718.4 | Formal gene name: Neurog2 |
| Recombinant DNA reagent | Ngn2 BAC clone | BACPAC Resources | RP23-182M12 | |
| Gene (Gallus gallus) | OLIG2 | NCBI GenBank | NM_001031526.1 | |
| Recombinant DNA reagent | OLIG2 BAC clone | BACPAC Resources | CH261-60J3 | |
| Recombinant DNA reagent | Stagintbc7 | This paper | | Deposited to Addgene; used to create LS-MPRA libraries |
| Recombinant DNA reagent | Statadual-WPRE | This paper | | Deposited to Addgene; used to create d-MPRA libraries |
| Chemical compound, drug | Fast Green FCF | Millipore Sigma | cat. #F7252 | |
| Chemical compound, drug | L-glutamine | Sigma Aldrich | cat. #G3126 | |
| Chemical compound, drug | Penicillin/streptomycin | Invitrogen/Gibco | cat. #15140-122 | |
| Chemical compound, drug | Minimum Essential Medium | Millipore Sigma | cat. #51412C | |
| Chemical compound, drug | HBSS | Thermo Fisher Scientific | cat. #14-025-092 | |
| Chemical compound, drug | horse serum | Thermo Fisher Scientific | cat. #26050-088 | |
| Chemical compound, drug | HEPES | Invitrogen/Gibco | cat. #15630-080 | |
| Chemical compound, drug | buprenorphine | PAR Pharma | Custom formulation | |
| Chemical compound, drug | proparacaine hydrochloride | Bausch & Lomb | cat. #24208-730-06 | |
| Chemical compound, drug | LY411575 | Millipore Sigma | SML0506 | γ-secretase inhibitor |
| Chemical compound, drug | TRI Reagent | Millipore Sigma | cat. #T9424 | |
| Commercial assay or kit | Zymo Directzol Microprep Kit | Zymo Research | cat. #R2061 | |
| Commercial assay or kit | Dynabeads mRNA Purification Kit | Thermo Fisher Scientific | cat. #61006 | |
| Chemical compound, drug | papain solution | Worthington Biochemical | cat. #LS003126 | |
| Commercial assay or kit | HCR Buffers [v3.0] | Molecular Instruments | v3.0 reagents (discontinued) | |
| Chemical compound, drug | donkey serum | Jackson ImmunoResearch | cat. #017-000-121 | |
| Antibody | Pax6 (mouse monoclonal) | Developmental Studies Hybridoma Bank | RRID:AB_528427 | CUT&RUN primary antibody; 500 ng per sample |
| Antibody | Foxn4 (mouse monoclonal) | Santa Cruz Biotechnology | cat# sc-377166 | CUT&RUN primary antibody; 500 ng per sample |
| Antibody | Mybl1 (rabbit polyclonal) | Millipore-Sigma | RRID:AB_1078540; cat# HPA008791 | CUT&RUN primary antibody; 500 ng per sample |
| Antibody | Olig2 (mouse monoclonal) | Millipore | RRID:AB_10807410; Cat# MABN50 | Primary antibody for IHC; dilution 1:500. |
| Antibody | GFP (chicken polyclonal) | Abcam | RRID:AB_300798; cat# ab13970 | Primary antibody for IHC; dilution 1:1000. |
| Other | DAPI | Invitrogen | RRID:AB_2629482; cat# D1306 | Nuclear counterstain; used at 1:1000 in IHC. |
| Commercial assay or kit | Zymo DNA Clean & Concentrator Kit | Zymo Research | cat. #D4013 | |
| Commercial assay or kit | Large construct DNA isolation kit | Qiagen | cat. #12462 | |
| Commercial assay or kit | Qubit dsDNA HS Assay Kit | Thermo Fisher Scientific | cats. #Q32851 and #Q32856 | |
| Commercial assay or kit | NEBNext Ultra II FS DNA Module kit | NEB | cat. #E7810S | |
| Chemical compound, drug | T4 DNA ligase buffer | NEB | cat. #B0202S | |
| Commercial assay or kit | Blunt/TA Ligase Master Mix | NEB | cat. #M0367 | |
| Chemical compound, drug | NEBNext Ultra II Q5 Master Mix | NEB | cat. #M0544 | |
| Chemical compound, drug | Agarose Dissolving Buffer | Zymo Research | cat. #D4001-1-100 | |
| Commercial assay or kit | Plasmid Maxi kit | Qiagen | cat. #12163 | |
| Commercial assay or kit | GeneMorph II Random Mutagenesis Kit | Agilent | cat. #200550 | |
| Chemical compound, drug | Bst 2.0 WarmStart Polymerase | NEB | cat. #M0538 | |
| Chemical compound, drug | Taq DNA polymerase | NEB | cat. #M0273 | |
| Commercial assay or kit | CUTANA ChIC/CUT&RUN Kit | EpiCypher | cat# 14-1048 | |
| Commercial assay or kit | NEBNext UltraII DNA Library Prep Kit | NEB | cat. #E7103 | |
| Commercial assay or kit | NEBNext Multiplex Oligos for Illumina Dual Index Sets | NEB | cats. #E7600S or #E7780S | |
| Commercial assay or kit | Quick Ligation reaction | NEB | cat. #M2200 | |
| Chemical compound, drug | Exonuclease | Lucigen | cat. #E3101K | |
| Commercial assay or kit | Protoscript II Kit | NEB | cat. #E6560 | |
| Commercial assay or kit | Zymo Oligo Clean & Concentrator Kit | Zymo Research | cat. #D4060 | |
| Commercial assay or kit | LunaScript RT SuperMix Kit | NEB | cat. #E3010 | |
| Software/algorithm | Integrative Genomics Viewer | PMID:21221095 | RRID:SCR_011793 | |
| Software/algorithm | Prism | GraphPad | RRID:SCR_002798 | |
| Software/algorithm | FIJI | ImageJ | RRID:SCR_002285 | |
| Software/algorithm | R | The R Foundation for Statistical Computing | RRID:SCR_001905 | |
Additional files
- Supplementary file 1
  PCR primers, genomic coordinates, and cell type specificity quantification.
  (a) PCR primers used in all experiments. (b) Genomic coordinates for all regions of interest. (c–g) Quantification of EGFP expression from electroporated plasmid constructs and colocalization with cell-type-specific gene expression.
  https://cdn.elifesciences.org/articles/107565/elife-107565-supp1-v1.docx
- MDAR checklist
  https://cdn.elifesciences.org/articles/107565/elife-107565-mdarchecklist1-v1.pdf
- Source data 1
  LS-MPRA library characteristics and supplemental QC plots.
  https://cdn.elifesciences.org/articles/107565/elife-107565-data1-v1.pdf
- Source data 2
  D-MPRA library supplemental QC plots.
  https://cdn.elifesciences.org/articles/107565/elife-107565-data2-v1.pdf