Massively parallel reporter assay for mapping gene-specific regulatory regions at single-nucleotide resolution

  1. Alastair J Tulloch
  2. Ryan Nicholas Delgado
  3. Rinaldo Catta-Preta
  4. Constance L Cepko (corresponding author)
  1. Department of Genetics, Harvard Medical School, United States
  2. Howard Hughes Medical Institute, Harvard Medical School, United States
  3. Department of Ophthalmology, Harvard Medical School, United States
11 figures, 1 table and 4 additional files

Figures

Overview of the locus-specific massively parallel reporter assay (LS-MPRA).

(A) Schematic representation of LS-MPRA. Bacterial artificial chromosomes (BACs) containing genomic regions of interest are enzymatically fragmented to generate a high-complexity DNA library. Size-selected fragments are cloned into a vector containing a minimal promoter that will drive expression of GFP and a barcode positioned within the 3′ untranslated region (UTR). The LS-MPRA library is then electroporated into retinal cells, where cis‐regulatory module (CRM) activity is inferred by quantifying barcode enrichment in transcribed mRNA. (B) Illustration of fragment-barcoding strategy and necessary elements to sequence, wherein each DNA fragment is uniquely barcoded. (C) Preparation and sequencing of the plasmid library, associating fragment-barcode pairs, and establishing a baseline for barcode abundance, which is used to normalize barcode counts after mapping of barcode-labeled fragments to genomic coordinates. ORF, open reading frame; BC, barcode. Partially created with BioRender.com.
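The normalization described in (C) can be illustrated with a minimal sketch. It assumes, purely for illustration, that barcode counts are scaled to counts per million, that a pseudocount guards against zeros, and that enrichment is reported as a log₂ ratio of RNA to plasmid abundance. The function name `barcode_enrichment` and these choices are hypothetical and are not taken from the paper.

```python
import math


def barcode_enrichment(rna_counts, plasmid_counts, pseudocount=1.0):
    """Return log2(RNA CPM / plasmid CPM) per barcode.

    rna_counts and plasmid_counts map barcode -> raw read count.
    The CPM scaling and pseudocount are illustrative assumptions.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(plasmid_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries need at least one read")

    enrichment = {}
    # Only barcodes seen in the plasmid library have a usable baseline.
    for barcode, dna in plasmid_counts.items():
        rna = rna_counts.get(barcode, 0)
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        enrichment[barcode] = math.log2(rna_cpm / dna_cpm)
    return enrichment


if __name__ == "__main__":
    plasmid = {"BC_A": 100, "BC_B": 100}  # hypothetical baseline counts
    rna = {"BC_A": 300, "BC_B": 100}      # hypothetical mRNA counts
    for bc, value in barcode_enrichment(rna, plasmid).items():
        print(bc, round(value, 3))
```

In an actual analysis, per-barcode values like these would still need to be aggregated over the genomic coordinates of the fragments they label before plotting along the locus.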

Figure 2 with 1 supplement
Rho locus-specific massively parallel reporter assay (LS-MPRA) to identify cis‐regulatory modules (CRMs) in the neonatal mouse retina.

(A) LS-MPRA barcode enrichment plots for the Rho locus aligned to a genome browser track from in vivo and ex vivo experiments (N=3 experimental replicates, combined), and annotated with: (i) known Rho regulatory regions, (ii) the coverage of the barcode–fragment association library across the locus, (iii) log₂-transformed base conservation among 60 vertebrate and 40 placental mammal species, (iv) regions of open chromatin in the P8 mouse retina, and (v) RefSeq gene models for the locus. (B) Expanded view of a region of interest from the Rho LS-MPRA plot, showing peaks (i-v) that correspond to known regulatory regions (red bars), genomic conservation, and areas of open chromatin.

Figure 2—figure supplement 1
Grm6, Vsx2, and Cabp5 LS-MPRAs to identify cis‐regulatory modules (CRMs) in the neonatal mouse retina.

Barcode enrichment plots from the Grm6 (A), Vsx2 (B), and Cabp5 (C) locus-specific massively parallel reporter assays (LS-MPRAs), aligned with genome browser tracks and annotated with (i) known regulatory regions, (ii) coverage of the barcode-fragment association library across the locus, (iii) log₂ base conservation across 60 vertebrate or 40 placental mammal species, (iv) regions of open chromatin in the P8 mouse retina, and (v) RefSeq gene models. Expanded regions of interest (yellow dashed boxes) show peaks (i-iii) or regions that align with known regulatory elements for each gene.

Olig2 locus-specific massively parallel reporter assay (LS-MPRA) to identify candidate cis‐regulatory modules (CRMs) in the developing mouse retina.

(A) Barcode enrichment plot from the Olig2 LS-MPRA, aligned with a genome browser track and annotated with (i) previously described Olig2 regulatory regions identified in mouse embryonic stem cells, fertilized murine oocytes, a mouse lymphoma cell line, or in mouse ventral neural tube (Chen et al., 2008; Fan et al., 2023; Friedli et al., 2010; Sun et al., 2023), (ii) coverage of the barcode-fragment association library across the locus, (iii) log₂ base conservation across 60 vertebrate or 40 placental mammal species, (iv) regions of open chromatin in the E14 mouse retina, and (v) RefSeq gene models. (B) Two expanded regions of interest from the Olig2 LS-MPRA, showing peaks that align with known regulatory elements, genomic conservation, and open chromatin. Candidate CRMs are indicated by blue bars.

Figure 4 with 1 supplement
Dynamic cis‐regulatory module (CRM) activity and Olig2 expression following inhibition of Notch signaling.

(A) Bar plot showing relative endogenous Olig2 RNA expression in the retina over 0–24 hr of treatment with the γ-secretase inhibitor LY411575 or DMSO, normalized to the 0 hr control. Unpaired t-tests with Holm’s multiple comparisons correction were performed at each timepoint to compare control and treated samples (n=3). (B) Barcode enrichment plot from the Olig2 locus-specific massively parallel reporter assay (LS-MPRA), aligned with a genome browser track after 0–28 hr of LY411575 treatment. Notch inhibitor-responsive CRM candidate regions (NR1–3, blue boxes) displayed differential barcode enrichment across the treatment period. Statistical analysis was performed using one-way Welch’s ANOVA with Dunnett’s T3 multiple comparisons test on AUC values, normalized to the 0 hr timepoint. Error bars in (A) and (B) represent SEM. Each of the three experimental replicates (n=3) pooled 8 retinas. Asterisks indicate statistical significance (*p≤0.05, **p≤0.01, ***p≤0.001).
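As a reminder of what Holm's correction in (A) does, the sketch below applies the standard step-down adjustment to a list of p-values. The function name `holm_adjust` and the example p-values are hypothetical. The caption does not state which software was used or how the comparisons were grouped, so this only illustrates the general procedure.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]  # hypothetical per-timepoint t-test p-values
    print(holm_adjust(raw))
```

A comparison is then called significant when its adjusted p-value falls at or below the chosen threshold, such as the 0.05 level marked by a single asterisk.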

Figure 4—figure supplement 1
Olig2 locus-specific massively parallel reporter assay (LS-MPRA) analysis using unique barcodes.

(A) Barcode enrichment plot from a single LS-MPRA replicate comparing analyses with uniquely mapped barcodes to all mapped barcodes, including barcode collisions, displayed with the same genome browser tracks and annotations shown in Figure 3.

Figure 5 with 1 supplement
Co-localization of GFP driven by Olig2 cis‐regulatory modules (CRMs) and Olig2 in retinal cells.

(A) Schematic of plasmid constructs containing one or all Notch inhibitor-responsive regions driving GFP expression with the endogenous Olig2 minimal promoter, alongside the experimental workflow. (B) Representative transverse sections of E14 retinas electroporated with these plasmids, incubated in vitro for 24 hr, and stained using immunohistochemistry (IHC) for GFP and Olig2. Merged and single-channel images show GFP colocalization with Olig2 (orange arrow) as well as GFP+ cells lacking detectable Olig2 expression (yellow arrow). (C) Quantification of GFP and Olig2 co-localization, represented as pie charts showing (i) the percentage of electroporated Olig2+ cells that express GFP protein, (ii) the percentage of GFP protein+ cells that express Olig2, and (iii) the percentage of electroporated Olig2- cells that express GFP protein in retinas electroporated at E14 with plasmids containing one or all Notch inhibitor-responsive regions driving GFP. Scale bar: 20 µm. Partially created with BioRender.com.

Figure 5—figure supplement 1
Activity of backbone plasmids containing the Olig2 minimal promoter and EGFP.

(A) Representative transverse sections of E14 retinas incubated in vitro for 24 hr show sparse GFP RNA and GFP fluorescence driven by control plasmids containing either EGFP alone or EGFP under the Olig2 minimal promoter. Pie charts show the percentage of electroporated cells with detectable GFP expression. Partially created with BioRender.com.

Co-localization of cis‐regulatory module (CRM)-directed RNA and Olig2 RNA or protein in retinal cells.

(A–C) Representative transverse sections of E14 retinas incubated in vitro for 24 hr for analysis of localization patterns of Olig2 and/or GFP. (A) Sections stained for Olig2 protein (cyan, yellow arrows) using immunohistochemistry (IHC) and Olig2 RNA (red, magenta arrows) using FISH. A 100% stacked column plot quantifies the overlap between Olig2 protein and Olig2 RNA expression. (B) Intrinsic GFP fluorescence and GFP RNA driven by Olig2-NR1 (yellow and magenta arrows). (C) In retinas electroporated with plasmids containing one or all Notch inhibitor-responsive regions, Olig2 RNA co-localized with GFP protein only (yellow arrow), GFP RNA only (magenta arrow), or both (orange arrow). (D) Pie charts quantifying (column 1) the percentage of electroporated Olig2 RNA+ cells that expressed GFP, (column 2) the percentage of GFP RNA+ cells that expressed Olig2 RNA, (column 3) the percentage of GFP protein+ cells that expressed Olig2 RNA, and (column 4) the percentage of Olig2 RNA- cells that expressed GFP in electroporated retinas. Scale bar: 20 µm.

Degenerate massively parallel reporter assays (MPRAs) to identify functional residues within candidate cis‐regulatory modules (CRMs).

(A) Schematic of d-MPRA library assembly. Point mutations were introduced into Olig2 CRM fragments via error-prone PCR, followed by intraplasmid duplication (IPD) to generate constructs with duplicated mutant CRMs flanking a minimal promoter and GFP ORF. The 3′ CRM copy, located in the 3′ untranslated region (UTR), served as a barcode, with a WPRE sequence included to potentially stabilize transcripts. (B) Conceptual diagram illustrating expected d-MPRA results, showing predicted changes in CRM activity upon disruption of enhancer or repressor binding sites. (C–E) d-MPRA plots displaying log₂ fold changes in mutational frequencies using a 5-base pair sliding window average, normalized across the Olig2-NR1 (C), Olig2-NR2 (D), and Olig2-NR3 (E) regions.
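The smoothing mentioned for panels (C–E) can be sketched as follows. The sketch assumes that per-position mutation frequencies are compared between expressed RNA and the input plasmid library, that a small constant avoids division by zero, and that the window shrinks at the ends of the region. These choices and the function names are illustrative assumptions, not the paper's stated procedure.

```python
import math


def log2_fold_change(rna_freq, dna_freq, eps=1e-6):
    """Per-position log2 ratio of mutation frequencies (RNA over plasmid DNA)."""
    if len(rna_freq) != len(dna_freq):
        raise ValueError("frequency vectors must have the same length")
    return [math.log2((r + eps) / (d + eps)) for r, d in zip(rna_freq, dna_freq)]


def sliding_mean(values, window=5):
    """Centered moving average; the window is truncated at the region edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Hypothetical mutation frequencies at seven consecutive positions.
    rna = [0.010, 0.012, 0.002, 0.001, 0.002, 0.011, 0.010]
    dna = [0.010, 0.010, 0.010, 0.010, 0.010, 0.010, 0.010]
    print([round(v, 2) for v in sliding_mean(log2_fold_change(rna, dna))])
```

Under these assumptions, a run of negative smoothed values marks positions where mutations are depleted from the RNA, which is the pattern expected when an activating site is disrupted.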

Figure 8 with 1 supplement
Transcription factor (TF) binding sites within the Olig2-NR2 cis‐regulatory module (CRM).

(A) TF binding motifs predicted using HOMER identified within the Olig2-NR2 CRM candidate, aligned with the average degenerate massively parallel reporter assay (d-MPRA) plot for this region (from Figure 7). (B–D) Position frequency matrices of TF binding sites aligned to motifs predicted by HOMER in Olig2-NR2: Mybl1 to Motif 1 (B), Foxn4 and Pax6 to Motif 3 (C), and Otx2 to Motif 14 (D). (E) UMAP visualization of scRNA-seq profiles from E14 mouse retinas, with cell types previously identified by marker genes by Clark et al., 2019. (F) Expression of Olig2 on UMAP. (G) Co-expression (pink) of Olig2 (blue) with Mybl1, Foxn4, Pax6, or Otx2 (red) on UMAP. Percentages of respective populations that co-express Olig2 are indicated (* denotes significance using Fisher’s exact test, p≤0.05). (H) TF occupancy track aligned to (i) Olig2 locus-specific massively parallel reporter assay (LS-MPRA) barcode enrichment after 12 hr LY411575 treatment (replicate of Figure 4B) and (ii) gene models. Occupancy peaks for Mybl1, Foxn4, and Pax6 align with the Olig2-NR2 CRM candidate (blue box).
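The co-expression test in (G) can be illustrated with a small sketch of Fisher's exact test on a 2×2 table of cell counts (TF+/Olig2+, TF+/Olig2-, TF-/Olig2+, TF-/Olig2-). The caption does not say whether the test was one- or two-sided or how cells were called positive, so the two-sided form and the counts below are assumptions for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical counts: rows are TF+ / TF- cells, columns Olig2+ / Olig2-.
    p = fisher_exact_two_sided(30, 70, 10, 190)
    print(f"p = {p:.3g}")
```

With real data, one such test would be run per transcription factor, and factors with p≤0.05 would correspond to the asterisks in the panel.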

Figure 8—figure supplement 1
Transcription factor (TF) binding sites in Olig2-NR1 and NR3 cis‐regulatory modules (CRMs).

(A, H) TF binding motifs identified within Olig2-NR1 (A) and Olig2-NR3 (H) CRM candidates, aligned with the average d-MPRA plot (from Figure 7). (B–F, I–K) Position frequency matrices of transcription factors aligning to Olig2-NR1: Sox4/11 to Motifs 13 and 16 (B), Lhx2 and Dlx2 to Motif 11 (C), Isl1 to Motif 8 (D), Foxp1 to Motif 18 (E), Mybl1 to Motif 2 (F), and Otx2 to Motif 12 (F); or Olig2-NR3: Ngn2 to Motifs 3 and 5 (I), Bhlhe22 to Motif 13 (J), and Lhx9 to Motif 15 (K). (G, L) Co-expression (pink) of Olig2 (blue) with Sox11, Sox4, Lhx2, Dlx2, Isl1, or Foxp1 (red) for Olig2-NR1 (G) and Ngn2, Bhlhe22, or Lhx9 (red) for Olig2-NR3 (L) on UMAP visualization of E14 mouse retinal gene expression. Percentages of respective populations that co-express Olig2 are indicated (* denotes significance using Fisher’s exact test, p≤0.05).

Ngn2 LS-MPRA to identify cis‐regulatory module (CRM) candidates active in mouse retinal cells expressing Ngn2.

(A) Ngn2 locus-specific massively parallel reporter assay (LS-MPRA) barcode enrichment plot aligned with a genome browser track, annotated with (i) known Ngn2 regulatory regions, (ii) coverage of the barcode-fragment association library, (iii) log₂ base conservation across 60 vertebrate or 40 placental mammal species, (iv) regions of open chromatin in E14 mouse retina, (v) gene models, and (vi) CRM candidate regions 1–4 (blue boxes). (B) Representative transverse sections of E14 retinas incubated for 24 hr in vitro, showing localization of Ngn2 RNA with GFP RNA only (magenta arrow) or both intrinsic GFP and GFP RNA (orange arrows) in retinas electroporated with plasmids containing one of the CRM1-4 regions. (C) Pie charts showing the percentage of Ngn2 RNA+ cells expressing GFP (column 1), GFP RNA+ cells expressing Ngn2 RNA (column 2), GFP protein+ cells expressing Ngn2 RNA (column 3), and Ngn2 RNA- cells expressing GFP (column 4) in retinas electroporated at E14 with plasmids containing CRM1-4 regions driving GFP. (D) Co-localization analysis of Ngn2-CRM3 plasmid and GFP following 16 hr incubation in vitro. Scale bar: 20 µm.

Figure 10 with 1 supplement
OLIG2 locus-specific massively parallel reporter assay (LS-MPRA) to identify cis‐regulatory module (CRM) candidates in chick embryos.

(A) OLIG2 LS-MPRA barcode enrichment plot following electroporation into E5 chick retinal explants, and E2 spinal cords and E4 spinal cords in ovo, aligned with a genome browser track and annotated with (i) coverage of the barcode-fragment association library, (ii) log₂ base conservation across 77 vertebrate species, (iii) gene models, and (iv) CRM candidate regions 1–3 (blue boxes). (B) Representative transverse sections of E5 chick retinas incubated for 24 hr in vitro, showing localization of chick OLIG2 RNA with GFP RNA only (magenta arrow) or both intrinsic GFP fluorescence and GFP RNA (orange arrows) driven by plasmids containing one of the CRM1-3 regions. (C) Pie charts showing the percentage of OLIG2 RNA+ cells expressing GFP (column 1), GFP RNA+ cells expressing OLIG2 RNA (column 2), GFP protein+ cells expressing OLIG2 RNA (column 3), and OLIG2 RNA- cells expressing GFP (column 4) in retinas electroporated at E5 with plasmids containing CRM1-3 regions driving GFP and visualized by FISH. Scale bar: 20 µm.

Figure 10—figure supplement 1
Activity of OLIG2 cis‐regulatory modules (CRMs) in OLIG2+ cells within embryonic chick spinal cords.

(A) Schematic of an E2 chick transverse spinal cord with the field of view (red outline) shown in B. (B) Representative transverse hemi-sections of ventral E2 chick spinal cords incubated for 24 hr in ovo, showing (i) OLIG2 RNA localized with GFP RNA (orange arrow), (ii) OLIG2 expression in GFP- cells (yellow arrows), and (iii) GFP RNA expression in OLIG2- cells (magenta arrows). GFP expression is driven by plasmids containing one of the CRM candidate regions (CRM1-3). Scale bar: 20 µm.

Activity of Olig2 cis‐regulatory module (CRM) candidates in postnatal retinal cells expressing Olig2.

(A) Representative transverse sections of postnatal retinas electroporated at P0 and incubated in vivo for 24 hr, stained with antibodies against Olig2 and GFP to identify co-localized expression (orange arrows). GFP expression was driven by plasmids containing one or more Notch-inhibitor responsive regions (NR1-3). (B) Analysis of GFP and Olig2 co-localization, shown with pie charts depicting the percent of Olig2+ electroporated cells expressing GFP (column 1), the percent of GFP+ cells expressing Olig2 (column 2), and the percent of Olig2- electroporated cells expressing GFP (column 3) in retinas electroporated at P0 with plasmids containing one or more Notch-inhibitor responsive regions driving GFP. Scale bar: 20 µm.

Tables

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Strain, strain background (Mus musculus) | CD-1 | Charles River Laboratories | RRID:IMSR_CRL:022 | |
| Strain, strain background (Gallus gallus) | SPF Eggs | AVS Bio | Material Number: 10100330 | |
| Strain, strain background (Escherichia coli) | DH10β Cells | Thermo Fisher | cat. #18290015 | Electrocompetent cells |
| Gene (Mus musculus) | Rho | NCBI GenBank | NM_145383.2 | |
| Recombinant DNA reagent | Rho BAC clone | BACPAC Resources | RP23-219M6 | |
| Gene (Mus musculus) | Grm6 | NCBI GenBank | NM_173372.2 | |
| Recombinant DNA reagent | Grm6 BAC clone | BACPAC Resources | RP23-417M10 | |
| Gene (Mus musculus) | Vsx2 | NCBI GenBank | NM_001301427.1 | |
| Recombinant DNA reagent | Vsx2 BAC clone | BACPAC Resources | RP23-127O21 | |
| Gene (Mus musculus) | Cabp5 | NCBI GenBank | NM_013877.4 | |
| Recombinant DNA reagent | Cabp5 BAC clone | BACPAC Resources | RP24-125H22 | |
| Gene (Mus musculus) | Olig2 | NCBI GenBank | NM_016967.2 | |
| Recombinant DNA reagent | Olig2 BAC clone | BACPAC Resources | CH29-613 | |
| Gene (Mus musculus) | Ngn2 | NCBI GenBank | NM_009718.4 | Formal gene name: Neurog2 |
| Recombinant DNA reagent | Ngn2 BAC clone | BACPAC Resources | RP23-182M12 | |
| Gene (Gallus gallus) | OLIG2 | NCBI GenBank | NM_001031526.1 | |
| Recombinant DNA reagent | OLIG2 BAC clone | BACPAC Resources | CH261-60J3 | |
| Recombinant DNA reagent | Stagintbc7 | This paper | | Deposited to Addgene; used to create LS-MPRA libraries |
| Recombinant DNA reagent | Statadual-WPRE | This paper | | Deposited to Addgene; used to create d-MPRA libraries |
| Chemical compound, drug | Fast Green FCF | Millipore Sigma | cat. #F7252 | |
| Chemical compound, drug | L-glutamine | Sigma Aldrich | cat. #G3126 | |
| Chemical compound, drug | Penicillin/streptomycin | Invitrogen/Gibco | cat. #15140-122 | |
| Chemical compound, drug | Minimum Essential Medium | Millipore Sigma | cat. #51412C | |
| Chemical compound, drug | HBSS | Thermo Fisher Scientific | cat. #14-025-092 | |
| Chemical compound, drug | horse serum | Thermo Fisher Scientific | cat. #26050-088 | |
| Chemical compound, drug | HEPES | Invitrogen/Gibco | cat. #15630-080 | |
| Chemical compound, drug | buprenorphine | PAR Pharma | | Custom formulation |
| Chemical compound, drug | proparacaine hydrochloride | Bausch & Lomb | cat. #24208-730-06 | |
| Chemical compound, drug | LY411575 | Millipore Sigma | SML0506 | γ-secretase inhibitor |
| Chemical compound, drug | TRI Reagent | Millipore Sigma | cat. #T9424 | |
| Commercial assay or kit | Zymo Directzol Microprep Kit | Zymo Research | cat. #R2061 | |
| Commercial assay or kit | Dynabeads mRNA Purification Kit | Thermo Fisher Scientific | cat. #61006 | |
| Chemical compound, drug | papain solution | Worthington Biochemical | cat. #LS003126 | |
| Commercial assay or kit | HCR Buffers [v3.0] | Molecular Instruments | | v3.0 reagents (discontinued) |
| Chemical compound, drug | donkey serum | Jackson ImmunoResearch | cat. #017-000-121 | |
| Antibody | Pax6 (mouse monoclonal) | Developmental Studies Hybridoma Bank | RRID:AB_528427 | CUT&RUN primary antibody; 500 ng per sample |
| Antibody | Foxn4 (mouse monoclonal) | Santa Cruz Biotechnologies | cat# sc-377166 | CUT&RUN primary antibody; 500 ng per sample |
| Antibody | Mybl1 (rabbit polyclonal) | Millipore-Sigma | RRID:AB_1078540; cat# HPA008791 | CUT&RUN primary antibody; 500 ng per sample |
| Antibody | Olig2 (mouse monoclonal) | Millipore | RRID:AB_10807410; cat# MABN50 | Primary antibody for IHC; dilution 1:500. |
| Antibody | GFP (chicken polyclonal) | Abcam | RRID:AB_300798; cat# ab13970 | Primary antibody for IHC; dilution 1:1000. |
| Other | DAPI | Invitrogen | RRID:AB_2629482; cat# D1306 | Nuclear counterstain; used at 1:1000 in IHC. |
| Commercial assay or kit | Zymo DNA Clean & Concentrator Kit | Zymo Research | cat. #D4013 | |
| Commercial assay or kit | Large construct DNA isolation kit | Qiagen | cat. #12462 | |
| Commercial assay or kit | Qubit dsDNA HS Assay Kit | Thermo Fisher Scientific | cats. #Q32851 and #Q32856 | |
| Commercial assay or kit | NEBNext Ultra II FS DNA Module kit | NEB | cat. #E7810S | |
| Chemical compound, drug | T4 DNA ligase buffer | NEB | cat. #B0202S | |
| Commercial assay or kit | Blunt/TA Ligase Master Mix | NEB | cat. #M0367 | |
| Chemical compound, drug | NEBNext Ultra II Q5 Master Mix | NEB | cat. #M0544 | |
| Chemical compound, drug | Agarose Dissolving Buffer | Zymo Research | cat. #D4001-1-100 | |
| Commercial assay or kit | Plasmid Maxi kit | Qiagen | cat. #12163 | |
| Commercial assay or kit | GeneMorph II Random Mutagenesis Kit | Agilent | cat. #200550 | |
| Chemical compound, drug | Bst 2.0 WarmStart Polymerase | NEB | cat. #M0538 | |
| Chemical compound, drug | Taq DNA polymerase | NEB | cat. #M0273 | |
| Commercial assay or kit | CUTANA ChIC/CUT&RUN Kit | EpiCypher | cat# 14-1048 | |
| Commercial assay or kit | NEBNext Ultra II DNA Library Prep Kit | NEB | cat. #E7103 | |
| Commercial assay or kit | NEBNext Multiplex Oligos for Illumina Dual Index Sets | NEB | cats. #E7600S or #E7780S | |
| Commercial assay or kit | Quick Ligation reaction | NEB | cat. #M2200 | |
| Chemical compound, drug | Exonuclease | Lucigen | cat. #E3101K | |
| Commercial assay or kit | Protoscript II Kit | NEB | cat. #E6560 | |
| Commercial assay or kit | Zymo Oligo Clean & Concentrator Kit | Zymo Research | cat. #D4060 | |
| Commercial assay or kit | LunaScript RT SuperMix Kit | NEB | cat. #E3010 | |
| Software/algorithm | Integrative Genomics Viewer | PMID:21221095 | RRID:SCR_011793 | |
| Software/algorithm | Prism | GraphPad | RRID:SCR_002798 | |
| Software/algorithm | FIJI | ImageJ | RRID:SCR_002285 | |
| Software/algorithm | R | The R Foundation for Statistical Computing | RRID:SCR_001905 | |

Additional files

Supplementary file 1

PCR primers, genomic coordinates, and cell type specificity quantification.

(a) PCR primers used in all experiments. (b) Genomic coordinates for all regions of interest. (c–g) Quantification of EGFP expression from electroporated plasmid constructs and colocalization with cell-type-specific gene expression.

https://cdn.elifesciences.org/articles/107565/elife-107565-supp1-v1.docx
MDAR checklist
https://cdn.elifesciences.org/articles/107565/elife-107565-mdarchecklist1-v1.pdf
Source data 1

LS-MPRA library characteristics and supplemental QC plots.

https://cdn.elifesciences.org/articles/107565/elife-107565-data1-v1.pdf
Source data 2

d-MPRA library supplemental QC plots.

https://cdn.elifesciences.org/articles/107565/elife-107565-data2-v1.pdf

Cite this article

  1. Alastair J Tulloch
  2. Ryan Nicholas Delgado
  3. Rinaldo Catta-Preta
  4. Constance L Cepko
(2026)
Massively parallel reporter assay for mapping gene-specific regulatory regions at single-nucleotide resolution
eLife 14:RP107565.
https://doi.org/10.7554/eLife.107565.3