Structural insights into the assembly and polyA signal recognition mechanism of the human CPSF complex

  1. Marcello Clerici
  2. Marco Faini
  3. Ruedi Aebersold
  4. Martin Jinek (corresponding author)
  1. University of Zurich, Switzerland
  2. Institute of Molecular Systems Biology, ETH Zurich, Switzerland
6 figures, 2 tables and 2 additional files

Figures

Figure 1 with 1 supplement
Characterization of the CPSF160-WDR33M1-K410-CPSF30-Fip1 complex.

(A) Equilibrium binding of CPSF160FL-WDR33M1-K410-CPSF30FL-Fip1FL to an Atto532-labelled, 16-nucleotide RNA containing the PAS hexanucleotide (AAUAAA), measured by fluorescence polarization. The polarization amplitude is normalized to 1. Error bars indicate the standard error of the mean (SEM) for five consecutive measurements of a single representative sample. Each binding experiment was performed in triplicate and the mean KD values ± SEM are reported in Table 2. (B) Inter-subunit cross-link map of the CPSF160FL-WDR33M1-G474-CPSF30FL-Fip1FL complex. Observed inter-molecular cross-links between CPSF160FL, WDR33M1-G474 and CPSF30FL (upper panel) and between Fip1FL and the other three CPSF subunits CPSF160FL, WDR33M1-G474 and CPSF30FL (lower panel) are represented as dashed lines. A list of the cross-linked peptides identified by mass spectrometry is reported in Figure 1—source data 1. Proteins are indicated as schematic diagrams (not to scale) with featured domains highlighted in color. (CD, Fip1 conserved domain; ZF1-5, CPSF30 zinc finger domains 1 to 5; ZK, CPSF30 zinc knuckle domain; CTD, CPSF160 C-terminal domain; bP1-3, CPSF160 beta-propeller domains 1 to 3; NTD, WDR33 N-terminal domain. CPSF160: Uniprot Q10570; WDR33: Uniprot Q9C0J8-1; CPSF30: Uniprot O95639-3; Fip1: Uniprot Q6UN15-4).

https://doi.org/10.7554/eLife.33111.002
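
The normalized binding curve in panel (A) can be illustrated with a one-site binding model. The sketch below is only an assumption about how such a curve may be computed: this section does not state the fitting model, and the RNA concentration of 0.5 nM is a hypothetical value chosen for illustration. Because the reported KD values are sub-nanomolar, the sketch uses the quadratic form that accounts for depletion of free protein by the labelled RNA, and it assumes that polarization changes linearly with the fraction of RNA bound.

```python
import math


def fraction_bound(protein_nM, kd_nM, rna_nM):
    """Fraction of labelled RNA in complex for a one-site model.

    Uses the quadratic solution of the binding equilibrium, so it stays
    valid when the RNA concentration is comparable to the KD.
    """
    total = protein_nM + rna_nM + kd_nM
    complex_nM = (total - math.sqrt(total * total - 4.0 * protein_nM * rna_nM)) / 2.0
    return complex_nM / rna_nM


def normalized_polarization(protein_nM, kd_nM, rna_nM):
    """With the amplitude normalized to 1 and the baseline set to 0,
    the signal equals the fraction bound (assumes a linear response)."""
    return fraction_bound(protein_nM, kd_nM, rna_nM)


# Hypothetical titration: KD taken from Table 2, RNA concentration assumed.
for protein in (0.0, 0.1, 1.0, 10.0, 100.0):
    signal = normalized_polarization(protein, kd_nM=0.65, rna_nM=0.5)
    print(f"{protein:7.1f} nM protein -> normalized polarization {signal:.3f}")
```

In a fit, `kd_nM` would be the free parameter and `rna_nM` would be fixed to the concentration actually used in the assay.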
Figure 1—source data 1

Table of cross-linking MS identification.

The table contains a list of the unique and validated cross-linked sites obtained by the xQuest identification search. The ‘id’ column contains the sequence of the two cross-linked peptides followed by the position of their respective cross-linked lysine residue. The ‘Protein1’ and ‘Protein2’ columns list the names of the proteins corresponding to the first and second peptide, respectively. The ‘AbsPos1’ and ‘AbsPos2’ columns indicate the position of the cross-linked residue in the respective proteins according to the Uniprot numbering. The ‘ld.Score’ is a measurement of the quality of the match between a candidate MS2 spectrum and the theoretical spectrum of a query cross-linked peptide pair. The ‘FDR’ is the false discovery rate at the cross-link identification level expressed as a fraction of 100%.

https://doi.org/10.7554/eLife.33111.004
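
As a rough illustration of how a table with these columns could be processed, the sketch below parses rows and separates inter-subunit from intra-subunit cross-links. Only the column names are taken from the description above. The example rows, the comma-separated layout and the FDR cutoff of 0.05 are hypothetical, and the sketch does not reproduce the validation performed by xQuest.

```python
import csv
import io

# Invented example rows. Only the column names come from the source-data
# description; ids, positions, scores and FDR values are placeholders.
EXAMPLE_CSV = """id,Protein1,Protein2,AbsPos1,AbsPos2,ld.Score,FDR
xl-1,CPSF160,WDR33,100,46,35.2,0.00
xl-2,CPSF30,Fip1,200,150,28.4,0.02
xl-3,CPSF160,CPSF160,300,900,22.0,0.04
xl-4,WDR33,CPSF30,50,75,18.5,0.20
"""


def load_crosslinks(text):
    """Parse the table into dicts with numeric positions, score and FDR."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "id": raw["id"],
            "protein1": raw["Protein1"],
            "protein2": raw["Protein2"],
            "pos1": int(raw["AbsPos1"]),
            "pos2": int(raw["AbsPos2"]),
            "score": float(raw["ld.Score"]),
            "fdr": float(raw["FDR"]),
        })
    return rows


def split_by_type(rows, max_fdr=0.05):
    """Keep rows with FDR <= max_fdr (hypothetical cutoff) and split them
    into inter-subunit and intra-subunit cross-links."""
    kept = [r for r in rows if r["fdr"] <= max_fdr]
    inter = [r for r in kept if r["protein1"] != r["protein2"]]
    intra = [r for r in kept if r["protein1"] == r["protein2"]]
    return inter, intra


inter, intra = split_by_type(load_crosslinks(EXAMPLE_CSV))
print("inter-subunit:", [r["id"] for r in inter])
print("intra-subunit:", [r["id"] for r in intra])
```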
Figure 1—figure supplement 1
Reconstitution of the CPSF core module and intra-subunit cross-links.

(A) CPSF160, WDR33, CPSF30 and Fip1 are indicated as schematic diagrams (not to scale) with featured domains highlighted in color. Constructs used in this study are reported as grey lines below the corresponding protein scheme and domain boundaries are indicated (CD, Fip1 conserved domain; ZF1-5, CPSF30 zinc finger domains 1 to 5; ZK, CPSF30 zinc knuckle domain; CTD, CPSF160 C-terminal domain; bP1-3, CPSF160 beta-propeller domains 1 to 3; NTD, WDR33 N-terminal domain). Uniprot accession numbers are indicated in the Figure 1 legend. (B) Size exclusion chromatography elution profile (left panel) and corresponding SDS-PAGE analysis (right panel) of the CPSF160FL-WDR33M1-G474-CPSF30FL-Fip1FL CPSF core module. The indicated elution volume (10.08 ml) corresponds to the complex elution peak. The pooled fractions, as indicated with a light blue box, were concentrated and loaded on SDS-PAGE for analysis. (C) Intra-subunit cross-link map of the CPSF160FL-WDR33M1-G474-CPSF30FL-Fip1FL complex. Observed intra-molecular cross-links are represented as dashed lines. A list of the cross-linked peptides identified by mass spectrometry is reported in Figure 1—source data 1.

https://doi.org/10.7554/eLife.33111.003
Figure 2 with 2 supplements
Crystal structure of the CPSF160-WDR33 complex.

(A) Cartoon representation of the overall structure of the CPSF160FL-WDR33Q35-K410 complex. (B) Cartoon representation of the isolated WDR33 WD40 (light orange) and NTD (dark orange) domains. The N- and C-termini are indicated as dashed lines. (C) CPSF160 beta-propeller domains bP1 and bP3 form a deep cavity that binds the WDR33 N-terminal domain (NTD). A vertical cross-section through the CPSF160 propeller domains (represented as surface) is shown. WDR33 is represented as cartoon.

https://doi.org/10.7554/eLife.33111.005
Figure 2—figure supplement 1
The CPSF160-WDR33 heterodimer resembles the DDB1-DDB2 complex.

(A) Structural superposition of CPSF160 with two different conformations of DDB1, differing in the position of beta-propeller bP2. In the first conformation (PDB 2B5M), the DDB1 bP2 propeller is oriented as in CPSF160 (left panel), whereas in the second conformation (PDB 2B5L) DDB1 bP2 is rotated by ~90 degrees relative to CPSF160 bP2. In both DDB1 conformations, the position of the bP1 and bP3 domains does not vary and superimposes on the respective domains of CPSF160. (B) Superposition of CPSF160 and DDB1 within the CPSF160-WDR33 and DDB1-DDB2 (PDB 3EI3) complexes. (C) The N-terminal helices of WDR33 and DDB2 show an almost perfect overlap even though their respective propeller domains are shifted by ~10 Å. WDR33 and DDB2 are oriented as in (B).

https://doi.org/10.7554/eLife.33111.006
Figure 2—figure supplement 2
WDR33 is buried within the cavity formed by CPSF160 bP1 and bP3 domains.

(A) The CPSF160 elongated loops el1, el2 and el3, projecting from the bP1 and bP3 propellers, participate in locking WDR33 in place. el1, el2 and el3 are colored in red. (B) Surface representation of CPSF160, showing that WDR33 NTD is completely buried within CPSF160 and excluded from the solvent. WDR33 is represented as cartoon.

https://doi.org/10.7554/eLife.33111.007
Figure 3 with 2 supplements
Molecular topology of the CPSF complex.

(A) CPSF30 ZF1 is necessary and sufficient for CPSF160FL-WDR33M1-K410 binding. Pull-down assay of GFP-labeled CPSF30 constructs interacting with CPSF160FL-WDR33M1-K410. Input and bound proteins were analyzed on a 4–12% gradient SDS-PAGE and visualized by Coomassie staining (upper panel) and by in-gel GFP fluorescence (middle and lower panels). The asterisk indicates free GFP present in the input lysate. (B) Fip1 is tethered to the CPSF complex by its short conserved domain (Fip1CD). Pull-down assay of GFP-labeled Fip1FL and Fip1CD interacting with the CPSF160FL-WDR33M1-K410-CPSF30FL complex. Input and bound proteins were analyzed on a 4–12% gradient SDS-PAGE and visualized by Coomassie staining (upper panel) and by in-gel GFP fluorescence (middle and lower panels). The asterisk indicates free GFP present in the input lysate. (C) CPSF30 zinc finger domains ZF4 and ZF5 are necessary and sufficient to mediate the interaction with Fip1. Pull-down assay of GFP-labeled Fip1CD interacting with CPSF160FL-WDR33M1-K410-CPSF30. Input and bound proteins were analyzed on a 4–12% gradient SDS-PAGE and visualized by Coomassie staining (upper panel) and by in-gel GFP fluorescence (middle and lower panels). In lane 6, His6-(StrepII)2-CPSF30ZF4-5 was used for the precipitation of GFP-Fip1CD, in the absence of WDR33M1-K410 and CPSF160FL.

https://doi.org/10.7554/eLife.33111.009
Figure 3—figure supplement 1
The conserved middle domain of Fip1 (Fip1CD) interacts with CPSF30 zinc finger domains ZF4 and ZF5.

(A) Fip1CD is highly conserved from human to yeast and parasites (Ciona and Encephalitozoon). Sequence alignment of the conserved central region of Fip1 from different organisms with numbering corresponding to the human sequence. The boundaries of the construct Fip1CD are shown as a black box. The alignment was generated using Clustal Omega (Sievers et al., 2011) and displayed using ESPript (Robert and Gouet, 2014). (B) SDS-PAGE analysis of input and pull-down experiments performed using GST-Fip1CD immobilized on glutathione beads and incubated with MBP-tagged CPSF30 constructs.

https://doi.org/10.7554/eLife.33111.010
Figure 3—figure supplement 2
Mapping of CPSF30 and Fip1 cross-links onto the structure of CPSF160-WDR33.

CPSF160 and WDR33 lysine residues involved in cross-links with Fip1 and CPSF30 are represented in red and green, respectively. CPSF30 and Fip1 are represented as schematic diagrams with featured domains highlighted in color. Dashed lines indicate cross-links.

https://doi.org/10.7554/eLife.33111.011
Figure 4 with 1 supplement
CPSF30 ZF3 domain and WDR33 N-terminal motif are principal determinants of AAUAAA motif recognition.

Equilibrium binding measured by fluorescence polarization of CPSF complex variants containing different CPSF30 constructs (A) and WDR33M1-K410 mutants (B) to an Atto532-labelled 16-nucleotide RNA containing the PAS hexamer (AAUAAA). The polarization amplitude is normalized to 1. Error bars indicate the standard error of the mean (SEM) for five consecutive measurements of a single representative sample. Each binding experiment was performed in triplicate and the mean KD values ± SEM are reported in Table 2.

https://doi.org/10.7554/eLife.33111.012
Figure 4—figure supplement 1
Quantification of AAUAAA motif binding by CPSF complexes.

(A) Equilibrium binding of CPSF complex variants featuring different CPSF30 constructs and WDR33M1-K410 mutants to an Atto532-labelled 16-nucleotide RNA containing the wild-type (AAUAAA) or mutated (AAGAAA) PAS hexamer, measured by fluorescence polarization. The polarization amplitude is normalized to 1. Error bars indicate the standard error of the mean (SEM) for five consecutive measurements of a single representative sample. Each binding experiment was performed in triplicate and the mean KD values ± SEM are reported in Table 2. (B) Sequence alignment of the N-terminal region of WDR33 from different organisms with numbering corresponding to the human sequence. The conserved lysine/arginine-rich motif located N-terminally of WDR33 NTD is highlighted as a black box. Secondary structure elements are indicated above the sequence. The alignment was generated using Clustal Omega (Sievers et al., 2011) and displayed using ESPript (Robert and Gouet, 2014). (C) Size exclusion chromatography elution profile (left panel) and SDS-PAGE analysis (right panel) of the CPSF160FL-WDR33M1-K410-CPSF30ZF1-3 complex containing wild-type (wt) and mutant (K46A/R47A and R49A/K50A) WDR33M1-K410.

https://doi.org/10.7554/eLife.33111.013
Figure 5
Schematic model of the mammalian core CPSF complex bound to the PAS hexamer motif.

CPSF160 and WDR33 form a heterodimer that acts as the structural scaffold of the core CPSF complex. WDR33 interacts with CPSF160 through its NTD, which is inserted in a deep cavity formed by the CPSF160 N- and C-terminal beta-propeller domains (bP1 and bP3). CPSF30 connects CPSF160-WDR33 to Fip1, interacting with the former via its N-terminus (including the zinc finger domain ZF1) and with the latter via its C-terminal zinc finger domains ZF4-5. The hexanucleotide AAUAAA sequence is bound by CPSF30 ZF2-3 and a positively charged lysine/arginine-rich motif located N-terminally of the NTD in WDR33. The latter is also a lysine cross-linking hotspot (indicated by a yellow star) with CPSF30 and CPSF160. The dark dashed lines indicate the CPSF30 and CPSF160 domains contacted by the cross-linking hotspot.

https://doi.org/10.7554/eLife.33111.015
Author response image 1
ld-score distribution of cross-link sites.

The 153 cross-linking sites are depicted in green bins, whereas the cross-links that violate the distance threshold of 35 Å are colored in red bins. The cross-links that violate the distance threshold generally have poorer scores.

https://doi.org/10.7554/eLife.33111.019
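
The comparison described in this legend can be sketched as follows: each cross-link is classified by whether the distance between the two linked residues exceeds 35 Å, and the scores of the two groups are then compared. The coordinates and scores below are invented for illustration, and measuring the distance between single points per residue (for example Cα atoms) is an assumption, since the legend does not specify which atoms were used.

```python
import math
from statistics import mean

DISTANCE_THRESHOLD_A = 35.0  # threshold quoted in the legend, in angstroms


def split_by_distance(links, threshold=DISTANCE_THRESHOLD_A):
    """Split cross-link scores into (satisfied, violating) lists.

    links: iterable of (score, xyz_a, xyz_b), where xyz_a and xyz_b are
    the coordinates of the two cross-linked residues.
    """
    satisfied, violating = [], []
    for score, xyz_a, xyz_b in links:
        if math.dist(xyz_a, xyz_b) > threshold:
            violating.append(score)
        else:
            satisfied.append(score)
    return satisfied, violating


# Invented scores and coordinates, not data from the study.
EXAMPLE_LINKS = [
    (34.0, (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)),  # 10 angstroms apart
    (30.0, (0.0, 0.0, 0.0), (0.0, 20.0, 0.0)),  # 20 angstroms apart
    (21.0, (0.0, 0.0, 0.0), (0.0, 0.0, 40.0)),  # 40 angstroms apart
]

satisfied, violating = split_by_distance(EXAMPLE_LINKS)
print("mean score, within threshold:", mean(satisfied))
print("mean score, violating threshold:", mean(violating))
```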

Tables

Table 1
Data collection and refinement statistics.
https://doi.org/10.7554/eLife.33111.008
CPSF160-WDR33
Dataset | Native | Sulfur SAD | Ta6Br12 SAD
X-ray source | SLS X06DA (PXIII) | SLS X06DA (PXIII) | SLS X06DA (PXIII)
Space group | P1 | P1 | P1
Cell dimensions
  a, b, c (Å) | 67.91 77.40 104.02 | 67.88 77.58 104.14 | 67.52 76.79 104.02
  α, β, γ (°) | 87.56 76.41 67.00 | 87.39 76.60 66.76 | 87.36 76.72 66.30
Wavelength (Å) | 1.0000 | 2.0733 | 1.2548
Resolution (Å)* | 47.27–2.50 (2.59–2.50) | 44.25–3.00 (3.11–3.00) | 47.26–3.60 (3.73–3.60)
Rmerge* | 0.090 (0.773) | 0.211 (2.934) | 0.140 (0.669)
CC1/2* | 0.999 (0.834) | 0.999 (0.847) | 0.998 (0.926)
I/σI* | 18.3 (2.5) | 29.3 (2.5) | 20.6 (4.6)
Observations* | 488656 (34882) | 2456831 (165849) | 300048 (29753)
Unique reflections* | 65410 (6519) | 37770 (3688) | 21497 (2126)
Multiplicity* | 7.5 (5.4) | 65.0 (45.0) | 14.0 (14.0)
Completeness (%)* | 100.0 (100.0) | 100.0 (96.9) | 99.9 (99.6)
Refinement
Resolution (Å) | 47.27–2.50
No. reflections | 65395 (6517)
Rwork/Rfree | 0.228/0.263
No. atoms
  Protein | 12162
  Water | 102
B-factors
  Mean | 69.35
  Protein | 69.52
  Water | 49.03
R.m.s. deviations
  Bond lengths (Å) | 0.002
  Bond angles (°) | 0.56
Ramachandran plot
  % favored | 94.6
  % allowed | 5.4
  % outliers | 0.0
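
As a consistency check on Table 1, multiplicity is the average number of observations per unique reflection, so it can be recomputed from the two rows above it. The sketch below does this for the three datasets, using only numbers copied from the table. The tolerance of 0.05 is an assumption that reflects the one-decimal rounding of the reported values.

```python
# (observations, unique reflections, reported multiplicity) copied from Table 1.
# First tuple: overall values; second tuple: values given in parentheses.
TABLE1 = {
    "Native": ((488656, 65410, 7.5), (34882, 6519, 5.4)),
    "Sulfur SAD": ((2456831, 37770, 65.0), (165849, 3688, 45.0)),
    "Ta6Br12 SAD": ((300048, 21497, 14.0), (29753, 2126, 14.0)),
}


def multiplicity(observations, unique_reflections):
    """Average number of observations per unique reflection."""
    return observations / unique_reflections


def check_table(table, tolerance=0.05):
    """Return {dataset: True/False}, True when every recomputed value
    agrees with the reported one within the rounding tolerance."""
    results = {}
    for dataset, entries in table.items():
        results[dataset] = all(
            abs(multiplicity(obs, unique) - reported) <= tolerance
            for obs, unique, reported in entries
        )
    return results


for dataset, consistent in check_table(TABLE1).items():
    print(dataset, "consistent" if consistent else "inconsistent")
```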
Table 2
Equilibrium dissociation constants of CPSF variants binding to wild-type or mutated PAS hexamer RNA.

The equilibrium dissociation constants were determined using a fluorescence polarization binding assay, as shown in Figures 1A and 4A,B and Figure 4—figure supplement 1A. The values reported represent the mean ± SEM of three independent measurements. For measurements denoted n.m. (not measurable), the KD was above the range measurable by the assay (>>1 μM).

https://doi.org/10.7554/eLife.33111.014
CPSF variant | KD AAUAAA RNA | KD AAGAAA RNA
CPSF160FL-WDR33M1-K410-CPSF30FL/Fip1FL | 0.65 ± 0.90 nM | 120 ± 23 nM
CPSF160FL-WDR33M1-K410-CPSF30ZF1-5/Fip1CD | 0.11 ± 0.03 nM | 70 ± 6 nM
CPSF160FL-WDR33M1-K410-CPSF30ZF1-3 | <0.1 nM | 43 ± 2 nM
CPSF160FL-WDR33M1-K410-CPSF30ZF1-2 | >200 nM | n.m.
CPSF160FL-WDR33M1-K410-CPSF30ZF1 | >200 nM | n.m.
CPSF160FL-WDR33M1-K410 | >200 nM | n.m.
CPSF160FL-WDR33M1-K410(K46A-R47A)-CPSF30ZF1-3 | 7.78 ± 0.78 nM | >200 nM
CPSF160FL-WDR33M1-K410(R49A-K50A)-CPSF30ZF1-3 | 2.42 ± 0.66 nM | >500 nM
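
One way to read Table 2 is as a measure of how strongly each complex discriminates the wild-type hexamer from the AAGAAA mutant. The sketch below computes the KD ratio and the corresponding difference in binding free energy, ΔΔG = RT ln(KD,mut / KD,wt), for the two rows in which both values are reported as numbers rather than bounds. The temperature of 298.15 K is an assumption, as the assay temperature is not given in this section, and the calculation uses only the mean KD values and ignores their SEM.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal/(mol*K)
TEMPERATURE_K = 298.15           # assumed; not stated in this section

# Mean KD values in nM copied from Table 2: (AAUAAA RNA, AAGAAA RNA).
# Rows reported only as bounds (<, >) or n.m. are left out.
KD_NM = {
    "CPSF160FL-WDR33M1-K410-CPSF30FL/Fip1FL": (0.65, 120.0),
    "CPSF160FL-WDR33M1-K410-CPSF30ZF1-5/Fip1CD": (0.11, 70.0),
}


def fold_selectivity(kd_wt_nM, kd_mut_nM):
    """Ratio of mutant to wild-type KD (higher means stronger discrimination)."""
    return kd_mut_nM / kd_wt_nM


def ddg_kcal_per_mol(kd_wt_nM, kd_mut_nM, temperature_K=TEMPERATURE_K):
    """Free-energy penalty of the mutation, RT * ln(KD_mut / KD_wt)."""
    return GAS_CONSTANT_KCAL * temperature_K * math.log(kd_mut_nM / kd_wt_nM)


for variant, (kd_wt, kd_mut) in KD_NM.items():
    fold = fold_selectivity(kd_wt, kd_mut)
    ddg = ddg_kcal_per_mol(kd_wt, kd_mut)
    print(f"{variant}: {fold:.0f}-fold, ddG = {ddg:.2f} kcal/mol")
```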

Additional files

Supplementary file 1

The table contains a list of the protein constructs expressed in Sf9 insect cells, their respective expression tags and the vector in which the constructs were cloned.

The table also specifies which protein constructs were combined in a single vector using the MacroBac system (see Materials and methods), and whether a single baculovirus was used for protein expression or a combination of two baculoviruses.

https://doi.org/10.7554/eLife.33111.016
Transparent reporting form
https://doi.org/10.7554/eLife.33111.017

Cite this article

Marcello Clerici, Marco Faini, Ruedi Aebersold, Martin Jinek (2017)
Structural insights into the assembly and polyA signal recognition mechanism of the human CPSF complex
eLife 6:e33111.
https://doi.org/10.7554/eLife.33111