Comprehensive mutagenesis library design and production assay. (A) Organization of the AAV2 genome and Rep protein domains. Top: single-stranded DNA genome, middle: RNA transcripts, bottom: Rep proteins. Dotted lines indicate mutated regions. (B) Density plot of barcode counts in the pCMV-Rep78/68 plasmid library. (C) Overview of production assay for the pCMV-Rep78/68 library and calculation of wild-type normalized production fitness values (s’). (D) Amino acid level production fitness values from replicate transfections of the pCMV-Rep78/68 library. Pearson R correlation coefficient calculated after log transformation. (E) Density plot of production fitness values for wild-type (black) and premature stop codon (blue) controls for the pCMV-Rep78/68 library. *p < 10⁻²⁰ (Mann-Whitney U test)
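
The wild-type normalization in panel C can be sketched as follows. The exact formula is not stated in this legend; the sketch assumes the common deep mutational scanning definition, in which a variant's enrichment is its frequency in the viral library divided by its frequency in the plasmid library, and s’ is that enrichment divided by the enrichment of the wild-type controls. The dictionary layout and the `"WT"` key are hypothetical.

```python
def production_fitness(plasmid_counts, virus_counts, wt_key="WT"):
    """Wild-type normalized production fitness s' for each variant.

    Assumed definition (hypothetical, not taken from the source):
      s  = (virus count / total virus) / (plasmid count / total plasmid)
      s' = s_variant / s_WT
    Variants with zero plasmid counts are skipped; variants absent from
    the viral library are given a viral count of zero.
    """
    total_p = sum(plasmid_counts.values())
    total_v = sum(virus_counts.values())

    def enrichment(key):
        f_plasmid = plasmid_counts[key] / total_p
        f_virus = virus_counts.get(key, 0) / total_v
        return f_virus / f_plasmid

    s_wt = enrichment(wt_key)
    return {k: enrichment(k) / s_wt
            for k, n in plasmid_counts.items() if n > 0}
```

Because the variant and wild-type enrichments share the same library totals, the totals cancel in s’; they are kept only to make the frequency interpretation explicit.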

Effects of all single amino acid substitutions and deletions in the Rep78/68 proteins on AAV2 production. Amino acid level production fitness values from the pCMV-Rep78/68 production assay were calculated as in Figure 1C by summing barcode counts for synonymous mutations. Rectangles are colored by mutational effect on the production of genome-containing particles, with black indicating deleterious mutations and red indicating beneficial mutations. Colored bars above the heatmaps indicate protein domains. Black: origin-binding domain, light blue: helicase domain, gray: nuclear localization signal, and navy blue: zinc-finger domain. Black dots indicate wild-type amino acid identity.
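
Summing barcode counts for synonymous mutations amounts to grouping barcodes by position and encoded amino acid before normalization. A minimal sketch, assuming a hypothetical barcode-to-(position, codon) lookup in which `"del"` marks a deletion, and the standard genetic code:

```python
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def aa_level_counts(barcode_counts, barcode_to_variant):
    """Sum barcode counts over all codons encoding the same amino acid.

    barcode_counts:     {barcode: count}
    barcode_to_variant: {barcode: (position, codon)}, codon "del" for a
                        deletion (hypothetical record format)
    Returns {(position, amino_acid): summed_count}.
    """
    totals = defaultdict(int)
    for barcode, count in barcode_counts.items():
        if barcode not in barcode_to_variant:
            continue  # unmapped barcodes are ignored in this sketch
        pos, codon = barcode_to_variant[barcode]
        aa = "del" if codon == "del" else CODON_TABLE[codon.upper()]
        totals[(pos, aa)] += count
    return dict(totals)
```

The same aggregation would be applied separately to the plasmid and viral counts before computing s’.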

Beneficial substitutions cluster in DNA-interacting residues. (A) Average production fitness values for all substitutions at each position mapped onto the structure of the origin-binding domain in complex with the AAVS1 Rep binding site (PDB 4ZQ9). (B) Close-up view of origin-binding domain-Rep binding site interactions. Residues where substitutions to positively charged residues are beneficial are shown as sticks. (C) Average production fitness values mapped onto the structure of the origin-binding domain in complex with the single-stranded inverted terminal repeat hairpin (PDB 6XB8). DNA-interacting residues are shown as sticks. Residues are colored by mutability, with red indicating higher mutability and black indicating lower mutability.
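
One common way to map per-position averages onto a structure is to write them into the B-factor field of the PDB file and color residues by that field in a molecular viewer. The sketch below assumes that approach, that residue numbering in the structure matches Rep numbering, that the wild-type amino acid is excluded from the positional average, and a hypothetical `{(position, amino_acid): s’}` layout; it is not necessarily how these figures were generated.

```python
from collections import defaultdict
from statistics import mean


def positional_averages(aa_fitness, wt):
    """Average s' over all non-wild-type amino acids at each position.

    aa_fitness: {(position, amino_acid): s'}  (hypothetical layout)
    wt:         {position: wild-type amino acid}
    """
    by_pos = defaultdict(list)
    for (pos, aa), s in aa_fitness.items():
        if aa != wt.get(pos):
            by_pos[pos].append(s)
    return {pos: mean(vals) for pos, vals in by_pos.items()}


def write_bfactors(pdb_lines, pos_values, chain="A", missing=0.0):
    """Overwrite the B-factor field (columns 61-66) of ATOM/HETATM records
    in one chain with per-residue values; all other lines pass through.
    Residues without a value receive `missing` (an assumed default)."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[21:22] == chain:
            newline = "\n" if line.endswith("\n") else ""
            body = line.rstrip("\n").ljust(66)
            value = pos_values.get(int(line[22:26]), missing)  # resSeq, cols 23-26
            line = body[:60] + f"{value:6.2f}" + body[66:] + newline
        out.append(line)
    return out
```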

Comparison of comprehensive mutagenesis measurements to variation in nature. (A) Distribution of wild-type normalized production fitness values for conserved variants (blue) and variants found only in the library (gray). (B) Total number of variants and number of conserved variants with s’ greater than wild-type (s’ > 1). *p < 10⁻²⁰ (Mann-Whitney U test)
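
The Mann-Whitney U test used here compares two distributions by ranks rather than by means. A minimal sketch with a large-sample normal approximation; average ranks are assigned to ties, but the tie correction to the variance and the continuity correction are omitted, which is a simplification of this sketch. For production analyses, a library routine such as `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` would be the usual choice.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (U statistic for x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the block of tied values starting at i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p
```

Using `math.erfc` rather than `1 - cdf` keeps very small tail probabilities, such as the p < 10⁻²⁰ values reported here, from rounding to zero.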

Validation of AAV2 library production assay results. (A) DNase-resistant particle titers for fourteen single amino acid pCMV-Rep78/68-inverted terminal repeat variants produced individually. Titers for previously characterized variants are plotted to the left of the dotted line. (B) Relationship between normalized production fitness values from library experiments and DNase-resistant particle titers from individual transfections. (C) DNase-resistant particle titers for four rAAV genomes produced with the indicated Rep variants. Titers for previously characterized variants are plotted to the left of the dotted line. (D) Relationship between rAAV DNase-resistant particle titers and normalized production fitness values from library experiments. (E) Expression of Rep and VP proteins from variant pRepCap plasmids by Western blot. For panels A and C, asterisks indicate significance of titer differences between the Rep variant and the relevant wild-type control. *p < 0.05, **p < 0.01, ***p < 0.001 (Welch’s t-test)
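
Welch's t-test compares two means without assuming equal variances, which suits small sets of replicate titers. The sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library; the p-value additionally needs the t-distribution CDF, which `scipy.stats.ttest_ind(a, b, equal_var=False)` provides. Whether titers were log-transformed before testing is not stated, so the inputs are left as given.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a, b: replicate measurements (at least two values each).
    """
    # Squared standard errors of each mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df
```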

Mutations to AAV2 rep have similar effects on AAV2, AAV5, and AAV9 capsid production. Wild-type normalized production fitness values from the library production assay with (A) AAV5 and AAV2 capsids and (B) AAV9 and AAV2 capsids. Pearson R correlation coefficient calculated after log transformation.
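
The Pearson R values reported throughout are calculated after log transformation, so fold changes above and below wild-type contribute symmetrically. A minimal sketch; dropping pairs with a non-positive value (where the logarithm is undefined) is an assumption, since the legends do not say how such variants were handled.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_pearson(s_a, s_b):
    """Pearson r between paired fitness values after log10 transformation.

    Pairs with a non-positive value in either dataset are dropped
    (a filtering choice assumed for this sketch).
    """
    pairs = [(a, b) for a, b in zip(s_a, s_b) if a > 0 and b > 0]
    x = [math.log10(a) for a, _ in pairs]
    y = [math.log10(b) for _, b in pairs]
    return pearson_r(x, y)
```

The choice of logarithm base does not affect the result, because Pearson R is invariant to rescaling either variable.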

Comprehensive mutagenesis library design and production assay results for WT AAV2 format library. (A) Density plot of barcode counts in the WT AAV2 plasmid library. (B) Amino acid level production fitness values from replicate transfections for the WT AAV2 library. Pearson R correlation coefficient calculated after log transformation. (C) Density plot of production fitness values for wild-type (black) and premature stop codon (gray) controls for the WT AAV2 library. *p < 10⁻²⁰ (Mann-Whitney U test)

Percent of expected variants sequenced in plasmid and viral libraries.

Effects of all single amino acid substitutions and deletions in Rep78, Rep68, Rep52, and Rep40 on AAV2 production. Amino-acid level production fitness values from the WT AAV2 library production assay were calculated as in Figure 1C by summing barcode counts for synonymous variants. Rectangles are colored by mutational effect on the production of genome-containing particles, with black indicating deleterious mutations and red indicating beneficial mutations. Colored bars above the heatmaps indicate protein domains. Black: origin-binding domain, light blue: helicase domain, gray: nuclear localization signal, and navy blue: zinc-finger domain. Black dots indicate wild-type amino acid identity.

Mutations to AAV2 rep have similar effects on AAV2 production in pCMV-Rep78/68 and WT AAV2 format libraries. (A) Production fitness values for library production assay with pCMV-Rep78/68 and WT AAV2 format libraries. Pearson R correlation coefficient calculated after log transformation. (B) DNase-resistant particle titers for AAV2 capsids produced with pCMV-Rep78/68-inverted terminal repeat or p5-RepCap-inverted terminal repeat plasmids containing the indicated Rep substitutions. *p < 0.05, **p < 0.01 (Welch’s t-test)

Comparison of Rep variant and wild-type production fitness values. For each amino acid variant, mean production fitness values were calculated by averaging the production fitness values for all uniquely barcoded variants coding for the relevant amino acid change. The mean production fitness value for each amino acid variant was compared to the mean production fitness value for the wild-type controls using a t-test. (A) Data for the WT AAV2 library. (B) Data for the pCMV-Rep78/68 library. The solid black line indicates the significance threshold after Bonferroni correction. Data points are colored according to their position in Rep, where “none” refers to variants that fall outside any annotated protein domain.
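
A Bonferroni-corrected threshold divides the family-wise significance level by the number of comparisons, here one per amino acid variant. A short sketch; the α = 0.05 default and the `{variant: p_value}` dictionary are assumptions, because the legend states neither. The -log10 value is returned because significance thresholds are commonly drawn on that scale.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff and its -log10 value for plotting."""
    cutoff = alpha / n_tests
    return cutoff, -math.log10(cutoff)


def significant(pvalues, alpha=0.05):
    """Flag each variant whose p-value falls below the Bonferroni cutoff.

    pvalues: {variant: p_value} (hypothetical layout); the number of tests
    is taken to be the number of entries.
    """
    cutoff, _ = bonferroni_threshold(alpha, len(pvalues))
    return {variant: p < cutoff for variant, p in pvalues.items()}
```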

Average production fitness values from the pCMV-Rep78/68 library production assay mapped onto (A) the structure of the origin-binding domain active site (PDB 5DCX) and (B) the structure of the helicase domain (PDB 1S9H). H90, H92, and the active site nucleophile, Y156, are shown as sticks in (A). Residues are colored by mutability, with red indicating higher mutability and black indicating lower mutability.

Codon level production fitness values for the pCMV-Rep78/68 format library. Codon level production fitness values were calculated as in Figure 1C by summing counts for all barcodes corresponding to a given codon variant. Rectangles are colored by mutational effect on the production of genome-containing particles, with black indicating deleterious mutations and red indicating beneficial mutations. Black dots indicate wild-type codon identity.

Codon level production fitness values for the WT AAV2 format library. Codon level production fitness values were calculated as in Figure 1C by summing counts for all barcodes corresponding to a given codon variant. Rectangles are colored by mutational effect on the production of genome-containing particles, with black indicating deleterious mutations and red indicating beneficial mutations. Black dots indicate wild-type codon identity.

The distribution of production fitness values is narrower for synonymous variants than for nonsynonymous variants. (A) Positional mean-centered fitness values for all codon variants in the WT AAV2 library and (B) synonymous codon mean-centered fitness values for all codon variants in the WT AAV2 library. For each codon variant, positional mean-centered fitness values were calculated by normalizing to the average fitness value of all codon variants at that position. Synonymous mean-centered fitness values were calculated by normalizing the fitness value of each codon variant to the average fitness value of all its synonymous codons.
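
Both centering schemes differ only in how codon variants are grouped: by position, or by position and encoded amino acid. The sketch assumes that "normalizing to the average" means subtracting the group mean, and that a codon's synonymous group includes the codon itself; the `(position, codon, amino_acid)` key is a hypothetical layout.

```python
from collections import defaultdict
from statistics import mean


def mean_centered(fitness, group_key):
    """Subtract each variant's group mean from its fitness value.

    fitness:   {(position, codon, amino_acid): fitness}  (hypothetical)
    group_key: maps a variant key to its group, e.g.
               lambda v: v[0]          -> positional centering
               lambda v: (v[0], v[2])  -> synonymous-codon centering
    """
    groups = defaultdict(list)
    for variant, value in fitness.items():
        groups[group_key(variant)].append(value)
    group_means = {g: mean(vals) for g, vals in groups.items()}
    return {variant: value - group_means[group_key(variant)]
            for variant, value in fitness.items()}
```

Comparing `statistics.pstdev` of the two centered outputs gives one simple way to quantify how much narrower the synonymous distribution is.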

Inclusion of synonymous codon variants enables interrogation of nucleotide-level effects. Analysis of pCMV-Rep78/68 variants with premature stop codons in the +1 (A) and +2 (B) reading frames does not identify any frameshifted open reading frames. Pink dots: average production fitness for mutations that introduce stop codons into the +1 or +2 frame, black dots: average production fitness for mutations synonymous to +1 or +2 stop codon mutations in the Rep open reading frame, lines: 10 bp rolling averages. (C) Average production fitness at each nucleotide position for variants that introduce the indicated nucleotide change in the pCMV-Rep78/68 library, orange: T, blue: G, green: C, red: A. (D) Average production fitness for variants that introduce the indicated nucleotide change in the WT AAV2 library.
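
Classifying a nucleotide substitution as introducing a stop codon in the +1 or +2 frame requires locating the codon that contains the mutated base in the shifted frame. A minimal sketch, assuming `seq` is the Rep open reading frame starting at its start codon, that "+1" and "+2" denote frames offset by one and two nucleotides from that start, and that the 10 bp rolling average is a trailing window over one value per nucleotide position (the window alignment is not stated).

```python
STOPS = {"TAA", "TAG", "TGA"}


def creates_offframe_stop(seq, pos, new_base, frame):
    """True if substituting seq[pos] -> new_base creates a stop codon in the
    reading frame offset by `frame` (1 or 2) nucleotides from the ORF start."""
    start = pos - ((pos - frame) % 3)  # first base of the shifted-frame codon
    if start < frame or start + 3 > len(seq):
        return False  # no complete codon in this frame covers pos
    old = seq[start:start + 3]
    new = old[:pos - start] + new_base + old[pos - start + 1:]
    return new in STOPS and old not in STOPS


def rolling_mean(values, window=10):
    """Trailing rolling mean over per-nucleotide values; None entries
    (positions without data) are skipped within each window."""
    out = []
    for i in range(len(values)):
        chunk = [v for v in values[max(0, i - window + 1): i + 1] if v is not None]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out
```

Averaging production fitness separately over substitutions flagged by `creates_offframe_stop` and over the matched synonymous substitutions would give the two series compared in panels A and B.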

Effect of single amino acid Rep substitutions on the viral genome titer, physical particle titer, and relative transduction efficiency of affinity purified rAAV2. (A) Capsid (gray) and viral genome (black) titers for pAAV-EF1a-FLuc-WPRE-HGHpA rAAV2 produced with the indicated Rep variant and affinity purified with AVB Sepharose. For each variant, samples A and B represent replicate transfections and affinity purifications. (B) Relative transduction efficiency of purified rAAV preps as measured by luciferase signal. Asterisks indicate significance of the difference between each prep and the WT A prep. *p < 0.05, **p < 0.01, ***p < 0.001 (Welch’s t-test)

Viral genome and physical particle titers for rAAV2 produced with Rep variants

Effects of all single amino acid substitutions and deletions in the Rep78/68 proteins on AAV5 capsid production. Amino acid level production fitness values from the pCMV-Rep78/68 production assay were calculated as in Figure 1C by summing barcode counts for synonymous mutations. Rectangles are colored by mutational effect on the production of genome-containing particles, with black indicating deleterious mutations and red indicating beneficial mutations. Colored bars above the heatmaps indicate protein domains. Black: origin-binding domain, light blue: helicase domain, gray: nuclear localization signal, and navy blue: zinc-finger domain. Black dots indicate wild-type amino acid identity.

Effects of all single amino acid substitutions and deletions in the Rep78/68 proteins on AAV9 capsid production. Amino acid level production fitness values from the pCMV-Rep78/68 production assay were calculated as in Figure 1C by summing barcode counts for synonymous mutations. Rectangles are colored by mutational effect on the production of genome-containing particles, with black indicating deleterious mutations and red indicating beneficial mutations. Colored bars above the heatmaps indicate protein domains. Black: origin-binding domain, light blue: helicase domain, gray: nuclear localization signal, and navy blue: zinc-finger domain. Black dots indicate wild-type amino acid identity.

Codon level production fitness values for the AAV5 capsid production assay. Codon level production fitness values were calculated as in Figure 1C by summing counts for all barcodes corresponding to a given codon variant. Rectangles are colored by mutational effect on the production of genome-containing particles, with black indicating deleterious mutations and red indicating beneficial mutations. Black dots indicate wild-type codon identity.

Codon level production fitness values for the AAV9 capsid production assay. Codon level production fitness values were calculated as in Figure 1C by summing counts for all barcodes corresponding to a given codon variant. Rectangles are colored by mutational effect on the production of genome-containing particles, with black indicating deleterious mutations and red indicating beneficial mutations. Black dots indicate wild-type codon identity.

DNase-resistant particle titers for AAV2, AAV5, and AAV9 capsids produced individually with the indicated Rep variants.