Research Article

Genetics and Genomics

Multiplexed measurement of variant abundance and activity reveals VKOR topology, active site and human variant impact

Department of Genome Sciences, University of Washington, United States
Department of Systems Biology, Harvard Medical School, United States
Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, and Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Canada
Color Genomics, United States
Department of Medicinal Chemistry, University of Washington, United States
Department of Bioengineering, University of Washington, United States

Sep 1, 2020

Open access

Abstract

Vitamin K epoxide reductase (VKOR) drives the vitamin K cycle, activating vitamin K-dependent blood clotting factors. VKOR is also the target of the widely used anticoagulant drug, warfarin. Despite VKOR’s pivotal role in coagulation, its structure and active site remain poorly understood. In addition, VKOR variants can cause vitamin K-dependent clotting factor deficiency or alter warfarin response. Here, we used multiplexed, sequencing-based assays to measure the effects of 2,695 VKOR missense variants on abundance and 697 variants on activity in cultured human cells. The large-scale functional data, along with an evolutionary coupling analysis, supports a four transmembrane domain topology, with variants in transmembrane domains exhibiting strongly deleterious effects on abundance and activity. Functionally constrained regions of the protein define the active site, and we find that, of four conserved cysteines putatively critical for function, only three are absolutely required. Finally, 25% of human VKOR missense variants show reduced abundance or activity, possibly conferring warfarin sensitivity or causing disease.

Introduction

The enzyme vitamin K epoxide reductase (VKOR) drives the vitamin K cycle, which activates blood coagulation factors. VKOR, an endoplasmic reticulum (ER) localized transmembrane protein encoded by the gene VKORC1, reduces vitamin K quinone and vitamin K epoxide to vitamin K hydroquinone (Li et al., 2004; Rost et al., 2004). Vitamin K hydroquinone is required to enable gamma-glutamyl carboxylase (GGCX) to carboxylate Gla domains on vitamin K-dependent blood clotting factors. VKOR is inhibited by the anticoagulant drug warfarin (Czogalla et al., 2017; Zimmermann and Matschiner, 1974), and VKORC1 polymorphisms contribute to an estimated 25% of warfarin dosing variability (Owen et al., 2010). For example, variation in VKORC1 noncoding and coding sequence can cause warfarin resistance (weekly warfarin dose >105 mg) or warfarin sensitivity (weekly warfarin dose <~10 mg) (Osinbowale et al., 2009; Yuan et al., 2005).

Though 15 million prescriptions are written for warfarin each year (https://www.clincalc.com), fundamental questions remain regarding its target, VKOR. For example, the structure of human VKOR is unsolved, though a bacterial homolog has been crystallized (Li et al., 2010). A homology model based on bacterial VKOR has four transmembrane domains, but the quality of the homology model is unclear, as human VKOR has only 12% sequence identity to bacterial VKOR. Moreover, experimental validation of VKOR topology yielded mixed results: similar biochemical assays suggested either three- or four-transmembrane-domain topologies (Schulman et al., 2010; Tie et al., 2012; Shen et al., 2017; Wu et al., 2018).

Topology informs basic aspects of VKOR function including where vitamin K and warfarin bind, so determining the correct topology and validating the homology model is critical. In particular, VKOR has four functionally important, absolutely conserved cysteines at positions 43, 51, 132, and 135, the orientation of which differs between the two proposed topologies. In the four transmembrane domain topology, all four cysteines are located on the ER lumenal side of the enzyme. In this topology, cysteines 43 and 51 are hypothesized to be ‘loop cysteines’ that pass electrons from an ER-anchored reductase, possibly transmembrane thioredoxin-related protein (Schulman et al., 2010), to the active site (Rishavy et al., 2011). However, in the three transmembrane domain topology, these cysteines are located in the cytoplasm and other pathways would be required to convey electrons to the redox center. Even for non-catalytic residues, topology plays an important role. For example, vitamin K presumably binds near the redox center, and topology dictates which residues make up the substrate binding site.

To understand the effect of human variants and to define the vitamin K and warfarin binding sites, VKOR variant activity has been extensively studied in cell-based assays (Czogalla et al., 2017; Shen et al., 2017; Tie et al., 2013). In addition to activity, VKOR protein abundance has also been studied because abundance is an important driver of disease and warfarin response. For example, VKOR R98W is a decreased-abundance variant that, in homozygous carriers, causes vitamin K-dependent clotting factor deficiency 2 (Rost et al., 2004). A 5’ UTR polymorphism reduces VKOR abundance and can be used to predict warfarin sensitivity (Gong et al., 2011). However, so far, the activity and abundance of only a handful of VKOR variants have been tested.

Here, we used multiplexed, sequencing-based assays (Gasperini et al., 2016) to measure the effects of 2,695 VKOR missense variants on abundance and 697 variants on activity. Our analysis of the large-scale functional data supports a four transmembrane domain topology, which an orthogonal evolutionary coupling analysis confirmed. Next, we identified distinct mutational tolerance groups, which are concordant with a four transmembrane homology model. Combining this homology model with variant abundance and activity effects, we identified an active site that contains the catalytic residues C132 and C135 and shares six positions with a previously proposed vitamin K binding site (Czogalla et al., 2017). We found that of four conserved cysteines putatively critical for function, only three are absolutely required, and analyzed the mutational signatures of two putative ER retention motifs. Human VKORC1 variants present in genetic databases and contributed by a commercial genetic testing laboratory were each classified based on abundance and activity. While most variants show wild type-like activity, 25% show low abundance or activity, which could confer warfarin sensitivity or cause disease in a homozygous context. Finally, we analyzed warfarin resistance variants and found that they span a range of abundances, indicating that increased abundance is an uncommon mechanism of warfarin resistance.

Results

Multiplexed measurement of VKOR variant abundance using VAMP-seq

To measure the abundance of VKOR variants, we applied Variant Abundance by Massively Parallel sequencing (VAMP-seq), an assay we recently developed (Matreyek et al., 2018). In VAMP-seq, a protein variant is fused to eGFP with a short amino acid linker. If the variant is stable and properly folded, then the eGFP fusion will not be degraded, and cells will have high eGFP fluorescence. In contrast, if the variant causes the protein to misfold, protein quality control machinery will detect and degrade the eGFP fusion, leading to a decrease in eGFP signal (Figure 1a). mCherry is also expressed from an internal ribosomal entry site (IRES) to control for expression. Differences in abundance are measured on a flow cytometer using the ratio of eGFP to mCherry signal. To determine whether VAMP-seq could be applied to VKOR, we fused eGFP to VKOR N- or C-terminally and found that both orientations had high eGFP signal (Figure 1—figure supplement 1). We compared N-terminally tagged wild type (WT) VKOR to R98W, a variant that ablates a putative ER retention motif and reduces abundance (Czogalla et al., 2014), and to TMD1Δ, a deletion of residues 10–30 which comprise the putative first transmembrane domain (TMD1; Figure 1b). Both reduced abundance variants exhibited much lower eGFP:mCherry ratios than WT, demonstrating that VAMP-seq could be applied to VKOR.

Figure 1. Multiplexed measurement of VKOR variant abundance using VAMP-seq.

(a) To measure abundance, an eGFP reporter is fused to VKOR. eGFP-tagged WT VKOR is folded correctly, leading to high eGFP fluorescence. However, a destabilized variant is degraded by protein quality control machinery, leading to low eGFP fluorescence. (b) Flow cytometry is used to bin cells based on their eGFP:mCherry fluorescence intensity. Density plots of VKOR library expressing cells (grey, n = 12,109) relative to three controls: WT VKOR (red, n = 4,756), VKOR R98W (blue, n = 2,453), and VKOR TMD1Δ (orange, n = 2,204) are shown. Quartile bins for FACS of the library are marked. (c) Abundance score density plots of nonsense variants (dashed blue line, n = 88), synonymous variants (dashed red line, n = 127), and missense variants (filled, solid line, n = 2,695). The missense variant density is colored as a gradient between the lowest 10% of abundance scores (blue), the WT abundance score (white) and abundance scores above WT (red). (d) Heatmap showing abundance scores for each substitution at every position within VKOR. Heatmap color indicates abundance scores scaled as a gradient between the lowest 10% of abundance scores (blue), the WT abundance score (white), and abundance scores above WT (red). Grey bars indicate missing variants. Black dots indicate WT amino acids. (e) Number of substitutions scored at each position for abundance. (f) Scatterplot comparing VAMP-seq derived abundance scores to mean eGFP:mCherry (n = 1 replicate) ratios measured individually by flow cytometry. Variants were selected at random to span the abundance score range. Error bars show standard error for abundance scores and standard error for eGFP:mCherry ratio.

Figure 1—source data 1. VKOR variant abundance and activity scores. https://cdn.elifesciences.org/articles/58026/elife-58026-fig1-data1-v1.csv
Figure 1—source data 2. Flow cytometry for monoclonal validation of variants. 11 variants were run individually; values show mean and error for VAMP-seq score and eGFP:mCherry intensity. https://cdn.elifesciences.org/articles/58026/elife-58026-fig1-data2-v1.csv

We constructed a barcoded site-saturation mutagenesis VKOR library that covered 92.5% of all 3240 possible missense variants. To express this library in HEK293T cells we used a Bxb1 recombinase landing pad system we previously developed (Matreyek et al., 2017). In this system, each cell expresses a single VKOR variant. Recombined, VKOR variant-expressing cells were then sorted into quartile bins based on their eGFP:mCherry ratios. Each bin was deeply sequenced, and abundance scores were calculated based on each variant’s distribution across bins. Raw abundance scores were normalized such that WT-like variants had a score of one and total loss of abundance variants had a score of zero (Figure 1c). We performed seven replicates, which were well correlated (Figure 1—figure supplement 1, mean Pearson’s r = 0.73; mean Spearman’s ρ = 0.7, Supplementary file 1). Abundance score means and confidence intervals for each variant were calculated from the replicates.
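
The scoring step described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the bin weights, the function names, the toy read frequencies, and the use of median synonymous and median nonsense raw scores as the two reference points are all assumptions. Only the four quartile bins and the normalization (WT-like at one, total loss at zero) come from the text.

```python
from statistics import median

# Assumed weights for the four sorted bins (lowest to highest eGFP:mCherry).
BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)


def raw_score(bin_freqs):
    """Weighted average of a variant's frequency across the four bins."""
    total = sum(bin_freqs)
    if total == 0:
        raise ValueError("variant was not observed in any bin")
    return sum(w * f for w, f in zip(BIN_WEIGHTS, bin_freqs)) / total


def normalize(raw, wt_like_raw, null_raw):
    """Rescale so the WT-like reference maps to 1 and the null reference to 0."""
    return (raw - null_raw) / (wt_like_raw - null_raw)


# Toy per-bin frequencies (hypothetical values, not data from the study).
synonymous = [(0.05, 0.15, 0.35, 0.45), (0.05, 0.20, 0.35, 0.40)]
nonsense = [(0.70, 0.20, 0.05, 0.05), (0.65, 0.25, 0.05, 0.05)]
missense = {
    "variant_A": (0.10, 0.20, 0.30, 0.40),  # mostly in the high bins
    "variant_B": (0.60, 0.25, 0.10, 0.05),  # mostly in the low bins
}

# Assumed reference points: median synonymous and median nonsense raw scores.
wt_like_raw = median(raw_score(f) for f in synonymous)
null_raw = median(raw_score(f) for f in nonsense)

abundance_scores = {
    name: normalize(raw_score(freqs), wt_like_raw, null_raw)
    for name, freqs in missense.items()
}
```

Under these assumptions, a variant concentrated in the high-fluorescence bins scores near one and a variant concentrated in the low bins scores near zero.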

The final dataset describes the effect of 2695 of the 3240 possible missense VKOR variants on abundance (Figure 1d and e). Validation of 10 randomly selected variants spanning the abundance score range showed high concordance between individual eGFP:mCherry ratios assessed by flow cytometry and VAMP-seq derived abundance scores (Figure 1f, Pearson’s r = 0.96, Spearman’s ρ = 0.97). Western blots of these variants also showed high concordance with abundance scores (Figure 1—figure supplement 2).

Multiplexed measurement of VKOR variant activity using a gamma-glutamyl carboxylation reporter

We also measured VKOR variant activity, adapting a HEK293 cell assay based on vitamin K-dependent gamma-glutamyl carboxylation of a cell-surface reporter protein (Haque et al., 2014). In this assay, if VKOR is active, a Factor IX domain reporter is carboxylated, secreted and retained on the cell surface where it is detected with a carboxylation-specific, fluorophore-labeled antibody. However, if VKOR is inactive, the reporter is not carboxylated and the antibody cannot bind (Figure 2a). We modified the HEK293 activity reporter cell line to eliminate endogenous VKOR activity by knocking out both VKORC1 and its paralog, VKORC1-like 1 (VKORC1L1) (Tie et al., 2013; Figure 2—figure supplement 1). We also installed a Bxb1 landing pad to facilitate expression of individual VKOR variants or libraries (Figure 2—figure supplement 1). Recombination of WT VKORC1 into the landing pad of the HEK293 VKOR activity reporter cell line yielded robust reporter activation, demonstrating that the reporter line could be used to assess the activity of a library of VKOR variants (Figure 2b).

Figure 2. Multiplexed measurement of VKOR variant activity using a gamma-glutamyl carboxylation reporter.

(a) Left panel, a Factor IX Gla domain reporter is expressed in HEK293 cells and consists of a prothrombin pre-pro-peptide which allows for processing and secretion, a Factor IX Gla domain, and Proline rich Gla protein 2 (PRGP2) transmembrane and cytoplasmic domains. Middle panel, Cells expressing WT VKOR carboxylate the reporter Gla domain, which, upon trafficking to the cell surface, can be stained using a carboxylation-specific antibody conjugated to the fluorophore APC. Right panel, VKOR knockout cells do not carboxylate the reporter, so the fluorescent antibody does not bind. (b) Density plots of HEK293 activity reporter cells stained with APC-labeled carboxylation-specific antibody expressing no VKOR (blue, n = 7,188), WT VKOR (red, n = 4,107), or the VKOR variant library (grey, n = 41,418). Quartile bins for FACS of the library are marked. (c) Activity score density plots of nonsense variants (dashed blue line, n = 14), synonymous variants (dashed red line, n = 35), and missense variants (filled, solid line, n = 697). The missense variant density is colored as a gradient between the lowest 10% of activity scores (blue), the WT activity score (white) and activity scores above WT (red).

We recombined a library of VKORC1 variants into the HEK293 activity reporter cell line and sorted recombinant cells into quartile bins based on carboxylation-specific antibody binding. Each bin was deeply sequenced and, as for VAMP-seq, an activity score was computed for each variant. Final activity scores and confidence intervals were computed from six replicates for a total of 697 missense variants, 21.5% of those possible (Figure 2—figure supplement 2, mean Pearson’s r = 0.62 and mean Spearman’s ρ = 0.56, Supplementary file 2). Our activity score density plot showed that most variants had WT-like activity scores (Figure 2c).
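
Replicate agreement of the kind reported above (mean Pearson's r and mean Spearman's ρ) can be computed along the following lines. This is an illustrative sketch: the function names, the variant labels, and the toy scores are hypothetical, and restricting each pairwise comparison to variants scored in both replicates is an assumption about how missing values are handled.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Average ranks (1-based), with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_pairwise(replicates, corr):
    """Average a correlation over all replicate pairs, using shared variants only."""
    values = []
    for a, b in combinations(replicates, 2):
        shared = sorted(set(a) & set(b))
        values.append(corr([a[v] for v in shared], [b[v] for v in shared]))
    return sum(values) / len(values)


# Toy replicate scores keyed by hypothetical variant labels.
replicates = [
    {"var1": 0.9, "var2": 0.1, "var3": 0.3, "var4": 1.2},
    {"var1": 1.0, "var2": 0.2, "var3": 0.2, "var4": 1.1},
    {"var1": 0.8, "var2": 0.0, "var3": 0.4},  # var4 missing in this replicate
]

mean_r = mean_pairwise(replicates, pearson)
mean_rho = mean_pairwise(replicates, spearman)
```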

Human VKOR has four transmembrane domains

Two different domain models, one with three transmembrane domains and another with four, have been proposed for human VKOR (Li et al., 2010; Tie et al., 2012; Figure 3a). Because charged amino acids occur infrequently in transmembrane domains and should be less tolerated, we reasoned we could discriminate between these two models using a sliding window average of the effect of charged substitutions on VKOR abundance (Elazar et al., 2016a; Sharpe et al., 2010). We found four clearly demarcated regions where charged substitutions profoundly reduced VKOR abundance, relative to aliphatic substitutions (Figure 3b). To exclude the possibility that the eGFP tag used in our VAMP-seq assay somehow affected topology, we also analyzed the activity score data. The activity data, derived using native, untagged VKOR, revealed the same four minima as the abundance data (Figure 3c). In addition to these four minima, we also observed an activity score minimum at position 57, corresponding to a conserved serine at this position. This serine occurs at the end of the lumenal half-helix hypothesized to shield the active site from non-specific oxidation, so it is likely this signal is the result of disruption of that half helix. Together, these results strongly support the hypothesis that, like its distant bacterial homolog, human VKOR has four transmembrane domains.
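
The sliding window comparison described above can be sketched as follows. The window width of 10 positions matches the Figure 3 legend; the residue sets used for "charged" and "aliphatic", the function name, and the 30-position toy protein are assumptions made for illustration.

```python
CHARGED = set("DEKR")    # assumed definition of charged residues
ALIPHATIC = set("AVIL")  # assumed definition of aliphatic residues


def windowed_means(scores, residues, n_positions, width=10):
    """Mean score of the chosen substitutions in each window of positions.

    scores maps (position, substituted amino acid) -> score, positions 1-based.
    Returns (window_start, mean) pairs; mean is None if a window has no data.
    """
    out = []
    for start in range(1, n_positions - width + 2):
        window = [
            s for (pos, aa), s in scores.items()
            if start <= pos < start + width and aa in residues
        ]
        out.append((start, sum(window) / len(window) if window else None))
    return out


# Toy protein of 30 positions in which positions 11-20 behave like a
# transmembrane domain: charged substitutions are poorly tolerated there.
scores = {}
for pos in range(1, 31):
    in_tmd = 11 <= pos <= 20
    for aa in "DEKR":
        scores[(pos, aa)] = 0.1 if in_tmd else 1.0
    for aa in "AVIL":
        scores[(pos, aa)] = 0.9 if in_tmd else 1.0

charged = windowed_means(scores, CHARGED, n_positions=30)
aliphatic = windowed_means(scores, ALIPHATIC, n_positions=30)

# The window where charged substitutions are least tolerated.
lowest_start, lowest_mean = min(charged, key=lambda t: t[1])
```

In the toy data the charged-substitution minimum falls on the window covering positions 11 to 20, while the aliphatic mean in the same window stays high, which is the signature used in the text to locate transmembrane domains.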

Figure 3. Abundance, activity, and evolutionary data support four transmembrane domains.

(a) Three and four transmembrane domain (TMD) models of VKOR, with TMDs in dark grey (Li et al., 2010; Tie et al., 2012). (b) Windowed abundance score means (width = 10 positions) for charged substitutions (green) and aliphatic substitutions (gold). Dark grey boxes correspond to TMDs proposed in the four-domain model. Dashed lines show median synonymous and nonsense abundance scores. (c) Windowed activity score means (width = 10 positions) for charged substitutions (green) and aliphatic substitutions (gold). Boxes and dashed lines as described in b. (d) Secondary structure classification from local evolutionary couplings shown as alpha scores calculated for alpha helices (red) and beta sheets (blue). Dashed lines show significance cut-offs for alpha helices (1.5, red) and beta sheets (0.75, blue) (Toth-Petroczy et al., 2016). (e) A contact map derived from evolutionary couplings. Black points show pairs of positions with significant coupling. Light green points show predicted contacts between TMD1 and TMD2. Dark green points show predicted contacts between TMD1 and TMD4. (f) Predicted tertiary contacts between TMD1-TMD2 (shown in light green in e) and (g) TMD1-TMD4 (shown in dark green in e) shown on the evolutionary couplings-derived hVKOR structural model. (h) Scatterplot comparing change in free energy for membrane insertion (Elazar et al., 2016a) (∆∆G_app) to median abundance score for each amino acid substitution. Cytoplasmic and lumenal positions shown in black, TMD2 in light green, and TMDs 1, 3, and 4 in dark green. Charged substitutions shown as circles, all other substitutions as triangles.

Figure 3—source data 1. Evolutionary couplings secondary structure predictions. Rows show position, with columns showing alpha helix or beta sheet values and predictions. https://cdn.elifesciences.org/articles/58026/elife-58026-fig3-data1-v1.csv
Figure 3—source data 2. Evolutionary couplings 3D contact predictions. Rows show pairs of residues with contact probabilities. https://cdn.elifesciences.org/articles/58026/elife-58026-fig3-data2-v1.csv
Figure 3—source data 3. Insertion energies from Elazar et al., 2016b. Amino acids with calculated insertion energy. https://cdn.elifesciences.org/articles/58026/elife-58026-fig3-data3-v1.csv

To validate these findings, we performed evolutionary coupling analysis to infer the three-dimensional structure suggested by co-evolution. We aligned 6910 VKOR sequences from both eukaryotes and prokaryotes (1118 sequences from eukaryotes, 5731 from bacteria, and 61 from environmental samples and viruses) and identified coupled residues using the EVcouplings software (Hopf et al., 2012; Marks et al., 2011). Local patterns of evolutionary couplings (i.e. between nearby positions, i to i+4) supported a four-helix topology. The helices predicted by these local evolutionary couplings overlapped 70 of the 82 residues that are in alpha-helices of the bacterial structure (PDB 4NV5) (Shen et al., 2017) and are included in our alignment (hyper-geometric test p-value = 3.26 × 10⁻²³, Figure 3d).
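
A hypergeometric overlap test of the kind cited above can be sketched with the standard library. The overlap (70) and the number of helical residues in the bacterial structure (82) come from the text; the total number of aligned positions and the number of positions predicted to be helical are not stated in this section, so the values used for them below are hypothetical placeholders, and the resulting p-value is not expected to reproduce the reported one.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from a population
    of N items, K of which are 'successes'."""
    upper = min(K, n)
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


helical_in_structure = 82   # from the text
overlap = 70                # from the text
aligned_positions = 150     # hypothetical placeholder
predicted_helical = 85      # hypothetical placeholder

p_value = hypergeom_upper_tail(
    overlap, aligned_positions, helical_in_structure, predicted_helical
)
```

The test asks how likely an overlap at least this large would be if the predicted helical positions were drawn at random from the aligned positions.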

We identified non-local evolutionary coupling patterns characteristic of three-dimensional contacts, which also strongly supported the four transmembrane domain model. Using these contacts, we computationally folded human VKOR, yielding a modeled structure similar to the bacterial structure (RMSD = 2.58 Å over 97/143 C_alpha, Figure 3—figure supplement 1, Supplementary file 3). The predicted tertiary structure had a four-helix topology, with antiparallel contacts between the full lengths of transmembrane domains 1 and 2 (Figure 3e, Figure 3f) and between the full lengths of transmembrane domains 1 and 4 (Figure 3e, Figure 3g). These antiparallel contacts would not be present in a three-helix topology and are only possible in a four-helix topology. This topology was also supported when we restricted our analysis to the 1118 eukaryotic sequences exclusively (Figure 3—figure supplement 2).

Comparison of our abundance data to the energy required to insert different amino acids into the membrane yielded additional evidence for the four transmembrane domain model. The apparent change in free energy (ΔΔG_app) of insertion relative to wild type for every amino acid has been determined experimentally using deep mutational scanning of bacterial membrane proteins (Elazar et al., 2016a). Median abundance score and ΔΔG_app for each amino acid are correlated (Figure 3h). In particular, the large energetic cost of insertion of transmembrane domains with charged amino acids is apparent, including within the second transmembrane domain, TMD2. Beyond insertion energies of individual amino acids, the overall hydrophobicity of transmembrane helices contributes to membrane protein insertion (Elazar et al., 2016a), as well as topology (Elazar et al., 2016b) and degradation (Guerriero et al., 2017). To determine whether overall helix hydrophobicity was a large factor contributing to abundance scores, we calculated the free energy for insertion (ΔG_helix) of each helix in the four transmembrane domain model using the ΔG prediction server v1.0 (Hessa et al., 2007) and TopGraph (Elazar et al., 2016b). Both predicted that TMD3 had the most favorable ΔG_helix for insertion (ΔG prediction server: TMD1: 0.435, TMD2: 1.551, TMD3: −1.749, and TMD4: 1.734; TopGraph: TMD1: −6.3, TMD2: −5.5, TMD3: −12.6, TMD4: −4.3). Interestingly, we observed that TMD3 has a high density of substitutions with WT-like scores (Figure 3—figure supplement 3a), suggesting that TMD3’s favorable insertion energy might explain its mutational tolerance. In addition, the high concordance of hydrophobicity indices from bacterial and mammalian multiple sequence alignments further supports the conservation of a four transmembrane domain topology between bacteria and mammals (Figure 3—figure supplement 3b).
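
The ΔG_helix comparison above reduces to finding the lowest value per predictor. The short sketch below uses the values quoted in the text; the dictionary layout and variable names are illustrative only. Because the two predictors report values on different scales, only the ordering within each predictor is compared.

```python
# Predicted insertion free energies quoted in the text, per predictor.
dg_helix = {
    "dG prediction server": {"TMD1": 0.435, "TMD2": 1.551, "TMD3": -1.749, "TMD4": 1.734},
    "TopGraph": {"TMD1": -6.3, "TMD2": -5.5, "TMD3": -12.6, "TMD4": -4.3},
}

# The most favorable helix is the one with the lowest (most negative) value.
most_favorable = {
    method: min(values, key=values.get) for method, values in dg_helix.items()
}

# Helices ordered from most to least favorable for each predictor.
ranking = {
    method: sorted(values, key=values.get) for method, values in dg_helix.items()
}
```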

Detailed structural context of VKOR variant abundance effects

Having confirmed that human VKOR has four transmembrane domains, we next explored the detailed pattern of mutational effects we observed in the context of a four transmembrane domain homology model. We generated a homology model of human VKOR with I-TASSER using the bacterial VKOR structure (Shen et al., 2017; Yang et al., 2015, Supplementary file 4). We performed hierarchical clustering of positions based on abundance scores, which yielded four groups of positions with characteristic mutational patterns (Figure 4a). In Group 1, most substitutions were neutral or increased abundance; in Group 2, charged amino acid and proline substitutions decreased abundance; in Group 3, all substitutions decreased abundance; and in Group 4, all substitutions decreased abundance profoundly. Each group corresponded to a spatially distinct region of the homology model structure (Figure 4b).
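
Grouping positions by their abundance score vectors can be illustrated with a small agglomerative clustering routine. This is a sketch under stated assumptions, not the clustering procedure used in the study: the distance (root-mean-square difference over substitutions scored at both positions), the average linkage, the function names, and the toy score vectors are all choices made here for illustration.

```python
from math import sqrt


def distance(u, v):
    """Root-mean-square difference over substitutions scored at both
    positions; None marks a missing variant."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    if not pairs:
        return float("inf")
    return sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


def cluster(vectors, n_groups):
    """Average-linkage agglomerative clustering; returns a list of sets of keys."""
    groups = [{key} for key in vectors]
    while len(groups) > n_groups:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = sum(
                    distance(vectors[a], vectors[b])
                    for a in groups[i] for b in groups[j]
                ) / (len(groups[i]) * len(groups[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] |= groups[j]  # merge the closest pair of groups
        del groups[j]
    return groups


# Toy score vectors (four substitutions per position; hypothetical values).
vectors = {
    "loop_a": [1.0, 1.1, 0.9, 1.0],   # tolerant of all substitutions
    "loop_b": [1.0, None, 1.0, 1.1],
    "lipid_a": [0.2, 1.0, 0.9, 0.3],  # intolerant of some substitutions
    "lipid_b": [0.3, 0.9, 1.0, 0.2],
    "core_a": [0.0, 0.1, 0.0, 0.1],   # intolerant of all substitutions
    "core_b": [0.1, 0.0, 0.1, 0.0],
}

groups = cluster(vectors, n_groups=3)
```

Cutting the merge sequence at a chosen number of groups plays the role of choosing groups from the dendrogram.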

Figure 4. Hierarchical clustering of abundance scores and distributions of abundance and activity scores by domain.

(a) A heatmap showing hierarchical clustering of positions based on abundance score vectors, with the dendrogram above. Groups of positions, chosen based on the dendrogram, are numbered and colored. Heatmap color indicates abundance scores scaled as a gradient between the lowest 10% of abundance scores (blue), the WT abundance score (white) and abundance scores above WT (red). Grey bars indicate missing variants. Black dots indicate WT amino acids. (b) Positions in groups 1–4 shown on the VKOR homology model, with numbers and colors corresponding to panel a. (c) Boxplot showing relative solvent accessibility of positions in each cluster determined using DSSP (Kabsch and Sander, 1983; Touw et al., 2015) and colored as in b. Bold black line shows median, box shows 25th and 75th percentiles. Whiskers extend 1.5 times the interquartile range above and below these percentiles, and outliers are shown as black points. (d) Histograms of abundance scores for missense variants in the cytoplasmic, ER lumenal, or transmembrane domains. (e) Histograms of activity scores for missense variants in the cytoplasmic, ER lumenal, or transmembrane domains.

Group 1 positions were located in or adjacent to cytoplasmic and ER lumenal loops, which were more tolerant of substitutions than the transmembrane domains. At four Group 1 positions, K30, R33, R35, and R37, almost every substitution increased abundance. These positively charged positions are located either at the edge of TMD1 (K30) or in the ER lumen directly abutting the top of TMD1 (R33, R35, and R37). The ‘positive inside rule’ (von Heijne, 1989) suggests that positive charges in membrane proteins generally reside in the cytoplasm, and this phenomenon is important for driving topology and membrane insertion (Elazar et al., 2016b; Nilsson and von Heijne, 1990; von Heijne, 1989). K30, R33, R35, and R37 violate the positive inside rule, and substitutions at these positions may increase abundance by reducing charge inside the ER, reducing topological frustration or increasing membrane insertion efficiency. Compared to the other 12 arginine and lysine positions in WT VKOR, K30, R33, R35, and R37 are the only ones where substitutions generally increased abundance (Figure 4—figure supplement 1). Our observations are consistent with a screen of rat VKOR variants intended to improve protein expression in E. coli where deletion of positions 31 to 33 increased protein levels (Hatahet et al., 2015).

In Group 2, charged amino acid or proline substitutions generally decreased abundance. Group 2 consisted mostly of transmembrane positions that had side chains projecting into the lipid bilayer. Such transmembrane positions usually have hydrophobic, nonpolar side chains (Ulmschneider and Sansom, 2001). Proline has poor helix forming propensity, explaining why proline substitutions decreased abundance at these positions. Group 3 consisted of a mixture of cytoplasmic, ER lumenal and transmembrane positions where most substitutions decreased abundance. The cytoplasmic positions in this group included the putative dilysine ER localization motif at positions 159 and 161. Also in this group were R98, part of another putative ER retention motif at positions 98 and 100, and a glycine adjacent to TMD1 at position 9. The transmembrane positions had side chains projecting towards neighboring transmembrane helices, suggesting that, as for other membrane proteins (Fleming and Engelman, 2001; Mravic et al., 2019), intramolecular sidechain packing is important for abundance.

Finally, substitutions in Group 4, consisting of positions G19, Y88, I141, and L145, resulted in catastrophic loss of abundance. These positions are all in transmembrane domains with side chains projecting into the interior of the protein. On the basis of strict mutational intolerance of these positions, we hypothesized that their coordinated side chain packing comprises the core of the VKOR four helix bundle. Indeed, Group 4 residues had dramatically lower relative solvent accessibility than Groups 1–3 (Figure 4c).

The four transmembrane domain homology model also allowed us to explain VKOR’s unusual trimodal distribution of variant abundance scores. Previous VAMP-seq derived abundance score distributions for the cytosolic proteins TPMT and PTEN were bimodal (Figure 4—figure supplement 2; Matreyek et al., 2018), and 15 of 16 deep mutational scans of other soluble proteins using a variety of other assays also exhibited bimodal functional score distributions (Gray et al., 2017). Because VKOR is an ER resident, transmembrane protein, we hypothesized that its unusual trimodal abundance score distribution resulted from transmembrane domain substitutions. Indeed, the lowest mode of the distribution was composed almost exclusively of deleterious transmembrane domain substitutions (Figure 4d). In contrast, the intermediate mode consisted of substitutions in the ER lumen, cytoplasm, and transmembrane domains. Similarly, substitutions that profoundly decreased activity occurred in transmembrane domains (Figure 4e).

Variant activity and abundance identify functionally constrained regions of VKOR

We reasoned that our activity and abundance data could reveal the location of functionally important positions in VKOR, including the active site, since functionally important positions should have many loss-of-activity but few loss-of-abundance variants. Thus, we calculated the specific activity for each variant by taking the ratio of its rescaled activity score and abundance score (see Methods). We computed the median specific activity for each position; substitutions at positions with low median specific activity generally have low activity relative to their abundance. We set a specific activity threshold based on two absolutely conserved cysteines that form VKOR’s redox center, C132 and C135. Using this threshold, positions with the lowest 12.5% of specific activity scores and with at least four variants scored for activity were deemed functionally constrained and mapped on the homology model of VKOR (Figure 5a, Figure 5—figure supplement 1). These 11 functionally constrained positions are organized around C132 and C135 and define, at least in part, the VKOR active site (Figure 5b,c, Figure 5—figure supplement 1). Among the functionally constrained positions are six positions previously identified in vitamin K docking simulations (Czogalla et al., 2017; Figure 5—figure supplement 1), including F55, which is hypothesized to bind vitamin K. Three functionally constrained positions, G60, R61, and A121, did not match any position in the predicted active site, but were immediate neighbors of W59 and L120, positions that are present in the predicted active site.
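
The specific activity calculation described above can be sketched as follows. The ratio of activity to abundance, the per-position median, the lowest 12.5% cut-off, and the minimum of four scored variants come from the text. The rescaling itself is described in the Methods and is not reproduced here, so inputs are assumed to be already rescaled; taking the lowest 12.5% over all scored positions before applying the variant-count filter is also an assumption, as are the function name and the toy data.

```python
from collections import defaultdict
from statistics import median


def constrained_positions(variants, fraction=0.125, min_variants=4):
    """variants: iterable of (position, rescaled_activity, rescaled_abundance).

    Returns (sorted list of constrained positions, median specific activity
    per position).
    """
    by_position = defaultdict(list)
    for position, activity, abundance in variants:
        if abundance > 0:  # skip variants whose ratio is undefined
            by_position[position].append(activity / abundance)
    medians = {p: median(v) for p, v in by_position.items()}
    ranked = sorted(medians, key=medians.get)
    n_keep = max(1, int(len(ranked) * fraction))
    lowest = set(ranked[:n_keep])
    constrained = sorted(p for p in lowest if len(by_position[p]) >= min_variants)
    return constrained, medians


# Toy data: eight positions with four scored variants each (hypothetical).
variants = []
for position in range(1, 9):
    for _ in range(4):
        if position == 5:    # low activity despite normal abundance
            variants.append((position, 0.1, 1.0))
        elif position == 3:  # low activity explained by low abundance
            variants.append((position, 0.2, 0.2))
        else:                # WT-like
            variants.append((position, 1.0, 1.0))

constrained, medians = constrained_positions(variants)
```

In the toy data only position 5 is flagged: position 3 also loses activity, but its loss is matched by a loss of abundance, so its specific activity stays near one.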

Figure 5. Functionally constrained positions reveal VKOR active site and critical cysteines.

(a) Positions with the lowest 12.5% of median specific activity scores and at least four variants scored for activity are shown as magenta spheres on the VKOR homology model. Cysteines C132 and C135, also in the bottom 12.5% of median specific activity scores, are shown in green spheres. (b) Magnified view of the redox center cysteines (positions 132 and 135, green spheres) and surrounding residues that define the active site (magenta spheres). Residues shown in transparent spheres, with side chains also shown in sticks. (c) Panel b rotated 120°.

Figure 5—source data 1. VKOR positional abundance and activity scores. Rows show positions, with columns showing median abundance score, median activity score, rescaled scores, and specific activity score. https://cdn.elifesciences.org/articles/58026/elife-58026-fig5-data1-v1.csv

Besides C132 and C135, VKOR has two additional absolutely conserved cysteines, C43 and C51. In the four transmembrane domain model, C43 and C51 are postulated to be loop cysteines that relay electrons to the C132/C135 redox center (Liu et al., 2014). We classified C43 as having low specific activity, but we only observed one variant at this position, so it was not included in our set of functionally constrained positions (Figure 5—figure supplement 2). In contrast, substitutions at C51 resulted in only modest activity loss, a phenomenon that has been observed previously (Shen et al., 2017). Interestingly, every substitution at C51 and 15 of 19 at C132 decreased VKOR abundance (Figure 5—figure supplement 2). Inside cells, the majority of VKOR molecules have a C51-C132 disulfide bond, and warfarin binds to this redox state of VKOR (Shen et al., 2017). Since disruption of this disulfide bond apparently impacts abundance as well as activity, this bond may be important for VKOR folding and stability.

VKOR is thought to contain two sequences important for ER localization. The first is a diarginine motif (RxR) at positions 98–100, and the second is a dilysine motif (KXKXX) at positions 159–163. While we did not directly measure localization, we found that only six of 19 R98 variants and seven of 14 R100 variants resulted in low abundance (Figure 5—figure supplement 3). In contrast, nearly all variants at K159 (14 of 18) and K161 (17 of 19) resulted in low abundance (Figure 5—figure supplement 3). A histidine substitution was tolerated at position 161, which mimics the KXHXX motif commonly found in coronaviruses and a small number of human proteins (Ma and Goldberg, 2013). Because protein localization and degradation are coupled (Hessa et al., 2011), we suggest that the reductions in abundance we observe are the result of degradation caused by mislocalization, and that the dilysine motif at positions 159–163 is essential for VKOR ER localization. Overall, comparison of VKOR variant activity and abundance revealed functionally important regions, refining our understanding of the active site, redox-active cysteines, and ER retention motifs.

Functional consequences of VKOR variants observed in humans

Variation in VKOR is linked to both disease and warfarin response, but the overwhelming majority of VKOR variants found in humans so far have unknown effects. Thus, we curated a total of 215 variants that had been previously reported in the literature as affecting warfarin response (Supplementary file 5), were in ClinVar (Landrum et al., 2014), were in gnomAD v2 or v3 (Karczewski et al., 2019), or were present in individuals whose healthcare provider had ordered a multi-gene panel test from a commercial testing laboratory (Color Genomics) (Supplementary file 6). Of eight variants present in ClinVar, we included only one (D36Y) in our analysis, as it was the only variant reviewed by an expert panel (Kurnik et al., 2012). In total, 159 variants were present in gnomAD, and all but one missense variant (D36Y) had population frequencies less than 0.2%. Twenty-eight variants were literature-curated warfarin response variants, only 12 of which were in one of the databases surveyed. D36Y was the only warfarin response variant present in all three databases: ClinVar, gnomAD, and Color (Figure 6—figure supplement 1).

We classified 193 of the 215 variants we curated according to their abundance (Supplementary file 6). All but two synonymous variants were WT-like or possibly WT-like, while the three nonsense variants were scored as low abundance (Figure 6a). Missense variants spanned all abundance categories, with 129 (60%) having WT-like or possibly WT-like abundance. Thirty missense variants had low abundance, and 12 had high abundance. The single known pathogenic variant, R98W, had low abundance (Figure 6b). We also classified 54 variants according to their activity (Supplementary file 6). Only one variant, A115V, exhibited low activity. It had WT-like abundance, indicating that the loss of activity is not due to loss of abundance.

Figure 6 (with 1 supplement)

Characterization of human variants using abundance and activity data.

(a) Histogram of abundance classifications for variants from gnomAD, ClinVar, and Color Genomics. Nonsense variants are colored in blue, synonymous in red, and missense in grey. (b) Histogram of abundance classifications for the same variants as in (a), colored by pathogenicity. The only variant known to cause disease, R98W, is colored in blue. All other variants are shown in yellow. (c) Scatterplot showing abundance scores for literature-curated warfarin resistance variants. Bars show standard error and are colored by abundance class. Variants are arranged in order of abundance score.

Figure 6—source data 1. Abundance and activity data for human variants found in ClinVar, gnomAD v2 and v3, and the Color Genomics dataset. https://cdn.elifesciences.org/articles/58026/elife-58026-fig6-data1-v1.csv

We examined warfarin response variants including W5X, the only variant observed so far that is linked to human warfarin sensitivity (Oldenburg et al., 2004). As expected, W5X had low abundance, reinforcing that heterozygous loss of VKOR is the cause of warfarin sensitivity in carriers of this variant. Warfarin resistance variants, on the other hand, are predicted to abrogate warfarin binding (Li et al., 2010), but it is unclear whether these variants have appreciable effects on abundance or activity. We found that warfarin resistance variants span a range of abundances and that the abundance distribution of warfarin resistance variants did not differ significantly from that of missense variants generally (Figure 6c, two-sided Kolmogorov-Smirnov test p=0.438). Five warfarin resistance variants had low abundance, suggesting that these variants must block drug binding or increase activity to confer resistance. One variant, A26T, had high abundance, a possible mechanism of warfarin resistance. The five warfarin resistance variants whose activity we scored, R58G, W59L, V66M, G71A, and N77S, were all WT-like. Thus, our abundance and activity data are consistent with warfarin resistance arising largely from variants that block warfarin binding.
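The distribution comparison above can be illustrated with a small, dependency-free sketch of a two-sided, two-sample Kolmogorov-Smirnov comparison. Note the assumptions: the function names are hypothetical, and the p-value here comes from a label-permutation procedure swapped in for simplicity, which is not necessarily how the reported p=0.438 was obtained (a statistics package would typically use an exact or asymptotic distribution instead).

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def ks_permutation_test(a, b, n_permutations=999, seed=0):
    """Return (D, p), with p estimated by shuffling group labels.

    Assumption: permutation p-value, used here as a stand-in for the
    exact or asymptotic p-value of a standard KS test.
    """
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        permuted = ks_statistic(pooled[: len(a)], pooled[len(a):])
        # Small tolerance guards against floating-point ties.
        if permuted >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)
```

In this setting, `a` would hold the abundance scores of the warfarin resistance variants and `b` those of all other missense variants; a large p-value, as reported in the text, gives no evidence that the two distributions differ.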

Discussion

We conducted multiplexed assays to measure the effects of 2,695 VKOR variants on abundance and 697 variants on activity. Both abundance and activity data provided evidence for a four transmembrane topology, which was further supported by evolutionary couplings analysis. We evaluated a VKOR homology model in the context of the patterns of variant effects on abundance we measured and found that the homology model could explain these patterns. Low specific activity residues mapped onto this homology model identify, at least in part, the active site, which largely overlaps with the results of a vitamin K docking simulation (Czogalla et al., 2017). Our active site is shallower than what the docking simulation predicts; this is the result of low abundance scores at some of the deeper, transmembrane positions predicted by docking to bind the isoprenoid chain of vitamin K (F87, Y88), and poor coverage of activity scores for other positions (V112, S113). In light of the fact that substitutions at F87 and Y88 resulted in low abundance, we note that the modeled vitamin K binding mode would disrupt packing of VKOR core residues and require repacking of helices to maintain protein stability (Merkle et al., 2018). In addition to the active site, substitutions at the dilysine and, to a lesser extent, the diarginine ER localization motifs caused abundance loss.

We also used our large-scale functional data to analyze 215 VKOR variants found in humans. 16% of these variants affect neither activity nor abundance; we identified 54 previously uncharacterized low abundance or low activity variants that could be pathogenic or alter warfarin response. We found that only one warfarin resistance variant had increased abundance, indicating that increased abundance is not a pervasive warfarin resistance mechanism. Many of the other warfarin resistance variants have warfarin IC50s that are 10- to 100-fold higher than wildtype VKOR in cell assays (Shen et al., 2018), and this high level of resistance probably cannot be gained through increased protein abundance alone. All five of the warfarin resistance variants whose activity we scored were WT-like. Taken together, these data support the notion that warfarin resistance generally involves alterations to warfarin binding rather than abundance or activity. We analyzed one known warfarin sensitivity variant, W5X, and found that it has low abundance, suggesting that any of the 53 other low abundance variants, if found in a person, might also confer warfarin sensitivity.

While our VKOR variant abundance and activity data illuminate various aspects of VKOR’s structure and function, the data have limitations. For example, neither assay captures variant effects on mRNA splicing, which means we cannot determine the effect of human splice site variants on VKOR activity or abundance. In addition, as a result of how these assays were engineered, both have limited dynamic ranges. Thus, subtle effects on abundance or activity cannot be discerned, and it is difficult to translate the scores these assays generated into a more precise biochemical measure, such as the absolute number of VKOR molecules present or enzyme kinetics. Furthermore, both assays have inherent noise, largely arising from the limited number of cells we can sample due to the bottleneck of cell sorting. We account for this noise by filtering each dataset based on variant frequency and presenting a confidence interval for each abundance and activity score. Reengineering the assay to be growth-based, instead of flow cytometry-based, would increase the number of cells sampled and would most likely improve library coverage and score estimation.

In the future, we envision that the assays we used could be employed to better understand VKOR’s interaction with warfarin. Here, we could measure warfarin’s effect on both variant abundance and activity, mapping the warfarin binding site more finely. In addition, we could identify warfarin resistance mutations that have not yet been observed in the clinic and group variants by their putative resistance mechanism. Overall, our work highlights the value of multiplexed assays of variant effect for better understanding protein structure, function and human variant effects.

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Antibody	Murine anti-Factor IX carboxylated Gla domain (mouse monoclonal)	Green Mountain Antibodies	Cat#GMA-001	(1:100)
Antibody	HRP Anti-beta-actin antibody (mouse monoclonal)	Abcam	Cat#ab20272; RRID:AB_445482	(1:10,000)
Antibody	Amersham ECL Mouse IgG, HRP-linked whole Ab (sheep polyclonal)	GE Healthcare	Cat#NA931; RRID:AB_772210	(1:10,000)
Antibody	Goat anti-Mouse IgG (H+L) Highly Cross-Adsorbed Secondary Antibody, Alexa Fluor Plus 488 (goat polyclonal)	ThermoFisher	Cat#:A32723; RRID:AB_2633275	(1:10,000)
Antibody	Goat anti-Rabbit IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor 647 (goat polyclonal)	ThermoFisher	Cat#:A-21244; RRID:AB_2535812	(1:10,000)
Antibody	anti-Cofilin (D3F9) XP Rabbit mAb (rabbit monoclonal)	Cell Signaling Technology	Cat#:5175; RRID:AB_10622000	(1:1000)
Antibody	Anti-GFP from mouse IgG1κ (clones 7.1 and 13.1) (mouse monoclonal)	Roche	Cat#:11814460001; RRID:AB_390913	(1:1000)
Antibody	Anti-VKOR (rabbit monoclonal)	PMID:16634640		(1:1000)
Strain, strain background (Escherichia coli)	E. coli electrocompetent	NEB	Cat#C3020K
Commercial assay or kit	LYNX Rapid APC conjugation kit	BioRad	Cat#LNK033APC
Commercial assay or kit	SuperSignal West Dura Extended Duration Substrate	ThermoFisher	Cat#34076
Commercial assay or kit	Library Quantification Kit (Illumina)	KAPA Biosystems	Cat#KK4854
Commercial assay or kit	MiSeq Reagent Kit v3 (600 cycles)	Illumina	Cat#MS-102–3003; RRID:SCR_016379
Commercial assay or kit	NextSeq 500/550 High Output v2 kit (75 cycles)	Illumina	Cat#TG-160–2005; RRID:SCR_016381
Commercial assay or kit	Qiagen miniprep kit	Qiagen	Cat#27106
Commercial assay or kit	GenElute HP Plasmid midiprep kit	Sigma	Cat#NA0200
Commercial assay or kit	DNA clean and concentrator kit	Zymo	Cat#D4031
Software, algorithm	PyMOL	Schrodinger		https://pymol.org
Software, algorithm	Enrich2	PMID:28784151		https://github.com/FowlerLab/Enrich2
Software, algorithm	R	R		https://cran.r-project.org/
Other	Fugene 6	Promega	Cat#E2691
Other	Lipofectamine 3000	ThermoFisher	L3000015
Other	Atomic coordinates, bacterial VKOR structure	Protein Data Bank	PDB: 4NV5
Other	Raw and analyzed data	This paper	GSE149922
Cell line (human)	293T AAVS1 tetbxb1 clone 4	PMID:28335006
Cell line (human)	HEK293 VKOR activity reporter	PMID:24297869
Recombinant DNA reagent	PX458	Addgene	Cat#48138; RRID:Addgene_48138

Author details

Melissa A Chiasson
Nathan J Rollins
Jason J Stephany
Katherine A Sitko
Kenneth A Matreyek
Marta Verby
Song Sun
Frederick P Roth
Daniel DeSloover
Debora S Marks
Allan E Rettie
Douglas M Fowler (for correspondence)
