Introduction

Human embryonic stem cells (hESCs) are derived from the inner cell mass of a pre-implantation embryo1. They can self-renew in an undifferentiated state for prolonged periods and can differentiate into the three main embryonic germ layers2, making them excellent models for studying disease mechanisms, development and differentiation. However, their use remains restricted by regulations, based in part upon ethical considerations3.

Over a decade ago, methods allowing the induction of pluripotent stem cells from fibroblast cultures, in both humans and mice, were developed4,5. These reports showed that by exogenously expressing a small set of key transcription factors (Oct4, Sox2, c-Myc and Klf4), a somatic cell could be reprogrammed back into a pluripotent state, characterised by the capacity for self-renewal and the ability to differentiate into the three main germ layers. These human induced pluripotent stem cells (hiPSCs) show many key features of their hESC counterparts, while avoiding many of the ethical issues associated with the use of stem cells derived from embryos.

Since the discovery of reprogramming methods, hiPSC lines have attracted great interest, particularly for their potential use as alternatives to hESCs in regenerative medicine6 and disease modelling, including studies on monogenic disorders7,8 and some late-onset diseases9. However, to understand the value of using hiPSCs in regenerative therapy, drug development and/or studies of disease mechanisms, it is important to establish how similar hiPSCs are to hESCs at the molecular and functional levels. To address this, multiple studies have compared hiPSCs and hESCs using a variety of assays, including methylation analysis10, transcriptomics11,12 and quantitative proteomics13. It should be noted, however, that many of these earlier studies were performed at a time when reprogramming protocols were less robust14 and when the depth of proteome coverage and quantitative information that could be obtained was lower than today.

In this study, we have addressed the similarity of hiPSCs to hESCs by performing a detailed proteomic analysis, comparing a set of hiPSC lines derived from human primary skin fibroblasts15 of independent, healthy donors with several independent hESC lines. The data highlight that while both types of stem cell lines have very similar global protein abundance profiles, they also show specific and significant quantitative differences in protein expression. In particular, the reprogrammed hiPSC lines consistently display higher total protein levels, predominantly affecting cytoplasmic proteins required to sustain higher growth, along with mitochondrial changes and an excess of secreted proteins, with an impact on cell phenotypes.

Results

hESCs and hiPSCs display quantitative differences in protein abundances

For this study, we compared multiple hESC and hiPSC lines, all derived from different donors and cultured under identical growth conditions. First, the expression levels of the main pluripotency markers were verified in each of the lines, with no differences detected between the hESC and hiPSC cell types (Fig. 1a). From these lines, representative sets of four hiPSC and four hESC lines were selected for detailed proteomic analysis by mass spectrometry. The proteomes were characterised using tandem mass tags (TMT)16 within a single 10-plex (Table S1), with MS3-based synchronous precursor selection17 (SPS). To further optimise quantification accuracy, each sample was allocated to a specific isobaric tag to minimise cross-population reporter ion interference (Fig. 1b), as previously described18. In total, 8,491 protein groups (henceforth referred to as ‘proteins’) were detected at 1% FDR, with >99% overlap between the proteins detected in the hESC and hiPSC lines (Fig. 1c). However, it is important to highlight that TMT is not a suitable method for identifying proteins that are specific to one condition or population18.

Proteomic overview:

(a) Western blots showing the expression of the pluripotency factors NANOG, OCT4 and SOX2 across all hESC and hiPSC lines. (b) Diagram showing the SPS-MS3 TMT proteomic workflow used for the experiment. (c) Venn diagram showing the overlap of proteins identified within the hiPSC and hESC populations. (d) Average copy number histogram for the hESCs. (e) Average copy number histogram for the hiPSCs. (f) Bubble plot showing proteins coloured by specific categories, with bubble size proportional to the average estimated protein copy number in hESCs. (g) Bubble plot showing proteins coloured by specific categories, with bubble size proportional to the average estimated protein copy number in hiPSCs. (h) PCA plot based on the log10 copy numbers for all 8 replicates. hESCs are shown in purple and hiPSCs in orange.

To provide a quantitative comparison of the respective proteomes, we focussed on the 7,878 proteins that were detected with at least 2 unique plus razor peptides (see methods). After confirming that there were no differences in the abundance levels of histones between the two cell types (Fig. S1), protein copy numbers were estimated via the “proteomic ruler”19 (see methods). The copy number data highlighted that both the hESC (Fig. 1d) and hiPSC (Fig. 1e) proteomes display a similar dynamic range, with median estimated protein copy numbers extending from fewer than 100 copies to over 100 million copies per cell. Furthermore, the composition of the respective proteomes is highly similar. Both cell types display high expression levels of ribosomal proteins, protein chaperones and glycolytic enzymes (Fig. 1f&g), consistent with rapid proliferation and dependence on glycolysis for energy generation20. Differences between the cell types become apparent only when the quantitative data are examined in more detail (Fig. 1h). A principal component analysis (PCA) based on the protein copy numbers revealed a clear separation between the two stem cell populations along the first principal component, which accounted for 69% of the variance. Thus, the independent hiPSC lines clustered separately from the hESC lines.
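The proteomic ruler converts MS intensities into copies per cell by using histones as an internal standard: it assumes that the total histone mass per cell is approximately equal to the DNA mass per cell, and that MS signal is proportional to protein mass. The sketch below illustrates that principle only. The default DNA mass of 6.5 pg per (diploid) cell, the function name `ruler_copy_numbers` and its arguments are assumptions for illustration, not the exact implementation or parameters used in this study.

```python
import numpy as np

AVOGADRO = 6.02214076e23  # molecules per mole


def ruler_copy_numbers(intensities, mol_weights_da, is_histone, dna_mass_pg=6.5):
    """Estimate protein copies per cell using the proteomic-ruler principle.

    intensities    : summed MS signal per protein for one sample
    mol_weights_da : molecular weight of each protein in daltons (g/mol)
    is_histone     : boolean mask marking histone proteins
    dna_mass_pg    : assumed DNA mass per cell in pg (hypothetical default)
    """
    signal = np.asarray(intensities, dtype=float)
    mw = np.asarray(mol_weights_da, dtype=float)
    histone_signal = signal[np.asarray(is_histone, dtype=bool)].sum()
    # Ruler assumption: total histone mass per cell ~= DNA mass per cell
    histone_mass_g = dna_mass_pg * 1e-12
    # Mass of each protein per cell, scaled by its signal relative to histones
    protein_mass_g = signal / histone_signal * histone_mass_g
    # Convert mass (g) to number of molecules: mass / MW * Avogadro
    return protein_mass_g / mw * AVOGADRO


# Hypothetical example: one histone and one non-histone protein
copies = ruler_copy_numbers([1.0, 1.0], [15000.0, 50000.0], [True, False])
```

Because signal is treated as proportional to mass, two proteins with equal intensity receive copy numbers in inverse proportion to their molecular weights, which is why ruler-based estimates can differ markedly from intensity ranks.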

Standard normalisation methods mask changes in total protein content in hiPSCs compared to hESCs

A previous proteomic study reported virtually no differences in protein levels between hESCs and hiPSCs13. However, in that study the intensity data were median normalised. We therefore compared two normalisation methods: the previously used median normalisation and the “proteomic ruler”19. Median normalisation produces concentration-like results and is frequently used to normalise proteomic data. With this approach, our data also show no major differences in protein abundances between the hESC and hiPSC lines (Fig. 2a): ∼94% of all proteins displayed no significant change in abundance (significance defined as FC > 1.5-fold and q-value < 0.001), consistent with the previously reported conclusion13. However, median (or total intensity) normalisation methods cannot detect changes in absolute abundance, cell size or protein content. Because these methods force the medians (or totals) of all samples to be almost identical, any global change in protein content becomes invisible.
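The masking effect described above can be illustrated with a toy example. The sketch below assumes hypothetical log2 intensities and a simplified scenario in which every protein in one sample is uniformly 1.5-fold more abundant; real differences are not uniform, but median normalisation removes the global component in the same way.

```python
import numpy as np


def median_normalise(log2_intensities):
    """Subtract each sample's median log2 intensity (columns are samples)."""
    return log2_intensities - np.median(log2_intensities, axis=0, keepdims=True)


# Hypothetical data: 1,000 proteins in a reference sample
rng = np.random.default_rng(0)
reference = rng.normal(loc=20.0, scale=2.0, size=1000)
# A second sample in which every protein is 1.5-fold more abundant
# (a uniform global increase in protein content)
scaled = reference + np.log2(1.5)
samples = np.column_stack([reference, scaled])

raw_log2fc = samples[:, 1] - samples[:, 0]
normalised = median_normalise(samples)
norm_log2fc = normalised[:, 1] - normalised[:, 0]

print(np.allclose(raw_log2fc, np.log2(1.5)))  # True: global shift present
print(np.allclose(norm_log2fc, 0.0))          # True: shift removed
```

After normalisation every fold change collapses to zero, even though each protein is 1.5-fold more abundant per sample.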

Normalisation and protein content:

(a) Concentration-based volcano plot showing the -log10 p-value and the log2 fold change comparing hiPSCs to hESCs. Elements shaded in red are considered significantly changed. All dots above the red line have a q-value lower than 0.001. (b) Copy number-based volcano plot showing the -log10 p-value and the log2 fold change comparing hiPSCs to hESCs. Elements shaded in red are considered significantly changed. All dots above the red line have a q-value lower than 0.001. (c) Box plot showing the MS-based estimated protein content for hESCs and hiPSCs. (d) Box plot showing the protein amount per million cells derived from the EZQ Protein Quantification Kit for all hESCs and hiPSCs. (e) Boxplot showing the median forward scatter of hESCs and hiPSCs. (f) Boxplot showing the median side scatter of hESCs and hiPSCs. (g) Boxplot showing the median number of cells across cell cycle stages for hESCs and hiPSCs. For all boxplots, the bottom and top hinges represent the 1st and 3rd quartiles. The top whisker extends from the hinge to the largest value no further than 1.5 × IQR from the hinge; the bottom whisker extends from the hinge to the smallest value at most 1.5 × IQR from the hinge.
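The boxplot convention used in these legends (hinges at the quartiles, whiskers ending at the most extreme observations within 1.5 × IQR of the hinges) can be computed as in the sketch below. The function name is hypothetical, and the quartile method is an assumption: NumPy's default linear interpolation, which is equivalent to R's default quantile type 7; the plotting software used for the figures may differ.

```python
import numpy as np


def tukey_box_stats(values, k=1.5):
    """Return hinges, whisker ends and outliers for a Tukey-style boxplot."""
    x = np.asarray(values, dtype=float)
    # Hinges: 1st and 3rd quartiles (NumPy default linear interpolation)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    # Whiskers end at the most extreme data points within k * IQR of the hinges
    upper = x[x <= q3 + k * iqr].max()
    lower = x[x >= q1 - k * iqr].min()
    # Points beyond the whiskers are drawn individually as outliers
    outliers = x[(x < lower) | (x > upper)]
    return q1, q3, lower, upper, outliers


q1, q3, lower, upper, outliers = tukey_box_stats([1, 2, 3, 4, 100])
```

Note that the whiskers end at observed data points, not at the 1.5 × IQR limits themselves.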

This is not the case for the results produced with the “proteomic ruler”19. The copy number-based analysis approximates absolute protein abundance and can reveal changes in cell mass, as we previously reported21,22. The proteomic ruler analysis revealed systematic differences between hESCs and hiPSCs (Fig. 2b), with 56% (4,408/7,878) of all detected proteins significantly increased in hiPSCs (FC > 1.5-fold; q-value < 0.001). In contrast, only 40 proteins (0.5%) showed significantly lower expression levels in hiPSCs. With thousands of proteins displaying higher abundance, we hypothesised that hiPSCs have a higher total protein content than hESCs. Estimating the total protein content from the protein copy numbers showed that hiPSCs had >50% higher protein content than hESCs (Fig. 2c). To validate this observation, an independent assay (EZQ™ assay; see methods) was used to measure the total protein yield from similar numbers of freshly grown hiPSCs and hESCs. From these experiments, the calculated protein amount per million cells was 74% higher in hiPSCs than in hESCs (Fig. 2d; p-value = 0.0018). We conclude that hiPSCs have a higher total protein content.
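The classification and total-content steps above can be sketched as follows. This is a minimal illustration under stated assumptions: p-values are taken as input because the statistical test is not specified in this section, q-values are assumed to be Benjamini-Hochberg adjusted p-values, and the function names are hypothetical. Only the thresholds (FC > 1.5-fold, q < 0.001) come from the text.

```python
import numpy as np

AVOGADRO = 6.02214076e23  # molecules per mole


def log2_fold_change(copies_ref, copies_test):
    """Mean log2 copy number of the test group minus that of the reference.

    Both inputs are (proteins x replicates) arrays of estimated copy numbers.
    """
    return np.log2(copies_test).mean(axis=1) - np.log2(copies_ref).mean(axis=1)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed here as q-values)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the sorted p-values
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(scaled, 1.0)
    return q


def classify(log2fc, qvalues, min_fold=1.5, max_q=0.001):
    """Flag proteins as significantly increased (up) or decreased (down)."""
    threshold = np.log2(min_fold)
    log2fc = np.asarray(log2fc, dtype=float)
    significant = np.asarray(qvalues, dtype=float) < max_q
    up = significant & (log2fc > threshold)
    down = significant & (log2fc < -threshold)
    return up, down


def total_protein_pg(copies, mol_weights_da):
    """Total protein mass per cell in pg: sum(copies x MW) / Avogadro."""
    copies = np.asarray(copies, dtype=float)
    mw = np.asarray(mol_weights_da, dtype=float)
    return np.sum(copies * mw) / AVOGADRO * 1e12
```

Summing copies weighted by molecular weight turns the ruler output back into a mass per cell, which is the quantity compared between cell types; the percentage difference between groups is then a ratio of their summary values.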

Changes in protein content could potentially be linked to differences in the cell cycle profile. Hence, we used fluorescence-activated cell sorting (FACS) to study the cell cycle distribution of hESCs and hiPSCs. The FACS data showed that hiPSCs have significantly higher forward scatter (Fig. 2e), which correlates with increased cell size, as well as significantly higher side scatter (Fig. 2f), which correlates with increased cell granularity. However, the FACS analysis revealed no significant differences between hiPSCs and hESCs in the percentage of cells at each of the cell cycle stages (Fig. 2g). We conclude that hiPSCs have significantly higher total protein content, with increased size and granularity, but that these differences from hESCs are independent of changes in cell cycle distribution.

hiPSCs have elevated nutrient transporters, metabolic proteins, and protein and lipid synthesis machinery

To maintain a higher protein content than hESCs with a comparable cell cycle profile, hiPSCs would require a higher protein synthesis capacity, which in turn requires nutrients and energy. Energy metabolism in primed pluripotent stem cells is largely dependent on glycolysis23, which is sensitive to glucose uptake and lactate shuttling. Therefore, we compared the expression of the respective glucose and lactate transporters between hiPSCs and hESCs. The data showed that both main glucose transporters, GLUT1 (SLC2A1) and GLUT3 (SLC2A3), had higher abundance in hiPSCs, as did the lactate transporters SLC16A1 and SLC16A3 (Fig. 3a). Rate-limiting glycolytic enzymes, including hexokinase 1 (HK1) and 2 (HK2), were also significantly increased in hiPSCs (Fig. 3b), suggesting increased glycolytic potential.

Fuelling growth:

(a) Boxplots showing the estimated copy numbers for the lactate (SLC16A1 and SLC16A3) and glucose transporters (SLC2A1 and SLC2A3) across hESCs and hiPSCs. (b) Schematic showing the glycolytic proteins and their fold change in hESCs vs hiPSCs. (c) Chord diagram showing the 15 most upregulated solute carrier proteins along with their classification based on transport activities/localisation. (d) Boxplots showing the estimated copy numbers of the main amino acid transporters in hESCs and hiPSCs. (e) Boxplot showing the net glutamine uptake (see methods) in hESCs and hiPSCs. (f) Schematic showing the glutaminolysis proteins and their fold change in hESCs vs hiPSCs. (g) Radar plot showing the median fold change (hiPSC/hESC) for protein categories related to pre-ribosomes. Boxplots showing the estimated copy numbers for (h) SREBF1, (i) FASN, (j) SCD and (k) PLIN3 in hESCs vs hiPSCs. (l) Transmission electron microscopy images of hiPSCs. Lipid droplets are marked with red arrows. (m) Transmission electron microscopy images of hESCs. For all boxplots, the bottom and top hinges represent the 1st and 3rd quartiles. The top whisker extends from the hinge to the largest value no further than 1.5 × IQR from the hinge; the bottom whisker extends from the hinge to the smallest value at most 1.5 × IQR from the hinge.

Nutrient uptake is mostly handled by the SLC (solute carrier) group of membrane transporters. Analysis of the 15 most upregulated SLC transporters in hiPSCs compared to hESCs showed that they mostly belonged to two categories, i.e., amino acid and mitochondrial transporters (Fig. 3c). Amino acids are vital to sustain high rates of protein synthesis24, and the data showed that 11/12 amino acid transporters were significantly increased in hiPSCs compared to hESCs, including the hyper-abundant SLC3A2, present at >4 million copies per cell (Fig. 3d). The highest fold increases (>4-fold) were seen for SLC38A1 and SLC38A2, both of which are major glutamine transporters25,26.

We next examined whether the increased abundance of the glutamine transporters had a phenotypic impact, i.e., whether it correlated with increased glutamine uptake by hiPSCs. To test this hypothesis, we measured the uptake of radio-labelled glutamine in both hiPSCs and hESCs (see methods). The median glutamine uptake was >90% higher in hiPSCs than in hESCs (Fig. 3e). Glutamine has been reported to be the most consumed amino acid in hESCs27, and its catabolism has been described as one of the vital metabolic pathways that can provide ATP and, more importantly, the biosynthetic precursors required to sustain growth28. Hence, we also explored the abundance of enzymes involved in glutaminolysis and found that key proteins, including GLS, GLUD1, GPT2 and GOT2, were also significantly higher in hiPSCs (Fig. 3f).

Having established that hiPSCs have increased expression of nutrient transporters and higher expression of enzymes in key metabolic pathways, compared with hESCs, we next looked at the machinery required for protein synthesis. The levels of many of the proteins involved in ribosome subunit biogenesis, including ribosomal proteins, were higher in hiPSCs (Fig. 3g). The increased expression of translation machinery components, nutrient transporters and many metabolic enzymes is consistent with the increased total protein content seen within hiPSCs.

The data also highlighted an increased potential for fatty acid (FA) and lipid droplet (LD) synthesis in hiPSCs, with increased abundance of SREBP1 (SREBF1; Fig. 3h), a master regulator of lipid synthesis29, as well as FASN (Fig. 3i) and SCD (Fig. 3j). Similarly, PLIN3, a crucial regulator of LD assembly30, displayed >2-fold increased abundance in hiPSCs (Fig. 3k). To examine the potential phenotypic impact of this increased abundance of proteins involved in LD synthesis, we performed transmission electron microscopy (TEM) to compare hiPSCs and hESCs. LDs were clearly visible in hiPSCs (Fig. 3l), but not in hESCs (Fig. 3m). We conclude that hiPSCs have elevated levels of LDs, resulting from the increased expression of proteins involved in lipid synthesis and LD assembly.

hiPSCs show altered mitochondrial metabolism compared to hESCs

Our data also highlighted important changes in mitochondrial proteins, including increases in the levels of metabolic proteins that are encoded within the mitochondrial genome31 (Fig. 4a). These proteins are translated by specialised mitochondrial ribosomes (mitoribosomes) associated with the inner mitochondrial membrane. The protein components of mitoribosomes also showed increased expression in hiPSCs (Fig. 4b), as did virtually all proteins involved in the initiation, elongation and termination of translation of mitochondrial genome-encoded proteins (Fig. 4c).

Mitochondrial differences:

(a) Schematic showing the mitochondrial genome-encoded proteins and their fold change in hESCs and hiPSCs. (b) Boxplot showing the estimated copy numbers of all mitochondrial ribosomal proteins. (c) Schematic showing proteins involved in mitochondrial translation and their fold change (hiPSCs/hESCs). (d) Treeplot showing all mitochondrial transporters; size is proportional to the estimated copy numbers in hiPSCs. Boxplots showing the estimated copy numbers for (e) SLC25A20, (f) CPT1A, (g) MCAT, (h) MECR and (i) OXSM. (j) Boxplot showing the log2 fold change (hiPSCs/hESCs) of all subunits of the different complexes of the electron transport chain. The median fold change across all detected proteins is shown as a dotted line. (k) Schematic showing the fold change of citric acid cycle and glutaminolysis proteins in hESCs and hiPSCs. (l) Boxplot showing the P/E control ratio. All boxplots show the data for hESCs and hiPSCs. For all boxplots, the bottom and top hinges represent the 1st and 3rd quartiles. The top whisker extends from the hinge to the largest value no further than 1.5 × IQR from the hinge; the bottom whisker extends from the hinge to the smallest value at most 1.5 × IQR from the hinge.

The analysis of transporter proteins revealed that 22/27 mitochondrial transporters were significantly increased in hiPSCs, including the hyper-abundant ATP/ADP transporter, present at >10 million copies per cell (Fig. 4d). A subset of 14 transporters displayed >2-fold increased abundance, including the acylcarnitine transporter SLC25A20 (Fig. 4e), which is part of the carnitine shuttle in the beta-oxidation pathway. Another component of the shuttle, CPT1A, displayed >4-fold higher abundance in hiPSCs (Fig. 4f), suggesting that the carnitine shuttle plays an important role in hiPSCs. The data showed that not only fatty acid oxidation but also fatty acid synthesis was affected, with proteins of the mitochondrial fatty acid synthesis (mFAS) pathway also increased in abundance: MCAT (Fig. 4g), MECR (Fig. 4h) and OXSM (Fig. 4i) all displayed ∼2-fold higher abundance in hiPSCs compared to hESCs. These results are metabolically relevant, as mFAS has been reported to control electron transport chain (ETC) activity32. Consistent with this, subunits of all 5 ETC complexes were increased in abundance in hiPSCs, with complexes II and III showing the most prominent effects (Fig. 4j). Complex II is also part of the tricarboxylic acid (TCA) cycle, and the majority of proteins involved in this pathway were likewise increased in abundance (Fig. 4k).

As the proteomic data showed clear differences between hiPSCs and hESCs in the levels of mitometabolism proteins, we tested whether this was reflected in phenotypic differences, using high-resolution respirometry (see Methods). hiPSCs had a higher P/E control ratio than hESCs, which denotes an increased capacity of the phosphorylation system to produce ATP (Fig. 4l). We conclude that hiPSCs have elevated levels of mitometabolism proteins relative to hESCs, resulting in higher respiratory activity.
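The P/E control ratio compares oxygen flux in the OXPHOS state (P) with electron transfer capacity (E), so values closer to 1 indicate that the phosphorylation system limits respiration less. The sketch below assumes that both fluxes are corrected by subtracting residual oxygen consumption (ROX), a common convention in high-resolution respirometry; the exact protocol and corrections used here are described in the Methods and may differ. The function name and flux values are hypothetical.

```python
def pe_control_ratio(oxphos_flux, et_flux, rox_flux=0.0):
    """P/E control ratio from oxygen flux measurements.

    oxphos_flux : O2 flux in the OXPHOS state (P)
    et_flux     : O2 flux at electron transfer capacity (E), after uncoupling
    rox_flux    : residual oxygen consumption, subtracted from both (assumed)
    """
    p = oxphos_flux - rox_flux
    e = et_flux - rox_flux
    if e <= 0:
        raise ValueError("ROX-corrected ET capacity must be positive")
    return p / e


# Hypothetical fluxes, e.g. in pmol O2 per second per million cells
ratio = pe_control_ratio(oxphos_flux=80.0, et_flux=100.0, rox_flux=10.0)
```

Because the ratio is dimensionless, it can be compared between cell lines without normalising to cell number or protein content.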

hiPSCs upregulate secreted proteins affecting their microenvironment

Among the most upregulated proteins in hiPSCs was a subset of secreted proteins. Secreted proteins are of great importance because changes in their absolute abundance can affect the extracellular environment. These secreted proteins mostly fell into 4 categories: structural extracellular matrix (ECM) proteins, growth factors, protease inhibitors and proteases (Fig. 5a). The ECM can provide structural support for cells and can also participate actively in cell signalling by providing binding domains for growth factors33. It can also be reshaped by tumours to promote cancer cell growth and migration34. The data show that both laminins and collagens were increased in abundance in hiPSCs (Fig. 5b). Collagens are reported to alter the stiffness of the ECM, and their synthesis is iron-intensive. Interestingly, the data also show that proteins involved in importing and storing iron were increased in abundance in hiPSCs (Fig. 5c–f).

Secreted proteins:

(a) Sankey diagram showing secreted proteins that are significantly increased in hiPSCs and belong to the ECM, Growth factor, Protease inhibitor or Protease categories. (b) Schematic showing ECM proteins that are significantly increased in abundance in hiPSCs. (c) Boxplot showing the estimated copy numbers of TF in hESCs and hiPSCs. (d) Boxplot showing the estimated copy numbers of TFRC in hESCs and hiPSCs. (e) Boxplot showing the estimated copy numbers of FTH1. (f) Boxplot showing the estimated copy numbers of FTL. (g) Schematic showing the changes in abundance of key primed pluripotency growth factors. (h) Boxplot showing the estimated protein copy numbers for VGF. (i) Boxplot showing the estimated protein copy numbers for MDK. (j) Boxplot showing the estimated protein copy numbers for TGFB1. (k) Boxplot showing the estimated protein copy numbers for ARG1. (l) Boxplot showing the estimated protein copy numbers for CD276. (m) Boxplot showing the estimated protein copy numbers for HLA-E. (n) Boxplot showing the estimated protein copy numbers for CD200. (o) Boxplot showing the estimated protein copy numbers for CD47. All boxplots show the data for hESCs and hiPSCs. For all boxplots, the bottom and top hinges represent the 1st and 3rd quartiles. The top whisker extends from the hinge to the largest value no further than 1.5 × IQR from the hinge; the bottom whisker extends from the hinge to the smallest value at most 1.5 × IQR from the hinge.

The data also showed that 13 growth factors were increased in abundance in hiPSCs compared to hESCs. A subset of these, i.e., FGF1, FGF2 and NODAL, are reported to be directly relevant to the maintenance of pluripotency and can modulate important processes in PSCs35–37 (Fig. 5g). Other growth factors that are upregulated in hiPSCs are linked to disease and cancer, including VGF (Fig. 5h), which has been linked to promoting growth and survival in glioblastoma38, and MDK (Fig. 5i), which is highly expressed in malignant tumours39 and has been shown to play a role in chemoresistance40.

hiPSCs display increased abundance of immunosuppressive proteins

NODAL was not the only TGFB family member increased in hiPSCs: TGFB1 displayed a ∼5-fold increase in abundance in hiPSCs compared to hESCs (Fig. 5j). Besides its role as a growth factor, TGFB1 has been shown to have important roles in regulating the immune response, promoting the generation of regulatory T cells while inhibiting the generation and function of effector T cells. As the immunogenicity of PSCs is relevant to clinical applications, we looked for differences in modulators of the immune response.

Arginine availability is vital for effector T cells and other leukocytes, and arginine depletion mediated by arginase has been linked to T cell inhibition41. Our data show that hiPSCs have ∼2.5-fold higher abundance of ARG1 (Fig. 5k). Furthermore, hiPSCs also display increased expression of the immune checkpoint protein CD276 (Fig. 5l), which has been reported to be a potent inhibitor of T cell survival and function42,43.

hiPSCs also displayed increased abundance of inhibitory ligands that suppress the immune function of other leukocytes. The data show that hiPSCs have increased abundance of the non-classical MHC class I molecule HLA-E (Fig. 5m), which has been shown to interact with the NK cell receptor NKG2A to mediate immune evasion by ageing cells44. They also displayed increased abundance of CD200 (Fig. 5n), a ligand for CD200R, which can inhibit immune responses from macrophages, basophils, NK cells and T cells, as well as CD47 (Fig. 5o), a ligand of SIRPA that helps cells escape phagocytosis by macrophages. These data indicate that hiPSCs have increased abundance of known immunosuppressive proteins compared to hESCs.

hiPSCs display reduced abundance of H1 histones

A striking feature of this proteomic study is how few proteins (<1%; 40/7,878) showed significantly decreased abundance in hiPSCs compared to hESCs. A high proportion of these proteins are involved in nuclear processes: an overrepresentation analysis showed that proteins with decreased abundance in hiPSCs were enriched in GO terms related to DNA recombination, nucleosome positioning and chromatin silencing (Fig. 6a). Notably, these included four H1 histones, which are reported to influence nucleosomal repeat length45 and stabilise chromatin structures46. Our data show that the most abundant H1 variant in hESCs, HIST1H1E, is decreased in abundance in hiPSCs by ∼3.5-fold (Fig. 6b), while HIST1H1C (Fig. 6c), HIST1H1D (Fig. 6d) and H1FX (Fig. 6e) are all decreased by >1.7-fold.
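Overrepresentation analyses of this kind are commonly based on a one-sided hypergeometric test, which asks how likely it is to see at least k annotated proteins among the n selected ones by chance. The sketch below assumes that test; the tool and settings actually used are described in the Methods. N = 7,878 quantified proteins and n = 40 decreased proteins come from the text, while the GO-term counts in the example are hypothetical. In practice, the resulting p-values would then be corrected for testing many GO terms.

```python
from math import comb


def ora_pvalue(k, n_selected, K, N):
    """One-sided hypergeometric p-value, P(X >= k).

    N          : background size (all quantified proteins)
    K          : background proteins annotated with the GO term
    n_selected : proteins in the list of interest (e.g. decreased in hiPSCs)
    k          : proteins in that list annotated with the GO term
    """
    upper = min(K, n_selected)
    # Number of ways to draw >= k annotated proteins among n_selected
    hits = sum(comb(K, i) * comb(N - K, n_selected - i) for i in range(k, upper + 1))
    return hits / comb(N, n_selected)


def fold_enrichment(k, n_selected, K, N):
    """Observed fraction annotated divided by the background fraction."""
    return (k / n_selected) / (K / N)


# Hypothetical GO term: 30 annotated proteins in the background, 4 in the list
p = ora_pvalue(k=4, n_selected=40, K=30, N=7878)
fe = fold_enrichment(k=4, n_selected=40, K=30, N=7878)
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the division only rounds once at the end.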

Changes within histones:

(a) Barplot showing the GO term enrichment results for proteins significantly decreased in abundance (see methods) in hiPSCs. (b) Boxplots showing estimated copy numbers for histone HIST1H1E in hESCs and hiPSCs. (c) Boxplots showing estimated copy numbers for histone HIST1H1D in hESCs and hiPSCs. (d) Boxplots showing estimated copy numbers for histone HIST1H1C in hESCs and hiPSCs. (e) Boxplots showing estimated copy numbers for histone H1FX in hESCs and hiPSCs. (f) Barplot showing the estimated copy numbers for all histones in hESCs and hiPSCs. (g) Boxplots showing estimated copy numbers for histone HIST1H3A in hESCs and hiPSCs. (h) Boxplots showing estimated copy numbers for histone HIST1H4A in hESCs and hiPSCs. Western blots showing the abundance of (i) H3 and (j) H4 histones in hESCs and hiPSCs. (k) Boxplots showing estimated copy numbers for histone H2AFV in hESCs and hiPSCs. (l) Boxplots showing estimated copy numbers for histone H2AFY in hESCs and hiPSCs. (m) Boxplots showing estimated copy numbers for histone H2AFY2 in hESCs and hiPSCs. For all boxplots, the bottom and top hinges represent the 1st and 3rd quartiles. The top whisker extends from the hinge to the largest value no further than 1.5 × IQR from the hinge; the bottom whisker extends from the hinge to the smallest value at most 1.5 × IQR from the hinge.

As histone variants have very similar protein sequences, many peptides can match multiple H1 histones, so a peptide-level analysis was necessary to deconvolute the signal (Fig. S2). The Andromeda search engine47 assigns peptide intensities to proteins following a razor peptide approach, in which the intensity of a peptide is assigned to only one protein, regardless of whether the peptide is unique or shared. This makes the analysis of specific variants challenging at the protein level. Hence, we focussed on a peptide-specific analysis and found that the intensities of the peptides shared between these 3 H1 histones were consistently reduced in hiPSCs (Fig. S2).
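The loss of variant-level information under razor assignment can be illustrated with a simplified sketch. It assumes that a shared peptide is assigned to the protein with the most identified peptides and that ties are broken alphabetically; this is a hypothetical simplification for illustration, not the exact rule used by the search software, which operates on protein groups. The peptide names and intensities are hypothetical.

```python
import math
from collections import defaultdict


def razor_assign(peptide_to_proteins, peptide_intensity):
    """Assign each peptide's intensity to exactly one protein (simplified)."""
    # Count identified peptides per protein
    n_peptides = defaultdict(int)
    for proteins in peptide_to_proteins.values():
        for protein in proteins:
            n_peptides[protein] += 1
    protein_intensity = defaultdict(float)
    for peptide, proteins in peptide_to_proteins.items():
        # Unique peptides go to their only protein; shared peptides go to the
        # protein with most peptides (first alphabetically on ties, assumed)
        winner = max(sorted(proteins), key=lambda p: n_peptides[p])
        protein_intensity[winner] += peptide_intensity[peptide]
    return dict(protein_intensity)


def peptide_log2fc(intensity_ref, intensity_test):
    """Per-peptide log2 fold change (test / reference) for shared peptides."""
    return {
        pep: math.log2(intensity_test[pep] / intensity_ref[pep])
        for pep in intensity_ref
        if pep in intensity_test
    }


# Hypothetical H1 peptides and their protein matches
peptides = {"pepA": ["H1E"], "pepB": ["H1E", "H1C"],
            "pepC": ["H1C"], "pepD": ["H1C", "H1D", "H1E"]}
intensities = {"pepA": 1.0, "pepB": 2.0, "pepC": 3.0, "pepD": 4.0}
assigned = razor_assign(peptides, intensities)
```

In this toy case H1D receives no intensity at all despite matching pepD, which is why comparing individual peptides between cell types gives a more direct view of variant-specific changes.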

The systematic reduction in abundance seen for H1 histone variants in hiPSCs was not seen for members of the other histone families. Evaluating either the concentration (Fig. S1) or copy numbers (Fig. 6f) across all histones showed no significant differences in expression between hiPSCs and hESCs. Furthermore, for core histones, including H3 and H4, no significant abundance differences were seen in either the proteomics data (Fig. 6g&h) or the additional western blot analyses that we performed to validate these conclusions (Fig. 6i&j). However, we did detect differences between hiPSCs and hESCs in the expression of histone H2A variants, with H2AFV (Fig. 6k), H2AFY (Fig. 6l) and H2AFY2 (Fig. 6m) all increased in abundance in hiPSCs. As histone H2A variants also have high sequence similarity and shared peptides, we also performed a peptide-level analysis, which confirmed that both shared and unique peptides displayed the same pattern of increased abundance in hiPSCs (Fig. S3). Thus, we conclude that histone H1 and H2A variants show opposing effects, with the former decreased and the latter increased in abundance in hiPSCs.

Discussion

Induced pluripotent stem cells can provide vital models for clinical research and future therapies, which makes it essential to understand their similarities to, and any specific differences from, embryo-derived human stem cells. This study provides a detailed comparison of the proteomes of multiple hiPSC and hESC lines, with the major conclusion that while they express a near-identical set of proteins with similar abundance ranks, they also display important quantitative differences. In particular, our data indicate that fibroblast-reprogrammed hiPSCs display considerable differences in the cytoplasmic and mitochondrial proteome compared to hESCs, while the nuclear proteome was more similar between the two cell types. Furthermore, additional microscopy analyses and functional assays show that the systematic differences in the proteomes of hiPSCs and hESCs have an impact on cell phenotypes, most notably affecting mitochondria, metabolic activity and transport.

Using estimated protein copy numbers, our data show that <1% of proteins were significantly decreased in abundance in hiPSCs compared to hESCs, including multiple H1 histones, whereas ∼56% of all proteins quantified were significantly increased (fold change > 1.5 and q-value < 0.001), with most of these increases affecting cytoplasmic and mitochondrial proteins and activities. The MS data show that total protein levels are higher overall in hiPSCs than in hESCs, a result that was independently confirmed using an EZQ protein assay. FACS analysis showed that this difference in total protein content does not result from differences between hiPSCs and hESCs in cell cycle distribution. Instead, the increased protein levels in hiPSCs correlated with increased levels of the protein translational machinery, along with increased metabolic and mitochondrial activity and higher levels of nutrient transport.

These results highlight an important technical point relating to data normalisation and its effect on data interpretation. When a standard median normalisation (concentration-based approach) is used instead of the proteomic ruler19, the difference in total protein content between the cell types, involving the increased abundance of thousands of proteins, is not apparent. For example, a cell type with 4-fold higher protein content would display virtually no significant differences after median normalisation, as long as the protein ranks and relative concentrations remained similar. This would lead to the erroneous conclusion that there is little to no change in protein expression between hiPSCs and hESCs, whereas the orthogonal data indicate otherwise.

Having established that hiPSCs display higher total protein content than the corresponding hESC lines, we sought to understand how this could be maintained. To maximise protein synthesis, nutrient availability and energy production are key24. The proteomic data show that vital nutrient transporters, known to be important for growth and protein production48, were significantly increased in hiPSCs compared to hESCs. In particular, the 3 glutamine transporters (SLC1A5, SLC38A1 and SLC38A2) were all significantly increased in abundance, and functional assays showed that this correlated with higher glutamine uptake in hiPSCs. Glutamine has previously been shown to fuel growth and proliferation in rapidly dividing cells, including cancer cells25, and could be sustaining higher growth rates in hiPSCs.

Nutrients provide the fuel, but metabolic proteins are the engines that convert them into energy. Our data show that proteins involved in both glycolysis and glutaminolysis were significantly increased in abundance in hiPSCs. When cells preferentially use the glycolytic pathway, e.g. stem cells and cancer cells, there is an increased demand for biosynthetic precursors and NADPH49. These precursors can be supplied via glutaminolysis50–52 linked to the TCA cycle; both are important mitochondrial processes, and the proteins of both were significantly increased in hiPSCs, along with other proteins involved in the electron transport chain. Differences in the mitochondria between hiPSCs and hESCs have been previously reported, but whether they originate from the reprogramming process or are induced by the increased nutrient uptake remains an open question.

Secreted proteins, such as growth factors and ECM proteins, are a category of great interest because their absolute abundance can affect the surrounding cellular microenvironment. Here, hiPSCs showed increased expression of growth factors that are linked to cancer and immunosuppression. For example, FGF2, an important growth factor for primed pluripotent stem cells, has been shown to promote ERK activation53, stimulating protein synthesis54-56. The increased abundance of FGF2 could therefore form a feedforward loop that further drives or sustains growth in hiPSCs; however, this growth-promoting activity is also linked to breast57 and gastric cancers58, as well as gliomas59. Another important growth factor with increased abundance in hiPSCs is TGFB1, a known potent inhibitor of T cell responses60,61. The immunogenicity of pluripotent stem cells has important consequences for cell therapy applications, and our data suggest that hiPSCs might have a higher immune evasion potential via multiple mechanisms. They display increased abundance of secreted T cell inhibitors such as TGFB1 and ARG162, along with inhibitory ligands such as CD276, CD200 and CD47. This increased inhibitory capacity, combined with the tumorigenic potential of hiPSCs63, raises concerns about the suitability of reprogrammed hiPSCs for certain types of therapeutic applications.

In summary, our data show that hiPSCs and hESCs, despite their clear similarities, are not identical at either the protein or the phenotypic level. We show that reprogrammed hiPSCs differ from hESCs predominantly in their cytoplasmic and mitochondrial proteomes, leading to measurable functional differences in their metabolic activity and growth potential. These data can help to inform future strategies to mitigate these differences as hiPSCs continue to be used as disease models and in important clinical applications.

Materials and Methods

hiPSC and hESC Cell Culture

Human iPS cell lines (aizi_1, bubh_3, kucg_2, oaqd_3, ueah_1 and wibj_2) and hESC lines (SA121, SA181, H1 and H9) were grown under identical conditions, maintained in TESR medium64 supplemented with FGF2 (Peprotech, 30 ng/ml) and noggin (Peprotech, 10 ng/ml) on dishes coated with growth-factor-reduced Geltrex basement membrane extract (Life Technologies, 10 μg/cm²) at 37°C in a humidified atmosphere of 5% CO2 in air.

Cells were routinely passaged twice a week as single cells using TrypLE Select (Life Technologies) and replated in TESR medium further supplemented with the Rho kinase inhibitor Y27632 (Tocris, 10 μM) to enhance single-cell survival. Y27632 was removed from the culture medium 24 hours after replating. For proteomic analyses, cells were plated in 100 mm Geltrex-coated dishes at a density of 5×10⁴ cells/cm² and allowed to grow for 3 days until confluent, with daily medium changes.

Immunoblotting

Equal volumes of hiPSC or hESC protein lysates were boiled in LDS/RA buffer for 5 min at 95°C, loaded onto 4-15% NuPAGE Bis-Tris SDS-PAGE gels in running buffer (50 mM MES, 50 mM Tris, 0.1% SDS, 1 mM EDTA, pH 7.3), transferred onto nitrocellulose membrane (Amersham #10600041) in transfer buffer (8 mM Tris, 30 mM glycine, 20% methanol) and stained with Ponceau S (Sigma-Aldrich, #P7170). Membranes were blocked in TBS-T + 5% BSA for 1 h at RT and incubated overnight at 4°C with primary antibodies diluted in TBS-T + 5% BSA. Membranes were washed 3 × 15 min in TBS-T, incubated with secondary antibody for 1 h at RT, washed, and imaged using an Odyssey CLx (LI-COR). Antibodies: Histone H3 (Abcam, ab1791, 1:1000); Histone H4 (Abcam, ab10158, 1:1000); Vimentin (CST, #5741S, 1:1000); IRDye® 680RD Donkey anti-Rabbit IgG Secondary Antibody (LI-COR, 926-68073, 1:10,000).

Cell line selection for mass spectrometry

Human iPS cell lines (bubh_3, kucg_2, oaqd_3 and wibj_2) and hESC lines (SA121, SA181, H1 and H9) were analysed by mass spectrometry using TMT labelling, as described below.

Protein extraction

Cell pellets were resuspended in 300 µL extraction buffer (4% SDS in 100 mM triethylammonium bicarbonate (TEAB), phosphatase inhibitors (PhosSTOP™, Roche)). Samples were boiled (15 min, 95 °C, 350 rpm) and sonicated for 30 cycles in a bath sonicator (Bioruptor® Pico, Diagenode, Belgium; 30 s on, 30 s off), followed by probe sonication for 50 s (20 s on, 5 s off). Benzonase® nuclease HC (2 µL; 250 U/µL, Merck Millipore) was added and samples were incubated for 30 min (37 °C, 750 rpm). Reversibly oxidized cysteines were reduced with 10 mM TCEP (45 min, 22 °C, 1,000 rpm), followed by alkylation of free thiols with 20 mM iodoacetamide (45 min, 22 °C, 1,000 rpm, in the dark). Proteins were quantified using the fluorometric EZQ™ assay (Thermo Fisher Scientific).

Protein digestion using the SP3 method

Protein extracts were cleaned and digested with the SP3 method as described previously with modifications65,66. Briefly, 50 µL of a 20 µg/µL SP3 bead stock (Sera-Mag SpeedBead carboxylate-modified magnetic particles; GE Healthcare Life Sciences) and 500 µL acetonitrile (ACN; final concentration of 70%) were added to 150 µL of protein extract and incubated at room temperature for 10 min (1000 rpm). Tubes were mounted on a magnetic rack, supernatants were removed and beads were washed twice with 70% ethanol and once with ACN (1 mL each). Beads were resuspended in 80 µL 100 mM TEAB and digested for 4 h with LysC followed by tryptic digestion overnight (1:50 protease:protein ratio, 37 °C, 1,000 rpm).

Peptides were cleaned by adding 3.5 µL formic acid (final concentration of 4%) and 1.7 mL ACN (final concentration of 95%), followed by incubation for 10 min. After a brief centrifugation (1,000 × g), tubes were mounted on a magnetic rack and beads were washed once with 1.5 mL ACN. Peptides were eluted from the beads with 100 µL 2% DMSO and acidified with 5.2 µL 20% formic acid (final concentration of 1%), followed by centrifugation (15,000 × g). Peptide amounts were quantified using the fluorometric CBQCA assay (Thermo Scientific).
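The final solvent concentrations stated in the SP3 steps can be checked with simple mixing arithmetic. This sketch assumes that the formic acid added in the clean-up step is neat (about 100%), that the 50 µL bead stock contributes its full volume, and that volumes are additive (ACN/water mixtures actually contract slightly), so the values are approximate.

```python
def final_percent(added_ul, stock_percent, existing_ul):
    """Final % (v/v) of a component added to a solution that does not yet contain it.

    Assumes volumes are simply additive.
    """
    return added_ul * stock_percent / (existing_ul + added_ul)


# Protein binding: 500 uL ACN added to 150 uL extract + 50 uL bead stock.
binding_acn = final_percent(500, 100, 150 + 50)
# Peptide clean-up: 3.5 uL formic acid (assumed neat) added to the 80 uL digest.
cleanup_fa = final_percent(3.5, 100, 80)
# 1.7 mL (1700 uL) ACN then added to the 83.5 uL acidified digest.
cleanup_acn = final_percent(1700, 100, 80 + 3.5)
# Elution: 5.2 uL of 20% formic acid added to the 100 uL eluate.
elution_fa = final_percent(5.2, 20, 100)

for label, value in [("binding ACN", binding_acn), ("clean-up formic acid", cleanup_fa),
                     ("clean-up ACN", cleanup_acn), ("elution formic acid", elution_fa)]:
    print(f"{label}: {value:.2f}%")
```

The computed values (about 71%, 4.2%, 95.3% and 0.99%) agree with the rounded concentrations given in the protocol.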

TMT labelling

For each sample, 15 µg of peptides were dried in vacuo in a Concentrator plus (Eppendorf) and resuspended in 50 µL 200 mM EPPS, pH 8.5. TMT10plex tags (Thermo Scientific) were dissolved in anhydrous ACN and added to the peptide sample at a 1:10 peptide:TMT ratio. Additional anhydrous ACN was added to bring the total ACN volume to 22 µL. Samples were incubated for 2 h (22 °C, 750 rpm). Unreacted TMT was quenched by incubation with 5 µL 5% hydroxylamine for 30 min. Samples were combined, dried in vacuo and resuspended in 1% TFA, followed by solid-phase extraction clean-up using Waters Sep-Pak tC18 50 mg. Samples were loaded and washed five times with 1 mL 0.1% TFA in water, and peptides were eluted with 70% ACN/0.1% TFA (1 mL) and dried in vacuo in a Concentrator plus (Eppendorf).

High pH reversed phase peptide fractionation

TMT labelled peptide samples were fractionated using off-line high pH reversed phase chromatography. Dried samples were resuspended in 5% formic acid and loaded onto a 4.6 × 250 mm XBridge BEH130 C18 column (3.5 µm, 130 Å; Waters). Samples were separated on a Dionex Ultimate 3000 HPLC system with a flow rate of 1 mL/min. Solvents used were water (A), ACN (B) and 100 mM ammonium formate pH 9 (C). While solvent C was kept constant at 10%, solvent B started at 5% for 3 min, increased to 21.5% in 2 min, to 48.8% in 11 min and to 90% in 1 min, and was kept at 90% for a further 5 min, followed by a return to starting conditions and re-equilibration for 8 min. Peptides were separated into 48 fractions, which were concatenated into 24 fractions and subsequently dried in vacuo. Peptides were redissolved in 5% formic acid and analysed by LC-MS.
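The solvent B program above can be written as piecewise-linear breakpoints, which makes it easy to read off %B at any time point. The sketch covers only the stated segments (the duration of the return to starting conditions is not given, so it is omitted). The fraction concatenation scheme is an assumption: the text states only that 48 fractions were combined into 24, and the sketch shows one common approach, pooling fractions that lie far apart in the gradient.

```python
# (time in min, %B) breakpoints transcribed from the protocol text.
# Solvent C is held at 10%, so %A = 90 - %B throughout.
GRADIENT_B = [(0.0, 5.0), (3.0, 5.0), (5.0, 21.5), (16.0, 48.8), (17.0, 90.0), (22.0, 90.0)]


def percent_b(t_min):
    """Linearly interpolate %B at time t_min within the described program."""
    if not GRADIENT_B[0][0] <= t_min <= GRADIENT_B[-1][0]:
        raise ValueError("time lies outside the described gradient")
    for (t0, b0), (t1, b1) in zip(GRADIENT_B, GRADIENT_B[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min):
    """Water fraction, given a constant 10% of solvent C."""
    return 100.0 - 10.0 - percent_b(t_min)


def concatenate(n_collected=48, n_pooled=24):
    """Hypothetical pooling: pool k holds fractions k, k + n_pooled, k + 2*n_pooled, ..."""
    return [list(range(k, n_collected + 1, n_pooled)) for k in range(1, n_pooled + 1)]


print(percent_b(10.5), percent_a(10.5))
print(concatenate()[:3])
```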

LC-MS analysis

TMT labelled samples were analysed on an Orbitrap Fusion Tribrid mass spectrometer coupled to a Dionex RSLCnano HPLC (Thermo Scientific). Samples were loaded onto a 100 µm × 2 cm Acclaim PepMap-C18 trap column (5 µm, 100 Å) with 0.1% trifluoroacetic acid for 7 min at a constant flow of 4 µL/min. Peptides were separated on a 75 µm × 50 cm EASY-Spray C18 column (2 µm, 100 Å; Thermo Scientific) at 50 °C using a linear gradient from 10% to 40% B in 153 min with a flow rate of 200 nL/min. Solvents used were 0.1% formic acid (A) and 80% ACN/0.1% formic acid (B). The spray was initiated by applying 2.5 kV to the EASY-Spray emitter. The ion transfer capillary temperature was set to 275 °C and the radio frequency of the S-lens to 50%. Data were acquired under the control of Xcalibur software in data-dependent mode, with 12 dependent scans. The full scan was acquired in the Orbitrap covering the mass range of m/z 350 to 1,400 with a mass resolution of 120,000, an AGC target of 4×10⁵ ions and a maximum injection time of 50 ms. Precursor ions with charges between 2 and 7 and a minimum intensity of 5×10³ were selected with an isolation window of m/z 1.2 for fragmentation using collision-induced dissociation in the ion trap with 35% collision energy. The ion trap scan rate was set to “rapid”. The AGC target was set to 1×10⁴ ions with a maximum injection time of 50 ms and a dynamic exclusion of 60 s. During the MS3 analysis, for more accurate TMT quantification, 5 fragment ions were co-isolated using synchronous precursor selection in a window of m/z 2 and further fragmented with an HCD collision energy of 65%. The fragments were then analysed in the Orbitrap with a resolution of 50,000. The AGC target was set to 5×10⁴ ions and the maximum injection time was 105 ms.

High-resolution respirometry in wibj_2 and H1 stem cells

Mitochondrial respiration was studied in digitonin-permeabilised wibj_2 (hiPSC) and H1 (WA01; hESC) cells (10 μg digitonin per 10⁶ cells) to keep mitochondria in their architectural environment. The analysis was performed in an oxygraphic chamber at 37°C with continuous stirring (Oxygraph-2k, Oroboros Instruments, Innsbruck, Austria). Cells were collected with trypsin, pelleted and resuspended in MiR05 respiration medium (110 mM sucrose, 60 mM lactobionic acid, 0.5 mM EGTA, 3 mM MgCl2, 20 mM taurine, 10 mM KH2PO4, 20 mM HEPES adjusted to pH 7.1 with KOH at 30°C, and 1 g/l essentially fatty-acid-free BSA). Substrate-Uncoupler-Inhibitor titration protocol number 2 (SUIT-002)67 was used to determine respiratory rates. Briefly, after residual oxygen consumption in the absence of endogenous fuel substrates (ROX, in the presence of 2.5 mM ADP) was measured, the fatty acid oxidation pathway state (F) was evaluated by adding malate (0.1 mM) and octanoylcarnitine (0.2 mM) (OctMP). Membrane integrity was tested by adding cytochrome c (10 μM) (OctMcP). Subsequently, the NADH electron transfer-pathway state (FN) was studied by adding a high concentration of malate (2 mM, OctMP), pyruvate (5 mM, OctPMP) and glutamate (10 mM, OctPGMP). Succinate (10 mM, OctPGMSP) was then added to stimulate the S pathway (FNS), followed by glycerophosphate (10 mM, OctPGMSGpP) to reach convergent electron flow in the FNSGp pathway to the Q-junction. Uncoupled respiration was next measured by titration with CCCP (OctPGMSGpE), followed by inhibition of complex I with rotenone (0.5 μM, SGpE). Finally, residual oxygen consumption (ROX) was measured by adding antimycin A (2.5 μM). ROX was subtracted from all respiratory states to obtain mitochondrial respiration. Results are expressed in pmol·s⁻¹ per 10⁶ cells.
The P/E control ratio, which reflects the control by coupling and limitation by the phosphorylation system, was subsequently calculated by dividing the OctPGMSGpP value by the OctPGMSGpE value.
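The ROX correction and P/E ratio described above amount to a subtraction followed by a division. The sketch below uses invented flux values (not measurements from this study); state labels follow the SUIT-002 notation used in the text.

```python
def subtract_rox(raw_flux, rox):
    """Mitochondrial respiration: subtract residual oxygen consumption from every state."""
    return {state: flux - rox for state, flux in raw_flux.items()}


def pe_control_ratio(corrected):
    """P/E ratio: OXPHOS capacity (OctPGMSGpP) divided by ET capacity (OctPGMSGpE)."""
    return corrected["OctPGMSGpP"] / corrected["OctPGMSGpE"]


# Invented fluxes in pmol O2 s^-1 per 10^6 cells (not measurements from this study).
raw = {"OctMP": 30.0, "OctPGMSGpP": 120.0, "OctPGMSGpE": 160.0}
corrected = subtract_rox(raw, rox=10.0)
print(corrected, pe_control_ratio(corrected))
```

A P/E ratio below 1 indicates that the phosphorylation system limits OXPHOS capacity relative to the capacity of the electron transfer system.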

Radiolabelled glutamine uptake (protocol was adapted from68)

Two hiPSC lines (wibj_2 and oaqd_3) were compared to two hESC lines (SA121 and SA181), with 3 technical replicates for each line. Both hiPSCs and hESCs were plated in 6-well plates 2 days before the transport assay (5×10⁴ cells/cm², giving approximately 1×10⁶ cells per well on the day of the assay). The growth medium was carefully aspirated so as not to disturb the adherent monolayer of cells. Cells were washed gently by pipetting 5 ml pre-warmed (37°C) uptake solution (HBSS, pH 7.4; Gibco) onto them and aspirating it off; this was repeated 3 times. Cells were then incubated for 2 min with 0.5 ml of uptake solution containing [³H]glutamine (5 μCi/ml; PerkinElmer, NET 55100) in either the presence or absence of unlabelled L-glutamine (5 mM; Sigma).

Glutamine uptake was stopped by removing the uptake solution and washing cells three times with 2 ml of ice-cold stop solution (HBSS with 10 mM nonradioactive L-glutamine). After the third wash, cells were lysed in 200 μl of 0.1% SDS and 100 mM NaOH, and 100 μl of the lysate was added to scintillation vials containing 3 ml scintillant (OptiPhase HiSafe 3, PerkinElmer) to measure the cell-associated radioactivity. β-radioactivity was measured with a Tri-Carb 4910TR liquid scintillation counter.

Net glutamine uptake (CPM) was calculated by subtracting the “Quench” CPM values (uptake measured in the presence of 5 mM unlabelled L-glutamine) from the “Glutamine” CPM values.
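A minimal sketch of the net-uptake calculation, using invented counts (not data from this study). Subtracting the mean of the quench replicates from each glutamine replicate is an assumption; the text does not state whether replicates were paired or averaged first.

```python
from statistics import mean


def net_cpm(glutamine_cpm, quench_cpm):
    """Subtract the mean non-specific (quench) signal from each total-uptake replicate."""
    background = mean(quench_cpm)
    return [cpm - background for cpm in glutamine_cpm]


# Invented triplicate counts for one cell line.
glutamine = [1500, 1600, 1550]
quench = [200, 220, 180]
net = net_cpm(glutamine, quench)
print(net, mean(net))
```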

TEM Sample Preparation

Cells were fixed on the dish in 4% paraformaldehyde and 2.5% glutaraldehyde in 0.1 M sodium cacodylate buffer (pH 7.2) for 30 minutes, then scraped, transferred to a tube and fixed for a further 30 minutes prior to pelleting. The pellets were cut into small pieces, washed 3 times in cacodylate buffer and then post-fixed in 1% OsO4 with 1.5% sodium ferricyanide in cacodylate buffer for 60 min. After another 3 washes in cacodylate buffer, they were contrasted with 1% tannic acid and 1% uranyl acetate. The cell pellets were then dehydrated through an alcohol series into 100% ethanol, transferred to propylene oxide, left overnight in 50% propylene oxide/50% resin and finally embedded in 100% Durcupan resin (Sigma). The resin was polymerised at 60°C for 48 h and sectioned on a Leica UCT ultramicrotome. Sections were contrasted with 3% aqueous uranyl acetate and Reynolds lead citrate before imaging on a JEOL 1200EX TEM using a SIS III camera.

Proteomics search parameters

The data were searched and quantified with MaxQuant69 (version 1.6.7) against the human SwissProt database from UniProt70 (November 2019), using the following parameters: type set to Reporter ion MS3 with 10plex TMT; fixed modification of carbamidomethyl (C); variable modifications of oxidation (M), acetylation (protein N-terminus) and deamidation (NQ). Up to 2 missed cleavages were allowed and the minimum peptide length was set to 7 amino acids. The false discovery rate was set to 1% for positive identification at both the protein and the peptide spectrum match (PSM) level.
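To make the missed-cleavage and minimum-length settings concrete, the sketch below performs a simple in-silico digest on a hypothetical sequence. The cleavage rule (after every K or R, with no proline restriction) is an assumption: the enzyme specificity used in the search is not stated here, although LysC and trypsin both cleave C-terminal to lysine.

```python
import re


def digest(sequence, max_missed=2, min_length=7):
    """Peptides from cleavage after K/R, allowing up to max_missed missed cleavages."""
    # Zero-width split after every K or R (no proline rule; an assumption).
    pieces = [p for p in re.split(r"(?<=[KR])", sequence) if p]
    peptides = set()
    for start in range(len(pieces)):
        # Join up to max_missed + 1 consecutive fully cleaved pieces.
        for end in range(start, min(start + max_missed + 1, len(pieces))):
            peptide = "".join(pieces[start:end + 1])
            if len(peptide) >= min_length:
                peptides.add(peptide)
    return peptides


# Hypothetical sequence whose fully cleaved pieces are AAAAAAK, LLR and GGGGGGGK.
seq = "AAAAAAKLLRGGGGGGGK"
print(sorted(digest(seq)))
print(sorted(digest(seq, max_missed=0)))
```

With no missed cleavages the 3-residue piece LLR falls below the length cut-off and is lost; allowing missed cleavages recovers it inside longer peptides.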

Unique, shared and razor peptides

Peptides that are exclusive to a single protein group are considered unique peptides. Peptides whose sequences match more than one protein group are called shared peptides. Razor peptides are shared peptides whose intensity is assigned to a single protein group despite matching multiple protein groups.
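MaxQuant assigns each razor peptide to the protein group that already has the most identified peptides (an Occam's razor principle). The sketch below mimics that idea in simplified form on hypothetical peptide and protein-group names; the alphabetical tie-breaking rule is an assumption, and MaxQuant's actual implementation handles further details.

```python
from collections import Counter


def assign_razor(peptide_to_groups):
    """Assign every peptide to exactly one protein group.

    Unique peptides stay with their only group; shared peptides go to the matching
    group with the most peptides overall (ties broken alphabetically in this sketch).
    """
    peptide_counts = Counter(g for groups in peptide_to_groups.values() for g in groups)
    return {
        peptide: min(groups, key=lambda g: (-peptide_counts[g], g))
        for peptide, groups in peptide_to_groups.items()
    }


# Hypothetical mapping: pepC is shared between PG1 (3 peptides) and PG2 (2 peptides).
mapping = {"pepA": {"PG1"}, "pepB": {"PG1"}, "pepC": {"PG1", "PG2"}, "pepD": {"PG2"}}
print(assign_razor(mapping))
```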

Data filtering

All protein groups identified with fewer than 2 razor or unique peptides, or labelled as ‘Contaminant’, ‘Reverse’ or ‘Only identified by site’, were removed from the analysis.
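The filter above can be expressed as a row-level predicate. This is a sketch under assumptions: the column names follow the MaxQuant proteinGroups.txt convention as we understand it ('+' in a flag column marks a flagged group; both 'Contaminant' and 'Potential contaminant' are checked because the name differs between versions), and the rows are hypothetical.

```python
FLAG_COLUMNS = ("Contaminant", "Potential contaminant", "Reverse", "Only identified by site")
MIN_RAZOR_UNIQUE = 2


def keep_protein_group(row):
    """Return True if a proteinGroups row passes the filters described above."""
    flagged = any(row.get(col, "") == "+" for col in FLAG_COLUMNS)
    return not flagged and int(row["Razor + unique peptides"]) >= MIN_RAZOR_UNIQUE


# Hypothetical rows, as csv.DictReader would yield them from a tab-separated file.
rows = [
    {"Protein IDs": "P1", "Razor + unique peptides": "5", "Reverse": "", "Potential contaminant": ""},
    {"Protein IDs": "P2", "Razor + unique peptides": "1", "Reverse": "", "Potential contaminant": ""},
    {"Protein IDs": "REV__P3", "Razor + unique peptides": "4", "Reverse": "+", "Potential contaminant": ""},
    {"Protein IDs": "CON__P4", "Razor + unique peptides": "3", "Reverse": "", "Potential contaminant": "+"},
]
kept = [r["Protein IDs"] for r in rows if keep_protein_group(r)]
print(kept)
```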

Peptide normalisation

For Supplemental Figures 2 and 3, peptide intensities were divided by the summed intensity of all histone peptides and multiplied by 1,000,000.
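The histone normalisation is a single scaling step per sample. In this sketch the peptide names and intensities are hypothetical, and the set of histone peptides is assumed to be supplied by the caller.

```python
def histone_normalise(peptide_intensity, histone_peptides, scale=1_000_000):
    """Express each peptide intensity relative to the summed histone-peptide intensity."""
    histone_total = sum(peptide_intensity[p] for p in histone_peptides)
    return {p: value * scale / histone_total for p, value in peptide_intensity.items()}


# Hypothetical peptide intensities from one sample.
intensities = {"histone_pep_1": 300.0, "histone_pep_2": 700.0, "other_pep": 50.0}
normalised = histone_normalise(intensities, {"histone_pep_1", "histone_pep_2"})
print(normalised)
```

By construction the histone peptides of every sample sum to 1,000,000, so values are comparable between samples on a per-histone (and hence roughly per-genome) basis.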

Copy number calculations

Protein copy numbers were estimated following the “proteomic ruler” method19, but adapted to work with TMT MS3 data. The summed MS1 intensities were allocated to the different experimental conditions according to their fractional MS3 reporter intensities.
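A minimal sketch of the two steps, under stated assumptions. The ruler part follows the published principle that total histone mass approximately equals DNA mass per cell and that summed MS intensity scales with protein mass, giving CN = (I / I_histones) × m_DNA × N_A / MW. The DNA mass per cell (6.5 pg, roughly that of a diploid human cell), the protein names, molecular weights and intensities are illustrative values, not parameters or data from this study.

```python
AVOGADRO = 6.02214076e23  # per mol


def allocate_ms1(ms1_intensity, reporter):
    """Split a protein's summed MS1 intensity across TMT channels by reporter fraction."""
    total = sum(reporter.values())
    return {channel: ms1_intensity * r / total for channel, r in reporter.items()}


def ruler_copy_numbers(intensity, mol_weight_da, histones, dna_mass_g):
    """Proteomic-ruler copy numbers for one channel (histone mass taken as DNA mass)."""
    histone_signal = sum(intensity[p] for p in histones)
    return {p: (i / histone_signal) * dna_mass_g * AVOGADRO / mol_weight_da[p]
            for p, i in intensity.items()}


# Illustrative inputs: summed MS1 intensity and MS3 reporter intensities per protein.
ms1 = {"H4": 8.0e9, "ACTB": 4.0e9}
reporter = {"H4": {"hiPSC": 1.0, "hESC": 1.0}, "ACTB": {"hiPSC": 3.0, "hESC": 1.0}}
mol_weight = {"H4": 11_367.0, "ACTB": 41_737.0}  # approximate masses in Da

allocated = {p: allocate_ms1(ms1[p], reporter[p]) for p in ms1}
copy_numbers = {
    ch: ruler_copy_numbers({p: allocated[p][ch] for p in allocated},
                           mol_weight, {"H4"}, dna_mass_g=6.5e-12)
    for ch in ("hiPSC", "hESC")
}
print(copy_numbers)
```

Because each channel is scaled by its own histone signal, the copy numbers are per cell; here the hiPSC channel ends up with three times the ACTB copies of the hESC channel.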

Protein content estimations

Protein content was estimated as CN × MW, where CN is the protein copy number and MW is the protein molecular weight (in Da), and the result was converted from daltons to picograms.
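The conversion uses 1 Da = 1.66053906660 × 10⁻²⁴ g = 1.66053906660 × 10⁻¹² pg. The sketch below applies it per protein and sums over proteins; the copy numbers and masses are illustrative, not data from this study.

```python
DALTON_IN_PG = 1.66053906660e-12  # 1 Da = 1.66053906660e-24 g = 1.66053906660e-12 pg


def protein_mass_pg(copy_number, mol_weight_da):
    """Mass of all copies of one protein in a cell, in picograms (CN x MW, Da -> pg)."""
    return copy_number * mol_weight_da * DALTON_IN_PG


def total_protein_pg(copy_numbers, mol_weights):
    """Sum per-protein masses to estimate total protein content per cell."""
    return sum(protein_mass_pg(cn, mol_weights[p]) for p, cn in copy_numbers.items())


# Illustrative: one million copies of a 50 kDa protein weigh about 0.083 pg.
print(protein_mass_pg(1e6, 50_000))
print(total_protein_pg({"a": 1e6, "b": 2e6}, {"a": 50_000, "b": 25_000}))
```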

28S to 39S ratios

For each hiPSC and hESC line, the ratio of the small to the large subunit of the mitochondrial ribosome was calculated by dividing the sum of the estimated copy numbers of all 28S subunit proteins by the sum of the estimated copy numbers of all 39S subunit proteins.
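The ratio is a quotient of two sums of copy numbers. In this sketch the subunit membership lists and copy numbers are invented and far shorter than the real 28S and 39S subunit lists; skipping subunits without a copy number estimate is an assumption.

```python
def small_to_large_ratio(copy_numbers, small_subunits, large_subunits):
    """Sum of 28S (small) subunit copy numbers divided by the sum for 39S (large) subunits.

    Subunits absent from copy_numbers are skipped (an assumption for this sketch).
    """
    small = sum(copy_numbers[p] for p in small_subunits if p in copy_numbers)
    large = sum(copy_numbers[p] for p in large_subunits if p in copy_numbers)
    return small / large


# Invented copy numbers for a handful of subunits.
cn = {"MRPS5": 1000.0, "MRPS7": 1200.0, "MRPL3": 900.0, "MRPL4": 1100.0}
print(small_to_large_ratio(cn, {"MRPS5", "MRPS7"}, {"MRPL3", "MRPL4"}))
```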

Differential expression analysis

Fold changes and p-values were calculated in R. For individual proteins, p-values were calculated with the Bioconductor package limma71 (version 3.7). The q-values provided were generated in R using the “qvalue” package (version 2.10.0). P-values for protein families and protein complexes were calculated in R using Welch’s t-test.
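For reference, Welch's t-test compares two means without assuming equal variances. The sketch below computes the t statistic and the Welch-Satterthwaite degrees of freedom from hypothetical values using only the standard library; in R, `t.test()` performs Welch's test by default, and in Python `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the statistic together with a p-value. Which per-family values entered the test (for example, summed copy numbers per cell line) is an assumption here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical per-line values for one protein family (arbitrary units).
group_a = [1.0, 2.0, 3.0, 4.0]
group_b = [2.0, 4.0, 6.0, 8.0]
print(welch_t(group_a, group_b))
```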

hiPSC vs hESC overrepresentation analysis

All overrepresentation analyses were performed with WebGestalt. The first analysis selected proteins with a fold change > 2 and a q-value < 0.001. The second analysis selected proteins with a fold change lower than the median minus one standard deviation (0.195) and a q-value < 0.001. Both analyses used all identified proteins with 2 or more razor and unique peptides as the background and required an FDR < 0.05.
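The two selection rules can be sketched as set filters. The fold changes and q-values below are hypothetical, so the computed cut-off differs from the 0.195 reported for the real data; treating fold changes on a linear scale and using the sample standard deviation are assumptions.

```python
from statistics import median, stdev


def select_increased(fold_change, q_value, min_fc=2.0, max_q=0.001):
    """Proteins with fold change > min_fc and q-value < max_q."""
    return {p for p, fc in fold_change.items() if fc > min_fc and q_value[p] < max_q}


def select_decreased(fold_change, q_value, max_q=0.001):
    """Proteins with fold change below (median - 1 SD) and q-value < max_q."""
    values = list(fold_change.values())
    threshold = median(values) - stdev(values)  # sample SD; an assumption
    selected = {p for p, fc in fold_change.items() if fc < threshold and q_value[p] < max_q}
    return selected, threshold


# Hypothetical hiPSC/hESC fold changes and q-values.
fc = {"A": 2.5, "B": 1.0, "C": 1.1, "D": 0.3, "E": 0.9, "F": 1.2}
q = {"A": 1e-5, "B": 0.5, "C": 0.5, "D": 1e-4, "E": 0.5, "F": 0.5}
up = select_increased(fc, q)
down, cutoff = select_decreased(fc, q)
print(up, down, round(cutoff, 3))
```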

Peptide coverage figures

The supplemental figures showing peptide coverage across H1 and H2 histones (Figs S2 and S3) were generated with Protter72.

Data availability

The raw files and the mzTab outputs were uploaded to PRIDE as a full submission under the identifier PXD014502 and are available at https://www.ebi.ac.uk/pride/archive/projects/PXD014502

Acknowledgements

We would like to thank Gabriel Sollberger as well as all members of the Lamond Laboratory for their input and advice. This work was supported by the Wellcome Trust/MRC grant (098503/E/12/Z), Wellcome Trust grants (073980/Z/03/Z, 105024/Z/14/Z, 206293/Z/17/Z, 097418/Z/11/Z, 205023/Z/16/Z), BBSRC Project Grant (BB/V010948/1), EPSRC grant (EP/Y010655/1), a Wellcome Trust Equipment Award (202950/Z/16/Z) and a UK Research Partnership Infrastructure Fund award to the Centre for Translational and Interdisciplinary Research.

Author contributions

A.J.B conceived the study, planned the experiments, and analysed and interpreted the data. E.G executed all the proteomic sample preparation and the mass spectrometry experiments. L.V.S performed the glutamine uptake assay and the FACS analysis. A.R.P performed the TEM experiments. F.S performed the respiration analysis. H.J. performed the EZQ assay and assisted with data interpretation. L.D cultured the hESCs and hiPSCs and performed the pluripotency marker western blot. C.E and E.K.J.H performed the histone western blots. H.Y., M.P and J.S helped to interpret the data. A.I.L, D.A.C and G.F supervised the project and helped to interpret the data. The paper was written by A.J.B and A.I.L and edited by all authors.

Declaration of interests

E.G now works for Boehringer Ingelheim Pharma GmbH & Co. KG. A.I.L, M.P and J.S are board members of Tartan Cell Technologies Ltd. M.P and J.S are board members of Glencoe Software Ltd, and A.I.L is a board member of Platinum Informatics Ltd.