Single cell RNA-seq identifies the origins of heterogeneity in efficient cell transdifferentiation and reprogramming

  1. Mirko Francesconi
  2. Bruno Di Stefano
  3. Clara Berenguer
  4. Luisa de Andrés-Aguayo
  5. Marcos Plana-Carmona
  6. Maria Mendez-Lago
  7. Amy Guillaumet-Adkins
  8. Gustavo Rodriguez-Esteban
  9. Marta Gut
  10. Ivo G Gut
  11. Holger Heyn
  12. Ben Lehner (corresponding author)
  13. Thomas Graf (corresponding author)
  1. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Spain
  2. Harvard University, United States
  3. Institució Catalana de Recerca i Estudis Avançats (ICREA), Spain
  4. Universitat Pompeu Fabra (UPF), Spain
7 figures, 1 table and 8 additional files

Figures

Figure 1 with 4 supplements
Single cell gene expression analysis of B cell to macrophage transdifferentiation and B cell to iPSC reprogramming.

(a) Overview of the experimental design, showing the time points analysed. (b) Single cell projections onto the first two diffusion components (DC1 and DC2). (c–g) As in (b), with the top 50% of cells expressing selected markers for B cells in red (c), GMP/granulocytes in orange (d), monocytes in purple (e), macrophages in light blue (f) and pluripotent cells in orange-red (g). (h–i) Projection of transdifferentiating cells onto B cell-, macrophage- and monocyte-specific independent components (h) and of reprogramming cells onto B cell-, mid- and late-pluripotency-specific independent components as defined in Figure 1—figure supplement 2a (i).

https://doi.org/10.7554/eLife.41627.002
Figure 1—figure supplement 1
Data pre-processing, batch correction and independent component analysis.

(a) Overview of the data collection and pre-processing steps. (b) Distribution of the top 100 eigenvalues and of single cell projections onto the first four principal components across pools and time points from the gene expression PCA before batch correction. (c) Distribution of the top 100 eigenvalues and of single cell projections onto the first four principal components and component 13 across pools and time points from the PCA of gene expression after MNN batch correction. (d) Distribution of single cell projections onto the 35 independent components across pools and time points. (e) Distribution of the single cell projections onto independent component 33 across plate columns.

https://doi.org/10.7554/eLife.41627.003
Figure 1—figure supplement 2
Characterisation of independent components, gene expression reconstruction and diffusion maps.

(a) Heatmap of the correlations between the gene loadings of selected single cell independent components and the gene loadings of selected independent components from the reference mouse cell atlas (Hutchins et al., 2017) (see Methods for the definition of cell type specific components). (b) Distribution of the single cell projections (scores) onto the macrophage, mid pluripotency, granulocyte, monocyte, pre-B, late pluripotency, G2/M, oxidative phosphorylation, G1/S and a second oxidative phosphorylation specific components across time points. (c) Cell type projections (scores) onto selected atlas components. (d) Scheme of the ICA decomposition of the expression data matrix into a matrix of independent sources and a mixing matrix. (e) Scheme of the reconstruction of gene expression after filtering out components capturing batch effects. (f) Diffusion maps calculated with a smaller (left) or larger (right) Gaussian kernel sigma than the one used for the diffusion map shown in Figure 1b.

https://doi.org/10.7554/eLife.41627.004
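Panels (d) and (e) above describe decomposing the expression matrix into independent sources and a mixing matrix, then reconstructing expression after dropping the components that capture batch effects. A minimal sketch of the reconstruction step follows. It assumes a cells × components score matrix and a components × genes loading matrix; this orientation, the function name `reconstruct` and the toy numbers are illustrative assumptions, not taken from the paper.

```python
def reconstruct(sources, mixing, drop):
    """Rebuild a cells x genes matrix from ICA factors, skipping dropped components.

    sources: cells x components scores (assumed orientation).
    mixing:  components x genes loadings (assumed orientation).
    drop:    set of component indices treated as batch effects.
    """
    keep = [k for k in range(len(mixing)) if k not in drop]
    n_genes = len(mixing[0])
    return [
        [sum(row[k] * mixing[k][g] for k in keep) for g in range(n_genes)]
        for row in sources
    ]


# Toy data: 2 cells, 3 components, 2 genes (all values hypothetical).
sources = [[1.0, 2.0, 0.5], [0.0, 1.0, -1.0]]
mixing = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

full = reconstruct(sources, mixing, drop=set())     # all components kept
filtered = reconstruct(sources, mixing, drop={2})   # component 2 treated as batch
```

Dropping a component only removes its additive contribution, so the remaining components are left unchanged in the reconstructed matrix.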
Figure 1—figure supplement 3
Single cell analysis of reprogramming and transdifferentiation.

(a) Heatmap of the mean similarity score (scaled correlation, z-score) of single cells to relevant cell types from the reference atlas at each time point during transdifferentiation and reprogramming. (b) Boxplot of Pearson’s correlation between each single cell and all 272 cell types from the atlas at each time point. (c–i) Single cell projections onto the first two diffusion components, with the top 50% of cells expressing selected markers for B cells in red (c), GMPs/granulocytes in light orange (d), monocytes in purple (e) and macrophages in light blue (f), and early (g), mid (h) and late (i) pluripotency markers in orange-red. (j) Heatmap of genes differentially expressed during transdifferentiation with a fold change of at least 1.3 between adjacent time points; single cells are sorted according to diffusion pseudotime. (k) Heatmap of genes differentially expressed at 5% FDR and with a fold change of at least 1.3 between 18 hr and the first time point after the branching between transdifferentiation and reprogramming. (l) Heatmap of genes upregulated during reprogramming at 5% FDR and with a fold change of at least 1.3 between adjacent time points; single cells are sorted according to diffusion pseudotime.

https://doi.org/10.7554/eLife.41627.005
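Panels (a) and (b) above rely on a similarity score between each single cell and reference cell types, described as a scaled correlation (z-score). The sketch below shows one way such a score could be computed. It assumes the scaling is a z-score of one cell's Pearson correlations across the reference cell types; that choice, the function names and the toy profiles are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def zscore(values):
    """Centre and scale values using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# Hypothetical expression of four genes in one cell and three reference types.
cell = [1.0, 2.0, 3.0, 4.0]
references = {
    "pre-B": [2.0, 4.0, 6.0, 8.0],
    "macrophage": [4.0, 3.0, 2.0, 1.0],
    "ESC": [1.0, 3.0, 2.0, 4.0],
}

correlations = {name: pearson(cell, profile) for name, profile in references.items()}
similarity = dict(zip(correlations, zscore(list(correlations.values()))))
```

Scaling within a cell makes scores comparable across cells whose overall correlation with the atlas differs, for example because of sequencing depth.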
Figure 1—figure supplement 4
Gene expression distribution of markers during reprogramming and transdifferentiation.

(a–b) Dot plots showing the distribution of gene expression of markers of different cell types at each time point during transdifferentiation (a) and of key pluripotency markers during reprogramming (b).

https://doi.org/10.7554/eLife.41627.006
Figure 2 with 3 supplements
Myc activity correlates with differences in single cell transdifferentiation and reprogramming trajectories.

(a) Distribution of gene expression similarity between single cells and reference bone marrow derived macrophages (Hutchins et al., 2017) (acquisition of macrophage state) during transdifferentiation. (b) Correlation between the Myc component and acquisition of macrophage state from (a); start and end time points were omitted to improve clarity (they are presented in Figure 2—figure supplement 1a). (c) Myc component at the various transdifferentiation time points. (d–f) Single cell trajectories of the B cell state (d), the GMP state (e) and the granulocyte state (f) related to the acquisition of the macrophage state during transdifferentiation. The cells at the respective time points are coloured according to Myc component levels. (g) Distribution of expression similarity between single cells and reference embryonic stem cells (ESCs) during reprogramming. (h) Correlation between the Myc component and acquisition of pluripotency from (g). (i) Myc component at the various reprogramming time points. (j–l) Single cell trajectories of the B cell state (j), GMP state (k) and inner cell mass state (l) related to the acquisition of the pluripotent state (ESCs) (see also Figure 3—figure supplement 1).

https://doi.org/10.7554/eLife.41627.007
Figure 2—figure supplement 1
Predicting the speed of transdifferentiation.

(a) Expression similarity of single cells with reference bone marrow derived macrophages 0 hr, 6 hr, 18 hr, 42 hr, 66 hr and 114 hr after C/EBPα induction. (b–g) Correlation between each independent component and the expression similarity defined in (a) at 0 hr (b), 6 hr (c), 18 hr (d), 42 hr (e), 66 hr (f) and 114 hr (g) after C/EBPα induction.

https://doi.org/10.7554/eLife.41627.008
Figure 2—figure supplement 2
Predicting the speed of reprogramming.

(a) Correlation between the Myc component and expression similarity of single cells with ESCs at 0 hr and 18 hr after C/EBPα induction and at D2, D4, D6 and D8 after OSKM induction. (b–g) Correlation between each independent component and the expression similarity defined in (a) at 0 hr (b) and 18 hr (c) after C/EBPα induction, and at D2 (d), D4 (e), D6 (f) and D8 (g) after OSKM induction.

https://doi.org/10.7554/eLife.41627.009
Figure 2—figure supplement 3
High Myc component correlates with faster route towards reprogramming also when factoring out Myc component and cell cycle components before the computation of the similarity score.

(a–b) Correlation between Myc component and expression similarity of single cells to reference ESCs (acquisition of pluripotency) at each time point during reprogramming, calculated after factoring out Myc component (a) and both Myc and cell cycle components (b) from both single cell and cell atlas gene expression data (see Materials and Methods). (c–e) Loss of the B cell (c), GMP (d), and monocyte (e) state in relation to acquisition of pluripotency (calculated as in a) at each time point during reprogramming. (f–h) Loss of the B cell (f), (g), and gain of inner-cell-mass-like state (h) and of a placenta-like state in relation to acquisition of pluripotency (calculated as in b) at each time point during reprogramming. Colours indicate the levels of Myc component.

https://doi.org/10.7554/eLife.41627.010
Figure 3 with 2 supplements
Two types of pre-B cells exhibit distinct cell conversion plasticities.

(a) Heatmap showing the expression of Myc target genes, G1/S and G2/M specific genes in the starting pre-B cells sorted by Myc component. (b) Pearson’s correlation between total mRNA molecules per cell and Myc component. (c) Similarity score of single cells binned by Myc component (bottom 20%, mid and top 20%) with reference large and small pre-BII cells. (d) Representative FACS plot of starting pre-B cells showing forward (FSC) and side scatter (SSC). (e) Representative FACS analysis of Myc levels detected in the 30% largest and the 30% smallest pre-B cell fractions. (f) FACS plots of myeloid marker (Mac-1) and B cell marker (CD19) expression during induced transdifferentiation of sorted large and small pre-BII cells. (g) Quantification of the results shown in (f) (n = 3 biological replicates, error bars indicate mean ± s.d. Statistical significance was determined using multiple t-tests with 1% false discovery rate). (h) Visualisation of iPSC-like colonies (stained by alkaline phosphatase) 12 days after OSKM induction of sorted large and small pre-BII cells. (i) Quantification of the results shown in (h) (n = 10 biologically independent samples (cell cultures) for large and n = 9 biologically independent samples (cell cultures) for small cells, with error bars indicating mean ± s.d. Statistical significance was determined using a two-tailed unpaired Student’s t-test). (j) Scatterplot showing the correlation between Myc expression (Jaitin et al., 2014) in different starting hematopoietic cell types (x-axis) and their corresponding (logit transformed) reprogramming efficiency (y-axis). GMP: granulocyte monocyte progenitor, CMP: common myeloid progenitor, CLP: common lymphoid progenitor, LT-HSC: long term hematopoietic stem cells, HSC-P: short term hematopoietic stem cells. (k) Correlation between Myc component and reprogramming efficiency in various somatic cell types, including the hematopoietic cells shown in (j).

https://doi.org/10.7554/eLife.41627.011
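Panel (j) above plots logit-transformed reprogramming efficiencies. As a short illustration, the logit maps a proportion p in (0, 1) to log(p / (1 - p)), which spreads out values near 0 and 1 before they are correlated with expression. The efficiencies in the sketch below are hypothetical and are not values from the paper.

```python
from math import log


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return log(p / (1.0 - p))


# Hypothetical reprogramming efficiencies expressed as fractions of cells.
efficiencies = {"type A": 0.5, "type B": 0.9, "type C": 0.1}
transformed = {name: logit(p) for name, p in efficiencies.items()}
```

An efficiency of exactly 0 or 1 has no finite logit, so such values would need separate handling before transformation.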
Figure 3—figure supplement 1
Experimental data relevant for Figure 3.

(a) Two-dimensional t-SNE dimensionality reduction of starting pre-B cells coloured by the level of Myc component. (b) Similarity score of single cells binned by Myc component (bottom 20%, mid and top 20%) with reference cycling and non-cycling pre-B cells (Painter et al., 2011). (c) Top: FACS plots from pre-B cells obtained from three separate mice, showing the distribution of cells by volume (FSC) and granularity (SSC). Bottom: Myc expression profiles obtained for large, intermediate and small cells (gated in the profiles on the top) after intracellular immunostaining and FACS analysis. (d) Cell proliferation analysis by FACS of uninduced pre-B cells by EdU incorporation for 2 hr. n = 3 biologically independent samples, error bars indicate mean ± s.d. (e) Monitoring cell volume and granularity during induced transdifferentiation of large and small pre-BII cells by SSC and FSC. n = 3 biologically independent samples, error bars indicate mean ± s.d. P-values are from t-tests, corrected for multiple testing using the false discovery rate (FDR). (f) Cell viability analysis by DAPI incorporation in small and large pre-B cells undergoing reprogramming. Data are represented relative to large pre-BII cells at day 3 of reprogramming, n = 3 biologically independent samples, error bars indicate mean ± s.d. P-values are from t-tests, corrected for multiple testing using the false discovery rate (FDR). (g) Number of AP+ iPSC colonies at day 12 of reprogramming, obtained from large and small pre-BII cells pre-treated with either 6 hr or 18 hr of C/EBPα induction. n = 3 biologically independent samples (cell cultures), error bars indicate mean ± s.d. P-values are from unpaired two-tailed t-tests.

https://doi.org/10.7554/eLife.41627.012
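Panel (b) above, like Figure 3c, compares cells binned by Myc component into bottom 20%, mid and top 20% groups. A minimal sketch of such rank-based binning follows. The function name `bin_by_score`, the handling of ties by sort order and the toy scores are assumptions made for illustration.

```python
def bin_by_score(scores, tail=0.2):
    """Label each score as 'bottom', 'mid' or 'top' by rank.

    The lowest and highest `tail` fraction of cells (rounded down)
    are labelled 'bottom' and 'top'; all other cells are 'mid'.
    """
    n = len(scores)
    k = int(n * tail)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = ["mid"] * n
    for i in order[:k]:
        labels[i] = "bottom"
    for i in order[n - k:]:
        labels[i] = "top"
    return labels


# Hypothetical Myc component scores for ten cells.
myc_component = [0.5, -1.2, 0.3, 2.0, 0.1, -0.4, 1.5, 0.0, 0.9, -0.8]
bins = bin_by_score(myc_component)
```

With ten cells and a 20% tail, two cells fall in each extreme bin and six in the middle bin.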
Figure 3—figure supplement 2
Gating strategies for FACS analyses.

(a) Gating strategy for Myc staining, corresponding to Figure 3e and Figure 3—figure supplement 1b. (b) Gating strategy for EdU incorporation, corresponding to Figure 3—figure supplement 1b. (c) Gating strategy for transdifferentiation, corresponding to Figure 3f.

https://doi.org/10.7554/eLife.41627.013

Tables

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Gene (Mus musculus) | cebpa | NA | Ensembl: ENSG00000245848 | |
| Strain, strain background (Mus musculus) | Pou5f1GFP transgenic mouse | Boiani et al., 2002 | NA | Strain: C57Bl/6 × DBA/2 |
| Strain, strain background (Mus musculus) | Gt(ROSA)26Sortm1(rtTA*M2)Jae Col1a1tm3(tetO-Pou5f1,-Sox2,-Klf4,-Myc)Jae/J | The Jackson Laboratory | Cat# 011004; RRID:IMSR_JAX:011004 | Strain: (C57BL/6 × 129S4/SvJae)F1 |
| Strain, strain background (Mus musculus) | Pou5f1-GFP OSKM-reprogrammable | Jaitin et al. (2014), Di Stefano et al. (2016) | NA | Strain: C57BL/6 × 129 |
| Cell line (Homo sapiens) | PlatE retroviral packaging cell line | Cell Biolabs | Cat# RV-101; RRID:CVCL_B488 | |
| Cell line (Mus musculus) | S17 stromal cell line | From Dr. Dorshkind, UCLA (Collins and Dorshkind, 1987) | RRID:CVCL_E226 | |
| Cell line (Mus musculus) | Mouse Embryonic Fibroblasts, Irradiated | GIBCO | Cat# A34180 | |
| Recombinant DNA reagent | pMSCV-Cebpa-IRES-hCD4 | Produced in-house (Bussmann et al., 2009) | NA | |
| Antibody | Mouse monoclonal APC Anti-human CD4 (RPA-T4) | BD Biosciences | Cat# 555349; RRID:AB_398593 | Dilution used = 1:400 |
| Antibody | Mouse monoclonal biotin anti-human CD4 (RPA-T4) | eBioscience | Cat# 13-0049; RRID:AB_466337 | Dilution used = 1:400 |
| Antibody | Rat monoclonal Anti-Mouse CD16/CD32 (Mouse BD Fc Block) | BD Biosciences | Cat# 553142; RRID:AB_394654 | Dilution used = 1:400 |
| Antibody | Rat monoclonal Pe-cy7 Anti-mouse CD19 (1D3) | BD Biosciences | Cat# 552854; RRID:AB_394495 | Dilution used = 1:400 |
| Antibody | Mouse monoclonal APC Anti-mouse CD11b (44) | BD Biosciences | Cat# 561015; RRID:AB_10561676 | Dilution used = 1:400 |
| Antibody | Rat monoclonal biotin Anti-mouse CD19 (1D3) | BD Biosciences | Cat# 553784; RRID:AB_395048 | Dilution used = 1:400 |
| Antibody | Rabbit monoclonal [Y69] to c-Myc | Abcam | Cat# ab32072; RRID:AB_731658 | Dilution used = 1:76 |
| Antibody | Goat Polyclonal Anti-Rabbit IgG H and L Alexa Fluor 647 | Life technologies | Cat# A32733; RRID:AB_2633282 | Dilution used = 1:2000 |
| Strain, strain background (Escherichia coli) | E. coli: BL21(DE3) Competent | New England Biolabs | Cat# C2527I | |
| Peptide, recombinant protein | Recombinant murine IL-7 | Peprotech | Cat# 217-17 | |
| Peptide, recombinant protein | Recombinant murine IL-4 | Peprotech | Cat# 214-14 | |
| Peptide, recombinant protein | Recombinant murine IL-15 | Peprotech | Cat# 210-15 | |
| Peptide, recombinant protein | ESGRO Recombinant mouse LIF protein | Merck Millipore | Cat# ESG1106 | |
| Commercial assay or kit | Click-IT EdU Cytometry assay kit | Invitrogen | Cat# C10425 | |
| Commercial assay or kit | miRNeasy mini kit | Qiagen | Cat# 217004 | |
| Commercial assay or kit | SYBR Green QPCR Master Mix | Applied Biosystems | Cat# 4309155 | |
| Commercial assay or kit | Alkaline Phosphatase Staining Kit II | Stemgent | Cat# 00-0055 | |
| Commercial assay or kit | High Capacity RNA-to-cDNA kit | Applied Biosystems | Cat# 4387406 | |
| Chemical compound, drug | 17β-estradiol | Merck Millipore | Cat# 3301 | |
| Chemical compound, drug | MEK inhibitor (PD0325901) | Selleckchem | Cat# S1036 | |
| Chemical compound, drug | Doxycycline hyclate | Sigma-Aldrich | Cat# D9891 | |
| Chemical compound, drug | L-Ascorbic Acid | Sigma-Aldrich | Cat# A92902 | |
| Chemical compound, drug | GSK3b inhibitor (CHIR-99021) | Selleckchem | Cat# S1263 | |
| Other | DMEM Medium | Gibco | Cat# 12491015 | |
| Other | RPMI 1640 Medium | Gibco | Cat# 12633012 | |
| Other | Knockout-DMEM | Gibco | Cat# 10829018 | |
| Other | Neurobasal Medium | Gibco | Cat# 21103049 | |
| Other | DMEM-F12 Medium | Gibco | Cat# 12634010 | |
| Other | Fetal Bovine Serum, E.U.-approved, South America origin | Gibco | Cat# 10270-106 | |
| Other | Embryonic stem-cell FBS, qualified, US origin | Gibco | Cat# 10270-106 | |
| Other | KnockOut Serum Replacement | Gibco | Cat# A3181502 | |
| Other | Pen Strep | Gibco | Cat# 15140122 | |
| Other | L-Glutamine (200 mM) | Gibco | Cat# 25030081 | |
| Other | Sodium Pyruvate (100 mM) | Gibco | Cat# 11360070 | |
| Other | MEM Non-Essential Amino Acids Solution (100X) | Gibco | Cat# 11140068 | |
| Other | 2-Mercaptoethanol | Invitrogen | Cat# 31350010 | |
| Other | N-2 Supplement (100X) | Gibco | Cat# 17502048 | |
| Other | B-27 Serum-Free Supplement (50X) | Gibco | Cat# 17504044 | |
| Other | TrypLE Express Enzyme (1X) | Gibco | Cat# 12605010 | |
| Other | Trypsin-EDTA (0.05%) | Gibco | Cat# 25300054 | |
| Other | MACS Streptavidin MicroBeads | Miltenyi Biotec | Cat# 130-048-101 | |
| Other | MACS LS magnetic columns | Miltenyi Biotec | Cat# 130-042-401 | |
| Software, algorithm | R | R Project for Statistical Computing, http://www.r-project.org/ | RRID:SCR_001905 | |

Additional files

Supplementary file 1

Gene cluster membership and gene loadings on each independent component for each detected gene.

The sign of cluster membership is positive if the gene has the highest absolute loading on the positive side of the component and negative if the highest absolute loading is on the negative side of the component.

https://doi.org/10.7554/eLife.41627.015
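The sign convention described above can be sketched in a few lines. The sketch assumes each gene is assigned to the component on which its absolute loading is largest, and that components are numbered from 1. Both points, and the name `signed_membership`, are assumptions for illustration, since the file description states only the sign rule.

```python
def signed_membership(loadings):
    """Return the signed, 1-based index of the component with the largest |loading|.

    loadings: one gene's loadings across all independent components.
    A positive result means the strongest loading is on the positive side.
    """
    best = max(range(len(loadings)), key=lambda k: abs(loadings[k]))
    component = best + 1
    return component if loadings[best] >= 0 else -component


# Hypothetical loadings of two genes on three components.
gene_a = signed_membership([0.1, -2.0, 0.5])   # strongest on component 2, negative side
gene_b = signed_membership([3.0, 1.0, -0.2])   # strongest on component 1, positive side
```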
Supplementary file 2

Total mRNA count, number of detected genes, and projection onto each independent component, for each single cell.

https://doi.org/10.7554/eLife.41627.016
Supplementary file 3

Fisher’s test based gene set enrichment analysis on Gene Ontology categories (biological process) for each gene cluster derived from ICA.

Includes odds ratios, p-values and FDR, number of genes associated with each category, number and names of genes included both in the cluster and in the category.

https://doi.org/10.7554/eLife.41627.017
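The quantities listed above (odds ratio, p-value, FDR) can be illustrated with a small standard-library sketch of a Fisher's exact test on a 2 × 2 table of cluster membership against category membership, followed by Benjamini-Hochberg adjustment. The one-sided (enrichment) alternative, the choice of gene universe, the function names and the toy counts are all assumptions; the file description does not specify them.

```python
from math import comb


def fisher_enrichment(overlap, cluster_size, category_size, universe):
    """Odds ratio and one-sided Fisher's exact p-value for over-representation."""
    a = overlap                                            # in cluster, in category
    b = cluster_size - overlap                             # in cluster only
    c = category_size - overlap                            # in category only
    d = universe - cluster_size - category_size + overlap  # in neither
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    # Hypergeometric upper tail: probability of an overlap at least this large.
    upper = min(cluster_size, category_size)
    tail = sum(
        comb(category_size, i) * comb(universe - category_size, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return odds_ratio, tail / comb(universe, cluster_size)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


# Hypothetical counts: 20 genes in total, a 5-gene cluster, a 5-gene category, 3 shared.
odds_ratio, p_value = fisher_enrichment(3, 5, 5, 20)
fdr = benjamini_hochberg([0.01, 0.04, 0.03])
```

In practice a statistics library would normally be used for this; the explicit version only makes the arithmetic visible.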
Supplementary file 4

Fisher’s test based gene set enrichment analysis on hallmark gene sets for each gene cluster derived from ICA.

Includes odds ratios, p-values and FDR, number of genes associated with each category, number and names of genes included both in the cluster and in the category.

https://doi.org/10.7554/eLife.41627.018
Supplementary file 5

Reprogramming efficiencies for different cell types and expression of Myc from Jaitin et al. (2014) and Myc component from the mouse cell type atlas.

https://doi.org/10.7554/eLife.41627.019
Supplementary file 6

Fisher’s test based gene set enrichment analysis on both GO and hallmark gene sets for genes differentially expressed with a fold change of at least 1.3 between adjacent time points during reprogramming and transdifferentiation.

Includes odds ratios, p-values and FDR, number of genes included in each category, number and names of genes included both in the cluster and in the category.

https://doi.org/10.7554/eLife.41627.020
Supplementary file 7

Fisher’s test based gene set enrichment analysis on both GO and hallmark gene sets for genes in the clusters shown in the heatmaps of Figure 1—figure supplement 3j–l.

https://doi.org/10.7554/eLife.41627.021
Transparent reporting form
https://doi.org/10.7554/eLife.41627.022

Cite this article

Mirko Francesconi, Bruno Di Stefano, Clara Berenguer, Luisa de Andrés-Aguayo, Marcos Plana-Carmona, Maria Mendez-Lago, Amy Guillaumet-Adkins, Gustavo Rodriguez-Esteban, Marta Gut, Ivo G Gut, Holger Heyn, Ben Lehner, Thomas Graf (2019)
Single cell RNA-seq identifies the origins of heterogeneity in efficient cell transdifferentiation and reprogramming
eLife 8:e41627.
https://doi.org/10.7554/eLife.41627