Overview of the study.

Proteomic characterization of exosomes derived from 285 cell lines from four studies.

(A) Overlapping proteins from four different studies of cell line-derived exosomes. (B) PCA plot of cancer and control cell line-derived exosomes. (C) Positivity for 8 commonly used exosomal protein biomarkers in various cell lines. The percentage of samples expressing each protein is shown in the boxes. Darker red indicates a higher percentage. (D) Annotation of the proteins detected in more than 90% of all samples. (E) GO and KEGG pathway enrichment analysis of the proteins detected in more than 90% of all samples. (F) Plasma membrane proteins detected in more than 90% of all samples.
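
The "detected in more than 90% of all samples" criterion used in panels D to F can be illustrated with a minimal sketch. The function name `frequent_proteins`, the strict "greater than" comparison, and the toy detection lists are assumptions made for illustration only; they are not taken from the study's data or code.

```python
def frequent_proteins(samples, threshold=0.9):
    """Return proteins detected in more than `threshold` of the samples.

    `samples` is a list with one collection of detected protein names per sample.
    """
    n_samples = len(samples)
    counts = {}
    for detected in samples:
        # Use a set so a protein is counted at most once per sample.
        for protein in set(detected):
            counts[protein] = counts.get(protein, 0) + 1
    return sorted(p for p, c in counts.items() if c / n_samples > threshold)


# Hypothetical toy data: 10 samples, not real measurements.
toy_samples = (
    [["CD9", "CD63", "ALB"]] * 5   # 5 samples with all three proteins
    + [["CD9", "CD63"]] * 4        # 4 samples without ALB
    + [["CD9"]]                    # 1 sample with CD9 only
)

# CD9 is in 10/10 samples, CD63 in 9/10 (not strictly above 90%), ALB in 5/10.
print(frequent_proteins(toy_samples))
```

With a strict threshold, a protein found in exactly 90% of samples is excluded; whether the boundary case should be included is a choice the legend does not specify.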

Proteomic characterization of exosomes derived from cell lines and tissues.

(A) Proteins detected at higher frequency in cancer cell line-derived exosomes. (B) Positivity for 11 commonly used exosomal protein biomarkers in various tissues. (C) Overlapping proteins (>90% frequency) between cell line- and tissue-derived exosomes. (D) Positivity for five plasma membrane proteins detected in more than 90% of both cell line- and tissue-derived exosomes.

Identification of the signature proteins of plasma- or serum-derived exosomes and evaluation of a random forest classifier.

(A) Overlapping exosome proteins detected in the plasma and serum of 205 cancer and 51 control samples from five studies. (B) Heat map of 46 overlapping exosome proteins in cancer and control plasma or serum samples. (C) AUROC score of the random forest classifier as a function of the number of protein features included. (D) Comparison of AUROC scores across different models. (E) Classification error matrix of the 75% training set using a random forest classifier for the 18 selected proteins. The number of samples is indicated in each box. (F) AUROC score of the random forest classifier trained using 75% of the dataset. Other metrics are indicated on the right. (G) Classification error matrix of the 25% testing set using a random forest classifier for the 18 selected proteins. The number of samples is indicated in each box. (H) AUROC score of the random forest classifier tested using 25% of the dataset. Other metrics are indicated on the right.
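
The two evaluation outputs named in panels E to H, a classification error matrix and an AUROC score, can be sketched as follows. This is a minimal pure-Python illustration, not the study's pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions, and in practice the scores would be class probabilities produced by a trained classifier (such as a random forest) on the held-out samples.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def error_matrix(labels, predictions):
    """2x2 matrix of sample counts: rows are true classes, columns are predicted."""
    matrix = [[0, 0], [0, 0]]
    for true, pred in zip(labels, predictions):
        matrix[true][pred] += 1
    return matrix


# Hypothetical held-out samples: 1 = cancer, 0 = control (made-up values).
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2]

# Assumed decision threshold of 0.5 on the predicted cancer probability.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

print(auroc(toy_labels, toy_scores))
print(error_matrix(toy_labels, toy_predictions))
```

The AUROC depends only on how the scores rank the samples, whereas the error matrix also depends on the chosen threshold, which is why the two are reported separately.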

Identification of signature proteins expressed by plasma- or serum-derived exosomes for classifying five common cancer types and evaluation of a random forest classifier.

(A) PCA plot of plasma- or serum-derived exosomes from five cancer types. (B) AUROC score of the random forest classifier as a function of the number of protein features included. (C, D) Classification error matrices of the 60% training set (C) and the 40% testing set (D) for classifying the five cancer types using a random forest classifier for the 5 selected proteins. The number of samples is indicated in each box. (E) Abundance of the five selected protein features in the five cancer types.

Identification of signature proteins expressed by urine-derived exosomes and evaluation of a random forest classifier.

(A) Overlapping exosome proteins detected in urine from 261 cancer and 124 control samples from four studies. (B) PCA plot of cancer and control urine-derived exosomes. (C) AUROC score of the random forest classifier as a function of the number of protein features included. (D) Abundance of the 17 selected protein features in cancer and control urine-derived exosomes. (E) Classification error matrix of the 75% training set using a random forest classifier for the 17 selected proteins. The number of samples is indicated in each box. (F) AUROC score of the random forest classifier trained using 75% of the dataset. Other metrics are indicated on the right. (G) Classification error matrix of the 25% testing set using a random forest classifier for the 17 selected proteins. The number of samples is indicated in each box. (H) AUROC score of the random forest classifier tested using 25% of the dataset. Other metrics are indicated on the right.