Diagnostic potential for a serum miRNA neural network for detection of ovarian cancer

  1. Kevin M Elias
  2. Wojciech Fendler
  3. Konrad Stawiski
  4. Stephen J Fiascone
  5. Allison F Vitonis
  6. Ross S Berkowitz
  7. Gyorgy Frendl
  8. Panagiotis Konstantinopoulos
  9. Christopher P Crum
  10. Magdalena Kedzierska
  11. Daniel W Cramer
  12. Dipanjan Chowdhury (corresponding author)
  1. Brigham and Women’s Hospital, Dana-Farber Cancer Institute, United States
  2. Harvard Medical School, United States
  3. Brigham and Women’s Hospital, United States
  4. Medical University of Lodz, Poland
  5. Dana-Farber Cancer Institute, United States
  6. Harvard School of Public Health, United States
10 figures, 6 tables and 12 additional files


Flowchart of study design.

(a) Protocol for miRNA sequencing, filtering, batch adjustment and separation into the training and testing sets. (b) Protocol for model development and testing.

Clinical performance characteristics of the tested models.

Sensitivity (blue bars) and specificity (orange bars) of the classifiers on the testing set, by variable selection method. Whiskers denote 95% confidence intervals. (a) Performance of models built on miRNAs selected with the significance-based filter. (b) Performance of models built on variables selected with the correlation-based feature subset (CFS) algorithm. (c) Performance of models built on variables selected with the fold change-based filter. The red arrow marks the model with the best performance characteristics: the neural network built on the fold change-based variable subset.
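
Each whisker is a 95% confidence interval on a proportion: true positives among cancers (sensitivity) or true negatives among non-cancers (specificity). The caption does not state which interval method was used; the sketch below assumes the Wilson score interval, one common choice, and the counts in the usage line are hypothetical.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Point estimates and Wilson intervals from a 2x2 confusion matrix."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }


# Hypothetical counts for illustration only (not taken from the study).
print(sensitivity_specificity(tp=20, fn=4, tn=18, fp=2))
```

Unlike the simple Wald interval, the Wilson interval stays within [0, 1] and behaves sensibly when a proportion is close to 0 or 1, which matters for small testing sets.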

ROC curves for the neural network analysis.

(a) Performance of the neural network on the training set using raw, non-batch-adjusted data (red line) and batch-adjusted data (black line). (b) Performance of the neural network on raw (red line) and batch-adjusted (black line) data in the testing set.
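
An ROC curve plots the true positive rate against the false positive rate as the threshold on the classifier's output is swept, and the AUC equals the probability that a randomly chosen cancer case scores higher than a randomly chosen non-cancer case. This plain-Python sketch, using hypothetical scores, computes both and checks that the trapezoidal area under the curve matches the pairwise (Mann-Whitney) estimate.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from thresholding at every distinct score, high to low."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def pairwise_auc(scores, labels):
    """Mann-Whitney estimate: P(case score > non-case score), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs (1 = cancer, 0 = benign/borderline/control).
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
print(trapezoid_auc(roc_points(scores, labels)), pairwise_auc(scores, labels))
```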

Figure 4
ROC curves for neural network analysis compared to CA-125.

The neural network (AUC 0.93; 95% CI 0.88–0.97) significantly outperformed CA-125 (AUC 0.74; 95% CI 0.65–0.83) in overall operating characteristics (p=0.001).

Figure 4—figure supplement 1
Correlations between the miRNAs (vertical axes) of the neural network and CA-125 (horizontal axes) in the cancer (red markers) and benign/borderline/control (blue markers) groups.

(a) miR-23b (b) miR-29a (c) miR-32 (d) miR-320d (e) miR-1246 (f) miR-92a (g) miR-150 (h) miR-200a (i) miR-335 (j) miR-1307 (k) miR-200c (l) miR-203a (m) miR-320c (n) miR-450b. None of the correlations were significant in either the training or the testing set.
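
The caption does not name the correlation coefficient. Because CA-125 values are highly skewed (note the standard deviations in Table 1), a rank-based Spearman correlation is a natural choice, and that is what this sketch assumes: it ranks each variable, averaging tied ranks, and takes the Pearson correlation of the ranks. The input values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired values: CA-125 (U/ml) and log10 expression of one miRNA.
ca125 = [12.0, 35.0, 88.0, 410.0, 1500.0]
mirna = [1.9, 2.3, 2.1, 2.8, 2.6]
print(round(spearman(ca125, mirna), 3))
```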

Figure 4—figure supplement 2
Performance of a two-tiered algorithm for ovarian cancer diagnosis incorporating both the neural network (NN) and a CA-125 cut-off of 35 U/ml.

Subjecting all negative neural network results to a second review with CA-125 would increase the false positive rate from 4.2% (5/120) to 19.2% (23/120) and the false negative rate from 5.8% (7/120) to 13.3% (16/120). If the tests were applied hierarchically, so that only samples classified as negative by the neural network were then examined with CA-125, three additional cases of invasive cancer would be identified, at the cost of 19 additional false positive results. FP, false positive; TP, true positive; FN, false negative; TN, true negative.
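
The hierarchical scheme can be written as a simple decision rule: a sample is positive if the neural network calls it positive, and neural network negatives are re-examined against the CA-125 cut-off of 35 U/ml. This sketch assumes that values at or above 35 U/ml count as positive and uses a hypothetical `Sample` record; it also reproduces the percentage arithmetic in the caption. Because positive network calls are never overturned, the rule can only turn negatives into positives, so it trades extra false positives for extra detected cancers.

```python
from dataclasses import dataclass

CA125_CUTOFF_U_PER_ML = 35.0  # cut-off named in the caption


@dataclass
class Sample:  # hypothetical record type for illustration
    nn_positive: bool  # neural network classification
    ca125: float       # serum CA-125, U/ml
    has_cancer: bool   # reference (histopathology) diagnosis


def two_tier_positive(s: Sample, cutoff: float = CA125_CUTOFF_U_PER_ML) -> bool:
    """NN positives stay positive; NN negatives are re-reviewed with CA-125 (>= cutoff assumed)."""
    return s.nn_positive or s.ca125 >= cutoff


def error_counts(samples, rule):
    """Return (false positives, false negatives) for a decision rule."""
    fp = sum(1 for s in samples if rule(s) and not s.has_cancer)
    fn = sum(1 for s in samples if not rule(s) and s.has_cancer)
    return fp, fn


def pct(k: int, n: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * k / n, 1)


# Rate arithmetic from the caption: counts out of 120 samples.
print(pct(5, 120), pct(23, 120), pct(7, 120), pct(16, 120))  # 4.2 19.2 5.8 13.3
```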

Specificity of miRNA signature for ovarian cancer compared to other diagnoses.

The 14-miRNA neural network signature did not separate any other diagnosis from the control group in the published dataset of Keller et al. (13), which also included 70 healthy controls. The number of subjects (n) denotes the number of cases of the given diagnosis in the Keller et al. dataset. (a) Pancreatic ductal cancer (n = 45); (b) Prostate cancer (n = 23); (c) Stomach cancer (n = 13); (d) Other pancreatic cancers (n = 48); (e) Melanoma (n = 35); (f) Lung cancer (n = 32); (g) Periodontitis (n = 18); (h) Pancreatitis (n = 38); (i) Multiple sclerosis (n = 23); (j) Acute MI (n = 20); (k) Chronic obstructive pulmonary disease (n = 24); (l) Sarcoidosis (n = 45). (m) Overall, the neural network was highly specific for ovarian cancer against all other diagnoses (healthy controls, other cancers and non-malignant diseases).

ROC curve for neural network analysis using qPCR inputs from the clinical test set.

Change in miRNA expression from the preoperative sample to postoperative day three after surgical cytoreduction.

n = 27.
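
Because the same patients were sampled before surgery and on postoperative day three, change is naturally measured within each patient. A minimal sketch, assuming expression values on the log10 scale and hypothetical inputs: it returns the per-patient differences, their mean, the corresponding fold change, and how many patients showed a decrease.

```python
import statistics


def paired_log10_change(pre, post):
    """Per-patient change (post minus pre) on the log10 scale, plus summary values."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and paired by patient")
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.fmean(diffs)
    return {
        "diffs": diffs,
        "mean_log10_change": mean_diff,
        "fold_change": 10 ** mean_diff,  # < 1 means lower after cytoreduction
        "n_decreased": sum(d < 0 for d in diffs),
    }


# Hypothetical log10 expression of one miRNA in three patients.
print(paired_log10_change(pre=[2.0, 2.4, 1.8], post=[1.5, 2.1, 1.9]))
```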

In situ expression of selected miRNAs from the serum signature.

Sections of fallopian tubes showing serous tubal intraepithelial carcinoma (STIC) lesions and stage I high-grade serous ovarian cancer (HGSOC). Lesional cells are identified by TP53 and Ki-67 staining. (Top) STIC lesion in continuity with normal fallopian tube, 20x. (Middle) STIC lesion in continuity with normal fallopian tube and invasive cancer with a p53-null lesion, 10x. (Bottom) HGSOC intraluminal to the fallopian tube, 10x.

Principal component analysis identified a prominent batch effect among the study populations.

(Left) Before batch effect removal. (Right) After batch effect removal using ComBat. ERASMOS, Effects of Regional Analgesia on Serum miRNA after Oncology Surgery Study; PMP, Pelvic Mass Protocol; NECC, New England Case Control study.
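
ComBat removes batch effects by estimating, for each miRNA, an additive and a multiplicative batch effect and shrinking those estimates with an empirical Bayes prior. The sketch below shows only the underlying location/scale idea for a single miRNA, without the empirical Bayes step or covariate protection, so it is a simplification and not ComBat itself. When the case mix differs between batches, as it does between ERASMOS and PMP/NECC in Table 1, a naive adjustment like this can also remove genuine biological differences, which is one reason ComBat allows biological covariates to be preserved.

```python
import math
import statistics
from collections import defaultdict


def location_scale_adjust(values, batches):
    """Align each batch's mean and SD for one feature (simplified; no empirical Bayes)."""
    groups = defaultdict(list)
    for i, b in enumerate(batches):
        groups[b].append(i)
    grand_mean = statistics.fmean(values)
    # Pooled within-batch SD: deviations are taken from each batch's own mean.
    within_ss = 0.0
    batch_stats = {}
    for b, idx in groups.items():
        vals = [values[i] for i in idx]
        m = statistics.fmean(vals)
        sd = statistics.pstdev(vals)
        batch_stats[b] = (m, sd if sd > 0 else 1.0)
        within_ss += sum((v - m) ** 2 for v in vals)
    pooled_sd = math.sqrt(within_ss / len(values))
    adjusted = [0.0] * len(values)
    for b, idx in groups.items():
        m, sd = batch_stats[b]
        for i in idx:
            adjusted[i] = (values[i] - m) / sd * pooled_sd + grand_mean
    return adjusted


# Hypothetical log10 expression of one miRNA in two batches with a mean shift.
vals = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
print(location_scale_adjust(vals, ["A", "A", "A", "B", "B", "B"]))
```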

Hierarchical clustering of the eleven statistically significant miRNAs identified using univariate analysis.

While most of the patients with cancer clustered together, considerable heterogeneity was evident, and no clear separation of the groups could be achieved using any single miRNA.
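
Agglomerative hierarchical clustering repeatedly merges the two closest groups of samples; a heatmap dendrogram records that merge order. The caption does not state the distance or linkage used, so this sketch assumes Euclidean distance between samples' miRNA profiles and average linkage, and stops once a requested number of clusters remains. The profiles are hypothetical, and real analyses would normally use an optimized library routine (for example SciPy's `scipy.cluster.hierarchy.linkage`) rather than this naive loop.

```python
import math
from itertools import combinations


def average_linkage_clusters(points, n_clusters):
    """Agglomerative clustering with average linkage on Euclidean distance (naive)."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            # Average of all pairwise distances between members of the two clusters.
            d = sum(math.dist(points[a], points[b]) for a in ci for b in cj) / (len(ci) * len(cj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair (i < j)
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical profiles of four samples over two miRNAs.
profiles = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(average_linkage_clusters(profiles, n_clusters=2))  # [[0, 1], [2, 3]]
```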



Table 1
Demographics of patients in the model study populations.

| Characteristic | ERASMOS (n = 60) | PMP/NECC (n = 119*) | p-value |
|---|---|---|---|
| Age, years, median (SD) | 57 (9.8) | 56 (7.1) | 0.44† |
| CA-125, units/ml, median (SD) | 155 (689.8) | 88.1 (1335.5) | 0.72† |
| Histology, n (%) | | | <0.0001‡ |
| Control | 0 (0) | 15 (12.6) | |
| Serous cystadenoma/cystadenofibroma | 7 (11.7) | 14 (11.8) | |
| Endometrioma | 0 (0) | 15 (12.6) | |
| Other benign lesion | 9 (15.0) | 0 (0) | |
| Borderline mucinous tumor | 2 (3.3) | 0 (0) | |
| Borderline serous tumor | 5 (8.3) | 15 (12.6) | |
| Stage I/II serous adenocarcinoma | 5 (8.3) | 20 (16.8) | |
| Stage III/IV serous adenocarcinoma | 19 (31.7) | 10 (8.4) | |
| Stage I/II clear cell/endometrioid adenocarcinoma | 6 (10.0) | 20 (16.8) | |
| Stage III/IV clear cell/endometrioid adenocarcinoma | 0 (0) | 10 (8.4) | |
| Mucinous adenocarcinoma | 1 (1.7) | 0 (0) | |
| Other ovarian cancer | 6 (10.0) | 0 (0) | |
| Stage, n (%) | | | <0.0001‡ |
| Not applicable | 16 (26.7) | 59 (49.6) | |
| I | 9 (15.0) | 22 (18.5) | |
| II | 8 (13.3) | 18 (15.1) | |
| III | 19 (31.7) | 18 (15.1) | |
| IV | 8 (13.3) | 2 (1.7) | |
| Grade, n (%) | | | 0.07‡ |
| Not applicable | 16 (26.7) | 44 (37.0) | |
| Borderline | 7 (11.7) | 15 (12.6) | |
| 1 (well differentiated) | 6 (10.0) | 12 (10.1) | |
| 2 (moderately differentiated) | 3 (5.0) | 12 (10.1) | |
| 3 (poorly differentiated) | 28 (46.7) | 36 (30.3) | |

ERASMOS, Effects of Regional Analgesia on Serum miRNA after Oncology Surgery Study; PMP, Pelvic Mass Protocol; NECC, New England Case Control study.

*15 samples from NECC, 114 samples from PMP.

†Student's t-test.

‡Chi-square test.
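
The categorical comparisons in Table 1 use a chi-square test. For an r x c table of counts, the statistic sums (observed - expected)^2 / expected over all cells, with each expected count computed from its row and column totals, and it is referred to a chi-square distribution with (r - 1)(c - 1) degrees of freedom. This sketch computes only the statistic and degrees of freedom, on hypothetical counts; the table does not say whether any continuity correction or category pooling was applied, and in practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2 x 2 table (rows: cohorts, columns: categories).
print(chi_square_statistic([[10, 20], [20, 10]]))
```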

Table 2
Demographics of patients after stratified random sampling into training and testing sets.

| Characteristic | Training set (n = 135) | Testing set (n = 44) | p-value |
|---|---|---|---|
| Age, years, median (SD) | 56 (8.1) | 56 (8.3) | 1.0* |
| CA-125, units/ml, median (SD) | 126.5 (1193.5) | 105.6 (577.8) | 0.91* |
| Pathology, n (%) | | | 1.0† |
| Control | 11 (8.1) | 4 (9.1) | |
| Benign lesions | 34 (25.2) | 11 (25.0) | |
| Borderline tumors | 16 (11.9) | 5 (11.4) | |
| Stage I/II invasive cancers | 41 (30.4) | 12 (27.3) | |
| Stage III/IV invasive cancers | 33 (24.4) | 12 (27.3) | |

*Student's t-test.

†Chi-square test.
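
Stratified random sampling shuffles samples within each diagnostic stratum and assigns the same fraction of every stratum to the testing set, which keeps the case mix of the two sets close (reflected in the p-value of 1.0 for pathology above). A minimal standard-library sketch; the strata labels, test fraction and seed are hypothetical, and because per-stratum counts are rounded it is not claimed to reproduce the exact split used in the study.

```python
import random
from collections import defaultdict


def stratified_split(strata, test_fraction, seed=0):
    """Return (train_indices, test_indices), sampling test_fraction of each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, label in enumerate(strata):
        by_stratum[label].append(idx)
    train, test = [], []
    for label in sorted(by_stratum):
        members = by_stratum[label][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical strata: 8 benign and 4 stage III/IV samples, 25% held out.
labels = ["benign"] * 8 + ["stage III/IV"] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=1)
print(len(train_idx), len(test_idx))  # 9 3
```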

Table 3
miRNA variables used in model building identified through univariate testing
| Significance-based selection | Correlation-based feature subset selection | Expression fold change selection |
|---|---|---|
| miR-200a-3p | miR-200c-3p | miR-32-5p |
| miR-320d | miR-320d | miR-150-5p |
| miR-486-3p | | miR-320c |
| miR-1307-5p | | miR-335-5p |
| | | miR-1307-5p |
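
A fold change-based filter keeps miRNAs whose average expression differs between cancer and non-cancer samples by at least some factor. The threshold used in the study is not given in this table, so `min_fold=1.5` below is a hypothetical value, as are the miRNA names; the sketch assumes log10-transformed expression, so the fold change is 10 raised to the difference in group means.

```python
import statistics


def fold_change_filter(expr, is_cancer, min_fold=1.5):
    """Keep miRNAs whose cancer/non-cancer fold change is >= min_fold or <= 1/min_fold.

    expr maps miRNA name -> list of log10 expression values, one per sample.
    """
    selected = {}
    for name, values in expr.items():
        cancer = [v for v, y in zip(values, is_cancer) if y]
        other = [v for v, y in zip(values, is_cancer) if not y]
        fold = 10 ** (statistics.fmean(cancer) - statistics.fmean(other))
        if fold >= min_fold or fold <= 1 / min_fold:
            selected[name] = fold
    return selected


# Hypothetical data: two cancers followed by two non-cancer samples.
expr = {"miR-A": [1.0, 1.2, 0.1, 0.0], "miR-B": [0.50, 0.52, 0.49, 0.51]}
print(fold_change_filter(expr, [True, True, False, False]))
```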
Table 4
Performance of the eleven statistical models on the testing set, by variable selection method. Values are AUC (95% CI).

| Statistical model | Significance-based variable subset | Correlation-based feature selection subset | Fold change-based variable subset |
|---|---|---|---|
| Linear discriminant analysis | 0.80 (0.66–0.93) | 0.76 (0.62–0.90) | 0.78 (0.64–0.92) |
| Logistic regression | 0.81 (0.68–0.94) | 0.75 (0.61–0.90) | 0.82 (0.70–0.94) |
| Neural network | 0.84 (0.72–0.96) | 0.75 (0.60–0.89) | 0.90 (0.81–0.99) |
| Support vector machine | 0.77 (0.63–0.91) | 0.73 (0.58–0.87) | 0.77 (0.63–0.91) |
| Multivariate adaptive regression splines | 0.57 (0.40–0.74) | 0.66 (0.49–0.82) | 0.73 (0.58–0.88) |
| Naive Bayes classifier | 0.75 (0.60–0.89) | 0.68 (0.52–0.84) | 0.75 (0.60–0.89) |
| Least absolute deviation regression tree | 0.77 (0.63–0.91) | 0.61 (0.44–0.78) | 0.69 (0.53–0.84) |
| Functional tree | 0.78 (0.64–0.91) | 0.77 (0.63–0.91) | 0.68 (0.52–0.84) |
| Bayesian network | 0.72 (0.56–0.87) | 0.67 (0.52–0.83) | 0.72 (0.56–0.87) |
| Random forest | 0.78 (0.64–0.91) | 0.71 (0.56–0.86) | 0.76 (0.62–0.90) |
| Elastic net | 0.80 (0.67–0.93) | 0.76 (0.62–0.90) | 0.79 (0.66–0.92) |
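
The AUC confidence intervals in Table 4 appear consistent with the Hanley and McNeil standard error for an AUC, assuming the testing set contains 24 invasive cancers and 20 benign, borderline or control samples as in Table 2: an AUC of 0.90 gives 0.81–0.99 and 0.84 gives 0.72–0.96. The method is not stated in the table, so treat this as a reconstruction rather than the authors' procedure; a few intervals differ by 0.01, consistent with rounding of the displayed AUCs (0.75 appears with both 0.60–0.89 and 0.61–0.90).

```python
import math


def hanley_mcneil_ci(auc: float, n_pos: int, n_neg: int, z: float = 1.96):
    """Normal-approximation CI for an AUC using the Hanley-McNeil standard error."""
    q1 = auc / (2 - auc)            # P(two random cases both outrank one non-case)
    q2 = 2 * auc * auc / (1 + auc)  # P(one random case outranks two non-cases)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Assumed testing-set composition: 24 cancers, 20 benign/borderline/control.
for auc in (0.90, 0.84):
    lo, hi = hanley_mcneil_ci(auc, n_pos=24, n_neg=20)
    print(f"{auc:.2f} ({lo:.2f}-{hi:.2f})")
```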
Table 5
Clinical characteristics of the qPCR model set.
| Characteristic | qPCR model set (N = 325) |
|---|---|
| Age, years, median (SD) | 58.0 (10.1) |
| Grade, n (%) | |
| Borderline | 15 (4.6) |
| 1 | 21 (6.4) |
| 2 | 27 (8.3) |
| 3 | 100 (30.8) |
| Unspecified | 10 (3.1) |
| Not applicable | 150 (46.2) |
| FIGO stage, n (%) | |
| I/II | 75 (23.1) |
| III/IV | 83 (25.5) |
| Not applicable | 167 (51.4) |
| Histology, n (%) | |
| Control | 123 (37.8) |
| Serous cystadenoma/cystadenofibroma | 14 (4.3) |
| Endometrioma | 15 (4.6) |
| Borderline serous tumor | 15 (4.6) |
| Serous adenocarcinoma | 100 (30.8) |
| Endometrioid/clear cell adenocarcinoma | 48 (14.8) |
| Mucinous adenocarcinoma | 10 (3.1) |
Table 6
Clinical characteristics of the external validation set.
| Characteristic | Polish external validation set (N = 51) |
|---|---|
| Age, years, median (SD) | 55.5 (16.1) |
| Grade, n (%) | |
| Borderline | 4 (7.8) |
| 1 | 2 (3.9) |
| 2 | 7 (13.7) |
| 3 | 13 (25.5) |
| Unspecified | 3 (5.9) |
| Benign | 22 (43.1) |
| FIGO stage, n (%) | |
| I | 7 (13.7) |
| II | 3 (5.9) |
| III | 18 (35.3) |
| IV | 1 (2.0) |
| Benign | 22 (43.1) |
| Histology, n (%) | |
| Serous cystadenoma/cystadenofibroma | 6 (11.8) |
| Endometrioma/endometriosis | 10 (19.6) |
| Mature teratoma | 6 (11.8) |
| Borderline serous tumor | 2 (3.9) |
| Borderline seromucinous tumor | 2 (3.9) |
| Serous adenocarcinoma | 4 (7.8) |
| Mucinous adenocarcinoma | 1 (2.0) |
| Endometrioid adenocarcinoma | 1 (2.0) |
| Clear cell adenocarcinoma | 9 (17.6) |
| Mixed adenocarcinoma | 3 (5.9) |
| Adenocarcinoma, unspecified | 7 (13.7) |

Additional files

Source code 1

miRNA-seq neural network source code.

Source code 2

qPCR 14-miRNA neural network source code.

Source code 3

qPCR 7-miRNA neural network source code.

Source code 4

Neural network applied to the Keller et al. dataset.

Supplementary file 1

Performance of the various prediction models on the unadjusted datasets.

(A) Area under the ROC curve for each classifier, by variable selection protocol, using data without batch adjustment. As with the batch-adjusted data, the neural network built on the fold change-based variable subset outperformed the other methods in classification accuracy and did not overfit to the training set. (B) Individual sample predictions of the tested classification models built on the unadjusted fold change-based miRNA subset.

Supplementary file 2

Post-hoc secondary analyses of the neural network.

(A) Misclassification matrices for the neural network and CA-125 predictions with detailed histopathological data. (B) Misclassification matrices for the neural network stratified by age. (C) miRNA expression by tumor histology and stage.

Supplementary file 3

Characteristics of CA-125 levels in the study populations.

(A) Serum CA-125 measurements among cancer and non-cancer cases in the two study populations. (B) Relationship between CA-125 and the miRNAs in the neural network.

Supplementary file 4

Comparison of the neural network to existing datasets.

(A) Mapping of the 14-miRNA dataset from the miRNA-sequencing study onto the GSE31568 dataset published by Keller, et al. (B) Comparison of the neural network (NN) classifier with the tissue-based MiROvaR signature by Bagnoli et al.

Supplementary file 5

Univariate comparison of miRNA average expression values between patients with cancer and patients in the benign/borderline/control group.

Supplementary file 6

Supplementary datasets.

  1. TPM data from miRNA sequencing.
  2. Batch-adjusted, log10-transformed miRNA expression data, filtered for miRNA detection levels in both cohorts.
  3. qPCR validation of the neural network.
  4. Raw qPCR data.
  5. Background-filtered qPCR data.
  6. Normalized qPCR data.
  7. Normalized expression data from the Keller et al. dataset.
  8. Normalized expression data of preoperative and postoperative miRNA expression.
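
Supplementary Dataset 2 is described as batch-adjusted, log10-transformed and filtered for detection in both cohorts. The filtering and transformation steps can be sketched as follows; the detection threshold (1 TPM), the required fraction of samples (50%) and the pseudocount of 1 added before taking log10 are all hypothetical choices rather than values reported here, the miRNA names are placeholders, and batch adjustment is left out.

```python
import math


def filter_and_log10(tpm, cohort, min_tpm=1.0, min_fraction=0.5, pseudocount=1.0):
    """Keep miRNAs detected (TPM >= min_tpm) in at least min_fraction of samples
    in every cohort, then return log10(TPM + pseudocount) values."""
    cohorts = sorted(set(cohort))
    kept = {}
    for name, values in tpm.items():
        detected_everywhere = all(
            sum(v >= min_tpm for v, c in zip(values, cohort) if c == k)
            / sum(1 for c in cohort if c == k)
            >= min_fraction
            for k in cohorts
        )
        if detected_everywhere:
            kept[name] = [math.log10(v + pseudocount) for v in values]
    return kept


# Hypothetical TPM values for two miRNAs in two cohorts of two samples each.
tpm = {"miR-A": [9.0, 99.0, 9.0, 99.0], "miR-B": [9.0, 9.0, 0.0, 0.0]}
print(filter_and_log10(tpm, ["PMP", "PMP", "ERASMOS", "ERASMOS"]))
```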

Transparent reporting form
Reporting standard 1


Citation: Elias KM, Fendler W, Stawiski K, Fiascone SJ, Vitonis AF, Berkowitz RS, Frendl G, Konstantinopoulos P, Crum CP, Kedzierska M, Cramer DW, Chowdhury D. Diagnostic potential for a serum miRNA neural network for detection of ovarian cancer. eLife 6:e28932.