Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife’s peer review process.

Editors
- Reviewing Editor: Alan Talevi, National University of La Plata, La Plata, Argentina
- Senior Editor: Eduardo Franco, McGill University, Montreal, Canada
Reviewer #1 (Public Review):
Summary:
This is a large cohort of ischemic stroke patients from a single centre. The authors successfully set up predictive models for PSE.
Strengths:
The design and implementation of the study are acceptable, and the results are credible. It may provide evidence for seizure prevention in the field of stroke treatment.
Weaknesses:
The methodology needs further consideration. The Discussion needs extensive rewriting.
Reviewer #2 (Public Review):
Summary
The authors present multiple machine-learning methodologies to predict post-stroke epilepsy (PSE) from admission clinical data.
Strengths
The Statistical Approach section is very well written. The approaches used in this section are very sensible for the data in question.
Weaknesses
There are many typos and unclear statements throughout the paper.
There are some issues with the SHAP interpretation. SHAP, in its default form, does not provide robust statistical guarantees of effect size. There is a claim that "SHAP analysis showed that white blood cell count had the greatest impact among the routine blood test parameters"; this is a difficult claim to make from SHAP values alone.
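One way to support a comparative claim of this kind, offered here only as a minimal sketch rather than the authors' method, would be to bootstrap the mean absolute SHAP values and report a confidence interval per feature (the fitted tree-based model `model` and feature DataFrame `X` below are hypothetical):

```python
import numpy as np
import shap

# Hypothetical fitted tree-based classifier `model` and feature DataFrame `X`.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)      # shape: (n_samples, n_features)
if isinstance(shap_values, list):           # some binary classifiers return [class0, class1]
    shap_values = shap_values[1]

rng = np.random.default_rng(0)
n_boot = 1000
boot_means = np.empty((n_boot, shap_values.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, shap_values.shape[0], shap_values.shape[0])
    boot_means[b] = np.abs(shap_values[idx]).mean(axis=0)

low, high = np.percentile(boot_means, [2.5, 97.5], axis=0)
for name, lo, hi in zip(X.columns, low, high):
    print(f"{name}: mean |SHAP| 95% CI [{lo:.3f}, {hi:.3f}]")
```

If the interval for white blood cell count overlapped those of the other blood parameters, ranking it as having "the greatest impact" would not be well supported.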
The Data Collection section is very poorly written, and the methodology is not clear.
There is no information about hyperparameter selection for models or whether a hyperparameter search was performed. Given this, it is difficult to conclude whether one machine learning model performs better than others on this task.
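As a purely illustrative sketch (not the authors' pipeline), even a simple, reported grid search per model would make the comparison interpretable; the estimator and grid below are hypothetical:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative search space; the actual grids and estimators should be reported in the paper.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [3, 5, 10, None],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)   # hypothetical admission features and PSE labels
print(search.best_params_, search.best_score_)
```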
The inclusion and exclusion criteria are unclear - how many patients were excluded and for what reasons?
There is no sensitivity analysis of the SMOTE methodology: How many synthetic data points were created, and how does the number of synthetic data points affect classification accuracy?
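A minimal sketch of such a sensitivity analysis (assuming imbalanced-learn's SMOTE and hypothetical splits `X_train`, `y_train`, `X_test`, `y_test`) would sweep the resampling ratio and report the number of synthetic points and the resulting performance at each setting:

```python
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Sweep the minority:majority ratio after resampling; ratios below the original
# class ratio are invalid, so start above it.
for ratio in [0.2, 0.4, 0.6, 0.8, 1.0]:
    X_res, y_res = SMOTE(sampling_strategy=ratio, random_state=0).fit_resample(X_train, y_train)
    n_synthetic = len(y_res) - len(y_train)
    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"ratio={ratio}: synthetic points={n_synthetic}, test AUC={auc:.3f}")
```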
Did the authors achieve their aims? Do the results support their conclusions?
The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage.
The authors claim that their models can predict PSE. To support this claim, it would be helpful to see more information on out-of-distribution generalisation performance. There is limited reporting on the external validation cohort relative to the reporting on the training and test data.
For greater certainty in all reported results, it would be most appropriate to perform n-fold cross-validation and report mean scores and confidence intervals across the cross-validation splits.
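For instance, a minimal sketch of such reporting with scikit-learn (the estimator `model`, features `X`, and labels `y` are hypothetical; stratified 10-fold splitting and AUC are illustrative choices):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)

mean = scores.mean()
# Normal-approximation 95% CI across folds; a t-interval would be slightly wider for 10 folds.
half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
print(f"AUC = {mean:.3f} (95% CI {mean - half_width:.3f} to {mean + half_width:.3f})")
```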
The likely impact of the work on the field
If this model works as claimed, it will be useful for predicting PSE. This has some direct clinical utility.
Analysis of features contributing to PSE may provide clinical researchers with ideas for further research on the underlying aetiology of PSE.
Additional context that might help readers
The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.
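As a hedged illustration of what such an explanation could cover (the `explainer`, `shap_values`, and `X` names below are hypothetical): in a force plot, red features push the prediction above the model's base value and blue features push it below, with bar length proportional to the SHAP value; a decision plot traces how the prediction accumulates feature by feature from the base value.

```python
import shap

shap.initjs()  # enables the interactive force plot in a notebook
# Force plot for one illustrative patient (row 0): base value, per-feature contributions, feature values.
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0])
# Decision plot for the same patient, showing the cumulative path from the base value to the prediction.
shap.decision_plot(explainer.expected_value, shap_values[0], X.iloc[0])
```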
Reviewer #3 (Public Review):
Summary:
The authors report the performance of a series of machine learning models for predicting the risk of post-stroke epilepsy, inferred from a large-scale dataset and externally validated with an independent cohort of patients. Some of the reported models show very good explanatory and predictive performance.
Strengths:
The models have been derived from real-world large-scale data.
The best-performing models show very good results in the external validation.
Early prediction of the risk of post-stroke epilepsy would be of high interest for implementing early therapeutic interventions that could improve prognosis.
Weaknesses:
There are issues with the readability of the paper. Many abbreviations are not introduced properly and are sometimes written inconsistently. Many relevant references are omitted. The methodological descriptions are extremely brief and sometimes incomplete.
The dataset is not disclosed, nor is the code (although the code is made available upon request). For the sake of reproducibility, unless bioethical concerns preclude it, it would be good to have these data disclosed.
Although the external validation is appreciated, cross-validation to check the robustness of the models would also be welcome.