Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study

  1. Hang Liu
  2. Zhuoran Zhang
  3. Yifan Gu
  4. Changsheng Dai
  5. Guanqiao Shan
  6. Haocong Song
  7. Daniel Li
  8. Wenyuan Chen
  9. Ge Lin (corresponding author)
  10. Yu Sun (corresponding author)
  1. Department of Mechanical Engineering, University of Toronto, Canada
  2. School of Science and Engineering, The Chinese University of Hong Kong-Shenzhen, China
  3. Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, China
  4. Reproductive and Genetic Hospital of CITIC-Xiangya, China
  5. Department of Electrical and Computer Engineering, Canada
  6. Key Laboratory of Reproductive and Stem Cell Engineering, National Health and Family Planning Commission, China
  7. National Engineering Research Center of Human Stem Cells, China
  8. Institute of Biomedical Engineering, University of Toronto, Canada
  9. Department of Computer Science, University of Toronto, Canada

Abstract

Background:

In infertility treatment, blastocyst morphological grading is commonly used in clinical practice for blastocyst evaluation and selection, but has shown limited predictive power on live birth outcomes of blastocysts. To improve live birth prediction, a number of artificial intelligence (AI) models have been established. Most existing AI models for blastocyst evaluation only used images for live birth prediction, and the area under the receiver operating characteristic (ROC) curve (AUC) achieved by these models has plateaued at ~0.65.

Methods:

This study proposed a multimodal blastocyst evaluation method using both blastocyst images and patient couple’s clinical features (e.g., maternal age, hormone profiles, endometrium thickness, and semen quality) to predict live birth outcomes of human blastocysts. To utilize the multimodal data, we developed a new AI model consisting of a convolutional neural network (CNN) to process blastocyst images and a multilayer perceptron to process patient couple’s clinical features. The data set used in this study consists of 17,580 blastocysts with known live birth outcomes, blastocyst images, and patient couple’s clinical features.

Results:

This study achieved an AUC of 0.77 for live birth prediction, which significantly outperforms related works in the literature. Sixteen out of 103 clinical features were identified to be predictors of live birth outcomes and helped improve live birth prediction. Among these features, maternal age, the day of blastocyst transfer, antral follicle count, retrieved oocyte number, and endometrium thickness measured before transfer are the top five features contributing to live birth prediction. Heatmaps showed that the CNN in the AI model mainly focuses on image regions of inner cell mass and trophectoderm (TE) for live birth prediction, and the contribution of TE-related features was greater in the CNN trained with the inclusion of patient couple's clinical features compared with the CNN trained with blastocyst images alone.

Conclusions:

The results suggest that the inclusion of patient couple’s clinical features along with blastocyst images increases live birth prediction accuracy.

Funding:

Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Program.

Editor's evaluation

This article provides important findings that have practical implications for reproductive medicine and would be of interest to IVF specialists. Based on the compelling strength of evidence, the study demonstrates significant results in improving the predictive value of the live birth model based on blastocyst evaluation and clinical features.

https://doi.org/10.7554/eLife.83662.sa0

eLife digest

More than 50 million couples worldwide experience infertility. The most common treatment is in vitro fertilization (IVF). Fertility specialists collect eggs and sperm from the prospective parents. They combine the egg and sperm in a laboratory and allow the fertilized eggs to develop for five days into a multi-celled blastocyst. Then, the specialists select the healthiest blastocysts and return them to the patient's uterus.

Since 1978, more than 8 million children have been conceived through IVF. Yet, only about 30% of IVF attempts result in a successful birth. As a result, fertility patients often undergo multiple rounds of IVF, which can be expensive and emotionally draining. Several factors determine IVF success, one of which is the health of the blastocysts selected for transfer to the uterus. Specialists select the blastocysts using several criteria. But these human assessments are subjective and inconsistent in predicting which ones are most likely to result in a successful birth. Recent studies suggest artificial intelligence technology may help select blastocysts.

Liu et al. show that using artificial intelligence to assess both blastocysts and the characteristics of fertility patients leads to more accurate predictions of which blastocysts are likely to result in a successful birth. In the experiments, the researchers trained an artificial intelligence computer program using pictures of 17,580 blastocysts with known birth outcomes together with the parents' clinical characteristics. The analysis identified 16 parental factors associated with birth outcomes. The top 5 most predictive parental factors were maternal age, the day of blastocyst transfer to the uterus, the number of egg-containing follicles visible in the ovaries, the number of eggs retrieved, and the thickness of the uterus lining. The program predicted successful births more accurately than the artificial intelligence models reported in previous studies.

Artificial intelligence-aided blastocyst selection using patient and blastocyst characteristics may improve IVF success rates and reduce the number of treatment cycles that patient couples undergo. Before specialists can use artificial intelligence in their clinics, confirmatory clinical studies that enroll patient couples are needed to compare conventional selection methods with artificial intelligence.

Introduction

Infertility is a global health issue, affecting more than 50 million couples worldwide (Mascarenhas et al., 2012). Since the birth of the first in vitro fertilization (IVF) child in 1978, more than 8 million children have been born through IVF treatment (Adamson et al., 2018). Among the various factors contributing to IVF outcomes, the quality of the blastocyst (day 5 embryo) selected for transfer is critical to the success of IVF treatment. Manual grading of blastocyst development stage, inner cell mass (ICM), and trophectoderm (TE) remains the most common method for blastocyst evaluation. Although morphological grading is widely used in clinical practice, the grades of development stage, ICM, and TE have shown limited predictive power for clinical outcomes (Seli et al., 2011; Reignier et al., 2019; Bartolacci et al., 2021; Ueno et al., 2021; Xiong et al., 2022). Features that accurately predict the clinical outcomes of blastocysts are therefore needed.

To achieve this goal, the convolutional neural network (CNN) is expected to play a critical role. A CNN automatically learns discriminative features from images and has become the state-of-the-art method for many medical imaging tasks, such as lung cancer prediction (Ardila et al., 2019), breast cancer prediction (McKinney et al., 2020), and diabetic retinopathy screening (Bora et al., 2021). To apply a CNN to predicting the clinical outcome of a blastocyst, images of blastocysts with known clinical outcomes (e.g., pregnancy and live birth) are collected to develop the CNN model. The area under the receiver operating characteristic (ROC) curve (AUC) is the most commonly used metric for evaluating and comparing machine learning models that predict clinical outcomes of blastocysts (Kragh and Karstoft, 2021a). The AUCs reported in the literature using CNNs to predict clinical outcomes from blastocyst images range from 0.64 to 0.71 for pregnancy prediction (VerMilyea et al., 2020; Kragh et al., 2021b; Berntsen et al., 2022; Enatsu et al., 2022; Loewke et al., 2022), and are around 0.65 for live birth prediction (Miyagi et al., 2019; Nagaya and Ukita, 2021).
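To make the metric concrete, the sketch below computes AUC in its Mann-Whitney form: the probability that a randomly chosen blastocyst with a positive outcome receives a higher model score than a randomly chosen one with a negative outcome, with ties counted as one half. The labels and scores are hypothetical toy values, not study data, and the function name is illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one.
    Ties count as half a win (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = live birth, 0 = no live birth
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(roc_auc(labels, scores))  # 7 of 9 positive/negative pairs are correctly ordered
```

Under this reading, an AUC of 0.5 is chance-level ranking and 1.0 is perfect ranking, so an AUC of about 0.65 means a positive blastocyst outranks a negative one in roughly 65% of pairs.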

Besides blastocyst images, time-lapse videos, which capture the entire development process from day 0 to days 5–7, have also been used for live birth prediction. However, results in the literature show that CNNs using static images achieved similar or slightly better accuracy in predicting clinical outcomes than those using time-lapse videos, for instance, AUC=0.68–0.71 (VerMilyea et al., 2020; Enatsu et al., 2022; Loewke et al., 2022) versus 0.64–0.67 (Kragh et al., 2021b; Berntsen et al., 2022) for pregnancy prediction, and 0.66 (Miyagi et al., 2019) versus 0.65 (Nagaya and Ukita, 2021) for live birth prediction. A potential reason is that redundant frames in time-lapse videos may act as noise that causes the model to overfit, leading to lower prediction accuracy (Zhu et al., 2018; Wu et al., 2021; Tao et al., 2022). Therefore, we opted to use static blastocyst images in this study.

Rather than using blastocyst images alone to predict clinical outcomes, Miyagi et al., 2020 proposed using blastocyst images together with maternal clinical features, including maternal age, AMH, and BMI, and reported an AUC of 0.74, the highest reported in the literature (Miyagi et al., 2020). However, two questions remain open. First, the contribution of blastocyst images and the additional contribution of maternal clinical features to live birth prediction are unknown. Second, endometrium status-related features, such as endometrium thickness and pattern, are also critical factors affecting live birth outcomes (Ng et al., 2007; Bu et al., 2016; Mahutte et al., 2022), but were not considered.

In this study, we quantified the effect of blastocyst images and the combined effect of both blastocyst images and patient couple’s clinical features on live birth prediction. The live birth prediction model using only blastocyst images achieved an AUC of 0.67, significantly lower than the AUC of 0.77 achieved by the model using both blastocyst images and patient couple’s clinical features (p value<0.0001). When endometrium status-related features (e.g., endometrium thickness and pattern) were excluded, the AUC of the model using both blastocyst images and patient couple’s clinical features significantly decreased to 0.74 (p value=0.007), indicating that the inclusion of endometrium status-related clinical features helps improve live birth prediction accuracy. Sixteen patient couple’s clinical features were identified as most related to live birth outcomes of blastocysts, among which maternal age, the day of blastocyst transfer, antral follicle count (AFC), retrieved oocyte number, and endometrium thickness measured before transfer are the top five features contributing to live birth prediction. Moreover, the CNN heatmaps showed that the CNN mainly focused on ICM and TE for live birth prediction, and the contribution of TE-related features was greater in the CNN trained with the inclusion of patient couple’s clinical features compared with the CNN trained with blastocyst images alone.

Methods

Data set collection

We used retrospectively collected data to develop the live birth prediction model. Transferred blastocysts with known live birth outcomes from patients who underwent frozen embryo transfer cycles from 2016 to 2020 at the Reproductive and Genetic Hospital of CITIC-Xiangya were reviewed for inclusion in the data set. Informed consent was not necessary because this study used retrospective and fully de-identified data, no medical intervention was performed on the subjects, and no biological samples were collected from the patients. This study was approved by the Ethics Committee of the Reproductive and Genetic Hospital of CITIC-Xiangya (approval number: LL-SC-2021-008).

Blastocyst images were captured before transfer using a standard optical light microscope mounted with a camera. Two grayscale images were captured for each blastocyst, one focusing on ICM and the other focusing on TE. Blastocysts were cropped from the original 1024×768 images and padded to a consistent size of 500×500 to facilitate model training. Patient couple’s clinical features consist of 103 features, including maternal age and BMI, the day of blastocyst transfer, infertility diagnosis and treatment history of patient couples, ovarian stimulation protocols, maternal hormone profiles, ultrasound results measured during the ovarian stimulation process and before transfer, and paternal semen diagnosis results (see Supplementary file 1 for a complete list). Based on p value analysis and logistic regression (LR)-based sequential forward feature selection (Solorio-Fernández et al., 2020; Raschka, 2018), the 16 clinical features most relevant to live birth prediction were identified and used to train the machine learning model (see Figure 3). Feature selection reduces the input dimensionality by removing redundant features and features with limited predictive power, thus improving the model's generalization capability (see Figure 3—figure supplement 1). LR-based feature selection was used because of its computational efficiency; results of multilayer perceptron (MLP)-based feature selection are also presented in Figure 3—figure supplement 1 and Figure 3—figure supplement 2.
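The sequential forward selection step can be sketched as a greedy loop: starting from an empty set, repeatedly add the feature whose inclusion gives the best validation score for a logistic regression, until k features are chosen. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the scoring criterion (validation AUC), the fixed k, the gradient-descent fit, and all function names are hypothetical, and the p value pre-filtering step is not shown. Library implementations exist as `SequentialFeatureSelector` in mlxtend (Raschka, 2018) and in scikit-learn (`sklearn.feature_selection`), both of which score candidates by cross-validation.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Logistic regression with intercept, fitted by batch gradient descent."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of mean negative log-likelihood
    return w


def logits(w, X):
    # Logits are monotone in the predicted probability, so they give the same AUC.
    return np.hstack([np.ones((len(X), 1)), X]) @ w


def auc(y, s):
    """Rank-based (Mann-Whitney) AUC; assumes no tied scores."""
    ranks = np.empty(len(s))
    ranks[np.argsort(s)] = np.arange(1, len(s) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def forward_select(X_tr, y_tr, X_va, y_va, k):
    """Greedily add the feature that most increases validation AUC."""
    selected, remaining = [], list(range(X_tr.shape[1]))
    while len(selected) < k:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            w = fit_logistic(X_tr[:, cols], y_tr)
            scores[j] = auc(y_va, logits(w, X_va[:, cols]))
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic example (not study data): only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
p_true = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 2] - 1.5 * X[:, 0])))
y = (rng.random(2000) < p_true).astype(float)
selected = forward_select(X[:1500], y[:1500], X[1500:], y[1500:], k=2)
print(selected)
```

Each round refits one model per remaining candidate, so the cost grows roughly with the number of features times k. This is why a cheap learner such as LR is attractive for screening 103 features, compared with refitting an MLP at every step.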

A total of 28,118 blastocysts with known live birth outcomes were reviewed, of which the 17,580 blastocysts with both blastocyst images and all 16 clinical features available were included in the data set.

Model architecture

Figure 1 shows the architecture of the live birth prediction model based on multimodal blastocyst evaluation. It consists of a CNN that processes blastocyst images and an MLP that processes patient couple’s clinical features. Because features from the CNN and the MLP are fused, the model can be trained to take both blastocyst images and patient couple’s clinical features into account simultaneously for live birth prediction. The last fully connected layer of the CNN and the last fully connected layer of the MLP each output a decision-level feature, a two-element vector used to classify the blastocyst into the positive or negative live birth outcome category. These two decision-level features are fused by element-wise addition, and the sum is taken as the final output of the overall live birth prediction model.

Architecture of the live birth prediction model based on multimodal blastocyst evaluation.

CNN, convolutional neural network; FC, fully connected layer; MLP, multilayer perceptron.
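The decision-level fusion described for Figure 1 can be illustrated with a small NumPy sketch. Assumptions, all hypothetical and not taken from the paper: ReLU activations between MLP hidden layers, random weights, deliberately tiny hidden layers instead of the reported sizes, a random array standing in for the EfficientNetV2-S two-element output, and column 1 of the softmax taken as P(live birth). Dropout is omitted because it is inactive at inference time.

```python
import numpy as np


def mlp_decision_feature(x, layers):
    """Inference-time forward pass of the clinical-feature MLP.

    `layers` is a list of (W, b) pairs. ReLU is assumed between hidden layers;
    dropout is inactive at inference time, so it is not modelled here.
    Returns a (batch, 2) decision-level feature.
    """
    h = x
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = layers[-1]
    return h @ W + b


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fused_prediction(cnn_feature, clinical_feature):
    """Decision-level fusion: element-wise sum of the two 2-D outputs."""
    fused = cnn_feature + clinical_feature
    return fused, softmax(fused)


# Toy example with random weights and deliberately small hidden layers.
rng = np.random.default_rng(0)
batch, n_clinical = 4, 16
sizes = [n_clinical, 32, 16, 2]
layers = [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
          for a, b in zip(sizes[:-1], sizes[1:])]
clinical = rng.normal(size=(batch, n_clinical))
cnn_out = rng.normal(size=(batch, 2))  # placeholder for the CNN's final FC output
fused, probs = fused_prediction(cnn_out, mlp_decision_feature(clinical, layers))
print(probs[:, 1])  # column 1 taken as P(live birth), an assumed convention
```

One consequence of fusing by addition: the fused two-class log-odds equal the sum of the two branches' log-odds, so each branch shifts the prediction additively on that scale, and a single loss on the fused output sends gradients to both branches during joint training.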

Model implementation and training

The proposed live birth prediction model used EfficientNetV2-S as the backbone CNN. EfficientNetV2-S is the baseline model in the EfficientNetV2 family, a recent family of CNN models that offers higher accuracy and faster training than earlier models (Tan and Le, 2021). In our work, the output dimension of the final fully connected layer in EfficientNetV2-S was set to two, representing the positive and negative live birth outcomes of a blastocyst, respectively. The model was implemented using PyTorch 1.10.1 (Paszke et al., 2019).

Each of the 17,580 blastocysts had two images taken at different focal planes, one focused more on TE cells and the other on ICM. In addition, live birth outcomes and all 16 patient couple’s clinical features were available for each blastocyst. The blastocysts were randomly split 80%:10%:10% into the training, validation, and testing data sets. Stratified random sampling was used to ensure that all three data sets had the same distribution of minority and majority classes. Because the ratio of blastocysts with a positive live birth outcome in the data set is 0.368, weighted sampling, which rebalances class distributions when sampling from an imbalanced data set (Feng et al., 2021), was employed during training to mitigate the model’s prediction bias toward the majority category (i.e., the negative live birth outcome). In weighted sampling, the probability of each item being selected is determined by its weight, and each item's weight is the inverse of its class frequency. In this way, weighted sampling rebalances the class distributions by oversampling the minority class and undersampling the majority class. We also evaluated an alternative approach, weighted cross-entropy loss, which assigns greater weights to the loss caused by prediction errors on the minority class. Both approaches mitigated the prediction bias toward the majority class, and weighted sampling outperformed weighted cross-entropy loss.
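The splitting and rebalancing steps can be sketched in plain Python. This is an illustration under assumptions: the labels are synthetic (only the 0.368 positive ratio comes from the text), the helper names are hypothetical, and the sketch draws with replacement, which is how a weighted sampler typically works. In PyTorch the same per-sample weights would be passed to `torch.utils.data.WeightedRandomSampler`; scikit-learn's `train_test_split(..., stratify=labels)` is a library route to the stratified split, and `torch.nn.CrossEntropyLoss(weight=...)` corresponds to the weighted-loss alternative.

```python
import random
from collections import Counter


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test, preserving the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]
    return train, val, test


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (frequency of its class)."""
    counts = Counter(labels)
    return [len(labels) / counts[y] for y in labels]


# Hypothetical labels with the reported positive ratio of 0.368.
labels = [1] * 368 + [0] * 632
train, val, test = stratified_split(labels)

# Weights are computed on the training set only.
train_labels = [labels[i] for i in train]
weights = inverse_frequency_weights(train_labels)

# Draw a rebalanced "epoch" with replacement, as a weighted sampler would.
rng = random.Random(1)
drawn = rng.choices(range(len(train)), weights=weights, k=10_000)
frac_pos = sum(train_labels[i] for i in drawn) / len(drawn)
print(len(train), len(val), len(test), round(frac_pos, 3))
```

With inverse-frequency weights, each class carries the same total weight, so roughly half of the drawn samples are positive even though only 36.8% of the data are.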

Model performance depends on training hyperparameters (e.g., optimizer, learning rate, batch size, and number of layers). Hence, we used an automatic hyperparameter-tuning tool, Facebook Ax (version 0.2.2, https://github.com/facebook/Ax), to search for the optimal hyperparameters for model training. The selected hyperparameters were a batch size of 16, an SGD optimizer with a learning rate of 0.008 and a momentum of 0.39, and three hidden layers in the MLP. A dropout layer follows each hidden layer in the MLP to prevent overfitting. The three hidden layers have 6836, 5657, and 468 nodes, and the corresponding dropout rates are 0.01, 0.07, and 0.67, respectively. The model was trained on four RTX A6000 GPUs. Searching for the optimal hyperparameters took about 30 hr, and training the model with the optimal hyperparameters took about 1 hr.
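To make the reported configuration concrete, the sketch below records the selected hyperparameters and counts the MLP's trainable parameters. The count rests on an assumption not stated in the text: that each of the 16 selected clinical features enters the MLP as a single scalar, with no one-hot expansion of categorical features. The dictionary keys and the helper name are illustrative. In PyTorch, the optimizer setting would correspond to `torch.optim.SGD(model.parameters(), lr=0.008, momentum=0.39)`.

```python
# Hyperparameters as reported in the text (found with Facebook Ax).
config = {
    "batch_size": 16,
    "optimizer": "SGD",
    "learning_rate": 0.008,
    "momentum": 0.39,
    "mlp_hidden_sizes": [6836, 5657, 468],
    "mlp_dropout_rates": [0.01, 0.07, 0.67],  # one dropout layer after each hidden layer
}


def mlp_param_count(n_inputs, hidden_sizes, n_outputs=2):
    """Weights + biases of a fully connected MLP (dropout adds no parameters)."""
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))


# Assumption: the 16 selected clinical features enter as 16 scalar inputs.
n_params = mlp_param_count(16, config["mlp_hidden_sizes"])
print(f"{n_params:,}")  # 41,442,003
```

Under this assumption, about 93% of the MLP's roughly 41 million parameters sit in the 6836×5657 connection between the first two hidden layers. The relatively high dropout rate of 0.67 after the last hidden layer is a plausible counterweight to that capacity.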

Statistical analysis

Statistical tests were performed to compare clinical features between blastocysts with positive and negative live birth outcomes. The chi-squared test was used for categorical features, and the t test was used for numerical features. Both tests were performed using Python (version 3.6). ROC curves were compared by the DeLong test implemented in MedCalc software (version 20). All statistical tests were two-tailed, and p values≤0.05 were considered significant.
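For a binary categorical feature, such as endometrium pattern B (yes/no), the chi-squared comparison reduces to a 2×2 contingency table. The sketch below computes the Pearson statistic and its p value using only the standard library. The counts are hypothetical. The paper does not say which implementation or continuity correction was used. Note that `scipy.stats.chi2_contingency` applies Yates' correction to 2×2 tables by default, whereas this sketch does not, so the two can differ slightly. `scipy.stats.ttest_ind` is the usual library route for the t test.

```python
import math


def chi2_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table, without
    Yates' continuity correction. Returns (statistic, p value)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # 1 degree of freedom: chi2 = Z**2, so P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = feature present / absent,
# columns = live birth / no live birth.
stat, p = chi2_2x2([[10, 20], [30, 40]])
print(round(stat, 4), round(p, 3))
```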

Results

The inclusion of patient couple’s clinical features increased AUC for live birth prediction

To quantify the individual effect of blastocyst images and the combined effect of patient couple’s clinical features, we built and compared models that (1) used only blastocyst images and (2) used both blastocyst images and patient couple’s clinical features for live birth prediction. In addition, to quantify the specific effect of endometrium status-related features (i.e., endometrium thickness before transfer, endometrium thickness on HCG day, and endometrium pattern B (yes/no) on HCG day) on live birth prediction, we built and compared a third model trained using blastocyst images and patient couple’s clinical features with the endometrium status-related features excluded.

Figure 2 shows the ROC curves and AUCs of the three models for predicting live birth outcomes of 1758 blastocysts (i.e., 10% of 17,580) in the test data set. Using only blastocyst images for live birth prediction gave an AUC of 0.67, with a 95% confidence interval (CI) of 0.65–0.70. Using blastocyst images and patient couple’s clinical features (endometrium status-related features excluded) significantly increased the AUC to 0.74 (95% CI: 0.72–0.76, p value<0.0001). Using both blastocyst images and patient couple’s clinical features (endometrium status-related features included) achieved a prediction AUC of 0.77 (95% CI: 0.75–0.79), which is significantly higher than using only blastocyst images for prediction (p value<0.0001) and than using blastocyst images and patient couple’s clinical features where endometrium status-related features were excluded (p value=0.007).

Receiver operating characteristic (ROC) analysis.

ROC curves of the model using only blastocyst images, the model using blastocyst images and patient couple’s clinical features with EM status-related features excluded, and the model using blastocyst images and patient couple’s clinical features with EM status-related features included, for predicting live birth outcomes of the 1758 blastocysts in the test data set. AUC, area under the ROC curve; EM, endometrium; EM status-related features, endometrium thickness before transfer, endometrium thickness on HCG day, endometrium pattern B (yes/no) on HCG day.

Figure 2—source data 1

Code and data used to generate the ROC curves.

https://cdn.elifesciences.org/articles/83662/elife-83662-fig2-data1-v2.zip
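The AUC comparisons above use the DeLong test, which the study ran in MedCalc. Because all three models score the same 1758 test blastocysts, their AUCs are correlated, and the test's variance term includes that covariance. The pure-Python sketch below is offered for illustration only; it is not the MedCalc implementation. Function names and toy scores are hypothetical. Its pairwise loops are O(m·n), which is adequate for a test set of this size.

```python
import math


def _components(labels, scores):
    """DeLong structural components: V10 per positive, V01 per negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]

    def psi(x, y):
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_paired(labels, scores_a, scores_b):
    """Compare two AUCs measured on the same test set with the DeLong test.

    Returns (auc_a, auc_b, two-sided p value).
    """
    v10a, v01a = _components(labels, scores_a)
    v10b, v01b = _components(labels, scores_b)
    auc_a = sum(v10a) / len(v10a)
    auc_b = sum(v10b) / len(v10b)
    # Variance of the AUC difference; the cross-covariance terms account
    # for both models being scored on the same blastocysts.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / len(v10a)
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / len(v01a))
    if var <= 0:
        raise ValueError("zero variance: the DeLong z statistic is undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical scores from two models on the same eight test samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
model_b = [0.6, 0.4, 0.7, 0.2, 0.5, 0.3, 0.8, 0.1]
auc_a, auc_b, p = delong_paired(labels, model_a, model_b)
print(auc_a, auc_b, round(p, 3))
```

The positives and negatives are collected in the same sample order for both models, so `v10a[i]` and `v10b[i]` refer to the same blastocyst. The cross-covariance term depends on that alignment.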

Ranking the predictive power of patient couple’s clinical features

We then investigated the predictive power of each patient couple’s clinical feature in predicting live birth outcome. Figure 3 shows the 16 features that were identified to be most related to the live birth outcomes of the blastocysts. These features were ranked according to their AUCs for individually predicting the live birth outcomes of blastocysts using univariable LR. The AUC for each feature was reported as the mean AUC over a tenfold cross-validation process.

Figure 3.
Ranking the predictive power of patient couple’s clinical features.

The 16 patient couple’s clinical features that were identified to be most related to the live birth outcomes of the blastocysts ranked by the AUC for individually predicting live birth outcome. AUC, area under the curve.

Figure 3—source data 1

Code and data used to generate the AUC ranking chart.

https://cdn.elifesciences.org/articles/83662/elife-83662-fig3-data1-v2.zip
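The ranking in Figure 3 scores each feature by the mean held-out AUC of a univariable LR over tenfold cross-validation. Below is a minimal NumPy sketch of that procedure on synthetic data. It assumes plain random folds and an unpenalized Newton-Raphson fit; the paper does not specify the fold scheme or the solver, and all names here are hypothetical. A scikit-learn route would be `cross_val_score(LogisticRegression(), x.reshape(-1, 1), y, cv=10, scoring="roc_auc")`. That call uses stratified folds and an L2-penalized fit by default.

```python
import numpy as np


def fit_univariable_lr(x, y, n_iter=25):
    """Newton-Raphson fit of P(live birth | x) = sigmoid(b0 + b1 * x)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                         # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, grad)
    return beta


def auc(y, s):
    """Pairwise AUC with ties counted as one half."""
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size


def cv_auc(x, y, k=10, seed=0):
    """Mean held-out AUC of a univariable LR over k random folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    aucs = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        b0, b1 = fit_univariable_lr(x[train_idx], y[train_idx])
        aucs.append(auc(y[test_idx], b0 + b1 * x[test_idx]))
    return float(np.mean(aucs))


# Synthetic features (not study data): one informative, one pure noise.
rng = np.random.default_rng(42)
informative = rng.normal(size=1000)
noise = rng.normal(size=1000)
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-1.5 * informative))).astype(float)
ranking = sorted({"informative": cv_auc(informative, y),
                  "noise": cv_auc(noise, y)}.items(), key=lambda kv: -kv[1])
print(ranking)
```

Because the sigmoid is monotonic, a univariable LR only reorders samples by the raw feature value. Each held-out AUC therefore equals the raw feature's AUC on that fold, or one minus it if the fitted slope is negative. The ranking thus measures how well each feature alone orders outcomes, not how it interacts with the others.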

CNN heatmaps

In blastocyst images, what does the CNN focus on to predict the live birth outcome of a blastocyst? Does what the CNN focuses on differ between the models trained without and with patient couple’s clinical features? To answer these questions, we used the class activation mapping method to generate heatmaps. Figure 4 shows blastocyst images, the corresponding heatmaps of the CNN trained without patient couple’s clinical features, and the corresponding heatmaps of the CNN trained with patient couple’s clinical features.

Figure 4.
CNN heatmaps analysis.

Heatmaps of the CNN trained without and with patient couple’s clinical features. Column (A): original blastocyst images. Column (B): corresponding heatmaps of the CNN trained without including patient couple’s clinical features. Column (C): corresponding heatmaps of the CNN trained with the inclusion of patient couple’s clinical features. CNN, convolutional neural network.

Figure 4—source data 1

Code and data used to generate the heatmaps shown in Figure 4.

https://cdn.elifesciences.org/articles/83662/elife-83662-fig4-data1-v2.zip

The blastocyst images were cropped and padded to a consistent size to facilitate model training. The padding value was calculated as the mean pixel value of blastocyst images in the data set. Heatmaps were generated by XGradCAM (Fu et al., 2020). Note that the CNN takes two-channel blastocyst images as the input, one focusing on ICM and the other focusing on TE. The blastocyst images shown in Figure 4 are those focused on ICM, and Figure 4—figure supplement 1 shows the two-channel blastocyst images. As can be seen in Figure 4, when trained using only blastocyst images, the CNN mainly focuses on ICM and TE for predicting live birth outcomes. When training with both blastocyst images and patient couple’s clinical features, TE-related features contributed more to live birth prediction compared with training with blastocyst images only.
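XGrad-CAM builds a heatmap from one convolutional layer. Each feature map A_k is weighted by its gradients, averaged with weights proportional to the activation values themselves (A_k / ΣA_k). The weighted maps are then summed. The NumPy sketch below shows only this weighting step on hand-made arrays. Assumptions: activations and gradients for the target class are already extracted, activations are mostly non-negative so per-channel sums are not near zero (`eps` guards only exact zeros), and a final ReLU and max-normalization are applied as in common Grad-CAM-style implementations. Upsampling the map to the 500×500 input is not shown, and the function name is illustrative.

```python
import numpy as np


def xgradcam(activations, gradients, eps=1e-8):
    """XGrad-CAM heatmap for one convolutional layer.

    activations: (K, H, W) feature maps A_k of the chosen layer.
    gradients:   (K, H, W) gradients d(class score)/dA_k.
    """
    # Channel weight: gradients averaged with weights A_k / sum(A_k).
    norm = activations.sum(axis=(1, 2), keepdims=True) + eps
    weights = (gradients * activations / norm).sum(axis=(1, 2))   # shape (K,)
    cam = np.tensordot(weights, activations, axes=1)              # shape (H, W)
    cam = np.maximum(cam, 0.0)   # keep regions that raise the class score
    if cam.max() > 0:
        cam = cam / cam.max()    # rescale to [0, 1] for overlay
    return cam


# Hand-made 2-channel, 2x2 example (not real network activations).
A = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 2.0], [2.0, 0.0]]])
G = np.ones_like(A)
print(xgradcam(A, G))
```

With uniform gradients, each channel's weight comes out as 1, so the heatmap is simply the normalized sum of the two maps. Non-uniform gradients would shift emphasis toward the channels, and hence the image regions such as ICM or TE, that most increase the live birth score.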

Discussion

In this study, the individual effect of blastocyst images and the combined effect of patient couple’s clinical features on live birth prediction were quantified by comparing the AUC values of the model using only blastocyst images and the model using both blastocyst images and patient couple’s clinical features. An AUC of 0.67 was achieved with blastocyst images only, whereas using both blastocyst images and patient couple’s clinical features led to a significantly higher AUC of 0.77 for live birth prediction. When endometrium status-related features were excluded from patient couple’s clinical features, the AUC of the live birth prediction model significantly decreased (p value=0.007) from 0.77 to 0.74, indicating the strong relevance of endometrium status-related features to live birth prediction. Sixteen patient couple’s clinical features were identified as most related to live birth outcomes of blastocysts, among which maternal age, the day of blastocyst transfer, AFC, retrieved oocyte number, and endometrium thickness before transfer are the top five features contributing to live birth prediction.

This study was based on a comprehensive multimodal data set collected for blastocyst evaluation. The data set includes 17,580 blastocysts with known live birth outcomes, blastocyst images, and 16 patient couple’s clinical features. As shown in Figure 3, the 16 patient couple’s clinical features cover maternal basal characteristics (age and BMI); hormone profiles measured after menstruation (LH and FT4), on HCG day (PE2, P, and LH), and before transfer (E2); endometrium status-related features (endometrium thickness on HCG day and before transfer, endometrium pattern on HCG day); features related to oocytes (AFC, retrieved oocyte number); the day of blastocyst transfer; the number of ovarian stimulation cycles; and paternal features (the ratio of grade A sperm after semen processing). For comparison, the data set studied by Miyagi et al., 2020 did not contain endometrium status-related features or key hormone profiles (e.g., P, E2, and LH). There are numerous IVF data sets containing over 100,000 records of clinical features and live birth outcomes (Nelson and Lawlor, 2011; McLernon et al., 2016; La Marca et al., 2021); however, these data sets contain no blastocyst images and thus cannot be used to build models that evaluate blastocysts from their images.

To handle the multimodal data set, our proposed model integrates two modules, a CNN and an MLP, so that it can simultaneously consider images and numerical clinical features for blastocyst evaluation. The large and comprehensive multimodal data set and the proposed CNN+MLP model yielded an AUC of 0.77, the highest reported to date for predicting live birth outcomes of blastocysts. They also enabled us to quantify the predictive power of each feature for live birth outcomes of blastocysts.

The blastocyst grading system introduced in 1999 (Gardner and Schoolcraft, 1999; Gardner, 1999) remains the most common method used by embryologists to evaluate blastocyst quality, although the morphological grades of blastocyst development stage, ICM, and TE have limited predictive power for live birth outcomes (e.g., AUC=0.58–0.61 for live birth prediction reported by Reignier et al., 2019; Bartolacci et al., 2021; Xiong et al., 2022). Since CNNs became the state-of-the-art method for image-based classification, many attempts have been made to apply them to blastocyst evaluation for predicting clinical outcomes (e.g., VerMilyea et al., 2020; Kragh et al., 2021b; Berntsen et al., 2022; Enatsu et al., 2022; Loewke et al., 2022; Miyagi et al., 2019; Nagaya and Ukita, 2021). Among these, the AUC values reported for live birth prediction using blastocyst images only were around 0.65 (Miyagi et al., 2019; Nagaya and Ukita, 2021). Similarly, we achieved an AUC of 0.67 (see Figure 2). Compared with the AUC of 0.58–0.61 reported in the literature using Gardner grades for live birth prediction, these results confirm that a CNN can achieve higher prediction accuracy. As shown in Figure 4, the CNN mainly focuses on ICM and TE. Unlike the Gardner TE grade, which assesses the number of TE cells and their cohesiveness as a whole, the CNN tends to focus on specific TE clusters. Further investigation is needed to interpret these heatmaps.

Miyagi et al., 2020 used both blastocyst images and maternal clinical features (age, AMH, and BMI) to predict live birth outcomes and achieved an AUC of 0.74 (Miyagi et al., 2020). The additional contribution of the three maternal clinical features was unclear because no AUC was reported for blastocyst images alone. Furthermore, despite their importance for pregnancy and live birth, endometrium status-related features were not considered in their work. Our study therefore used a comprehensive data set and quantitatively compared the AUC values of live birth prediction using only blastocyst images versus using both blastocyst images and patient couple’s clinical features. We also quantified how much endometrium status-related features, used together with blastocyst images, improve live birth prediction. Furthermore, we found that hormone profiles such as E2, LH, P, and FT4, features related to oocyte retrieval such as AFC and number of oocytes retrieved, and the ratio of grade A sperm after processing (representing semen quality) can work with blastocyst images to further improve live birth prediction accuracy. Note that in this study, only total testosterone (T) was analyzed; free T and bioavailable T were not available for clinical feature analysis (see Supplementary file 1). This may bias the assessment of testosterone as a predictor of live birth.

Another finding of this study, by comparing the heatmaps of the CNN trained without and with the inclusion of patient couple’s clinical features, is that the weights of TE-related features increased (see Figure 4). A potential reason may be that TE and the endometrium status-related features (e.g., endometrium thickness and pattern) play critical roles when a blastocyst initiates implantation, and a positive live birth outcome is not possible without the success of this implantation process (Ahlström et al., 2011; Hill et al., 2013; Chen et al., 2014; Bakkensen et al., 2019).

In conclusion, in this retrospective study involving 17,580 blastocysts with known live birth outcomes, blastocyst images, and 16 patient couple’s clinical features, we built a live birth prediction model based on multimodal blastocyst evaluation using both blastocyst images and patient couple’s clinical features. We quantified the individual effect of blastocyst images and the combined effect of patient couple’s clinical features on live birth prediction. The results demonstrate that using both blastocyst images and patient couple’s clinical features significantly improves live birth prediction compared with using blastocyst images alone.

The proposed live birth prediction model improves the evaluation of a blastocyst's live birth potential, supporting selection of the best blastocyst from among a patient's multiple blastocysts. The next step is to validate the model’s prediction accuracy using prospectively collected data and to verify its effectiveness in blastocyst selection via a randomized controlled trial (RCT). Patients enrolled in the RCT will be split into a study group and a control group (1:1 ratio). In the study group, the model will select the blastocyst with the highest probability of live birth for transfer; in the control group, embryologists will select the top blastocyst for transfer based on routine morphological grading. Live birth outcomes of both groups will be tracked and compared.

Data availability

All processed data and code needed to reproduce the findings of the study are openly available in deidentified form at https://github.com/robotVisionHang/LiveBirthPrediction_Data_Code (copy archived at Liu et al., 2023) and are attached to this manuscript. All code and software used to analyze the data can also be accessed through the link. Due to data privacy regulations for patient data, raw data cannot be publicly shared. Interested researchers are welcome to contact the corresponding author with a concise project proposal stating the aims of using the data and how the data will be used. The project proposal will first be assessed by Prof. Yu Sun and Prof. Ge Lin, and then by the Ethics Committee of the Reproductive and Genetic Hospital of CITIC-Xiangya. There are no restrictions on who can access the data.

Decision letter

  1. Larisa V Suturina
    Reviewing Editor; Scientific Center for Family Health and Human Reproduction, Russian Federation
  2. Ricardo Azziz
    Senior Editor; University at Albany, SUNY, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Development and evaluation of a live birth prediction model for evaluating human blastocysts: a retrospective study" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Ricardo Azziz as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1) Please expand the section "Model architecture" and clarify the details of the "decision-level features".

2) Provide the details of how parameter optimization was accomplished as well as the architectural details, i.e., the number of layers and nodes. What was the computational overhead for training these models?

3) Consider presenting the data for all significant predictors to justify the choice of predictors included in the model, and verify whether the most important features used by the MLP for prediction (identified with explanation methods) are the same as those inferred by logistic regression.

4) Please consider describing in detail the weighted sampling approach used to tackle the class imbalance.

5) Code is available only for generating Figures 2 and 4 reported in the paper; for Figure 3, only the data are available. Consider providing this code for reproducibility.

6) Please improve the discussion of the potential applications of the proposed model in clinical settings, and mention the method of testosterone assessment as a study limitation.

Reviewer #1 (Recommendations for the authors):

The authors could mention the method of testosterone assessment as a study limitation that may cause a potential bias regarding the estimation of significance of testosterone as a predictor of live birth.

The authors could consider presenting the data for all significant predictors (for example, in the Figure 3 supplementary data) and justify the choice of predictors included in the model. This would better demonstrate that the predictors were selected correctly.

Reviewer #2 (Recommendations for the authors):

I quite enjoyed reading the article. Overall, the motivation and concepts are well defined; however, the manuscript lacks the necessary methodological details, which hindered my ability to fully understand and appreciate the work.

The link between the CNN and MLP architectures in the final integrated model is unclear. The section "Model architecture" needs to be expanded. It is not clear what the "decision-level features" from the CNN or MLP are. Are these features from the CNN's fully connected layer? In the MLP, are they taken before the final layer? And how do the authors concatenate these features? These details are important for understanding the final architecture.

How was parameter optimization accomplished? Architectural details, i.e., the number of layers and nodes, are missing. What was the computational overhead for training these models?

It is difficult to understand the discrepancy in features between the CNN trained with clinical features and the CNN trained without them. This may be because the model architecture is not well defined. As it stands, the CNN seems to have been trained independently, even in the concatenated version of the model. How can you explain the discrepancy between the activation maps of these two models?

Important features were identified using logistic regression. I do not see evidence that these features remain important when the model is built with an MLP instead. Could you verify that the most important features the MLP uses for prediction (identified with explanation methods) are the same as those inferred by logistic regression?

The imbalance issue was tackled using a weighted sampling approach. This approach needs to be detailed in the main text. How were the training, validation, and test partitions built given the distribution of the minority and majority classes? Did the authors evaluate other approaches that could help resolve this issue?

The code to reproduce the model is missing.

Discussion regarding implementation in a clinical setting would be informative. How feasible is deploying the model in a clinic? Perhaps you could elaborate further on the prospective clinical trial mentioned in line 363.

Reviewer #3 (Recommendations for the authors):

The study is well designed, and the Materials and methods are convincing. The results support the aims. Further studies will be important for establishing the criteria.

The challenge will be to keep reporting the effectiveness of this predictive procedure and to publish it together with the live birth rate (LBR).

https://doi.org/10.7554/eLife.83662.sa1

Author response

Essential revisions:

1) Please expand the section "Model architecture" and clarify the details of the "decision-level features".

The decision-level features are the outputs of the last fully connected layer in the CNN and the last fully connected layer in the MLP. Each decision-level feature has two variables, which are used to classify the blastocyst into the positive or negative live birth outcome category. The two decision-level features are fused by element-wise addition rather than concatenation.

To clarify this issue, we have revised the “Model architecture” section, labeled the last fully connected layers in Figure 1, and provided the code to reproduce the model. The source code has been submitted with the revised manuscript and can also be accessed at https://github.com/robotVisionHang/LiveBirthPrediction_Data_Code.

The revised “Model architecture” section now reads:

“Figure 1 shows the architecture of the live birth prediction model based on multimodal blastocyst evaluation. It consists of a CNN to process blastocyst images and an MLP to process patient couple's clinical features. Features from the CNN and the MLP are fused; thus, the model can be trained to simultaneously take into account both blastocyst images and patient couple's clinical features for live birth prediction. The last fully connected layer in the CNN and the last fully connected layer in the MLP each output a decision-level feature, which has two variables used to classify the blastocyst into the positive or negative live birth outcome category. Element-wise addition fuses the decision-level features from the CNN and the MLP, and the sum is taken as the final output of the overall live birth prediction model.”
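The fusion step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (which is in the repository linked above): the two-variable outputs are hypothetical numbers, and both the softmax that turns the fused output into a live birth probability and the convention that index 1 denotes the positive outcome are assumptions rather than details stated in the text.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decision_features(cnn_out, mlp_out):
    """Fuse two 2-variable decision-level features by element-wise addition.

    Assumption: index 0 = negative live birth outcome, index 1 = positive outcome.
    """
    if len(cnn_out) != 2 or len(mlp_out) != 2:
        raise ValueError("each decision-level feature must have exactly two variables")
    return [c + m for c, m in zip(cnn_out, mlp_out)]


# Hypothetical outputs of the last fully connected layers for one blastocyst.
cnn_out = [0.2, 0.5]  # image branch (CNN)
mlp_out = [0.1, 0.9]  # clinical-feature branch (MLP)

fused = fuse_decision_features(cnn_out, mlp_out)
p_live_birth = softmax(fused)[1]  # assumed conversion of the fused output to a probability
print(fused, p_live_birth)
```

Because exp(a + b) = exp(a) exp(b), the softmax of the summed decision-level features is proportional to the element-wise product of the two branches' softmax outputs, so additive fusion can be read as combining the two branches' class probabilities multiplicatively and renormalizing.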

2) Provide the details of how parameter optimization was accomplished as well as the architectural details, i.e., the number of layers and nodes. What was the computational overhead for training these models?

To clarify this issue, we have added paragraph 3 in the “Model implementation and training” section:

“Model performance depends on training hyperparameters (e.g., optimizer, learning rate, batch size, number of layers). Hence, an automatic hyperparameter-tuning tool, Facebook Ax (https://github.com/facebook/Ax), was used to search for the optimal hyperparameters for model training. The selected hyperparameters include a batch size of 16, an SGD optimizer with a learning rate of 0.008 and a momentum of 0.39, and three hidden layers in the MLP. Each hidden layer in the MLP is followed by a dropout layer to prevent overfitting. The three hidden layers have 6836, 5657, and 468 nodes, with dropout rates of 0.01, 0.07, and 0.67, respectively. The model was trained on four RTX A6000 GPUs. Searching for the optimal hyperparameters took about 30 hours, and training the model with the optimal hyperparameters took about an hour.”
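As a worked example of the reported configuration, the sketch below records the selected hyperparameters and counts the trainable weights and biases of the MLP branch. It assumes 16 clinical-feature inputs (the logistic regression-selected set) and a two-unit output layer matching the two-variable decision-level feature; the CNN branch is not counted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters reported for the final model (selected with Facebook Ax)."""
    batch_size: int = 16
    optimizer: str = "SGD"
    learning_rate: float = 0.008
    momentum: float = 0.39
    mlp_hidden_sizes: tuple = (6836, 5657, 468)
    mlp_dropout_rates: tuple = (0.01, 0.07, 0.67)


def mlp_parameter_count(n_inputs, hidden_sizes, n_outputs=2):
    """Count weights and biases of fully connected layers n_inputs -> hidden... -> n_outputs.

    Dropout layers have no trainable parameters, so they add nothing to the count.
    """
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return sum(i * o + o for i, o in zip(sizes[:-1], sizes[1:]))


cfg = TrainingConfig()
# Assumption: 16 clinical features in, 2 outputs (the decision-level feature).
n_params = mlp_parameter_count(16, cfg.mlp_hidden_sizes)
print(f"MLP branch parameters: {n_params:,}")
```

Under these assumptions the MLP branch has 41,442,003 trainable parameters, dominated by the 6836 × 5657 connection between the first two hidden layers.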

3) Consider presenting the data for all significant predictors to justify the choice of predictors included in the model, and verify whether the most important features used by the MLP for prediction (identified with explanation methods) are the same as those inferred by logistic regression.

The AUC of the model using all significant predictors and blastocyst images is 0.75, lower than the AUC achieved by the model using logistic regression-selected clinical features and blastocyst images (0.77) and that achieved by the model using MLP-selected clinical features and blastocyst images (0.76). Feature selection reduces the input dimensionality by removing redundant features and features with limited predictive power, thereby improving the model's generalization capability.
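For readers comparing the AUC values above, the sketch below computes the AUC through its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen positive blastocyst receives a higher score than a randomly chosen negative one, with ties counted as one half. It is an illustrative O(n_pos × n_neg) implementation on toy data, not the evaluation code used in the study, and it does not reproduce the significance test behind the reported p-value.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = live birth) and hypothetical predicted probabilities.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Here `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four positive-negative pairs are ordered correctly.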

To follow the reviewer’s suggestion, we used sequential forward feature selection to perform MLP-based feature selection. The method is interpretable: starting from an empty candidate set, the feature that most increases prediction accuracy is added at each step, until adding further features no longer increases prediction accuracy. Note that we searched for the optimal MLP parameters (e.g., number of layers, number of nodes, dropout rate) and training parameters (e.g., optimizer, learning rate) to accurately evaluate the predictive power of each candidate feature.
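The greedy search described above can be sketched as follows. `score_fn` is a hypothetical callable; in the authors' setting each call would correspond to training a tuned MLP on the candidate feature set and measuring its validation performance, which is why the search took days. The toy scorer and feature names are invented for illustration.

```python
def sequential_forward_selection(candidates, score_fn, min_gain=0.0):
    """Greedy sequential forward feature selection.

    Starts from an empty set; at each step, adds the remaining candidate that yields the
    highest score; stops when the best addition no longer improves the score by more
    than min_gain. score_fn(features) returns a float (higher is better).
    """
    selected = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining:
        # Evaluate every one-feature extension of the current set.
        scored = [(score_fn(selected + [f]), f) for f in remaining]
        step_score, step_feature = max(scored, key=lambda t: t[0])
        if step_score <= best_score + min_gain:
            break  # no further feature increases the score
        selected.append(step_feature)
        remaining.remove(step_feature)
        best_score = step_score
    return selected, best_score


# Toy scorer: hypothetical per-feature contributions to a validation metric.
toy_gain = {"maternal_age": 0.10, "endometrium_thickness": 0.05, "noise_feature": -0.01}


def toy_score(features):
    return 0.5 + sum(toy_gain[f] for f in features)


features, score = sequential_forward_selection(toy_gain, toy_score)
print(features, round(score, 2))
```

With this toy scorer the search keeps the two helpful features and stops before adding `noise_feature`, mirroring the stopping rule described above.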

The MLP-based feature selection method selected 14 clinical features. Of these, 12 were also among the 16 clinical features selected by logistic regression (LR), and the top 9 features are the same for both methods. The two clinical features selected by the MLP but not by LR are fresh semen (yes/no) and follicle stimulating hormone (FSH) on day 3 after period. The four clinical features selected by LR but not by the MLP are number of ovarian stimulation cycles, progesterone (P) on HCG day, maternal body mass index (BMI), and free thyroxine (FT4) on day 3 after period. The AUCs achieved by the model using LR-selected clinical features and blastocyst images (0.77) and by the model using MLP-selected clinical features and blastocyst images (0.76) show no significant difference (p = 0.95).

Furthermore, MLP-based feature selection took a few days because of the iterative feature search and testing combined with hyperparameter tuning, whereas LR-based feature selection is much more efficient and takes only a few minutes.

To clarify this issue, we have now added the AUC of the model using all significant predictors and blastocyst images, together with the MLP-based feature selection results, in Figure 3-supplement 1 and Figure 3-supplement 2.

4) Please consider describing in detail the weighted sampling approach used to tackle the class imbalance.

To clarify this issue, we have added details of the weighted sampling approach to paragraph 2 of the “Model implementation and training” section:

“In the weighted sampling approach, the probability of each item being selected is determined by its weight, and each item's weight is assigned according to inverse class frequency. In this way, weighted sampling rebalances the class distributions by oversampling the minority class and undersampling the majority class.”
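A minimal sketch of inverse-class-frequency weighted sampling, using only the Python standard library: each sample's weight is 1 divided by the size of its class, so every class carries the same total weight and, in expectation, half of the drawn samples are positive regardless of the 0.368 positive ratio. The function names are hypothetical; in PyTorch the same per-sample weights could be passed to `torch.utils.data.WeightedRandomSampler`, though the text does not state which implementation the authors used.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weight = 1 / (number of samples in that sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def weighted_sample_indices(labels, k, seed=0):
    """Draw k indices with replacement, with probability proportional to each sample's weight."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Toy labels with the reported positive ratio: 368 positives, 632 negatives.
labels = [1] * 368 + [0] * 632
idx = weighted_sample_indices(labels, k=10_000, seed=42)
drawn = Counter(labels[i] for i in idx)
print(drawn[1] / len(idx))  # fraction of positive draws after rebalancing
```

Sampling with replacement is what lets minority-class samples appear more than once per epoch (oversampling) while many majority-class samples are skipped (undersampling).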

5) Code is available only for generating Figures 2 and 4 reported in the paper; for Figure 3, only the data are available. Consider providing this code for reproducibility.

We have provided the source code for generating Figure 3. The source code has been submitted with the revised manuscript and can also be accessed at https://github.com/robotVisionHang/LiveBirthPrediction_Data_Code

6) Please improve the discussion of the potential applications of the proposed model in clinical settings, and mention the method of testosterone assessment as a study limitation.

1) The potential application of the proposed model in clinical settings is to improve blastocyst selection. Among the various factors contributing to IVF outcomes, the quality (i.e., developmental potential) of the blastocyst selected for transfer is a major determinant of IVF success. Existing approaches for evaluating and selecting blastocysts rely on manual morphological grading, which has shown limited predictive power for live birth outcomes (e.g., AUC = 0.58-0.61).

The proposed model achieved significantly higher accuracy in evaluating the live birth potential of blastocysts (AUC = 0.77), which can support selection of the best blastocyst to improve the live birth rate. When applied in clinical practice for blastocyst selection, the model takes as input the images of multiple blastocysts from the same patient together with the patient couple’s clinical features, outputs the live birth probability of each blastocyst, and identifies the blastocyst with the highest live birth probability.
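The selection workflow described above amounts to ranking one patient's blastocysts by predicted probability. In this sketch, `predict_live_birth_prob` is a hypothetical stand-in for the trained model with the couple's clinical features already supplied, and the blastocyst IDs and probabilities are made up.

```python
def select_best_blastocyst(blastocyst_ids, predict_live_birth_prob):
    """Rank a single patient's blastocysts by predicted live birth probability.

    Returns (best_id, ranking), where ranking is a list of (probability, id) pairs
    sorted from highest to lowest probability.
    """
    if not blastocyst_ids:
        raise ValueError("the patient has no blastocysts to rank")
    scored = [(predict_live_birth_prob(b), b) for b in blastocyst_ids]
    ranking = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return ranking[0][1], ranking


# Hypothetical model outputs for three blastocysts of one patient.
toy_probs = {"blastocyst_A": 0.42, "blastocyst_B": 0.61, "blastocyst_C": 0.35}
best, ranking = select_best_blastocyst(list(toy_probs), toy_probs.__getitem__)
print(best, ranking)
```

Returning the full ranking, not only the top blastocyst, keeps the order available if the first choice cannot be transferred.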

To clarify this issue, we have added the following text on page 13:

“The proposed live birth prediction model improves the evaluation of each blastocyst’s live birth potential, supporting selection of the best blastocyst among a patient’s multiple blastocysts. The next step is to validate the model’s prediction accuracy on prospectively collected data and to verify its effectiveness in blastocyst selection through a randomized controlled trial (RCT). Patients enrolled in the RCT will be split into a study group and a control group in a 1:1 ratio. In the study group, the model selects the blastocyst with the highest predicted probability of live birth for transfer; in the control group, embryologists select the top blastocyst for transfer based on their routine morphological grading. Live birth outcomes of the two groups will be tracked and compared.”

2) The testosterone (T) included in the clinical feature analysis is total T, which comprises both free and protein-bound T; free T and bioavailable T were not available separately.

To clarify this issue, we renamed the testosterone feature in Supplementary file 1; it now reads: “Total testosterone on day 3 after period”.

We also mentioned this as a study limitation on page 12:

“Note that in this study, only the total testosterone (T) was analyzed, and free T or bioavailable T was not available for clinical feature analysis (see Supplementary file 1). This may cause potential bias in determining the significance of testosterone as a predictor of live birth.”

Reviewer #1 (Recommendations for the authors):

The authors could mention the method of testosterone assessment as a study limitation that may cause a potential bias regarding the estimation of significance of testosterone as a predictor of live birth.

To clarify this issue, we renamed the testosterone feature in Supplementary file 1; it now reads: “Total testosterone on day 3 after period”.

We also mentioned this as a study limitation on page 12:

“Note that in this study, only the total testosterone (T) was analyzed, and free T or bioavailable T was not available for clinical feature analysis (see Supplementary file 1). This may cause potential bias in determining the significance of testosterone as a predictor of live birth.”

The authors could consider presenting the data for all significant predictors (for example, in the Figure 3 supplementary data) and justify the choice of predictors included in the model. This would better demonstrate that the predictors were selected correctly.

The AUC of the model using all significant predictors and blastocyst images is 0.75, lower than the AUC achieved by the model using logistic regression-selected clinical features and blastocyst images (0.77) and that achieved by the model using MLP-selected clinical features and blastocyst images (0.76). Feature selection reduces the input dimensionality by removing redundant features and features with limited predictive power, thereby improving the model's generalization capability.

To clarify this issue, we have added the ROC curve comparisons in Figure 3-supplement 1.

Reviewer #2 (Recommendations for the authors):

I quite enjoyed reading the article. Overall, the motivation and concepts are well defined; however, the manuscript lacks the necessary methodological details, which hindered my ability to fully understand and appreciate the work.

The link between the CNN and MLP architectures in the final integrated model is unclear. The section "Model architecture" needs to be expanded. It is not clear what the "decision-level features" from the CNN or MLP are. Are these features from the CNN's fully connected layer? In the MLP, are they taken before the final layer? And how do the authors concatenate these features? These details are important for understanding the final architecture.

The decision-level features are the outputs of the last fully connected layer in the CNN and the last fully connected layer in the MLP. Each decision-level feature has two variables, which are used to classify the blastocyst into the positive or negative live birth outcome category. The two decision-level features are fused by element-wise addition rather than concatenation.

To clarify this issue, we have revised the “Model architecture” section, labeled the last fully connected layers in Figure 1, and provided the code to reproduce the model. The source code has been submitted with the revised manuscript and can also be accessed at https://github.com/robotVisionHang/LiveBirthPrediction_Data_Code.

The revised “Model architecture” section now reads:

“Figure 1 shows the architecture of the live birth prediction model based on multimodal blastocyst evaluation. It consists of a CNN to process blastocyst images and an MLP to process patient couple's clinical features. Features from the CNN and the MLP are fused; thus, the model can be trained to simultaneously take into account both blastocyst images and patient couple's clinical features for live birth prediction. The last fully connected layer in the CNN and the last fully connected layer in the MLP each output a decision-level feature, which has two variables used to classify the blastocyst into the positive or negative live birth outcome category. Element-wise addition fuses the decision-level features from the CNN and the MLP, and the sum is taken as the final output of the overall live birth prediction model.”

How was parameter optimization accomplished? Architectural details, i.e., the number of layers and nodes, are missing. What was the computational overhead for training these models?

To clarify this issue, we have added paragraph 3 in the “Model implementation and training” section:

“Model performance depends on training hyperparameters (e.g., optimizer, learning rate, batch size, number of layers). Hence, an automatic hyperparameter-tuning tool, Facebook Ax (https://github.com/facebook/Ax), was used to search for the optimal hyperparameters for model training. The selected hyperparameters include a batch size of 16, an SGD optimizer with a learning rate of 0.008 and a momentum of 0.39, and three hidden layers in the MLP. Each hidden layer in the MLP is followed by a dropout layer to prevent overfitting. The three hidden layers have 6836, 5657, and 468 nodes, with dropout rates of 0.01, 0.07, and 0.67, respectively. The model was trained on four RTX A6000 GPUs. Searching for the optimal hyperparameters took about 30 hours, and training the model with the optimal hyperparameters took about an hour.”

It is difficult to understand the discrepancy in features between the CNN trained with clinical features and the CNN trained without them. This may be because the model architecture is not well defined. As it stands, the CNN seems to have been trained independently, even in the concatenated version of the model. How can you explain the discrepancy between the activation maps of these two models?

The CNN and the MLP are connected through their two decision-level features. Element-wise addition fuses the decision-level features from the CNN and the MLP, and the sum is taken as the final output of the overall live birth prediction model. Because the training loss is computed on this fused output, its gradients propagate back through both branches; thus, the weights of the CNN and the MLP were trained simultaneously rather than independently.
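Why addition makes the two branches train jointly can be checked numerically. With z = a + b, the gradient of the loss with respect to either branch's decision-level feature equals the gradient with respect to z, which for softmax cross-entropy is softmax(z) minus the one-hot label; the chain rule then carries this same signal into the CNN and MLP weights. The sketch assumes a softmax cross-entropy loss and uses hypothetical values, verifying the analytic gradient against central finite differences for both branches.

```python
import math


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def fused_loss(a, b, y):
    """Softmax cross-entropy of the summed decision-level features z = a + b."""
    z = [ai + bi for ai, bi in zip(a, b)]
    return -math.log(softmax(z)[y])


def analytic_grad_fused(a, b, y):
    """dL/dz = softmax(z) - one_hot(y); equal to dL/da and dL/db because z = a + b."""
    p = softmax([ai + bi for ai, bi in zip(a, b)])
    return [p_i - (1.0 if i == y else 0.0) for i, p_i in enumerate(p)]


def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of the scalar function f at list x."""
    grads = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grads.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grads


# Hypothetical decision-level features for one blastocyst with a positive outcome (y = 1).
a_cnn = [0.2, 0.5]
b_mlp = [0.1, 0.9]
y = 1

g = analytic_grad_fused(a_cnn, b_mlp, y)
g_cnn = numerical_grad(lambda v: fused_loss(v, b_mlp, y), a_cnn)
g_mlp = numerical_grad(lambda v: fused_loss(a_cnn, v, y), b_mlp)
print(g, g_cnn, g_mlp)
```

Both numerical gradients match the same analytic vector, so every training step updates the CNN and the MLP from one shared error signal.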

Comparing the heatmaps of the CNN trained without and with patient couple's clinical features also revealed that the weights of TE-related features increased when clinical features were included (see Figure 4). A potential reason is that the TE and endometrium status-related features (e.g., endometrium thickness and pattern) play critical roles when a blastocyst initiates implantation, and a positive live birth outcome is not possible unless implantation succeeds (Ahlström et al., 2011; Hill et al., 2013; Chen et al., 2014; Bakkensen et al., 2019).

To clarify this issue, we have revised the “Model architecture” section and added, on page 12, an explanation of the discrepancy between the heatmaps of the CNN trained without and with clinical features:

“Another finding of this study, by comparing the heatmaps of the CNN trained without and with the inclusion of patient couple's clinical features, is that the weights of TE-related features increased (see Figure 4). A potential reason may be that TE and the endometrium status-related features (e.g., endometrium thickness and pattern) play critical roles when a blastocyst initiates implantation, and a positive live birth outcome is not possible without the success of this implantation process (Ahlström et al., 2011; Hill et al., 2013; Chen et al., 2014; Bakkensen et al., 2019).”

Important features were identified using logistic regression. I do not see evidence that these features remain important when the model is built with an MLP instead. Could you verify that the most important features the MLP uses for prediction (identified with explanation methods) are the same as those inferred by logistic regression?

To follow the reviewer’s suggestion, we used sequential forward feature selection to perform MLP-based feature selection. The method is interpretable: starting from an empty candidate set, the feature that most increases prediction accuracy is added at each step, until adding further features no longer increases prediction accuracy. Note that we searched for the optimal MLP parameters (e.g., number of layers, number of nodes, dropout rate) and training parameters (e.g., optimizer, learning rate) to accurately evaluate the predictive power of each candidate feature.

The MLP-based feature selection method selected 14 clinical features. Of these, 12 were also among the 16 clinical features selected by logistic regression (LR), and the top 9 features are the same for both methods. The two clinical features selected by the MLP but not by LR are fresh semen (yes/no) and follicle stimulating hormone (FSH) on day 3 after period. The four clinical features selected by LR but not by the MLP are number of ovarian stimulation cycles, progesterone (P) on HCG day, maternal body mass index (BMI), and free thyroxine (FT4) on day 3 after period. The AUCs achieved by the model using LR-selected clinical features and blastocyst images (0.77) and by the model using MLP-selected clinical features and blastocyst images (0.76) show no significant difference (p = 0.95).

Furthermore, MLP-based feature selection took a few days because of the iterative feature search and testing combined with hyperparameter tuning, whereas LR-based feature selection is much more efficient and takes only a few minutes.

To clarify this issue, we have now added the AUC of the model using all significant predictors and blastocyst images, together with the MLP-based feature selection results, in Figure 3-supplement 1 and Figure 3-supplement 2.

The imbalance issue was tackled using a weighted sampling approach. This approach needs to be detailed in the main text. How were the training, validation, and test partitions built given the distribution of the minority and majority classes? Did the authors evaluate other approaches that could help resolve this issue?

We have added details of the weighted sampling approach to paragraph 2 of the “Model implementation and training” section. We also tested a weighted cross-entropy loss. Both approaches mitigated the prediction bias towards the majority class; the weighted sampling approach performed better and was selected for final model training.

Stratified random sampling was used to ensure that all split datasets (training, validation, and testing) have the same distribution of minority and majority classes.
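A stratified 80%:10%:10% split can be sketched by shuffling and cutting the indices of each class separately, so every split inherits the overall class proportions. This standard-library version, with a hypothetical function name and a fixed seed, is illustrative only; scikit-learn's `train_test_split` with its `stratify` argument is a common alternative.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split sample indices into train/val/test lists that preserve class proportions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    splits = [[] for _ in fractions]
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within each class before cutting
        n, start = len(idxs), 0
        for k, frac in enumerate(fractions):
            # The last split takes the remainder so no sample is lost to rounding.
            end = n if k == len(fractions) - 1 else start + round(frac * n)
            splits[k].extend(idxs[start:end])
            start = end
    for s in splits:
        rng.shuffle(s)  # mix classes within each split
    return splits


labels = [1] * 368 + [0] * 632  # toy labels with the reported positive ratio
train, val, test = stratified_split(labels, seed=1)
print(len(train), len(val), len(test))
```

With these 1,000 toy labels the splits contain 800, 100, and 100 samples, and each has a positive fraction within 0.01 of 0.368.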

To clarify this issue, we have revised paragraph 2 of the “Model implementation and training” section, which now reads:

“The blastocysts were randomly split in an 80%:10%:10% ratio into the training, validation, and testing datasets. Stratified random sampling was used to ensure that all split datasets have the same distribution of minority and majority classes. Since the proportion of blastocysts with a positive live birth outcome in the dataset is 0.368, the weighted sampling approach, which helps rebalance the class distributions when sampling from an imbalanced dataset (Feng et al., 2021), was employed for training the model to mitigate its prediction bias towards the majority category (i.e., the negative live birth outcome). In the weighted sampling approach, the probability of each item being selected is determined by its weight, and each item's weight is assigned according to inverse class frequency. In this way, weighted sampling rebalances the class distributions by oversampling the minority class and undersampling the majority class. We also tested a weighted cross-entropy loss, which assigns greater weight to the loss caused by prediction errors on the minority class. Both approaches helped mitigate the prediction bias towards the majority class, and the weighted sampling approach outperformed the weighted cross-entropy loss.”
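The weighted cross-entropy alternative mentioned above can be sketched as follows. The class weights here follow the common "balanced" heuristic, n_samples / (n_classes × class_count); the response does not say how the authors set their weights, so this choice, like the function names, is an assumption. The loss is normalized by the total sample weight, which matches what PyTorch's `CrossEntropyLoss` does with per-class weights and `reduction='mean'`.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Class weight = n_samples / (n_classes * class_count) (the 'balanced' heuristic)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(logits_batch, labels, class_weights):
    """Class-weighted softmax cross-entropy averaged over the total sample weight."""
    total, weight_sum = 0.0, 0.0
    for z, y in zip(logits_batch, labels):
        m = max(z)
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))  # log-sum-exp
        w = class_weights[y]
        total += w * (log_norm - z[y])  # w * (-log softmax(z)[y])
        weight_sum += w
    return total / weight_sum


labels = [1] * 368 + [0] * 632  # toy labels with the reported positive ratio
weights = balanced_class_weights(labels)
print({c: round(w, 2) for c, w in sorted(weights.items())})
```

With the reported positive ratio of 0.368, the balanced weights come out at about 1.36 for positive and 0.79 for negative blastocysts, so each misclassified positive costs roughly 1.7 times as much as a misclassified negative.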

The code to reproduce the model is missing.

We have provided the source code to reproduce the model. The source code has been submitted with the revised manuscript and can also be accessed at https://github.com/robotVisionHang/LiveBirthPrediction_Data_Code

Discussion regarding implementation in a clinical setting would be informative. How feasible is deploying the model in a clinic? Perhaps you could elaborate further on the prospective clinical trial mentioned in line 363.

The potential application of the proposed model in clinical settings is to improve blastocyst selection. Among the various factors contributing to IVF outcomes, the quality (i.e., developmental potential) of the blastocyst selected for transfer is a major determinant of IVF success. Existing approaches for evaluating and selecting blastocysts rely on manual morphological grading, which has shown limited predictive power for live birth outcomes (e.g., AUC = 0.58-0.61).

The proposed model achieved significantly higher accuracy in evaluating the live birth potential of blastocysts (AUC = 0.77), which can support selection of the best blastocyst to improve the live birth rate. When applied in clinical practice for blastocyst selection, the model takes as input the images of multiple blastocysts from the same patient together with the patient couple’s clinical features, outputs the live birth probability of each blastocyst, and identifies the blastocyst with the highest live birth probability.

To clarify this issue, we have added the following text on page 13:

“The proposed live birth prediction model improves the evaluation of each blastocyst’s live birth potential, supporting selection of the best blastocyst among a patient’s multiple blastocysts. The next step is to validate the model’s prediction accuracy on prospectively collected data and to verify its effectiveness in blastocyst selection through a randomized controlled trial (RCT). Patients enrolled in the RCT will be split into a study group and a control group in a 1:1 ratio. In the study group, the model selects the blastocyst with the highest predicted probability of live birth for transfer; in the control group, embryologists select the top blastocyst for transfer based on their routine morphological grading. Live birth outcomes of the two groups will be tracked and compared.”

https://doi.org/10.7554/eLife.83662.sa2

Article and author information

Author details

  1. Hang Liu

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Zhuoran Zhang and Yifan Gu
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7948-4236
  2. Zhuoran Zhang

    School of Science and Engineering, The Chinese University of Hong Kong-Shenzhen, Shenzhen, China
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Hang Liu and Yifan Gu
    Competing interests
    No competing interests declared
  3. Yifan Gu

    1. Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
    2. Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, China
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Hang Liu and Zhuoran Zhang
    Competing interests
    No competing interests declared
  4. Changsheng Dai

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Guanqiao Shan

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2570-769X
  6. Haocong Song

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Data curation, Software, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Daniel Li

    Department of Electrical and Computer Engineering, Toronto, Canada
    Contribution
    Data curation, Software, Validation, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Wenyuan Chen

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Validation, Investigation, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  9. Ge Lin

    1. Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
    2. Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, China
    3. Key Laboratory of Reproductive and Stem Cell Engineering, National Health and Family Planning Commission, Changsha, China
    4. National Engineering Research Center of Human Stem Cells, Changsha, China
    Contribution
    Conceptualization, Resources, Supervision, Project administration, Writing – review and editing
    For correspondence
    linggf@hotmail.com
    Competing interests
    No competing interests declared
  10. Yu Sun

    1. Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    2. Department of Electrical and Computer Engineering, Toronto, Canada
    3. Institute of Biomedical Engineering, University of Toronto, Toronto, Canada
    4. Department of Computer Science, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    sun@mie.utoronto.ca
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7895-0741

Funding

Natural Sciences and Engineering Research Council of Canada

  • Yu Sun

Canada Research Chairs

  • Yu Sun

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Human subjects: Informed consent was not required because this study used retrospective, fully de-identified data; no medical intervention was performed on the subjects, and no biological samples were collected from the patients. This study was approved by the Ethics Committee of the Reproductive and Genetic Hospital of CITIC-Xiangya (approval number: LL-SC-2021-008).

Senior Editor

  1. Ricardo Azziz, University at Albany, SUNY, United States

Reviewing Editor

  1. Larisa V Suturina, Scientific Center for Family Health and Human Reproduction, Russian Federation

Version history

  1. Received: September 23, 2022
  2. Preprint posted: October 21, 2022 (view preprint)
  3. Accepted: February 20, 2023
  4. Accepted Manuscript published: February 22, 2023 (version 1)
  5. Version of Record published: April 3, 2023 (version 2)

Copyright

© 2023, Liu, Zhang, Gu et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Hang Liu, Zhuoran Zhang, Yifan Gu, Changsheng Dai, Guanqiao Shan, Haocong Song, Daniel Li, Wenyuan Chen, Ge Lin, Yu Sun (2023) Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study. eLife 12:e83662. https://doi.org/10.7554/eLife.83662
