Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study

  1. Hang Liu
  2. Zhuoran Zhang
  3. Yifan Gu
  4. Changsheng Dai
  5. Guanqiao Shan
  6. Haocong Song
  7. Daniel Li
  8. Wenyuan Chen
  9. Ge Lin (corresponding author)
  10. Yu Sun (corresponding author)
  1. Department of Mechanical Engineering, University of Toronto, Canada
  2. School of Science and Engineering, The Chinese University of Hong Kong-Shenzhen, China
  3. Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, China
  4. Reproductive and Genetic Hospital of CITIC-Xiangya, China
  5. Department of Electrical and Computer Engineering, Canada
  6. Key Laboratory of Reproductive and Stem Cell Engineering, National Health and Family Planning Commission, China
  7. National Engineering Research Center of Human Stem Cells, China
  8. Institute of Biomedical Engineering, University of Toronto, Canada
  9. Department of Computer Science, University of Toronto, Canada

Abstract

Background:

In infertility treatment, blastocyst morphological grading is commonly used in clinical practice for blastocyst evaluation and selection, but has shown limited predictive power on live birth outcomes of blastocysts. To improve live birth prediction, a number of artificial intelligence (AI) models have been established. Most existing AI models for blastocyst evaluation only used images for live birth prediction, and the area under the receiver operating characteristic (ROC) curve (AUC) achieved by these models has plateaued at ~0.65.

Methods:

This study proposed a multimodal blastocyst evaluation method using both blastocyst images and patient couple’s clinical features (e.g., maternal age, hormone profiles, endometrium thickness, and semen quality) to predict live birth outcomes of human blastocysts. To utilize the multimodal data, we developed a new AI model consisting of a convolutional neural network (CNN) to process blastocyst images and a multilayer perceptron to process patient couple’s clinical features. The data set used in this study consists of 17,580 blastocysts with known live birth outcomes, blastocyst images, and patient couple’s clinical features.

Results:

This study achieved an AUC of 0.77 for live birth prediction, which significantly outperforms related works in the literature. Sixteen out of 103 clinical features were identified to be predictors of live birth outcomes and helped improve live birth prediction. Among these features, maternal age, the day of blastocyst transfer, antral follicle count, retrieved oocyte number, and endometrium thickness measured before transfer are the top five features contributing to live birth prediction. Heatmaps showed that the CNN in the AI model mainly focuses on image regions of inner cell mass and trophectoderm (TE) for live birth prediction, and the contribution of TE-related features was greater in the CNN trained with the inclusion of patient couple's clinical features compared with the CNN trained with blastocyst images alone.

Conclusions:

The results suggest that the inclusion of patient couple’s clinical features along with blastocyst images increases live birth prediction accuracy.

Funding:

Natural Sciences and Engineering Research Council of Canada and the Canada Research Chairs Program.

Editor's evaluation

This article provides important findings that have practical implications for reproductive medicine and would be of interest to IVF specialists. Based on the compelling strength of evidence, the study demonstrates significant results in improving the predictive value of the live birth model based on blastocyst evaluation and clinical features.

https://doi.org/10.7554/eLife.83662.sa0

eLife digest

More than 50 million couples worldwide experience infertility. The most common treatment is in vitro fertilization (IVF). Fertility specialists collect eggs and sperm from the prospective parents. They combine the egg and sperm in a laboratory and allow the fertilized eggs to develop for five days into a multi-celled blastocyst. Then, the specialists select the healthiest blastocysts and return them to the patient's uterus.

Since 1978, more than 8 million children have been conceived through IVF. Yet, only about 30% of IVF attempts result in a successful birth. As a result, fertility patients often undergo multiple rounds of IVF, which can be expensive and emotionally draining. Several factors determine IVF success, one of which is the health of the blastocysts selected for transfer to the uterus. Specialists select the blastocysts using several criteria. But these human assessments are subjective and inconsistent in predicting which ones are most likely to result in a successful birth. Recent studies suggest artificial intelligence technology may help select blastocysts.

Liu et al. show that using artificial intelligence to assess blastocysts and fertility patient characteristics leads to more accurate predictions about which blastocysts are likely to result in a successful birth. In the experiments, the researchers trained an artificial intelligence computer program using pictures of 17,580 blastocysts with known birth outcomes and the parents' clinical characteristics. The model identified 16 parental factors associated with birth outcomes. The top 5 most predictive parental factors were maternal age, the day of blastocyst transfer to the uterus, how many eggs were present in the ovaries, the number of eggs retrieved and the thickness of the uterus lining. The program achieved the highest prediction of healthy births so far, compared to success rates listed in other studies.

Artificial intelligence-aided blastocyst selection using patient and blastocyst characteristics may improve IVF success rates and reduce the number of treatment cycles patient couples undergo. Before specialists can use artificial intelligence in their clinics, they must conduct confirmatory clinical studies that enroll patient couples to compare conventional methods and artificial intelligence.

Introduction

Infertility is a global health issue, affecting more than 50 million couples worldwide (Mascarenhas et al., 2012). Since the birth of the first in vitro fertilization (IVF) child in 1978, over 8 million children have been born through IVF treatment (Adamson et al., 2018). Among the various factors contributing to IVF outcomes, the quality of the blastocyst (day 5 embryo) selected for transfer is critical for the success of IVF treatment. Manual grading of blastocyst development stage, inner cell mass (ICM), and trophectoderm (TE) remains the most common method for blastocyst evaluation. Although blastocyst morphological grading is widely used in clinical practice, morphological grades of the development stage, ICM, and TE have shown limited predictive power on clinical outcomes (Seli et al., 2011; Reignier et al., 2019; Bartolacci et al., 2021; Ueno et al., 2021; Xiong et al., 2022). It is therefore desirable to identify features that accurately predict the clinical outcomes of blastocysts.

To achieve this goal, the convolutional neural network (CNN) is expected to play a critical role. CNN is able to automatically detect discriminative features from images and has been the state-of-the-art method in various fields in medical imaging, such as lung cancer prediction (Ardila et al., 2019), breast cancer prediction (McKinney et al., 2020), and diabetic retinopathy screening (Bora et al., 2021). To apply CNN to predict the clinical outcome of a blastocyst, images of blastocysts with a known clinical outcome (e.g., pregnancy and live birth) are collected for the CNN model development. The area under the receiver operating characteristic (ROC) curve (AUC) is the most commonly used metric to evaluate and compare machine learning models on predicting clinical outcomes of blastocysts (Kragh and Karstoft, 2021a). The AUCs reported in the literature using CNN to predict clinical outcomes from blastocyst images range from 0.64 to 0.71 for pregnancy prediction (VerMilyea et al., 2020; Kragh et al., 2021b; Berntsen et al., 2022; Enatsu et al., 2022; Loewke et al., 2022), and are around 0.65 for live birth prediction (Miyagi et al., 2019; Nagaya and Ukita, 2021).

Besides using blastocyst images, attempts have also been made to use time-lapse videos for live birth prediction. These videos contain the entire development process from days 0 to 5–7. However, results in the literature show that CNN using static images achieved similar or slightly better accuracies in predicting clinical outcomes than using time-lapse videos, for instance, AUC=0.68–0.71 (VerMilyea et al., 2020; Enatsu et al., 2022; Loewke et al., 2022) versus 0.64–0.67 (Kragh et al., 2021b; Berntsen et al., 2022) for pregnancy prediction, and 0.66 (Miyagi et al., 2019) versus 0.65 (Nagaya and Ukita, 2021) for live birth prediction. A potential reason is that the redundant frames of images in time-lapse videos may work as noise causing the model to overfit and thus leading to a lower prediction accuracy (Zhu et al., 2018; Wu et al., 2021; Tao et al., 2022). Therefore, we opted to use static blastocyst images in this study.

Rather than using blastocyst images alone to predict clinical outcomes, Miyagi et al., 2020 proposed using blastocyst images together with maternal clinical features, including maternal age, AMH, and BMI, and reported an AUC of 0.74, the highest in the literature. However, two questions remain open. First, the contribution of blastocyst images and the additional contribution of maternal clinical features to live birth prediction are unknown. Second, endometrium status-related features, such as endometrium thickness and pattern, are also critical factors impacting live birth outcomes (Ng et al., 2007; Bu et al., 2016; Mahutte et al., 2022), but were not considered.

In this study, we quantified the effect of blastocyst images and the combined effect of both blastocyst images and patient couple’s clinical features on live birth prediction. The live birth prediction model using only blastocyst images achieved an AUC of 0.67, which was significantly outperformed by the AUC of 0.77 achieved by the model using both blastocyst images and patient couple’s clinical features (p value<0.0001). Additionally, when endometrium status-related features (e.g., endometrium thickness and pattern) were excluded, the AUC of the model using both blastocyst images and patient couple’s clinical features significantly decreased to 0.74 (p value<0.0001), indicating that the inclusion of endometrium status-related clinical features helps improve live birth prediction accuracy. Sixteen patient couple’s clinical features were identified to be most related to live birth outcomes of blastocysts, among which maternal age, the day of blastocyst transfer, antral follicle count (AFC), retrieved oocyte number, and endometrium thickness measured before transfer are the top five features contributing to live birth prediction. Additionally, the CNN heatmaps showed that the CNN mainly focused on ICM and TE for live birth prediction, and the contribution of TE-related features was greater in the CNN trained with the inclusion of patient couple’s clinical features compared with the CNN trained with blastocyst images alone.

Methods

Data set collection

We used retrospectively collected data to develop the live birth prediction model. Transferred blastocysts with known live birth outcomes for patients who underwent frozen embryo transfer cycles from 2016 to 2020, at the Reproductive and Genetic Hospital of CITIC-Xiangya, were reviewed for inclusion in the data set. Informed consent was not necessary because this study used retrospective and fully de-identified data, no medical intervention was performed on the subject, and no biological samples from the patient were collected. This study was approved by the Ethics Committee of the Reproductive and Genetic Hospital of CITIC-Xiangya (approval number: LL-SC-2021-008).

Blastocyst images were captured before transfer using a standard optical light microscope mounted with a camera. Two grayscale images were captured for each blastocyst, one focusing on ICM and the other focusing on TE. Blastocysts were cropped from the original images, which have a resolution of 1024×768, and were consistently padded to 500×500 to facilitate model training. Patient couple’s clinical features consist of 103 features including maternal age and BMI, the day of blastocyst transfer, infertility diagnosis and treatment history of patient couples, ovarian stimulation protocols, maternal hormone profiles, and ultrasound results measured during the ovarian stimulation process and before transfer, and paternal semen diagnosis results (see Supplementary file 1 for a complete list). Based on p value analysis and logistic regression (LR)-based sequential forward feature selection (Solorio-Fernández et al., 2020; Raschka, 2018), the 16 clinical features most relevant to live birth prediction were identified and used for training the machine learning model (see Figure 3). Feature selection reduces the input feature dimensions by removing redundant features and features with limited predictive power, thus improving the model generalization capability (see Figure 3—figure supplement 1). LR-based feature selection was used because of its computing efficiency; the results of multilayer perceptron (MLP)-based feature selection are also presented in Figure 3—figure supplement 1 and Figure 3—figure supplement 2.
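The sequential forward selection step described above can be illustrated with a minimal sketch. The study scored candidate feature subsets with logistic regression; here the scoring function is simply passed in as a parameter, and `toy_score`, `TOY_GAIN`, and the feature names and values are hypothetical stand-ins rather than the authors' implementation.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    n_select: int,
) -> List[str]:
    """Greedily add the feature that maximizes `score` until n_select are chosen."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < n_select:
        # Evaluate every subset formed by adding one more candidate feature.
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature gains; in the study, the score would come from
# a logistic regression fitted on the candidate subset.
TOY_GAIN = {"maternal_age": 0.08, "transfer_day": 0.06, "afc": 0.05, "noise": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    """Toy additive score used only to make the example runnable."""
    return 0.5 + sum(TOY_GAIN[f] for f in subset)


chosen = sequential_forward_selection(list(TOY_GAIN), toy_score, n_select=3)
print(chosen)
```

Because the search is greedy, each round costs one score evaluation per remaining feature, which is why a fast scorer such as logistic regression is attractive when starting from 103 candidates.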

A total of 28,118 blastocysts with known live birth outcomes were reviewed, among which 17,580 blastocysts with two blastocyst images and all the 16 clinical features available were included in the data set.

Model architecture

Figure 1 shows the architecture of the live birth prediction model based on multimodal blastocyst evaluation. It consists of a CNN to process blastocyst images and an MLP to process patient couple’s clinical features. Features from the CNN and the MLP are fused; thus, the model can be trained to simultaneously take into account both blastocyst images and patient couple’s clinical features for live birth prediction. The last fully connected layer in the CNN and the last fully connected layer in the MLP each output a decision-level feature, which has two variables used to classify the blastocyst into the positive or negative live birth outcome category. The adding operation fuses decision-level features from the CNN and the MLP, and the result of addition is taken as the final output of the overall live birth prediction model.
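The decision-level fusion described above amounts to an element-wise sum of two length-two vectors. The following is a minimal sketch with plain Python lists in place of network outputs; the numeric values, the class ordering (index 0 for the positive outcome), and the use of softmax to turn the fused output into probabilities are assumptions made for illustration and are not stated in the text.

```python
import math
from typing import List, Sequence


def fuse_by_addition(cnn_out: Sequence[float], mlp_out: Sequence[float]) -> List[float]:
    """Element-wise sum of the two decision-level feature vectors."""
    if len(cnn_out) != len(mlp_out):
        raise ValueError("both branches must output the same number of variables")
    return [c + m for c, m in zip(cnn_out, mlp_out)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (assumed here; not specified in the text)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical branch outputs, ordered as [positive, negative] live birth outcome.
cnn_out = [-0.1, 0.4]   # image branch alone leans negative
mlp_out = [0.6, -0.9]   # clinical-feature branch leans positive

fused = fuse_by_addition(cnn_out, mlp_out)
probabilities = softmax(fused)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, probabilities, predicted_class)
```

In this toy case the clinical branch overrides the image branch, which shows how adding the two outputs lets either modality shift the final decision.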

Architecture of the live birth prediction model based on multimodal blastocyst evaluation.

CNN, convolutional neural network; FC, fully connected layer; MLP, multilayer perceptron.

Model implementation and training

The proposed live birth prediction model used EfficientNetV2-S as the backbone CNN. EfficientNetV2-S is the baseline model in the EfficientNetV2 family, which is a new family of CNN models that provide higher accuracy and training speed than conventional models (Tan and Le, 2021). In our work, the output dimension of the final fully connected layer in EfficientNetV2-S was set to be two, representing the positive and negative live birth outcome of a blastocyst, respectively. The model was implemented using PyTorch 1.10.1 (Paszke et al., 2019).

Each of the 17,580 blastocysts had two images taken at different focal planes, one focused more on TE cells and the other on ICM. Furthermore, for each blastocyst, live birth outcomes and all the 16 patient couple’s clinical features were available. The blastocysts were randomly split into 80%:10%:10% to construct the training, validation, and testing data sets. The stratified random sampling approach was used to ensure that all split data sets have the same distribution of minority and majority classes. Since the ratio of blastocysts with a positive live birth outcome in the data set is 0.368, to mitigate the model’s prediction bias toward the majority category (i.e., the negative live birth outcome), the weighted sampling approach, which can help rebalance the class distributions when sampling from an imbalanced data set (Feng et al., 2021), was employed for training the model. In the weighted sampling approach, the probability of each item to be selected is determined by its weight, and the weight of each item is assigned by inverse class frequencies. In this way, the weighted sampling approach rebalances the class distributions by oversampling the minority class and under-sampling the majority class. We also verified the approach of using weighted cross-entropy loss, which assigns greater weights to the loss caused by the prediction error of minority classes. Both approaches helped mitigate the prediction bias toward the majority class, and the results showed that the weighted sampling approach outperformed the weighted cross-entropy loss method.
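The stratified split and the inverse-class-frequency weighting described above can be sketched with the standard library alone. The labels below are synthetic (1,000 items with the reported positive ratio of 0.368), and the function names and seeds are hypothetical; this is an illustration of the technique, not the authors' code.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], fractions: Tuple[float, float, float], seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Split indices into train/val/test so each part keeps the class ratio."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train: List[int] = []
    val: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


def inverse_frequency_weights(labels: Sequence[int]) -> List[float]:
    """Per-item sampling weight equal to 1 / (number of items in its class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


# Synthetic labels with the positive ratio reported in the text (0.368).
labels = [1] * 368 + [0] * 632
train_idx, val_idx, test_idx = stratified_split(labels, (0.8, 0.1, 0.1))

train_labels = [labels[i] for i in train_idx]
weights = inverse_frequency_weights(train_labels)

# Draw one epoch with replacement: the minority class is oversampled and
# the majority class under-sampled, so both are drawn about equally often.
sampler_rng = random.Random(1)
epoch = sampler_rng.choices(range(len(train_labels)), weights=weights, k=len(train_labels))
positive_share = sum(train_labels[i] for i in epoch) / len(epoch)
print(len(train_idx), len(val_idx), len(test_idx), round(positive_share, 3))
```

In PyTorch, weights of this kind are typically passed to `torch.utils.data.WeightedRandomSampler`; whether the authors used that class is not stated in the text.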

Model performance depends on training hyperparameters (e.g., optimizer, learning rate, batch size, and number of layers). Hence, an automatic hyperparameter-tuning tool, Facebook Ax (version 0.2.2, https://github.com/facebook/Ax), was used to search for the optimal hyperparameters for model training. The selected hyperparameters include a batch size of 16, an SGD optimizer with a learning rate of 0.008 and a momentum of 0.39, and three hidden layers in the MLP. A dropout layer follows each hidden layer in the MLP to prevent overfitting. The numbers of nodes in the three hidden layers are 6836, 5657, and 468, respectively, and the dropout rates in the corresponding dropout layers are 0.01, 0.07, and 0.67, respectively. The model was trained with four RTX A6000 GPUs. It took about 30 hr to search for the optimal hyperparameters and about an hour to train the model using the optimal hyperparameters.
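To make the reported MLP configuration concrete, the sketch below lists the fully connected layer shapes implied by the numbers in the text and counts their parameters. It assumes that the MLP takes the 16 selected clinical features as input, ends in a two-variable output layer, and uses plain fully connected layers with bias terms; the helper names are hypothetical.

```python
from typing import List, Tuple

# Values reported in the text.
N_CLINICAL_FEATURES = 16
HIDDEN_SIZES = [6836, 5657, 468]
DROPOUT_RATES = [0.01, 0.07, 0.67]
N_OUTPUTS = 2  # decision-level feature with two variables


def layer_shapes(n_in: int, hidden: List[int], n_out: int) -> List[Tuple[int, int]]:
    """Return (inputs, outputs) for every fully connected layer, in order."""
    sizes = [n_in] + hidden + [n_out]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes: List[Tuple[int, int]]) -> int:
    """Weights plus biases, assuming each layer has a bias term."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


shapes = layer_shapes(N_CLINICAL_FEATURES, HIDDEN_SIZES, N_OUTPUTS)
total = count_parameters(shapes)

# Each hidden layer is followed by its dropout layer; the output layer is not.
for (n_in, n_out), rate in zip(shapes, DROPOUT_RATES):
    print(f"FC {n_in:>5} -> {n_out:>5}, then dropout p={rate}")
print(f"FC {shapes[-1][0]:>5} -> {shapes[-1][1]:>5} (output)")
print(f"MLP parameters under these assumptions: {total:,}")
```

The second hidden layer dominates the count because it connects the two widest layers.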

Statistical analysis

Statistical tests were performed to compare clinical features between blastocysts with a positive live birth outcome and blastocysts with a negative live birth outcome. The chi-squared test was used for categorical features, and the t test was used for numerical features. Both tests were performed using Python (version 3.6). ROC curves were compared using the DeLong test implemented in MedCalc software (version 20). All statistical tests were two-tailed and considered significant if p value≤0.05.
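As an illustration of the categorical-feature comparison, the following sketch computes a Pearson chi-squared test for a 2×2 table using only the standard library. The counts are hypothetical, and the absence of a continuity correction is an assumption, since the text does not say which variant was used.

```python
import math
from typing import Sequence, Tuple


def chi_squared_2x2(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Returns the test statistic and the p value for 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = feature present / absent,
# columns = positive / negative live birth outcome.
counts = [[120, 180], [90, 210]]
statistic, p = chi_squared_2x2(counts)
significant = p <= 0.05  # significance threshold stated in the text
print(f"chi2={statistic:.3f}, p={p:.4f}, significant={significant}")
```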

Results

The inclusion of patient couple’s clinical features increased AUC for live birth prediction

To quantify the individual effect of blastocyst images and the combined effect of patient couple’s clinical features, we built and compared models that (1) used only blastocyst images for live birth prediction, and (2) used both blastocyst images and patient couple’s clinical features for prediction. In addition, to quantify the specific effect of endometrium status-related features (i.e., endometrium thickness before transfer, endometrium thickness on HCG day, and endometrium pattern B (yes/no) on HCG day) on live birth prediction, a third model trained using blastocyst images and patient couple’s clinical features where endometrium status-related features were excluded, was also built and compared.

Figure 2 shows the ROC curves and AUCs of the three models for predicting live birth outcomes of 1758 blastocysts (i.e., 10% of 17,580) in the test data set. Using only blastocyst images for live birth prediction gave an AUC of 0.67, with a 95% confidence interval (CI) of 0.65–0.70. Using blastocyst images and patient couple’s clinical features (endometrium status-related features excluded) significantly increased the AUC to 0.74 (95% CI: 0.72–0.76, p value<0.0001). Using both blastocyst images and patient couple’s clinical features (endometrium status-related features included) achieved a prediction AUC of 0.77 (95% CI: 0.75–0.79), which is significantly higher than using only blastocyst images for prediction (p value<0.0001) and than using blastocyst images and patient couple’s clinical features where endometrium status-related features were excluded (p value=0.007).
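For readers less familiar with the metric, the AUC values above can be read as the probability that a randomly chosen blastocyst with a positive outcome receives a higher score than a randomly chosen one with a negative outcome. The sketch below computes AUC directly from that definition; the labels and scores are hypothetical, and the pairwise loop is meant for clarity rather than for data sets of the size used in the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of live birth for six blastocysts.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.81, 0.35, 0.62, 0.62, 0.20, 0.44]
auc = roc_auc(y_true, y_score)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported increase from 0.67 to 0.77 in context.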

Receiver operating characteristic (ROC) analysis.

ROC curves of the model using only blastocyst images, the model using blastocyst images and patient couple’s clinical features where EM-status related features were excluded, and the model using blastocyst images and patient couple’s clinical features where EM-status related features were included to predict live birth outcomes of 1758 blastocysts in the test data set. AUC, area under the ROC curve; EM, endometrium; EM status-related features, endometrium thickness before transfer, endometrium thickness on HCG day, endometrium pattern B (yes/no) on HCG day.

Figure 2—source data 1

Code and data used to generate the ROC curves.

https://cdn.elifesciences.org/articles/83662/elife-83662-fig2-data1-v2.zip

Ranking the predictive power of patient couple’s clinical features

We then investigated the predictive power of each patient couple’s clinical feature in predicting live birth outcome. Figure 3 shows the 16 features that were identified to be most related to the live birth outcomes of the blastocysts. These features were ranked according to their AUCs for individually predicting the live birth outcomes of blastocysts using univariable LR. The AUC for each feature was reported as the mean AUC over a tenfold cross-validation process.

Figure 3 (with 2 supplements)
Ranking the predictive power of patient couple’s clinical features.

The 16 patient couple’s clinical features that were identified to be most related to the live birth outcomes of the blastocysts ranked by the AUC for individually predicting live birth outcome. AUC, area under the curve.

Figure 3—source data 1

Code and data used to generate the AUC ranking chart.

https://cdn.elifesciences.org/articles/83662/elife-83662-fig3-data1-v2.zip

CNN heatmaps

In blastocyst images, what does the CNN focus on to predict the live birth outcome of a blastocyst? Is there a difference in what the CNN focuses on between the model trained without and with the inclusion of patient couple’s clinical features? To answer these questions, we used the class activation mapping method to generate heatmaps. Figure 4 shows blastocyst images, corresponding heatmaps of the CNN trained without including patient couple’s clinical features, and corresponding heatmaps of the CNN trained with the inclusion of patient couple’s clinical features.

Figure 4 (with 1 supplement)
CNN heatmaps analysis.

Heatmaps of the CNN trained without and with patient couple’s clinical features. Column (A): original blastocyst images. Column (B): corresponding heatmaps of the CNN trained without including patient couple’s clinical features. Column (C): corresponding heatmaps of the CNN trained with the inclusion of patient couple’s clinical features. CNN, convolutional neural network.

Figure 4—source data 1

Code and data used to generate the heatmaps shown in Figure 4.

https://cdn.elifesciences.org/articles/83662/elife-83662-fig4-data1-v2.zip

The blastocyst images were cropped and padded to a consistent size to facilitate model training. The padding value was calculated as the mean pixel value of blastocyst images in the data set. Heatmaps were generated by XGradCAM (Fu et al., 2020). Note that the CNN takes two-channel blastocyst images as the input, one focusing on ICM and the other focusing on TE. The blastocyst images shown in Figure 4 are those focused on ICM, and Figure 4—figure supplement 1 shows the two-channel blastocyst images. As can be seen in Figure 4, when trained using only blastocyst images, the CNN mainly focuses on ICM and TE for predicting live birth outcomes. When training with both blastocyst images and patient couple’s clinical features, TE-related features contributed more to live birth prediction compared with training with blastocyst images only.
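The padding and two-channel stacking mentioned above can be sketched as follows. Images are represented as nested lists so the example needs no external libraries. The tiny sizes, the centering of the crop on the canvas, the channel order, and the value of `MEAN_PIXEL` are assumptions for illustration; the study pads real crops to 500×500 with the data set's mean pixel value.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def pad_to_square(image: Image, size: int, fill: float) -> Image:
    """Center `image` on a size x size canvas filled with `fill`."""
    height, width = len(image), len(image[0])
    if height > size or width > size:
        raise ValueError("cropped image is larger than the target canvas")
    top = (size - height) // 2
    left = (size - width) // 2
    canvas = [[fill] * size for _ in range(size)]
    for r, row in enumerate(image):
        # Overwrite the padded background with the original pixels.
        canvas[top + r][left:left + width] = row
    return canvas


def stack_two_focal_planes(icm_image: Image, te_image: Image) -> List[Image]:
    """Combine the ICM-focused and TE-focused images into one two-channel input."""
    return [icm_image, te_image]


MEAN_PIXEL = 0.5  # hypothetical data set mean used as the padding value
icm = pad_to_square([[0.9, 0.8], [0.7, 0.6]], size=4, fill=MEAN_PIXEL)
te = pad_to_square([[0.1, 0.2, 0.3]], size=4, fill=MEAN_PIXEL)
two_channel = stack_two_focal_planes(icm, te)
print(len(two_channel), len(icm), len(icm[0]))
```

Padding with the mean pixel value rather than zeros keeps the border close to the typical image intensity, so the padded region introduces less artificial contrast at the edge of the crop.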

Discussion

In this study, the individual effect of blastocyst images and the combined effect of patient couple’s clinical features for live birth prediction were quantified by comparing the AUC values of the model using only blastocyst images and the model using both blastocyst images and patient couple’s clinical features. An AUC of 0.67 was achieved with blastocyst images only while using both blastocyst images and patient couple’s clinical features led to a significantly higher AUC of 0.77 in live birth prediction. When endometrium status-related features were excluded from patient couple’s clinical features, the AUC of the live birth prediction model significantly decreased (p value=0.007) from 0.77 to 0.74, indicating the strong relevance of endometrium status-related features in live birth prediction. Sixteen patient couple’s clinical features were identified to be most related to live birth outcomes of blastocysts, among which maternal age, the day of blastocyst transfer, AFC, retrieved oocyte number, and endometrium thickness before transfer are the top five features contributing to live birth prediction.

This study was based on a comprehensive multimodal data set collected for blastocyst evaluation. The data set includes 17,580 blastocysts with known live birth outcomes, blastocyst images, and 16 patient couple’s clinical features. As shown in Figure 3, the 16 patient couple’s clinical features cover maternal basal characteristics (age and BMI); hormone profiles measured after period (LH and FT4), on HCG day (PE2, P, and LH), and before transfer (E2); endometrium status-related features (endometrium thickness on HCG day and before transfer, endometrium pattern on HCG day); features related to oocytes (AFC, retrieved oocyte number); the day of blastocyst transfer; number of ovarian stimulation cycles; and paternal features (the ratio of grade A sperm after semen processing). For comparison, the data set studied by Miyagi et al., 2020 did not contain endometrium status-related features and key hormone profiles (e.g., P, E2, and LH). There are numerous IVF data sets containing over 100,000 records of clinical features and live birth outcomes (Nelson and Lawlor, 2011; McLernon et al., 2016; La Marca et al., 2021); however, these data sets contain no blastocyst images and thus cannot be used for building models to evaluate blastocysts from their images.

To handle the multimodal data set, our proposed model integrates two modules, a CNN and an MLP, so that it can simultaneously consider images and numerical clinical features for blastocyst evaluation. The large and comprehensive multimodal data set and the proposed CNN+MLP model resulted in an AUC of 0.77, the highest reported thus far for predicting live birth outcomes of blastocysts. They also enabled us to quantify the predictive power of each feature in predicting the live birth outcomes of blastocysts.

The blastocyst grading system introduced in 1999 (Gardner and Schoolcraft, 1999; Gardner, 1999) remains the most common method used by embryologists to evaluate blastocyst quality, although the morphological grades of blastocyst development stage, ICM, and TE have limited predictive power on live birth outcomes (e.g., AUC=0.58–0.61 for live birth prediction reported by Reignier et al., 2019; Bartolacci et al., 2021; Xiong et al., 2022). Since CNN became a state-of-the-art method for image-based classification, many attempts have been made to apply CNN to blastocyst evaluation for predicting clinical outcomes (e.g., VerMilyea et al., 2020; Kragh et al., 2021b; Berntsen et al., 2022; Enatsu et al., 2022; Loewke et al., 2022; Miyagi et al., 2019; Nagaya and Ukita, 2021). Among these, the AUC values reported in the literature using blastocyst images only were around 0.65 for live birth prediction (Miyagi et al., 2019; Nagaya and Ukita, 2021). Similarly, we achieved an AUC of 0.67 (see Figure 2). Compared with the AUC of 0.58–0.61 reported in the literature using Gardner grades for live birth prediction, these results confirm that CNN can achieve a higher prediction accuracy. As shown in Figure 4, the CNN mainly focuses on ICM and TE. Unlike the Gardner TE grade, which is based on the number of TE cells and their cohesiveness as a whole, the CNN tends to focus on specific TE clusters. Further investigation is required to understand the heatmaps more fully.

Miyagi et al., 2020 used both blastocyst images and maternal clinical features (age, AMH, and BMI) to predict live birth outcomes and achieved an AUC of 0.74. The additional contribution from the three maternal clinical features was not clear since no AUC was reported for blastocyst images alone. Furthermore, despite their importance in pregnancy and live birth, endometrium status-related features were not considered in their work. Therefore, our study used a comprehensive data set and quantitatively compared the AUC values of live birth prediction using only blastocyst images versus using both blastocyst images and patient couple’s clinical features. We also quantified the usefulness of endometrium status-related features in working with blastocyst images to improve live birth prediction. Furthermore, we revealed that hormone profiles such as E2, LH, P, and FT4, features related to oocyte retrieval such as AFC and number of oocytes retrieved, and the ratio of grade A sperm after processing, which represents semen quality, can work with blastocyst images to further improve the live birth prediction accuracy. Note that in this study, only the total testosterone (T) was analyzed, and free T or bioavailable T was not available for clinical feature analysis (see Supplementary file 1). This may cause potential bias in determining the significance of testosterone as a predictor of live birth.

Another finding of this study, by comparing the heatmaps of the CNN trained without and with the inclusion of patient couple’s clinical features, is that the weights of TE-related features increased (see Figure 4). A potential reason may be that TE and the endometrium status-related features (e.g., endometrium thickness and pattern) play critical roles when a blastocyst initiates implantation, and a positive live birth outcome is not possible without the success of this implantation process (Ahlström et al., 2011; Hill et al., 2013; Chen et al., 2014; Bakkensen et al., 2019).

In conclusion, in this retrospective study involving 17,580 blastocysts with known live birth outcomes, blastocyst images, and 16 patient couple’s clinical features, we built a live birth prediction model based on multimodal blastocyst evaluation using both blastocyst images and patient couple’s clinical features. We quantified the individual effect of blastocyst images and the combined effect of patient couple’s clinical features on live birth prediction. Results demonstrated that using both blastocyst images and patient couple’s clinical features significantly improves live birth prediction compared with using blastocyst images alone.

The proposed live birth prediction model improves the evaluation of a blastocyst in terms of its live birth potential for best blastocyst selection from multiple blastocysts of a patient. The next step is to validate the model’s prediction accuracy using prospectively collected data and verify its effectiveness in blastocyst selection via a randomized controlled trial (RCT). Patients enrolled in the RCT will be split into the study group and the control group (1:1 ratio). In the study group, the model selects a top blastocyst having the highest probability of live birth for transfer, and in the control group, embryologists select a top blastocyst based on their routine morphological grading for transfer. Live birth outcomes of both groups will be tracked and compared.

Data availability

All processed data and code needed to reproduce the findings of the study are made openly available in deidentified form. These can be found at https://github.com/robotVisionHang/LiveBirthPrediction_Data_Code (copy archived at Liu et al., 2023), and are attached to this manuscript. All code and software used to analyze the data can also be accessed through the link. Due to data privacy regulations of patient data, raw data cannot be publicly shared. Interested researchers are welcome to contact the corresponding author with a concise project proposal indicating aims of using the data and how they will use the data. The project proposal will first be assessed by Prof. Yu Sun and Prof. Ge Lin, and then by the Ethics Committee of the Reproductive and Genetic Hospital of CITIC-Xiangya. There are no restrictions on who can access the data.

References

  1. Gardner DK (1999) In-vitro culture of human blastocyst. In: Gardner DK, editors. Towards Reproductive Certainty: Infertility and Genetics Beyond 1999. CRC Press. pp. 378–388. [Book]
  2. Tan M, Le Q (2021) Efficientnetv2: Smaller models and faster training. In: International Conference on Machine Learning 2021. pp. 10096–10106. [Conference]

Article and author information

Author details

  1. Hang Liu

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Zhuoran Zhang and Yifan Gu
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7948-4236
  2. Zhuoran Zhang

    School of Science and Engineering, The Chinese University of Hong Kong-Shenzhen, Shenzhen, China
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Hang Liu and Yifan Gu
    Competing interests
    No competing interests declared
  3. Yifan Gu

    1. Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
    2. Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, China
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing
    Contributed equally with
    Hang Liu and Zhuoran Zhang
    Competing interests
    No competing interests declared
  4. Changsheng Dai

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  5. Guanqiao Shan

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2570-769X
  6. Haocong Song

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Data curation, Software, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  7. Daniel Li

    Department of Electrical and Computer Engineering, Toronto, Canada
    Contribution
    Data curation, Software, Validation, Investigation, Writing – review and editing
    Competing interests
    No competing interests declared
  8. Wenyuan Chen

    Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Validation, Investigation, Project administration, Writing – review and editing
    Competing interests
    No competing interests declared
  9. Ge Lin

    1. Institute of Reproductive and Stem Cell Engineering, School of Basic Medical Science, Central South University, Changsha, China
    2. Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, China
    3. Key Laboratory of Reproductive and Stem Cell Engineering, National Health and Family Planning Commission, Changsha, China
    4. National Engineering Research Center of Human Stem Cells, Changsha, China
    Contribution
    Conceptualization, Resources, Supervision, Project administration, Writing – review and editing
    For correspondence
    linggf@hotmail.com
    Competing interests
    No competing interests declared
  10. Yu Sun

    1. Department of Mechanical Engineering, University of Toronto, Toronto, Canada
    2. Department of Electrical and Computer Engineering, Toronto, Canada
    3. Institute of Biomedical Engineering, University of Toronto, Toronto, Canada
    4. Department of Computer Science, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing – original draft, Project administration, Writing – review and editing
    For correspondence
    sun@mie.utoronto.ca
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-7895-0741

Funding

Natural Sciences and Engineering Research Council of Canada

  • Yu Sun

Canada Research Chairs

  • Yu Sun

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Human subjects: Informed consent was not necessary because this study used retrospective and fully de-identified data, no medical intervention was performed on the subject, and no biological samples from the patient were collected. This study was approved by the Ethics Committee of the Reproductive and Genetic Hospital of CITIC-Xiangya (approval number: LL-SC-2021-008).

Version history

  1. Received: September 23, 2022
  2. Preprint posted: October 21, 2022
  3. Accepted: February 20, 2023
  4. Accepted Manuscript published: February 22, 2023 (version 1)
  5. Version of Record published: April 3, 2023 (version 2)

Copyright

© 2023, Liu, Zhang, Gu et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

  1. Hang Liu
  2. Zhuoran Zhang
  3. Yifan Gu
  4. Changsheng Dai
  5. Guanqiao Shan
  6. Haocong Song
  7. Daniel Li
  8. Wenyuan Chen
  9. Ge Lin
  10. Yu Sun
(2023)
Development and evaluation of a live birth prediction model for evaluating human blastocysts from a retrospective study
eLife 12:e83662.
https://doi.org/10.7554/eLife.83662
