Knowledge synthesis of 100 million biomedical documents augments the deep expression profiling of coronavirus receptors
Abstract
The COVID-19 pandemic demands the assimilation of all available biomedical knowledge to decode the mechanisms of pathogenesis. Despite the recent renaissance in neural networks, a platform for the real-time synthesis of the exponentially growing biomedical literature with deep omics insights has been lacking. Here, we present the nferX platform for dynamic inference from more than 45 quadrillion possible conceptual associations in unstructured text, and for triangulating these with insights from single-cell RNA-sequencing, bulk RNA-sequencing and proteomics across diverse tissue types. Hypothesis-free profiling of ACE2 suggests tongue keratinocytes, olfactory epithelial cells, airway club cells and respiratory ciliated cells as potential reservoirs of the SARS-CoV-2 receptor. We identify the gut as a putative hotspot of COVID-19, where the coronavirus receptors (ACE2, DPP4, ANPEP) share a maturation-correlated transcriptional signature in small intestine enterocytes. A holistic data science platform that triangulates insights from structured and unstructured data holds potential for accelerating the generation of impactful biological insights and hypotheses.
Data availability
All data used in this manuscript were obtained from published and freely available sources online. A complete list of these can be found in Supplementary File 1.
Article and author information
Funding
No external funding was received for this work.
Copyright
© 2020, Venkatakrishnan et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 4,141 views
- 526 downloads
- 47 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Further reading
Background: Several fields have described low reproducibility of scientific research and poor accessibility in research reporting practices. Although previous reports in other fields have investigated the accessible reporting practices that lead to reproducible research, no study to date has explored the extent of accessible and reproducible research practices in the cardiovascular science literature.
Methods: To study accessibility and reproducibility in cardiovascular research reporting, we screened 639 randomly selected articles published in 2019 in three top cardiovascular science journals: Circulation, the European Heart Journal, and the Journal of the American College of Cardiology (JACC). Of those 639 articles, 393 were empirical research articles (as opposed to commentaries or reviews) for which reproducibility could be assessed using our protocol. We screened each paper for accessible and reproducible research practices using a set of accessibility criteria, including the availability of the protocol, materials, data, and analysis scripts, as well as the accessibility of the publication itself. We also quantified the consistency of open research practices within and across cardiovascular study types and journal formats.
Results: We found that fewer than 2% of cardiovascular research publications provide sufficient resources (materials, methods, data, and analysis scripts) to fully reproduce their studies. We also calculated an accessibility score, a measure of the extent to which an article makes its resources available, and showed that accessibility varies across study types, from 0.08 for case studies or case series to 0.39 for clinical trials (p = 5.5 × 10⁻⁵), and across journals, from 0.19 to 0.34 (p = 0.0123). We further showed that study types differ significantly in which resources they share.
Conclusion: Although the degree to which reproducible reporting practices are present in publications varies significantly across journals and study types, current cardiovascular science reports frequently do not provide sufficient materials, protocols, data, or analysis information to reproduce a study. In the future, higher accessibility standards mandated by journals or funding bodies will help increase the reproducibility of cardiovascular research.
Funding: Authors Gabriel Heckerman, Arely Campos-Melendez, and Chisomaga Ekwueme were supported by an NIH R25 grant from the National Heart, Lung and Blood Institute (R25HL147666). Eileen Tzng was supported by an AHA Institutional Training Award fellowship (18UFEL33960207).
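The cardiovascular abstract reports an accessibility score per study type and per journal but does not give its formula. The sketch below is a minimal illustration, assuming the score is the unweighted fraction of the resource criteria listed under Methods (protocol, materials, data, analysis scripts, and accessibility of the publication itself) and that group scores are simple means. The names `CRITERIA`, `accessibility_score` and `mean_score_by_group` are hypothetical and are not the authors' code.

```python
from statistics import mean

# Resource criteria named in the Methods paragraph. Treating them as equally
# weighted yes/no flags is an assumption made for this sketch.
CRITERIA = ("protocol", "materials", "data", "analysis_script", "open_access")


def accessibility_score(flags: dict[str, bool]) -> float:
    """Return the fraction of CRITERIA an article satisfies (0.0 to 1.0).

    Criteria missing from `flags` count as not satisfied.
    """
    return sum(bool(flags.get(c, False)) for c in CRITERIA) / len(CRITERIA)


def mean_score_by_group(articles: list[dict], key: str) -> dict[str, float]:
    """Average the per-article score within each group (e.g. study type or journal)."""
    groups: dict[str, list[float]] = {}
    for article in articles:
        groups.setdefault(article[key], []).append(accessibility_score(article["flags"]))
    return {group: mean(scores) for group, scores in groups.items()}


if __name__ == "__main__":
    # Toy records for illustration only; these are not data from the study.
    toy = [
        {"study_type": "clinical_trial", "flags": {"protocol": True, "data": True}},
        {"study_type": "clinical_trial", "flags": {"open_access": True}},
        {"study_type": "case_series", "flags": {}},
    ]
    print(mean_score_by_group(toy, "study_type"))
```

Weighting criteria differently, or scoring only the criteria that apply to each study type, would change the resulting numbers; the abstract does not say which choice the authors made.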