Author-sourced capture of pathway knowledge in computable form using Biofactoid
Abstract
Making the knowledge contained in scientific papers machine-readable and formally computable would allow researchers to take full advantage of this information by enabling integration with other knowledge sources to support data analysis and interpretation. Here we describe Biofactoid, a web-based platform that allows scientists to specify networks of interactions between genes, their products, and chemical compounds, and then translates this information into a representation suitable for computational analysis, search and discovery. We also report the results of a pilot study to encourage the wide adoption of Biofactoid by the scientific community.
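As a rough illustration of what a "network of interactions" in computable form can look like, the sketch below models entities and directed interactions as plain Python data classes and serializes them to JSON. This is not Biofactoid's data model or export format: the class names, field names, and effect labels are assumptions made for this example, and the two interactions are used only as familiar sample data.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Entity:
    """A participant in an interaction (hypothetical schema)."""
    name: str
    kind: str  # "protein" or "chemical" in this sketch


@dataclass(frozen=True)
class Interaction:
    """A directed interaction between two entities (hypothetical schema)."""
    source: Entity
    target: Entity
    effect: str  # e.g. "activates", "inhibits", "binds"


def to_records(interactions):
    """Convert interactions to plain dictionaries ready for JSON export."""
    return [asdict(interaction) for interaction in interactions]


def targets_of(interactions, name):
    """Return the sorted names of entities that `name` acts on directly."""
    return sorted({i.target.name for i in interactions if i.source.name == name})


# Illustrative sample data, not taken from the article.
network = [
    Interaction(Entity("MDM2", "protein"), Entity("TP53", "protein"), "inhibits"),
    Interaction(Entity("nutlin-3", "chemical"), Entity("MDM2", "protein"), "inhibits"),
]

serialized = json.dumps(to_records(network), indent=2)
```

Once interactions are held as structured records like these, simple queries such as `targets_of(network, "MDM2")` become possible, which is the kind of search and integration the abstract refers to.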
Data availability
All Biofactoid data are available under the Creative Commons CC0 public domain license. To download the data and code, please refer to the documentation on the Biofactoid GitHub repository (github.com/PathwayCommons/factoid). Software availability is described further in Materials and methods.
Article and author information
Funding
National Human Genome Research Institute (U41 HG006623)
- Jeffrey V Wong
- Max Franz
- Metin Can Siper
- Dylan Fong
- Funda Durupinar
- Christian Dallago
- Augustin Luna
- John M Giorgi
- Igor Rodchenkov
- Özgün Babur
- Emek Demir
- Gary D Bader
- Chris Sander
National Human Genome Research Institute (U41 HG003751)
- Jeffrey V Wong
- Max Franz
- Metin Can Siper
- Dylan Fong
- Funda Durupinar
- Christian Dallago
- Augustin Luna
- John M Giorgi
- Igor Rodchenkov
- Özgün Babur
- Emek Demir
- Gary D Bader
- Chris Sander
National Human Genome Research Institute (R01 HG009979)
- Max Franz
- Gary D Bader
National Institute of General Medical Sciences (P41 GM103504)
- Jeffrey V Wong
- Max Franz
- Metin Can Siper
- Dylan Fong
- Funda Durupinar
- Christian Dallago
- Augustin Luna
- John M Giorgi
- Igor Rodchenkov
- Özgün Babur
- Emek Demir
- Gary D Bader
- Chris Sander
Defense Advanced Research Projects Agency (Big Mechanism, ARO W911NF-14-C-0119)
- Metin Can Siper
- Funda Durupinar
- Özgün Babur
- John A Bachman
- Benjamin Gyori
- Emek Demir
Defense Advanced Research Projects Agency (Communicating with Computers, ARO W911NF-15-1-054)
- Metin Can Siper
- Funda Durupinar
- Özgün Babur
- John A Bachman
- Benjamin Gyori
- Emek Demir
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Ethics
Human subjects: Participants in user testing provided written consent to volunteer, to have their testing sessions recorded, and to have quotes from those sessions published.
Reviewing Editor
- Helena Pérez Valle, eLife, United Kingdom
Publication history
- Preprint posted: March 11, 2021
- Received: March 11, 2021
- Accepted: December 2, 2021
- Accepted Manuscript published: December 3, 2021 (version 1)
- Version of Record published: December 17, 2021 (version 2)
Copyright
© 2021, Wong et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 1,357 page views
- 83 downloads
- 7 citations
Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.