Author-sourced capture of pathway knowledge in computable form using Biofactoid

  1. Jeffrey V Wong (corresponding author)
  2. Max Franz
  3. Metin Can Siper
  4. Dylan Fong
  5. Funda Durupinar
  6. Christian Dallago
  7. Augustin Luna
  8. John M Giorgi
  9. Igor Rodchenkov
  10. Özgün Babur
  11. John A Bachman
  12. Benjamin Gyori
  13. Emek Demir (corresponding author)
  14. Gary D Bader (corresponding author)
  15. Chris Sander (corresponding author)
  1. University of Toronto, Canada
  2. Oregon Health and Science University, United States
  3. University of Massachusetts Boston, United States
  4. Technische Universität München, Germany
  5. Dana-Farber Cancer Institute, United States
  6. Harvard University, United States

Abstract

Making the knowledge contained in scientific papers machine-readable and formally computable would allow researchers to take full advantage of this information by enabling integration with other knowledge sources to support data analysis and interpretation. Here we describe Biofactoid, a web-based platform that allows scientists to specify networks of interactions between genes, their products, and chemical compounds, and then translates this information into a representation suitable for computational analysis, search and discovery. We also report the results of a pilot study to encourage the wide adoption of Biofactoid by the scientific community.

Data availability

All Biofactoid data are available under the Creative Commons CC0 public domain license. To download the data and code, please refer to the documentation on the Biofactoid GitHub repository (github.com/PathwayCommons/factoid). Further details on software availability are given in Materials and methods.

Article and author information

Author details

  1. Jeffrey V Wong

    The Donnelly Centre, University of Toronto, Toronto, Canada
    For correspondence
    jeffvin.wong@utoronto.ca
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0002-8912-5699
  2. Max Franz

    The Donnelly Centre, University of Toronto, Toronto, Canada
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0003-0169-0480
  3. Metin Can Siper

    Computational Biology Program, Oregon Health and Science University, Portland, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0002-7556-093X
  4. Dylan Fong

    The Donnelly Centre, University of Toronto, Toronto, Canada
    Competing interests
    The authors declare that no competing interests exist.
  5. Funda Durupinar

    Computer Science Department, University of Massachusetts Boston, Boston, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0002-4915-6642
  6. Christian Dallago

    Department of Informatics, Technische Universität München, Garching, Germany
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0003-4650-6181
  7. Augustin Luna

    Department of Data Sciences, Dana-Farber Cancer Institute, Boston, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0001-5709-371X
  8. John M Giorgi

    The Donnelly Centre, University of Toronto, Toronto, Canada
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0001-9621-5046
  9. Igor Rodchenkov

    The Donnelly Centre, University of Toronto, Toronto, Canada
    Competing interests
    The authors declare that no competing interests exist.
  10. Özgün Babur

    Computer Science Department, University of Massachusetts Boston, Boston, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0002-0239-5259
  11. John A Bachman

    Laboratory of Systems Pharmacology, Harvard Medical School, Harvard University, Boston, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0001-6095-2466
  12. Benjamin Gyori

    Laboratory of Systems Pharmacology, Harvard Medical School, Harvard University, Boston, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0001-9439-5346
  13. Emek Demir

    Computational Biology Program, Oregon Health and Science University, Portland, United States
    For correspondence
    demire@ohsu.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0002-3663-7113
  14. Gary D Bader

    The Donnelly Centre, University of Toronto, Toronto, Canada
    For correspondence
    gary.bader@utoronto.ca
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0003-0185-8861
  15. Chris Sander

    Department of Cell Biology, Harvard Medical School, Harvard University, Boston, United States
    For correspondence
    sander.research@gmail.com
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0001-6059-6270

Funding

National Human Genome Research Institute (U41 HG006623)

  • Jeffrey V Wong
  • Max Franz
  • Metin Can Siper
  • Dylan Fong
  • Funda Durupinar
  • Christian Dallago
  • Augustin Luna
  • John M Giorgi
  • Igor Rodchenkov
  • Özgün Babur
  • Emek Demir
  • Gary D Bader
  • Chris Sander

National Human Genome Research Institute (U41 HG003751)

  • Jeffrey V Wong
  • Max Franz
  • Metin Can Siper
  • Dylan Fong
  • Funda Durupinar
  • Christian Dallago
  • Augustin Luna
  • John M Giorgi
  • Igor Rodchenkov
  • Özgün Babur
  • Emek Demir
  • Gary D Bader
  • Chris Sander

National Human Genome Research Institute (R01 HG009979)

  • Max Franz
  • Gary D Bader

National Institute of General Medical Sciences (P41 GM103504)

  • Jeffrey V Wong
  • Max Franz
  • Metin Can Siper
  • Dylan Fong
  • Funda Durupinar
  • Christian Dallago
  • Augustin Luna
  • John M Giorgi
  • Igor Rodchenkov
  • Özgün Babur
  • Emek Demir
  • Gary D Bader
  • Chris Sander

Defense Advanced Research Projects Agency (Big Mechanism, ARO W911NF-14-C-0119)

  • Metin Can Siper
  • Funda Durupinar
  • Özgün Babur
  • John A Bachman
  • Benjamin Gyori
  • Emek Demir

Defense Advanced Research Projects Agency (Communicating with Computers, ARO W911NF-15-1-054)

  • Metin Can Siper
  • Funda Durupinar
  • Özgün Babur
  • John A Bachman
  • Benjamin Gyori
  • Emek Demir

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Human subjects: Participants in user testing provided written consent to volunteer, to have their testing sessions recorded, and to have quotes from those sessions published.

Reviewing Editor

  1. Helena Pérez Valle, eLife, United Kingdom

Publication history

  1. Preprint posted: March 11, 2021
  2. Received: March 11, 2021
  3. Accepted: December 2, 2021
  4. Accepted Manuscript published: December 3, 2021 (version 1)
  5. Version of Record published: December 17, 2021 (version 2)

Copyright

© 2021, Wong et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • Page views: 821
  • Downloads: 59
  • Citations: 3

Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.

Cite this article

Jeffrey V Wong, Max Franz, Metin Can Siper, Dylan Fong, Funda Durupinar, Christian Dallago, Augustin Luna, John M Giorgi, Igor Rodchenkov, Özgün Babur, John A Bachman, Benjamin Gyori, Emek Demir, Gary D Bader, Chris Sander (2021)
Author-sourced capture of pathway knowledge in computable form using Biofactoid
eLife 10:e68292.
https://doi.org/10.7554/eLife.68292
