Cutting Edge: Anatomy of BioJS, an open source community for the life sciences

  1. Guy Yachdav (corresponding author)
  2. Tatyana Goldberg
  3. Sebastian Wilzbach
  4. David Dao
  5. Iris Shih
  6. Saket Choudhary
  7. Steve Crouch
  8. Max Franz
  9. Alexander García
  10. Leyla J García
  11. Björn A Grüning
  12. Devasena Inupakutika
  13. Ian Sillitoe
  14. Anil S Thanki
  15. Bruno Vieira
  16. José M Villaveces
  17. Maria V Schneider
  18. Suzanna Lewis
  19. Steve Pettifer
  20. Burkhard Rost
  21. Manuel Corpas (corresponding author)
  1. Technische Universität München, Germany
  2. University of Southern California, United States
  3. University of Southampton, United Kingdom
  4. University of Toronto, Canada
  5. Linkingdata I/O LLC, United States
  6. European Molecular Biology Laboratory-European Bioinformatics Institute, United Kingdom
  7. University of Freiburg, Germany
  8. University College London, United Kingdom
  9. The Genome Analysis Centre, United Kingdom
  10. Queen Mary University of London, United Kingdom
  11. Max Planck Institute of Biochemistry, Germany
  12. Lawrence Berkeley National Laboratory, United States
  13. The University of Manchester, United Kingdom

Abstract

BioJS is an open source software project that develops visualization tools for different types of biological data. Here we report on the factors that influenced the growth of the BioJS user and developer community, and outline our strategy for building on this growth. The lessons we have learned on BioJS may also be relevant to other open source software projects.

https://doi.org/10.7554/eLife.07009.001

Introduction

BioJavaScript (BioJS; http://biojs.net/) was set up to meet a need for an open source library of reusable components to visualize and analyse biological data on the web (Figure 1; Corpas et al., 2014). These components are discrete modules that can be reused, extended and combined to meet a particular visualization need. Unlike proprietary (or closed-source) systems, which are typically distributed as ‘executable’ files under restrictive licenses, open source software projects make their source code freely available under a permissive license (Millington, 2012; Balch, et al., 2015). This allows other users to modify, extend and re-distribute the software with few restrictions and at no cost to other users.

Figure 1. Examples of BioJS tools.

Tree Viewer (visualization of phylogeny data in a tree-like graph); MSA Viewer (visualization and analysis of multiple sequence alignments); Proteome (multilevel visualization of proteomes in UniProt; The UniProt Consortium, 2015); 3D structures (visualization of protein structures); Dot-bracket (visualization of RNA secondary structures); Muts-needle plot (presentation of mutation distribution across protein sequences); Protein Feature Viewer (visualization of position-based annotations in protein sequences); Plasmids (visualization of DNA plasmids); Pathway visualization (visualization of data from Pathway Commons; Cerami et al., 2011). Note that all visualization tools are native to the browser and do not require any specialized software (such as Adobe Flash, Java Virtual Machine or Microsoft Silverlight) to be installed or loaded.

https://doi.org/10.7554/eLife.07009.002

However, the fact that the community is allowed and encouraged to contribute to an open source software project is no guarantee that they will contribute. It is certainly not a sufficient condition for ensuring sustainability: repositories of open source software such as GitHub (https://github.com) and SourceForge (http://sourceforge.net) are littered with abandoned projects that have failed to gain the support needed to survive beyond the originator's initial enthusiasm and/or funding.

In the case of BioJS we were acutely aware of the need to gain buy-in from the community for two reasons:

  1. To fulfil the vision of a suite of tools capable of displaying diverse biological data requires expertise and capacity that is well beyond that of any individual group of developers working in isolation.

  2. To encourage users to spend time integrating BioJS tools into their websites and applications, the project has to have, and be seen to have, a potential lifespan far beyond that of its initial funding.

Predicting the success of projects is hard, and there is nothing in the nature of open source initiatives that makes this any easier. There have been attempts to quantify success in open source software (see, e.g., the metrics proposed by Crowston et al., 2003), but these have gained little traction. More useful, we feel, have been articles that provide pragmatic advice to would-be open software developers, such as ‘Ten simple rules for the open development of scientific software’ (Prlic and Procter, 2012). Rather than repeating or re-working the general-purpose recommendations of others, here we describe what we have learned from the BioJS project.

Project evolution

BioJS was initially developed in 2012 through a collaboration between the European Bioinformatics Institute (EMBL-EBI) and The Genome Analysis Centre (TGAC). The project began as a set of individual graphical components deposited in a bespoke online registry. Since then it has expanded into a community of 41 code contributors spread across four continents, a Google Group forum with more than 150 members (https://groups.google.com/forum/#!forum/biojs), and 15 published papers (as of May 2015). The project's first paper, published in 2013 (Gomez, et al., 2013), has been cited 31 times to date.

The BioJS registry (http://biojs.io) offers a modern platform for fast and customizable access to components. The registry provides a centralized resource where deposited components can be discovered, tested and downloaded at the push of a button. BioJS components are organized as packages. This modular approach reduces duplication of effort and has been used by other projects, such as BioGem (Bonnal, et al., 2012). Several recognized projects and institutions have already shown commitment to BioJS by utilizing and developing components: examples of this include PredictProtein (Yachdav et al., 2014), CATH (Sillitoe et al., 2013), Genome3D (Lewis et al., 2013), Reactome (Croft et al., 2014), Expression Atlas (Petryszak et al., 2014), Ensembl (Cunningham et al., 2015), InterMine (Smith et al., 2012), PolyMarker (Ramirez-Gonzalez et al., 2015) and the TGAC Browser (http://browser.tgac.ac.uk).

Building and growing an open source community

In this section we discuss the factors that, we believe, have played an important role in the growth of the BioJS open source community.

Clear mission and vision statements for the project

A vision is an ambitious aim that clearly states the value that the project adds. BioJS's vision is that ‘every online biological dataset in the world should be visualized with BioJS tools’. This vision statement communicates the way that BioJS aims to ‘change the world’. The members of our community are energized and motivated to contribute to this worthy cause. We also have a mission statement that describes what we do in order to achieve our vision: ‘we develop an open-source library of JavaScript components to visualize biological data’. Whereas the vision is broad, ambitious and overarching, the mission statement is more practical and actionable. Having both a mission statement and a vision helps the BioJS community define what to do and who we are.

Aligning interests with the wider community

BioJS aims to create the largest, most comprehensive repository of JavaScript-based tools that visualize online biological data. Many research groups that generate and make their data available will benefit if we are successful. Groups developing new components will now find that a host of core JavaScript functionality (I/O, parsers, basic visualization) is already available as part of the BioJS library. This creates an ecosystem where new contributors can plug into and extend existing components according to their needs and preferences. Furthermore, developers can leverage the visibility of the BioJS registry to increase the exposure of their own tools, and attract community adoption and feedback. Creating incentives for developers to add their tools into BioJS aligns the interest of many groups with our own project. This in turn has many benefits: (1) group leaders are more inclined to expend scarce resources that advance both their own and BioJS's goals; (2) the BioJS community grows as more participants find that their own goals are aligned with those of BioJS; and (3) some people who join the project may become long-term participants or even become evangelists who promote the project to others. Aligning interests with the community is crucial to grow contribution for projects that lack funding: indeed, the development of 27 components was done by groups that are not directly affiliated with BioJS, but who appreciated the value of contributing to BioJS and decided to integrate the components they had developed into the registry.
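To illustrate how a new component can plug into existing core functionality, the following is a minimal sketch in TypeScript. It is not the BioJS API: the names `SequenceRecord`, `parseFasta` and `renderLengths` are hypothetical, and the "view" is reduced to plain text lines. It only shows the general pattern of a parser component whose output is consumed by a separate, independently written visualization component.

```typescript
// Hypothetical record type shared by the two components below.
interface SequenceRecord {
  id: string;
  sequence: string;
}

// "Parser" component: turns FASTA-formatted text into records.
// The id is the first whitespace-delimited token after ">".
function parseFasta(text: string): SequenceRecord[] {
  const records: SequenceRecord[] = [];
  let current: SequenceRecord | null = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.length === 0) {
      continue; // skip blank lines
    }
    if (line.charAt(0) === ">") {
      current = { id: line.slice(1).split(/\s+/)[0], sequence: "" };
      records.push(current);
    } else if (current !== null) {
      current.sequence += line; // sequences may span several lines
    }
  }
  return records;
}

// "View" component (illustrative): depends only on the record type,
// not on how the records were parsed.
function renderLengths(records: SequenceRecord[]): string[] {
  return records.map((r) => r.id + "\t" + r.sequence.length);
}

const fastaText = ">seq1 example\nACGT\nAC\n>seq2\nGG\n";
renderLengths(parseFasta(fastaText)).forEach((row) => console.log(row));
```

Because the view depends only on the shared record type, a contributor could replace the parser (for example, with one for another file format) without touching the view, which is the kind of reuse the paragraph above describes.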

Low technological barriers

Shared interests and goals may not be enough if contributing to a project involves significant and costly technical effort. We have therefore designed BioJS so that a potential contributor faces a relatively small set of technical requirements: they need to know JavaScript, to ensure that their component conforms to the conventions of the NPM package manager (https://www.npmjs.com), and to follow some simple naming conventions. Contributors are not required to understand the core system. Moreover, multiple contributions can be developed in parallel, which eliminates cross-development dependencies and gives contributors independence in creating their own components.
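As a rough illustration of how small these requirements are, the sketch below checks a package manifest against a naming convention. The specific rules are assumptions made for the example and are not taken from this article: a lowercase `biojs-<category>-<name>` package name, a semantic version, a `main` entry file and a `biojs` keyword. The `PackageManifest` type and `checkManifest` function are hypothetical helpers, not part of BioJS or NPM.

```typescript
// Hypothetical manifest shape: only the package.json fields this sketch inspects.
interface PackageManifest {
  name?: string;
  version?: string;
  main?: string;
  keywords?: string[];
}

// Assumed convention (illustrative only): "biojs-" followed by at least
// two lowercase, hyphen-separated segments, e.g. "biojs-vis-example".
const NAME_PATTERN = /^biojs-[a-z0-9]+(-[a-z0-9]+)+$/;

// Returns a list of human-readable problems; an empty list means
// the manifest satisfies the assumed rules.
function checkManifest(manifest: PackageManifest): string[] {
  const problems: string[] = [];
  if (!manifest.name || !NAME_PATTERN.test(manifest.name)) {
    problems.push("name should look like biojs-<category>-<name>");
  }
  if (!manifest.version || !/^\d+\.\d+\.\d+/.test(manifest.version)) {
    problems.push("version should be a semantic version such as 1.0.0");
  }
  if (!manifest.main) {
    problems.push("main should point at the component's entry file");
  }
  if ((manifest.keywords || []).indexOf("biojs") === -1) {
    problems.push('keywords should include "biojs"');
  }
  return problems;
}

const exampleManifest: PackageManifest = {
  name: "biojs-vis-example",
  version: "0.1.0",
  main: "lib/index.js",
  keywords: ["biojs", "visualization"],
};
console.log(checkManifest(exampleManifest));
```

A check of this kind touches only the package metadata, which reflects the point made above: a contributor can satisfy the packaging requirements without understanding the core system.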

Detailed and complete documentation and tutorials

The educational section of the BioJS website (http://edu.biojs.net) includes a comprehensive tutorial tailored for different types and levels of developers. Newcomers will find that the ‘getting started’ section sheds light on what the project is about and how core concepts (such as packaging) help to achieve modularity. Tutorials offer a step-by-step set of instructions on how to create a new BioJS component and detail the two-step process for publishing components. Contributors can then ‘graduate’ to the more advanced and detailed sections of the tutorial to explore the nuances of BioJS packages, components, the registry and the JavaScript technologies development stack. Good documentation is crucial for an open source project such as BioJS as it helps to ‘flatten’ the learning curve and thus promote participation, contribution and ultimately adoption.

An open, fair and inclusive recognition strategy

BioJS's recognition strategy aims to be as inclusive and fair as possible. In 2013 we published an ‘Application Note’ in Bioinformatics including all contributing members, and in 2014 we published a collection of 14 articles in F1000Research that provided an update on the project at the time (Corpas, et al., 2014). By allowing publication of independent BioJS components as separate ‘Application Note’-type articles, we have enabled contributors who spend most of their time on a voluntary basis to claim ownership and gain recognition for the work they do.

Prompt and unfailingly positive communication

We have created several channels of communication to respond to the needs of different project stakeholders. Our Gitter channel (https://gitter.im/biojs), which currently includes 47 members, is a public chat room that is directly linked to our GitHub repository. Developers use the Gitter channel to announce new commits, bug fixes, patches and forks. Questions and answers about source code are posted to the room on a daily basis and, as a matter of principle, the core development team attempts to provide immediate responses on the Gitter channel. A timely response is crucial for keeping volunteers engaged and demonstrating the vitality of the project. The positive tone and friendly conduct in the chat room and on the mailing list also exemplify the inclusive ethos of the BioJS community.

Early decentralization of responsibilities

Since its early days the BioJS community has adopted a culture that encourages and supports members who wish to assume direct responsibility for parts of the project. As a result, we do not have one central ‘benevolent dictator’, but rather a core group of motivated and committed community members.

Promotion of the project through outreach and education

Time spent on promoting, evangelizing and networking is one of the most fruitful investments in the community. We regularly schedule tutorials, talks and meetings at events such as VizBi, ECCB, ISMB and its satellite meeting BOSC, as these are natural venues to connect with our target audience. BioJS first organized a half-day tutorial at the 2014 VizBi conference (held in Heidelberg, Germany) where 12 participants got hands-on experience in creating and adding new components to the registry. Later that year five coders travelled to Munich, Germany, to participate in the first ever BioJS hackathon, an event that gave birth to BioJS 2.0. Other hackathons, workshops and tutorials have been held in different venues across Europe and the US, aimed at increasing the visibility of the project and promoting contribution. During the winter semester of 2015 the Technical University of Munich ran an undergraduate level biological data visualization course in which 24 students were introduced to BioJS and contributed 11 components to the registry.

Being ‘cool’ by joining community development programs

While many in science would prefer to just focus on traditional research activities, we discovered that joining high-profile programs, such as the Google Summer of Code (GSoC) program, was a highly effective way to advance the project goals. Such programs offer outstanding opportunities to interact with the best and the brightest students. The GSoC program offers student developers stipends to write code for various open source software projects, and BioJS received sponsorship for five GSoC internship positions in 2014. The impact of entering this program was immediate: we saw a sharp increase in the number of subscriptions to our community mailing list (from ∼40 members to nearly 100) in less than a month. The impact is also long lasting as we now have five peer-reviewed articles underway, describing projects that broaden the scope of the BioJS library and make it more useful for our community. Moreover, two of the students from the GSoC program (SW, DD) have now become members of the core group and, led by SW, have themselves orchestrated a series of hackathons in which BioJS was transitioned to version 2.0. The PhyloWidget project has also benefitted from the GSoC program (Jordan and Piel, 2008).

The BioJS project has also received support from the UK Software Sustainability Institute (http://www.software.ac.uk/open-call) to assess and help improve BioJS (by, e.g., evaluating documentation, code maintainability and the ease of development).

Comparable approaches from other open-source communities

None of the points raised above are unique to BioJS as there are a number of other open source software projects in the life sciences. For instance, Cytoscape.js (http://js.cytoscape.org), an open source community developing a web-based viewer for the Cytoscape network viewer (Shannon, et al., 2003), is also mostly decentralized, being organized through some simple rules and with a community of contributors from both the public and private sectors. These contributors add new features to the library and discuss new feature ideas and bug reports. However, code contributions (i.e., pull requests) do require centralized review before merging into the main codebase.

The Galaxy project (Giardine, et al., 2005; Blankenberg, et al., 2010; Goecks, et al., 2010) is making advanced bioinformatics software accessible to biologists by directly providing an intuitive web interface to these applications. The Galaxy software enables the reproduction of computational research workflows by capturing and organizing the pipeline's layout and components into re-runnable protocols. The project also makes use of public collaboration tools to coordinate code contributions with the development community, and it maintains a dedicated portal (https://biostar.usegalaxy.org/) to communicate and support its user base. Like BioJS, the Galaxy core framework can be easily extended by plugins of all kinds, and the Galaxy core team offers the Galaxy ToolShed to distribute and share all these extensions in an App-Store like fashion. The annual Galaxy Community Conference (GCC) also attracts more than 200 participants and offers a variety of workshops and a hackathon.

The Bionode project (http://www.bionode.io) aims to provide modular, scalable and highly reusable tools for bioinformatics analysis. The community that surrounds this project is mainly composed of bioinformaticians and JavaScript (Node.js) developers who dedicate their time either as a hobby or because they need to add a feature (or fix a bug) for their own projects. Bionode is very open to any kind of contribution and does not impose many rules or much governance on the community.

Conclusions

Life science open source communities have their own special peculiarities and motivations. We have presented a compilation of experiences that, in our view, have helped BioJS to become a robust international project within a relatively short period of time. Apart from fostering the right skills and technical experience to develop BioJS, we have established formal means to reward contributors, to keep the community members motivated, and to increase our impact. If used with care and fairness, recognition can be a powerful lever with which to build communities. We believe that some of the lessons we have learned and the practices we have established will help others to build similarly robust open source projects and communities.

References

    1. Crowston K, Annabi H, Howison J (2003)
    Proceedings of the International Conference on Information Systems.

Article and author information

Author details

  1. Guy Yachdav

    Bioinformatik, Technische Universität München, Garching, Germany
    Contribution
    GY, Conception and design, Drafting or revising the article
    For correspondence
    gyachdav@bio-sof.com
    Competing interests
    The authors declare that no competing interests exist.
  2. Tatyana Goldberg

    Bioinformatik, Technische Universität München, Garching, Germany
    Contribution
    TG, Conception and design, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  3. Sebastian Wilzbach

    Bioinformatik, Technische Universität München, Garching, Germany
    Contribution
    SW, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  4. David Dao

    Bioinformatik, Technische Universität München, Garching, Germany
    Contribution
    DD, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  5. Iris Shih

    Bioinformatik, Technische Universität München, Garching, Germany
    Contribution
    IS, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  6. Saket Choudhary

    Molecular and Computational Biology, University of Southern California, Los Angeles, United States
    Contribution
    SC, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-5202-7633
  7. Steve Crouch

    Web and Internet Science, University of Southampton, Southampton, United Kingdom
    Contribution
    SC, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  8. Max Franz

    Banting and Best Department of Medical Research, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
    Contribution
    MF, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  9. Alexander García

    Linkingdata I/O LLC, Austin, United States
    Contribution
    AG, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  10. Leyla J García

    European Molecular Biology Laboratory-European Bioinformatics Institute, Cambridge, United Kingdom
    Contribution
    LJG, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  11. Björn A Grüning

    Bioinformatics Group, Department of Computer Science and Centre for Biological Systems Analysis, University of Freiburg, Freiburg, Germany
    Contribution
    BAG, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-3079-6586
  12. Devasena Inupakutika

    Web and Internet Science, University of Southampton, Southampton, United Kingdom
    Contribution
    DI, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  13. Ian Sillitoe

    Institute of Structure and Molecular Biology, University College London, London, United Kingdom
    Contribution
    IS, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  14. Anil S Thanki

    The Genome Analysis Centre, Norwich, United Kingdom
    Contribution
    AST, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  15. Bruno Vieira

    School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
    Contribution
    BV, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-4878-6431
  16. José M Villaveces

    Max Planck Institute of Biochemistry, Planegg, Germany
    Contribution
    JMV, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  17. Maria V Schneider

    The Genome Analysis Centre, Norwich, United Kingdom
    Contribution
    MVS, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  18. Suzanna Lewis

    Lawrence Berkeley National Laboratory, Berkeley, United States
    Contribution
    SL, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  19. Steve Pettifer

    School of Computer Science, The University of Manchester, Manchester, United Kingdom
    Contribution
    SP, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  20. Burkhard Rost

    Bioinformatik, Technische Universität München, Garching, Germany
    Contribution
    BR, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  21. Manuel Corpas

    The Genome Analysis Centre, Norwich, United Kingdom
    Contribution
    MC, Conception and design, Drafting or revising the article
    For correspondence
    manuel.corpas@tgac.ac.uk
    Competing interests
    The authors declare that no competing interests exist.

Funding

Biotechnology and Biological Sciences Research Council (BBSRC)

  • Manuel Corpas
  • Anil S Thanki
  • Maria V Schneider

Google

  • Sebastian Wilzbach
  • David Dao

Engineering and Physical Sciences Research Council (EPSRC) (EP/H043160/1)

  • Steve Crouch
  • Devasena Inupakutika

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Henning Hermjakob and Rafael Jimenez, who founded and supported BioJS through its early days, and John Gómez, who developed the first release of the BioJS library. We also thank Ian Mulvany for helping us to define and refine our vision.

Publication history

  1. Received: February 14, 2015
  2. Accepted: June 20, 2015
  3. Version of Record published: July 8, 2015 (version 1)

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.



Cite this article

Guy Yachdav, Tatyana Goldberg, Sebastian Wilzbach, David Dao, Iris Shih, Saket Choudhary, Steve Crouch, Max Franz, Alexander García, Leyla J García, Björn A Grüning, Devasena Inupakutika, Ian Sillitoe, Anil S Thanki, Bruno Vieira, José M Villaveces, Maria V Schneider, Suzanna Lewis, Steve Pettifer, Burkhard Rost, Manuel Corpas (2015)
Cutting Edge: Anatomy of BioJS, an open source community for the life sciences
eLife 4:e07009.
https://doi.org/10.7554/eLife.07009
