BioJS is an open source software project that develops visualization tools for different types of biological data. Here we report on the factors that influenced the growth of the BioJS user and developer community, and outline our strategy for building on this growth. The lessons we have learned on BioJS may also be relevant to other open source software projects.
https://doi.org/10.7554/eLife.07009.001
However, the fact that the community is allowed and encouraged to contribute to an open source software project is no guarantee that they will contribute. It is certainly not a sufficient condition for ensuring sustainability: repositories of open source software such as GitHub (https://github.com) and SourceForge (http://sourceforge.net) are littered with abandoned projects that have failed to gain the support needed to survive beyond the originator's initial enthusiasm and/or funding.
In the case of BioJS we were acutely aware of the need to gain buy-in from the community for two reasons:
- Fulfilling the vision of a suite of tools capable of displaying diverse biological data requires expertise and capacity well beyond that of any individual group of developers working in isolation.
- To encourage users to spend time integrating BioJS tools into their websites and applications, the project has to have, and be seen to have, a potential lifespan far beyond that of its initial funding.
Predicting the success of projects is hard, and there is nothing in the nature of open source initiatives that makes this any easier. There have been attempts to quantify success in open source software (see, e.g., the metrics proposed by Crowston et al., 2003), but these have gained little traction. More useful, we feel, have been articles that provide pragmatic advice to would-be open software developers, such as ‘Ten simple rules for the open development of scientific software’ (Prlic and Procter, 2012). Rather than repeating or re-working the general-purpose recommendations of others, here we describe what we have learned from the BioJS project.
BioJS was initially developed in 2012 through a collaboration between the European Bioinformatics Institute (EMBL-EBI) and The Genome Analysis Centre (TGAC). The project began as a set of individual graphical components deposited in a bespoke online registry. Since then it has expanded into a community of 41 code contributors spread across four continents, a Google Group forum with more than 150 members (https://groups.google.com/forum/#!forum/biojs), and 15 published papers (as of May 2015). The project's first paper, published in 2013 (Gomez et al., 2013), has been cited 31 times to date.
The BioJS registry (http://biojs.io) offers a modern platform for fast and customizable access to components. The registry provides a centralized resource where deposited components can be discovered, tested and downloaded at the push of a button. BioJS components are organized as packages. This modular approach reduces duplication of effort and has been used by other projects, such as BioGem (Bonnal et al., 2012). Several recognized projects and institutions have already shown commitment to BioJS by utilizing and developing components: examples of this include PredictProtein (Yachdav et al., 2014), CATH (Sillitoe et al., 2013), Genome3D (Lewis et al., 2013), Reactome (Croft et al., 2014), Expression Atlas (Petryszak et al., 2014), Ensembl (Cunningham et al., 2015), InterMine (Smith et al., 2012), PolyMarker (Ramirez-Gonzalez et al., 2015) and the TGAC Browser (http://browser.tgac.ac.uk).
In this section we discuss the factors that, we believe, have played an important role in the growth of the BioJS open source community.
BioJS's recognition strategy aims to be as inclusive and fair as possible. In 2013 we published an ‘Application Note’ in Bioinformatics including all contributing members, and in 2014 we published a collection of 14 articles in F1000Research that provided an update on the project at the time (Corpas et al., 2014). By allowing publication of independent BioJS components as separate ‘Application Note’-type articles, we have enabled contributors who spend most of their time on a voluntary basis to claim ownership and gain recognition for the work they do.
We have created several channels of communication to respond to the needs of different project stakeholders. Our Gitter channel (https://gitter.im/biojs), which currently includes 47 members, is a public chat room that is directly linked to our GitHub repository. Developers use the Gitter channel to announce new commits, bug fixes, patches and forks. Questions and answers about source code are posted to the room on a daily basis and, as a matter of principle, the core development team attempts to provide immediate responses on the Gitter channel. A timely response is crucial for keeping volunteers engaged and demonstrating the vitality of the project. The positive tone and friendly conduct in the chat room and on the mailing list also exemplify the inclusive ethos of the BioJS community.
Since its early days the BioJS community has adopted a culture that encourages and supports members who wish to assume direct responsibility for parts of the project. As a result, we do not have one central ‘benevolent dictator’, but rather a core group of motivated and committed community members.
Time spent on promoting, evangelizing and networking is one of the most fruitful investments in the community. We regularly schedule tutorials, talks and meetings at events such as VizBi, ECCB, ISMB and its satellite meeting BOSC, as these are natural venues to connect with our target audience. BioJS first organized a half-day tutorial at the 2014 VizBi conference (held in Heidelberg, Germany) where 12 participants got hands-on experience in creating and adding new components to the registry. Later that year five coders travelled to Munich, Germany, to participate in the first ever BioJS hackathon, an event that gave birth to BioJS 2.0. Other hackathons, workshops and tutorials have been held in different venues across Europe and the US, aimed at increasing the visibility of the project and promoting contribution. During the winter semester of 2015 the Technical University of Munich ran an undergraduate level biological data visualization course in which 24 students were introduced to BioJS and contributed 11 components to the registry.
While many in science would prefer to just focus on traditional research activities, we discovered that joining high-profile programs, such as the Google Summer of Code (GSoC) program, was a highly effective way to advance the project goals. Such programs offer outstanding opportunities to interact with the best and the brightest students. The GSoC program offers student developers stipends to write code for various open source software projects, and BioJS received sponsorship for five GSoC internship positions in 2014. The impact of entering this program was immediate: we saw a sharp increase in the number of subscriptions to our community mailing list (from ∼40 members to nearly 100) in less than a month. The impact is also long lasting as we now have five peer-reviewed articles underway, describing projects that broaden the scope of the BioJS library and make it more useful for our community. Moreover, two of the students from the GSoC program (SW, DD) have now become members of the core group and, led by SW, have themselves orchestrated a series of hackathons in which BioJS was transitioned to version 2.0. The PhyloWidget project has also benefitted from the GSoC program (Jordan and Piel, 2008).
The BioJS project has also received support from the UK Software Sustainability Institute (http://www.software.ac.uk/open-call) to assess and help improve BioJS (by, e.g., evaluating documentation, code maintainability and the ease of development).
None of the points raised above are unique to BioJS as there are a number of other open source software projects in the life sciences. For instance, Cytoscape.js (http://js.cytoscape.org), an open source community developing a web-based viewer for the Cytoscape network viewer (Shannon et al., 2003), is also mostly decentralized, being organized through some simple rules and with a community of contributors from both the public and private sectors. These contributors add new features to the library and discuss new feature ideas and bug reports. However, code contributions (i.e., pull requests) do require centralized review before merging into the main codebase.
The Galaxy project (Giardine et al., 2005; Blankenberg et al., 2010; Goecks et al., 2010) is making advanced bioinformatics software accessible to biologists by directly providing an intuitive web interface to these applications. The Galaxy software enables the reproduction of computational research workflows by capturing and organizing the pipeline's layout and components into re-runnable protocols. The project also makes use of public collaboration tools to coordinate code contributions with the development community, and it maintains a dedicated portal (https://biostar.usegalaxy.org/) to communicate with and support its user base. Like BioJS, the Galaxy core framework can be easily extended by plugins of all kinds, and the Galaxy core team offers the Galaxy ToolShed to distribute and share all these extensions in an App Store-like fashion. The annual Galaxy Community Conference (GCC) also attracts more than 200 participants and offers a variety of workshops and a hackathon.
Life science open source communities have their own peculiarities and motivations. We have presented a compilation of experiences that, in our view, have helped BioJS to become a robust international project within a relatively short period of time. Apart from fostering the right skills and technical experience to develop BioJS, we have established formal means to reward contributors, to keep the community members motivated, and to increase our impact. If used with care and fairness, recognition can be a powerful lever with which to build communities. We believe that some of the lessons we have learned and the practices we have established will help others to build similarly robust open source projects and communities.
Galaxy: a web-based genome analysis tool for experimentalists. Current Protocols in Molecular Biology 19:19–21. https://doi.org/10.1002/0471142727.mb1910s89
Pathway commons, a web resource for biological pathway data. Nucleic Acids Research 39:D685–D690. https://doi.org/10.1093/nar/gkq1039
Proceedings of the International Conference on Information Systems.
Galaxy: a platform for interactive large-scale genome analysis. Genome Research 15:1451–1455. https://doi.org/10.1101/gr.4086505
PhyloWidget: web-based visualizations for the tree of life. Bioinformatics 24:1641–1642. https://doi.org/10.1093/bioinformatics/btn235
Buzzing communities: how to build bigger, better, and more active online communities. Available at: www.FeverBee.com.
Ten simple rules for the open development of scientific software. PLOS Computational Biology 8:e1002802. https://doi.org/10.1371/journal.pcbi.1002802
PredictProtein–an open resource for online prediction of protein structural and functional features. Nucleic Acids Research 42:W337–W343. https://doi.org/10.1093/nar/gku366
- Manuel Corpas
- Anil S Thanki
- Maria V Schneider
- Sebastian Wilzbach
- David Dao
- Steve Crouch
- Devasena Inupakutika
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We thank Henning Hermjakob and Rafael Jimenez, who founded and supported BioJS through its early days, and John Gómez, who developed the first release of the BioJS library. We also thank Ian Mulvany for helping us to define and refine our vision.
- Received: February 14, 2015
- Accepted: June 20, 2015
- Version of Record published: July 8, 2015 (version 1)
This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.