Announcing the next phase of Executable Research Articles


By Giulia Guizzardi, Nokome Bentley, and Giuliano Maciocci

Over the past three years, Stencila has partnered with eLife to take the Executable Research Articles (ERA) project from ideation to demonstration and, in the past six months, into production. As the opportunities provided by this new open format for executable research publishing have become clear, the project has received substantial interest from eLife authors, the broader research community, and other publishers. There are currently six published ERAs, with several more in the pipeline.

Today, we are excited to announce that work has started on the next steps in the development of ERA, with a focus on further reducing barriers to the authoring and publishing of executable research, based on feedback and lessons learned from the community. Read on to learn about the developments in executable research publishing coming throughout 2021 and beyond.

Offline conversion, execution and previewing of ERAs

Early ERA authors have expressed a keen interest in local command-line functionality. Other users on Stencila Hub, ERA’s workflow and project management portal, are finding that they have to upload a new version of their article every time they need to deploy an edit or a fix.

The plan for the next phase of ERA development is to provide binary executables, for Windows, macOS and Linux, of both a Stencila command line interface (CLI) and a Desktop application, which will supersede the previous, now deprecated, CLI and Desktop tools.

A single binary, developed in the main Stencila repository, will act as an “umbrella” package that bundles Stencila's other packages (for example, Encoda, Executa, Dockta and Thema), and will not require users to have Node.js and its dependencies installed. Together, these developments will make it easier and faster to preview ERAs in a user's local environment before uploading the finished product to Stencila Hub for eventual submission to a journal, preprint server or self-hosted site.

Online and offline editing of ERAs using Libero Editor

Although there have been some improvements in the user-friendliness of tools such as Jupyter Notebooks and RStudio, many researchers still find them intimidating, or at least not the most intuitive environments for the final authoring stages of an article.

Building on the foundation of Libero Editor, eLife’s JATS-native manuscript editing tool, Stencila will add the ability to insert, edit and execute code chunks and code expressions. Stencila has developed customisable web components for these document elements that are designed to be compatible with React and ProseMirror (the web technologies upon which Libero Editor is based), and will further develop Encoda’s codec to support encoding code chunks and expressions as JATS XML. This will allow on-the-fly conversion of executable formats such as Jupyter Notebooks and R Markdown, allowing them to be opened within Libero Editor.

Libero Editor (with the Stencila plugin) will then be deployed to Stencila Hub to streamline the online authoring, editing and previewing experience for ERAs.

To further streamline the process of creating ERAs, we will also provide an Electron-based desktop editor which will allow users to easily edit, preview, and interact with their ERAs in their own local environment.

Closing the reproducibility loop

Currently, most ERAs rely on a large, generic Docker image containing many of the packages most commonly used by researchers. This has several advantages, including faster session start times when executing an ERA in the browser (because we do not need to pull or build a custom image for each project). However, there are instances where a project needs particular, less commonly used dependencies. In these cases the user must build their own Docker image and specify it in the ERA project’s settings.

Stencila has already developed specific tools to respond to these needs: Dockta, which provides an easy and relatively robust way to replicate a user’s software environment, and Nixta, a tool that dramatically simplifies the creation of reproducible computing environments using Nix. Both will be integrated with the ERA workflows and tool sets on Stencila Hub to deliver smaller, more optimised Docker containers, which will make it easier and more cost-effective to distribute containerised ERAs as additional downloadable assets to a published research paper.

Reactive and Polyglot Reactive ERAs

The growing uptake of tools such as R Shiny by researchers suggests that there is increasing demand for, and indeed increasing expectation of, the sort of reactive, interactive functionality researchers are used to from spreadsheets. Moreover, several of the ERAs that have been, or are going to be, published use more than one language.

Over the next phase of ERA development we aim to take a more gradual approach to introducing greater reactivity into ERAs. Parameters will now be made first-class citizens in articles, alongside the existing code chunks and code expressions. Authors will be able to specify the parameters of their ERA in whatever format they are using to author it, for example R Markdown or JATS XML. They will also be able to manually specify which code chunks and expressions are dependent on those parameters.

Stencila’s language executors are currently able to semantically analyse code to automatically determine dependencies. Moving forward, ERA authors will be able to decide whether to use this automatic dependency analysis or to manually specify it. Stencila’s execution engine will be augmented so that it is aware of the dependency between code blocks and will automatically update them as needed.
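To make the idea of automatic dependency analysis more concrete, here is a minimal sketch in Python. It is not Stencila's implementation: the chunk names, the `threshold` parameter and the helper functions are all hypothetical, and the analysis is a rough approximation that only looks at which variable names a chunk assigns and reads. It uses Python's standard `ast` and `graphlib` modules (Python 3.9 or later).

```python
import ast
from graphlib import TopologicalSorter


def analyse(source: str) -> tuple[set[str], set[str]]:
    """Return (names assigned, names read) for one chunk of Python code."""
    assigned, read = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            else:
                read.add(node.id)
    return assigned, read


def build_graph(chunks: dict[str, str]):
    """Map each chunk to the set of chunks that define names it uses."""
    info = {name: analyse(src) for name, src in chunks.items()}
    graph = {}
    for name, (assigned, read) in info.items():
        uses = read - assigned  # names that must come from elsewhere
        graph[name] = {
            other
            for other, (other_assigned, _) in info.items()
            if other != name and other_assigned & uses
        }
    return graph, info


def stale_chunks(chunks: dict[str, str], changed: str) -> list[str]:
    """List the chunks to re-run, in order, when the name `changed` changes."""
    graph, info = build_graph(chunks)
    stale = []
    # static_order() yields every chunk after the chunks it depends on.
    for name in TopologicalSorter(graph).static_order():
        assigned, read = info[name]
        reads_changed = changed in (read - assigned)
        depends_on_stale = bool(graph[name] & set(stale))
        if reads_changed or depends_on_stale:
            stale.append(name)
    return stale


# Hypothetical article with four chunks and one parameter, `threshold`.
chunks = {
    "load": "data = [1, 2, 3, 4]",
    "filter": "kept = [x for x in data if x > threshold]",
    "summary": "total = sum(kept)",
    "caption": "note = 'static text'",
}

print(stale_chunks(chunks, "threshold"))
```

In this sketch, changing `threshold` marks only `filter` and `summary` for re-execution, in that order; `load` and `caption` are left alone. An author who prefers to declare dependencies manually would, in effect, supply the graph by hand instead of deriving it from the code.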

To make reactive ERAs polyglot, it will be necessary to pass variables between languages. Data is currently transferred between different language executors and the UI using schema.org compatible JSON. Future development will make the transfer of variables between languages more efficient and remove the need for it to be done through the client.
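As a rough illustration of exchanging a variable as typed JSON, the sketch below serialises a named column of values on one side and reads it back on the other. The node shape used here (a `type` key with the value `DatatableColumn`, plus `name` and `values`) is an assumption made for illustration and should not be read as the exact format Stencila uses.

```python
import json


def encode_column(name: str, values) -> str:
    """Serialise a named column of values as a typed JSON node (illustrative shape)."""
    return json.dumps(
        {"type": "DatatableColumn", "name": name, "values": list(values)}
    )


def decode_column(payload: str):
    """Parse a typed JSON node back into a (name, values) pair."""
    node = json.loads(payload)
    if node.get("type") != "DatatableColumn":
        raise ValueError(f"unexpected node type: {node.get('type')!r}")
    return node["name"], node["values"]


# One executor produces the payload; another (possibly in a different
# language) would parse the same JSON text.
payload = encode_column("height_cm", (172.5, 160.0, 181.2))
name, values = decode_column(payload)
print(name, values)
```

Because the payload is plain JSON, any language with a JSON parser can consume it, which is what makes this approach language-neutral, at the cost of serialising and parsing the data at every hop.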

Self-hosted data and compute for ERAs

Several of our early users, particularly those based at universities or in industry, have large databases that cannot be uploaded to the cloud, and/or that they would like to work with using their own High Performance Computing resources (which are usually co-located with that data).

Stencila Hub has been architected around job queues. When a reader runs an ERA, in the background a job to start a session is sent to a job queue. The job is then picked up by a “worker” process which will start the session and return its output to the browser.

Moving forward, institutions will be able to run worker instances that listen to the institution's own job queue and receive jobs for that institution's own projects.
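The queue-and-worker pattern described above can be sketched with Python's standard `queue` and `threading` modules. This is a simplified, in-process illustration only: the queue names, the job fields (`id`, `owner`, `project`) and the functions are hypothetical, and a real deployment would use a networked job queue rather than threads in one process.

```python
import queue
import threading


def worker(name: str, jobs: queue.Queue, results: dict) -> None:
    """Take jobs from one queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        # A real worker would start a compute session here.
        results[job["id"]] = f"{name} started session for {job['project']}"


def run(job_list: list, owners: list) -> dict:
    """Route each job to its owner's queue; one worker listens per queue."""
    queues = {owner: queue.Queue() for owner in owners}
    results = {}
    threads = [
        threading.Thread(target=worker, args=(owner, q, results))
        for owner, q in queues.items()
    ]
    for thread in threads:
        thread.start()
    for job in job_list:
        queues[job["owner"]].put(job)
    for q in queues.values():
        q.put(None)  # tell each worker to stop once its queue is drained
    for thread in threads:
        thread.join()
    return results


jobs = [
    {"id": 1, "owner": "hub", "project": "public-era"},
    {"id": 2, "owner": "institution", "project": "private-era"},
]
print(run(jobs, ["hub", "institution"]))
```

The point of the design is that the job for `private-era` is only ever seen by the worker listening on the institution's queue, so the session can start next to the data without that data leaving the institution.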

Next steps

The deliverables listed here will be developed over the next 18 months, and we will be providing regular progress updates here on eLife Labs. You can also keep up to date on the project by signing up to our ERA newsletter.

If you have any questions about ERA’s capabilities and functionalities, please read our FAQ, watch our webinar, or click the “share your feedback” button above to annotate this article. You can also contact us at innovation [at] elifesciences [dot] org.

Do you have an idea or innovation to share? Send a short outline for a Labs blogpost to innovation [at] elifesciences [dot] org.

For the latest in innovation, eLife Labs and new open-source tools, sign up for our technology and innovation newsletter. You can also follow @eLifeInnovation on Twitter.