The eLife method and methodology of data collection

By Melissa Dodd, Head of Production Operations (m.dodd@elifesciences.org)

Production is largely an invisible part of the publishing process. For many people it’s a box through which content has to progress between peer review and publication, with a one-stage pop out where a PDF proof is sent to the author for review.

It’s vaguely recognized that content is edited and processed in some way, but beyond that, production is a mystery to most authors and to people working outside the area. As a result, workflows and systems often stop and start at two handoffs: between editorial and production, and between production and publication.

At eLife, we’ve approached the whole publishing process end to end as one workflow – to make it more transparent for authors, to prevent unnecessary duplication of effort and to reduce the introduction of errors caused by breaks in the flow.

Here are some of the ways in which our approach to data collection is helping to integrate stages in the process:

At initial submission, authors can very easily upload and submit an article to us for consideration. Only a PDF of the article and cover letter are required, and minimal information is collected on the submission screens. Only if the article passes this first hurdle will the author be required to fill out the full submission forms (which are used to collect information about authors, affiliations, competing interests, funding, and so on).

At full submission, every piece of information collected from the submission screen questions is used at some point. We don’t collect any information that isn’t used either to inform peer review or directly for publication. We indicate where information is used exclusively for peer review; otherwise, every entry the author adds is ultimately published from this source.

Article and author information (‘metadata’) collected from the submission screens is included on the reviewers’ PDF so reviewers and editors can clearly see important information about each paper in a standard way – at a glance.

Staff can perform quality and ethical/standards checks very easily and pick up potential issues well before publication because this information is so readily available. This helps to minimize delays that might occur later in the publication process.

There is ONE stream of metadata to keep up-to-date. The author is not required (or allowed) to provide this information in the Word file. This prevents duplicating data entry and interpreting two sources, and reduces the potential for errors on publication. It also means the information supplied in the submission system is kept current up until acceptance.

What the author provides is what’s published. The metadata is transmitted to our production processors in XML format – largely in the format it will be published in. It is automatically processed, preventing the introduction of manual (human) errors and increasing the speed of the process.

Here’s an example:

[caption id="attachment_1898" align="alignnone" width="410"]

Submission screen image

This screenshot demonstrates how funding information is collected at submission.[/caption]

Because the information is collected in separate fields, each field is tagged independently when it is sent to production; neither the vendor nor production staff need to interpret it during processing.

Data is collected on the submission screen in a standard way, making the production process faster and more automated.

On publication, a simple table or statement in the text will have a wealth of granular tagging in the XML behind it; text miners will be able to find who the funder is, what the grant reference number is, and which authors are associated with the funding in a programmatic way.
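To make the text-mining point concrete, here is a minimal sketch of how such tagging could be read programmatically. The XML fragment and its element names (`funding-entry`, `funder`, `grant-reference`, `recipient`, `author-id`) are invented for illustration and are assumptions, not the actual schema used in published eLife XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element and attribute names are illustrative only.
SAMPLE_XML = """
<article>
  <funding>
    <funding-entry>
      <funder>Example Funder</funder>
      <grant-reference>GR-0001</grant-reference>
      <recipient author-id="A1"/>
      <recipient author-id="A2"/>
    </funding-entry>
    <funding-entry>
      <funder>Another Funder</funder>
      <grant-reference>GR-0002</grant-reference>
      <recipient author-id="A2"/>
    </funding-entry>
  </funding>
</article>
"""


def extract_funding(xml_text):
    """Return one record per funding entry: funder, grant reference, author IDs."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for entry in root.iter("funding-entry"):
        records.append({
            "funder": entry.findtext("funder"),
            "grant_reference": entry.findtext("grant-reference"),
            "author_ids": [r.get("author-id") for r in entry.findall("recipient")],
        })
    return records


def grants_for_author(records, author_id):
    """List the grant references associated with a given author ID."""
    return [r["grant_reference"] for r in records if author_id in r["author_ids"]]


funding_records = extract_funding(SAMPLE_XML)
grants_a2 = grants_for_author(funding_records, "A2")
```

Because each value sits in its own element, the funder, the grant reference, and the associated authors can each be queried directly, with no need to parse a free-text funding statement.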

The submission system we use (eJournal Press) generates a unique ID for each author, which is included in the XML that goes to publication. In time, we will use that ID to programmatically build profiles for authors who publish in eLife, including additional information such as their funders, competing interests, contributions to an article and so on. Notice in the screenshot above that the author names are already available for selection. This means the output from editorial uses the unique author ID to link all information relevant to a given author.
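As a rough sketch of how a unique author ID could tie this information together across articles, the example below groups per-article records by ID. The record layout, the IDs, and the field names are all hypothetical; the post does not describe how profiles will actually be built.

```python
from collections import defaultdict

# Hypothetical per-article records; IDs and field names are illustrative only.
ARTICLES = [
    {
        "article_id": "article-1",
        "authors": [
            {"author_id": "A1", "funders": ["Example Funder"],
             "contributions": ["Designed the study"]},
            {"author_id": "A2", "funders": ["Example Funder"],
             "contributions": ["Analysed the data"]},
        ],
    },
    {
        "article_id": "article-2",
        "authors": [
            {"author_id": "A2", "funders": ["Another Funder"],
             "contributions": ["Wrote the paper"]},
        ],
    },
]


def build_profiles(articles):
    """Collect articles, funders, and contributions under each author ID."""
    profiles = defaultdict(
        lambda: {"articles": [], "funders": set(), "contributions": []}
    )
    for article in articles:
        for author in article["authors"]:
            profile = profiles[author["author_id"]]
            profile["articles"].append(article["article_id"])
            profile["funders"].update(author.get("funders", []))
            profile["contributions"].extend(author.get("contributions", []))
    return dict(profiles)


profiles = build_profiles(ARTICLES)
```

Keying on the ID rather than on the author's name avoids merging two different people who share a name, and avoids splitting one person whose name is written differently on different papers.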

How could we make it even easier for authors?

A valid question I’m asked is, “Why can’t you extract all the information you need from my Word or PDF file, so I don’t have to answer all these questions?”

This is something we’d love to develop in future and – piece by piece – we will work towards this. For example, we do have a mechanism whereby author names and addresses can be auto-added to the submission system by uploading an Excel or CSV file.
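For illustration only, a CSV upload of this kind might be read along the following lines. The column names (`name`, `address`) and the sample rows are assumptions; the actual file format expected by the submission system is not described here.

```python
import csv
import io

# Hypothetical upload: the column headers and rows are invented for this sketch.
SAMPLE_CSV = """name,address
Ada Example,"1 Sample Street, Cambridge"
Ben Placeholder,"2 Test Road, London"
"""


def read_authors(csv_file):
    """Turn each CSV row into an author entry, skipping rows without a name."""
    authors = []
    for row in csv.DictReader(csv_file):
        name = (row.get("name") or "").strip()
        if not name:
            continue  # ignore blank rows rather than creating empty entries
        authors.append({
            "name": name,
            "address": (row.get("address") or "").strip(),
        })
    return authors


authors = read_authors(io.StringIO(SAMPLE_CSV))
```

Quoting the address field lets it contain commas without being split into extra columns, which matters for postal addresses.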

Because there’s currently no standard way in which publishers collect and manage the data for accepted articles, and thus no standard way for authors to provide it, it’s hard to create a one-size-fits-all automatable solution. Making the submission process clean, easy, and fast will remain our objective after the initial launch of eLife.

The good news is that the more effort we put into this process, the more potential reach the content has. We want to push the content published in eLife out to as many different places as possible, for maximum impact and usability. Authors take a direct part in this, contributing to the widespread access, impact, and availability of their work.

If you’re an eLife author and have been through this process, let us know what you think. Email m.dodd@elifesciences.org. Other comments and questions are also always welcome.