Open-Source Community Call: AI and machine-learning technologies for open research

We share a recording of our discussion on the recent efforts and opportunities in applying cutting-edge technologies to research communication.

Several open project leads and contributors joined our open-source community call last month to share updates on their work in AI and machine-learning technologies for open science and research communication.

Tiago Lubiana Alves, a researcher at the University of São Paulo and a participant at the eLife Innovation Sprint 2019, presented a primer on machine learning for knowledge extraction, illustrated by the work of the InstruMinetal team at the Sprint. Building on prior open projects on knowledge extraction, the team developed a method for automatically extracting candidate scientific equipment terms from papers using word2vec and regular expressions. You can find out more about the team’s work in the Sprint project roundup.
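As a rough illustration of the regular-expression step only, the Python sketch below collects candidate equipment terms by matching a short list of head nouns, each with one optional preceding word. The head-noun list, the pattern and the function name are assumptions made for this example, not the InstruMinetal team's code, and the word2vec step is not shown.

```python
import re
from collections import Counter

# Hypothetical head nouns; a real list would come from a curated vocabulary.
HEAD_NOUNS = ["microscope", "spectrometer", "cytometer", "centrifuge", "sequencer"]

# One optional preceding word (e.g. "confocal") followed by a head noun,
# optionally plural. This is an illustrative pattern, not the team's own.
CANDIDATE = re.compile(
    r"\b((?:[A-Za-z][\w-]*\s+)?(?:" + "|".join(HEAD_NOUNS) + r")s?)\b",
    re.IGNORECASE,
)


def extract_candidates(text):
    """Count lower-cased candidate equipment terms found in the text."""
    return Counter(match.group(1).lower() for match in CANDIDATE.finditer(text))


if __name__ == "__main__":
    # Invented sample sentence, used only to exercise the pattern.
    sample = (
        "Cells were imaged on a confocal microscope. Samples were run on a "
        "mass spectrometer, and the confocal microscope was recalibrated."
    )
    for term, count in extract_candidates(sample).most_common():
        print(term, count)
```

A pattern this simple also picks up noise such as "a microscope", which is one reason a second filtering or ranking step is useful after candidate extraction.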

Daniel Ecer, Data Scientist at eLife, introduced PeerTax (built by Alessio Caciagli), a machine learning approach to investigate the fundamental components of peer-review reports. The initial model, built on the eLife peer-review reports corpus, identified main clusters of peer-review sentences. The team hopes to improve the performance of the model by gathering more annotated training data and by refining the categories.

Tim Vines, the project lead of DataSeer, shared a live demo of the current prototype. DataSeer is designed to help researchers share data. The tool detects mentions of datasets within a manuscript, provides data policy guidelines based on the type of data detected, and allows users to then share links to the data hosted on repositories and supply additional metadata. We look forward to the official launch of the tool in a few weeks.

Last but not least, Victor Venema from Grassroot Journals discussed an idea to build a machine learning algorithm which would suggest and find papers based on a collection of articles supplied by the searcher. The algorithm would make use of not only keywords but also authors and citations from the articles supplied. He invites interested innovators to get in touch.
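To make the idea concrete, here is a minimal Python sketch of one way such a recommender could score papers against a supplied collection. It is not Victor Venema's design: the `Paper` structure, the use of Jaccard overlap and the weights are all assumptions chosen for illustration, and a machine-learning model would learn its scoring rather than use fixed weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    """Hypothetical record holding the three signals mentioned in the talk."""
    title: str
    keywords: frozenset
    authors: frozenset
    citations: frozenset  # identifiers of the works this paper cites


def jaccard(a, b):
    """Share of items common to both sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(candidate, collection, weights=(0.4, 0.2, 0.4)):
    """Weighted overlap between a candidate and the pooled collection.

    The weights (keywords, authors, citations) are arbitrary placeholders.
    """
    keywords = frozenset().union(*(p.keywords for p in collection))
    authors = frozenset().union(*(p.authors for p in collection))
    citations = frozenset().union(*(p.citations for p in collection))
    w_kw, w_au, w_ci = weights
    return (
        w_kw * jaccard(candidate.keywords, keywords)
        + w_au * jaccard(candidate.authors, authors)
        + w_ci * jaccard(candidate.citations, citations)
    )


def rank(candidates, collection):
    """Return the candidates ordered from most to least similar."""
    return sorted(candidates, key=lambda p: score(p, collection), reverse=True)
```

Pooling the collection into three sets keeps the sketch short. Scoring each candidate against every supplied article separately would be an equally reasonable choice.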

Our next community call will take place in February 2020. If you would like to present your work then, please get in touch with us at innovation [at] elifesciences [dot] org or @eLifeInnovation on Twitter. The community will be interested to hear about your open-source tool for improving the way science is communicated.

Full notes from the October call, including questions and comments that were made in writing only during and/or directly after the meeting, are available here.


We welcome comments, questions and feedback. Please annotate publicly on the article or contact us at innovation [at] elifesciences [dot] org.

Do you have an idea or innovation to share? Send a short outline for a Labs blogpost to innovation [at] elifesciences [dot] org.

Supercharge your project by joining eLife Innovation Leaders, a 14-week mentorship and open leadership training programme designed for innovators developing open-source tools and projects for open research communication. Find out more and apply here by December 8.

For the latest in innovation, eLife Labs and new open-source tools, sign up for our technology and innovation newsletter. You can also follow @eLifeInnovation on Twitter.