This post is part of the Innovator Stories Series. At eLife Innovation, we are constantly amazed by open-science ideas and prototypes, but perhaps even more so, we are continuously inspired by the open-source community: its talents, creativity and generosity. In bringing into focus the people behind the projects, we hope that you too will be inspired to make a difference.
By Junseon Yoo
I believe that the best way of solving complex problems is to collaborate intellectually with a group of people: we owe much of the prosperity we enjoy to previous intellectual collaborations. Most types of collaboration usually have spatial and time constraints. Intellectual collaboration, on the other hand, is not bound to these constraints as long as there are methods to communicate and share information.
In my school days, I always wanted to be a scientist. Brian Greene’s fascinating book on string theory, The Elegant Universe, made me dream of becoming a theoretical physicist like Edward Witten. Thus, I went to attend POSTECH, one of the top research-oriented universities in Korea. I soon realized, however, that I was not smart enough to uncover the secrets of nature with complex mathematics, and I changed my major to industrial engineering.
Soon after, many of my colleagues pursued PhDs in STEM fields. We frequently talked about the typical “lab life”, and their existing challenges with experiments, professors, and things that bothered them in the academic publishing process. During those days, I tried to solve some of their problems with data-mining techniques, resulting in some co-publications with them. It was not long after that I had my first encounter with some of the nuisances of academic publishing.
One of the biggest bottlenecks to better intellectual collaboration is the inefficient knowledge discovery process. There are major differences between a general keyword-based web search and a search for academic papers: papers within a specific research field tend to contain very similar keywords, as they usually consist of both existing knowledge from previous findings and new knowledge that the paper’s authors have discovered. Word order is also more important in research papers: a small difference in wording or in the order of the words could mean a completely different topic, as with the term “surface roughness” in materials science. Applying a general-purpose search algorithm to research papers therefore generally produces poor results and user experiences.
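To make the point about word order concrete, here is a minimal Python sketch. It is only an illustration, not how Scinapse or any real search engine works: the two queries are invented, and `bag_of_words` and `bigrams` are hypothetical helper names. It shows that a matcher which ignores word order cannot tell the two queries apart, while one that keeps adjacent word pairs can.

```python
def bag_of_words(text: str) -> frozenset:
    """Order-insensitive view of a query: just the set of its words."""
    return frozenset(text.lower().split())


def bigrams(text: str) -> set:
    """Order-sensitive view of a query: the set of adjacent word pairs."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


# Two invented queries that share every word but differ in order.
query_a = "surface roughness measurement"
query_b = "roughness measurement surface"

# A pure keyword matcher sees the two queries as identical...
same_keywords = bag_of_words(query_a) == bag_of_words(query_b)

# ...while the adjacent-pair view keeps them apart: only the first
# query contains the phrase ("surface", "roughness").
same_phrases = bigrams(query_a) == bigrams(query_b)
```

Here `same_keywords` is true and `same_phrases` is false, which is the gap that a search tuned for research papers has to close.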
The problem goes beyond the search algorithm. When I search for papers on Google Scholar, for example, I always have to open each individual result in a new tab to view each paper’s basic information, then close all of the tabs again. This is because when you look at your search results, the title seldom tells you anything useful, and a lot of the relevant information you need to determine whether this paper is relevant for your research is not shown in the small result preview. The accurate and efficient retrieval of information beyond the title is not trivial: we soon discovered many problems including author disambiguation, inaccurate citation count, and many more.
Based on our experience and findings, we founded the startup named Pluto and started developing a standardised database for metadata of academic papers. The by-product of this effort is an academic search engine which we now call Scinapse.
Scinapse uses a search algorithm specifically designed to facilitate the discovery of research papers.
We take into account H-index, impact factor (IF), authors’ previous publication records, recent citation trends, and many more signals, to provide a semantic search experience superior to Google Scholar. We want to make sure the results are relevant for each discipline and field, so we take into account the prestige and quality of conferences as well as sensitivity of grouping and ordering of keywords based on each discipline and research trend. Our algorithm utilises highly specialised data-mining and machine learning techniques to distinguish a paper’s characteristics.
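As a rough illustration of what combining such signals could look like, here is a small Python sketch. It is not Scinapse’s actual algorithm: the `Paper` fields, the weights and the normalisation caps are all assumptions made up for this example, and the text-match score is assumed to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    text_match: float             # 0..1 query relevance, assumed computed elsewhere
    author_h_index: int
    journal_impact_factor: float
    recent_citations: int


def score(paper: Paper) -> float:
    """Weighted sum of signals; weights and caps are hypothetical."""
    return (
        0.60 * paper.text_match
        + 0.15 * min(paper.author_h_index / 100, 1.0)
        + 0.10 * min(paper.journal_impact_factor / 50, 1.0)
        + 0.15 * min(paper.recent_citations / 1000, 1.0)
    )


def rank(papers: list) -> list:
    """Return papers ordered from highest to lowest score."""
    return sorted(papers, key=score, reverse=True)


# Invented example records, not real papers.
close_match = Paper("Close match, modest signals", 0.9, 20, 5.0, 50)
strong_signals = Paper("Looser match, strong signals", 0.5, 80, 40.0, 900)
ranked = rank([close_match, strong_signals])
```

With these made-up weights the second paper ranks first even though its text match is weaker, which shows why the choice of weights matters and why they would need tuning for each discipline.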
In addition, we summarise and provide more key information on the search results page itself to help you decide whether to read a paper further, so that you spend less time opening each search result just to check it. What information is most valuable in helping researchers make this decision? The impact factor is a controversial choice: it certainly does not reflect the quality of the papers. However, I don’t think that the indices are totally useless for judging the papers’ quality. At times, indices do reduce the time spent finding valuable papers. Ideally, it is better to evaluate papers without any bias, but additional information about the authors, journals and some indices could help researchers find papers with more efficiency and accuracy. What we want to do at the moment is to provide all the tools and data that researchers may want to see at a glance, so we included the impact factor in our search results page.
Using AI, we can also email you recommended papers based on your search activities. In the future, we plan to recommend related papers by taking into account both bibliographical analysis and your general activity logs. You can also read or download papers right on our site if they are licensed for unrestricted online access. Where a paper is inaccessible because it is behind a paywall or for other reasons, we provide a “request author/university” function where users can email authors using our template.
We decided to make Scinapse an open-source project because we value the power of community. Many modern software companies have shown how open source can make much higher quality software. We believe that if people really need and like a piece of software, they will want to help maintain it and contribute to providing more features if they can.
Our choice of building open source was also dictated by the realities of our available resources: we just can’t develop every feature that users need on our own; we can’t catch and fix every bug; we sometimes don’t know what users want. To this end, we realised the need to create a community that loves research, and to rely on them. We don’t own the Scinapse project – we’re just the initiators. We also plan to release some exciting new features on Scinapse in the following months.
Building a sustainable community is also our biggest challenge. To increase usage, we need to ensure that our service adds value to researchers’ workflows, to show them that it’s something worth contributing to. We also need to make contributing easy – realising the importance of good documentation, we’ve started some clear and easy-to-follow guidelines for contributors. Finally, we need to make sure that our service is robust, which means the implementation of carefully considered tests built into our development lifecycle.
Another challenge we face is data access. To improve the performance of our paper search and recommendations, it is important to secure publications’ data in legal and stable ways. We draw metadata from Springer Nature, PubMed, and Semantic Scholar, and we’ve also been partnering with Microsoft Research to include over 50 thousand major journals in our search. For even better search performance, we would need to analyse and index parts of full-text data, but we can’t do that due to restricted access. Moreover, the metadata schemas of academic papers are not standardised: although we use our own analysis and AI algorithms to process them, we currently cannot process and normalise the metadata of all available papers.
Our search algorithm gets better as more professional scientists and researchers use our platform to look for very specific papers. Being a group of engineers, we have limited opportunities to meet researchers new to Scinapse and receive meaningful feedback from users. For this reason, in the last year, we held several events to introduce open science to researchers at research-oriented universities. Through these events, we learned that many researchers are not familiar with the concept of open science, and they do not have time to notice or address the problems surrounding academia. They also don’t have anyone to introduce them to ideas like open science. At least in Korea, raising awareness for the open science movement remains crucial to its and Scinapse’s success.
As a researcher, it is easy to overlook the role of the library in publishing and viewing articles. Not long after I released Scinapse and became involved in a lot of events related to access and research, I realised how essential the role of libraries is in providing solutions to some of the difficulties researchers face.
Libraries have also been struggling with recent changes in the academic publishing environment. As the academic publishing world has become ever more fragmented, the difficulty of decision-making in purchasing or subscribing to academic resources has sharply increased. Meanwhile, digitised and electronic versions of publications have further led to challenges in integrating and managing older publications that have yet to be digitised. The most visible issue that we have observed among their problems is that libraries seem not to have enough communication, or enough modes of communication, with their researchers.
Such issues lead to problems like libraries purchasing or subscribing to resources that researchers do not need. Researchers frequently request additional resources that they need from the library, while the library faces pressure to cut the number of resources it funds access to because of budget constraints.
Because of these difficulties, a lot of unnecessary resources are currently being consumed by libraries. To help address this problem, we are currently working with several universities to see if we can strategically analyse and suggest the most useful resources they need every year for their researchers, based on citation analysis and usage data analysis of the articles published by each institution. We also provide a customised version of Scinapse to facilitate further communication between institutions and researchers. We believe that these tasks are the most urgent tasks towards providing researchers with a better search and access experience in the research process.
I was really lucky to find my vision at a young age. Though I’m not a scientist or a scholar, I think this is my unique way of contributing to scientific progress and am very satisfied with it. After starting Pluto, I discovered many problems in academia beyond searching for and publishing papers, for example, unequal distribution of funds and resources that draws talented human capital to one discipline or another, and inefficient or bad practices and culture in lab and research settings. These are just some of the problems that I hope to tackle in the next few years.
In the end, we want all our services and our efforts to focus on how we can help researchers optimise their research, and to remove any barriers they may come across. We want our platform to be where researchers can really save time and money, while also enhancing communication, both between researchers, and between researchers and their university libraries. In our recent efforts to further support researchers, we are also looking into ways to help societies and researchers in the publishing and conference spaces. This includes a new service called Biblion for academic societies and others who want to publish and manage their journals and memberships.
Although the web has brought information access to a lot of people, it hasn’t necessarily been the best setting for innovative research. Beginning with Scinapse, and evolving it for use by universities, we hope to address some of the problems that researchers and libraries consider most significant, as well as tackle challenges that are often unseen but which impede progress towards making better research possible. By creating more open, more optimal research environments online, we want to encourage researchers to make great discoveries during our lifetime.
We welcome comments, questions and feedback. Please annotate publicly on the article or contact us at innovation [at] elifesciences [dot] org.
Do you have an idea or innovation to share? Send a short outline for a Labs blogpost to innovation [at] elifesciences [dot] org.