By Peter Kraker, Maxi Schramm and Christopher Kittel
Introduction
Open Knowledge Maps is a charitable, non-profit organization dedicated to dramatically increasing the visibility of scientific knowledge for science and society alike. Headquartered in Vienna, Austria, Open Knowledge Maps comprises an international community including team and supporting members, advisors and partner organizations. As part of our mission, we operate the largest visual search engine for research in the world. On https://openknowledgemaps.org, users can create knowledge maps of research topics in any discipline. Knowledge maps provide an instant overview of a topic by showing the main areas at a glance, with relevant papers and concepts attached to each area.
Open Knowledge Maps enables a diverse set of stakeholders to openly explore, discover, and make use of scientific content. Among our users are researchers, students, librarians, educators, science journalists and practitioners across all scientific disciplines and geographic regions. Last year alone, we served close to a million pages, and more than 200,000 knowledge maps have been created to date. Our service increases the visibility of content from a wide variety of sources such as academic repositories, funding organisations, research institutions, and publishers.
Ultimately, we want to enable our users to collaboratively edit the knowledge maps on our platform, transforming a closed, individual process into an open and collaborative one. By sharing the results of our discoveries, we can build on top of each other’s knowledge. This process will be augmented by our machine learning core and result in a large-scale system of open, interactive and interlinked knowledge maps for every research topic, every field and every discipline. Users will be able to explore the entirety of scientific knowledge within this system.
Discovery for an open science
Open Knowledge Maps is an open infrastructure based on the principles of open science: source code, content and data are shared under an open license. We are a building block of the open discovery infrastructure, with which we extensively collaborate. As a community-driven initiative, we develop our services in a participatory process together with our stakeholders. Our aim is to create an inclusive, sustainable and equitable infrastructure that can be used by anyone, independent of geography, age, or stakeholder group. We are conducting workshops to introduce potential users to the tool and help improve their discovery skills along the way. In our community program, the Enthusiasts program, we support power users and ambassadors to conduct workshops themselves and to give feedback from their community’s perspective.
Software and architecture
Head Start, the underlying software library, is an open-source knowledge mapping framework developed by the Open Knowledge Maps team. Head Start provides an interactive, web-based interface and comes with a backend that is capable of automatically producing knowledge maps from a variety of data, including text, metadata and references. Since 2016, we have integrated a range of data sources from the Open Science ecosystem (BASE, PubMed, OpenAIRE, DOAJ, and PLOS) and created customized adaptations for diverse user requirements (our web service, VIPER, CRIS Vis, LinkedCat+).
Head Start follows a client-server architecture with a user-facing search interface and map frontend based on JavaScript, a service and API layer based on Apache/PHP, a natural language processing/machine learning backend based on R, and an SQLite database for the persistence of map representations.
To create a knowledge map, we first retrieve the metadata of a set of documents from the respective database (e.g. the most relevant documents for a search term). Then, we compute the map representation in four steps:
- Documents are preprocessed: metadata such as title, abstract, author, and journal are cleaned by removing punctuation, filtering stopwords, transforming to lower-case and stemming.
- The similarity of documents is then calculated based on the cosine similarity of the cleaned metadata.
- The documents are then clustered into groups with Ward’s method of minimum variance, and placed on the map with non-metric multidimensional scaling.
- Finally, clusters are labelled by taking the top three n-grams of keywords, weighted by term frequency-inverse document frequency (TF-IDF).
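To make the first two steps more concrete, here is a minimal sketch in Python. It is not the Head Start backend (which is written in R): the stopword list, the function names and the sample titles are assumptions made for illustration, stemming is left out, and the clustering, scaling and labelling steps are not shown.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real pipeline would use a full one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "on", "to", "with"}


def preprocess(text: str) -> list[str]:
    """Lower-case, strip punctuation and drop stopwords (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(documents: list[str]) -> list[list[float]]:
    """Pairwise cosine similarity of the cleaned documents."""
    vectors = [Counter(preprocess(d)) for d in documents]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical document titles, used only to exercise the functions.
docs = [
    "Open access publishing in the life sciences",
    "Publishing and open access: a survey",
    "Force-directed layout of graphs",
]
sim = similarity_matrix(docs)
```

In this toy example, the first two titles share three terms and therefore score higher against each other than either does against the third, which shares none. A matrix like this is the kind of input that the clustering and scaling steps would then work on.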
Using a unique map identifier generated from the search query, the search options and possibly other search parameters, a map representation can be retrieved from the server via the API and passed to the client. The client provides an interactive, web-based knowledge map based on D3.js, following Shneiderman's principle of "overview first, zoom and filter, then details-on-demand".
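One way to derive such an identifier is to hash a canonical serialization of the query and its options, so that the same search always maps to the same stored representation. The sketch below is an assumption for illustration only: the function name, the field names and the choice of SHA-256 are hypothetical and not taken from the Head Start code.

```python
import hashlib
import json


def map_id(query: str, options: dict) -> str:
    """Derive a deterministic identifier from a query and its search options."""
    # Sorting keys makes the serialization independent of dict ordering,
    # so logically identical searches produce the same identifier.
    canonical = json.dumps({"q": query, "options": options}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical search parameters.
first = map_id("open science", {"source": "base", "sorting": "most-relevant"})
second = map_id("open science", {"sorting": "most-relevant", "source": "base"})
other = map_id("open data", {"source": "base", "sorting": "most-relevant"})
```

Here `first` and `second` are identical because only the order of the options differs, while `other` differs because the query does.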
When presenting the knowledge map, the client first performs a force-directed layout optimization to reduce overlap and increase the distinction between clusters. Users can then explore the visualization using zoom, filters and an in-map text search. Every integration has optimized search, filter and exploration capabilities depending on the use cases and available metadata. The map is complemented by a synchronized list-based presentation. Once users have identified a document they are interested in, they can view the PDF within the same interface. For each PDF, we have integrated the open annotation service Hypothes.is.
For more information on Head Start and the technical aspects covered above, please refer to this paper.
Limits of the current architecture
The first iteration of the Head Start client was written in plain JavaScript and built around the mediator design pattern. A mediator keeps track of the overall status of the visualization: for example, whether and where the map is zoomed in, whether certain papers in the bubbles are to be hidden according to the filter function in the list, or the state of third-party integrations such as Hypothes.is. The advantage of the mediator is that objects (for example bubbles or papers) do not interact directly with each other, so it is not necessary to define how one object interacts with every other object. Over time, however, the mediator has grown overly complex, making changes difficult and error-prone and limiting our ability to create new interactions or improve existing ones.
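For readers unfamiliar with the pattern, the following is a minimal sketch of a mediator. It is written in Python purely for illustration (the actual client is JavaScript), and all class, event and attribute names are hypothetical.

```python
from collections import defaultdict


class Mediator:
    """Central hub: components publish events here instead of calling each other."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.zoomed_bubble = None  # shared visualization status lives here

    def subscribe(self, event, handler):
        self._handlers[event].append(handler)

    def publish(self, event, payload=None):
        for handler in self._handlers[event]:
            handler(payload)


class PaperList:
    """Hypothetical list view that reacts to zoom events from the map."""

    def __init__(self, mediator):
        self.visible_bubble = None
        mediator.subscribe("bubble_zoomed", self.on_zoom)

    def on_zoom(self, bubble_id):
        self.visible_bubble = bubble_id


class Bubble:
    """Hypothetical bubble that only knows the mediator, not the list."""

    def __init__(self, bubble_id, mediator):
        self.bubble_id = bubble_id
        self.mediator = mediator

    def click(self):
        self.mediator.zoomed_bubble = self.bubble_id
        self.mediator.publish("bubble_zoomed", self.bubble_id)


mediator = Mediator()
paper_list = PaperList(mediator)
Bubble("cluster-1", mediator).click()
```

The bubble and the list never reference each other, which is the benefit described above. The cost is that every new interaction adds another event and more shared status to the single mediator object.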
Our frontend framework has grown to the limits of complexity the original design can support. This impedes future development, and makes maintenance increasingly arduous. In addition, the complexity of the current system requires significant training, making it difficult to onboard new developers. The frontend requires a substantial upgrade to accommodate functionality crucial to fulfilling important use cases, such as integrating Open Knowledge Maps into user workflows, enabling future integrations (e.g. of dataset indices), and reaching our goal of a collaborative discovery environment. This need to upgrade our infrastructure has become a major stumbling block for our development.
The frontend refactoring project
In order to restore our ability to innovate on Open Knowledge Maps, we will carry out a much-needed upgrade of our technical platform in an upcoming refactoring project, which will be funded by the eLife Innovation Initiative. The project involves further code modularization, a switch to a stateful framework and the introduction of test-driven development. This will provide Open Knowledge Maps with a sustainable technological foundation towards our goal of a comprehensive, inclusive, and collaborative open discovery environment.
Recent frontend frameworks like React or Vue.js address the complexity problem outlined above by providing built-in global state management and composable components. The built-in state management enables us to manage interactions purposefully and selectively with a specialized global state store, replacing the catchall mediator. As a consequence, it becomes easier to add new interactions and to implement new features, especially regarding collaborative editing (for example, sharing a link to a single enlarged paper in the map, enabling the browser's forward and back buttons to walk through the different interactions, editing map data, or moving documents). In prototyping projects, we evaluated both React and Vue.js, with no clear winner regarding the overall performance experienced by users. Making a final decision will be part of this project.
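The idea of a global state store can be sketched independently of any particular framework. The example below is a simplified illustration in Python, not React or Vue.js code and not our planned implementation; the state fields, action names and the `Store` class are assumptions. Every interaction is an action, a pure function computes the next state, and keeping earlier states around is what makes a back button straightforward.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MapState:
    """Hypothetical visualization state; frozen so it cannot be mutated in place."""
    zoomed_area: Optional[str] = None
    selected_paper: Optional[str] = None


def reducer(state: MapState, action: dict) -> MapState:
    """Pure function: returns a new state and never modifies the old one."""
    if action["type"] == "ZOOM_IN":
        return replace(state, zoomed_area=action["area"], selected_paper=None)
    if action["type"] == "SELECT_PAPER":
        return replace(state, selected_paper=action["paper"])
    if action["type"] == "ZOOM_OUT":
        return MapState()
    return state  # unknown actions leave the state unchanged


class Store:
    """Single source of truth that also keeps a history of earlier states."""

    def __init__(self):
        self.history = [MapState()]

    @property
    def state(self) -> MapState:
        return self.history[-1]

    def dispatch(self, action: dict) -> None:
        self.history.append(reducer(self.state, action))

    def back(self) -> None:
        # Step back one interaction, but never drop the initial state.
        if len(self.history) > 1:
            self.history.pop()


store = Store()
store.dispatch({"type": "ZOOM_IN", "area": "open access"})
store.dispatch({"type": "SELECT_PAPER", "paper": "paper-42"})
after_select = store.state
store.back()
after_back = store.state
```

Because the whole interaction state is one value, features such as a shareable link to an enlarged paper amount to serializing and restoring that value, rather than coordinating many objects through a mediator.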
Parallel to the refactoring in a stateful framework, we will split up the interface’s accumulated functionalities into re-usable, modular components. We will also decouple functional and design elements. This will leave us with building blocks that can be configured depending on end-user requirements.
This refactoring project will bring about many benefits for our stakeholders. Developers, designers, and maintainers will be able to focus only on the code relevant to a desired change. Users will be able to immediately use a number of features that we could not implement before, and they will get access to new functionalities and integrations more quickly. Maintainers and contributors will also be able to contribute and review more easily. Finally, deployers will find our software much easier to re-use and adapt.
Overall, once this project is completed, it will be easier for volunteers to contribute to Open Knowledge Maps, whether they are end-users, developers, or designers. As a direct effect of this, Head Start will be easier to reuse, extend, and adapt, moving towards the more mature form of a distributable software package for making collections of any kind visible and discoverable.
We want you!
As part of this project, we are currently hiring a UX-minded front-end developer to join the Open Knowledge Maps team in Vienna. If you know your way around reactive JavaScript frameworks and are interested in a position where your work positively affects people all around the world, then this job might be for you. This is a fixed-term, entry-level position for 20 hours/week in the Open Knowledge Maps office in Vienna. For more information, please refer to the full job advert.
We welcome comments, questions and feedback. Please annotate publicly on the article or contact us at innovation [at] elifesciences [dot] org.
Do you have an idea or innovation to share? Send a short outline for a Labs blogpost to innovation [at] elifesciences [dot] org.
For the latest in innovation, eLife Labs and new open-source tools, sign up for our technology and innovation newsletter. You can also follow @eLifeInnovation on Twitter.