Research Data Management
The following post was originally published on the FORCE 11 Upstream blog on February 4, 2025 by Rory Macneil and Vaida Plankytė. The original post can be found at https://doi.org/10.54900/8m9by-kfy03.
---
In recent years, interoperability has become an important concept in the field of research data management. Discussions about interoperability have taken place in published articles and numerous presentations [1] [2], and several interoperability standards have been developed [3], [4], [5]. The term has been applied, among other things, to data, metadata and research tools.
This post focuses on interoperability between research tools. Thus far, this topic has mainly been explored by looking at interoperability between different tools in the same tool category. The most prominent examples are generalist data repositories and data management planning tools. Some work has also been done on interoperability between electronic lab notebooks. After a brief summary of these three examples of ‘horizontal’ interoperability, we explore an important emerging area of interest which we call ‘vertical’ interoperability: interoperability between tools that belong to different tool categories. This post builds on work we are carrying out under an NSF grant [6] in collaboration with the California Digital Library, and on a presentation and discussion of vertical interoperability at the annual meeting of the Generalist Repository Ecosystem Initiative (GREI) project, which took place in Chicago in September 2024 [7].
For data repositories, the RO-Crate format has been designed to be compatible with all generalist repositories and provides well-described data objects and metadata through a ZIP bundle [8] [9]. Additionally, GREI is enabling collaboration and the development of a shared standard for the major generalist repositories [10].
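To make the idea of a ZIP bundle with well-described data objects more concrete, here is a minimal sketch in Python that packages files together with an `ro-crate-metadata.json` descriptor. The function name `build_ro_crate_zip`, the dataset name and the sample file are hypothetical, and the metadata is deliberately reduced: a crate intended for deposit would normally also carry properties such as a description, a publication date and a license.

```python
import io
import json
import zipfile


def build_ro_crate_zip(name: str, files: dict[str, bytes]) -> bytes:
    """Return the bytes of a ZIP archive holding the files plus RO-Crate metadata."""
    graph = [
        # Metadata descriptor: marks this JSON file as RO-Crate metadata.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root dataset: describes the bundle as a whole and lists its parts.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One File entity per payload file (illustrative; real entries carry more detail).
    graph += [{"@id": path, "@type": "File"} for path in files]
    metadata = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("ro-crate-metadata.json", json.dumps(metadata, indent=2))
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Hypothetical example payload.
crate_bytes = build_ro_crate_zip(
    "Example dataset", {"results.csv": b"sample,value\nA,1\n"}
)
with zipfile.ZipFile(io.BytesIO(crate_bytes)) as archive:
    crate_names = archive.namelist()
    crate_metadata = json.loads(archive.read("ro-crate-metadata.json"))
```

Because the metadata travels inside the archive alongside the data, any tool that understands the format can read the bundle without knowing which tool produced it.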
With regard to data management planning tools, much effort has been put into developing machine-actionable DMPs [11] and defining a common metadata standard to enable interoperability both between DMP tools themselves and with tools that consume their data [12]. In addition, DMP Roadmap provides a shared codebase that several other DMP tools extend, so a base of shared features can be reused across similar tools.
When it comes to electronic lab notebooks, the ELN format [13], an extension of RO-Crate designed for files generated by ELNs, enables the exchange of experimental research materials and data through a common archive format. Custom-built import/export capabilities using open formats such as CSV are sometimes available for ELNs, but they are often designed around converting data from one tool’s schema into another tool’s schema rather than around a shared metadata format.
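The difference between schema-to-schema conversion and a shared metadata format can be illustrated with a small Python sketch. Everything here is hypothetical: the two ELN schemas, the shared field names and the sample entry are invented for illustration and do not describe any real tool or the ELN format itself.

```python
def pairwise_converters(n_tools: int) -> int:
    """Directed converters needed when every tool maps directly to every other tool."""
    return n_tools * (n_tools - 1)


def shared_format_converters(n_tools: int) -> int:
    """Converters needed when each tool only exports to and imports from one shared format."""
    return 2 * n_tools


# Hypothetical field mappings from two ELN schemas to the same shared field names.
ELN_A_TO_SHARED = {"title": "name", "body": "text", "created_on": "dateCreated"}
ELN_B_TO_SHARED = {"heading": "name", "content": "text", "timestamp": "dateCreated"}


def to_shared(record: dict, mapping: dict) -> dict:
    """Rename a tool-specific record's fields to the shared names."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def from_shared(record: dict, mapping: dict) -> dict:
    """Rename shared fields back to a tool's own field names."""
    reverse = {shared: local for local, shared in mapping.items()}
    return {reverse[key]: value for key, value in record.items() if key in reverse}


# Moving an entry from ELN A to ELN B goes through the shared format,
# so neither tool needs to know the other's schema.
entry_a = {"title": "PCR run 12", "body": "Annealing at 58 C", "created_on": "2024-09-12"}
entry_b = from_shared(to_shared(entry_a, ELN_A_TO_SHARED), ELN_B_TO_SHARED)
```

Under these assumptions, six tools would need 30 direct converters but only 12 when each tool maps to and from the shared format, which is one reason a common format scales better than custom import/export pairs.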
As the number and categories of research tools continue to expand at a rapid pace, so does their use in research. Research often takes the form of complex chains of interaction across various specialist and generalist tools, and addressing the challenge of enabling researchers to gracefully use multiple tools together in their workflows has taken on increasing importance. We now look at vertical interoperability in more detail, by putting it in context first.
Until relatively recently, research was generally thought of as a linear process: an experiment was designed, which produced data, which was processed and analyzed, and the results were written up in a publication. Along with the increasing focus on data and the FAIR principles [14], another element was added to what was still conceived of as a linear process: the deposit of research into a data repository. This would generally happen at the end of the process, and often in conjunction with publication. In the past ten years or so, another element was added to the process: preparation of a data management plan at the start of the process.
The advent of data repositories and data management planning tools, the proliferation of different kinds of research tools, and the rise of the FAIR principles, along with other trends such as the development of regional and national research infrastructures, have contributed to a shift from thinking about research as a linear process to conceptualising it as a continuous lifecycle. This change in thinking prompts a rethinking of data and their flows. The repository is no longer the final resting place of data produced in the research process, but rather a staging platform on an ongoing journey where the data will continue to be used and modified by further research. Importantly, this modification and re-use will mostly take place using the same categories of research tools (though not necessarily the same exact tools) that were used across the pre-deposit stages of the research lifecycle.
In this way of thinking, vertical interoperability between research tools becomes a critical enabling factor for the FAIRification of data. Without interoperability between tools, data cannot pass through different stages of the lifecycle without time-consuming manual reformatting before ingestion, and they risk losing their integrity because of the differences in focus and metadata formats between tools. Under those conditions, continuous use, re-use and modification of research data cannot take place in a sustainable manner.
Currently, vertical interoperability between research tools is the exception rather than the rule. Examples include the integration between the Argos DMP tool and the Zenodo repository, which enables deposit of DMPs from Argos into Zenodo, and the integration between the protocols-sharing app protocols.io and Lab Archives, which enables two-way transfer of protocols between the two applications.
The benefits of these vertical integrations are many:
As far as we are aware, our research data platform RSpace is the only tool that has systematically developed integrations with multiple tools from different stages of the research lifecycle. We present our ecosystem of integrations in the following graphic:
We outline below some of the challenges we’ve encountered when building vertical interoperability:
Extensive time and effort are needed to ensure that the proposed integration solution is neither tailored too narrowly to one use case nor too generic, and that it is flexible enough to be adjusted based on actual researcher workflows. User research is essential to validate the proposed integration design, but to be performed effectively it requires experience in user interviewing, mockup generation and high-level technical specification writing. The development of these intermediary materials has the added benefit of creating a shared language among tool developers, among institutions, and between the two groups. This enables broader conversations and comparisons on a constantly growing topic.
To successfully design powerful integrations, all actions that are achievable through the tool’s user interface should ideally be available through an API. However, differences in API maturity, documentation, availability and design approach can result in bottlenecks that greatly limit the scope of an integration, especially if there is direct interaction between more than two tools. Integrations are also costly to sustain, since they need regular maintenance and testing to ensure workflows are working as intended.
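One way to picture how API gaps limit an integration is to compare the steps a workflow needs with the actions each tool's API actually exposes. The Python sketch below is purely illustrative: the tool names, action names and workflow are hypothetical and are not taken from any real API.

```python
def api_gap(ui_actions: set[str], api_actions: set[str]) -> set[str]:
    """Actions a user can perform in the interface that the API does not expose."""
    return ui_actions - api_actions


def blocked_steps(
    workflow: list[tuple[str, str]], apis: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Workflow steps, as (tool, action) pairs, that cannot be automated via the APIs."""
    return [
        (tool, action)
        for tool, action in workflow
        if action not in apis.get(tool, set())
    ]


# Hypothetical API surfaces for two tools from different lifecycle stages.
apis = {
    "notebook": {"export_document", "list_documents"},
    "repository": {"create_deposit"},
}

# Hypothetical cross-tool workflow: export a document, deposit it, attach metadata.
workflow = [
    ("notebook", "export_document"),
    ("repository", "create_deposit"),
    ("repository", "attach_metadata"),
]

missing = blocked_steps(workflow, apis)
notebook_gap = api_gap({"export_document", "list_documents", "share_document"}, apis["notebook"])
```

In this invented example a single missing API action (`attach_metadata`) is enough to force a manual step into an otherwise automated flow, and every additional tool in the chain adds more places where such a gap can appear.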
Adopting vertical interoperability means adopting modular tool design, where each tool in the ecosystem is responsible for achieving its concrete purpose well. This means resisting the temptation to build a tool that reinvents the wheel in the hope that it will be better for a specific research context. The design, development and planning resources required to build a tool from scratch can be greatly reduced by exploring how existing and proven tools could be expanded upon to provide additional flexibility.
Of course, it is essential that the many tools in use are presented to researchers as a unified experience that directly enables their workflows. There is therefore a need for a new kind of tool (a connector, or front-end) that provides a unified interface. As an example, RSpace integrates data repositories into the document export flow, and provides access to chemistry tools from within a document, which is where researchers would naturally want to reach them.
It must be noted that for all three of the challenges listed above, the ability to work closely with members of each team is invaluable and necessary for the success of an interoperability project. This is especially the case for vertical interoperability projects, as they might involve co-creation with people across domains who do not share the same core assumptions and knowledge.
The MaLDReTH Map of the Digital Research Tools Landscape [15], a recent output of the Research Data Alliance, provides a conceptual scaffolding for thinking about and implementing vertical interoperability. MaLDReTH is centered on a harmonized model of the research data lifecycle: for each stage of the cycle, three representative categories of tools have been identified, with each tool category listing three concrete examples of tools used in that stage of the lifecycle.
For the first time, MaLDReTH provides a visual representation of the research data lifecycle which includes concrete examples of research tools used in each stage of the lifecycle, with supporting explanation and documentation. As such, MaLDReTH has the potential to act as a significant stimulant in the development of more widespread vertical interoperability between research tools, which is a core prerequisite for enabling streamlined flows of data and metadata throughout the research lifecycle.
Three kinds of actors can take advantage of MaLDReTH and help drive the development of research infrastructures which are built around vertical interoperability.
MaLDReTH is already being used by research organizations as a reference for assessing and building research infrastructures and has been adopted for this purpose by Oxford University, University College London, JISC and the California Digital Library. In the coming year, the RDA MaLDReTH II Working Group plans to host a series of workshops to introduce MaLDReTH more widely, which should lead to more widespread adoption. The resulting use cases will highlight the benefits of existing vertical interoperability and identify the gaps that remain. The development of concrete vertical interoperability examples that all utilise the MaLDReTH model as a base will enable a shared understanding of core interoperability concepts, and easier comparison of the benefits and drawbacks of implementation approaches.
Most research tools utilised in academic research benefit from at least some public funding, and in many cases are the result of dedicated public funding. Thus, funders have the ability to play a key role in driving vertical interoperability. Firstly, they can require applicants to demonstrate awareness of adjacent tools that are already in use elsewhere and could be built upon to fit within current workflows and ecosystems, which would discourage projects that aim to create new, all-encompassing and highly specific tools from scratch. Similarly, approaches that consider multi-tool workflows could be encouraged. Secondly, they can require functional APIs to be developed alongside the product and treated as core functionality, rather than, as is often the case, treating APIs as an optional and incomplete afterthought.
Direction from adopters of research tools, i.e. research organizations and funders, can drive a change in approach by the developers of new research tools as well as existing tools. With a change in focus to supporting research workflows, which in most cases require application of multiple tools, collaborations and partnerships between tools to provide a joint service come to the fore. A particular tool typically can’t address needs at every stage of the workflow, so even in the initial design of a tool, thought needs to be given to ‘adjacent’ tools and how integrations with them can best support the relevant workflow(s). Interoperability and the development of the relevant APIs then become a core part of tool design, rather than an afterthought.
Copyright © 2025 Rory Macneil, Vaida Plankytė. Distributed under the terms of the Creative Commons Attribution 4.0 License.