How to share and store data in an electronic lab notebook

Posted by Rory on September 23rd, 2010 @ 5:16 pm

In this blog I usually look at data sharing from the point of view of the core research unit, the lab.   That was the perspective I adopted a couple of weeks ago in a presentation, Electronic lab notebooks in biomedical research, at the Storing, Accessing and Sharing Data: Addressing the Challenges and Solutions event co-hosted by the Scottish Bioinformatics Forum and S3 in Edinburgh.  I’ll come back to that perspective in a minute, but first I’d like to contrast two very different institutional perspectives on data management described at the conference.

Sanger Institute:  centralized institutional data management

Phil Butcher, head of IT at the Sanger Institute, started with a high-level overview of data management issues at Sanger.  He focussed mainly on the rapid growth in the amount of data generated at Sanger and at the other institutes with which it has large-scale collaborations, and on the issues of storing and finding data when there is so much of it.  The impression I came away with is that at Sanger data is viewed as an institutional matter, not something that individual labs or scientists manage or, apparently, have much of a say in.  That makes sense, because the research projects Phil mentioned were all large scale, involving large numbers of scientists and the generation of huge amounts of data.  The title of Phil’s talk, Scaling up Science and IT: Sanger Institute’s Perspective, reflects the centralized approach.

London Research Institute:  decentralized institutional data management

The next speaker, Jeremy Olsen, head of IT at the London Research Institute, started by saying that, based on Phil’s description of Sanger, the London Research Institute was very different indeed, more of a collection of individual research groups.  In describing his LRI perspective Jeremy said that he would be sticking up for the “little guy”.  He proceeded to give a brief overview of how research is carried out at the LRI, introducing the various research groups and their research interests.  The LRI represents a very different paradigm from Sanger; at the LRI decentralization rules, as reflected by the title of Jeremy’s talk, Data Growth and Management in a Diverse Life Sciences Environment.  At the LRI there are fundamental issues relating to getting a handle on what research the various groups are involved in, what data they generate and how they manage it.  Progress would need to be made on understanding these issues before it would be possible even to consider a centralized approach to data management and what that might entail.

The lab: bottom up data management

When it came time for my presentation, I started by saying that if Phil was representing the centralized institutional approach, and Jeremy was looking at the “little guys” from an institutional perspective, I was going to look at the issue of data management and sharing from the point of view of the little guy him/herself, i.e. the PI.  In the academic context, it’s important to note that the Sanger model is the exception and the LRI decentralized model is the rule.  In fact it is almost certainly the case that the LRI, decentralized as it is, is still towards the more organized and centralized end of the spectrum of academic biomedical institutions.  That point was reinforced to me when speaking recently with the IT director of a medium-to-large biomedical research institute in Australia (800 people including 700 scientific staff).  His description of the issues he faced in getting a grip on what data there was in the labs at the institute and how they managed it (if they managed it at all), and his uncertainty about how to help PIs get a better handle on their data, were uncannily reminiscent of Jeremy’s description of the situation at the LRI.

From the perspective of IT managers tasked with, among other things, trying to bring some order to the data generated by the research groups at their institution, to store it in a cost effective fashion and have it archived in a way that is useful in the future, multiple PIs generating ever increasing amounts of data may be a ‘problem’ to be managed or dealt with.  But from the PIs’ point of view it is their data and theirs to manage (or not) as they want.  There is a pretty fundamental difference in outlook here.

Electronic lab notebooks — part of the solution?

In my presentation I asked where electronic lab notebooks might fit into this picture, and whether they could have a role to play in crafting better data management solutions that meet the objectives of both PIs and IT directors.

ELNs tick some of the key boxes IT directors look for in best practice in data storage and sharing, including:

  1. Storing metadata in a structured fashion and ensuring controlled access.
  2. Effectively managing different data types, including attachments and imports.
  3. Allowing improved indexing  and search, through the use of structured metadata.
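
To make the first and third points a little more concrete, here is a minimal sketch in Python of searching structured metadata while respecting controlled access.  It is not taken from any particular ELN; the names `Entry`, `readers` and `search`, and the sample entries, are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """A hypothetical notebook entry with structured metadata and an access list."""
    title: str
    metadata: dict   # structured key/value metadata, e.g. {"assay": "ChIP"}
    readers: set     # user names allowed to see this entry (assumed access model)


def search(entries, user, **criteria):
    """Return entries the user may read whose metadata matches every criterion."""
    return [
        e for e in entries
        if user in e.readers
        and all(e.metadata.get(key) == value for key, value in criteria.items())
    ]


# Illustrative data only.
entries = [
    Entry("ChIP run 12", {"assay": "ChIP", "author": "alice"}, {"alice", "bob"}),
    Entry("Western blot 3", {"assay": "western", "author": "bob"}, {"bob"}),
    Entry("ChIP run 13", {"assay": "ChIP", "author": "bob"}, {"bob"}),
]

visible_to_alice = search(entries, "alice", assay="ChIP")
visible_to_bob = search(entries, "bob", assay="ChIP")
```

Because the metadata is held as named keys rather than free text, the same query can be narrowed by any combination of fields, and the access check is applied before anything is returned.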

Electronic lab notebooks can also solve the key data management problem facing many PIs:  coordinating a wide diversity of data types and data sets generated by a large number of people within the lab.  They can, that is, if they meet the following key requirements of today’s PIs:

  1. The ELN is flexible and can be set up the way the PI and their lab want it set up.
  2. It’s easy for the lab to transfer to the ELN.
  3. The ELN facilitates better exchange of information between members of the lab and, over time, better archiving.
  4. The ELN is web based and hence accessible anywhere, anytime.

So, electronic lab notebooks can help to solve the key data management  issue faced by  the core unit in academic institutions — labs.  And they provide a platform for data management that IT directors looking at the problem from an institutional perspective can work with.  As such they can be part of a solution which benefits both PIs, who are concerned with the research done in their group, and IT directors, who are concerned with the data generated throughout their institution.

What is an electronic lab notebook III: the benefits of structure

Posted by Rory on July 21st, 2010 @ 8:43 pm

The last post and the one before that looked at different views on who electronic lab notebooks are for — individuals or the lab — and how wikis measure up as environments that  enable lab members to enter and share experimental data.  Notwithstanding their attraction as convenient online tools for sharing general information, wikis lack structure, and it is primarily this which has kept even labs that use wikis wedded to the paper lab notebook for documenting experiments.

In this post we’ll look at why the ability to add structure to research data  is the key enabler permitting the transition from paper lab notebooks to electronic lab notebooks.

Paper lab notebooks support as much structure as you like.  You can create sections, paste copies of images, make notes in the margin, draw diagrams; the only limit to adding structure to a paper lab notebook is the scientist’s imagination.  Unlike a wiki, an electronic lab notebook allows you to replicate the structure that you put into a paper lab notebook.  Why?  Because an electronic lab notebook allows the creation of records with different kinds of fields.  This supports structuring your research data in two ways.  First, the different types of fields support entry of information in different ways, e.g. by date or time, by entry of text, by number, with radio buttons offering a series of mutually exclusive options, etc.  Second, different classes of records can be put together from different combinations of fields, creating types of records that are appropriate to different aspects of research, e.g. a ChIP experiment, a freezer, a particular protocol, or an antibody.  This is in stark contrast to the wiki, which has only one type of record, the wiki page, and an undifferentiated one at that, with no support for separation into different fields.
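
For readers who think in code, the two ideas above (typed fields, and record types built from combinations of fields) might be sketched roughly as follows in Python.  This is only an illustration under my own assumptions: `Field`, `RecordType`, the `ANTIBODY` record type and its field names are hypothetical and do not describe how any actual ELN is implemented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class Field:
    """One typed field; `choices` models radio buttons (mutually exclusive options)."""
    name: str
    kind: type                       # e.g. str, float, date
    choices: Optional[tuple] = None

    def validate(self, value: Any) -> None:
        if not isinstance(value, self.kind):
            raise TypeError(f"{self.name}: expected {self.kind.__name__}")
        if self.choices is not None and value not in self.choices:
            raise ValueError(f"{self.name}: must be one of {self.choices}")


@dataclass(frozen=True)
class RecordType:
    """A class of record defined by a particular combination of fields."""
    name: str
    fields: tuple

    def new_record(self, **values: Any) -> dict:
        expected = {f.name for f in self.fields}
        if set(values) != expected:
            raise KeyError(f"{self.name}: expected fields {sorted(expected)}")
        for f in self.fields:
            f.validate(values[f.name])
        return {"type": self.name, **values}


# Hypothetical record type for an antibody; other types (freezer, protocol,
# ChIP experiment) would simply combine different fields.
ANTIBODY = RecordType("antibody", (
    Field("name", str),
    Field("host", str, choices=("mouse", "rabbit", "goat")),
    Field("dilution", float),
    Field("received", date),
))

record = ANTIBODY.new_record(
    name="anti-H3K4me3", host="rabbit", dilution=0.001, received=date(2010, 7, 21)
)
```

A wiki page, by comparison, would be the single case of one record type with one free-text field, which is why nothing about its contents can be checked or queried by field.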

The benefits of this structure extend further to the other  things you use in your research like images and spreadsheets.  Like wikis, electronic lab notebooks have the advantage over paper lab notebooks of being able to make links to images and spreadsheets, which can also be inserted into the electronic repository — wiki or electronic lab notebook.  But electronic lab notebooks offer superior structuring capabilities in this respect too, because with an electronic lab notebook, unlike a wiki, you can associate a spreadsheet, image or other electronic item with a particular field of a particular kind within a record.
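
As a rough illustration of that last point, here is a small Python sketch in which a file is attached to a named field of a record rather than to the page as a whole.  The `Record` and `Attachment` classes, the field names and the file name are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    """A hypothetical reference to an image, spreadsheet or other file."""
    filename: str
    media_type: str


@dataclass
class Record:
    record_type: str
    values: dict = field(default_factory=dict)
    # Maps a field name to the attachments associated with that field.
    attachments: dict = field(default_factory=dict)

    def attach(self, field_name: str, attachment: Attachment) -> None:
        """Associate an attachment with one specific field of this record."""
        if field_name not in self.values:
            raise KeyError(f"{self.record_type} has no field {field_name!r}")
        self.attachments.setdefault(field_name, []).append(attachment)


# Illustrative record: the gel image belongs to the "results" field only.
chip = Record("chip_experiment", {"protocol": "standard crosslinking", "results": "see gel"})
chip.attach("results", Attachment("gel_2010-07-21.png", "image/png"))
```

Because the image hangs off the "results" field, a later query can ask for "all result images from ChIP experiments", something a link dropped anywhere on a wiki page cannot support.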

Making use of an electronic lab notebook’s ability to create records with different kinds of fields allows you to put structure into the record of your research in an online environment, much as you did with a paper lab notebook.  At the same time you gain the benefits of associations between bits of information that can only be made in an online environment, so electronic lab notebooks actually take the structuring of research data to a new and higher level.  It is this element, the ability to add structure to research data, which explains why electronic lab notebooks, and not wikis, provide the best platform for labs wishing to move from paper to electronic recording and management of their research data.  This is the unstated driver that lies behind Wikipedia’s definition of an electronic lab notebook as “a software program designed to replace paper lab notebooks”.  And so I would revise that definition and say that an electronic lab notebook is an online environment that provides sufficient capability for structuring research data to enable scientists to document and share their research data in that environment without also needing to resort to a paper lab notebook.