How to share and store data in an electronic lab notebook

Posted by Rory on September 23rd, 2010 @ 5:16 pm

In this blog I usually look at data sharing from the point of view of the core research unit, the lab.   That was the perspective I adopted a couple of weeks ago in a presentation, Electronic lab notebooks in biomedical research, at the Storing, Accessing and Sharing Data: Addressing the Challenges and Solutions event co-hosted by the Scottish Bioinformatics Forum and S3 in Edinburgh.  I’ll come back to that perspective in a minute, but first I’d like to contrast two very different institutional perspectives on data management described at the conference.

Sanger Institute:  centralized institutional data management

Phil Butcher, head of IT at the Sanger Institute, started with a high level overview of data management issues at Sanger.  He focussed mainly on the rapid growth in the amount of data generated at Sanger and at the other institutes with which it has large scale collaborations, and on the problems of storing and finding data when there is so much of it.  The impression I came away with is that at Sanger data is viewed as an institutional matter, not something that individual labs or scientists manage or, apparently, have much of a say in.  That makes sense, because the research projects Phil mentioned were all large scale, involving large numbers of scientists and generating huge amounts of data.  The title of Phil’s talk, Scaling up Science and IT: Sanger Institute’s Perspective, reflects the centralized approach.

London Research Institute:  decentralized institutional data management

The next speaker, Jeremy Olsen, head of IT at the London Research Institute, started by saying that, judging by Phil’s description of Sanger, the London Research Institute was very different indeed, more a collection of individual research groups.  In describing his LRI perspective Jeremy said that he would be sticking up for the “little guy”.  He then gave a brief overview of how research is carried out at the LRI, introducing the various research groups and their research interests.  The LRI represents a very different paradigm from Sanger; at the LRI decentralization rules, as reflected by the title of Jeremy’s talk, Data Growth and Management in a Diverse Life Sciences Environment.  At the LRI there are fundamental issues relating to getting a handle on what research the various groups are involved in, what data they generate and how they manage it. Progress would need to be made on understanding these issues before it would be possible even to consider a centralized approach to data management and what that might entail.

The lab: bottom up data management

When it came time for my presentation, I started by saying that if Phil was representing the centralized institutional approach, and Jeremy was looking at the “little guys” from an institutional perspective, I was going to look at the issue of data management and sharing from the point of view of the little guy him/herself, i.e. the PI.  In the academic context, it’s important to note that the Sanger model is the exception and the LRI decentralized model is the rule.  In fact it is almost certainly the case that the LRI, decentralized as it is, is still towards the more organized and centralized end of the spectrum of academic biomedical institutions. That point was reinforced to me when speaking recently with the IT director of a medium-to-large biomedical research institute in Australia (800 people including 700 scientific staff).  His description of the issues he faced in getting a grip on what data there was in the labs at the institute and how they managed it (if they managed it at all), and his uncertainty about how to help PIs get a better handle on their data, was uncannily reminiscent of Jeremy’s description of the situation at the LRI.

From the perspective of IT managers tasked with, among other things, trying to bring some order to the data generated by the research groups at their institution, to store it in a cost effective fashion and have it archived in a way that is useful in the future, multiple PIs generating ever increasing amounts of data may be a ‘problem’ to be managed or dealt with.  But from the PIs’ point of view it is their data and theirs to manage (or not) as they want.  There is a pretty fundamental difference in outlook here.

Electronic lab notebooks — part of the solution?

In my presentation I asked where electronic lab notebooks might fit into this picture, and whether they could have a role to play in crafting better data management solutions that meet the objectives of both PIs and IT directors.

ELNs tick some of the key boxes IT directors look for in best practice in data storage and sharing, including:

  1. Storing metadata in a structured fashion and ensuring controlled access.
  2. Effectively managing different data types, including attachments and imports.
  3. Allowing improved indexing and search, through the use of structured metadata.
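
To make those three points a little more concrete, here is a minimal sketch in Python of what "structured metadata", "controlled access" and "search" could look like together.  It is purely illustrative: the `Record` class, its fields, and the `build_index` and `search` functions are names invented for this example and are not taken from any particular ELN product.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical ELN entry: free text plus structured metadata."""
    record_id: str
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. {"project": "p53"}
    readers: set = field(default_factory=set)     # users allowed to view it


def build_index(records):
    """Map each (key, value) metadata pair to the ids of matching records."""
    index = {}
    for rec in records:
        for key, value in rec.metadata.items():
            index.setdefault((key, value), set()).add(rec.record_id)
    return index


def search(records, index, user, **criteria):
    """Return sorted ids of records that match every criterion
    and that the given user is allowed to read."""
    by_id = {rec.record_id: rec for rec in records}
    matches = set(by_id)
    for key, value in criteria.items():
        matches &= index.get((key, value), set())
    return sorted(rid for rid in matches if user in by_id[rid].readers)


# Made-up sample data, for illustration only.
records = [
    Record("r1", "Western blot, lane 3 faint",
           {"type": "experiment", "project": "p53"}, {"alice", "bob"}),
    Record("r2", "Lab meeting minutes",
           {"type": "minutes", "project": "p53"}, {"alice"}),
    Record("r3", "qPCR run",
           {"type": "experiment", "project": "brca1"}, {"bob"}),
]
index = build_index(records)

print(search(records, index, "alice", project="p53"))
print(search(records, index, "bob", type="experiment"))
```

The point of the sketch is simply that once metadata is stored as key/value pairs rather than buried in free text, both indexing and access checks become straightforward lookups.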

Electronic lab notebooks can also solve the key data management problem facing many PIs:  coordinating the wide variety of data types and data sets generated by a large number of people within the lab.  They can, that is, if they meet the following key requirements of today’s PIs:

  1. The ELN is flexible and can be set up the way the PI and their lab want it set up.
  2. It’s easy for the lab to transfer to the ELN.
  3. The ELN facilitates better exchange of information between members of the lab and, over time, better archiving.
  4. The ELN is web based and hence accessible anywhere, anytime.

So, electronic lab notebooks can help to solve the key data management issue faced by the core unit in academic institutions: the lab.  And they provide a platform for data management that IT directors looking at the problem from an institutional perspective can work with.  As such they can be part of a solution which benefits both PIs, who are concerned with the research done in their group, and IT directors, who are concerned with the data generated throughout their institution.

5 Things PIs want in an electronic lab notebook — other suggestions?

Posted by Rory on July 28th, 2010 @ 7:00 am

What PIs want in an electronic lab notebook is often different from what postdocs and graduate students want because PIs are looking for a tool for recording the entire lab’s work, rather than an individual note taking tool.  I looked around the web at recent discussions of what PIs are looking for in an ELN, and identified five common themes:

  1. Something that’s easy to learn and easy to use in order to ensure (relatively stress free) lab-wide buy in and take up.  Joshua Shaevitz, at Princeton, has a good description of the considerations that went into adopting an ELN, and the adoption process, in his recent post on My Lab’s Wiki-based Electronic Lab Notebook System.  He says, “Before implementing our wiki system, I setup a mock wiki ELN on my laptop and presented it during a lab meeting to show everyone the benefits firsthand. I especially wanted to convince them that the new system would not generate extra work, but would instead make their lives easier.”
  2. Something that’s flexible in terms of providing for, on the one hand, common structures for group records and records that need to be accessed by multiple members of the group, and, on the other hand, scope for individuals to ‘do their own thing’ in terms of both research style and having their own private space.  Joshua Shaevitz again: “I didn’t want to impose too much structure on each lab member, as I think notebook style is very personal thing. But, I also wanted to ensure that the results would be compatible with features such as search and would work well with our archiving strategies.”
  3. Something that facilitates integrated handling of experimental data (i.e. the lab notebook function) in the same environment as other information the lab deals with, e.g. protocols, meeting notes, etc. Alex Swarbrick at the Garvan Institute: we use our electronic lab notebook “to compile the diverse collections of data that we generate as biologists, such as images and spreadsheets, and to take minutes of meetings.”
  4. Related to the previous point, something that provides the capacity to manage physical inventory as well as data in electronic form, and the ability to link the two together.  This point is brought out by Cameron Neylon in a thread within a great recent discussion started by Jonathan Eisen at U.C. Davis, Possible electronic lab notebook systems – update.  In discussing what kinds of data a system needs to be able to handle, Cameron says, “generating, storing, analysing and publishing research objects, explicitly including samples and other physical objects.”  And Alex Swarbrick again: “the ability to link records, reagents and experiments. For example, to connect an experimental mouse with the tube containing its tissues in the freezer, to the 6 different experiments (conducted over a year) that analysed those tissues in different ways. Managing this kind of ‘metadata’ is absolutely essential to our work.”
  5. Something that can “help to deal with information and data overload (sorting and filtering)”, in the words of a scientist interviewed in a recent study of the research practices of seven life sciences research labs, Patterns of information use and exchange:  case studies of researchers in the life sciences.
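
Alex Swarbrick’s mouse, tube and experiments example in point 4 is essentially a description of linked records, so here is a rough sketch of the idea in Python.  Everything in it is an assumption made for illustration: the `LinkStore` class, its methods, and the identifiers such as `mouse:M-017` are invented, and a real ELN would store links in a database rather than in memory.

```python
from collections import defaultdict


class LinkStore:
    """Hypothetical in-memory store of two-way links between ELN items."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, a, b):
        """Record a link between two items, navigable in both directions."""
        self._links[a].add(b)
        self._links[b].add(a)

    def connected(self, start):
        """Return, sorted, every item reachable from start via links."""
        seen = {start}
        stack = [start]
        while stack:
            item = stack.pop()
            for neighbour in self._links[item]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        seen.discard(start)
        return sorted(seen)


# Made-up identifiers: one mouse, one freezer tube, six experiments.
store = LinkStore()
store.link("mouse:M-017", "tube:freezer2/box4/A3")
for n in range(1, 7):
    store.link("tube:freezer2/box4/A3", f"experiment:E-{n:02d}")

# Starting from the mouse, find the tube and all experiments on its tissues.
for item in store.connected("mouse:M-017"):
    print(item)
```

Because the links are two-way, the same lookup works in the other direction: starting from any one experiment leads back to the tube and the mouse it came from.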

How does this list sound?  Is it an accurate reflection of what others want in an ELN? Is it comprehensive?  Are key requirements missing?  Comments welcome!