What kind of data sharing do scientists want?

Posted by Rory on October 20th, 2010 @ 6:21 pm

In a recent article, Data Sharing: Empty Archives, Bryn Nelson points out that although

“Some [scientific] communities have been quite open to sharing [data] . . . those discipline-specific successes are the exception rather than the rule in science. All too many observations lie isolated and forgotten on personal hard drives and CDs, trapped by technical, legal and cultural barriers.”

The communities where most data is not widely shared include disciplines like biology, chemistry, medicine and materials, where the bulk of experimental data is generated by individual researchers and labs. In this post I’d like to take a closer look at data sharing practices and attitudes in these ‘bottom up’ research disciplines. Let me disclose my bias at the outset: I believe that data sharing practices in these disciplines should be determined by the members of the community, not by funding bodies or other policy makers. If data sharing is to become more widespread, it should be because the members of the community want that to happen, and it should happen in ways that they determine.

The starting point has to be existing attitudes to data sharing in the communities. A recent study of seven labs working in various areas of biology, Patterns of information use and exchange: case studies of researchers in the life sciences, came up with some interesting observations about those attitudes. The study found that life sciences researchers

“are in principle in favour of sharing many kinds of information, and are remarkably willing to do so in order to facilitate each other’s research, without any apparent formal reward.  Thus information is extensively shared within research groups and laboratories, and informally across organisational boundaries through wider research networks, both before and after formal publication.  This may include sharing . . . standard operating procedures, plasmids, computer programmes, scripts and statistical analysis tools.”

So researchers are willing to share information, but on their own terms. Essentially they are willing to share information with people they are collaborating with, inside and outside the lab. In other words, researchers are willing to share information with whomever they, the researchers, choose.

The study went on to report that

“Researchers are much more ready to share methods and tools than experimental data . . . They are reluctant to share the data that makes up their ‘intellectual capital’. In particular, they are wary of giving away their data for someone else to analyse and get the credit.”

Researchers are willing to share experimental data, but only subject to two provisos:

  • “First they are concerned that they need sufficient time to complete the analysis and, in some cases, to explore intellectual property rights . . .
  • Second they want to publish their results before or simultaneously to publishing their data — and they want to be the ones to publish the data.”

So researchers are also willing to share experimental data, but again only on their own terms.  And those terms are that researchers want to decide what data gets shared and when it gets shared.

Seven labs is hardly a representative sample of the hundreds of thousands, if not millions, of labs in ‘bottom up’ research disciplines like biology, chemistry, medicine and materials. And yet the attitudes noted by the study have a strong feeling of familiarity about them, and it’s not implausible to assume that they are broadly representative of attitudes that are widespread throughout these disciplines. The message from the scientists interviewed in the study is loud and clear: they are willing to share their data only when they can decide which data to share, whom to share it with, and when.

In the next post I’m going to discuss why many existing ‘top down’ data sharing initiatives have failed to take off because of lack of interest and support from the communities, and speculate about what kind of application, tool, or environment would be needed to enable scientists in bottom up research disciplines like biology, chemistry, medicine and materials to share their data in the controlled fashion they seem to prefer.


Who are electronic lab notebooks for?

Posted by Rory on October 13th, 2010 @ 10:48 am

In last week’s post I looked at what electronic lab notebooks are for, and said that they enable groups of researchers to conveniently carry out four central aspects of the research process:

  • Record experimental data and other kinds of information
  • Add structure to the data and information
  • Share the data and information
  • Communicate about their research

The answer to the question, ‘who are electronic lab notebooks for?’ is implied within that statement, namely ‘groups of researchers’.  This week I’d like to zero in on the makeup of a typical group of researchers in an academic lab, and ask the question, among that group, who benefits from use of an electronic notebook and why?  Just the PI?  Postdocs? Graduate students?  Research assistants and technicians?

PIs

Let’s start with the PI. In most cases the PI drives the decision to adopt an electronic lab notebook. PIs benefit from their lab using an electronic lab notebook in lots of ways; some of the things they like include:

  1. Having everyone in the lab working in an integrated environment.
  2. Having everyone using common structures to document their research, e.g. for experiments and protocols.
  3. For online electronic lab notebooks, the ability to access their own work, and that of other members of the lab, 24/7, from any web browser, when they are at home in the evening or on the other side of the world at a conference.
  4. The ability for lab members to conveniently work in groups.
  5. The ability to link experimental data with other kinds of information, e.g. protocols, and with physical things like reagents.
  6. Having an integrated searchable archive of the lab’s work that allows them and other lab members to find and make use of work done by existing lab members and those who have already left the lab.

Postdocs

Postdocs don’t have the same global view of the lab’s needs as PIs, nor do they have the PI’s long-term interest in the efficiency of the lab or in creating a usable, searchable archive of its research. But they almost certainly will be working closely with others in the lab (the PI, one or more graduate students, and perhaps a research associate or technician) on their own projects and other projects involving lab members. For that they will benefit from many of the same group advantages of using an electronic lab notebook that are valued by PIs:

  1. Having everyone using common structures to document their research, e.g. for experiments and protocols.
  2. For online electronic lab notebooks, the ability to access their own work, and that of other members of the lab, 24/7, from any web browser, when they are at home in the evening or on the other side of the world at a conference.
  3. The ability for lab members to conveniently work in groups.
  4. The ability to link experimental data with other kinds of information, e.g. protocols, and with physical things like reagents.

In addition, postdocs are likely to particularly appreciate the benefits an electronic lab notebook brings in terms of documenting their own research, including:

  1. The automatic introduction of structure into their research record.
  2. Ease of use.
  3. Having data well organized in a form that can conveniently be incorporated into presentations and papers.

PhD students

PhD students, and other graduate students, are likely to value these last three benefits just as highly as postdocs. And they, too, benefit from the group aspects of the electronic lab notebook. For example, with an electronic lab notebook they can share information and ideas online with others in the lab: the PI, and other students whose brains they want to pick or whose experiments they are keeping up with. An electronic lab notebook also means easier access to the PI. It’s no longer necessary to corral the PI during office hours to look at your paper lab notebook; with an electronic lab notebook you can send the PI a message with a link to your latest experiment and ask the PI to comment when he or she has time.

Unlike the PI and postdocs, at the end of the day graduate students are focussed on getting a degree, and that means their own work is of paramount importance. In this respect they are likely to particularly appreciate the fact that electronic lab notebooks provide the flexibility for some records to be shared and others to be kept entirely private. So graduate students (and of course others in the lab) benefit from the group and sharing capabilities of the electronic lab notebook while at the same time having their own private space. They can keep some records private forever, and other records private until they are ready to be shared with, or reviewed by, others in the lab.

Research associates and technicians

What makes the life of research associates and technicians different from that of PIs, postdocs and graduate students is that their focus is not their own research but the work of others. They are supporting others, and helping to organize the lab and ensure that the lab and its equipment, computers, and systems are running as well as possible. In that regard they will benefit from the enhanced structure, organization, and communication the introduction of an electronic lab notebook brings to the lab:

  1. With the structured records you can set up in an electronic lab notebook, it is much easier to get buy-in from everyone to use common forms and formats for things that everyone in the lab needs to use, like protocols.
  2. With an electronic lab notebook lab members can document their experiments in the same online environment that is used to store and share general information like meeting notes and protocols — that makes for much better organization.
  3. With the electronic lab notebook’s messaging system technicians and research associates can communicate with lab members 24/7, and do this in a targeted way — for example if they have a question about a protocol someone has submitted, they can send a message with the question and insert a link to the protocol, making it easier for the recipient to respond quickly.

The bottom line?  Everyone in the lab benefits from the introduction of an electronic lab notebook!

What are electronic lab notebooks for?

Posted by Rory on October 6th, 2010 @ 4:23 pm

1.  Introduction

Electronic lab notebooks enable groups of researchers to conveniently carry out four central aspects of the research process:

  • Record experimental data and other kinds of information
  • Add structure to the data and information
  • Share the data and information
  • Communicate about their research

Electronic lab notebooks differ from other tools used in recording experimental data, like paper lab notebooks and electronic media such as word documents, spreadsheets, and wikis, in that they enable researchers to carry out all four of these functions in an integrated, ideally online, environment.

2.  Recording experimental data and other information

Electronic lab notebooks enable recording of experimental data, and other information like meeting notes and protocols, in two ways. First, they allow import of data which has already been captured elsewhere — e.g. in word documents, spreadsheets and images.  Second, they permit direct recording of data in various forms — text, tables, images, etc.

3.  Adding structure to data and information

Like paper lab notebooks, but unlike other electronic media such as word documents, spreadsheets, and wikis, electronic lab notebooks enable research groups to bring structure to their data.  They do this in a variety of ways:

  • By providing the ability to use records which, unlike the blank page of the word document or wiki, themselves have structure. This is illustrated in the example below, where the record has a series of fields: Alternative name, Source, Lab, etc.

  • With preformatted template records likely to be of use to many researchers, e.g. for experiments, antibodies, protocols, and inventory
  • By providing the ability to create records with a structure desired by the user, and including a range of field types, such as strings, radio buttons, dates, etc.
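To make the idea of a structured record concrete, here is a minimal Python sketch of a user-defined template built from typed fields (free-text strings, radio buttons and dates). The classes `FieldDef` and `RecordType`, the snake_case field names and the option values are hypothetical, loosely modelled on the antibody example; this is an illustration under those assumptions, not the data model of any particular ELN.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical field types: free text, radio buttons (exactly one of a fixed
# set of options) and dates.
@dataclass
class FieldDef:
    name: str
    kind: str            # "string", "radio" or "date"
    options: tuple = ()  # allowed values, used only by "radio" fields

    def is_valid(self, value):
        if self.kind == "string":
            return isinstance(value, str)
        if self.kind == "radio":
            return value in self.options
        if self.kind == "date":
            return isinstance(value, date)
        raise ValueError(f"unknown field kind: {self.kind}")


@dataclass
class RecordType:
    """A template: a named combination of typed fields."""
    name: str
    fields: list

    def new_record(self, **values):
        # Reject fields the template does not define.
        known = {f.name for f in self.fields}
        unknown = set(values) - known
        if unknown:
            raise KeyError(f"fields not in template {self.name!r}: {sorted(unknown)}")
        # Check each supplied value against its field type.
        for f in self.fields:
            if f.name in values and not f.is_valid(values[f.name]):
                raise ValueError(f"invalid value for {f.name!r}: {values[f.name]!r}")
        return {"type": self.name, **values}


# A user-defined "Antibody" template (field names are illustrative only).
antibody_type = RecordType("Antibody", [
    FieldDef("alternative_name", "string"),
    FieldDef("source", "string"),
    FieldDef("lab", "string"),
    FieldDef("validation_status", "radio", ("Validated", "No signal", "Not tested")),
    FieldDef("received", "date"),
])

record = antibody_type.new_record(
    source="Rabbit polyclonal",
    validation_status="No signal",
    received=date(2010, 10, 1),
)
print(record["type"], record["validation_status"])
```

A radio field rejects anything outside its option list, which is what keeps values such as ‘No signal’ consistent enough to search on later.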

The structure which is added to the research record is invaluable not only for immediate organization, but also for later search and archiving. The field structure makes it possible to conduct fine-grained searches which go below the record level. In the above example, the lab might have thousands of antibody records; taking advantage of the field structure, it would be possible to search all the ‘validation status’ fields for those containing the term ‘No signal’.
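The difference between searching a field and searching a whole record can be shown in a few lines. In this sketch the antibody records, their IDs and the `notes` field are invented for illustration. Restricting the match to `validation_status` excludes a record that merely mentions ‘No signal’ in its free-text notes, which a plain full-text search would return.

```python
# Hypothetical antibody records: each record is a mapping of field name -> value.
antibodies = [
    {"id": "AB-001", "name": "anti-H3K4me3", "validation_status": "Validated",
     "notes": ""},
    {"id": "AB-002", "name": "anti-CTCF", "validation_status": "No signal",
     "notes": "Tried two dilutions"},
    {"id": "AB-003", "name": "anti-GFP", "validation_status": "Validated",
     "notes": "No signal in first blot; worked after antigen retrieval"},
]


def search_field(records, field_name, term):
    """IDs of records whose `field_name` field contains `term` (case-insensitive)."""
    term = term.lower()
    return [r["id"] for r in records
            if term in str(r.get(field_name, "")).lower()]


def search_whole_record(records, term):
    """IDs of records containing `term` in any field, as a full-text search would."""
    term = term.lower()
    return [r["id"] for r in records
            if any(term in str(value).lower() for value in r.values())]


print(search_field(antibodies, "validation_status", "No signal"))  # ['AB-002']
print(search_whole_record(antibodies, "No signal"))                 # ['AB-002', 'AB-003']
```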

Electronic lab notebooks also make it possible to build in a second level of structure through the ability to create links between records, for example between a record of an experiment and a record of an antibody used in the experiment.  Links are useful at this one-to-one level.  Moreover, by creating a series of links it is also possible to build databases, as reflected in the visualization below of a series of linked records.

[Figure: visualization of a series of linked records (class-diagram.png)]
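Links between records can be modelled as a directed graph over record IDs. The sketch below uses a hypothetical `LinkStore` class and invented record IDs, and stores each link in both directions. That way one can ask not only which antibody an experiment used, but also which experiments used a given antibody, which is what lets a set of individual links behave like a small database.

```python
from collections import defaultdict


class LinkStore:
    """Directed links between record IDs (a hypothetical, minimal model)."""

    def __init__(self):
        self._outgoing = defaultdict(set)  # record -> records it links to
        self._incoming = defaultdict(set)  # record -> records linking to it

    def link(self, source, target):
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def links_from(self, record_id):
        return sorted(self._outgoing.get(record_id, ()))

    def links_to(self, record_id):
        return sorted(self._incoming.get(record_id, ()))


links = LinkStore()
links.link("EXP-12", "AB-002")        # experiment 12 used antibody AB-002
links.link("EXP-12", "PROT-ChIP-v3")  # ...following this protocol
links.link("EXP-15", "AB-002")        # a later experiment reused the antibody

print(links.links_from("EXP-12"))  # ['AB-002', 'PROT-ChIP-v3']
print(links.links_to("AB-002"))    # ['EXP-12', 'EXP-15']
```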

4.  Sharing data and information

Electronic lab notebooks are designed to facilitate collaboration among a group of researchers. They do this with a permissions system that permits some records to be accessed by the entire group, some records to be accessed by subsets of the group, and some records to be kept entirely private.  In addition, they provide different kinds of access to different records or sets of records.  For example, the PI and the student conducting an experiment might have view and edit permission on the experiment record, so that the student could document the experiment and the PI could comment on it, and other members of the lab might have view only permission, so that they could observe and learn.
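The scenario above can be sketched as a per-record table of who may view and who may edit. This is a minimal illustration with hypothetical names; it also assumes that edit access implies view access and that a record with no grants is private to its owner.

```python
from enum import IntEnum


class Access(IntEnum):
    NONE = 0
    VIEW = 1
    EDIT = 2  # edit implies view


class Record:
    """A record with per-user access grants; ungranted users get no access."""

    def __init__(self, record_id, owner):
        self.record_id = record_id
        self.owner = owner
        self.grants = {}  # user -> Access

    def grant(self, user, access):
        self.grants[user] = access

    def access_for(self, user):
        if user == self.owner:
            return Access.EDIT
        return self.grants.get(user, Access.NONE)

    def can_view(self, user):
        return self.access_for(user) >= Access.VIEW

    def can_edit(self, user):
        return self.access_for(user) >= Access.EDIT


# The student documents the experiment, the PI comments on it,
# and another lab member may only look.
experiment = Record("EXP-12", owner="student")
experiment.grant("pi", Access.EDIT)
experiment.grant("postdoc", Access.VIEW)

print(experiment.can_edit("pi"), experiment.can_edit("postdoc"))        # True False
print(experiment.can_view("postdoc"), experiment.can_view("visitor"))   # True False
```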

Electronic lab notebooks also permit permissions to be inherited by ‘child records’.  So, once the permissions are set on a particular project folder, all the experiments created within that folder have the same set of permissions, and it is not necessary to reset permissions each time a new experiment is set up.
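Inheritance can be sketched as a walk up the folder hierarchy to the nearest node that has permissions set explicitly. Two design choices here are assumptions, not a description of any specific ELN: permissions are looked up live (so a later change to the project folder reaches existing experiments), and explicit permissions on a child replace, rather than merge with, those of its parent. The `Node` class and user names are hypothetical.

```python
class Node:
    """A folder or record. `grants` of None means 'inherit from the parent'."""

    def __init__(self, name, parent=None, grants=None):
        self.name = name
        self.parent = parent
        self.grants = grants  # dict user -> "view" | "edit", or None

    def effective_grants(self):
        # Walk up the hierarchy to the nearest node with explicit permissions.
        node = self
        while node is not None:
            if node.grants is not None:
                return dict(node.grants)
            node = node.parent
        return {}  # nothing set anywhere: private


project = Node("ChIP project",
               grants={"pi": "edit", "student": "edit", "postdoc": "view"})
exp1 = Node("Experiment 1", parent=project)  # no permissions set by hand
exp2 = Node("Experiment 2", parent=project)

project.grants["technician"] = "view"         # a later change to the folder...
print(exp2.effective_grants()["technician"])  # ...is picked up: prints view

draft = Node("Draft notes", parent=project, grants={"student": "edit"})
print(draft.effective_grants())               # {'student': 'edit'}
```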

Electronic lab notebooks also allow the creation of groups of users.  Typically there is an ‘all users’ group, and groups of smaller sets of users working together on particular projects.  Again, this makes setting permissions more streamlined.  For example, on records which everyone is to have access to, permissions are set for the all users group, and since everyone is a member of that group, it is not necessary to set permissions for each individual.
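Group-based permissions amount to expanding group names into their members when access is checked. In this sketch the group names and users are hypothetical, and it assumes that user names never collide with group names. Because a grant to ‘all users’ is expanded at check time, someone who joins the lab later gains access without any record being touched.

```python
groups = {
    "all users": {"pi", "postdoc", "student_a", "student_b", "technician"},
    "chip project": {"pi", "student_a"},
}


def users_with_access(principals, groups):
    """Expand a record's grantees (users or group names) into a set of users."""
    users = set()
    for principal in principals:
        # A group name expands to its members; anything else is a single user.
        users |= groups.get(principal, {principal})
    return users


protocol_grantees = ["all users"]               # one grant covers the whole lab
project_grantees = ["chip project", "postdoc"]  # a group plus one extra person

print(sorted(users_with_access(project_grantees, groups)))
# ['pi', 'postdoc', 'student_a']

groups["all users"].add("new_student")          # a newcomer joins the lab...
print("new_student" in users_with_access(protocol_grantees, groups))  # True
```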

5.  Communicating

It’s pretty hard to collaborate if you can’t communicate, so good electronic lab notebooks include a messaging system.  This acts as an internal email capability, but it should also do more.  Ideally there should be the ability to make links in messages to other records in the ELN, so for example when a student sends a message to their PI to say that a particular experiment is ready for review and comment, the student can put a link in the message to the experiment record, so that all the PI has to do to access the record is to click on the link.
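The essential point is that such a message carries a reference to a record rather than a copy of it. The sketch below is hypothetical throughout: the `Message` class, the helper name and the base URL `https://eln.example.org/records/` are placeholders. It also assumes a design choice, namely refusing to send a link the recipient could not open; a real system might instead prompt the sender to share the record.

```python
from dataclasses import dataclass
from typing import Optional

ELN_RECORD_URL = "https://eln.example.org/records/"  # hypothetical base URL


@dataclass
class Message:
    sender: str
    recipient: str
    text: str
    record_id: Optional[str] = None

    def render(self):
        lines = [f"From: {self.sender}", f"To: {self.recipient}", "", self.text]
        if self.record_id is not None:
            lines.append(f"Open record: {ELN_RECORD_URL}{self.record_id}")
        return "\n".join(lines)


def message_with_link(sender, recipient, text, record_id, viewers):
    """Build a message linking to a record, refusing if the recipient can't open it.

    `viewers` is the set of users with at least view access to the record.
    """
    if recipient not in viewers:
        raise PermissionError(f"{recipient} has no access to {record_id}")
    return Message(sender, recipient, text, record_id)


msg = message_with_link("student", "pi", "EXP-12 is ready for your comments.",
                        "EXP-12", viewers={"student", "pi"})
print(msg.render())
```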

Online electronic lab notebooks are accessible 24/7 through any web browser, so they allow a new level of flexibility in communication between lab members. No need to make an appointment during office hours to look at someone’s paper lab notebook. You can now view it, and comment on it, at a time that is convenient to you, for example at home in the evening. And when you are on the road you can stay in touch with the work that’s going on back in the lab, because you can log in over the internet in the evening, between meetings, or whenever it suits you, and see what people have been doing.

Privacy versus sharing: electronic lab notebooks, Facebook and wikis compared

Posted by Rory on September 29th, 2010 @ 9:08 am

Common misconceptions about sharing and privacy in ELNs

A couple of weeks ago I fielded the following question (assertion, really) at a conference on data sharing and storage in biomedical research:

“An electronic lab notebook is not useful because everyone can see everyone else’s work — there’s no privacy.”

To which I responded that good ELNs have a permissions system that allows records to be kept private.

The person who asked the question, still on the attack, then said something to the effect that that’s no good because people can’t share their data.

To which I responded that the permissions system in a good ELN allows fine-grained control, so that any record can be completely private, completely public to the entire universe of users, or accessible only to a particular group of users. In other words, it supports both privacy and sharing.

I was a bit taken aback by the aggressiveness of the questioner, and felt quite pleased with myself in that I had, I thought, successfully countered both lines of his attack on ELNs. But reflecting on the exchange afterwards, I began to have second thoughts. The questioner said that he was in a research support role with a group of academic biomedical researchers. So presumably his comments reflected concerns or preconceptions the researchers he works with have about ELNs. And judging by the tack he adopted, the prevailing view about ELNs is not positive: they don’t allow privacy, or they don’t allow sharing, and in any event they are inflexible.

ELNs:  neither Facebook nor wiki

I don’t know how representative these views are. Since ELNs have yet to be widely adopted by academic scientists, it’s probably the case that few people have first-hand experience with them, so whatever the prevailing view is, it will be based on vague impressions rather than on good information. Many labs have adopted wikis for sharing general information like meeting notes and protocols, and most of these wikis will be inflexible and offer no scope for keeping records private. So it’s quite possible that people just assume that electronic lab notebooks are beset by the same restrictions. It’s also possible that people assume ELNs are only capable of replicating the crude and inflexible privacy/sharing regime you get with your Facebook account. In other words, many people probably project onto ELNs the concerns they have with information sharing applications they are familiar with, without any understanding of how sharing actually works in ELNs.

Fine-grained and flexible sharing in ELNs and the benefits it brings

In fact there are some key differences between the sharing/privacy system of Facebook, wikis, and ELNs designed for documenting and sharing experimental data. Here are three of them.

1. Sharing and privacy in ELNs is simpler than on Facebook, and more flexible than in wikis.

When you think about it, sharing on Facebook is very complex! You’ve got three categories of things you can share (things you share, things on your Wall, and things you’re tagged in), and then within each of these a whole variety of subcategories. And then you’ve got a variety of categories of people you can share with: everyone, friends, and friends of friends. Most people ignore most of the sharing functionality; the system is just too unwieldy. It’s also very inflexible: the categories of what you can share and what kinds of groups you can share with are decided by Facebook, not you!

Sharing on wikis is at the other end of the spectrum: exceedingly simple, but even more limiting. The way most wikis are configured, you are part of one or more groups, and the pages in that group or groups can be viewed by everyone in the group. In other words, there is no privacy! And of course no flexibility, since the decision about what group(s) you are in is made by the administrator, not you.

In contrast to both Facebook and wikis, sharing and privacy in the best ELNs are (a) simple, and (b) flexible. They are simple because they don’t require distinctions between different kinds of things that can be shared or between different categories of people that are involved in the sharing. For any record in the system, sharing is set in the same way. They are flexible because a record can be shared with one other person, with everyone, or with any subset of people using the system at the discretion of the person setting the permissions, and a different sharing regime can be set for each record if so desired.

2.  ELNs give equal weight to individuals and groups

Facebook, like most social media, is designed around individuals: sharing is about individuals creating groups centering on themselves. Wikis are just the opposite: they are designed around groups, and individuals are slotted into an environment which is focussed on achieving group objectives. Neither of these extreme orientations is appropriate to labs. When you think about what makes a scientific research lab tick, it’s the fact that it is designed to facilitate both group and individual objectives. So what a lab really needs is a collaboration and communication tool that has been designed with both individuals and the group in mind. Enter the ELN! As noted, ELNs allow for some records to be completely private. So a PhD student, for example, can have their own private space where their experiments are accessible to no one but themselves. But ELNs also allow for the flexible sharing described above, so records which everyone needs to see, e.g. lab protocols and meeting notes, can be made accessible to everyone, and the records in certain projects can be restricted to a specified set of users, e.g. just to the group of students working on the project and the PI.

3.  ELNs enable sharing of a particular kind of information, experimental data, in the same environment as other general information.

ELNs bring another kind of benefit to labs engaged in creating and sharing scientific data, one that is not supported by the sharing regime in either wikis or Facebook: they are specifically designed to handle sharing of experimental data, the bread and butter of labs engaged in scientific research. They do this by making it easy to put structure into the research record. And with structure comes better organization, more targeted search, and better archiving. So current and future members of the lab can more easily find and use data which they, and other members of the lab, have entered into the ELN.

So that’s a brief overview of how ELNs facilitate both sharing and privacy, and enable labs and lab members to record and share experimental data. They are superior to wikis in these respects, and they don’t suffer from the sharing and privacy concerns people have as a result of their experience with Facebook. That’s not too surprising, since ELNs have been specifically designed with labs in mind!

How to share and store data in an electronic lab notebook

Posted by Rory on September 23rd, 2010 @ 5:16 pm

In this blog I usually look at data sharing from the point of view of the core research unit, the lab. That was the perspective I adopted a couple of weeks ago in a presentation, Electronic lab notebooks in biomedical research, at the Storing, Accessing and Sharing Data: Addressing the Challenges and Solutions event co-hosted by the Scottish Bioinformatics Forum and S3 in Edinburgh. I’ll come back to that perspective in a minute, but first I’d like to contrast two very different institutional perspectives on data management described at the conference.

Sanger Institute:  centralized institutional data management

Phil Butcher, head of IT at the Sanger Institute, started with a high level overview of data management issues at Sanger.  He focussed mainly on the rapid growth in the amount of data generated at Sanger, and the other institutes with which it has large scale collaborations, and the issues relating to storing and finding data when there is so much of it.  The impression I came away with is that at Sanger data is viewed as an institutional matter, not something that individual labs or scientists manage or, apparently, have much of a say in.  That makes sense, because the research projects Phil mentioned were all large scale, involving large numbers of scientists, and the generation of huge amounts of data.  The title of Phil’s talk, Scaling up Science and IT: Sanger Institute’s Perspective, reflects the centralized approach.

London Research Institute:  decentralized institutional data management

The next speaker, Jeremy Olsen, head of IT at the London Research Institute, started by saying that based on Phil’s description of Sanger, the London Research Institute was very different indeed, more a collection of individual research groups. In describing his LRI perspective Jeremy said that he would be sticking up for the “little guy”. He proceeded to give a brief overview of how research is carried out at the LRI, introducing the various research groups and their research interests. The LRI represents a very different paradigm from Sanger; at the LRI decentralization rules, as reflected by the title of Jeremy’s talk, Data Growth and Management in a Diverse Life Sciences Environment. At the LRI there are fundamental issues relating to getting a handle on what research the various groups are involved in, what data they generate and how they manage it. Progress would need to be made on understanding these issues before it would be possible even to consider a centralized approach to data management and what that might entail.

The lab: bottom up data management

When it came time for my presentation, I started by saying that if Phil was representing the centralized institutional approach, and Jeremy was looking at the “little guys” from an institutional perspective, I was going to look at the issue of data management and sharing from the point of view of the little guy him/herself, i.e. the PI. In the academic context, it’s important to note that the Sanger model is the exception and the LRI decentralized model is the rule. In fact it is almost certainly the case that the LRI, decentralized as it is, is still towards the more organized and centralized end of the spectrum of academic biomedical institutions. That point was reinforced for me when speaking recently with the IT director of a medium-to-large biomedical research institute in Australia (800 people, including 700 scientific staff). His description of the issues he faced in getting a grip on what data there was in the labs at the institute, how they managed it (if they managed it at all), and his uncertainty about how to help PIs get a better handle on their data was uncannily reminiscent of Jeremy’s description of the situation at the LRI.

From the perspective of IT managers tasked with, among other things, trying to bring some order to the data generated by the research groups at their institution, to store it in a cost effective fashion and have it archived in a way that is useful in the future, multiple PIs generating ever increasing amounts of data may be a ‘problem’ to be managed or dealt with.  But from the PIs’ point of view it is their data and theirs to manage (or not) as they want.  There is a pretty fundamental difference in outlook here.

Electronic lab notebooks — part of the solution?

In my presentation I asked where electronic lab notebooks might fit into this picture, and whether they could have a role to play in crafting better data management solutions that meet the objectives of both PIs and IT directors.

ELNs tick some of the key boxes IT directors look for in best practice in data storage and sharing, including:

  1. Storing metadata in a structured fashion and ensuring controlled access.
  2. Effectively managing different data types, including attachments and imports.
  3. Allowing improved indexing and search, through the use of structured metadata.

Electronic lab notebooks can also solve the key data management problem facing many PIs: coordinating a wide diversity of data types and data sets generated by a large number of people within the lab. They can, that is, if they meet the following key requirements of today’s PIs:

  1. The ELN is flexible and can be set up the way the PI and their lab want it set up.
  2. It’s easy for the lab to transfer to the ELN.
  3. The ELN facilitates better exchange of information between members of the lab and, over time, better archiving.
  4. The ELN is web based and hence accessible anywhere, anytime.

So, electronic lab notebooks can help to solve the key data management issue faced by the core unit in academic institutions: labs. And they provide a platform for data management that IT directors looking at the problem from an institutional perspective can work with. As such they can be part of a solution which benefits both PIs, who are concerned with the research done in their group, and IT directors, who are concerned with the data generated throughout their institution.

What is an electronic lab notebook III: the benefits of structure

Posted by Rory on July 21st, 2010 @ 8:43 pm

The last post and the one before that looked at different views on who electronic lab notebooks are for (individuals or the lab) and how wikis measure up as environments that enable lab members to enter and share experimental data. Notwithstanding their attraction as convenient online tools for sharing general information, wikis lack structure, and it is primarily this which has kept even labs that use wikis wedded to the paper lab notebook for documenting experiments.

In this post we’ll look at why the ability to add structure to research data is the key enabler permitting the transition from paper lab notebooks to electronic lab notebooks.

Paper lab notebooks support as much structure as you like. You can create sections, paste copies of images, make notes in the margin, draw diagrams; the only limit to adding structure to a paper lab notebook is the scientist’s imagination. Unlike a wiki, an electronic lab notebook allows you to replicate the structure that you put into a paper lab notebook. Why? Because an electronic lab notebook allows the creation of records with different kinds of fields. This supports structuring your research data in two ways. First, the different types of fields support entry of information in differing ways, e.g. by date or time, by entry of text, by number, with radio buttons signalling a series of mutually exclusive options, etc. Second, different classes of records can be put together with different combinations of various kinds of fields, creating types of records that are appropriate to different aspects of research, e.g. a ChIP experiment, a freezer, a particular protocol, or an antibody. This is in stark contrast to the wiki, which has only one type of record, the wiki page, and an undifferentiated one at that, with no support for separation into different fields.

The benefits of this structure extend further to the other things you use in your research, like images and spreadsheets. Like wikis, electronic lab notebooks have the advantage over paper lab notebooks of being able to make links to images and spreadsheets, which can also be inserted into the electronic repository, wiki or electronic lab notebook. But electronic lab notebooks offer superior structuring capabilities in this respect too, because with an electronic lab notebook, unlike a wiki, you can associate a spreadsheet, image or other electronic item with a particular field of a particular kind within a record.
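On a wiki an uploaded file belongs to the page as a whole; the distinction drawn here is that an ELN record can hold a file inside a specific field. A minimal sketch, with a hypothetical record layout, field names and file names, shows what that buys: you can ask which part of the experiment a given image or spreadsheet belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    filename: str
    kind: str  # e.g. "image" or "spreadsheet"


@dataclass
class StructuredRecord:
    """A record whose attachments belong to individual fields (hypothetical)."""
    record_type: str
    # field name -> list of attachments held in that field
    attachments: dict = field(default_factory=dict)

    def attach(self, field_name, item):
        self.attachments.setdefault(field_name, []).append(item)

    def field_of(self, filename):
        """Return the field an attachment lives in, or None if not attached."""
        for name, items in self.attachments.items():
            if any(a.filename == filename for a in items):
                return name
        return None


chip_exp = StructuredRecord("ChIP experiment")
chip_exp.attach("Results: gel image", Attachment("gel_2010-07-20.tif", "image"))
chip_exp.attach("Results: qPCR data", Attachment("qpcr_run3.xlsx", "spreadsheet"))

print(chip_exp.field_of("qpcr_run3.xlsx"))  # Results: qPCR data
```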

Making use of an electronic lab notebook’s ability to create records with different kinds of fields allows you to put structure into the record of your research in an online electronic environment much as you did with a paper lab notebook, and at the same time to gain the benefits of associations between bits of information which can only be made in an online environment. Electronic lab notebooks therefore take the structuring of research data to a new and higher level. It is this element, the ability to add structure to research data, which explains why electronic lab notebooks, and not wikis, provide the best platform for labs wishing to move from paper to electronic recording and management of their research data. This is the unstated driver that lies behind Wikipedia’s definition of an electronic lab notebook as “a software program designed to replace paper lab notebooks”. And so, I would revise that definition and say that an electronic lab notebook is an online environment that provides sufficient capability for structuring research data to enable scientists to document and share their research data in that environment without the need to also resort to a paper lab notebook.