What kind of application would enable controlled data sharing among scientists: Google Docs? Facebook? Something else?

Posted by Rory on October 27th, 2010 @ 9:49 am

In the last post I looked at a study which concludes that researchers in the life sciences are willing to share their data, but only on their own terms: they want to decide which data to share, with whom, and when. My starting point was a recent article by Bryn Nelson lamenting the fact that data sharing is the exception rather than the rule among scientists. In the article he gives a number of examples, some successful and some not, of top down data storage/sharing repositories that have been established for particular disciplines and particular kinds of data.

Are these kinds of repositories a model for what might encourage widespread data sharing by scientists in bottom up research fields like the life sciences? I think not, because these top down, centralized repositories generally remove control from the individual or group contributing the data. They are really focused on data storage rather than data sharing. Moreover, in most cases, once the data has been contributed it is open for all to see; that is the point of the exercise. So it's not surprising that when databases like these are offered to researchers in the bottom up fields, they usually opt not to contribute their data. There are few incentives to make contributing their data an attractive proposition, and it's yet another administrative burden.

An environment (or system, application, call it what you will) that would stand a better chance of attracting large numbers of scientists is one that does what they want, i.e. lets them share whichever part of their research data they want shared, with whom, and when. The only kind of environment that will make that possible is one that puts individual scientists, and groups of scientists, at the center and in control. That is a bottom up application, not a top down, centralized application like the ones noted by Bryn Nelson.

Google Docs

Is there a model for the kind of application that might work? How about Google Docs, which many scientists already use to share documents, spreadsheets and presentations? Google Docs allows users to share information in a way they control. However, it lacks a number of capabilities that scientists need in order to share their research data when and with whom they want. These include:

  1. The ability to create records with structure, i.e. the kind of structure scientists are used to putting into their paper lab notebooks to record experimental data
  2. The ability to link between records
  3. An audit trail of changes made to records
  4. A messaging capability

Facebook

What about Facebook? Like Google Docs, Facebook permits sharing of information in ways the user controls. Facebook, moreover, has the 'social' features that Google Docs lacks, particularly the ability to communicate with other users. But Facebook has its own set of shortcomings as a potential tool for sharing scientific research data. First, it is viewed as a tool for communicating with friends and family about social rather than work matters. Second, there are serious problems with the privacy (or rather the lack of it) of data people put in Facebook, which might be acceptable for personal data but is not acceptable for scientific research data. The most fundamental problem, however, is that Facebook, like Google Docs, does not support recording and sharing experimental data because it does not provide the ability to create records with structure.

Essential elements of a data sharing application for scientific research

Although neither Google Docs nor Facebook looks like a suitable candidate for a general data sharing application/environment for scientists in bottom up disciplines like biology, chemistry, medicine and materials, the above discussion does give an idea of the capabilities such an application/environment would need to have:

  1. An individual, user-centric focus
  2. The ability for individual users to control with whom they share data, and when
  3. The ability to create records with structure so that experimental data can be recorded
  4. The ability to create links between records
  5. An audit trail of changes made to records
  6. A messaging capability
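To make the list a little more concrete, here is a minimal sketch in Python of how a single record might bring elements 2 to 6 together, with the individual owner at the center (element 1). It is purely illustrative: the class and function names (`Record`, `AuditEntry`, `Message`, `share`, `can_view`, `send_message`) are hypothetical, and it assumes that "when" is expressed as a date from which a given collaborator may see a record, which is one simple way to let a researcher hold data back until after publication.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    """One change to a record: who made it, when, and what changed (element 5)."""
    user: str
    timestamp: datetime
    field_name: str
    old_value: object
    new_value: object


@dataclass
class Record:
    """A structured record owned by one researcher (hypothetical model)."""
    record_id: str
    owner: str                                        # element 1: user-centric
    fields: dict = field(default_factory=dict)        # element 3: structured data
    links: set = field(default_factory=set)           # element 4: linked record ids
    audit_trail: list = field(default_factory=list)   # element 5: change history
    # element 2: collaborator -> date from which they may see this record
    shared_with: dict = field(default_factory=dict)

    def update(self, user: str, name: str, value: object, when: datetime) -> None:
        # In this sketch only the owner may edit; every change is logged.
        if user != self.owner:
            raise PermissionError(f"{user} may not edit {self.record_id}")
        self.audit_trail.append(
            AuditEntry(user, when, name, self.fields.get(name), value))
        self.fields[name] = value

    def link_to(self, other: "Record") -> None:
        # Store only the id, so links survive independently of sharing rules.
        self.links.add(other.record_id)

    def share(self, user: str, visible_from: datetime) -> None:
        # The owner alone decides with whom, and from when, the data is visible.
        self.shared_with[user] = visible_from

    def can_view(self, user: str, now: datetime) -> bool:
        if user == self.owner:
            return True
        start = self.shared_with.get(user)
        return start is not None and now >= start


@dataclass
class Message:
    """A simple message about a record (element 6)."""
    sender: str
    recipient: str
    record_id: str
    text: str


def send_message(inbox: list, record: Record, sender: str,
                 recipient: str, text: str, now: datetime) -> Message:
    # Messages about a record go only to people currently allowed to see it.
    if not record.can_view(recipient, now):
        raise PermissionError(f"{recipient} cannot see {record.record_id} yet")
    msg = Message(sender, recipient, record.record_id, text)
    inbox.append(msg)
    return msg


if __name__ == "__main__":
    t0 = datetime(2010, 10, 27, tzinfo=timezone.utc)
    t1 = datetime(2011, 1, 1, tzinfo=timezone.utc)
    exp = Record("exp-001", owner="alice")
    exp.update("alice", "temperature_C", 37, t0)
    exp.share("bob", visible_from=t1)   # held back until an assumed publication date
    print(exp.can_view("bob", t0))      # False
    print(exp.can_view("bob", t1))      # True
```

In this sketch sharing is granted per collaborator with an optional embargo date, and only the owner can edit. A real system would need much more (groups, revoking access, storage, security), so it is best read as a picture of the requirements rather than as a design.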

If and when an application with these elements becomes available, it would stand a good chance of being taken up by large numbers of scientists in bottom up disciplines like biology, chemistry, medicine and materials. That would benefit not only the individual scientists themselves and those with whom they are directly collaborating; it would also mean that a far greater percentage of the data generated is, first, captured electronically and, second, shared.

What kind of data sharing do scientists want?

Posted by Rory on October 20th, 2010 @ 6:21 pm

In a recent article, "Data Sharing: Empty Archives", Bryn Nelson points out that although

“Some [scientific] communities have been quite open to sharing [data] . . . those discipline-specific successes are the exception rather than the rule in science. All too many observations lie isolated and forgotten on personal hard drives and CDs, trapped by technical, legal and cultural barriers.”

The communities where most data is not widely shared include disciplines like biology, chemistry, medicine and materials, where the bulk of experimental data is generated by individual researchers and labs. In this post I'd like to take a closer look at data sharing practices and attitudes in these 'bottom up' research disciplines. Let me disclose my bias at the outset: I believe that data sharing practices in these disciplines should be determined by the members of the community, not by funding bodies or other policy makers. If data sharing is to become more widespread, it should be because the members of the community want that to happen, and it should happen in ways that they determine.

The starting point has to be existing attitudes to data sharing in these communities. A recent study of seven labs doing various forms of biology, "Patterns of information use and exchange: case studies of researchers in the life sciences", came up with some interesting observations about those attitudes. The study found that life sciences researchers

“are in principle in favour of sharing many kinds of information, and are remarkably willing to do so in order to facilitate each other’s research, without any apparent formal reward.  Thus information is extensively shared within research groups and laboratories, and informally across organisational boundaries through wider research networks, both before and after formal publication.  This may include sharing . . . standard operating procedures, plasmids, computer programmes, scripts and statistical analysis tools.”

So researchers are willing to share information, but on their own terms. Essentially they are willing to share information with the people they are collaborating with, inside and outside the lab. In other words, researchers are willing to share information with whomever they, the researchers, choose.

The study went on to report that

“Researchers are much more ready to share methods and tools than experimental data . . . They are reluctant to share the data that makes up their ‘intellectual capital’. In particular, they are wary of giving away their data for someone else to analyse and get the credit.”

Researchers are willing to share experimental data, but only subject to two provisos:

  • “First they are concerned that they need sufficient time to complete the analysis and, in some cases, to explore intellectual property rights . . .
  • Second they want to publish their results before or simultaneously to publishing their data — and they want to be the ones to publish the data.”

So researchers are also willing to share experimental data, but again only on their own terms: they want to decide what data gets shared and when.

Seven labs is hardly a representative sample of the hundreds of thousands, if not millions, of labs in 'bottom up' research disciplines like biology, chemistry, medicine and materials. And yet the attitudes noted by the study have a strong feeling of familiarity about them, and it's not implausible to assume that they are broadly representative of attitudes that are widespread throughout these disciplines. The message from the scientists interviewed in the study is loud and clear: they are willing to share their data only when they can decide which data to share, whom to share it with, and when.

In the next post I’m going to discuss why many existing ‘top down’ data sharing initiatives have failed to take off because of lack of interest and support from the communities, and speculate about what kind of application, tool, or environment would be needed to enable scientists in bottom up research disciplines like biology, chemistry, medicine and materials to share their data in the controlled fashion they seem to prefer.
