A couple of days ago I was alerted on Twitter by, I think, Simon Hodson, to the publication of the Research Councils UK policy on access to research data. I had not been anticipating this with too much anxiety as so many conversations I have had, in person and via social media, assured me that the default position was going to be open access. I was even more reassured by an editorial by David Stuckler and John Lynch with the great title "In God we trust, all others must bring data" in the International Journal of Epidemiology that I wrote a blog about on 1 January this year. In making a case for their new initiative to bring together a depository of health data they point out that:
"Available data often go unused because they are not well enough documented, lack accessible how-to guides for their use, or knowledge about the resource is passed on informally within research groups or collaborations. Some data may also require analytical skills that are in short supply; or people may simply be unaware of their existence or unable to access them."
The UK Data Archive has, for decades, been working to make sure that this does not happen to publicly funded (and even some privately funded, such as the Health And Lifestyle Survey) social science data. Any project funded by the ESRC is obliged to deposit its data in the Archive within a short period (I think it is 6 months). The UKDA practices for data curation are well established. Most of the large and complex data sets can be downloaded in about 2 minutes by any bona fide academic who has registered the title of the project for which the data will be used. The safeguards for individual confidentiality reside in the anonymisation of the records, and one of the few restrictions to this open access comes when the research requires information on area of residence. Nowadays it is possible using Geographic Information Systems to link things like temperature, rainfall, the location of certain kinds of facilities and property values to individual data. But the Archive judges that, for example, adding the Postcode Area to the openly available data is too risky, so this has to be done under more restrictive conditions. In all this time (at least 30 years) there has never been one single case of any individual's privacy being threatened.
However, what is clear from the recent policy document from RCUK is that it is no longer the threat to confidentiality of data that forms the major barrier to open access. This won't come as a huge surprise to a lot of people, but the big barrier is what RCUK term 'intellectual property'. I have often been asked "why should we sweat our guts out collecting data when we just have to give it away?". And this is what a lot of people who work in epidemiology feel.
Why is there this difference in attitude between people who work in the social sciences and in epidemiology? People in both disciplinary areas collect data. It is always hard work. In economics, individual academics don't so often collect their own data, as the 'classical' economic data is collected routinely. But economists also do a lot of 'micro-economics' using data from the British Household Panel Study and the English Longitudinal Study of Ageing as well as birth cohort studies. I have never heard one of my economics colleagues, after participating in the design of these studies, claim 'intellectual property' over, for example, the data on income, wealth, pensions and so on.
I have fought several battles (mostly unsuccessful) to get measures of physical functioning into various birth cohort and panel studies. But it would never cross my mind that I own 'intellectual property' in the data. The ideas behind my desire for these measures might be regarded as 'mine'. But when you know why you want to collect a certain measure you have the most enormous flying start. As long as you get going on the research question in a timely way, no one is going to steal your 'property'. And if you don't get going in a timely way then other people must be allowed to do so. Anything else is a misuse of public money. The ethical underpinning of health research is that we promise the people who allow us to stick pins in them and make them blow into tubes that the results will be used to improve public health, not to advance our own careers.
So I strongly disagree with the position taken by RCUK that those who collect data should have sole ownership of it "until their major research questions have been answered". That is a charter for slowing down the use of new information for the public good.