Where data and journal content collide: what does it mean to 'publish your data'?
This presentation uses two studies to inform and illustrate a PI’s thinking about which data might be considered candidates for deposit into the University’s Datashare repository. The findings from these two studies have featured in conference presentations and in blogs but have not yet been ‘fixed’ in a journal article. This has also prompted thoughts on when it is sensible to deposit data, and on the prospect of depositing early under a pre-publication embargo.

Three types of data are considered. Type A comprises sources of data that are external to a project: the databases drawn upon in a study. Although these are cited in publications, they are often not the responsibility of the PI. Type B comprises the assembled datasets used in the analyses, whose findings are reported and used as evidence in published scholarly statements. Type C comprises the ‘supplementary data’ (the data behind the graph), which enhance the publication of the results reported in a scholarly statement, forming what might be regarded as a multi-part work on the Web.

Study 1 forms part of the Hiberlink project, funded by the Andrew W. Mellon Foundation, into ‘reference rot’. Reference rot goes beyond link rot (404, not found) to include ‘content drift’, where the content referenced at the end of the link has evolved, has changed dramatically, or has disappeared completely. Study 1 is an exploratory investigation of reference rot in the references (c.46,000 URIs) made from c.7,000 doctoral e-theses to web-based resources. The Hiberlink project is being carried out at the University of Edinburgh (at EDINA and the Language Technology Group in the School of Informatics) jointly with the Research Library at Los Alamos National Laboratory. Its main focus is on references from hundreds of thousands of journal articles to the ‘web at large’; progress is reported at http://hiberlink.org.
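To make the distinction between link rot and content drift concrete, here is a minimal Python sketch. It is not Hiberlink’s actual method. It assumes that for each cited URI we hold a content fingerprint (a SHA-256 hash) taken at citation time, plus the HTTP status and body obtained when the URI is fetched later. The `Snapshot` structure and the `fingerprint` and `classify_reference` functions are hypothetical names introduced for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """What is known about a cited URI when it is revisited (hypothetical structure)."""
    status: int             # HTTP status code returned on the later fetch
    body: Optional[bytes]   # response body, or None if nothing was retrieved


def fingerprint(body: bytes) -> str:
    """Return a SHA-256 hex digest used as a crude fingerprint of page content."""
    return hashlib.sha256(body).hexdigest()


def classify_reference(cited_hash: str, later: Snapshot) -> str:
    """Classify a reference as 'link rot', 'content drift' or 'intact'.

    Assumption: exact hash equality stands in for 'same content'. A real
    study would need a fuzzier similarity measure, since trivial page
    changes (adverts, timestamps) also alter the hash.
    """
    # The link no longer resolves to anything usable: link rot.
    if later.status >= 400 or later.body is None:
        return "link rot"
    # The link resolves, but what it returns differs from what was cited.
    if fingerprint(later.body) != cited_hash:
        return "content drift"
    return "intact"


if __name__ == "__main__":
    cited = b"<html>Dataset v1</html>"
    h = fingerprint(cited)
    print(classify_reference(h, Snapshot(404, None)))                        # link rot
    print(classify_reference(h, Snapshot(200, b"<html>Dataset v2</html>")))  # content drift
    print(classify_reference(h, Snapshot(200, cited)))                       # intact
```

The design choice here is deliberately simple: a failed fetch signals link rot, while a successful fetch whose content no longer matches the cited fingerprint signals content drift, which a 404 check alone would miss.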
Study 2 is an ‘unfunded’ investigation associated with the Keepers Registry, a service run for Jisc that monitors which e-journals are being kept safe by the world’s archiving organisations, in order to highlight which should be regarded as at risk of loss. Because the work is only ‘indirectly funded’, it is unlikely to come to the notice of the research office. The study used the logs of the UK OpenURL Router, distilling the 10.4 million requests for articles made annually through this link resolver service by researchers and students in UK universities. The c.53,000 online serials identified by ISSN were cross-checked against the reports made to the Keepers Registry: fewer than one third of them were being kept safe. Further details can be found in Burnhill (2013) and by following links at http://edina.ac.uk.

The presentation also takes the opportunity to look at the new research objects that are ‘resident on the Web’, including the implications these may have for the integrity of the scholarly record, given the dynamics of the Web. Not only is the Web becoming a dominant means of access to scholarly statements; it is also enabling rich aggregations of linked content into composite web-based research objects that include data as intrinsic to the statement. Moreover, as the scholarly statement has become digital, it has also become malleable, challenging notions of fixity, citation and continuity of access. This shift to a broader view of scholarly works in digital format should not be regarded as completely new and alien; it builds on an observation made thirty years ago by Sue Dodd in the pre-Web era of the Internet: “In the near future, libraries will have no choice but to become more involved with computerized files and programs. … There is no doubt that machine-readable data will play an even greater role in research and development programs of the future” (Dodd, 1982, in Burnhill, 2014).