Edinburgh Research Archive

eDataShield: Running an analysis of combined data when the individual records cannot be combined

View/Open
3-2-dwd_raab_2020.pptx (8.574Mb)
Date
15/01/2020
Author
Raab, Gillian
Dibben, Chris
Abstract
In social or epidemiological research, comparable data are often collected by agencies in different settings, e.g. in different countries or by different organisations. Disclosure concerns may prevent the agencies from releasing their data to outside users. Results from different agencies can be compared by running separate analyses in the safe haven provided by each agency and comparing the published reports, but this approach has several disadvantages. One can never be sure that data sets and variables which are nominally the same are really comparable. An analysis that adjusts for covariates within each agency will not be identical to the one that would be obtained if the raw data were pooled. Tests for agency-by-covariate interactions are not readily carried out from published reports. A similar situation arose in the analysis of genomic data, where a pooled analysis of small individual studies was required for adequate inference, but the individual centres did not wish to share their data. In response to this problem the DataSHIELD system was developed (see www.datashield.org), in which a joint analysis is carried out iteratively by linking the computer in each centre to an analysis computer (AC). The AC holds no raw data; it receives summary statistics from each of the individual studies, combines them, and passes the combined summaries back to the individual centres. Iterating this exchange of summary statistics allows generalised linear models to be fitted jointly. The interface between the AC and the other centres prevents any raw data from being exchanged. When disclosure concerns would not allow centre computers to be linked in this way, the procedure can be adapted by exchanging summaries between agencies by email. Routines in R have been developed to allow such analyses to be carried out: the e-DataSHIELD protocol.
We will describe how this process works and present an example of some analyses that used data from the Scottish Longitudinal Study and the ONS Longitudinal Study (England and Wales) to compare mortality between individuals living in urban centres in the two countries.
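The iterative exchange of summary statistics described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R, and it is not the DataSHIELD or e-DataSHIELD code: the function names (site_summaries, solve, fit_pooled_logistic), the choice of logistic regression, and the use of Newton (Fisher scoring) updates are all assumptions made for illustration. Each agency returns only a score vector and an information matrix for the current coefficients; the analysis computer adds them up, updates the coefficients, and sends the new values back.

```python
import math

def site_summaries(X, y, beta):
    """Run inside one agency: return only aggregate summaries, never rows.
    X is a list of covariate rows (first entry 1 for the intercept),
    y is a list of 0/1 outcomes. Hypothetical helper for illustration."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))   # fitted probability
        w = mu * (1.0 - mu)                 # logistic variance weight
        for j in range(p):
            score[j] += xi[j] * (yi - mu)
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return score, info

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_pooled_logistic(sites, n_coef, tol=1e-10, max_iter=50):
    """Run on the analysis computer: it sees summaries only."""
    beta = [0.0] * n_coef
    for _ in range(max_iter):
        total_score = [0.0] * n_coef
        total_info = [[0.0] * n_coef for _ in range(n_coef)]
        for X, y in sites:  # in practice, one exchange per agency
            s, i = site_summaries(X, y, beta)
            for j in range(n_coef):
                total_score[j] += s[j]
                for k in range(n_coef):
                    total_info[j][k] += i[j][k]
        step = solve(total_info, total_score)   # Newton update
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta

# Made-up toy data for two agencies: rows are [intercept, covariate].
site_a = ([[1, 0], [1, 1], [1, 2], [1, 3]], [0, 1, 0, 1])
site_b = ([[1, 0], [1, 1], [1, 2], [1, 3], [1, 4]], [0, 0, 1, 0, 1])
beta_joint = fit_pooled_logistic([site_a, site_b], n_coef=2)
print(beta_joint)
```

Because the score and information for the pooled data are the sums of the per-agency quantities, this sketch reaches the same estimates as fitting the model to the pooled raw data, up to floating-point rounding. In an email-based variant, each pass through the inner loop would correspond to one round of messages between the agencies.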
URI
https://hdl.handle.net/1842/36691
Collections
  • Dealing with Data

Library & University Collections Home | University of Edinburgh Information Services Home
Privacy & Cookies | Takedown Policy | Accessibility | Contact