Edinburgh Research Archive

A critical look at methods for protection of privacy in the 2015 Charter for Safe Havens in Scotland for handling unconsented data from NHS patient records to support research

dc.contributor.author
McKeigue, Paul
en
dc.date.accessioned
2018-01-25T13:20:53Z
dc.date.available
2018-01-25T13:20:53Z
dc.date.issued
2017-11-22
dc.description.abstract
To allow e-health records to be exploited for research without individual-level consent, the 2015 Charter for Safe Havens in Scotland recommended that datasets should be made available only through “safe haven” data warehouses. To protect privacy, the Charter laid down that linked datasets should be kept only for the “minimal time necessary”, and that analytical outputs should be manually checked for “statistical disclosure” before being made available for the user to copy. These measures reduce research productivity and restrict the detailed work required to construct cohorts for studying the outcomes of chronic disease, without necessarily protecting privacy against attacks that might realistically be mounted. This presentation examines how to protect against these threats. The risks to privacy in these deidentified datasets arise either from reidentification attacks by a trusted user with access to individual-level data, or from attribute disclosure attacks that exploit aggregate data released for publication. Both kinds of attack depend critically on the availability of “side information” about the targeted individual. Basic principles of information theory can be applied to quantify privacy as the entropy of a probability distribution (measured in bits), and to protect privacy by limiting the ability to use side information. “Differential privacy” techniques, which protect against arbitrary side information, are unnecessarily stringent in the context of e-health records, where only a few of the variables in the record are likely to be available to an attacker who does not already have access to it. The variables most likely to be used as side information, namely gender, geographical location (health board) and year of birth, contain about 12 bits of information, compared with an entropy of about 22 bits for the probability distribution over identities. These variables alone therefore leave about 10 bits of uncertainty, equivalent to roughly a thousand equally likely candidates, so a successful reidentification attack would need additional side information.
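The bit counts above can be illustrated with a short Python sketch. This is a rough calculation under stated assumptions, not the one used in the presentation: it assumes a population of about 5.4 million, each person equally likely a priori, and treats gender (2 values), health board (14 boards) and year of birth (about 90 plausible years) as independent and uniformly distributed. The helper names `entropy_bits` and `uniform` are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p), in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def uniform(n):
    """Probabilities of a uniform distribution over n outcomes."""
    return [1.0 / n] * n


# Assumption: about 5.4 million people, each equally likely a priori.
# For a uniform distribution H = log2(N); computed directly to avoid a huge list.
POPULATION = 5_400_000
h_identity = math.log2(POPULATION)

# Assumptions: independent, uniformly distributed side-information variables.
h_gender = entropy_bits(uniform(2))         # 2 values
h_board = entropy_bits(uniform(14))         # 14 health boards
h_birth_year = entropy_bits(uniform(90))    # about 90 plausible birth years
h_side = h_gender + h_board + h_birth_year  # entropies add for independent variables

# Bits still needed to single out one person, and the equivalent candidate count.
remaining = h_identity - h_side
candidates = 2 ** remaining

print(f"identity: {h_identity:.1f} bits, side information: {h_side:.1f} bits")
print(f"remaining: {remaining:.1f} bits, about {candidates:,.0f} candidates")
```

Under these assumptions the side information comes to roughly 11 bits, of the same order as the quoted 12, and leaves about 2,100 candidates (5.4 million / (2 × 14 × 90)). Real distributions of health board and birth year are not uniform, so the presentation's own inputs and figures may differ.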
A single hospital admission date, which might be available to an attacker, contains about 10 bits of information, so combined with gender, health board and year of birth it could come close to singling out an individual. Adding noise to the dates of events, while preserving the sequence of these events and the intervals between them, would provide additional protection against such attacks. For aggregate data, standard rules for statistical disclosure control, such as the rule of at least 5 individuals in every cell of a frequency table, do not distinguish between variables such as lab results, which are unlikely to be available to an attacker, and variables such as geographical location, which are more likely to be. More fundamentally, the rules do not take account of how the information leak accumulates over multiple variables: in the most extreme case, where the attacker has access to an individual’s genotypes, a successful attribute disclosure attack could be mounted using data aggregated from thousands of individuals. Risks to privacy are much higher when e-health records are linked to social data, as these datasets contain more variables that are likely to be available to an adversary and could be exploited as side information. The current “one-size-fits-all” approach, which does not distinguish between high-risk and low-risk linkages, imposes unnecessary constraints on research using only e-health records, where the risk to privacy is minimal. A video of this presentation can be viewed at https://media.ed.ac.uk/media/0_taraqidu
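One common way to add noise to dates while preserving their sequence and intervals is to shift all of a patient's dates by a single random offset, drawn independently for each patient. The Python sketch below assumes that technique, a uniform offset of up to ±180 days, and a simple mapping from patient identifiers to event dates; the function names, the offset range and the identifiers are hypothetical and are not taken from the Charter or the presentation.

```python
import random
from datetime import date, timedelta

# Hypothetical default: offsets drawn uniformly from -180..+180 days.
MAX_SHIFT_DAYS = 180


def shift_patient_dates(events_by_patient, max_shift_days=MAX_SHIFT_DAYS, rng=None):
    """Return a copy of events_by_patient in which every date of a patient
    is moved by one random offset drawn independently for that patient.

    Because a patient's dates all move together, the order of their events
    and the intervals between them are unchanged.
    """
    rng = rng if rng is not None else random.Random()
    shifted = {}
    for patient_id, dates in events_by_patient.items():
        offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        shifted[patient_id] = [d + offset for d in dates]
    return shifted


def intervals(dates):
    """Gaps between consecutive dates, e.g. time from admission to discharge."""
    return [later - earlier for earlier, later in zip(dates, dates[1:])]


# Hypothetical records: admission, discharge and a later follow-up.
records = {
    "patient-A": [date(2015, 3, 1), date(2015, 3, 15), date(2016, 1, 10)],
    "patient-B": [date(2014, 11, 20), date(2014, 11, 23)],
}
# Seeded only so that the example is reproducible.
shifted = shift_patient_dates(records, rng=random.Random(2017))

for pid in records:
    print(pid, [d.isoformat() for d in shifted[pid]])
```

The main design choice is the width of the offset window: with 361 equally likely offsets, a known admission date can be matched only to within that window, which removes roughly log2(361) ≈ 8.5 bits of the information a single exact date would carry, at the cost of blurring calendar effects such as seasonality.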
en
dc.identifier.uri
http://hdl.handle.net/1842/26036
dc.language.iso
en
dc.subject
Research Data
en
dc.subject
Clinical data
en
dc.subject
Safe Haven
en
dc.title
A critical look at methods for protection of privacy in the 2015 Charter for Safe Havens in Scotland for handling unconsented data from NHS patient records to support research
en
dc.type
Presentation
en

Files

Original bundle

Name:
S2-3-Methods_for_protection_of_privacy_in_safe_havens-P_McKeigue.pdf
Size:
447.6 KB
Format:
Adobe Portable Document Format