Enabling stroke and blood pressure research in UK Biobank
Woodfield, Rebecca Mary
Background Blood pressure (BP) is one of the most important modifiable risk factors for stroke. Although the influence of an individual’s average BP on their overall stroke risk is well established, visit-to-visit blood pressure variability (BPV), the variation in BP from one clinic visit to the next, may be an independent risk factor for stroke. The influence of BPV on stroke risk in the general population is not fully understood, nor is it known whether associations with BPV vary by pathological stroke type. Very large prospective studies, including exposure measurements of BP and BPV as well as accurate identification, confirmation and sub-classification of large numbers of stroke cases during follow-up, are needed to test the associations between BP parameters, stroke and its main pathological types. UK Biobank (UKB) is a very large prospective cohort study of ~500,000 middle-aged adults recruited from England, Scotland, and Wales between 2006 and 2010. Participants completed a detailed baseline assessment at recruitment (which included self-report of prior stroke and BP measurement). Follow-up for health-related outcomes (including new occurrences of stroke) in UKB relies on linkages to routine coded datasets for hospital admissions, death registrations and primary care data. Coded primary care data could also be used to capture novel exposures, such as BPV. In this thesis, I aimed to examine how large prospective epidemiological studies such as UKB might be used to investigate the influence of BP, and in particular BPV, on stroke and its types and subtypes. I did this by advancing understanding of the identification and characterisation of stroke cases in large prospective studies, and of obtaining measures of BPV from linked primary care data.
Specifically, I aimed: (1) to evaluate the accuracy of patient self-report of stroke, the accuracy of routinely available coded healthcare data for stroke, and the reliability and feasibility of ischaemic stroke classification systems for large epidemiological studies such as UKB; (2) to identify prevalent and early incident stroke cases in UKB using multiple overlapping sources of coded data, and determine the proportions of cases classified into main pathological types of stroke; (3) to explore the feasibility of using coded primary care data to obtain measures of BPV in UKB.
Methods (1) I performed a series of systematic reviews of published data on (i) the accuracy of patient self-report of stroke, (ii) the accuracy for stroke and its main pathological types (ischaemic stroke, intracerebral haemorrhage, subarachnoid haemorrhage) of International Classification of Diseases (ICD) coded hospital admissions and death certificates, and Read-coded primary care records, and (iii) the inter-rater reliability of ischaemic stroke classification systems. (2) Informed by this work, I identified prevalent and early incident stroke cases in UKB using linked coded hospital and death registration data as well as self-report data. In a sub-cohort of participants, I was able to assess the additional role of linked coded primary care data in case identification. I compared the numbers of potential stroke cases ascertained by multiple overlapping combinations of these data and examined the proportions classified into the main pathological stroke types. (3) Finally, I analysed data from about 10,000 Welsh UKB participants with linked coded primary care data to identify those in whom visit-to-visit BPV could be measured using coded systolic BP values. I explored the association between frequency of visits with coded BP values and: participant characteristics; time between visits; mean BPV; standard deviation of BPV (SD BPV).
I also calculated within-individual agreement between coded BP and UKB baseline assessment BP.
Results (1) From my systematic reviews I found that self-report accuracy was strongly influenced by characteristics of the study population. In populations with low stroke prevalence, up to 75% of self-reported strokes were false positives. ICD codes for cerebrovascular diseases had a broad range of accuracy for stroke and its main pathological types, but appropriately selected, ‘stroke specific’ ICD codes were consistently >70% accurate when compared to an independent reference standard for stroke. Few studies assessed the accuracy of either primary care data or combinations of data sources for stroke. The overall inter-observer reliability of ischaemic stroke classification systems ranged from moderate to almost perfect. Study characteristics other than classification system accounted for much of the variation in reliability. Additional features which enhanced reliability included use of clear rules, data abstraction protocols, computerised assignment, and a reduced number of subtype categories. (2) The prevalence of stroke in UKB based on linked ICD-coded hospital admissions data and participant self-report was ~1.7%. The majority of these prevalent stroke cases were of ‘unspecified’ stroke type. Incident strokes captured by ICD codes were mostly hospital-admitted cases, but a smaller additional proportion were fatal cases not detected in hospital admissions data. The majority (~89%) of ICD-coded incident strokes were of a specified pathological type. In the sub-cohort of UKB participants with additional primary care data linkage, ~20% of potential incident stroke cases were detected by coded primary care data alone.
(3) Among Welsh UKB participants with linked primary care data, around two thirds had sufficient coded data to estimate visit-to-visit BPV at any time before recruitment, and just under half had sufficient coded data to estimate BPV during the 5 years before recruitment. Selecting participants with more visits reduced generalisability, but there was good variability in BPV amongst those selected (standard deviation of BPV ranging from ~5 mmHg to ~7 mmHg), and reasonable agreement between coded BP and BP recorded at the UKB baseline assessment (intraclass correlation coefficient 0.53, 95% CI 0.52 to 0.55).
Conclusions This work will inform the approaches to stroke outcomes ascertainment and the measurement of a novel exposure, BPV, in UKB. This will enable future exploration of the associations between BP parameters, stroke, and its main types and subtypes in UKB. Investigating these associations will improve our understanding of causal pathways for the different pathological types and subtypes of stroke and underpin increasingly targeted strategies to modify BP for stroke prevention.
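As an illustration of how a visit-to-visit BPV measure might be derived from coded primary care readings, the following is a minimal Python sketch. It is not the method used in the thesis. It assumes that BPV is summarised as the within-individual standard deviation of systolic BP across visits, that readings arrive as (participant, visit date, systolic BP) records, and that at least three visits are required; the function name `visit_to_visit_bpv`, the record layout and the `MIN_VISITS` threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical threshold: minimum number of visits with a coded systolic BP
# value needed before a participant's BPV is estimated (an assumption, not
# a value taken from the thesis).
MIN_VISITS = 3


def visit_to_visit_bpv(readings, min_visits=MIN_VISITS):
    """Summarise systolic BP per participant.

    readings: iterable of (participant_id, visit_date, systolic_mmHg),
              with visit_date as an ISO 'YYYY-MM-DD' string.
    Returns a dict mapping participant_id to the number of visits, the mean
    systolic BP, and the within-individual standard deviation of systolic BP
    (used here as an assumed measure of BPV). Participants with fewer than
    min_visits readings are omitted.
    """
    by_participant = defaultdict(list)
    for participant_id, visit_date, systolic in readings:
        by_participant[participant_id].append((visit_date, systolic))

    summary = {}
    for participant_id, visits in by_participant.items():
        if len(visits) < min_visits:
            continue  # too few visits to estimate variability
        visits.sort()  # chronological order (ISO dates sort lexically)
        values = [systolic for _, systolic in visits]
        summary[participant_id] = {
            "n_visits": len(values),
            "mean_sbp": mean(values),
            "sd_sbp": stdev(values),  # sample standard deviation
        }
    return summary


# Illustrative, made-up readings (not UKB data).
example_readings = [
    ("A", "2005-01-10", 130),
    ("A", "2005-06-02", 140),
    ("A", "2006-03-15", 150),
    ("B", "2005-02-01", 120),
    ("B", "2005-09-01", 125),
]
example_summary = visit_to_visit_bpv(example_readings)
```

In this made-up example, participant B has only two readings and is therefore excluded, which mirrors the trade-off noted in the Results: requiring more visits leaves fewer, and potentially less representative, participants.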