Use of electronic health records for predictive modelling and cohort analysis in diabetes
Item status: Restricted Access
Embargo end date: 17/11/2023
McGurnaghan, Stuart John
The increasing use of electronic, rather than paper-based, healthcare records provides enormous potential for health research. However, making such electronic health (e-health) records usable for research presents considerable challenges across several interconnected domains, from database design and construction, creation of verifiable research pipelines, and development and application of appropriate statistical methods through to solutions for governance and privacy issues. In this thesis, I describe the development of a platform and demonstrate its application in risk prediction, pharmacoepidemiology and disease prevalence. The work in this thesis describes i) the construction of a research platform based on e-health records from the total population of Scotland with diabetes, and ii) the use of this platform to address several questions about the epidemiology and pharmacoepidemiology of diabetes complications relevant to the care of people with diabetes. The thesis consists of two major themes: i) technical development of the platform that I designed as a software developer to enable the research to be achieved, and ii) the application of appropriate study design and statistical methods as an emerging epidemiologist to answer specific research questions. Chapter 1 contains an overview of some of the challenges and approaches taken in the construction and use of research data platforms from e-health records. The chapter focuses on the informatics theme of the thesis. The introductory background to the specific research questions is given within the question-specific chapters. Chapter 2 describes the research platform that I designed and implemented. It summarises the types of data available in the platform and describes the cohort of those with diabetes in Scotland. The subsequent research studies select their data and participants from this platform and cohort. As such, it forms the equivalent of an overarching methods chapter for the remainder of the thesis.
The corresponding manuscript has been submitted for publication and, at the time of thesis submission, was under review. In Chapter 3, I describe work I conducted and published using the research platform to provide a contemporary snapshot of the prevalence of cardiovascular disease (CVD) and cardiovascular risk factors in all people with type 2 diabetes in Scotland. Cardiovascular disease is the leading cause of death and loss of life expectancy in both type 2 and type 1 diabetes, but the past decades have seen advances in the understanding of disease pathogenesis and its prevention. However, I found that prevalence remains high (about one-third of those with type 2 diabetes had already been diagnosed with CVD) and that two-thirds had suboptimal control of at least two modifiable risk factors for CVD. This demonstrated the ongoing impact of CVD in diabetes and delineated areas of unmet need with respect to known risk factor control. In Chapter 4, I demonstrate the use of the research platform for real-world pharmacoepidemiology. Specifically, I describe a study I conducted and published to test the effect of a new diabetes drug (dapagliflozin) on cardiovascular risk factors in people with type 2 diabetes. At the time of the study, it was unknown whether this drug would achieve the same effects in the real world as it had in the much more idealised setting of short-duration clinical trials. The analysis found that reductions in HbA1c, systolic blood pressure (SBP), and body mass index (BMI) were equivalent to those seen in the clinical trials and, importantly, that these effects were sustained over the median 210-day follow-up. Chapter 5 describes the quantification of current CVD incidence rates in people with type 1 diabetes in Scotland and the use of the research platform to construct a CVD risk prediction model for people with type 1 diabetes. I then used the Swedish national diabetes register to validate the generalisability of the model.
There is considerable debate among clinicians about the appropriate age and circumstances under which cardiovascular risk modifying therapies (in particular, statins) should be prescribed in type 1 diabetes. Clinical guidelines assume a very high absolute rate of CVD from an early age and advocate basing treatment-initiation decisions either on the expected absolute rate over the ensuing decade or, in some cases, on lifetime risk. Yet contemporary data on the risks actually experienced are lacking. I found the current absolute rates to be much lower than the guidelines implicitly assume. I showed that under current guidelines, >90% of those aged 20-40 years and 100% of those aged ≥40 years with type 1 diabetes were eligible for statins, but it was not until age 65 upwards that 100% had a modelled risk of CVD of ≥10% in 10 years, the threshold for statin use in the general population. The CVD prediction model I constructed was well calibrated and achieved high predictive performance in both Scotland and Sweden. The results should prove useful to facilitate individualised discussions regarding appropriate prescribing and the rationale for prescribing. In Chapter 6, I describe my use of the research platform to rapidly address a sudden major challenge in diabetes, namely that of the SARS-CoV-2 pandemic. At the outset of the pandemic in 2020, a high preponderance of diabetes among those being admitted to hospital with COVID-19 was being reported. Yet most studies lacked a population denominator, so it was unclear how large the risks associated with diabetes were, and whether the clinical risks (of hospitalisation, admission to a critical care facility and death) were predictable among those with diabetes. It was also unclear whether all those with these risks should be shielding.
I found that by the end of the first wave of the pandemic the risk of severe COVID-19 was elevated 2.4-fold in type 1 diabetes and 1.4-fold in type 2 diabetes (much less than was generally thought). The cross-validated predictive model of severe COVID-19 retained 11 factors in addition to age, sex, diabetes type and duration, and had good predictive performance (C-statistic of 0.85). The study results were reported to government and diabetes stakeholder groups early in the pandemic and were used to inform shielding policy. Chapter 7 presents an overall discussion of the work of the thesis, the lessons learned and future work, focusing on the research platform development theme. The key findings and advantages of the work I have described in generating the data platform are: the importance of separating the analysis from the data; the ability to accommodate various data types from different data sources (flexible data input); longitudinal database formatting; the importance of an accurate metadata dictionary; a verifiable research pipeline; compliance with governance; and the ability to generate synthetic datasets. The work in this thesis provides evidence of the feasibility and usefulness of harnessing e-health records for important and timely research that impacts on people with diabetes. It provides insights and exemplars of use that should be helpful for others in the field trying to develop such platforms for diabetes or other disease areas. Future work includes the development of a pharmacoepidemiology pipeline, allowing rapid analysis of safety and effectiveness outcomes given a wide variety of exposures, and the ability to incorporate genetic and ’omics data.