Applying missing data methods to routine data using the example of a population-based register of patients with diabetes
Authors
Read, Stephanie Helen
Abstract
BACKGROUND:
Routinely-collected data offer great potential for epidemiological research and could be
used to make randomised controlled trials (RCTs) more efficient. The use of routine
data for research has been limited by concerns surrounding data quality, particularly data
completeness. To fully exploit these information-rich data sources it is necessary to
identify approaches capable of overcoming high proportions of missing data.
Using a 2008 extract of the Scottish Care Information – Diabetes Collaboration (SCIDC)
database, a population-based register of people with a diagnosis of diabetes in
Scotland, I compared the findings of several methods for handling missing data in a
retrospective cohort study investigating the association between body mass index (BMI)
and all-cause mortality in patients with type 2 diabetes.
METHODS:
Discussions with clinicians and logistic regression analyses were used to determine the
likely mechanisms of missingness and the relative appropriateness of a selection of
missing data methods, such as multiple imputation. Sequentially more complicated
imputation approaches were used to handle missing data. Cox proportional hazard
model coefficients for the association between BMI and all-cause mortality were
compared for each missing data method. Age-standardised mortality rates by categories
of BMI at around the time of diagnosis were also presented.
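
As an illustration of how coefficients from multiply imputed datasets are typically combined before being compared with a complete case estimate, the following is a minimal sketch of Rubin's rules. It is not the thesis's own code: the function name `pool_rubin` and the five log hazard ratios and variances are hypothetical placeholders, not results from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)            # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log hazard ratios and squared standard errors from m = 5
# imputed datasets (placeholder values, not taken from the thesis).
log_hrs = [0.30, 0.34, 0.31, 0.33, 0.32]
squared_ses = [0.0016, 0.0016, 0.0016, 0.0016, 0.0016]

pooled_log_hr, pooled_se = pool_rubin(log_hrs, squared_ses)
pooled_hr = math.exp(pooled_log_hr)
print(f"pooled HR = {pooled_hr:.2f}, SE of log HR = {pooled_se:.4f}")
```

The pooled standard error is larger than any single within-imputation standard error because the between-imputation term carries the extra uncertainty due to the missing values.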
RESULTS:
There were 66,472 patients diagnosed with type 2 diabetes between 2004 and 2008, of
whom 21% did not have a recording of BMI at the time of diagnosis.
Amongst patients with complete BMI data, there were 5,491 deaths during 296,584
person-years of follow-up. Amongst patients with incomplete data, there were 2,090
deaths during 79,067 person-years of follow-up. Analyses indicated that the primary
mechanism of missingness was missing at random, conditional on patient year of
diagnosis and vital status. In particular, patients with missing data had considerably
worse survival than patients without missing data. Regardless of the method for
handling the missing data, a U-shaped relationship between BMI and mortality was
observed. Compared to complete case analysis, the association between BMI and all-cause
mortality was weaker using multiple imputation approaches, with estimates
moving towards the null. Closest observation imputation had the smallest effect on
estimates compared to complete case analysis.
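
The difference in survival between the two groups can be checked from the death and person-year counts reported above. The following is a small worked calculation of crude mortality rates, assuming nothing beyond those four figures. The helper name `rate_per_1000` is hypothetical, and the rates are crude rather than the age-standardised rates presented in the thesis.

```python
def rate_per_1000(deaths, person_years):
    """Crude mortality rate per 1,000 person-years."""
    return 1000 * deaths / person_years


# Counts taken from the results reported in the abstract.
complete_rate = rate_per_1000(5_491, 296_584)     # patients with BMI recorded
incomplete_rate = rate_per_1000(2_090, 79_067)    # patients with BMI missing

# Crude rate ratio, with no adjustment for age or year of diagnosis.
rate_ratio = incomplete_rate / complete_rate

print(f"complete BMI data: {complete_rate:.1f} deaths per 1,000 person-years")
print(f"missing BMI data:  {incomplete_rate:.1f} deaths per 1,000 person-years")
print(f"crude rate ratio:  {rate_ratio:.2f}")
```

On these figures the crude rate is roughly 18.5 per 1,000 person-years with complete BMI data and roughly 26.4 per 1,000 person-years with missing BMI data, consistent with the worse survival described for patients with missing data.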
Risk of mortality was consistently highest in the group with a BMI below 25 kg/m². For
example, estimates obtained from multiple imputation by chained equations
indicated that patients with a BMI below 25 kg/m² had a 38% higher risk of mortality
than patients in the 25 to less than 30 kg/m² BMI category.
CONCLUSIONS:
Alternatives to complete case analysis can be computationally intensive and raise
many important practical considerations. However, it remains valuable to explore the
robustness of estimates to departures from the assumptions made by complete case
analysis. The use of these methods can preserve the sample size and therefore may be
useful in developing risk prediction scores.
Mortality was lower amongst overweight or obese patients than amongst normal-weight patients.
Further work is required to identify optimal approaches to weight management amongst
patients with diabetes.

