Edinburgh Research Archive

Statistical models in prognostic modelling with many skewed variables and missing data: a case study in breast cancer

dc.contributor.advisor
Warner, Pamela
en
dc.contributor.advisor
Anderson, Niall
en
dc.contributor.author
Baneshi, Mohammad Reza
en
dc.date.accessioned
2010-11-05T16:31:20Z
dc.date.available
2010-11-05T16:31:20Z
dc.date.issued
2009
dc.description.abstract
Prognostic models have clinical appeal as an aid to therapeutic decision making. In the UK, the Nottingham Prognostic Index (NPI) has been used for over two decades to inform patient management. However, it has been commented that the NPI is not capable of identifying a subgroup of patients with a prognosis so good that adjuvant therapy, with its potentially harmful side effects, can be withheld safely. Tissue Microarray Analysis (TMA) now makes possible measurement of biological tissue microarray features of frozen biopsies from breast cancer tumours. These give an insight into the biology of the tumour and hence could have the potential to enhance prognostic modelling. I therefore wished to investigate whether biomarkers can add value to clinical predictors to provide improved prognostic stratification in terms of Recurrence Free Survival (RFS). However, there are very many biomarkers that could be measured, they usually exhibit skewed distributions, and missing values are common. The statistical issues raised are thus the number of variables being tested, the form of the association, imputation of missing data, and assessment of the stability and internal validity of the model. Therefore the specific aim of this study was to develop and to demonstrate the performance of statistical modelling techniques that will be useful in circumstances where there is a surfeit of explanatory variables and missing data; in particular to achieve useful and parsimonious models while guarding against instability and overfitting. I also sought to identify a subgroup of patients with a prognosis so good that a decision can be made to avoid adjuvant therapy. I aimed to provide statistically robust answers to a set of clinical questions and to develop strategies for such data sets that would be useful and acceptable to clinicians. A unique data set of 401 Estrogen Receptor positive (ER+) tamoxifen-treated breast cancer patients with measurements for a large panel of biomarkers (72 in total) was available.
Taking a statistical approach, I applied a multi-faceted screening process to select a limited set of potentially informative variables and to detect the appropriate form of the association, followed by multiple imputation of missing data and bootstrapping. In comparison with the NPI, the final derived joint model assigned patients to more appropriate risk groups (14% of recurred and 4% of non-recurred cases). The actuarial 7-year RFS rate for patients in the lowest risk quartile was 95% (95% C.I.: 89%, 100%). To evaluate an alternative approach, biological knowledge was incorporated into the process of model development. Model building began with the use of biological expertise to divide the variables into substantive biomarker sets on the basis of their presumed role in the pathway to cancer progression. For each biomarker family, an informative and parsimonious index was generated by combining family variables, to be offered to the final model as an intermediate predictor. In comparison with the NPI, this model assigned patients to more appropriate risk groups (21% of recurred and 11% of non-recurred patients). This model identified a low-risk group with a 7-year RFS rate of 98% (95% C.I.: 96%, 100%).
en
dc.identifier.uri
http://hdl.handle.net/1842/4191
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.subject
prognostic models
en
dc.subject
Tissue Microarray Analysis
en
dc.subject
Recurrence Free Survival
en
dc.subject
statistical modelling
en
dc.subject
cancer progression
en
dc.subject
breast cancer
en
dc.title
Statistical models in prognostic modelling with many skewed variables and missing data: a case study in breast cancer
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Doctoral
en
dc.type.qualificationname
PhD Doctor of Philosophy
en

Files

Original bundle

Name:
Baneshi2009.doc
Size:
3.52 MB
Format:
Microsoft Word
Description:
File not available for download
Name:
Baneshi2009.pdf
Size:
1.52 MB
Format:
Adobe Portable Document Format
Description:
PhD thesis
