Bayesian methods for biomarker evaluation and disease diagnosis
Item status: Restricted Access
Embargo end date: 22/03/2023
Garrido Guillén, Javier
Accurate diagnosis of disease is of fundamental importance in medical research and clinical practice, and diagnostic testing and screening therefore play a central role. The major goal of a diagnostic test is to distinguish between individuals with a well-defined condition, referred to as diseased, and individuals without that condition, referred to as nondiseased. Before a test is widely used in practice, its discriminatory ability must be rigorously assessed through statistical analysis. The overlap coefficient, defined as the proportion of area shared by two density functions, has gained popularity as a summary measure of diagnostic accuracy and is the main object of study in this thesis. In the first chapter, we introduce different concepts related to diagnostic tests and how their accuracy can be measured. We also give a brief description of the receiver operating characteristic (ROC) curve, one of the most popular statistical methods for evaluating the discriminatory ability of a test. We then define the coefficient of overlap and discuss its advantages and disadvantages relative to the usual summary measures, namely the area under the ROC curve and the Youden index. At the end of the chapter, we note that, as acknowledged in the literature, the performance of a diagnostic test may depend on covariates (e.g., age and/or sex), and that failure to incorporate this information may result in misleading or oversimplified conclusions about the accuracy of the test. In the second chapter, we provide a brief introduction to Bayesian inference and Bayesian non-parametric methods, as we adopt the Bayesian paradigm throughout this thesis. In the third chapter, we develop Bayesian inferential methods for the coefficient of overlap.
Accurate estimation of the coefficient of overlap requires accurate estimates of the density functions of test outcomes in both the diseased and nondiseased populations. To this end, we model these density functions with a Dirichlet process mixture of normal distributions. Once estimates of the densities have been obtained, we propose two estimators of the coefficient of overlap: one based on numerical integration and another that additionally uses the Bayesian bootstrap. Our integrated framework relaxes the restrictive distributional assumptions (e.g., normality of test outcomes in each population) of existing approaches. The performance of our methods is assessed through a simulation study, and we also provide an application concerned with the search for ovarian cancer biomarkers. In the fourth chapter, we extend our flexible modelling approach for the coefficient of overlap to the covariate context. We follow a joint approach based on Dirichlet process mixtures, in which test outcomes and covariates are modelled jointly through a multivariate kernel. We use different simulated examples to evaluate the performance of our modelling approach, and we provide applications to two real datasets. The first application assesses how the accuracy of glucose levels as a marker for diabetes changes with age. The second studies the effect of age and sex on the discriminatory ability of different biomarkers for Alzheimer's disease. In the fifth chapter, we include vignettes and examples showing the usage of the R package OverlapCoefficient, which implements our methods. Finally, we discuss directions for future work, such as possible generalizations of the coefficient of overlap to handle two or more biomarkers.
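As a rough illustration of the numerical-integration idea mentioned above, the overlap coefficient of two densities f0 (nondiseased) and f1 (diseased) can be written as the integral of min{f0(y), f1(y)} over y. The Python sketch below approximates that integral with the trapezoidal rule. It is only a minimal sketch under stated assumptions: the two densities are fixed normal densities chosen for illustration, not Dirichlet process mixture estimates, and the function names (normal_pdf, overlap_coefficient) are hypothetical and are not part of the OverlapCoefficient R package.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def overlap_coefficient(f_nondiseased, f_diseased, lower, upper, n_grid=20001):
    """Trapezoidal approximation of the integral of min(f0, f1) on [lower, upper].

    Assumes both densities are negligible outside [lower, upper].
    """
    step = (upper - lower) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        y = lower + i * step
        height = min(f_nondiseased(y), f_diseased(y))
        # End points get half weight under the trapezoidal rule.
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * height
    return total * step


# Illustrative (assumed) densities: N(0, 1) for nondiseased, N(2, 1) for diseased.
def f0(y):
    return normal_pdf(y, 0.0, 1.0)


def f1(y):
    return normal_pdf(y, 2.0, 1.0)


ovl = overlap_coefficient(f0, f1, lower=-10.0, upper=12.0)

# For two normals with equal sd, the overlap has the closed form
# 2 * Phi(-|mu1 - mu0| / (2 * sd)); here that is 2 * Phi(-1).
closed_form = 2.0 * 0.5 * (1.0 + math.erf(-1.0 / math.sqrt(2.0)))

print(f"numerical OVL = {ovl:.4f}, closed form = {closed_form:.4f}")
```

A value near 1 indicates that the two densities almost coincide (a test with little discriminatory ability), and a value near 0 indicates almost complete separation. In a Bayesian analysis, one natural way to use such a routine would be to apply it to each posterior draw of the two densities, which would give posterior draws of the overlap coefficient.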