Bayesian methods for biomarker evaluation and disease diagnosis
Date: 22/03/2022
Item status: Restricted Access
Embargo end date: 22/03/2023
Author: Garrido Guillén, Javier
Abstract
Accurate diagnosis of disease is of fundamental importance in medical research and clinical practice.
For this reason, the role that diagnostic testing and screening play is undeniable. The major goal of a
diagnostic test is to distinguish between individuals with a well-defined condition, referred to as diseased,
and individuals without that condition, known as nondiseased. Before a test is widely used
in practice, its discriminatory ability must be rigorously assessed through statistical analysis. The
overlap coefficient, which is defined as the proportion of overlap area between two density functions,
has gained considerable popularity as a summary measure of diagnostic accuracy and, in this thesis, it
is our main object of study.
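To make the definition concrete: for densities f_D (diseased) and f_ND (nondiseased), the coefficient of overlap is OVL = ∫ min{f_D(y), f_ND(y)} dy, which equals 1 when the two densities coincide and 0 when they do not overlap at all. The sketch below is a minimal illustration, assuming two normal test-outcome distributions with hypothetical parameters; it approximates the integral with the trapezoidal rule using only Python's standard library. For two normals with a common standard deviation σ, the exact value is 2Φ(-|μ_D - μ_ND|/(2σ)), so here it should print a value close to 2Φ(-0.75) ≈ 0.4533.

```python
from statistics import NormalDist


def overlap_coefficient(f, g, lower, upper, n=20_000):
    """Approximate OVL = integral of min{f(y), g(y)} dy over [lower, upper]
    with the trapezoidal rule on n equal subintervals."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        y = lower + i * h
        weight = 0.5 if i in (0, n) else 1.0  # endpoints get half weight
        total += weight * min(f(y), g(y))
    return total * h


# Hypothetical example: nondiseased ~ N(0, 1), diseased ~ N(1.5, 1)
nondiseased = NormalDist(mu=0.0, sigma=1.0)
diseased = NormalDist(mu=1.5, sigma=1.0)
ovl = overlap_coefficient(nondiseased.pdf, diseased.pdf, -10.0, 12.0)
print(f"OVL ≈ {ovl:.4f}")
```

For normal distributions, Python's `NormalDist.overlap` returns the same quantity in closed form, which is a convenient check; the numerical version is shown because it applies to any pair of densities.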
In the first chapter of this thesis, we introduce different concepts related to diagnostic tests and
how their accuracy can be measured. A brief description of the receiver operating characteristic (ROC)
curve, one of the most popular statistical methods for evaluating the discriminatory ability of a
test, is also provided. We then define the coefficient of overlap and discuss its advantages and
disadvantages relative to the usual summary measures, namely, the area under the ROC curve and the Youden
index. At the end of the chapter, we note that, as acknowledged in the literature,
the performance of a diagnostic test may depend on covariates (e.g., age and/or sex), and failure to
incorporate this information may result in misleading or oversimplified conclusions about the accuracy
of the test.
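The two summary measures mentioned above have simple empirical versions. The following sketch assumes that larger test outcomes indicate disease and that a subject is classified as positive when its outcome exceeds a threshold c; the sample values are made up for illustration. The area under the ROC curve (AUC) is estimated by the Mann-Whitney statistic, P(Y_D > Y_ND) with ties counted as one half, and the Youden index by maximizing sensitivity + specificity - 1 over the observed thresholds.

```python
def empirical_auc(nondiseased, diseased):
    """Mann-Whitney estimate of the AUC: P(Y_D > Y_ND), counting ties as 1/2."""
    score = 0.0
    for yd in diseased:
        for yn in nondiseased:
            if yd > yn:
                score += 1.0
            elif yd == yn:
                score += 0.5
    return score / (len(diseased) * len(nondiseased))


def empirical_youden(nondiseased, diseased):
    """Youden index J = max over c of {sensitivity(c) + specificity(c) - 1},
    where a subject is test-positive when its outcome is > c."""
    best = 0.0  # the trivial threshold c = -inf gives J = 0
    for c in sorted(set(nondiseased) | set(diseased)):
        sensitivity = sum(y > c for y in diseased) / len(diseased)
        specificity = sum(y <= c for y in nondiseased) / len(nondiseased)
        best = max(best, sensitivity + specificity - 1.0)
    return best


# Hypothetical test outcomes, for illustration only
nondiseased = [0.2, 0.5, 0.9, 1.1]
diseased = [0.8, 1.4, 1.9, 2.3]
print(empirical_auc(nondiseased, diseased))     # 0.875
print(empirical_youden(nondiseased, diseased))  # 0.75
```

Only observed values need to be tried as thresholds, because the empirical sensitivity and specificity are step functions that change only at the data points.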
In the second chapter, we provide a brief introduction to Bayesian inference and Bayesian nonparametric
methods, since we adopt the Bayesian paradigm throughout this thesis.
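A central Bayesian nonparametric prior, the Dirichlet process, can be simulated through Sethuraman's stick-breaking construction: weights w_k = v_k ∏_{j<k} (1 - v_j) with v_k ~ Beta(1, α), and component parameters drawn independently from a base measure. The sketch below draws one random density from a truncated Dirichlet process mixture of normals. The truncation level K, the concentration α, and the base measure (μ ~ N(0, 3²), 1/σ² ~ Gamma(shape 2, rate 1)) are illustrative assumptions rather than the specification used in the thesis, and it only samples from the prior; fitting the model to data (e.g., by MCMC) is not shown.

```python
import random
from statistics import NormalDist


def stick_breaking_weights(alpha, K, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k}(1 - v_j).
    Setting the last v to 1 makes the K weights sum to one."""
    weights, remaining = [], 1.0
    for k in range(K):
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def draw_dp_normal_mixture(alpha=1.0, K=25, seed=None):
    """Draw one random density from a truncated DP mixture of normals.
    Base measure (assumed for illustration): mu ~ N(0, 3^2), 1/sigma^2 ~ Gamma(2, rate 1)."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, K, rng)
    components = []
    for _ in range(K):
        mu = rng.gauss(0.0, 3.0)
        precision = rng.gammavariate(2.0, 1.0)  # shape 2, scale 1 (i.e., rate 1)
        components.append(NormalDist(mu, (1.0 / precision) ** 0.5))

    def density(y):
        # Mixture density: sum_k w_k * N(y; mu_k, sigma_k^2)
        return sum(w * c.pdf(y) for w, c in zip(weights, components))

    return weights, density


weights, f = draw_dp_normal_mixture(alpha=1.0, K=25, seed=2022)
print(sum(weights), f(0.0))
```

Smaller values of α concentrate the mass on a few components, while larger values spread it across many, which is what lets the mixture adapt to skewed or multimodal test-outcome distributions.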
In the third chapter of this thesis, we develop Bayesian inferential methods for the coefficient of
overlap. Accurate estimation of the coefficient of overlap requires accurate estimation of the density
functions of test outcomes in both the diseased and nondiseased populations. To this end, we employ
a Dirichlet process mixture of normal distributions to model these density functions. Once estimates
of the density functions of test results have been obtained, two estimators for the coefficient of overlap
are proposed: one based on numerical integration and another that additionally uses the Bayesian
bootstrap. Our integrated framework relaxes restrictive distributional assumptions of existing approaches
(e.g., normality of test outcomes in each population). The performance of our methods is assessed
through a simulation study, and we also provide an application concerned with the search for ovarian
cancer biomarkers.
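Both estimators can be illustrated for a single posterior draw of the two densities; repeating the computation over all MCMC draws would yield a posterior sample of the coefficient of overlap. The numerical-integration estimator integrates min{f_D, f_ND} on a grid. For the Bayesian bootstrap, the sketch relies on the identity OVL = E_ND[min{1, f_D(Y)/f_ND(Y)}], where the expectation is taken under the nondiseased distribution, and replaces that distribution with Dirichlet(1, ..., 1)-weighted observed nondiseased outcomes. This is one plausible construction under these assumptions and may differ in detail from the estimator proposed in the thesis; the helper names, the one-component "posterior draw", and the simulated data are hypothetical.

```python
import random
from statistics import NormalDist


def mixture_pdf(weights, components):
    """Density of a finite normal mixture, e.g. one posterior draw of a DP mixture."""
    return lambda y: sum(w * c.pdf(y) for w, c in zip(weights, components))


def ovl_numerical(f_d, f_nd, lower, upper, n=10_000):
    """OVL = integral of min{f_D(y), f_ND(y)} dy, trapezoidal rule on [lower, upper]."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        y = lower + i * h
        total += (0.5 if i in (0, n) else 1.0) * min(f_d(y), f_nd(y))
    return total * h


def ovl_bayesian_bootstrap(f_d, f_nd, y_nd, rng):
    """OVL = E_ND[min{1, f_D(Y)/f_ND(Y)}], with the nondiseased distribution replaced
    by Dirichlet(1, ..., 1) weights on the observed nondiseased outcomes y_nd."""
    gammas = [rng.expovariate(1.0) for _ in y_nd]  # normalized Exp(1) draws ~ Dirichlet(1,...,1)
    total = sum(gammas)
    return sum(g / total * min(1.0, f_d(y) / f_nd(y)) for g, y in zip(gammas, y_nd))


rng = random.Random(1)
# Hypothetical posterior draw: one-component "mixtures" so that the exact OVL is known
f_nd = mixture_pdf([1.0], [NormalDist(0.0, 1.0)])
f_d = mixture_pdf([1.0], [NormalDist(1.5, 1.0)])
y_nd = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # simulated nondiseased outcomes

ovl_ni = ovl_numerical(f_d, f_nd, -10.0, 12.0)
ovl_bb = ovl_bayesian_bootstrap(f_d, f_nd, y_nd, rng)
print(ovl_ni, ovl_bb)  # the exact value here is 2 * Phi(-0.75), about 0.4533
```

The Bayesian bootstrap version avoids choosing an integration grid and propagates uncertainty about the nondiseased distribution through the random Dirichlet weights.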
In the fourth chapter of this thesis, we extend our flexible modelling approach for the coefficient
of overlap to the covariate setting. We follow a joint approach based on Dirichlet process mixtures,
in which test outcomes and covariates are modelled jointly through a multivariate kernel. We use
different simulated examples to evaluate the performance of our modelling approach, and we also provide
applications to two real datasets. The first application concerns assessing how the accuracy
of glucose levels as a marker for diabetes changes with age. The second application
studies the effect of age and sex on the discriminatory ability of different biomarkers for
Alzheimer's disease.
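In a joint approach, covariate-specific densities follow from conditioning the joint model. For a mixture of bivariate normal kernels on (y, x), the conditional density f(y | x) is again a normal mixture: component k receives weight proportional to w_k N(x; μ_x,k, σ_xx,k), with mean μ_y,k + (σ_yx,k / σ_xx,k)(x - μ_x,k) and variance σ_yy,k - σ_yx,k² / σ_xx,k. The sketch below assumes a single continuous covariate (e.g., age) and known mixture parameters, whereas in practice these would be posterior draws. The parameter values are hypothetical, chosen so that both conditional variances equal 1 and the exact covariate-specific OVL is 2Φ(-|μ_D(x) - μ_ND(x)|/2).

```python
from statistics import NormalDist


def conditional_density(weights, means, covs, x):
    """f(y | x) for a mixture of bivariate normals on (y, x).
    means[k] = (mu_y, mu_x); covs[k] = (s_yy, s_yx, s_xx) are the covariance entries."""
    cond_weights, cond_components = [], []
    for w, (mu_y, mu_x), (s_yy, s_yx, s_xx) in zip(weights, means, covs):
        # Weight of component k given x is proportional to w_k * N(x; mu_x, s_xx)
        cond_weights.append(w * NormalDist(mu_x, s_xx ** 0.5).pdf(x))
        cond_mean = mu_y + s_yx / s_xx * (x - mu_x)
        cond_var = s_yy - s_yx ** 2 / s_xx
        cond_components.append(NormalDist(cond_mean, cond_var ** 0.5))
    total = sum(cond_weights)
    cond_weights = [cw / total for cw in cond_weights]
    return lambda y: sum(cw * c.pdf(y) for cw, c in zip(cond_weights, cond_components))


def ovl(f, g, lower=-15.0, upper=15.0, n=10_000):
    """Trapezoidal approximation of the integral of min{f(y), g(y)} over [lower, upper]."""
    h = (upper - lower) / n
    return h * sum((0.5 if i in (0, n) else 1.0) * min(f(lower + i * h), g(lower + i * h))
                   for i in range(n + 1))


# Hypothetical one-component joint models for (test outcome, age)
nondiseased = ([1.0], [(0.0, 50.0)], [(1.0, 0.0, 100.0)])
diseased = ([1.0], [(1.0, 50.0)], [(1.25, 5.0, 100.0)])

for age in (50.0, 70.0):
    f_nd = conditional_density(*nondiseased, age)
    f_d = conditional_density(*diseased, age)
    print(age, round(ovl(f_d, f_nd), 4))
```

Evaluating the OVL on a grid of covariate values, and repeating this for each posterior draw, gives a covariate-specific OVL curve together with pointwise credible bands.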
In the fifth chapter, we include vignettes and examples showing the usage of the R package
OverlapCoefficient, which implements our methods.
Finally, we discuss directions for future work, such as possible generalizations of the coefficient of
overlap to handle two or more biomarkers.