Joint modelling of longitudinal and survival data for dynamic prediction in credit-related applications
Date: 09/12/2022
Item status: Restricted Access
Embargo end date: 09/12/2022
Author: Medina-Olivares, Víctor H.
Abstract
Lenders monitor their borrowers over time, allowing them to dynamically predict
the probability of an event of interest, such as default. The widely used survival
models focus on when the event happens and can handle time-varying covariates
(TVCs) and censored observations. However, an issue little addressed in the
literature is that the model specification and the predictive framework depend
on the type of TVC included. TVCs can be either exogenous or endogenous
to the survival time. Exogenous TVCs are those whose future paths are not affected
by the event's occurrence, such as macroeconomic variables. Endogenous TVCs, by
contrast, are those whose paths are influenced by the survival status. An example
of the latter is the unpaid principal balance when the event is default.
This thesis explores new mathematical models in credit-related applications,
known as joint models of longitudinal and survival data. Initially developed
in medical research, these models, in their standard version, are formed by two
sub-models, one for the survival process and the other for the endogenous TVC
(also named longitudinal outcome in this context). A latent structure links the
sub-models, commonly in the form of random effects. Joint models have two
advantages compared to survival models. First, they allow us to handle possible
endogeneities in the TVCs. Second, by jointly modelling both processes, they offer
us a dynamic prediction framework that incorporates their mutual evolution.
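
For orientation, one common textbook form of a standard joint model with shared random effects is sketched below. The notation is an illustrative assumption and not necessarily the thesis's own: y_i(t) is the observed endogenous TVC for borrower i, m_i(t) its underlying trajectory, b_i the random effects that link the two sub-models, h_0(t) a baseline hazard, w_i baseline covariates, and alpha the association parameter.

```latex
\begin{aligned}
y_i(t) &= m_i(t) + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim \mathcal{N}(0, \sigma^2), \\
m_i(t) &= x_i(t)^\top \beta + z_i(t)^\top b_i, \qquad b_i \sim \mathcal{N}(0, D), \\
h_i(t) &= h_0(t) \exp\left\{ w_i^\top \gamma + \alpha \, m_i(t) \right\}.
\end{aligned}
```

In this form, alpha = 0 would mean the longitudinal trajectory carries no information about the hazard, and the two sub-models could be fitted separately.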
We propose a series of innovations to make the approach appropriate to credit-related
applications. These innovations relate to the nature of survival time, the
specific evolution of the TVCs, ways to scale the technique to large datasets and
how to leverage the available data in the modelling framework.
Concretely, we adapt the formulation of the joint models and their performance
metrics to the discrete nature of the loan data. In addition, we include autoregressive
terms in the TVC specification to address observed serial correlation
and enhance predictive capability. Moreover, we can study more complex specifications
with larger datasets by reformulating the approach within the integrated nested
Laplace approximation (INLA) framework, a fast and accurate method for approximate
Bayesian inference. Among these
specifications are the joint models with more than one TVC and the joint model
that leverages geographical information to include spatial and spatio-temporal effects
in the hazard function. We also introduce a more accurate way to estimate
individual survival predictions using the Laplace method. Finally, to compare
different models, we propose a computationally efficient implementation of the
cross-entropy estimate of the posterior predictive conditional density that uses
the estimates obtained in the inference step.
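
To illustrate what dynamic prediction looks like when survival time is discrete, the following minimal sketch computes the probability that a loan survives a further number of periods, given that it has survived up to period t. It is not the thesis's implementation: the function name `conditional_survival` and the input `hazards` (a list of per-period event probabilities, which a fitted model would have to supply) are hypothetical, and the sketch uses only the standard identity that discrete-time survival is the product of one minus the per-period hazards.

```python
from typing import Sequence


def conditional_survival(hazards: Sequence[float], t: int, horizon: int) -> float:
    """Return P(T > t + horizon | T > t) for discrete-time survival.

    hazards[k - 1] is the (hypothetical, model-supplied) probability that
    the event occurs in period k given survival through period k - 1.
    """
    if t < 0 or horizon < 0 or t + horizon > len(hazards):
        raise ValueError("t and horizon must index into the supplied hazards")

    prob = 1.0
    # Periods t + 1, ..., t + horizon correspond to hazards[t : t + horizon].
    for h in hazards[t:t + horizon]:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        prob *= 1.0 - h
    return prob


if __name__ == "__main__":
    # Illustrative hazards for three monthly periods (made-up numbers).
    example_hazards = [0.1, 0.2, 0.5]
    print(conditional_survival(example_hazards, t=0, horizon=2))
    print(conditional_survival(example_hazards, t=1, horizon=2))
```

In a joint model, the per-period hazards would themselves depend on the predicted trajectory of the endogenous TVC, so the prediction can be updated each time a new longitudinal observation arrives.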
We apply joint models to predict the time to credit events in the following three
settings: default in US mortgages, full prepayment in a German consumer loan
portfolio, and full prepayment in US mortgages. The main empirical results show
that the autoregressive terms in the joint model let us achieve better discrimination
performance, the predictive ability is significantly enhanced compared to
survival models when more TVCs are considered, and the inclusion of spatial
effects consistently leads to better data representation.