Edinburgh Research Archive

Modelling loss given default of corporate bonds with advanced machine learning techniques

Authors

Zhang, Junfeng

Abstract

Loss Given Default (LGD) is an important component of credit risk management under the Basel Accords. Accurate LGD models allow financial institutions to estimate potential losses and allocate sufficient regulatory capital to maintain their financial stability and solvency. However, LGD modelling presents several significant challenges. Firstly, the determinants of LGD have not yet been fully explored, especially now that new data sources are becoming available with the potential to improve accuracy and robustness; these new sources still require evaluation. Secondly, the LGD distribution is complex and often multimodal, making it challenging for traditional statistical models to fit and predict accurately. Lastly, potential distributional shifts over time add another layer of difficulty, as the economic environment, regulatory landscape, and competitive conditions can change, affecting the consistency and stationarity of LGD data. This Ph.D. thesis aims to address these challenges and enhance LGD modelling performance through both data-driven and algorithmic approaches, using a dataset of US corporate bond recoveries. The first and second analysis chapters explore the integration of Environmental, Social, and Governance (ESG) information and of textual information from annual filings, respectively, to improve LGD predictions. ESG factors, which have been shown to influence overall credit risk, are incorporated into machine learning models to assess their impact on LGD. The study finds that including ESG data significantly enhances the predictive accuracy of LGD models, especially during adverse macroeconomic conditions and for riskier debt segments. Among the ESG pillars, social factors emerge as having the most substantial impact on LGD estimates. This approach underscores the potential of ESG information in providing a more comprehensive view of credit risk.
Moreover, the second piece of work leverages advanced text analytics on corporate annual filings to extract soft information that can further explain LGD variations. Utilising a fine-tuned FinBERT model for sentiment analysis, along with additional text metrics and text embeddings, the research constructs a set of textual variables from various sections of the filings. The findings demonstrate that these textual variables, particularly those from the Management Discussion and Analysis section, significantly improve out-of-time prediction accuracy by capturing deeper semantic meanings and reducing noise in the data. This use of natural language processing techniques highlights the value of qualitative information in enhancing credit risk models. The last analysis chapter addresses the issue of dataset shift in LGD modelling by introducing dynamic ensemble learning strategies. Machine learning models often perform worse when applied to new, unseen data because the underlying data distributions change over time. This research examines the dataset shift problem as it applies to LGD and proposes four novel dynamic ensemble learning strategies that adapt to these shifts more effectively than static models. Among these, the dynamic feature-based strategy demonstrates superior performance, showcasing its robustness against temporal variations in the dataset. This approach emphasises the importance of adaptive modelling techniques in maintaining predictive accuracy over time. In summary, this thesis contributes to the field of LGD modelling by integrating ESG information, employing sophisticated text analytics, and developing robust dynamic ensemble learning frameworks. These contributions not only improve the predictive power of LGD models but also offer practical insights for financial institutions seeking to enhance risk assessment and regulatory compliance in a dynamic economic environment.
