New approaches to corporate credit rating prediction
Item status: Restricted Access
Embargo end date: 21/03/2023
Corporate credit ratings are formal, independent opinions about a company's creditworthiness and are regularly used in decision-making by different stakeholders, such as investors and the companies themselves. Corporate credit ratings therefore need to be as accurate as possible. Because the assessments by the rating agencies that issue corporate credit ratings are cost-intensive and take several months, a large number of studies focus on developing computational methods for corporate credit rating prediction that overcome these issues by being cost- and time-efficient. This thesis adds to existing studies by proposing the use of novel data and modeling strategies for enhancing the prediction of corporate credit ratings. We analyze data relating to NASDAQ- and NYSE-listed companies over the period 2011-2019. First, we investigate the predictive power of information contained in tweets from and about companies for predicting corporate credit rating levels and changes. We transform the tweets into two sets of sentiment scores, one based on a general word-list and one based on a finance-specific word-list, and relate each separately, together with the tweet frequency, to the credit rating levels and to the incidence of a credit rating change. Overall, we find that models that include information from tweets achieve higher predictive performance than models that omit it. Additionally, we find that tweets about companies are overall more predictive of credit rating changes than tweets by companies. However, when predicting credit rating levels, the information contained in tweets about companies is as predictive as the information contained in tweets by the companies themselves. We also find that the two sentiment scores give different results for different dependent variables, sources of tweets, algorithms and levels of tweet aggregation, such as monthly and yearly.
Finally, we find that short-term information, such as information contained in tweets in the month before the credit rating change, is predictive of credit rating changes and that the effect of the Twitter predictors when predicting credit rating levels differs between industries. Second, we show how linguistic features contained in tweets from and about companies can be used to enhance the prediction of corporate credit rating levels and changes. We apply the open-vocabulary approach Differential Language Analysis (DLA) to identify the linguistic features in tweets that are individually most predictive of corporate credit rating levels and changes. In a subsequent step, we introduce two strategies for selecting specific identified linguistic features and using them for corporate credit rating prediction. Overall, we find that linguistic features in tweets enhance the prediction of corporate credit rating levels and changes. Additionally, we find that the models containing the linguistic features perform on par with the models containing the sentiment scores. Third, we propose a novel ordinal logistic regression model (OLMIDAS) that allows for the inclusion of variables sampled at higher frequencies than the dependent variable. We verify our proposed model in a simulation study by showing that it can find the true patterns in the data. In an empirical study, we apply OLMIDAS to the prediction of corporate credit rating levels and compare its performance to that of a standard ordinal logistic regression model that contains an annual aggregate of the higher-frequency variable. We find that OLMIDAS outperforms the standard ordinal logistic regression model while providing additional knowledge on the structure of the higher-frequency explanatory variable, although this appears to be application-sensitive. Overall, we contribute to the existing literature in three ways.
First, we contribute to the literature empirically by proposing novel features from Twitter that enhance the prediction of corporate credit rating levels and changes and that have not been used for predicting credit ratings before. Second, we contribute to the literature by showing how DLA can be applied within a financial context, which has not been done before. More specifically, we apply DLA to corporate credit ratings and Twitter data and show how specific linguistic features can be selected to enhance the prediction of corporate credit rating levels and changes. Third, we contribute to the existing literature methodologically by developing a mixed-frequency regression model for ordinal response data. Our proposed model can include variables sampled at different frequencies and can, under certain circumstances, yield better predictions than standard ordinal logistic regression.
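To illustrate the idea of a mixed-frequency ordinal model, here is a minimal sketch in Python. The abstract does not state the OLMIDAS specification, so everything in it is an assumption: it uses exponential Almon lag weights, a common choice in the MIDAS literature, to collapse a higher-frequency series (for example, 12 monthly values) into a single term inside a cumulative logit model. The function names, parameter names and numbers are hypothetical and chosen for illustration only.

```python
import math


def exp_almon_weights(n_lags, theta1, theta2):
    """Exponential Almon lag weights (assumed weighting scheme).

    The weights are positive and sum to one; lag k = 1 is the most recent
    high-frequency observation.
    """
    raw = [math.exp(theta1 * k + theta2 * k * k) for k in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def logistic(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def rating_probabilities(cutpoints, beta, x_low, gamma, high_freq, theta1, theta2):
    """Category probabilities of a cumulative logit model with a MIDAS term.

    cutpoints : increasing thresholds separating the ordinal categories
    beta      : coefficient of the low-frequency (e.g. annual) predictor x_low
    gamma     : coefficient of the weighted high-frequency term
    high_freq : high-frequency observations, most recent first
    """
    weights = exp_almon_weights(len(high_freq), theta1, theta2)
    midas_term = sum(w * z for w, z in zip(weights, high_freq))
    eta = beta * x_low + gamma * midas_term
    # P(y <= j) for each threshold, plus 1.0 for the top category.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical example: 12 monthly sentiment values and 4 rating classes.
monthly_sentiment = [0.30, 0.10, -0.20, 0.05, 0.00, 0.15,
                     -0.10, 0.20, 0.25, -0.05, 0.10, 0.00]
example_probs = rating_probabilities(
    cutpoints=[-1.0, 0.5, 2.0], beta=0.8, x_low=0.2,
    gamma=1.5, high_freq=monthly_sentiment, theta1=-0.3, theta2=0.0,
)
```

In this sketch, setting `theta1 = theta2 = 0` gives equal weights, so the high-frequency term reduces to a simple average over the year. That special case resembles a standard ordinal logistic regression with an annual aggregate, if the aggregate is taken to be the mean. Estimating the weight parameters instead lets the data indicate which months matter most.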