Automatically Extracting the Source Words of English Lexical Blends
Item status: Restricted Access
Language changes constantly: new words are created on a daily basis. This thesis examines blends in English, a highly productive word formation process in which two words are combined to form a new word with a new meaning. To allow natural language processing systems to handle blends, I present a system that automatically extracts the source words of a blend using a set of statistical features. Applying these features with a logistic regression classifier to a corpus of 2236 blends, I obtain an accuracy of 50% against the gold standard. This is the largest corpus of blends used for this task so far. I compare the results to previous work and suggest ways to improve the system's performance.
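As an illustration of the task only, the sketch below enumerates candidate source-word pairs for a blend. It assumes that a blend is formed from a prefix of the first source word followed by a suffix of the second, which is a common simplification and not necessarily the thesis's own procedure. The function name `candidate_pairs` and the toy lexicon are hypothetical. The abstract does not specify the statistical features, so the scoring step (ranking candidates with a logistic regression classifier) is not shown.

```python
def candidate_pairs(blend, lexicon):
    """Return sorted (first, second) pairs where `first` starts with a
    prefix of the blend and `second` ends with the remaining suffix.

    Assumption: blend = prefix(first word) + suffix(second word).
    """
    pairs = set()
    # Try every split point that leaves a non-empty prefix and suffix.
    for i in range(1, len(blend)):
        prefix, suffix = blend[:i], blend[i:]
        firsts = [w for w in lexicon if w != blend and w.startswith(prefix)]
        seconds = [w for w in lexicon if w != blend and w.endswith(suffix)]
        for first in firsts:
            for second in seconds:
                pairs.add((first, second))
    return sorted(pairs)


# Toy lexicon, for illustration only.
lexicon = ["breakfast", "lunch", "smoke", "fog", "brunch"]
brunch_candidates = candidate_pairs("brunch", lexicon)
smog_candidates = candidate_pairs("smog", lexicon)
```

With a realistic lexicon, this step yields many candidate pairs per blend, which is why a trained classifier is needed to rank them.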