Edinburgh Research Archive

Automatically extracting the source words of English lexical blends

Item Status

RESTRICTED ACCESS

Authors

Kosinowski, Hanne

Abstract

Language changes constantly: new words are created on a daily basis. This thesis examines blends in English, a highly productive word formation process in which two words are combined to form a new word with a new meaning. To allow natural language processing systems to handle blends, I present a system that automatically extracts the source words of a blend using a set of statistical features. Applying these features with a logistic regression classifier to a corpus of 2236 blends, I obtain 50% accuracy on the gold standard. This is the largest corpus of blends used for this task so far. I compare the results to previous work and suggest ways to improve the system's performance.
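To make the task concrete, the following is a minimal sketch of how candidate source-word pairs for a blend might be enumerated before any classifier scores them. It is not the method of the thesis, whose features and candidate generation are not described in this abstract. The function name `candidate_pairs` and the toy lexicon are hypothetical, and the sketch assumes that a blend is a prefix of the first source word followed by a suffix of the second.

```python
def candidate_pairs(blend, lexicon):
    """Enumerate (first, second) source-word candidates for a blend.

    Assumption for illustration only: the blend consists of a prefix of
    the first source word followed by a suffix of the second.
    """
    blend = blend.lower()
    # The blend itself is excluded so it cannot be its own source word.
    words = sorted({w.lower() for w in lexicon} - {blend})
    pairs = set()
    # Try every split point that leaves both parts non-empty.
    for i in range(1, len(blend)):
        prefix, suffix = blend[:i], blend[i:]
        firsts = [w for w in words
                  if w.startswith(prefix) and len(w) > len(prefix)]
        seconds = [w for w in words
                   if w.endswith(suffix) and len(w) > len(suffix)]
        pairs.update((a, b) for a in firsts for b in seconds)
    return pairs


# Hypothetical toy lexicon; a real system would use a large word list.
toy_lexicon = ["breakfast", "lunch", "brunch", "bread", "smoke", "fog", "punch"]
pairs = candidate_pairs("brunch", toy_lexicon)
print(("breakfast", "lunch") in pairs)
```

Even this tiny lexicon yields several competing candidates for "brunch", which is why a scoring step, such as a classifier over statistical features of each pair, is needed to pick the correct one.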
