Automatically extracting the source words of English lexical blends
Abstract
Language changes constantly – new words are created on a daily basis. This
thesis examines blends in English, a highly productive word formation process
where two words are combined to form a new word with a new meaning. In
order to allow natural language processing systems to handle blends, I present a
system that automatically extracts the words comprising the blend using a set
of statistical features. Applying these features with a logistic regression
classifier to a corpus of 2236 blends, I obtain 50% accuracy on the gold standard.
To date, this is the largest corpus of blends used for this task. I compare the
results to previous work and suggest ways to improve the system’s
performance.
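
The abstract does not say how candidate source words are produced before the statistical features are applied. As an illustration only, the sketch below assumes one plausible approach: split the blend at every position and pair lexicon words that start with the prefix with lexicon words that end with the suffix. The function name candidate_pairs and the toy lexicon are hypothetical and are not taken from the thesis; a classifier such as logistic regression would then rank the resulting pairs.

```python
def candidate_pairs(blend, lexicon):
    """Enumerate candidate (first, second) source-word pairs for a blend.

    Assumption (not from the thesis): the first source word shares a
    prefix with the blend and the second shares the remaining suffix.
    """
    pairs = set()
    # Try every split point that leaves a non-empty prefix and suffix.
    for i in range(1, len(blend)):
        prefix, suffix = blend[:i], blend[i:]
        firsts = [w for w in lexicon if w != blend and w.startswith(prefix)]
        seconds = [w for w in lexicon if w != blend and w.endswith(suffix)]
        for first in firsts:
            for second in seconds:
                pairs.add((first, second))
    return sorted(pairs)


# Hypothetical toy lexicon, for illustration only.
lexicon = {"breakfast", "lunch", "smoke", "fog", "brunch"}
print(candidate_pairs("brunch", lexicon))
```

With a realistic lexicon this step yields many candidate pairs per blend, which is why a ranking step over statistical features is needed to pick the most likely pair.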