A flexible expansion algorithm for user-chosen abbreviations
Date
11/2008
Item status
Restricted Access
Embargo end date
31/12/2100
Author
Willis, Timothy Alan
Abstract
People with some types of motor disabilities who wish to generate text using a computer
can find the process both fatiguing and time-consuming. These problems can be
alleviated by reducing the quantity of keystrokes they must make, and one approach is to
allow the user to enter shortened, abbreviated input, which is then re-expanded for them,
by a program ‘filling in the gaps’. Word Prediction is one approach, but comes with
drawbacks, one of which is the requirement that generally the user must type the first
letters of their intended word, regardless of how unrepresentative they may consider
those letters to be. Abbreviation Expansion allows the user to type reduced forms of
many words in a way they feel represents them more effectively. This can be done by the
omission of one or more letters, or the replacement of letter sequences with other, usually
shorter, sequences. For instance, the word ‘hyphenate’ might be shortened to ‘yfn8’, by
leaving out some letters and replacing the ‘ph’ and ‘ate’ with the shorter but phonetically
similar ‘f’ and ‘8’. ‘Fixed Abbreviation Expansion’ requires the user to memorise a set
of correspondences between abbreviations and the full words which they represent.
While this enables useful keystroke savings to be made, these come alongside an
increased cognitive load and potential for error. Where a word is encountered for which
there is no preset abbreviation, or for which the user cannot remember one, keystroke
savings may be lost. ‘Flexible Abbreviation Expansion’ allows the user to leave out
whichever letters they feel to be ‘less differentiating’ and jump straight ahead to type
those they feel are most ‘salient’ and most characterise the word, choosing abbreviations
‘on the fly’. The need to memorise sets of correspondences is removed, as the user can
be offered all candidates for which the abbreviation might be a representation, usually in
small sets on screen. For useful savings to be made, the intended word must regularly be
in the first or second set for quick selection, or the system might attempt to place the
intended word at the very top of its list as frequently as possible.
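The idea of offering every word that an abbreviation might represent can be illustrated with a minimal sketch. It assumes the simplest case only, where the abbreviation is formed purely by omitting letters, so a word is a candidate if the abbreviation's letters occur in it in the same order. The function names and the tiny vocabulary are hypothetical, and replacements such as ‘f’ for ‘ph’ are not handled; this is not the thesis's algorithm.

```python
def is_subsequence(abbrev: str, word: str) -> bool:
    """Return True if the letters of abbrev occur in word in the same order."""
    letters = iter(word)
    # 'ch in letters' advances the iterator past the first match, so each
    # abbreviation letter must be found after the previous one.
    return all(ch in letters for ch in abbrev)


def candidates(abbrev: str, vocabulary: list[str]) -> list[str]:
    """Collect the vocabulary words the abbreviation could stand for
    (letter omission only; an assumption of this sketch)."""
    return [word for word in vocabulary if is_subsequence(abbrev, word)]


# Hypothetical vocabulary, for illustration only.
vocabulary = ["hyphenate", "happen", "hint", "hypnotist"]
matches = candidates("hpnt", vocabulary)
```

Here ‘hpnt’ matches both ‘hyphenate’ and ‘hypnotist’, which is why the candidates then need to be ranked.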
Thus it is important to generate and rank the candidates effectively, so that high
probability words can be offered in a shortlist. Lower-ranking candidates can be offered
in secondary lists which are not immediately displayed. This can reduce both the
cognitive load and keystrokes needed for selection.
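As a rough sketch of the shortlist idea, a ranked candidate list can be cut into consecutive display lists, with the first shown immediately and the rest held back. The function name and the page size are assumptions made for illustration; the abstract does not state how many candidates each list holds.

```python
def paginate(ranked: list[str], page_size: int = 5) -> list[list[str]]:
    """Split an already-ranked candidate list into consecutive display lists.

    The first list is the shortlist; later lists are the secondary lists.
    page_size is a hypothetical setting, not a value from the thesis.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [ranked[i:i + page_size] for i in range(0, len(ranked), page_size)]


pages = paginate(["a", "b", "c", "d", "e", "f", "g"], page_size=3)
shortlist, secondary = pages[0], pages[1:]
```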
The thesis addresses the task of reducing the number of keystrokes needed for text
creation with a large, expressive vocabulary, using a new approach to flexible
abbreviation expansion. To inform the solution, two empirical studies were run to gather
letter-level statistics on the abbreviation methods of twenty-nine people, under different
degrees of constriction (that is, different restrictions on the numbers of characters by
which to reduce). These studies showed that with a small amount of priming, people
would abbreviate in regular ways, both shared between users, and repeated through the
data from an individual. Analysis showed the most common strategies to be vowel
deletion, phonetic replacement, loss of double letters, and word truncation.
Participants reduced the number of letters in their texts by between 25% (judged to
maintain a high degree of comprehensibility) and 40% (judged to be a maximum degree
of brevity whilst still retaining comprehensibility).
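Three of the four strategies named above can be sketched as simple string transformations, together with the proportion of letters each one saves. All function names are hypothetical, keeping the first letter during vowel deletion is an assumption of this sketch, and phonetic replacement is left out because it needs a sound-to-letter mapping that the abstract does not give.

```python
import re

VOWELS = set("aeiou")


def delete_vowels(word: str) -> str:
    """Drop vowels after the first letter (keeping the first is an assumption)."""
    return word[:1] + "".join(ch for ch in word[1:] if ch not in VOWELS)


def collapse_doubles(word: str) -> str:
    """Replace each run of a repeated letter with a single letter."""
    return re.sub(r"(.)\1+", r"\1", word)


def truncate(word: str, keep: int) -> str:
    """Keep only the first `keep` letters of the word."""
    return word[:keep]


def reduction(word: str, abbrev: str) -> float:
    """Proportion of letters saved by the abbreviation, between 0 and 1."""
    return 1 - len(abbrev) / len(word)


examples = {
    "message": delete_vowels("message"),      # "mssg"
    "letter": collapse_doubles("letter"),     # "leter"
    "information": truncate("information", 4),  # "info"
}
```

For example, shortening ‘message’ to ‘mssg’ removes three of seven letters, a reduction of about 43%.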
Informed by these results, an individual-word-level algorithm was developed. For each
input abbreviation, a set of candidates is produced, ranked in such a way as to potentially
save substantial keystrokes when used across a whole text. A variety of statistical and
linguistic techniques, often also used in spelling checking and correction, are used to rank
them so that the most probable will be easiest to select, and with fewest keystrokes. The
algorithm works at the level of the individual word, without looking at surrounding
context.
Evaluation of the algorithm demonstrated that it outperforms its nearest comparable
alternative, which ranks word lists exclusively by word frequency. The evaluation was
performed on the data from the second empirical study, using vocabulary sizes of 2-, 10-,
20- and 30-thousand words.
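The frequency-only baseline mentioned above might look like the following sketch, which orders candidates by a frequency count and reports where the intended word lands. The function names, the alphabetical tie-break, and the counts are all hypothetical and chosen only to make the example run; they are not data from the thesis.

```python
def rank_by_frequency(candidates: list[str], frequency: dict[str, int]) -> list[str]:
    """Order candidates from most to least frequent.

    Words missing from the frequency table count as 0; ties are broken
    alphabetically (an assumption of this sketch).
    """
    return sorted(candidates, key=lambda w: (-frequency.get(w, 0), w))


def rank_of(intended: str, ranked: list[str]):
    """Return the 1-based position of the intended word, or None if absent."""
    return ranked.index(intended) + 1 if intended in ranked else None


# Made-up counts, for illustration only.
frequency = {"hyphenate": 3, "hypnotist": 5}
ranked = rank_by_frequency(["hyphenate", "hypnotist", "hyphenated"], frequency)
position = rank_of("hyphenate", ranked)
```

A lower position means fewer keystrokes to select the word, so comparing positions across a test set is one simple way to compare two ranking methods.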
The results show the algorithm to be of potential benefit for use as a component of a
flexible abbreviation expansion system. Even with the overhead of selection of the
intended word, useful keystroke savings could still be attained. It is envisaged that such a
system could be implemented on many platforms, including as part of an AAC
(Augmentative and Alternative Communication) device, and an email system on a
standard PC, thus making typed communication for the user group more comfortable and
expansive.