A flexible expansion algorithm for user-chosen abbreviations
Item status: Restricted Access
Embargo end date: 31/12/2100
Willis, Timothy Alan
People with some types of motor disabilities who wish to generate text using a computer can find the process both fatiguing and time-consuming. These problems can be alleviated by reducing the number of keystrokes they must make, and one approach is to allow the user to enter shortened, abbreviated input, which is then re-expanded for them by a program ‘filling in the gaps’. Word Prediction is one such approach, but it comes with drawbacks, one of which is the requirement that the user must generally type the first letters of their intended word, regardless of how unrepresentative they may consider those letters to be. Abbreviation Expansion allows the user to type reduced forms of many words in a way they feel represents them more effectively. This can be done by the omission of one or more letters, or the replacement of letter sequences with other, usually shorter, sequences. For instance, the word ‘hyphenate’ might be shortened to ‘yfn8’, by leaving out some letters and replacing the ‘ph’ and ‘ate’ with the shorter but phonetically similar ‘f’ and ‘8’. ‘Fixed Abbreviation Expansion’ requires the user to memorise a set of correspondences between abbreviations and the full words which they represent. While this enables useful keystroke savings to be made, these come alongside an increased cognitive load and potential for error. Where a word is encountered for which there is no preset abbreviation, or for which the user cannot remember one, keystroke savings may be lost. ‘Flexible Abbreviation Expansion’ allows the user to leave out whichever letters they feel to be ‘less differentiating’ and jump straight ahead to type those they feel are most ‘salient’ and most characterise the word, choosing abbreviations ‘on the fly’. The need to memorise sets of correspondences is removed, as the user can be offered all candidates for which the abbreviation might be a representation, usually in small sets on screen.
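The idea of offering "all candidates for which the abbreviation might be a representation" can be illustrated with a minimal Python sketch. It covers only the letter-omission case (the abbreviation is an in-order subsequence of the word) and does not handle phonetic replacements such as ‘f’ for ‘ph’. The function names `is_subsequence` and `candidates`, and the toy vocabulary, are assumptions made for illustration and are not taken from the thesis.

```python
def is_subsequence(abbrev: str, word: str) -> bool:
    """Return True if abbrev can be formed by deleting letters from word.

    Illustrative assumption: only letter omission is modelled here.
    """
    letters = iter(word)
    # `ch in letters` consumes the iterator up to the match,
    # so the letters of abbrev must appear in word in the same order.
    return all(ch in letters for ch in abbrev)


def candidates(abbrev: str, vocabulary: list[str]) -> list[str]:
    """Collect every vocabulary word the abbreviation could stand for."""
    abbrev = abbrev.lower()
    return [w for w in vocabulary if is_subsequence(abbrev, w.lower())]


if __name__ == "__main__":
    vocab = ["hyphenate", "hyphen", "python"]  # hypothetical toy vocabulary
    print(candidates("hyphn", vocab))
```

With a realistic vocabulary the candidate set for a short abbreviation can be large, which is why the ordering of candidates matters.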
For useful savings to be made, the intended word must regularly be in the first or second set for quick selection, or the system might attempt to place the intended word at the very top of its list as frequently as possible. Thus it is important to generate and rank the candidates effectively, so that high-probability words can be offered in a shortlist. Lower-ranking candidates can be offered in secondary lists which are not immediately displayed. This can reduce both the cognitive load and the keystrokes needed for selection. The thesis addresses the task of reducing the number of keystrokes needed for text creation with a large, expressive vocabulary, using a new approach to flexible abbreviation expansion. To inform the solution, two empirical studies were run to gather letter-level statistics on the abbreviation methods of twenty-nine people, under different degrees of constriction (that is, different restrictions on the number of characters by which to reduce). These studies showed that with a small amount of priming, people would abbreviate in regular ways, both shared between users and repeated through the data from an individual. Analysis showed the most common strategies to be vowel deletion, phonetic replacement, loss of double letters, and word truncation. Participants reduced the number of letters in their texts by between 25% (judged to maintain a high degree of comprehensibility) and 40% (judged to be a maximum degree of brevity whilst still retaining comprehensibility). Informed by these results, an individual-word-level algorithm was developed. For each input abbreviation, a set of candidates is produced, ranked in such a way as to potentially save substantial keystrokes when used across a whole text. A variety of statistical and linguistic techniques, often also used in spelling checking and correction, are used to rank them so that the most probable will be easiest to select, and with fewest keystrokes.
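The four strategies named above (vowel deletion, phonetic replacement, loss of double letters, and word truncation) can each be shown as a simple string transformation. The Python sketch below is illustrative only: the function names, the choice to keep a word's first letter when deleting vowels, and the two-entry `PHONETIC` table (taken from the ‘hyphenate’ example) are assumptions, not the rules observed in the studies.

```python
import re

# Hypothetical replacement table, based only on the 'ph' -> 'f' and
# 'ate' -> '8' example; a real table would be far larger.
PHONETIC = {"ph": "f", "ate": "8"}


def delete_vowels(word: str) -> str:
    """Drop vowels, keeping the first letter (an assumption of this sketch)."""
    return word[:1] + re.sub(r"[aeiou]", "", word[1:])


def phonetic_replace(word: str) -> str:
    """Swap letter sequences for shorter, similar-sounding ones."""
    for long_form, short_form in PHONETIC.items():
        word = word.replace(long_form, short_form)
    return word


def collapse_doubles(word: str) -> str:
    """Reduce each doubled letter to a single letter."""
    return re.sub(r"(.)\1", r"\1", word)


def truncate(word: str, length: int) -> str:
    """Keep only the first `length` letters."""
    return word[:length]


if __name__ == "__main__":
    print(delete_vowels("letter"))        # lttr
    print(phonetic_replace("hyphenate"))  # hyfen8
    print(collapse_doubles("letter"))     # leter
    print(truncate("abbreviation", 4))    # abbr
```

In practice a user may combine several strategies in a single abbreviation, as in ‘yfn8’, so an expansion algorithm has to allow for all of them at once rather than one at a time.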
The algorithm works at the level of the individual word, without looking at surrounding context. Evaluation of the algorithm demonstrated that it outperforms its nearest comparable alternative, of ranking word lists exclusively by word frequency. The evaluation was performed on the data from the second empirical study, using vocabulary sizes of 2-, 10-, 20- and 30-thousand words. The results show the algorithm to be of potential benefit for use as a component of a flexible abbreviation expansion system. Even with the overhead of selection of the intended word, useful keystroke savings could still be attained. It is envisaged that such a system could be implemented on many platforms, including as part of an AAC (Augmentative and Alternative Communication) device, and an email system on a standard PC, thus making typed communication for the user group more comfortable and expansive.
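The comparison baseline, ranking candidates exclusively by word frequency, and the effect of selection overhead on keystroke savings can be sketched as follows. The frequency counts, the function names, and the one-keystroke selection cost are all hypothetical values chosen for illustration. They are not the data or the cost model used in the evaluation.

```python
def rank_by_frequency(cands: list[str], freq: dict[str, int]) -> list[str]:
    """Order candidates from most to least frequent (ties broken alphabetically)."""
    return sorted(cands, key=lambda w: (-freq.get(w, 0), w))


def keystroke_saving(word: str, abbrev: str, selection_cost: int = 1) -> int:
    """Keystrokes saved by typing abbrev and then selecting word from a list.

    selection_cost is an assumed overhead (here, one key to pick the word).
    """
    return len(word) - (len(abbrev) + selection_cost)


if __name__ == "__main__":
    freq = {"text": 120, "texture": 15, "taxed": 3}  # made-up counts
    print(rank_by_frequency(["taxed", "texture", "text"], freq))
    print(keystroke_saving("hyphenate", "yfn8"))  # 9 - (4 + 1) = 4
```

A frequency-only ranking ignores how well each candidate matches the abbreviation itself, which is the kind of information the statistical and linguistic techniques mentioned in the abstract are intended to supply.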