What’s The Point? A (Computational) Theory of Punctuation
Although punctuation is clearly an important part of the written language, many natural language processing systems developed to date simply ignore punctuation in input text, or do not place it in output text. The reason for this is the lack of any clear, implementable theory of punctuation function suitable for transfer to the computational domain. The work described in this thesis aims to build on previous linguistic work on the function of punctuation, particularly that by Nunberg (1990), with experimental and theoretical investigations into the potential usefulness of including punctuation in natural language analyses, the variety of punctuation marks present in text, and the syntactic and semantic functions of those marks. Results from these investigations are combined into a taxonomy of punctuation marks and synthesised into a theory describing principles and rule schemata whereby punctuation functionality can be added to natural language processing systems.

The thesis begins with some introductory chapters, discussing the nature of punctuation, its history, and previous approaches to theoretical description. Subsequent chapters describe the experimental and theoretical investigations into the potential uses of punctuation in computational systems, the variety of punctuation marks used, and the syntactic and semantic functions that punctuation marks fulfil. Further chapters then construct a taxonomy of punctuation marks and describe the theory synthesised from the results of the investigations. The concluding chapters sum up the research and discuss its possible extension to languages other than English.