Proto-phoneme reconstruction as naive Bayes inference
Abstract
The comparative method is the standard technique by which historical linguists reconstruct ancestral languages from their descendants, yet it has received little attention from the computational linguistics community. We present a principled method for encoding the plausibility of sound changes, together with a probabilistic framework for learning about sound change and applying that knowledge to reconstruction. Our techniques are entirely probabilistic and leverage the growing body of data available in machine-readable form. We show that a Naive Bayes classifier, combined with phoneme-conditioned categorical distributions over phonemes learned by maximum a posteriori (MAP) estimation with smoothing based on phonetic similarity, can accurately reconstruct proto-words from their descendants. Our system outperforms previous approaches to language reconstruction.
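As a rough illustration of the kind of model the abstract describes, the sketch below reconstructs a single proto-phoneme from its aligned reflexes in two daughter languages. It is not the authors' implementation: the toy feature table `FEATURES`, the `similarity` function, the pseudo-count strength `alpha`, and the four training examples are all assumptions made up for this example. Each reflex distribution is smoothed by adding pseudo-counts proportional to phonetic similarity, which is equivalent to a MAP estimate under a Dirichlet prior whose parameters are one plus those pseudo-counts.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy feature table: (place, manner, voicing). Not from the paper.
FEATURES = {
    "p": ("labial", "stop", "voiceless"),
    "b": ("labial", "stop", "voiced"),
    "f": ("labial", "fricative", "voiceless"),
    "t": ("alveolar", "stop", "voiceless"),
}
INVENTORY = list(FEATURES)


def similarity(a, b):
    """Assumed similarity score in (0, 1]: shared features plus one, over four."""
    shared = sum(x == y for x, y in zip(FEATURES[a], FEATURES[b]))
    return (1 + shared) / 4


def fit(examples, inventory, alpha=1.0):
    """Estimate P(proto) and P(reflex | proto, language) from aligned examples.

    examples: list of (proto, reflexes), one reflex per daughter language.
    alpha: assumed strength of the similarity-based pseudo-counts.
    """
    n_langs = len(examples[0][1])
    prior_counts = Counter(proto for proto, _ in examples)
    counts = defaultdict(Counter)  # (language index, proto) -> reflex counts
    for proto, reflexes in examples:
        for lang, reflex in enumerate(reflexes):
            counts[(lang, proto)][reflex] += 1

    # Add-alpha smoothed prior over proto-phonemes.
    total = len(examples)
    prior = {
        p: (prior_counts[p] + alpha) / (total + alpha * len(inventory))
        for p in inventory
    }

    # One categorical distribution per (language, proto-phoneme) pair,
    # smoothed toward phonetically similar reflexes.
    likelihood = {}
    for lang in range(n_langs):
        for proto in inventory:
            pseudo = {r: alpha * similarity(proto, r) for r in inventory}
            observed = counts[(lang, proto)]
            denom = sum(observed.values()) + sum(pseudo.values())
            likelihood[(lang, proto)] = {
                r: (observed[r] + pseudo[r]) / denom for r in inventory
            }
    return prior, likelihood


def reconstruct(reflexes, prior, likelihood):
    """Return the proto-phoneme with the highest Naive Bayes log-posterior."""
    def log_score(proto):
        return math.log(prior[proto]) + sum(
            math.log(likelihood[(lang, proto)][r])
            for lang, r in enumerate(reflexes)
        )
    return max(prior, key=log_score)


# Invented training data: proto *p surfaces as p in language 0, f in language 1.
examples = [
    ("p", ("p", "f")),
    ("p", ("p", "f")),
    ("b", ("b", "b")),
    ("t", ("t", "t")),
]
prior, likelihood = fit(examples, INVENTORY)
best = reconstruct(("p", "f"), prior, likelihood)
```

The Naive Bayes assumption appears in `log_score`, where the reflexes in different daughter languages are treated as conditionally independent given the proto-phoneme. Because the similarity score is always positive, every reflex keeps non-zero probability, so an unseen correspondence is penalised rather than ruled out.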