The effectiveness of the stylometry of function words in discriminating between Shakespeare and Fletcher
Date: 1987
Author: Horton, Thomas Bolton
Abstract
A number of recent successful authorship studies have relied on a statistical
analysis of language features based on function words. However, stylometry has
not been extensively applied to Elizabethan and Jacobean dramatic questions.
To determine the effectiveness of such an approach in this field, language features
are studied in twenty-four plays by Shakespeare and eight by Fletcher. The goal
is to develop procedures that might be used to determine the authorship of
individual scenes in The Two Noble Kinsmen and Henry VIII.
Homonyms, spelling variants and contracted forms in old-spelling dramatic
texts present problems for a computer analysis. A program that uses a system of
pre-edit codes and replacement/expansion lists was developed to prepare versions
of the texts in which all forms of common words can be recognized automatically.
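A minimal sketch, in Python, of how a replacement/expansion list can map old-spelling and contracted forms onto standard forms before words are counted. The list entries, the `#n` pre-edit code marking a noun homonym, and the function name `normalize` are hypothetical illustrations, not the codes or program described in the thesis.

```python
import re

# Hypothetical replacement/expansion list: each old-spelling variant or
# contracted form maps to the standard form(s) that should be counted.
EXPANSIONS: dict[str, list[str]] = {
    "doe": ["do"],
    "beene": ["been"],
    "'tis": ["it", "is"],
    "i'th'": ["in", "the"],
    "th'": ["the"],
}

# Letters, apostrophes (for contractions), and '#' (for pre-edit codes).
TOKEN = re.compile(r"[a-z'#]+")

def normalize(text: str) -> list[str]:
    """Lower-case and tokenize text, expanding listed variant forms.

    A token carrying a hypothetical pre-edit code such as 'will#n'
    (noun 'will', as opposed to the auxiliary) is kept as-is so the
    homonym is counted separately from the function word.
    """
    out: list[str] = []
    for tok in TOKEN.findall(text.lower()):
        if "#" in tok:
            out.append(tok)
        else:
            out.extend(EXPANSIONS.get(tok, [tok]))
    return out

print(normalize("'Tis true, I doe loue thy will#n"))
# ['it', 'is', 'true', 'i', 'do', 'loue', 'thy', 'will#n']
```

Anything not on the list, such as the u/v spelling `loue` above, passes through unchanged, so in practice the list must grow until every form of each counted word is recognized.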
To evaluate some procedures for determining authorship developed by A. Q.
Morton and his colleagues, occurrences of 30 common collocations and 5 proportional
pairs are analyzed in the texts. Within-author variation for these features
is greater than had been found in previous studies. Univariate chi-square tests
are shown to be of limited usefulness because of the statistical distribution of
these textual features and correlation between pairs of features. The best of the
collocations do not discriminate as well as most of the individual words from
which they are composed.
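To make the univariate test concrete, here is a Pearson chi-square statistic for a 2x2 table comparing how often two authors choose each member of a proportional pair. The word labels, the counts, and the function name are hypothetical; the p-value uses the standard identity for one degree of freedom.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table

                 word_x  word_y
        author1    a       b
        author2    c       d

    Returns (statistic, p-value on 1 degree of freedom).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical pooled counts for a proportional pair (word_x, word_y).
stat, p = chi_square_2x2(20, 10, 10, 20)
print(round(stat, 3))  # 6.667
```

Because the table pools every occurrence across an author's plays as if each were an independent draw, it ignores play-to-play variation; when within-author variation is large, a small p-value from such a table can overstate how reliably a feature separates the two authors.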
Turning to the rate of occurrence of individual words and groups of words, distinctiveness
ratios and t-tests are used to select variables that best discriminate
between Shakespeare and Fletcher. Variation due to date of composition and
genre within the Shakespeare texts is examined. A multivariate and distribution-free
discriminant analysis procedure (using kernel estimation) is introduced. The
classifiers based on the best marker words and the kernel method are not greatly
affected by characterization and perform well for samples as short as 500 words.
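The variable-selection and classification steps above can be sketched as follows, assuming rates per 1,000 words. The distinctiveness ratio follows the usual definition (a word's rate in one author divided by its rate in the other), the t statistic is Welch's unequal-variance form, and the classifier compares product Gaussian kernel density estimates for each author. The bandwidths `h`, the sample data, and all function names are hypothetical; the abstract does not state the thesis's exact kernel or bandwidth choices.

```python
import math
import statistics

def rate(count: int, words: int, per: int = 1000) -> float:
    """Occurrences per `per` words."""
    return per * count / words

def distinctiveness_ratio(rate_s: float, rate_f: float) -> float:
    """Ratio of a word's rate in Shakespeare to its rate in Fletcher.
    Values far from 1 mark a potentially useful discriminator."""
    return rate_s / rate_f

def welch_t(xs: list[float], ys: list[float]) -> float:
    """Welch's t statistic for per-sample rates from two authors."""
    se2 = statistics.variance(xs) / len(xs) + statistics.variance(ys) / len(ys)
    return (statistics.mean(xs) - statistics.mean(ys)) / math.sqrt(se2)

def kernel_density(x: list[float], training: list[list[float]], h: list[float]) -> float:
    """Product Gaussian kernel density estimate at x; h holds one
    (hypothetical) bandwidth per marker-word variable."""
    total = 0.0
    for sample in training:
        k = 1.0
        for xi, si, hi in zip(x, sample, h):
            u = (xi - si) / hi
            k *= math.exp(-0.5 * u * u) / (hi * math.sqrt(2 * math.pi))
        total += k
    return total / len(training)

def classify(x: list[float], train_s: list[list[float]],
             train_f: list[list[float]], h: list[float]) -> str:
    """Assign x to the author whose estimated density is higher."""
    fs = kernel_density(x, train_s, h)
    ff = kernel_density(x, train_f, h)
    return "Shakespeare" if fs > ff else "Fletcher"

# Hypothetical rates per 1,000 words for two marker words, one row per sample.
train_s = [[10.0, 2.0], [11.0, 3.0], [9.0, 2.5]]
train_f = [[4.0, 8.0], [5.0, 7.0], [3.5, 9.0]]
h = [1.0, 1.0]

print(distinctiveness_ratio(rate(30, 10_000), rate(10, 10_000)))  # 3.0
print(welch_t([s[0] for s in train_s], [f[0] for f in train_f]))
print(classify([10.0, 2.5], train_s, train_f, h))  # Shakespeare
```

Because the kernel estimate makes no normality assumption about the marker-word rates, it suits features whose distributions are skewed or multimodal, which is the sense in which such a procedure is distribution-free.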
When the final procedure is applied to the 459 scenes of known authorship that
contain at least 500 words, almost 95% are assigned to the correct author. Only
two scenes are incorrectly classified, and 4.8% of the scenes cannot be assigned
to either author by the procedure. When applied to individual scenes of at least 500 words in The Two Noble
Kinsmen and Henry VIII, the procedure indicates that both plays are collaborations
and generally supports the usual division. However, in a number of scenes often
attributed to Fletcher, the use of the marker words is very much closer to Shakespeare's
pattern. These scenes include TNK IV.iii and H8 I.iii, IV.i-ii
and V.iv.
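A classifier of this kind can decline to assign a scene when neither author's density clearly dominates. The Python sketch below uses a hypothetical symmetric margin on the log density ratio to produce a third "unassigned" outcome, then checks that the reported figures for the scenes of known authorship are mutually consistent: 4.8% of 459 scenes is about 22 scenes, leaving 459 - 2 - 22 = 435 correct assignments, or about 94.8%, in line with "almost 95%". The function names and the margin value are illustrative assumptions.

```python
def assign(log_ratio: float, margin: float = 1.0) -> str:
    """Three-way decision on log(f_S(x) / f_F(x)).

    `margin` is a hypothetical cut-off: scenes whose log density ratio
    falls within +/- margin are left unassigned.
    """
    if log_ratio > margin:
        return "Shakespeare"
    if log_ratio < -margin:
        return "Fletcher"
    return "unassigned"

def implied_counts(total: int, wrong: int, unassigned_pct: float) -> tuple[int, int]:
    """Recover the unassigned and correct counts from a reported percentage."""
    unassigned = round(total * unassigned_pct / 100)
    return unassigned, total - wrong - unassigned

unassigned, correct = implied_counts(459, 2, 4.8)
print(unassigned, correct, f"{100 * correct / 459:.1f}%")  # 22 435 94.8%
```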