Testing second language writing in academic settings
Date: 1987
Author: Hamp-Lyons, Elizabeth Margaret
Abstract
While the direct assessment of proficiency in writing through the collection and
evaluation of one or more writing samples is a common activity in educational
systems, and has been extensively and intensively researched, there is still much
to be learned. The context of this study is the evaluation of the writing
proficiency of overseas, mainly postgraduate, applicants to British tertiary
education institutions. The empirical study investigated two competing claims:
(1) that an appropriate writing test should be specifically related to the
candidate's discipline;
(2) that an appropriate writing test should assess expository writing of a kind
shared by the academic community across disciplines.
Following investigation of the constructs of 'writing', 'proficiency' and 'specific
purposes' and the establishment of formal criteria from language testing, it was
hypothesized that:
(1) Scores assigned to the writing of non-native postgraduates at
British universities when writing on discipline-specific ('specific
academic purpose', SAP) topics would not be significantly different
from the same subjects' scores when writing on general academic
purpose (GAP) topics.
(2) Scores assigned to the same subjects for two 'parallel' SAP
questions would share more variance than scores assigned to these
subjects for one SAP and one GAP question.
(3) Single-rater scores resulting from the operational scoring
procedure would not be adequately reliable.
(4) A three-rater aggregate score would be adequately reliable.
The writing of 111 subjects in five 'Modules' on two SAP questions and one GAP
question was studied; it was found that operational scores were unreliable but
aggregate scores were adequately reliable. Results of the study presented a
conflicting pattern, with significantly higher mean scores for SAP than GAP in
some cases, but with more significant correlations between SAP and GAP than
between SAP and SAP. Rarely was more than 60% of score variance accounted for
in any interaction. It was suggested that a consistent SAP/GAP distinction is
not being maintained in the test design.
The key writing test variables of scoring procedure, reader variables, essay test
task design and writer variables were intensively studied in an attempt to move
toward a better understanding of what was causing the inconsistent results. It
was found that the scoring procedure used general rather than specific academic
criteria, and that raters were applying these criteria to SAP as well as GAP
writing tests. Close study of raters gave no indication that they were
recognising and valuing SAP responses from writers. Issues of task design and
task difficulty were approached through the study of writers' responses, and
some progress was made in understanding the characteristics of an SAP response
from a writer. The three writing tests exhibited no clear SAP/GAP distinction,
and neither difficulty levels nor task demands exhibited uniformity.
It is suggested that until the scoring procedure and criteria are made more
valid, raters are trained to make SAP judgements, and tasks which are more
validly SAP are designed, no firm conclusion as to which of the competing models
is more valid can be drawn. Until that time, there appears to be little support
for the use of purportedly specific academic purpose rather than general
academic purpose writing test tasks.