Testing second language writing in academic settings
Hamp-Lyons, Elizabeth Margaret
While the direct assessment of proficiency in writing through the collection and evaluation of one or more writing samples is a common activity in educational systems, and has been extensively and intensively researched, there is still much to be learned. The context of this study is the evaluation of the writing proficiency of overseas, mainly postgraduate, applicants to British tertiary education institutions. The empirical study investigated two competing claims: (1) that an appropriate writing test is specifically related to the candidate's discipline; (2) that an appropriate writing test calls for expository writing of a kind shared by the academic community across disciplines. Following investigation of the constructs of 'writing', 'proficiency' and 'specific purposes' and the establishment of formal criteria from language testing, it was hypothesized that: (1) scores assigned to the writing of non-native postgraduates at British universities when writing on discipline-specific (SAP) topics would not be significantly different from the same subjects' scores when writing on general academic (GAP) topics; (2) scores assigned to the same subjects for two 'parallel' SAP questions would share more variance than scores assigned to these subjects for one SAP and one GAP question; (3) single-rater scores resulting from the operational scoring procedure would not be adequately reliable; (4) a three-rater aggregate score would be adequately reliable. The writing of 111 subjects in five 'Modules' on two SAP questions and one GAP question was studied; it was found that operational scores were unreliable but aggregate scores were adequately reliable. Results of the study presented a conflicting pattern, with significantly higher mean scores for SAP than GAP in some cases, but with more significant correlations between SAP and GAP than between SAP and SAP. Rarely was more than 60% of score variance accounted for in any interaction.
It was suggested that a consistent SAP/GAP distinction is not being maintained in the test design. The key writing test variables of scoring procedure, reader variables, essay test task design and writer variables were intensively studied in an attempt to move toward a better understanding of what was causing the inconsistent results. It was found that the scoring procedure used general rather than specific academic criteria, and that raters were applying these criteria to SAP as well as GAP writing tests. Close study of raters gave no indication that they were recognising and valuing SAP responses from writers. Issues of task design and task difficulty were approached through the study of writers' responses, and some progress was made in understanding the characteristics of an SAP response from a writer. The three writing tests exhibited no clear SAP/GAP distinction, and neither difficulty levels nor task demands exhibited uniformity. It is suggested that until the scoring procedure and criteria are made more valid, raters are trained to make SAP judgements, and more validly SAP tasks are designed, no firm conclusion can be drawn as to which of the competing models is more valid. Until then, there appears to be little support for the use of purportedly specific academic purpose rather than general academic purpose writing test tasks.