Interrater reliability in large-scale assessments: can teachers score national tests reliably without external controls?
2015 (English). In: Practical Assessment, Research & Evaluation, ISSN 1531-7714, Vol. 20, no. 9. Article in journal (Refereed). Published.
In most large-scale assessment systems, a set of rather expensive external quality controls is implemented in order to guarantee interrater reliability. This study empirically examines whether teachers’ ratings of national tests in mathematics can be reliable without monitoring, training, or other methods of external quality assurance. A sample of 99 booklets of students’ answers to a national test in mathematics was scored independently by five teachers. Interrater reliability was analyzed using consensus and consistency estimates, focusing on the test as a whole as well as on individual items. The results show that the estimates are acceptable, and in many cases fairly high, irrespective of the reliability measure used. Some plausible explanations for lower interrater reliability on individual items are discussed, and suggestions are made for further improving reliability without imposing any system of control.
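The abstract distinguishes two families of interrater reliability measures: consensus estimates (do raters assign identical scores?) and consistency estimates (do raters rank students similarly, even if their scores differ?). As a minimal sketch, the two can be computed for a pair of raters as exact-agreement proportion and Pearson correlation; the score data below are hypothetical and not taken from the study.

```python
# Sketch of consensus vs. consistency estimates for two raters.
# The item scores are invented for illustration (0-2 points per item).

def exact_agreement(a, b):
    """Consensus estimate: proportion of items both raters scored identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pearson(a, b):
    """Consistency estimate: Pearson correlation between the two score vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = sum((x - mean_a) ** 2 for x in a) ** 0.5
    sd_b = sum((y - mean_b) ** 2 for y in b) ** 0.5
    return cov / (sd_a * sd_b)

rater1 = [2, 1, 0, 2, 1, 2, 0, 1]  # hypothetical scores from rater 1
rater2 = [2, 1, 1, 2, 1, 2, 0, 0]  # hypothetical scores from rater 2

print(f"exact agreement: {exact_agreement(rater1, rater2):.2f}")  # 0.75
print(f"Pearson r:       {pearson(rater1, rater2):.2f}")          # 0.79
```

The two estimates can diverge: a rater who is systematically one point more lenient lowers consensus but leaves consistency untouched, which is why the study reports both.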
Identifiers: URN: urn:nbn:se:umu:diva-101511; OAI: oai:DiVA.org:umu-101511; DiVA: diva2:799871