Methods for Examining the Psychometric Quality of Subscores: A Review and Application
Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
2015 (English). In: Practical Assessment, Research & Evaluation, ISSN 1531-7714, E-ISSN 1531-7714, Vol. 20, Article 21. Article in journal (Refereed), Published.
Abstract [en]

When subscores on a test are reported to the test taker, the appropriateness of reporting them depends on whether they provide useful information above what is provided by the total score. Subscores that fail to do so lack adequate psychometric quality and should not be reported. There are several methods for examining the quality of subscores, and in this study seven such methods, four of which are based on classical test theory and three of which are based on item response theory, were reviewed and applied to empirical data. The data consisted of test takers' scores on four test forms – two administrations of a first version of a college admission test and two administrations of a second version – and the analyses were carried out on the subtest and section levels. The two section scores were found to have adequate psychometric quality with all methods used, whereas the results for subtest scores ranged from almost all scores having adequate psychometric quality to none having adequate psychometric quality. The authors recommend using Haberman's method and the related utility index because of their solid theoretical foundation and because of various issues with the other subscore quality methods.
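The recommended approach can be illustrated concretely. In its common classical-test-theory formulation, Haberman's method compares the proportional reduction in mean squared error (PRMSE) obtained when the observed subscore, versus the observed total score, is used to predict the true subscore; the subscore has added value only if its PRMSE is the higher of the two. A minimal sketch, assuming hypothetical reliability and correlation inputs (this is the standard textbook formulation, not necessarily the exact computation used in the study):

```python
def haberman_prmse(rel_sub, rel_total, corr_sub_total):
    """PRMSE of the observed subscore vs. the observed total score
    as predictors of the true subscore (classical test theory)."""
    # Disattenuated correlation between true subscore and true total score
    rho_true = corr_sub_total / (rel_sub * rel_total) ** 0.5
    prmse_sub = rel_sub                       # PRMSE using the subscore
    prmse_total = rho_true ** 2 * rel_total   # PRMSE using the total score
    return prmse_sub, prmse_total

# Illustrative (made-up) values: subscore reliability 0.80,
# total-score reliability 0.90, observed subscore-total correlation 0.70
prmse_sub, prmse_total = haberman_prmse(0.80, 0.90, 0.70)
has_added_value = prmse_sub > prmse_total  # report the subscore only if True
```

With these hypothetical inputs the subscore's PRMSE (0.80) exceeds that of the total score (0.6125), so the subscore would be worth reporting; when the subscore correlates very highly with the total, the inequality reverses and the subscore adds nothing.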

Place, publisher, year, edition, pages
2015. Vol. 20, Article 21
Keyword [en]
subscores, score reporting, mean squared error, factor analysis, IRT, college admissions tests
National Category
Pedagogy; Psychology
Research subject
didactics of educational measurement
Identifiers
URN: urn:nbn:se:umu:diva-112181
OAI: oai:DiVA.org:umu-112181
DiVA: diva2:876398
Available from: 2015-12-03. Created: 2015-12-03. Last updated: 2017-12-01. Bibliographically approved.
In thesis
1. Theory and validity evidence for a large-scale test for selection to higher education
2017 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Validity is a crucial part of all forms of measurement, especially in instruments that are high stakes for the test takers. The aim of this thesis was to examine theory and validity evidence for a recently revised large-scale instrument used for selection to higher education in Sweden, the Swedish Scholastic Assessment Test (SweSAT), and to identify threats to its validity. Previous versions of the SweSAT have been studied intensively, but when the test was revised in 2011, further research was needed to strengthen the validity arguments for it. This thesis adopts the validity approach suggested in the most recent version of the Standards for Educational and Psychological Testing, in which the theoretical basis and five sources of validity evidence are the key aspects of validity.

The four studies that are presented in this thesis focus on different aspects of the SweSAT, including theory, score reporting, item functioning and linking of test forms. These studies examine validity evidence from four of the five sources of validity: evidence based on test content, response processes, internal structure and consequences of testing.

The results from the thesis as a whole show that there is validity evidence supporting some of the validity arguments for the intended interpretations and uses of SweSAT scores, and that there are potential threats to validity that require further attention. Empirical evidence supports the two-dimensional structure of the construct scholastic proficiency, but the construct requires a more thorough definition in order to better examine validity evidence based on content and consequences for test takers. Section scores provide more information about test takers' strengths and weaknesses than is already provided by the total score and can therefore be reported, but subtest scores do not provide additional information and should not be reported. All four quantitative subtests, as well as the Swedish reading comprehension subtest, are essentially free of differential item functioning (DIF), but there is moderate DIF that could indicate bias in two of the four verbal subtests. Finally, the equating procedure, although it appears to be appropriate, needs to be examined further to determine whether or not it is the best available practice for the SweSAT.

Some of the results in this thesis are specific to the SweSAT because only SweSAT data was used, but the design of the studies and the methods that were applied serve as practical examples of validating a test and are therefore likely to be useful to those involved in test development, test use and psychometric research.

Suggestions for further research include: (1) a study to create a clearer and more elaborate definition of the construct scholastic proficiency; (2) a large, empirically focused study of subscore value in the SweSAT using repeat test takers and applying Haberman's method along with recently proposed effect size measures; (3) a cross-validation DIF study using more recently administered test forms; (4) a study that examines the causes of the recurring score differences between women and men on the SweSAT; and (5) a study that re-examines the best practice for equating the current version of the SweSAT, using simulated data in addition to empirical data.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2017. 51 p.
Series
Academic dissertations at the department of Educational Measurement, ISSN 1652-9650 ; 10
Keyword
SweSAT, validity, theoretical model, score reporting, subscores, DIF, equating, linking, Högskoleprovet, validitet, teoretisk modell, rapportering av provpoäng, ekvivalering, länkning
National Category
Educational Sciences
Research subject
didactics of educational measurement
Identifiers
URN: urn:nbn:se:umu:diva-138492
ISBN: 978-91-7601-732-6
Public defence
2017-09-22, Hörsal 1031, Norra beteendevetarhuset, Umeå, 10:00 (English)
Available from: 2017-09-01. Created: 2017-08-24. Last updated: 2017-09-20. Bibliographically approved.

Open Access in DiVA

fulltext (605 kB), 148 downloads
File information
File name: FULLTEXT01.pdf. File size: 605 kB. Checksum: SHA-512.
291b19fd85bc1066f2eda704dbfcc64a00a501aaeeb3bf353a481ceb16f03c362eae654bd1b30e201a3d4f5953f0f63e71b3792a2d69fe0ea5830615a378a9d9
Type: fulltext. Mimetype: application/pdf.

Search in DiVA

By author/editor
Wedman, Jonathan; Lyrén, Per-Erik
By organisation
Department of Educational Measurement
In the same journal
Practical Assessment, Research & Evaluation
Pedagogy; Psychology

Total: 148 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 538 hits