1 - 13 of 13
  • 1.
    Almarlind, Pia
    et al.
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Are the influences from social and political agents beneficial and a necessity in the development and validation of educational assessments? 2016, Conference paper (Refereed)
    Abstract [en]

    In order to produce sound and valid tests with respect to the Standards (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014), there are many issues to consider and take into account, especially when developing tests at a national level. In addition, there are often several stakeholders with an ambition to influence the tests in different directions. Sometimes these stakeholders agree, but often their requests are diametrically opposed, and it is not unusual that the requests are not in line with good measurement practice. In the midst are the test developing organisations, commissioned to develop products that are valid in relation to the aim or aims of the test, developed within sound measurement practice, but also accepted by all users.

    As described in the theme for the conference, this external influence from stakeholders on the tests is immense and is sometimes described as purely negative. Politicians often use educational assessments, such as national tests or exams, to control the school system on the one hand and to implement changes on the other. At the same time, politicians are sensitive to reactions from teachers, parents and other stakeholders, since these are important groups of voters. In Sweden, the debate is currently focused on the large number of national tests and the workload they entail for teachers as well as students. A recently published government-appointed inquiry (SOU 2016:25, 2016) suggests that the number of national tests should be reduced, that the remaining tests should be less extensive, and that the tests should be easier to administer and mark, which will probably affect the validity of the tests.

    From a test development perspective, these external influences could be seen as problematic, since they often introduce (rapid) changes to the tests. On the other hand, one could argue that these external influences are necessary prerequisites for an ongoing process of developing the tests so that they become even more cost-effective, valid and valuable to their users.

    We think it would be interesting to discuss this complex system of, on the one hand, social and political agents trying to influence and change the national assessment systems and, on the other hand, the test developing organisations aiming to develop assessments that are valid. At the same time, these organisations are dependent on the agents for the resources to fulfil their commission, which may affect which changes are implemented and which are not.

    This is a proposal for a discussion group based on the broad question posed in the title. Below we have specified some themes that would be interesting to discuss, drawing on perspectives from different countries and testing systems.

    • How are the products, i.e. the tests, and the processes for developing the tests affected by the influences from different social and political agents?
    • Are there stakeholders with greater impact than others, and if so, is that a necessity or a risk? Why, and how?
    • Finally, is this continuous external validation of the tests perhaps necessary in order to develop, strengthen and legitimise them, or does it “ruin the work”?
  • 2.
    Almarlind, Pia
    et al.
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Swedish national tests in science: assessment of students in a 21st century world, 2014, Conference paper (Refereed)
    Abstract [en]

    Since 2009, Sweden has had national tests in the sciences (i.e. biology, physics and chemistry) for school year nine. These tests aim to assess the three competencies described in the curricula, namely: review information, communicate and take a stand in questions related to the subject; conduct systematic investigations; and use concepts, theories and models to explain connections. Beginning from the end, the third competence is assessed through a constructed-response test, a test that can be seen as a rather traditional science test. The second competence is assessed through a practical investigation that the students plan, conduct and evaluate. The first competence is assessed by a more complex item where the students are given some information which they are supposed to review and use in their argumentation when they take a stand. These tests include many of the competences aimed at when discussing assessment for the 21st century world: creativity, critical thinking, problem solving, communication, and decision-making.

    The assessment of laboratory work has been an obligatory part of the national tests since their beginning in 2009. Despite all the problems with the practical handling in schools, it has been an important part of the aim that the national tests should be exemplary. If the tests do not include a practical test, why should the schools work with laboratory work in class? There were also statistics indicating that many schools did not do practical work at all.

    When the new curriculum was introduced in 2011, the competence to communicate was highlighted. Besides the more general definitions of communication, several of the syllabi included a subject-specific communication component. The first versions of the tests developed in relation to the new curriculum have included a part assessing the students’ ability to communicate. In these items the students are supposed to scrutinize and analyse given information, communicate and make a decision. The items are connected to questions concerning energy, environment, resource use and health. One of the challenges is to really assess science communication and not communication in general.

    In this presentation we will discuss the rationales for this test model and show what the assessment looks like.

  • 3.
    Bergqvist, Ewa
    et al.
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement. Umeå University, Faculty of Science and Technology, Department of Mathematics and Mathematical Statistics. Umeå University, Faculty of Science and Technology, Umeå Mathematics Education Research Centre (UMERC).
    Lind, Anna
    Umeå University, Faculty of Social Sciences, Department of Educational Measurement. Umeå University, Faculty of Science and Technology, Umeå Mathematics Education Research Centre (UMERC).
    Är det svårare att dela med fyra än med två när man läser matte C?: En jämförelse av svårighetsgrad mellan olika versioner av matematikuppgifter i Nationella kursprov, 2005, Report (Other academic)
    Abstract [sv]

    In the spring of 2004, two different versions of the national course tests in mathematics were introduced on a trial basis for courses B, C and D. For each course, the two versions differed in some of the included items, while the remaining items were identical. The purpose of this study is to examine whether, how and why these changes to the mathematics items affect the items’ difficulty.

    The results show that changing the numbers included in the items affected the items’ difficulty to any greater extent in only a few cases. These few cases are each studied separately. When two items differ in context and wording, even if the mathematical content is largely identical, one example shows that the difference in difficulty can be very large.

  • 4.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Dimensions of validity: studies of the Swedish national tests in mathematics, 2018, Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The main purpose of the Swedish national tests was originally to provide exemplary assessments in a subject and support teachers when interpreting the syllabus. Today, their main purpose is to provide an important basis for teachers when grading their students. Although the results from the tests do not entirely decide a student’s grade, they are to be taken into special account in the grading process. Given their increasing importance and the rise in stakes, quality issues in terms of validity and reliability are attracting greater attention. The main purpose of this thesis is to examine evidence demonstrating the validity of the Swedish national tests in upper secondary school mathematics and thereby identify potential threats to validity that may affect the interpretations of the test results and lead to invalid conclusions. The validation is made in relation to the purpose that the national tests should support fair and equal assessment and grading. More specifically, the focus was to investigate how differences connected to digital tools, different scorers and the standard setting process affect the results, and also to investigate whether subscores can be used when interpreting the results. A model visualized as a chain containing links associated with various aspects of validity, ranging from administration and scoring to interpretation and decision-making, is used as a framework for the validation.

    The thesis consists of four empirical studies presented in the form of papers and an introduction with summaries of the papers. Different parts of the validation chain are examined in the studies. The focus of the first study is the administration and impact of using advanced calculators when answering test items. These calculators are able to solve equations algebraically and therefore reduce the risk of a student making mistakes. Since the use of such calculators is allowed but not required, and since they are quite expensive, there is an obvious threat to validity, as the national tests are supposed to be fair and equal for all test takers. The results show that the advanced calculators were not used to a great extent and that it was mainly students who were high-achieving in mathematics who benefited the most. The conclusion was therefore that the calculators did not affect the results.

    The second study was an inter-rater reliability study. In Sweden, teachers are responsible for scoring their own students’ national tests, without any training, monitoring or moderation. Therefore it was interesting to investigate the reliability of the scoring since there is a potential risk of bias against one’s own students. The analyses showed that the agreement between different raters, analyzed with percent-agreement and kappa, is rather high but some items have lower agreement. In general, items with several correct answers or items where different solution strategies are available are more difficult to score reliably.

    In study three, the cut scores set by a judgmental Angoff standard setting procedure, the method used to define the cut scores for the national tests in mathematics, were compared with a statistical linking procedure using an anchor test design, in order to investigate whether the cut scores for two test forms were equally demanding. The results indicate that there were no large differences between the test forms. However, one of the test taker groups was rather small, which restricts the power of the analysis. The national tests do not include any anchor items, and the study highlights the challenges of introducing equating, that is, comparing the difficulty of different test forms, on a regular basis.

    In study four, the focus was on subscores and whether there was any value in reporting them in addition to the total score. The syllabus in mathematics has been competence-based since 2011, and the items in the national tests are categorized in relation to these competencies. The test grades are connected to the total score only via the cut scores, but the result for each student is consolidated in a result profile based on those competencies. The subscore analysis shows that none of the subscores have added value, and the tests would have to be up to four times longer in order to achieve any significant value.

    In conclusion, the studies indicate that several of the potential threats do not appear to be significant and the evidence suggests that the interpretations made and decisions taken have the potential to be valid. However, there is a need for further studies. In particular, there is a need to develop a procedure for equating that can be implemented on a regular basis.
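    The claim that the tests would have to be several times longer before subscores gain value can be illustrated with the Spearman-Brown prophecy formula, a standard result in classical test theory. This is a sketch only: the thesis does not specify its exact computation, and the reliability value below is hypothetical.

```python
# Spearman-Brown prophecy formula: predicted reliability of a test
# lengthened by a factor k. Standard classical test theory; the
# starting reliability of 0.55 is hypothetical, not from the thesis.
def spearman_brown(reliability: float, k: float) -> float:
    """Predict reliability after lengthening the test by factor k."""
    return k * reliability / (1 + (k - 1) * reliability)

# A subscore with reliability 0.55 at the current test length:
for k in (1, 2, 4):
    print(f"{k}x length -> predicted reliability {spearman_brown(0.55, k):.2f}")
```

    Even quadrupling the length of a short, modestly reliable subscale lifts its predicted reliability only into the low 0.80s, which is why short subscores rarely add value over the total score.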

  • 5.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Interrater reliability in large-scale assessments: can teachers score national tests reliably without external controls? 2015, In: Practical Assessment, Research & Evaluation, ISSN 1531-7714, E-ISSN 1531-7714, Vol. 20, no 9, Article in journal (Refereed)
    Abstract [en]

    In most large-scale assessment systems, a set of rather expensive external quality controls is implemented in order to guarantee interrater reliability. This study empirically examines whether teachers’ ratings of national tests in mathematics can be reliable without using monitoring, training, or other methods of external quality assurance. A sample of 99 booklets of students’ answers to a national test in mathematics was scored independently by five teachers. The interrater reliability was analyzed using consensus and consistency estimates, with the focus on the test as a whole as well as on individual items. The results show that the estimates are acceptable, and in many cases fairly high, irrespective of the reliability measure used. Some plausible explanations for lower interrater reliability on individual items are discussed, and some suggestions are made in the direction of further improving reliability without imposing any system of control.
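    The consensus estimates mentioned in the abstract, percent agreement and Cohen’s kappa, can be sketched as follows. This is an illustrative computation on hypothetical rater data, not the paper’s actual analysis.

```python
# Illustrative sketch (hypothetical data, not from the paper):
# percent agreement (consensus) and Cohen's kappa for two raters
# scoring the same set of items.
from collections import Counter

def percent_agreement(r1, r2):
    """Share of cases where the two raters give identical scores."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Expected agreement if raters scored independently at their base rates
    p_e = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical item scores (0/1/2 points) from two independent raters
rater1 = [2, 1, 0, 2, 2, 1, 0, 1, 2, 0]
rater2 = [2, 1, 0, 2, 1, 1, 0, 1, 2, 1]
print(percent_agreement(rater1, rater2))  # 0.8
print(round(cohens_kappa(rater1, rater2), 3))
```

    Kappa is typically lower than raw agreement because it discounts the agreement that would occur by chance, which is why studies of this kind usually report both.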

  • 6.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Kursproven i gymnasieskolan: Matematik i Umeå 1995-2013, 2013, Report (Other academic)
  • 7.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Students’ use of CAS calculators: effects on the trustworthiness and fairness of mathematics assessments, 2012, In: International journal of mathematical education in science and technology, ISSN 0020-739X, E-ISSN 1464-5211, Vol. 43, no 7, p. 843-861, Article in journal (Refereed)
    Abstract [en]

    Calculators with computer algebra systems (CAS) are powerful tools when working with equations and algebraic expressions in mathematics. When such calculators are allowed during assessments but are not available or provided to every student, they may cause bias. The CAS calculators may also have an impact on the trustworthiness of results.

    In this study, students’ use of the CAS calculator in their work with released assessment items from TIMSS Advanced 2008 is studied using two approaches. Eight students familiar with CAS, from two mathematics classes in the 12th form, were video filmed while encouraged to think aloud during their work with the items. In addition, a questionnaire was distributed to all 33 students in the two classes who had been working with a CAS.

    The main finding is that even though the students are used to working with the CAS calculator, they do not use it to a large extent. The analysis indicates that the difference in performance between high- and low-achieving students has slightly increased due to the use of the calculator. From a validity perspective, one could therefore argue that the CAS calculator is no major threat to the trustworthiness of the assessment. Nevertheless, the results indicate that the students in the study, mainly high achieving, who know how to use the CAS calculator gain an additional advantage. This advantage introduces a degree of unfairness into the assessment and could be a threat to its trustworthiness and fairness.

  • 8.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Validating cut scores set by Angoff procedures with results from equating procedures, 2015, Conference paper (Refereed)
    Abstract [en]

    In Sweden, the cut scores for each new test form of the national tests in mathematics are set before test administration. This demand has existed ever since the transition to the current criterion-referenced system in 1994. One argument given for this requirement is to make sure that teachers no longer score and interpret the test score in a relative manner. The cut scores are set with a judgemental Angoff procedure, without the inclusion of item field test data and with no regular equating or linking procedure. A relevant question is therefore whether it is naïve to assume that the cut scores are equivalent over the years. In these studies, the equivalence of the cut scores for two separate pairs of tests is investigated by comparing cut scores set by Angoff procedures with the results from equating procedures. In both examples, a non-equivalent groups anchor test (NEAT) design was used. The cut scores were compared with equating procedures using linear and equipercentile methods. The results show that there are validity arguments supporting that the Angoff procedure works. However, the equating procedures reveal several methodological and practical challenges.

  • 9.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Validating standard setting: comparing judgmental and statistical linking, 2017, In: Standard setting in education: the Nordic countries in an international perspective / [ed] Sigrid Blömeke, Jan-Eric Gustafsson, Springer, 2017, 1, p. 143-160, Chapter in book (Refereed)
    Abstract [en]

    This study presents a validation of the proposed cut scores for two test forms in mathematics that were developed from the same syllabus and blueprint. The external validity was analyzed by comparing the cut scores set by an Angoff procedure with the results provided by mean and linear observed-score equating procedures. A non-equivalent groups anchor test (NEAT) design was used. The results provide evidence that the cut scores obtained through judgmental and statistical linking are equivalent. However, the equating procedure revealed several methodological and practical challenges.
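    Mean and linear observed-score equating can be sketched as follows. The score data below are hypothetical, and this simplified single-group sketch omits the anchor-test adjustments a real NEAT-design analysis like the chapter’s would require.

```python
# Illustrative sketch (hypothetical data, not the chapter's analysis):
# mean and linear observed-score equating of form X onto form Y.
from statistics import mean, pstdev

def mean_equate(x, mu_x, mu_y):
    """Mean equating: shift X scores so the form means coincide."""
    return x - mu_x + mu_y

def linear_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Linear equating: match both mean and standard deviation."""
    return sd_y / sd_x * (x - mu_x) + mu_y

# Hypothetical raw scores on two test forms
form_x = [31, 35, 40, 44, 50, 52, 58, 60]
form_y = [28, 33, 37, 42, 47, 51, 55, 59]
mu_x, sd_x = mean(form_x), pstdev(form_x)
mu_y, sd_y = mean(form_y), pstdev(form_y)

# A cut score of 45 on form X, expressed on the form-Y scale
print(round(mean_equate(45, mu_x, mu_y), 2))
print(round(linear_equate(45, mu_x, sd_x, mu_y, sd_y), 2))
```

    Mean equating only shifts the scale, while linear equating also rescales it; the two methods therefore agree at the mean but diverge for scores further from it.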

  • 10.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Validity issues in educational measurement - should subscores in national tests be reported or not? 2017, Conference paper (Refereed)
  • 11.
    Lind Pantzare, Anna
    et al.
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Abrahamsson, Mattias
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Almarlind, Pia
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Lundgren, Christer
    Umeå University, Faculty of Social Sciences, Department of applied educational science, Department of Educational Measurement.
    Ämnesproven i grundskolans årskurs 9 och specialskolans årskurs 10: vårterminen 2015, 2015, Report (Other academic)
  • 12.
    Lind Pantzare, Anna
    et al.
    Umeå University, Faculty of Social Sciences, Department of applied educational science.
    Wikström, Christina
    Umeå University, Faculty of Social Sciences, Department of applied educational science.
    Using summative tests for formative purposes: an analysis of the added value of subscores, Manuscript (preprint) (Other academic)
    Abstract [en]

    Knowledge tests, both standardized and teacher-developed, are central in teachers’ daily work when making decisions on student achievement. Although it is recommended that a test be used only for its intended purpose, tests designed for summative purposes are nevertheless used for giving feedback or making formative decisions. The purpose of this paper is to investigate whether a summative test within the Swedish national test framework can provide meaningful information for formative use, by testing its reliability at the subscore level. The study also aims to provide guidance for practitioners who wish to use information at the subscore level for planning instruction as well as for other formative purposes, as is sometimes implied in the information to teachers.

  • 13.
    Wikström, Christina
    et al.
    Umeå University, Faculty of Social Sciences, Department of applied educational science.
    Lind Pantzare, Anna
    Umeå University, Faculty of Social Sciences, Department of applied educational science.
    Standard setting in Sweden: school grades and national tests, 2018, In: Examination standards: how measures and meanings differ around the world / [ed] Jo-Anne Baird, Tina Isaacs, Dennis Opposs, Lena Gray, London, UK: UCL Press, 2018, p. 235-251, Chapter in book (Other academic)