TEACHING ANATOMY: STANDARDS SETTING FOR MCQ, OSPE & OSCE - A QUALITY CONTROL MEASURE IN ASSESSMENT

This post describes different types of standard setting methods. Among them, the fixed percentage method, Angoff's method (Angoffing) and the Hofstee method are described along with their advantages and disadvantages. Commonly used standard setting methods for the objective structured clinical examination (OSCE) are also described.

Standards and setting them:

The term ‘standards’ can be used in a number of ways in relation to testing programs. Some of the types can be summarized as

• Eligibility standards: qualifications, educational requirements, other criteria

• Test delivery standards: administration conditions, security procedures, technical specifications, etc.

• Content standards: outcomes, curricular objectives, specific instructional goals; the test is prepared from these

• Performance standards: establishing cut scores on a test

What is standard setting?

Put simply, standard setting is the process of determining a cut score for a test. Cizek (1993) has defined standard setting as “the proper following of a prescribed, rational system of rules or procedures resulting in the assignment of a number to differentiate between two or more states or degrees of performance”. This definition highlights a systematic, methodical process built on experts’ judgments (subjectivity), which take into account the test’s purpose and content, the examinees and the educational setting when determining the cut score. Standard setting thus translates subjectivity (experts’ judgments) into objectivity (a numerical value in the form of a cut score).

Why do we need to standard set?

Cut score is a numerical value that represents whether an examinee meets the minimum standard set for a particular test and thus serves as the basis for passing or failing an examinee. But the question is - how do we know if the cut scores for a given assessment are set appropriately? For the results of an assessment to be credible and widely acceptable, it is necessary that the cut score is appropriately set. This makes the standard setting an important step in the process of test development.

Types of standard

Relative standards and Absolute standards

1) Relative standards

Relative standards are based on a comparison among the performances of examinees. The standards are expressed as a number or percentage of examinees. In these methods, the cut scores are set so as to, for example, pass the 60 best performers or separate the top 40% from the bottom 60%. The method is appropriate for entrance or selection examinations where only a limited number of candidates can be accommodated.

2) Absolute standards

Absolute standards are based on how much the examinees know. The standards are expressed as a number or percentage of the test questions. In these methods, the cut scores are set so that, in order to pass, examinees must produce, for example, 60 correct answers out of 100 questions, i.e. 60% on the test. The method is appropriate for tests of competence such as final or exit examinations and licensure and certification examinations.
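The difference between the two types of standard can be illustrated with a small sketch. The function names and the score list are hypothetical and serve only as an illustration: the relative cut score depends on how the group performed, while the absolute cut score depends only on the test.

```python
def relative_cut_score(scores, n_pass):
    """Relative standard: the lowest score among the n_pass best performers.

    Note: if several examinees tie at this score, more than n_pass will pass.
    """
    ranked = sorted(scores, reverse=True)
    return ranked[n_pass - 1]


def absolute_cut_score(n_items, required_percent):
    """Absolute standard: number of correct answers needed (rounded up)."""
    return (n_items * required_percent + 99) // 100


# Hypothetical scores for ten examinees
scores = [72, 55, 64, 81, 49, 60, 58, 90, 67, 45]

print(relative_cut_score(scores, 4))   # cut score that passes the 4 best performers
print(absolute_cut_score(100, 60))     # correct answers needed for 60% of 100 items
```

With the relative standard the cut score would move if the same test were taken by a stronger or weaker group; with the absolute standard it would not.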

Methods for setting standards

Characteristics of methods of standard setting

The method of standard setting should have the following characteristics so as to ensure the credibility of the results produced by it:

It should be consistent with the purpose of the test

It should be based on expert judgements

It should consider the ability of examinees

It should consider the educational setting

It should be defensible

It should be credible

It should be supported by published research

It should be feasible (easy to implement, easy to make others understand)

It should be acceptable to all stakeholders

Classification

1) Relative methods

It is based on judgments about groups of test takers. e. g. fixed percentage method

2) Absolute methods

It is of two types

• based on judgments about test items. e. g. Angoff’s Method

• based on judgments about the performance of individual examinees. e. g. Contrasting groups methods

3) Compromise methods

It is a compromise between relative and absolute standards. e.g. Hofstee method

Fixed percentage method

The process of this method can be outlined as follows:

• Each judge is asked to estimate the percentage of examinees that should pass the test

• The judges can discuss and are free to change their estimates

• The estimates are averaged, and the cut score is set at the score that yields this pass rate
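The steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the judges' estimates and the examinee scores are invented, and the function name `fixed_percentage_cut_score` is hypothetical.

```python
from statistics import mean


def fixed_percentage_cut_score(judge_pass_percents, scores):
    """Average the judges' pass-rate estimates and find the matching cut score."""
    target_percent = mean(judge_pass_percents)
    # Number of examinees who should pass, kept between 1 and the group size
    n_pass = round(target_percent / 100 * len(scores))
    n_pass = max(1, min(n_pass, len(scores)))
    ranked = sorted(scores, reverse=True)
    return target_percent, ranked[n_pass - 1]


# Hypothetical data: four judges, ten examinees
judge_pass_percents = [70, 60, 65, 75]
scores = [72, 55, 64, 81, 49, 60, 58, 90, 67, 45]

target_percent, cut = fixed_percentage_cut_score(judge_pass_percents, scores)
print(target_percent, cut)
```

Note that the cut score is derived entirely from the group's ranking, not from the test content.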

Advantages

• Easy to use

• Applicable to both written and clinical examination

• Suitable to identify a certain number of best (or worst) candidates

Disadvantages

• Independent of test content

• Independent of how much an examinee knows

• Less reliable, which affects the validity of the test

Angoff’s method

The process of this method can be summarized as follows:

• The borderline students are defined

• The difficulty and importance of each test item are explained

• Each judge estimates the proportion of the borderline group that would answer the item correctly

• Judges discuss and can change the rating.

• The process is repeated for each item of the test.

• The judges’ estimates for each item are averaged.

• The averages are summed up to determine the cut score.

Fig-1: Angoff’s method score plan, showing the judges’ estimates of the proportion of borderline students who would answer each test item correctly. In this example a student should answer 4.49 of the 8 items correctly to pass.
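The arithmetic behind Angoff's method can be sketched as follows. The ratings are invented for illustration and are not the values from Fig-1; the function name `angoff_cut_score` is hypothetical.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """ratings[j][i] is judge j's estimate of the proportion of borderline
    students who would answer item i correctly (between 0 and 1)."""
    n_items = len(ratings[0])
    # Average the judges' estimates item by item
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    # The cut score is the sum of the item averages
    return item_means, sum(item_means)


# Hypothetical ratings: three judges, four items
ratings = [
    [0.6, 0.5, 0.8, 0.4],  # judge A
    [0.7, 0.4, 0.9, 0.5],  # judge B
    [0.5, 0.6, 0.7, 0.3],  # judge C
]

item_means, cut = angoff_cut_score(ratings)
print([round(m, 2) for m in item_means], round(cut, 2))
```

The result is expressed in items: a borderline student is expected to answer that many items correctly, so it is usually compared with the total number of items or converted to a percentage.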

Advantages

• It focuses attention on item content, thus ensuring the validity of the item

• It is relatively easy to use

• There is a considerable body of published work to support its use

• It is best suited to tests that seek to establish competence

Disadvantages

• It is difficult to define the concept of a "borderline student"

• Judges may feel like producing numbers out of the air

• The methods can be tedious and time consuming especially for a long test

Hofstee method

The method can be summarized as follows:

• Purpose of the test is explained

• Nature of the examinees is discussed

• What constitutes adequate/inadequate knowledge is discussed

• Each judge estimates the following

1) the minimum acceptable cut score
2) the maximum acceptable cut score
3) the minimum acceptable fail rate
4) the maximum acceptable fail rate

Note: Items 1 and 2 represent absolute standards and items 3 and 4 represent relative standards.

A final cut score is determined after the test is given by plotting the scores in a graph.

Fig-2: Determining the pass score using the Hofstee method.
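One common way to read the cut score off such a graph, assumed here, is to draw a straight line from the point (minimum cut score, maximum fail rate) to the point (maximum cut score, minimum fail rate) and find where it crosses the curve of observed fail rates. The sketch below approximates that crossing by checking whole-number cut scores only; the data, the function names and the integer grid are assumptions made for illustration.

```python
def fail_rate(scores, cut):
    """Percentage of examinees scoring below the cut score."""
    return 100 * sum(s < cut for s in scores) / len(scores)


def hofstee_cut_score(scores, c_min, c_max, f_min, f_max):
    """Find the whole-number cut score between c_min and c_max where the
    observed fail rate is closest to the judges' diagonal line."""
    def line(c):
        # Straight line from (c_min, f_max) down to (c_max, f_min)
        return f_max + (f_min - f_max) * (c - c_min) / (c_max - c_min)

    candidates = range(c_min, c_max + 1)
    return min(candidates, key=lambda c: abs(fail_rate(scores, c) - line(c)))


# Hypothetical percentage scores for ten examinees
scores = [72, 55, 64, 81, 49, 60, 58, 90, 67, 45]

# Hypothetical averaged judge estimates: cut score between 50 and 70,
# fail rate between 10% and 40%
cut = hofstee_cut_score(scores, c_min=50, c_max=70, f_min=10, f_max=40)
print(cut, fail_rate(scores, cut))
```

If the observed fail-rate curve never comes close to the line, this sketch still returns the nearest candidate, so the gap between the two should be inspected before the result is accepted.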

Advantages

• It is easy to implement

• Judges are comfortable with the method of making estimates

Disadvantages

• The cut score may not be in the area defined by the judges’ estimates

• It is not the first choice in a high stakes testing situation

Commonly used Standards setting methods for objective structured clinical examinations (OSCEs):

Angoff’s method

For each item in the checklist, the judges estimate the proportion of borderline students who would perform the particular task correctly. Alternatively, the method can be modified so that the judgment is made at the level of the OSCE station rather than the individual checklist item. The estimated scores are then averaged and summed to determine the cut score for each OSCE station.

Borderline group method

During an OSCE examination, the examiners assess the performance of a student against each item in the checklist as well as assign a global rating (pass/fail/borderline) based on the overall performance of the student at the station.

The checklist scores obtained by the “borderline” performers serve as the basis for determining the cut score.
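A minimal sketch of this calculation for one station is given below. It assumes that the cut score is taken as the mean checklist score of the students rated borderline, which is one common choice; the records and the function name `borderline_group_cut_score` are hypothetical.

```python
from statistics import mean


def borderline_group_cut_score(results):
    """results is a list of (checklist_score, global_rating) pairs for one
    station. Returns the mean checklist score of the borderline students."""
    borderline_scores = [score for score, rating in results
                         if rating == "borderline"]
    if not borderline_scores:
        raise ValueError("no student was rated borderline at this station")
    return mean(borderline_scores)


# Hypothetical results for one OSCE station
station_results = [
    (18, "pass"),
    (12, "borderline"),
    (9, "fail"),
    (13, "borderline"),
    (20, "pass"),
    (11, "borderline"),
]

cut = borderline_group_cut_score(station_results)
print(cut)
```

When only a few students are rated borderline at a station, the resulting cut score rests on very little data and should be interpreted with caution.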

Guidelines for setting standards

• Assign an appropriate number of judges (at least 6-8 for high stakes testing)

• Select the characteristics the group should possess e.g. mixed professions

• All judges should attend throughout the session

• Purpose of the test should be discussed

• The characteristics of the examinees should be explained

• Judges should have familiarity with test items and format

• Reliability should be checked

• The method should produce reasonable results

• The results should be acceptable to stakeholders

• Pass rates should be compared against contemporaneous markers of competence

Method of choice?

There is no perfect standard setting method. The choice depends on the circumstances of the particular test. Besides, regardless of the standard setting method used, a test should cover the appropriate content and be at the appropriate level of difficulty to determine competency.

References

Bejar I. Standard Setting: What Is It? Why Is It Important? Educational Testing Service 2008

Cizek GJ. Reconsidering standards and criteria. Journal of Educational Measurement 1993; 30(2):93-106.

Kaufman DM, Mann KV, Muijtjens AMM, van der Vleuten CPM. A comparison of standard-setting procedures for an OSCE in undergraduate medical education. Acad Med 2000; 75:267-271.

Kramer A, Muijtjens A, Jansen K, Düsman H, Tan L, van der Vleuten C. Comparison of a rational and an empirical standard setting procedure for an OSCE. Medical Education 2003; 37(2):132.

Norcini JJ. Setting standards on educational tests. Medical Education 2003; 37:464-469.
