STANDARDS SETTING FOR MCQ, OSPE & OSCE - A QUALITY CONTROL MEASURE IN ASSESSMENT

This post describes different types of standard setting methods. Among them, the fixed percentage method, Angoff's method (Angoffing) and the Hofstee method are described along with their advantages and disadvantages. Commonly used standard setting methods for the objective structured clinical examination (OSCE) are also described.


Standards and setting them:

The term ‘standards’ can be used in a number of ways in relation to testing programs. Some of the types can be summarized as follows:

      Eligibility standards: qualifications, educational requirements and other criteria

      Test delivery standards: administration conditions, security procedures, technical specifications etc.

      Content standards: outcomes, curricular objectives and specific instructional goals, from which the test is prepared

      Performance standards: establishing cut scores on a test


What is standard setting?

To put it simply, standard setting refers to the process of determining a cut score for a test. Cizek (1993) has defined standard setting as “the proper following of a prescribed, rational system of rules or procedures resulting in the assignment of a number to differentiate between two or more states or degrees of performance”. This definition highlights a systematic, methodical process involving experts’ judgments (subjectivity), which take into account the test’s purpose and content, the examinees and the educational setting when determining the cut score. Thus standard setting translates subjectivity, i.e. experts’ judgments, into objectivity, i.e. a numerical value in the form of a cut score.



Why do we need to set standards?

A cut score is a numerical value that determines whether an examinee meets the minimum standard set for a particular test, and thus serves as the basis for passing or failing an examinee. But the question is: how do we know if the cut scores for a given assessment are set appropriately? For the results of an assessment to be credible and widely acceptable, the cut score must be set appropriately. This makes standard setting an important step in the process of test development.


Types of standard

Standards are of two types: relative standards and absolute standards.


1) Relative standards

Relative standards are based on a comparison among the performances of examinees. The standards are expressed as a number or percentage of examinees. In these methods, the cut scores are set in such a way as to pass, for example, the 60 best performers, or to separate the top 40% from the bottom 60%. The method is appropriate for entrance or selection examinations where only a limited number of candidates can be accommodated.


2) Absolute standards

Absolute standards are based on how much the examinees know. The standards are expressed as a number or percentage of the test questions. In these methods, the cut scores are set in such a way that, in order to pass, examinees must produce, for example, 60 correct answers out of 100 questions, i.e. 60% on the test. The method is appropriate for tests of competence such as final or exit examinations and licensure and certification examinations.
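
To make the contrast between the two types concrete, here is a minimal Python sketch that applies a relative and an absolute standard to the same set of scores. The scores, the function names and the choice of passing the 4 best performers are all made up for illustration and do not come from any real examination.

```python
import math

def relative_cut_score(scores, n_pass):
    """Relative standard: the lowest score among the n_pass best performers."""
    ranked = sorted(scores, reverse=True)
    return ranked[n_pass - 1]

def absolute_cut_score(n_items, required_percent):
    """Absolute standard: correct answers required, fixed independently of the group."""
    return math.ceil(n_items * required_percent / 100)

# Hypothetical scores (correct answers out of 100) for ten examinees.
scores = [72, 55, 64, 81, 49, 60, 58, 90, 67, 45]

rel_cut = relative_cut_score(scores, n_pass=4)
abs_cut = absolute_cut_score(n_items=100, required_percent=60)

rel_passers = [s for s in scores if s >= rel_cut]
abs_passers = [s for s in scores if s >= abs_cut]

print("relative cut:", rel_cut, "passers:", len(rel_passers))
print("absolute cut:", abs_cut, "passers:", len(abs_passers))
```

With these scores the relative standard passes exactly four examinees whatever they scored, while the absolute standard passes the six examinees who reached 60 correct answers.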


Methods for setting standards


Characteristics of methods of standard setting

The method of standard setting should have the following characteristics so as to ensure the credibility of the results produced by it:

It should be consistent with the purpose of the test

It should be based on expert judgements

It should consider the ability of examinees

It should consider the educational setting

It should be defensible

It should be credible

It should be supported by published research

It should be feasible (easy to implement, easy to make others understand)

It should be acceptable to all stakeholders


Classification


1) Relative methods

These are based on judgments about groups of test takers, e.g. the fixed percentage method


2) Absolute methods

These are of two types:

      based on judgments about test items, e.g. Angoff’s method

      based on judgments about the performance of individual examinees, e.g. the contrasting groups method


3) Compromise methods

These are a compromise between relative and absolute standards, e.g. the Hofstee method



Fixed percentage method

The process of this method can be outlined as follows:

      Each judge is asked what percentage of the examinees will pass the test

      The judges can discuss their estimates and are free to change them

      The estimates are averaged to determine the cut score
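
The steps above can be sketched in a few lines of Python. This is only an illustration: the judges' estimates and the scores are invented, the function name is hypothetical, and the way the averaged percentage is turned into a score (ranking the examinees and rounding the number of passers down) is an assumption, since the method itself only says that the estimates are averaged.

```python
from statistics import mean

def fixed_percentage_cut(judge_pass_percents, scores):
    """Average the judges' pass percentages and find the score that passes that share."""
    pass_percent = mean(judge_pass_percents)
    # Assumption: the number of passers is rounded down to a whole examinee.
    n_pass = int(len(scores) * pass_percent / 100)
    if n_pass < 1:
        raise ValueError("the averaged estimate passes no examinee")
    ranked = sorted(scores, reverse=True)
    # The cut score is the score of the last examinee who still passes.
    return pass_percent, ranked[n_pass - 1]

# Hypothetical estimates from four judges (percentage of examinees passing).
judge_pass_percents = [60, 55, 65, 60]
# Hypothetical test scores for ten examinees.
scores = [48, 52, 55, 57, 61, 63, 66, 70, 74, 88]

pass_percent, cut = fixed_percentage_cut(judge_pass_percents, scores)
print("averaged pass percentage:", pass_percent)
print("cut score:", cut)
```

Note that the cut score here depends entirely on how this particular group performed, which is the reason the method is listed as independent of test content.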


Advantages

      Easy to use

      Applicable to both written and clinical examinations

      Suitable to identify a certain number of best (or worst) candidates


Disadvantages

      Independent of test content

      Independent of how much an examinee knows

      Less reliable, which affects the validity of the test


Angoff’s method

The process of this method can be summarized as follows:

      The borderline students are defined

      The difficulty and importance of each test item are explained

      Each judge estimates the proportion of the borderline group that would answer the item correctly

      Judges discuss and can change their ratings

      The process is repeated for each item of the test

      The judges’ estimates are averaged for each item

      The averages are summed to determine the cut score

Fig-1: An Angoff scoring sheet showing the judges’ estimates of the proportion of borderline students who would answer each test item correctly. In this example, a student should answer 4.49 of the 8 items correctly.
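
The arithmetic behind an Angoff cut score can be sketched as follows. The ratings below are invented for illustration (three judges, four items) and are not the values from the figure; the function name is also hypothetical.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """ratings[j][i] is judge j's estimate of the proportion of borderline
    students who would answer item i correctly."""
    # Average the judges' estimates item by item.
    item_means = [mean(item_ratings) for item_ratings in zip(*ratings)]
    # The cut score is the sum of the item averages.
    return item_means, sum(item_means)

# Hypothetical ratings: one row per judge, one column per item.
ratings = [
    [0.6, 0.4, 0.8, 0.5],
    [0.7, 0.5, 0.9, 0.4],
    [0.5, 0.3, 0.7, 0.6],
]

item_means, cut = angoff_cut_score(ratings)
print("item averages:", [round(m, 2) for m in item_means])
print("cut score:", round(cut, 2), "out of", len(item_means), "items")
```

Here the item averages are 0.6, 0.4, 0.8 and 0.5, so a borderline student is expected to answer about 2.3 of the 4 items correctly, and that sum is taken as the cut score.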

Advantages

      It focuses attention on item content, thus ensuring the validity of the item

      It is relatively easy to use

      There is a considerable body of published work to support its use

      It is best suited to tests that seek to establish competence


Disadvantages

      It is difficult to define the concept of a "borderline student"

      Judges may feel that they are producing numbers out of thin air

      The method can be tedious and time consuming, especially for a long test


Hofstee method

The method can be summarized as follows:

      Purpose of the test is explained

      Nature of the examinees is discussed

      What constitutes adequate/inadequate knowledge is discussed

      Each judge estimates the following

  1. the minimum acceptable cut score
  2. the maximum acceptable cut score
  3. the minimum acceptable fail rate
  4. the maximum acceptable fail rate

Note: Items 1 and 2 represent absolute standards and items 3 and 4 represent relative standards.

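The final cut score is determined after the test has been given, by plotting the fail rate that each possible cut score would produce and reading off the point where this curve crosses the region bounded by the judges’ four estimates.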

 
Fig-2: Determining the pass score using the Hofstee method.
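
The graphical step can also be done numerically. The sketch below assumes the usual construction: the four averaged estimates define a rectangle, a diagonal is drawn from (minimum cut score, maximum fail rate) to (maximum cut score, minimum fail rate), and the cut score is taken where the observed fail-rate curve first reaches that diagonal. The scores, the estimates, the function name and the restriction to whole-number cut scores are all assumptions made for illustration.

```python
def hofstee_cut_score(scores, min_cut, max_cut, min_fail, max_fail):
    """Return the cut score where the observed fail-rate curve meets the diagonal
    of the judges' rectangle, or None if the curve misses the rectangle.
    Fail rates are percentages; cut scores are whole numbers."""
    def fail_rate(cut):
        # Percentage of examinees scoring below the candidate cut score.
        return 100 * sum(s < cut for s in scores) / len(scores)

    def diagonal(cut):
        # Straight line from (min_cut, max_fail) down to (max_cut, min_fail).
        return max_fail + (min_fail - max_fail) * (cut - min_cut) / (max_cut - min_cut)

    # The curve can pass above or below the rectangle without crossing it.
    if fail_rate(min_cut) > max_fail or fail_rate(max_cut) < min_fail:
        return None
    for cut in range(min_cut, max_cut + 1):
        if fail_rate(cut) >= diagonal(cut):
            return cut
    return None

# Hypothetical percentage scores for twenty examinees.
scores = [42, 45, 48, 51, 53, 54, 56, 57, 58, 59,
          60, 61, 62, 63, 65, 67, 70, 72, 75, 80]

# Hypothetical averaged estimates from the judges.
cut = hofstee_cut_score(scores, min_cut=50, max_cut=65, min_fail=10, max_fail=30)
print("Hofstee cut score:", cut)
```

With these numbers the fail rate first reaches the diagonal at a cut score of 54, where 25% of the examinees fail. The `None` branch corresponds to the case in which the curve never enters the area defined by the judges' estimates.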
 
Advantages

      It is easy to implement

      Judges are comfortable with the method of making estimates


Disadvantages

      The cut score may not be in the area defined by the judges’ estimates

      It is not the first choice in a high stakes testing situation



Commonly used standard setting methods for objective structured clinical examinations (OSCEs):

Angoff’s method

For each item in the checklist, the judges estimate the proportion of borderline students that would perform the particular task correctly. Alternatively, the method can be modified so that the judgment is made at the level of the OSCE station rather than the individual item on the checklist. The estimates are then averaged and summed to determine the cut score for each OSCE station.



Borderline group method

During an OSCE, the examiners assess the performance of a student against each item in the checklist and also assign a global rating (pass/fail/borderline) based on the overall performance of the student at the station.

The scores obtained by the “borderline” performers serve as the basis for determining the cut score.
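
As a rough illustration, the cut score for one station can be computed as below. The sketch assumes that the cut score is the mean checklist score of the students rated "borderline" (the median is another possible choice); the station results and the function name are invented.

```python
from statistics import mean

def borderline_group_cut(station_results):
    """station_results is a list of (checklist_score, global_rating) pairs
    for one OSCE station."""
    borderline_scores = [score for score, rating in station_results
                         if rating == "borderline"]
    if not borderline_scores:
        raise ValueError("no student was rated borderline at this station")
    # Assumption: the cut score is the mean score of the borderline group.
    return mean(borderline_scores)

# Hypothetical results for one station (checklist scored out of 20).
station_results = [
    (18, "pass"), (11, "borderline"), (7, "fail"), (12, "borderline"),
    (16, "pass"), (13, "borderline"), (9, "fail"),
]

cut = borderline_group_cut(station_results)
print("station cut score:", cut)
```

Here the three borderline students scored 11, 12 and 13, so the station cut score is 12 out of 20.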



Guidelines for setting standards

      Appoint an appropriate number of judges (at least 6-8 for high stakes testing)

      Select the characteristics the group of judges should possess, e.g. mixed professions

      All judges should attend throughout the session

      The purpose of the test should be discussed

      The characteristics of the examinees should be explained

      Judges should be familiar with the test items and format

      Reliability should be checked

      The method should produce reasonable results

      The results should be acceptable to stakeholders

      Pass rates should be compared against contemporaneous markers of competence


Method of choice?

There is no perfect standard setting method. The choice depends on the factors that apply in a particular circumstance. Besides, regardless of the standard setting method used, a test should cover the appropriate content and be at the appropriate level of difficulty to determine competency.



References

Bejar I. Standard Setting: What Is It? Why Is It Important? Educational Testing Service; 2008.

Cizek GJ. Reconsidering standards and criteria. Journal of Educational Measurement 1993; 30(2): 93-106.

Kaufman DM, Mann KV, Muijtjens AMM, van der Vleuten CPM. A comparison of standard-setting procedures for an OSCE in undergraduate medical education. Academic Medicine 2000; 75: 267-271.

Kramer A, Muijtjens A, Jansen K, Düsman H, Tan L, van der Vleuten C. Comparison of a rational and an empirical standard setting procedure for an OSCE. Medical Education 2003; 37(2): 132.

Norcini JJ. Setting standards on educational tests. Medical Education 2003; 37: 464-469.