This post describes different types of standard setting methods. Among them, the fixed percentage method, Angoff's method (Angoffing) and the Hofstee method are described along with their advantages and disadvantages. Commonly used standard setting methods for the objective structured clinical examination (OSCE) are also described.
Standards and setting them:
The term ‘standards’ can be used in a number of ways in relation to testing
programs. Some of the types can be summarized as:
• Eligibility standards
- Qualifications/educational requirement/other criteria
• Test delivery standards
- Administration conditions, security procedures, technical specifications etc.
• Content standards
- Outcomes/curricular objectives/specific instructional goals
- The test is prepared from these
• Performance standards
- Establishing cut scores on a test
What is standard setting?
To put it in the
simplest way, standard setting refers to the process of determining a cut score
for tests. Cizek (1993) has defined standard setting as “the proper following
of a prescribed, rational system of rules or procedures resulting in the assignment
of a number to differentiate between two or more states or degrees of
performance”. This definition highlights a systematic, methodical process in which
experts’ judgments (subjective) take into account the test’s purpose and
content, the examinees and the educational setting when determining the cut score.
Thus standard setting translates subjectivity (experts’ judgments) into
objectivity (a numerical value in the form of a cut score).
Why do we need standard setting?
A cut score is the numerical value that determines whether
an examinee meets the minimum standard set for a particular test, and thus
serves as the basis for passing or failing an examinee. But how do we know
whether the cut scores for a given assessment are set appropriately?
For the results of an assessment to be credible and widely accepted, the
cut score must be set appropriately. This makes standard
setting an important step in the process of test development.
Types of standards
Relative standards and absolute standards
1) Relative standards
Relative standards are based on a comparison among
the performances of examinees. The standards are expressed as a number or
percentage of examinees. In these methods, the cut scores are set in such a way
that, for example, the 60 best performers pass, or the top 40% are separated
from the bottom 60%. The method is appropriate for entrance
examinations/selection examinations where only a limited number of candidates
can be accommodated.
2) Absolute standards
Absolute standards are based on how much the
examinees know. The standards are expressed as a number or percentage of the test
questions. In these methods, the cut scores are set in such a way that in order
to pass, examinees must give, for example, 60 correct answers out of
100 questions, i.e. 60% on the test. The method is appropriate for tests of
competence such as final or exit examinations, licensure and certification
examinations.
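The difference between the two types of standards can be illustrated with a short Python sketch. It is only an illustration: the function names and the score list are invented for this example and do not come from any particular testing program. The relative cut score depends on who else sat the test, while the absolute cut score is fixed by the test itself.

```python
import math

def relative_cut_score(scores, n_pass):
    """Relative standard: the cut is the lowest score among the n_pass best
    performers (examinees tied at the cut would all pass)."""
    if not 0 < n_pass <= len(scores):
        raise ValueError("n_pass must be between 1 and the number of examinees")
    ranked = sorted(scores, reverse=True)
    return ranked[n_pass - 1]

def absolute_cut_score(n_items, required_percent):
    """Absolute standard: the number of correct answers required,
    independent of how other examinees perform."""
    return math.ceil(n_items * required_percent / 100)

# Hypothetical scores (correct answers out of 100) for ten examinees.
scores = [82, 75, 91, 58, 66, 49, 70, 88, 61, 55]

rel_cut = relative_cut_score(scores, n_pass=4)   # only 4 places available
abs_cut = absolute_cut_score(100, 60)            # 60% of 100 questions

print("relative cut:", rel_cut, "passing:", sum(s >= rel_cut for s in scores))
print("absolute cut:", abs_cut, "passing:", sum(s >= abs_cut for s in scores))
```

With these invented scores, four examinees pass under the relative standard and seven pass under the absolute standard, which shows why the choice between the two should follow the purpose of the test.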
Methods for setting standards
Characteristics of methods of standard setting
The method of standard setting should have the
following characteristics so as to ensure the credibility of the results
produced by it:
• It should be consistent with the purpose of the test
• It should be based on expert judgements
• It should consider the ability of examinees
• It should consider the educational setting
• It should be defensible
• It should be credible
• It should be supported by published research
• It should be feasible (easy to implement, easy to make others understand)
• It should be acceptable to all stakeholders
Classification
1) Relative methods
These are based on judgments about groups of test takers, e.g. the fixed
percentage method
2) Absolute methods
These are of two types:
• based on judgments about test items, e.g. Angoff’s method
• based on judgments about the performance of individual examinees, e.g. the
contrasting groups method
3) Compromise methods
These are a compromise between relative and absolute standards, e.g. the
Hofstee method
Fixed percentage method
The process of this method can be outlined as follows:
• Each judge is asked to estimate the percentage of the examinees that will pass the test
• The judges can discuss their estimates and are free to change them
• The estimates are averaged to determine the cut score
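The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name, the judges' estimates and the scores are all hypothetical, and the way the averaged percentage is turned into a score (rounding down to a whole number of passing examinees) is one possible convention, not a prescribed rule.

```python
import math
from statistics import mean

def fixed_percentage_cut(scores, judge_pass_percents):
    """Average the judges' pass-percentage estimates, then return the score
    of the lowest-ranked examinee who still falls inside that percentage."""
    pass_percent = mean(judge_pass_percents)
    # Assumption: round down to a whole number of passing examinees.
    n_pass = math.floor(len(scores) * pass_percent / 100)
    if n_pass < 1:
        raise ValueError("the averaged estimate lets no examinee pass")
    ranked = sorted(scores, reverse=True)
    return ranked[n_pass - 1]

# Hypothetical data: three judges, ten examinees.
judge_estimates = [60, 70, 80]          # percentage expected to pass
scores = [48, 52, 55, 59, 63, 67, 71, 74, 80, 86]

print(fixed_percentage_cut(scores, judge_estimates))
```

Note that the resulting cut score comes entirely from the score distribution of this group of examinees; nothing in the calculation looks at the test content.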
Advantages
• Easy to use
• Suitable to identify a certain number of best (or worst) candidates
Disadvantages
• Independent of test content
• Independent of how much an examinee knows
• Less reliable, which affects the validity of the test
Angoff’s method
The process of this method can be summarized as follows:
• The borderline students are defined
• The difficulty and importance of the test item are explained
• Each judge estimates the proportion of the borderline group that would answer the item correctly
• Judges discuss and can change their ratings
• The process is repeated for each item of the test
• The judges’ estimates are averaged for each item
• The averages are summed up to determine the cut score
Fig – 1: An Angoff’s method score plan, showing the estimated proportion of
borderline students that would answer each of the test items correctly. The
resulting cut score is 4.49, i.e. a student should answer 4.49 items correctly
out of 8 items.
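The arithmetic behind an Angoff score plan can be sketched in a few lines of Python. The ratings below are invented for illustration (three judges, four items) and are not the values from the figure; the function name is likewise hypothetical.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """ratings[j][i] is judge j's estimate of the proportion of borderline
    students who would answer item i correctly.
    Returns the per-item averages and their sum (the cut score)."""
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate every item")
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    return item_means, sum(item_means)

# Hypothetical ratings: one row per judge, one column per item.
ratings = [
    [0.8, 0.6, 0.5, 0.9],
    [0.7, 0.5, 0.6, 0.8],
    [0.9, 0.4, 0.4, 0.7],
]

item_means, cut = angoff_cut_score(ratings)
print("item averages:", [round(m, 2) for m in item_means])
print("cut score:", round(cut, 2), "out of", len(item_means), "items")
```

Because the cut score is a sum of proportions, it is usually not a whole number; how it is rounded to a passing mark is a separate decision for the examination board.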
Advantages
• It focuses attention on item content, thus ensuring the validity of the item
• It is relatively easy to use
• There is a considerable body of published work to support its use
• It is best suited to tests that seek to establish competence
Disadvantages
• It is difficult to define the concept of a "borderline student"
• Judges may feel that they are pulling numbers out of the air
• The method can be tedious and time-consuming, especially for a long test
Hofstee method
The method can be summarized as follows:
• The purpose of the test is explained
• The nature of the examinees is discussed
• What constitutes adequate/inadequate knowledge is discussed
• Each judge estimates the following:
- the minimum acceptable cut score
- the maximum acceptable cut score
- the minimum acceptable fail rate
- the maximum acceptable fail rate
Note: The first two estimates represent absolute standards and the last two represent relative standards.
A final cut score is determined after the test is given by plotting the scores in a graph.
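The graphical step is commonly carried out by plotting the fail rate against every possible cut score, drawing a straight line from the point (minimum cut score, maximum fail rate) to the point (maximum cut score, minimum fail rate), and reading off where the line meets the curve. The Python sketch below approximates that intersection on a whole-number score scale. The function names, the judges' averaged estimates and the scores are all hypothetical, and returning the first whole-number cut at which the curve reaches the line is a simplifying assumption.

```python
def fail_rate(scores, cut):
    """Percentage of examinees who score below the cut."""
    return 100 * sum(s < cut for s in scores) / len(scores)

def hofstee_cut(scores, min_cut, max_cut, min_fail, max_fail):
    """Return the first whole-number cut score at which the fail-rate curve
    reaches the Hofstee line, or None if the curve misses the area bounded
    by the judges' estimates."""
    if fail_rate(scores, min_cut) > max_fail or fail_rate(scores, max_cut) < min_fail:
        return None  # curve passes entirely above or below the judges' box
    for cut in range(min_cut, max_cut + 1):
        # Line falling from (min_cut, max_fail) to (max_cut, min_fail).
        line = max_fail + (min_fail - max_fail) * (cut - min_cut) / (max_cut - min_cut)
        if fail_rate(scores, cut) >= line:
            return cut
    return None

# Hypothetical percentage scores for twenty examinees.
scores = [38, 42, 45, 48, 51, 53, 55, 56, 58, 60,
          62, 63, 65, 67, 70, 72, 75, 78, 82, 90]

# Hypothetical averages of the judges' four estimates.
print(hofstee_cut(scores, min_cut=50, max_cut=65, min_fail=10, max_fail=30))
```

The `None` branch corresponds to the situation where the score distribution does not pass through the region the judges considered acceptable, so the graph yields no usable intersection.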
Advantages
• It is easy to implement
• Judges are comfortable with the method of making estimates
Disadvantages
• The cut score may not be in the area defined by the judges’ estimates
• It is not the first choice in a high-stakes testing situation
Commonly used standard setting methods for objective structured clinical
examinations (OSCEs):
Angoff’s method
For each item in the checklist, the judges estimate
the proportion of borderline students that would perform the particular task
correctly. Alternatively, the method can be modified so that the judgment is
made at the level of the OSCE station rather than the individual item on the
checklist. The estimated scores are then averaged and summed up to determine
the cut score for each OSCE station.
Borderline group method
During an OSCE, the examiners assess the
performance of a student against each item in the checklist and also assign a
global rating (pass/fail/borderline) based on the overall performance of the
student at the station.
The scores obtained by the “borderline” performers
serve as the basis for determining the cut score.
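As a rough Python sketch of this idea for a single station: the checklist scores of the students given a "borderline" global rating are collected and summarized. Using the mean as the summary is an assumption made here for illustration (other summaries, such as the median, are also possible), and the function name and data are hypothetical.

```python
from statistics import mean

def borderline_group_cut(results):
    """results is a list of (checklist_score, global_rating) pairs for one
    station. The cut score is taken as the mean checklist score of the
    students rated 'borderline' (an assumed convention)."""
    borderline = [score for score, rating in results if rating == "borderline"]
    if not borderline:
        raise ValueError("no student was rated borderline at this station")
    return mean(borderline)

# Hypothetical results for one station (checklist scored out of 20).
station_results = [
    (18, "pass"), (12, "borderline"), (9, "fail"), (13, "borderline"),
    (16, "pass"), (11, "borderline"), (7, "fail"),
]

print(borderline_group_cut(station_results))
```

With only a handful of borderline students at a station, the resulting cut score rests on very few data points, so the size of the borderline group is worth checking before the cut score is used.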
Guidelines for setting standards
• Assign an appropriate number of judges (at least 6-8 for high-stakes testing)
• Select the characteristics the group should possess, e.g. mixed professions
• All judges should attend throughout the session
• The characteristics of the examinees should be explained
• Judges should be familiar with the test items and format
• Reliability should be checked
• The method should produce reasonable results
• The results should be acceptable to stakeholders
• Pass rates should be compared against contemporaneous markers of competence
Method of choice?
There is no perfect
standard setting method. The choice depends on various factors
determined by the particular circumstances. Besides, regardless of the
standard setting method used, a test should cover the appropriate content and
be at the appropriate level of difficulty to determine competency.
References
Bejar I. Standard Setting: What Is It? Why Is It Important? Educational Testing Service, 2008.
Cizek GJ. Reconsidering standards and criteria. Journal of Educational Measurement 1993; 30(2): 93-106.
Kaufman DM, Mann KV, Muijtjens AMM, van der Vleuten CPM. A comparison of standard-setting procedures for an OSCE in undergraduate medical education. Acad Med 2000; 75: 267-271.
Kramer A, Muijtjens A, Jansen K, Düsman H, Tan L, van der Vleuten C. Comparison of a rational and an empirical standard setting procedure for an OSCE. Medical Education 2003; 37(2): 132.
Norcini JJ. Setting standards on educational tests. Medical Education 2003; 37: 464-469.