The Cross-Content Stimulus Evaluation Framework

Stimuli are probably the least recognized and least studied part of large-scale assessment items. They are simply taken for granted as part of items, given even less attention than distractors! (Stems really get all the glory, right?) Haladyna and Rodriguez's 400+ page book, Developing and Validating Test Items (2013), devotes perhaps 200 words to how to think about stimuli.

[Figure: The parts of an item, from a layout perspective: optional instructions, stimulus, stem, workspace, and response.]

However, stimuli are too important to take for granted. They give test takers something to analyze or manipulate with their KSAs (knowledge, skills, and/or abilities), and thereby provide opportunities for test takers to demonstrate their proficiencies. They are the content and material to which test takers apply the targeted cognition of items and their alignment references.

Stimuli often influence item difficulty and cognitive complexity, and even whether items are aligned to their alignment references. They are a common source of fairness issues, whether in the realm of bias or in the realm of sensitivity. Moreover, entire large processes exist to develop stimuli for ELA assessments, and stimulus development may be the primary challenge facing NGSS-aligned science assessment development (other than, of course, item type availability).

So, after mulling it over for well over a decade, we have finally put forward a framework for thinking about stimuli that can be applied across content areas. The C2SEF, the Cross-Content Stimulus Evaluation Framework, is available for download.

This framework offers 11 dimensions, each explained in the white paper. First is whether the alignment reference or item in question even requires a stimulus; second, whether the stimulus should be explicit or implicit in the test form. Because stimuli exist only to provide testable points, those points must be identified. The structure, density, and complexity of a stimulus must be considered, along with its copyright/permissions status, its authenticity, its familiarity to test takers, and the time test takers need to make initial sense of it. Perhaps nothing is more important than evaluating fairness risk, as valid items elicit evidence of the targeted cognition for the range of typical test takers.
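As an illustrative sketch only, the dimensions can be pictured as a reviewer's checklist. The dimension names and groupings below paraphrase the prose above; they are assumptions, not the white paper's official labels or structure:

```python
from dataclasses import dataclass, field

# Hypothetical labels paraphrasing the framework's 11 dimensions;
# the white paper's own wording may differ.
DIMENSIONS = [
    "stimulus required",
    "explicit vs. implicit",
    "testable points",
    "structure",
    "density",
    "complexity",
    "copyright/permissions",
    "authenticity",
    "familiarity",
    "initial sense-making time",
    "fairness risk",
]

@dataclass
class StimulusReview:
    """Record a reviewer's judgment on each dimension of one stimulus."""
    notes: dict = field(default_factory=dict)

    def flag(self, dimension: str, concern: str) -> None:
        # Attach a concern to one dimension of the checklist.
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        self.notes[dimension] = concern

    def unresolved(self) -> list:
        # Dimensions the reviewer has not yet addressed.
        return [d for d in DIMENSIONS if d not in self.notes]

review = StimulusReview()
review.flag("copyright/permissions", "source text not cleared for operational use")
print(len(DIMENSIONS))           # 11 dimensions in the framework
print(len(review.unresolved()))  # 10 still awaiting review
```

The point of the sketch is simply that each stimulus gets a judgment on every dimension, so gaps in a review are easy to surface.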

Because different content areas have such different needs for their stimuli (differences that are magnified in the constrained contexts of large-scale assessment), this little project will yield further papers exploring the particular stimulus needs of different content areas. We hope to partner with subject matter experts in those areas to lead those papers, and we already have most of them in mind.