Joel Breakstone wrote that two of the most readily available test item types, multiple-choice questions and document-based questions (DBQs), are poorly suited for formative assessment. Breakstone and his colleagues at SHEG have designed History Assessments of Thinking (HATs) that measure both content knowledge and historical thinking skills. HATs measure disciplinary skills through engagement with primary sources. Teachers using HATs must draw on their pedagogical content knowledge to interpret student responses and enact curricular revisions, something that may prove difficult for new or inadequately trained teachers.
To use HATs, teachers must understand the question, be familiar with the historical content, evaluate student responses, diagnose student mistakes, develop remediation, and implement the intervention. Teachers must possess an understanding of what makes learning easy or difficult and ways of formulating the subject that make it comprehensible to others. In designing HATs, Breakstone sought to collect data on cognitive validity, or the relationship between the constructs targeted by the assessments and the cognitive processes students use to answer them. This would help teachers interpret student responses and use that information to make curricular changes. Formative assessments in history depend on teachers being able to quickly diagnose student understanding. Assessments based on historical thinking represent a huge shift from the norm in history classrooms. For formative assessment to become routine, teachers will need extensive professional development and numerous other supports.
Sipress & Voelker (2011) write eloquently about the rise and fall of the coverage model in history instruction. That tension between coverage and depth has been revitalized as educators eagerly anticipate which testing methodologies will be used for the “fewer, deeper” Common Core assessments, in contrast to what I call the “Marv Alkin overkill method” of using at least four items to assess each content standard, which results in end-of-year history assessments of 80 questions or more. Breadth-versus-depth arguments have long existed in education. Jay Mathews frames the question well: should teachers focus on a few topics so students have time to absorb and comprehend the inner workings of the subject, or should they cover every topic so students get a sense of the whole and can later pursue the parts that interest them most?
One of the more interesting developments in ed tech may settle this debate. The nexus of machine learning and student writing is a controversial and competitive market. With a recent acquisition, Turnitin has signaled that it is looking to move beyond plagiarism detection and into automated writing feedback. If my wife allowed me to gamble, I would bet that one of the testing consortia, either Smarter Balanced or PARCC, will soon strike a deal with one of the eight automated essay grading vendors to grade open-ended questions on their standardized tests. Lightside Labs will pilot test its product with the Gates Foundation in 2015 and get it to market in 2016, just a little too late to be included in the first wave of Common Core assessments. I wonder whether HATs could incorporate some of this automated scoring technology and settle the depth-versus-breadth debate in history assessment.