
Evaluating Student Writing

Following up on my earlier post on Historical Writing Prompts, this article describes the results of a formative assessment on the Ancient Greek Philosophers for a 9th/10th grade World History class. Students were able to choose which type of writing task they wanted to complete (argument, informative, or narrative). They wrote a total of 183 essays that ranged between 18 and 640 words, with an average of 255 words per essay. From the results, it is clear these students needed additional instruction on (1) titling their essays to indicate which prompt they had selected, (2) rephrasing the prompt in their first two sentences, and (3) writing an introductory paragraph that organizes their thoughts and contains a thesis statement.

In this post, I will provide three samples from the argumentative writing prompt and ask students to vote for the strongest piece of student writing. Student work is typed verbatim; typos, misspellings, grammatical errors, and factual mistakes are intentionally included.


Argue that Plato and Aristotle held an essentially positive (or negative) view of human nature. In a well-reasoned essay, support your position using at least three of the quotes below as evidence to support your position.


Aristotle and Plato have made many quotes and many historians and people argue for the meaning of these quotes. In this paper I will discuss these quotes and put in my opinion. There will be showing if they are positive or negative.


The 3 quotes I will be talking about were from 2 famous philosophers, Aristotle and Plato. I will be showing you how powerful these quotes are and what they mean to me. I will be deciding if each quote represents a positive or negative view of human nature. These quotes would never mean the same thing to other people because of their opinions, and how they see on there own perspective. The first quote will be on Aristotle.


The meaning of “A good and wise life is the wealth that brings happiness” To me a good and wise life is having money and having your dream job. Also having a wonderful family. And having to see them everyday and having no worries. And no crime in the world.

Validating Rubrics

When teachers, departments, and schools use writing from sources as formative assessments in History, they need to follow protocols before evaluating the results. Many departments or schools grade these assessments collaboratively during common planning time or teacher professional development. This requires training evaluators to use validated rubrics before they apply that knowledge to the analysis of student work. Teachers work together to identify exemplars that strongly correlate with the spectrum of work defined by the rubric.

Annotated LDC Rubric

Holistic scoring involves assigning a single score that indicates the overall quality of a text (Bang, 2012). Raters give one summary score based on their impression of a text without trying to evaluate a specific set of skills. Analytic scoring examines multiple aspects of writing (e.g., content, structure, and mechanics) and assigns a score for each. This type of evaluation generates several scores that are useful for guiding instruction.

Broadly defined, reliability is the consistency with which an instrument or method produces measurements, while validity is its accuracy: the extent to which it actually measures what it is meant to measure. In testing writing rubrics, agreement rates are used to determine inter-rater reliability, where agreement is further defined as exact or adjacent scores. Exact agreement rates need to be 70% or greater to be considered reliable (Stemler, 2004). Adjacent agreement, meaning scores within one point of each other, should exceed 90% to indicate a good level of consistency (Jonsson & Svingby, 2007).
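The two agreement rates described above can be computed with a few lines of Python. This is a minimal sketch, not the method used for the results in this post: the function name `agreement_rates` and the sample scores are hypothetical, and it assumes that adjacent agreement counts exact matches as well as scores one point apart.

```python
def agreement_rates(rater_a, rater_b):
    """Return (exact, adjacent) agreement rates for two raters.

    Assumption: "adjacent" means the two scores differ by at most one
    point, so exact matches are counted as adjacent too.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of essays.")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent


# Hypothetical rubric scores (1-4 scale) from two raters on ten essays.
rater_a = [3, 2, 4, 1, 3, 2, 3, 4, 2, 3]
rater_b = [3, 2, 3, 1, 3, 4, 3, 4, 2, 2]

exact, adjacent = agreement_rates(rater_a, rater_b)
print(f"Exact agreement: {exact:.0%}")
print(f"Adjacent agreement: {adjacent:.0%}")
```

With these made-up scores, the raters agree exactly on 7 of 10 essays and land within one point on 9 of 10. The exact rate meets the 70% threshold, but the adjacent rate does not exceed 90%.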

I used Google Forms to have my students validate the above rubric from the Literacy Design Collaborative. I found the LDC rubric to be more student friendly than the rubric my District adapted from the Smarter Balanced consortium.

LDC Rubric Agreement Frequency

Jonsson & Svingby (2007) analyzed 75 rubric validation studies and found (a) benchmarks are most likely to increase agreement, but they should be chosen with care since the scoring depends heavily on the benchmarks chosen to define the rubric; (b) agreement is improved by training, but training will probably never totally eliminate differences; (c) topic-specific rubrics are likely to produce more generalizable and dependable scores than generic rubrics; and (d) augmentation of the rating scale (for example, so the raters can expand the number of levels using + or − signs) seems to improve certain aspects of inter-rater reliability, although not consensus agreements.

Validating a rubric with your class gives your students additional time to consider their historical writing. When they have to review more than one student’s writing, they establish a context for evaluating their own writing. Class discussions should identify exemplars of strong historical writing. Direct instruction should focus on improving examples of weak writing. Rubric validation is a much-needed historical thinking exercise. Without it, your students may fall prey to what psychologists call the Dunning-Kruger Effect.

The Dunning-Kruger Effect describes a cognitive bias in which people perform poorly on a task but lack the metacognitive capacity to properly evaluate their performance. As a result, such people remain unaware of their incompetence and accordingly fail to take any self-improvement measures that might rid them of it.


Bang, H. J. (n.d.). Reliability of National Writing Project’s Analytic Writing Continuum Assessment System.

Jonsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2(2), 130-144.

Kruger, J., & Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121-1134.

Stemler, S. E. (2004). A comparison of consensus, consistency, and measurement approaches to estimating interrater reliability. Practical Assessment, Research & Evaluation, 9(4).