Here’s how I used ChatGPT to assess student writing

Artificial intelligence | Tuesday, May 30, 2023

Can AI assistants like ChatGPT help us assess student writing in the classroom? This post shares a series of interactions with ChatGPT to provide feedback on student writing.


Deborah Cleveland is an Instructional Designer for AEA Learning Online, where she builds online professional development for educators throughout Iowa. You can find her on Twitter (@Dcleveland00) and on LinkedIn.

Time -- or lack of time -- is every educator's nemesis. There are so many things that need their attention and yet there is so little time.

However, in recent months, artificial intelligence -- specifically ChatGPT -- has exploded into our lives with the promise that it will save us time, help everyone be more efficient, and allow us to individually and collectively achieve more than we thought possible.

But is it true? Can AI deliver on its promise?

Let’s find out.

To help us get a better handle on this and see if AI is really as amazing as it sounds, I’ve decided to put it to the test. In this post, I'm going to look at how AI can help us assess student writing.

Essential Question: Can AI (ChatGPT) assist with grading student writing and providing meaningful and timely feedback?

Grading student writing and providing meaningful and timely feedback is important when it comes to helping students become better writers. It is especially challenging when you have between 25 and 150 pieces of writing to look at.

Below, I'll share the 10 interactions I had with ChatGPT in this conversation. You'll see the prompts I used and the AI’s responses, copied directly from the conversation. At the end, I'll reflect on the experience and share any lingering questions I still have.

Here are a few things to watch:

  • How it grades the writing prompts similarly/differently
  • The criteria it comes up with for a rubric
  • How it applies the rubric
  • How it adjusts as I ask it to adjust

Let’s dive in!

Interaction #1: General grading of writing assignment

Deborah: Can you grade this writing 3rd grade writing assignment*?

" There are many kinds of squids. The giant deep-sea squid can be longer than a city bus, and they are cousins of the octopus. A squid has a long body and ten arms. They live in the Caribbean. The Squid protects itself by hiding. A Squid can squirt out a blob of inky stuff. The inky blob may confuse hungry creatures. Its eggs come in handy because it surrounds itself with eggs and protects itself that way. Isn’t this amazing that it can be camouflaged in one second? They hide by moving with jet propulsion. If a shark tries to attack a squid, it will shoot out ink so it confuses the shark and zips away. Remember about the eggs? This is something like it. They swim together to protect themselves. They squirt out dark liquid to hide themselves from enemies. They change their colors and patterns to blend in with their surroundings. Therefore you can see that the squid have many ways of hiding."

ChatGPT: As an AI language model, I can provide feedback on the writing assignment based on certain criteria, but please note that grading assignments typically involves subjective evaluation and requires human judgment. Here's an example of how an AI could provide feedback on this 3rd grade writing assignment:

Grading Feedback:

  • Organization (3/5): The writing has a clear topic sentence at the beginning, but the rest of the paragraphs lack clear transitions between ideas. Consider using transition words or phrases to help make the writing flow more smoothly.
  • Content (4/5): The writing includes relevant information about squids, such as their size, arms, habitat, and defense mechanisms. However, some details could be further explained or expanded upon for a more comprehensive description.
  • Grammar and Mechanics (3/5): There are several instances of errors in punctuation, capitalization, and verb tense. Review and revise for proper grammar and mechanics, including ending sentences with proper punctuation and using consistent verb tense throughout the writing.
  • Vocabulary (4/5): The writing includes appropriate vocabulary for a 3rd grade level, such as "squirt," "camouflaged," and "jet propulsion." Consider using more descriptive words and varied sentence structures to enhance the writing.
  • Overall Impression (4/5): The writing demonstrates a good understanding of the topic and includes relevant details about squids. With some improvements in organization, grammar and mechanics, and vocabulary usage, the writing could be more polished and effective in conveying ideas.

Note: The grading criteria and feedback provided by an AI may vary depending on the specific AI model and its programming, and it should be used as a reference only. It's always important to have a qualified human teacher or instructor review and provide feedback on student assignments to ensure accurate and comprehensive evaluation.

Interaction #2: Justification of scores

Deborah: How did you come up with these scores?

ChatGPT: As an AI language model, I do not have the ability to assign scores. The scores provided in the example grading feedback were purely fictional and included for illustrative purposes only. In a real-world scenario, the grading criteria and scores for a writing assignment would be determined by the teacher or instructor based on established rubrics, guidelines, and their professional judgment.

The example grading feedback was generated based on common criteria that may be used to evaluate a 3rd grade writing assignment, such as organization, content, grammar and mechanics, vocabulary, and overall impression. These criteria were applied in a subjective manner to provide feedback on potential areas of improvement in the given writing assignment. However, it's important to note that grading is a complex process that involves human judgment and may vary depending on the specific requirements and expectations set by the teacher or educational institution.

Interaction #3: Use of a rubric

Deborah: If I have you create a writing rubric for a 3rd grade sample, adjust it to meet my needs, can you use that to "grade" a piece of writing and provide the writer with feedback? 

ChatGPT: As an AI language model, I can certainly help you create a writing rubric for a 3rd grade sample and provide feedback on a piece of writing based on that rubric. However, please note that the feedback generated by an AI should be used as a reference only and should not be considered a substitute for human evaluation and feedback.

Criteria:

  1. Organization:
  • (Score 4) Writing has a clear introduction, body, and conclusion with appropriate use of paragraphs.
  • (Score 3) Writing has a clear introduction and conclusion, but may have some inconsistencies in paragraphing.
  • (Score 2) Writing has some attempt at organization, but lacks clear introduction, body, or conclusion.
  • (Score 1) Writing is disorganized and lacks clear structure.
  2. Content:
  • (Score 4) Writing includes relevant details, facts, or examples that support the main idea.
  • (Score 3) Writing includes some relevant details, facts, or examples, but may lack consistency or specificity.
  • (Score 2) Writing includes limited or unrelated details, facts, or examples.
  • (Score 1) Writing lacks relevant content or is off-topic.
  3. Grammar and Mechanics:
  • (Score 4) Writing demonstrates correct use of punctuation, capitalization, and verb tense.
  • (Score 3) Writing has some errors in punctuation, capitalization, or verb tense, but does not significantly impact readability.
  • (Score 2) Writing has frequent errors in punctuation, capitalization, or verb tense that may impact readability.
  • (Score 1) Writing has pervasive errors in punctuation, capitalization, or verb tense that significantly impact readability.
  4. Vocabulary:
  • (Score 4) Writing uses a variety of age-appropriate vocabulary with precise word choices.
  • (Score 3) Writing uses some age-appropriate vocabulary, but may lack variety or precision.
  • (Score 2) Writing uses limited or basic vocabulary that may not be age-appropriate.
  • (Score 1) Writing lacks appropriate vocabulary or word choices.
  5. Overall Impression:
  • (Score 4) Writing is engaging, well-crafted, and demonstrates strong effort and creativity.
  • (Score 3) Writing is generally engaging and shows effort, but may have some areas for improvement.
  • (Score 2) Writing is somewhat engaging, but lacks polish or effort.
  • (Score 1) Writing is not engaging, poorly crafted, or lacks effort.

Interaction #4: Creation of a rubric in a table

Deborah: Create the rubric you just described and put it in a table.

ChatGPT: Sure! Here's the same writing rubric for 3rd grade in a table format:

| Criteria | Score 4 | Score 3 | Score 2 | Score 1 |
| Organization | Writing has a clear introduction, body, and conclusion with appropriate use of paragraphs. | Writing has a clear introduction and conclusion, but may have some inconsistencies in paragraphing. | Writing has some attempt at organization, but lacks clear introduction, body, or conclusion. | Writing is disorganized and lacks clear structure. |
| Content | Writing includes relevant details, facts, or examples that support the main idea. | Writing includes some relevant details, facts, or examples, but may lack consistency or specificity. | Writing includes limited or unrelated details, facts, or examples. | Writing lacks relevant content or is off-topic. |
| Grammar and Mechanics | Writing demonstrates correct use of punctuation, capitalization, and verb tense. | Writing has some errors in punctuation, capitalization, or verb tense, but does not significantly impact readability. | Writing has frequent errors in punctuation, capitalization, or verb tense that may impact readability. | Writing has pervasive errors in punctuation, capitalization, or verb tense that significantly impact readability. |
| Vocabulary | Writing uses a variety of age-appropriate vocabulary with precise word choices. | Writing uses some age-appropriate vocabulary, but may lack variety or precision. | Writing uses limited or basic vocabulary that may not be age-appropriate. | Writing lacks appropriate vocabulary or word choices. |
| Overall Impression | Writing is engaging, well-crafted, and demonstrates strong effort and creativity. | Writing is generally engaging and shows effort, but may have some areas for improvement. | Writing is somewhat engaging, but lacks polish or effort. | Writing is not engaging, poorly crafted, or lacks effort. |

Using this table, you can assign scores to each criterion for a given piece of writing, and the AI can provide feedback based on the scores and criteria established in the rubric. Remember to use human judgment and expertise in conjunction with the AI-generated feedback for a comprehensive and accurate evaluation of student writing.

Interaction #5: Scoring student work with rubric

Deborah: Can you use this rubric to grade and provide comments to the learner about this writing sample:

"the giant deep-sea squid can be longer than a city bus, and they are cosins of the octopus. a squid has a long body and ten arms. They live in the Caribbean. The Squid protects itself by hiding. a Squid can squirt out a blob of inky stuff. The inky blob may confuse hungry creatures Its eggs come in handy because it surrounds itself with eggs and protects itself that way. Isn’t this amasing that it can be camouflaged in one second They hide by moving with jet propulsion. If a shark tries to attack a squid, it will shoot out ink so it confuses the shark and zips away. Remember about the eggs this is something like it. They swim together to protect themselves. They squirt out dark liquid to hide themselves from enemies. They change their colors and patterns to blend in with their surroundings."

ChatGPT: Based on the rubric provided, here's an example of how the writing sample could be graded and feedback provided to the learner:

| Criteria | Score 4 | Score 3 | Score 2 | Score 1 |
| Organization | 3 | - | - | - |
| Content | 3 | - | - | - |
| Grammar and Mechanics | 2 | - | - | - |
| Vocabulary | 2 | - | - | - |
| Overall Impression | 2 | - | - | - |

Overall Score: 12/20

Feedback:

  1. Organization: Score 3 - The writing has a clear introduction and conclusion, but the body could be better organized with appropriate use of paragraphs for each idea.
  2. Content: Score 3 - The writing includes relevant details about the squid's characteristics and behaviors, but some sentences lack clarity and could be better connected to the main idea.
  3. Grammar and Mechanics: Score 2 - There are some errors in punctuation and capitalization that impact readability. Be sure to use proper punctuation and capitalization in your writing for greater clarity.
  4. Vocabulary: Score 2 - The vocabulary used is appropriate, but there could be more variety and precision in word choices to enhance the writing.
  5. Overall Impression: Score 2 - The writing is engaging, but could benefit from better organization, clarity, and attention to grammar and mechanics for a more polished and effective piece of writing.

Keep up the good work! Consider revising and editing your writing to improve the organization, clarity, and mechanics for a stronger and more effective composition.

Interaction #6: Scoring more student work

Deborah: Here is another student response can you use this rubric to grade and provide comments to the learner about this writing sample:

"There are many kinds of squids. The giant deep-sea squid can be longer than a city bus, and they are cousins of the octopus. A squid has a long body and ten arms. They live in the Caribbean. The Squid protects itself by hiding. A Squid can squirt out a blob of inky stuff. The inky blob may confuse hungry creatures. Its eggs come in handy because it surrounds itself with eggs and protects itself that way. Isn’t this amazing that it can be camouflaged in one second? They hide by moving with jet propulsion. If a shark tries to attack a squid, it will shoot out ink so it confuses the shark and zips away. Remember about the eggs? This is something like it. They swim together to protect themselves. They squirt out dark liquid to hide themselves from enemies. They change their colors and patterns to blend in with their surroundings. Therefore you can see that the squid have many ways of hiding."

ChatGPT: Based on the rubric provided, here's an example of how the second writing sample could be graded and feedback provided to the learner:

| Criteria | Score 4 | Score 3 | Score 2 | Score 1 |
| Organization | - | 3 | - | - |
| Content | - | 3 | - | - |
| Grammar and Mechanics | - | 3 | - | - |
| Vocabulary | - | 3 | - | - |
| Overall Impression | - | 3 | - | - |

Overall Score: 15/20

Feedback:

  1. Organization: Score 3 - The writing has a clear introduction and conclusion, and the body is generally well-organized with appropriate use of paragraphs for each idea. However, there could be some improvement in transitions between ideas for smoother flow.
  2. Content: Score 3 - The writing includes relevant details about the squid's characteristics and behaviors, and the use of examples and facts adds depth to the writing. However, some sentences could be rephrased for greater clarity and coherence.
  3. Grammar and Mechanics: Score 3 - The writing demonstrates good command of grammar and mechanics, with only a few minor errors. Keep up the good work, and consider proofreading for any remaining errors for a more polished piece of writing.
  4. Vocabulary: Score 3 - The vocabulary used is appropriate and varied, and it enhances the overall quality of the writing. Continue to use precise and descriptive language to make your writing more engaging and effective.
  5. Overall Impression: Score 3 - The writing is well-done with strong organization, content, grammar, and vocabulary. Keep up the good work and consider further revising and editing to enhance the overall effectiveness of your writing.

Great job! Your writing demonstrates a good understanding of the topic, and you've included relevant details to support your ideas. Keep practicing and refining your writing skills for continued improvement.

Interaction #7: Challenging AI scoring of student work

Deborah: I think the first sample had a much lower score for grammar and mechanics. There were several misspelled words, missing periods, and the first words of the sentences were not capitalized. Can you regrade the first sample and incorporate some of those errors into the feedback.

ChatGPT: You're right, I apologize for the oversight. Based on the errors in the first writing sample, here's an updated grading and feedback:

| Criteria | Score 4 | Score 3 | Score 2 | Score 1 |
| Organization | - | - | 2 | - |
| Content | - | 3 | - | - |
| Grammar and Mechanics | - | - | - | 1 |
| Vocabulary | - | 3 | - | - |
| Overall Impression | - | - | 2 | - |

Overall Score: 8/20

Feedback:

  1. Organization: Score 2 - The writing has a general structure with an introduction, body, and conclusion, but the ideas are not well-organized, and the paragraphs lack clear transitions between them.
  2. Content: Score 3 - The writing includes some relevant details about the squid's characteristics and behaviors, but there could be more elaboration and development of ideas for a more complete and engaging piece of writing.
  3. Grammar and Mechanics: Score 1 - There are several errors in grammar and mechanics, including misspelled words, missing periods, and inconsistent capitalization at the beginning of sentences. Careful proofreading and editing are needed to correct these errors and improve the overall quality of the writing.
  4. Vocabulary: Score 3 - The vocabulary used is appropriate and varied, and it adds to the overall quality of the writing. Keep using descriptive language to enhance your writing further.
  5. Overall Impression: Score 2 - While the writing has some strengths in terms of content and vocabulary, there are significant issues with organization and grammar/mechanics that need improvement. Remember to proofread and edit your work carefully to avoid errors and improve clarity.

Keep working on your writing skills and paying attention to grammar and mechanics to create more polished and effective pieces of writing.

Interaction #8: Adjusting grading system for student work

Deborah: Using this same attention to spelling and grammar provide scores and feedback for this text:

"There are many kinds of Squids. The giant deep-sea Squid can be longer than a city bus and they are cousins of the octopus. Squid have a long body and ten arms. They live in the Caribbean. The Squid protects it self by hiding. A Squid can squirt out a blob of inky stuff. The inky blob may confuse hungry creatures. Its eggs come in handy because it surrounds itself with eggs and protects itself that way. Isn’t this amazing that it can be camouflaged in one second? They hide by moving in jet propulsion. If a shark tries to attack a Squid it will shoot out ink so it confuses the shark and zips away. Remember about the eggs? This is something like it. They swim together to protect themselves. They squirt out dark liquid to hide themselves from enemies. They change their color and patterns to blend in with their surroundings. Therefore you can see that the squid have many ways of hiding."

ChatGPT: Based on the errors in grammar and mechanics in the provided writing sample, here's the grading and feedback:

| Criteria | Score 4 | Score 3 | Score 2 | Score 1 |
| Organization | - | 3 | - | - |
| Content | 4 | - | - | - |
| Grammar and Mechanics | - | - | 2 | - |
| Vocabulary | - | 3 | - | - |
| Overall Impression | - | 3 | - | - |

Overall Score: 12/20

Feedback:

  1. Organization: Score 3 - The writing has a clear introduction, body, and conclusion, and the ideas are logically organized with some transitions between paragraphs. However, further improvement can be made to ensure a smooth flow of ideas.
  2. Content: Score 4 - The writing includes relevant details about the squid's characteristics and behaviors, and there is evidence of good research. The ideas are well-developed and provide a comprehensive overview of the topic.
  3. Grammar and Mechanics: Score 2 - While there is improvement in grammar and mechanics compared to the previous sample, there are still some errors present, including capitalization inconsistencies, incorrect word usage, and missing punctuation. Proofreading and editing can help to correct these errors and enhance the clarity of the writing.
  4. Vocabulary: Score 3 - The vocabulary used is mostly appropriate and contributes to the overall quality of the writing. Continue to use descriptive language to add depth and richness to your writing.
  5. Overall Impression: Score 3 - The writing demonstrates solid content knowledge and organization, but there are still areas for improvement in grammar and mechanics. Keep refining your writing skills and paying attention to detail to create more polished and effective pieces of writing.

Great job on the content and organization! Keep working on improving your grammar and mechanics skills to further enhance your writing.
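It is worth re-adding the per-criterion scores in responses like these, because the stated totals do not always match. The short Python sketch below does that check for the four score tables in Interactions #5 through #8. The scores are copied from the tables above; the function and variable names are illustrative only.

```python
# Per-criterion scores copied from the score tables in Interactions #5-#8.
# Order: Organization, Content, Grammar and Mechanics, Vocabulary, Overall Impression.
graded = {
    "Interaction #5": {"scores": [3, 3, 2, 2, 2], "reported_total": 12},
    "Interaction #6": {"scores": [3, 3, 3, 3, 3], "reported_total": 15},
    "Interaction #7": {"scores": [2, 3, 1, 3, 2], "reported_total": 8},
    "Interaction #8": {"scores": [3, 4, 2, 3, 3], "reported_total": 12},
}


def check_totals(graded):
    """Return {label: (computed_total, reported_total)} for every mismatch."""
    mismatches = {}
    for label, entry in graded.items():
        computed = sum(entry["scores"])
        if computed != entry["reported_total"]:
            mismatches[label] = (computed, entry["reported_total"])
    return mismatches


for label, (computed, reported) in check_totals(graded).items():
    print(f"{label}: criteria add up to {computed}/20, response reported {reported}/20")
```

With these numbers, the criteria in Interaction #7 add up to 11 rather than the reported 8, and those in Interaction #8 add up to 15 rather than the reported 12. The totals in Interactions #5 and #6 match.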

Interaction #9: Highlighting errors identified by AI

Deborah: what grammar and mechanics errors do you see?

ChatGPT: Based on the provided writing sample, some of the grammar and mechanics errors include:

  1. Capitalization inconsistencies: "Squids" should be capitalized as it is a proper noun, and "it self" should be "itself" as it is a single word.
  2. Incorrect word usage: "color" should be "colors" to maintain subject-verb agreement, and "in jet propulsion" should be "with jet propulsion" for clarity and proper phrasing.
  3. Punctuation errors: Missing periods after sentences, and missing question marks after interrogative sentences such as "Isn't this amazing that it can be camouflaged in one second?" and "Remember about the eggs?"
  4. Spelling errors: "mispelled" should be "misspelled", "grammer" should be "grammar", and "themselves" is spelled correctly, but "themself" is used in the text.
  5. Inconsistent use of verb tense: "Squid protects" should be "Squids protect" for consistency in verb tense.

These are some of the grammar and mechanics errors that can be addressed through proofreading and editing to enhance the clarity and accuracy of the writing.

Interaction #10: Challenging AI grading

Deborah: I don't understand why it should be "squid protect". explain

ChatGPT: I apologize for the confusion. The correct verb tense to use in this case would be "Squid protects" because "Squid" is a singular noun, and in the present tense, singular nouns usually take the third person singular verb form. So, the sentence should read, "The Squid protects itself by hiding." to maintain subject-verb agreement. Thank you for bringing that to my attention.

Reflections:

Time: This whole interaction took me about 15 minutes, which is a decent amount of time when you consider how quickly a teacher could grade this individual writing assignment. However, part of that was me “training” the AI model to do what I wanted it to do by asking it questions. Now that the model knows what I want and I have better prompts to put in, I am certain I could speed this process up. Scaling ChatGPT for use would require a more developed workflow.
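As one possible starting point for that kind of workflow, here is a minimal Python sketch that folds the instructions refined over this conversation into a single reusable prompt and fills it in for each writing sample. Everything in it is an assumption for illustration: the function name, the exact prompt wording, and the sample IDs are hypothetical, and it only builds the prompt text to paste into ChatGPT. It does not call any AI service.

```python
# Rubric criteria taken from the rubric ChatGPT proposed in Interaction #3.
RUBRIC_CRITERIA = [
    "Organization",
    "Content",
    "Grammar and Mechanics",
    "Vocabulary",
    "Overall Impression",
]


def build_grading_prompt(sample_text, criteria=RUBRIC_CRITERIA, grade_level="3rd grade"):
    """Combine the refined instructions and one writing sample into a single prompt."""
    criteria_list = ", ".join(criteria)
    return (
        f"Use a {grade_level} writing rubric with these criteria: {criteria_list}. "
        "Score each criterion from 1 to 4. Pay close attention to misspelled words, "
        "missing periods, and capitalization. Then write comments for the learner.\n\n"
        f'Writing sample:\n"{sample_text}"'
    )


# Hypothetical samples, keyed by an ID rather than a student name.
samples = {
    "sample_01": "There are many kinds of squids.",
    "sample_02": "the giant deep-sea squid can be longer than a city bus",
}

prompts = {sample_id: build_grading_prompt(text) for sample_id, text in samples.items()}
print(prompts["sample_01"])
```

Keeping the instructions in one place means every sample is graded against the same wording, which is the part of the conversation that took the most back-and-forth to get right.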

Observation 1: Early in the conversation, ChatGPT graded the assignment, but mentioned that grading assignments is subjective and should involve a human element or judgment. Oddly, when it stated its own limitations, that made me feel more at ease. (Interactions 1 and 3)

Observation 2: ChatGPT was able to examine specific text that I copied and pasted into it. When I altered the text, its responses changed. The fact that ChatGPT was able to tease out the nuances of what I was inputting is just mind-blowing. (Interactions 1, 5, and 6)

Observation 3: In interaction #7 I disagreed with the grammar and mechanics score it gave the first sample in interaction #5, and I asked it to regrade the sample and give it a lower score. It did just what I asked. The point here is that ChatGPT will do what it is asked. It doesn't need an explanation or a reason. While it is great to have something that listens and does what you ask, remember you get what you put in. If you have a bias and you ask ChatGPT to reflect that bias, it will.

Observation 4:  Interactions #9 and #10 really get me excited! In #9 it points out a verb tense error and in #10 (with a little additional prompting) it explains the error. I love this because students can then see the error, but also get the "why?" so that, hopefully, in the future they don't make the error again.

Observation 5:  I pulled this writing sample off of Achieve the Core. I was careful to make sure that this writing sample was free of any student identifiable information. As we wade into using AI or having our students use AI, it is really important to consider federal, state, district, and school policies when inputting student work into AI models like ChatGPT.

Final thoughts

Attempts like the one I modeled will leave you excited, motivated, flabbergasted, irritated, worried, and probably 100 other emotions, but I think it is worth the rollercoaster ride. AI models like ChatGPT are going to change the world we live in and ultimately alter the reality that our students experience now and in their future. By embracing the challenges of AI and expanding their cognitive horizons, educators will enhance future generations' capacity to thrive in an AI-driven era.

  • Armando Perez says:

    One thing to take into account is that ChatGPT sometimes does not give the same assessment results when you run the same writing through the rubric. I created a prompt to assess my high school Spanish writings and the program gave different grades on the same text. I then made it run the text several times through the rubric (up to 50 times) and calculated the mean grade and its standard deviation. As a result, I learned that I would have to run the program about 140 times for each text to get a 95% confidence estimate of the average grade. It would be interesting to know if it was the prompt, the rubric, or the innate correlational nature of the AI models.

    • Hallit says:

      I have written this into my prompt for ChatGPT. I told it to score the writing 200 times using the provided rubric and calculate the average score as a final score. It does it in seconds and has improved the consistency of responses on the same piece and multiple students so that the overall class is being graded more consistently. It’s no different than if I start grading and then go back and regrade a piece after I’ve done 30 students. I see things differently when I read through multiple times or pieces. But the computer can still do that in a fraction of the time I can.
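For readers curious how a figure like "about 140 runs" might be reached, here is a minimal Python sketch of one common approach: take the mean and sample standard deviation of a pilot set of scores, then apply the standard sample-size formula n = (z × s / margin)², with z = 1.96 for 95% confidence. This is not necessarily the calculation used in the comment above. The pilot scores and the ±0.25-point margin are made up for illustration, and the formula assumes each score comes from a separate, independent run.

```python
import math
import statistics


def runs_needed(scores, margin, z=1.96):
    """Estimate how many runs keep the mean score within +/- margin.

    Uses n = (z * s / margin) ** 2, where s is the sample standard deviation
    and z = 1.96 corresponds to a 95% confidence level.
    """
    s = statistics.stdev(scores)
    return math.ceil((z * s / margin) ** 2)


# Hypothetical totals (out of 20) from grading the same text ten times.
pilot_scores = [12, 13, 15, 14, 12, 16, 13, 14, 15, 13]

mean_score = statistics.mean(pilot_scores)
spread = statistics.stdev(pilot_scores)
n = runs_needed(pilot_scores, margin=0.25)
print(f"mean={mean_score:.2f}, sd={spread:.2f}, runs needed for +/-0.25 points: {n}")
```

The required number of runs grows with the square of the spread and shrinks with the square of the margin, so a slightly looser margin cuts the number of runs considerably.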

  • Zena says:

    Thanks for some great thinking points. My current area of interest is encouraging student agency and self-reliance so this is incredibly useful.
    Just wondering if anyone has explored students gaining feedback before they seek external peer or teacher feedback on sites like Grammarly or Turnitin please.
    Trying to source the best AI for this specific purpose.
    Zena, Brisbane

    • COLM O 'CONNOR says:

      I’m actually working on a paper for my master's based off of this. I’ve already conducted surveys before and after giving my students AI feedback. Overall the proper mixture. I also told the child before giving feedback to use cave growth mindset principles to encourage students.

  • Kaylee says:

    I love all of the points that you have made. I think using AI can be very beneficial to grade assignments, create rubrics, and check students' work. I didn’t realize how many things you can do with AI!

  • Deon says:

    Hi, I hope you don’t mind. I used the rubric and created a simple ChatGPT essay grader. It’s a work in progress and I want to incorporate feedback from teachers. I am a teacher myself and will use this with my students. If anyone else wants to use it, please remember that it is just a tool to help us improve our efficiency. The essay grader is on ClassX dot org. You can find it in the main menu from the header.

  • Wendi Kuntz says:

    This is a simple 3rd-grade paragraph. What sense do you have about a high school research paper?

    • Vicki says:

      I am wondering the same thing.

    • Matt Miller says:

      I’ve heard of high school teachers using ChatGPT to gather feedback, create rubrics, and make suggestions. If students are old enough to use it (or, let’s be honest, are using it on their own anyway) the students can be the drivers of this and use it as a “first line of defense” in improving their work before turning it in.

    • Deborah Cleveland says:

      I 100% think you or students can use this type of tool for high school research papers. I chose to do a 3rd grade writing sample, because it was simple and short (easy to read in a blog post). To use this with high school student work, I’d change some of the prompts. In interaction #3 I asked it to create a 3rd grade rubric. I would change that to say “Create a rubric for a 10th grade research paper”. If I wanted to, I could even get more specific, for example: “Create a rubric for a 10th grade research paper that addresses the following categories: thesis statement, style, organization, quality of information, MLA format. Make each category worth a maximum of 20 points.” ChatGPT will give you what you ask for. Note: It probably won’t be perfect, but from there you can ask it to adjust the rubric. This includes changing the formatting, point values, and/or descriptors. Once it is what you want, you can copy and paste your paper and it will review it and give you suggested feedback.

      One other thing I’d note, you can input your rubric into ChatGPT and ask it to use that to provide feedback about writing. This way you don’t have to work with ChatGPT to create the rubric.
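If you find yourself writing rubric requests like the one above for several grade levels or assignment types, the wording can be turned into a small template. Here is a minimal Python sketch; the function name and parameters are hypothetical, and it only assembles the prompt text for you to paste into ChatGPT.

```python
def build_rubric_request(grade_level, assignment_type, categories, max_points):
    """Build a rubric request in the same wording as the example prompt above."""
    category_list = ", ".join(categories)
    return (
        f"Create a rubric for a {grade_level} {assignment_type} that addresses "
        f"the following categories: {category_list}. "
        f"Make each category worth a maximum of {max_points} points."
    )


request = build_rubric_request(
    grade_level="10th grade",
    assignment_type="research paper",
    categories=["thesis statement", "style", "organization",
                "quality of information", "MLA format"],
    max_points=20,
)
print(request)
```

Changing the arguments gives you the same request for a different class, for example a 7th grade argumentative essay with its own categories and point values.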
