AI and entrapment: A cautionary tale

By Dr Sandra Leonie Field
Posted Tuesday 22 October, 2024

There’s a lot of discussion at present about how we university educators need to embrace and harness the potential of AI for enhancing student learning. To hold onto old ways of teaching and assessing in the era of AI would be a disservice to our students.

One common piece of advice, offered widely, including on the Monash AI learning circle website, is that AI might be useful for students to help them summarise readings and compose first drafts of essays. If students don’t need to waste their time on this initial grunt work, then they will be freed to spend more time on the higher-order critical thinking skills which are the true value of a university education.

Sounds great, right? Let’s see what happened when I tried to put this into practice. Spoiler alert: it didn’t turn out well.

AI-enhanced learning: The assessment setup and our expectations

The teaching team had planned an ‘analytical exercise’ assessment for early in the first-year PPE (politics, philosophy, and economics) unit. The exercise is a foundation for the later essay: its purpose is to have students practise explaining important concepts and theories in their own words (specifically, three conceptual questions relating to the first week’s short readings, in a total of 600 words).

Pre-AI, this kind of exercise was quite demanding for first-year students, but it seemed to us that AI would make it radically easier to perform well. So, rather than ban AI use (which would be unenforceable in any case), in the most positive and inclusive pedagogical spirit we decided to explicitly structure the assessment to teach the students how to use AI.


We set the exercise up as follows. First, during the first tutorial, students were to discuss the first week’s materials with classmates, and draft a rough short response in their own words. Second, students were to run their draft, along with the entire instruction sheet, through a large language model (LLM) of their choice (e.g. ChatGPT or Claude). Third, students were to revise the resultant text, with or without AI as they pleased, to produce a polished final set of responses. Fourth, they were to write a brief reflection on the process of interacting with AI. All four components were submitted together, with 80% of the mark for the quality of the final three responses, and 20% for the reflection. Although the first two components were not directly graded, markers read them, in order to be able to assess the reflection.

Frankly, my expectation was that this exercise would be very easy. One of the teaching team ran some quick draft responses through an LLM and found the output to be of high quality. My main worry was that we would end up awarding High Distinctions to everyone, and that as Chief Examiner I’d have to justify the inflated grades to the Board of Examiners.

AI-debilitated learning: The damaging effects of premature AI use

This was not how it turned out.

The assessment design enabled us to track three things: each student’s initial understanding, the LLM’s suggestions, and the student’s response to those suggestions. Comparing these components, we saw three distinct patterns of interaction with the LLMs, with a polarising impact on students’ understanding.

  • Students who had demonstrated independent understanding of the material seemed, in general, to get at most a small boost from the LLM. If the initial draft was solid, the LLM usually didn’t lead students astray, and perhaps helped a little in structuring the final responses.
  • For students who did not have a good grasp of the material, the LLM seemed to make things much worse. Whatever misunderstandings were in the initial draft, using the LLM seemed to amplify and exacerbate them. If the student didn’t have their own compass on what the question was asking and what the readings were about, they may have been unable to identify, let alone rein in, the LLM’s errors. I wonder whether the LLM also gave some of them a false sense of security. The ‘reflection’ comments on the weak responses uniformly praised the LLM for helping them understand and articulate their ideas. Or were they just putting on a brave face when in fact they were deeply uncertain about how to evaluate the LLM’s outputs?
  • Less frequent but most troubling to me as a teacher were the cases where the initial draft had good elements, but then the student adopted LLM changes which made the response worse. Here, the student had done good work but probably lacked self-confidence in their understanding.


How exactly did we judge the degree of understanding and the role of LLMs in assisting or impeding it? Well, LLMs generate text by prediction: they return the continuation that is most statistically expected given their training data and the prompt. The questions we set were very straightforwardly focussed on core concepts in short set readings that were discussed in the workshop/lecture and in the tutorial. But those concepts did not necessarily map onto the most ‘expected’ ideas the LLM returned from its training. We would see a characteristic set of new ideas and directions introduced at the ‘LLM revision’ stage of the assessment which were not supported, or even suggested, by the unit materials. And what we saw was that many or most students lacked the confidence to ignore or modify the misleading or irrelevant ideas from the LLM when producing the final version of their answer.

For instance, the third question addressed just over two pages of text, which had been extensively broken down in the lecture. The topic was whether the problem of climate change can be solved without sacrifice. The reading proposes that we current generations should compensate ourselves for the hardship of climate action by helping ourselves to some of future generations’ resources: even though that is unfair to the future, they’ll still be better off than if we just let climate change happen. This is a provocative argument, and also a distinctive and novel one. But when LLMs are prompted to discuss how climate change can be addressed without sacrifice, they tend to talk generically about carbon taxes. In so doing, they leave out the central idea of the set readings and the workshop/tutorial.

Technological enthusiasm and pedagogical entrapment

So, what is the lesson?

First, only with a solid substantive understanding of the subject matter under their belt do students have any hope of navigating LLMs to their advantage.   

Second, premature encouragement to use AI leads to an increase in bullshit (in the technical, Frankfurtian sense: prose produced with indifference to whether it is true). I want students to know what their words mean in every single sentence that they write, but in many responses I saw plausible, smooth, AI-generated sentences that the student evidently did not understand.

Third, premature encouragement to use AI actually undermines student learning and self-confidence. One student spoke to me about how she felt tricked by the assignment, and blindsided to receive the marker’s negative evaluation of her work. The assignment encouraged her to abandon her normal old-fashioned work practice. But in hindsight, the old-fashioned, patient, thorough study of set materials was exactly where she needed to start.

I have kids at primary school. When they learn maths, they aren’t allowed to use calculators. Not because calculators aren’t useful technology, but because the purpose of education is to give us the intellectual skills and resources to put technology to our service. Learn the basic maths first, then afterwards you will be able to use a calculator with understanding! In high school, mathematical learning does involve use of calculators.

We need to develop some analogous sense of pedagogical sequencing for AI in university education. The conceptual and argumentative skills that are the core output of an arts degree are slow and difficult to build. Premature introduction of AI can inhibit them. We need to open the conversation: what is the right sequencing of learning for our students?

Here’s a different way of thinking about the role LLMs could usefully play in supporting students to write argumentative essays. Arash Abizadeh argues that it’s only through the difficult process of writing up and developing their ideas that students learn how to think critically. So students should fully draft their essays without the use of AI, and only at a relatively late stage bring in AI to assist with revisions. Of course, even if this advice is good, it is unpoliceable, and if put forward at Monash, students might be sorely tempted to ignore it. But at McGill, Abizadeh doesn’t have to worry about motivation: his students are held accountable for the skills and knowledge they achieve through the semester by an invigilated, no-AI final exam.

What do you think?

Dr Sandra Leonie Field

Sandra Leonie Field is Lecturer in Philosophy at Monash University. She teaches within the Philosophy and the Politics, Philosophy, and Economics programs. Previously, she taught at Yale-NUS College, Singapore. She is the author of Potentia: Hobbes and Spinoza on Power and Popular Politics (New York: Oxford University Press, 2020).