Substitute
Or: Neville’s further adventures in what the hell our students are getting up to without us realising, and whether this matters… I’ve been exploring Scholarcy, an apparently well-regarded suite of AI-powered tools for summarising academic papers, as used (so says their webpage) by people studying at Oxford, Cambridge, Harvard and Stanford (one assumes that the firm may simply be analysing the email addresses of people who’ve signed up, rather than this representing any sort of formal endorsement – though those do look like the official institutional fonts, whose use is normally guarded quite jealously, and the whole reason I’m looking at this is that I hear from colleagues that their universities are actively encouraging them to direct students towards this tool, so…)
One important thing I’ve learned – obviously it’s the usual toss-up as to whether this is widely known to everyone but me, or genuinely useful information – is that there are two broad categories of AI summarisers on the market: extractive and abstractive. The former extracts key sentences, phrases and information from the original text, preserving the wording, in order to produce a dramatically-shortened summary that, at least in theory, gives a reliable overview of the contents. The latter is generative, analysing the source text in order to produce its own summary of contents and argument – this is what you get from ChatGPT or Claude, and as far as I can see it’s the approach taken by JSTOR’s beta summary tool. As discussed previously, the latter left me unimpressed – and clearly the nature of the process means there’s always a possibility of the AI distorting the original or simply making stuff up.
On the face of it, the extractive approach (which I keep misremembering as ‘subtractive’, perhaps because of the impression that it’s stripping away less relevant material in order to get at the essence of a piece) looks potentially better, or at least less bad, simply because it isn’t going to add anything to the original that doesn’t belong. But there’s still a question as to whether an automatic reading of a piece, based (one assumes) on word frequency and on prioritising certain sentences (e.g. those at the beginning and end, opening sentences of sections and paragraphs etc.) is going to provide a reliable overview, such that we can feel at all sanguine about our students using it.
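For what it’s worth, the kind of process I’m imagining here can be made concrete. What follows is a minimal sketch in Python of a frequency-based extractive summariser with a crude positional bonus – emphatically not Scholarcy’s actual algorithm, which as far as I know is not published, just an illustration of the sort of mechanism described above; every name, stopword list and weighting in it is my own invention for the purposes of the example.

```python
# A toy frequency-based extractive summariser -- purely illustrative, not any
# commercial tool's actual method. Sentences are scored by how much of the
# text's recurrent vocabulary they contain, with a bonus for opening/closing
# position, and the top scorers are returned in their original order.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "that", "it", "as", "for"}

def summarise(text: str, n_sentences: int = 3) -> str:
    # Crude sentence split on terminal punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Word frequencies across the whole text, ignoring common stopwords.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    scored = []
    for i, sent in enumerate(sentences):
        tokens = [w for w in re.findall(r"[a-z']+", sent.lower()) if w not in STOPWORDS]
        if not tokens:
            continue
        # Average frequency of the sentence's words: recurrent vocabulary scores highly.
        score = sum(freq[w] for w in tokens) / len(tokens)
        # Positional bonus for the opening and closing sentences.
        if i == 0 or i == len(sentences) - 1:
            score *= 1.5
        scored.append((score, i, sent))

    # Keep the top-scoring sentences, restored to their original order.
    top = sorted(sorted(scored, reverse=True)[:n_sentences], key=lambda t: t[1])
    return " ".join(sent for _, _, sent in top)

if __name__ == "__main__":
    print(summarise("Open with a claim. Add supporting detail. Add more detail. Close by restating the claim."))
```

The point of the toy example is simply that nothing in such a process engages with the argument: sentences win by containing words that happen to recur, or by happening to sit at the start or end of the text.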
The obvious issue with my undertaking such a review is that, having been looking at GenAI in the context of history learning and teaching over the last eighteen months or so, my instincts about it are almost entirely negative – it’s crap, and it’s environmentally destructive – and this shapes my response to quite different forms of AI like these extractive programmes. I’ve had an initial look on Google Scholar for proper studies, without much success: plenty of puff pieces, hailing the advent of these tools as making the literature review process for scientific papers much more efficient, many of them written by people involved in the development and promotion of such tools, but as yet only one detailed attempt at evaluating AI summarisers by expert opinion, focused on computer science – which offers the equivocal conclusion that “The results are quite varied but do not give the impression of unanimous agreement that automatic summarizations are of high quality and are trusted.” Uh huh.
This is clearly the sort of controlled study we need – not necessarily specific to historical studies, but certainly my impression is that the typical scientific publication is easier to analyse automatically than a humanities article – rather than just my prejudiced and subjective opinions. But in the absence of such a study, prejudiced and subjective opinions are what you get. I’ve used the same three case studies as last time – Hopkins’ ‘Taxes and Trade’, Finley’s ‘The Ancient City’ and my own ‘Transformation of Italy’ – to see what Scholarcy’s free-to-use (daily limit) summariser makes of them, and in the interests of transparency, I will attempt to upload copies of the ‘flashcards’ generated by the system to this post (see bottom of post).
So… Overall, I do think that – because the notes are a lot more extensive, and because they are drawn entirely from the text – the original article is recognisable in the Scholarcy output. It doesn’t grasp that my piece is a bit of counterfactual history (it does reference something to this effect at the end, but doesn’t convey how this relates to the rest of the summary), and, while it does echo Hopkins’ emphasis on the speculative nature of his arguments, the summary comes across as a list of substantive statements rather than a series of deductive propositions. It does, unlike JSTOR’s AI widget, identify Finley’s claim that the best model for the ancient city is the ‘consumer city’ – but it doesn’t give this much emphasis. On the whole, I would suggest that Scholarcy is trained for, and hence most reliable with, publications that essentially present results and say what they mean, rather than any more complicated style of analysis.
This is understandable, because a clearly-structured presentation of research questions, methods, data and results is a lot easier to analyse automatically and re-present – the repetition of Hopkins’ ‘first proposition’ in the output, with no indication of what the second, third or any other proposition might be, suggests that the AI focuses on such explicit sign-posting. In other words, this tool is better suited to conventionally-structured scientific publications – and is clearly geared towards them, most obviously in the fact that there’s a separate section for ‘Study Subject’, where the AI clearly wants to identify the number and nature of subjects in the experiment being described, and grasps randomly at any old number in the text when it can’t find this. It’s also geared to extract Findings, and seems to end up being quite random when these are not explicitly labelled as such; the Findings listed for these three articles are all things found in the text, but it’s not at all clear that they are the key points that the author intended his reader to take away.
And I have the same feeling about the rest of the notes; there’s nothing fake or imaginary here (apart from the fact that the AI isn’t used to the idea of an article being ‘By Keith Hopkins’, and so converts the author into ‘Hopkins, B.K.’). All the points in its summary, helpfully provided with page refs, are to be found in the articles in question – but here they are decontextualised, presented as isolated factoids or statements, with no indication of their relevance to an overall argument. (I do start to feel that I’ve seen this phenomenon in student work: perfectly correct references to an article, used to make a point that doesn’t really have anything to do with that article’s core contribution.) It’s striking how extensive the Scholarcy notes are, which I suspect reflects the programme’s difficulty in distinguishing between key ideas and the material used to support and develop them – again, it’s simply not programmed for, or particularly suited to, a humanities/discursive style of article.
Basically, Scholarcy takes notes on absolutely everything it ‘reads’, with no sense of a broader framework or order of priorities. Which, to be fair, I have done plenty of times myself when engaging with an unfamiliar topic, and I can certainly imagine many of my students doing – but this doesn’t result in notes that would be much use, or indeed at all intelligible, for anyone but me. It’s an interesting side-point that ancient history articles clearly are difficult to understand without some intellectual effort, given that they don’t present their material in a predictable, ordered and sign-posted manner – I can imagine demands for us to start writing in a more machine-friendly way, just as I recall seeing articles criticising humanities scholars’ approach to coming up with article titles. But this does indicate that Scholarcy is liable to struggle with such publications unless/until we start writing them differently.
The drive behind these AI summarisers – the way in which they are being sold to potential users, especially students – is the idea of making research and the ‘literature review’ more ‘efficient’. Not being a scientist, I’m genuinely unsure whether this is a sound practice in scientific fields; assuming that the summary is more or less reliable, is it in fact better for the development of a credible project to draw on a condensed overview of every paper written on relevant subjects rather than a genuine understanding of a limited number of such papers? Is there an imperative in such publications to include every possible reference, regardless of the depth of one’s grasp of those references, without this becoming a problem? And – perhaps most importantly – is this predicated on the student/researcher already having a good knowledge of the subject from their lectures and classes, so the AI summary is only supplementary?
That is to say, is this issue over-determined when it comes to the humanities? The problem with using Scholarcy and similar tools in this field is not just that they don’t produce especially good results for more discursive publications, but also that students may be using them not to check recent research on a topic they already understand but to try to acquire that understanding in the first place. The AI summary gives you a selection of nuggets of information, which are neither representative of the actual argument of the article from which they are extracted nor, in this decontextualised form, much help for developing an overall understanding of the topic – so, whereas under former conditions we might hope that setting students off to research for an essay would simultaneously develop their overall knowledge and their analytical skills, in the world of AI summarisers this does neither?
Obviously, therefore, I would like students to read history publications properly, in the old-fashioned manner, learning to focus on the overall arguments and the ways in which sources and scholarship are used to develop those arguments, rather than extracting random nuggets. So, my next step is to think about how better to support students in developing the necessary skills, giving them more advice on how and what to read, emphasising quality of engagement over quantity, etc.
But they are already using these tools – perhaps quite extensively – and there seems to be a big push within the world of library and student support services to promote them. I vaguely wonder if there’s a ‘one size fits all’ issue – maybe these tools are fine for students in the sciences, and the problem is that a single suite of advice on research skills is being presented to students on a whole range of programmes. If so, we historians need to make more of a concerted effort to counter them, without making it seem as if this is all designed just to make students’ lives impossibly difficult. The more we insist on them engaging with lots and lots of modern scholarship, the more we’re going to be presented with lots of superficial and borderline irrelevant references in their essays because they’ve turned to AI to make the reading task manageable.
Once again, we do need proper, detailed studies of the capacities and limitations of these tools, beyond this grumpy-old-man-yelling-at-cloud preliminary analysis. And I imagine we need to engage directly with students: rather than just issuing blanket statements about the unhelpfulness of AI summaries, actually trying to work through articles and getting them to do the analysis themselves, comparing their detailed readings with the AI output as part of developing their reading skills. As I’ve said before, I can see that these outputs aren’t hugely useful because I already have the broad knowledge of the topic and understanding of the key debates; the issue we face is how to get students to the same point of critical understanding of AI outputs, when there’s an obvious temptation for them to short-circuit the learning process by turning to AI…
scholarcy-on-hopkins-taxes-and-trade-1-1.docx