Neville Morley's Blog, page 9
August 16, 2024
Waiting Hopefully
Tim Walz, governor of Minnesota and Democratic candidate for Vice President, addressing the AFSCME in Los Angeles on 13th August: “My wife often reminds me, hope is a great word and a beautiful name, but it’s not a damn plan… You don’t hope to win, you plan, prepare and work to win.”
The Athenians at Melos (Thucydides 5.103): “Hope is certainly an encouragement in time of danger, and those who rely on hope when they have other resources may be damaged but are not destroyed by it. Hope, however, is prodigal by nature, and those who stake everything they have on it see the truth only at the moment of disaster…”
Gwen Walz is the Thucydides reader, then?
August 12, 2024
Substitute
Or: Neville’s further adventures in what the hell our students are getting up to without us realising, and whether this matters… I’ve been exploring Scholarcy, an apparently well-regarded suite of AI-powered tools for summarising academic papers, as used (so says their webpage) by people studying at Oxford, Cambridge, Harvard and Stanford (one assumes that the firm may simply be analysing the email addresses of people who’ve signed up, rather than this representing any sort of formal endorsement – though those do look like the official institutional fonts, whose use is normally guarded quite jealously, and the whole reason I’m looking at this is that I hear from colleagues that their universities are actively encouraging them to direct students towards this tool, so…)
One important thing I’ve learned – obviously it’s the usual toss-up as to whether this is widely known to everyone but me, or genuinely useful information – is that there are two broad categories of AI summarisers on the market: extractive and abstractive. The former extracts key sentences, phrases and information from the original text, preserving the wording, in order to produce a dramatically-shortened summary that, at least in theory, gives a reliable overview of the contents. The latter is generative, analysing the source text in order to produce its own summary of contents and argument – this is what you get from ChatGPT or Claude, and as far as I can see it’s the approach taken by JSTOR’s beta summary tool. As discussed previously, the latter left me unimpressed – and clearly the nature of the process means there’s always a possibility of the AI distorting the original or simply making stuff up.
On the face of it, the extractive approach (which I keep misremembering as ‘subtractive’, perhaps because of the impression that it’s stripping away less relevant material in order to get at the essence of a piece) looks potentially better, or at least less bad, simply because it isn’t going to add anything to the original that doesn’t belong. But there’s still a question as to whether an automatic reading of a piece, based (one assumes) on word frequency and on prioritising certain sentences (e.g. those at the beginning and end, opening sentences of sections and paragraphs etc.) is going to provide a reliable overview, such that we can feel at all sanguine about our students using it.
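To make that assumption concrete, here is a minimal sketch in Python of what a summariser built on word frequency plus sentence position might look like. This is emphatically not Scholarcy's actual method, which I haven't seen; the function names, the tiny stopword list and the `position_bonus` weighting are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real tool would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that",
             "this", "on", "for", "as", "with", "was", "were", "by", "at"}


def split_sentences(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lower-cased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, n_sentences=2, position_bonus=1.5):
    """Return the n highest-scoring sentences, verbatim and in original order.

    A sentence's score is the average document-wide frequency of its content
    words, multiplied by position_bonus if it is the first or last sentence
    (an assumed stand-in for 'prioritising certain sentences').
    """
    sentences = split_sentences(text)
    freq = Counter(content_words(text))
    scored = []
    for i, sentence in enumerate(sentences):
        tokens = content_words(sentence)
        if not tokens:
            continue
        score = sum(freq[w] for w in tokens) / len(tokens)
        if i == 0 or i == len(sentences) - 1:
            score *= position_bonus
        scored.append((score, i))
    top = sorted(scored, reverse=True)[:n_sentences]
    # Restore document order so the summary reads as a shortened original.
    return [sentences[i] for i in sorted(i for _, i in top)]
```

Nothing in the output is invented, since every sentence is lifted verbatim, but nothing in the scoring knows what the argument is either: a closing sentence can outrank a more substantive one purely because of where it sits.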
The obvious issue with my undertaking such a review is that, having been looking at GenAI in the context of history learning and teaching over the last eighteen months or so, my instincts about it are almost entirely negative – it’s crap, and it’s environmentally destructive – and this shapes my response to quite different forms of AI like these extractive programmes. I’ve had an initial look on Google Scholar for proper studies, without much success; plenty of puff pieces, hailing the advent of these tools as making the literature review process for scientific papers much more efficient, many of them written by people involved in the development and promotion of such tools, but as yet only one detailed attempt at evaluating AI summarisers by expert opinion, focused on computer science – which offers the equivocal conclusion that “The results are quite varied but do not give the impression of unanimous agreement that automatic summarizations are of high quality and are trusted.” Uh huh.
This is clearly the sort of controlled study we need – not necessarily specific to historical studies, but certainly my impression is that the typical scientific publication is easier to analyse automatically than a humanities article – rather than just my prejudiced and subjective opinions. But in the absence of such a study, prejudiced and subjective opinions is what you get. I’ve used the same three case studies as last time – Hopkins’ ‘Taxes and Trade’, Finley’s ‘The Ancient City’ and my own ‘Transformation of Italy’ to see what Scholarcy’s free-to-use (daily limit) summariser makes of them – and in the interests of transparency, I will attempt to upload copies of the ‘flashcards’ generated by the system to this post (see bottom of post).
So… Overall, I do think that – because the notes are a lot more extensive, as well as the fact that they are drawn entirely from the text – the original article is recognisable in the Scholarcy output. It doesn’t grasp that my piece is a bit of counterfactual history (it does reference something to this effect at the end, but doesn’t convey how this relates to the rest of the summary), and, while it does echo Hopkins’ emphasis on the speculative nature of his arguments, the summary comes across as a list of substantive statements rather than a series of deductive propositions. It does, unlike JSTOR’s AI widget, identify Finley’s claim that the best model for the ancient city is the ‘consumer city’ – but it doesn’t give this much emphasis. Overall, I would suggest that Scholarcy is trained for, and hence most reliable for, publications that essentially present results and say what they mean, rather than any more complicated style of analysis.
This is understandable, because a clearly-structured presentation of research questions, methods, data and results is a lot easier to analyse automatically and re-present – the repetition of Hopkins’ ‘first proposition’ in the output with no indication of what the second, third or any other proposition might be, suggests that the AI focuses on such explicit sign-posting. In other words, this tool is better suited to conventionally-structured scientific publications – and is clearly geared towards them, most obviously in the fact that there’s a separate section for ‘Study Subject’, where the AI clearly wants to identify the number and nature of subjects in the experiment being described, and grasps randomly at any old number in the text when it can’t find this. It’s also geared to extract Findings, and seems to end up being quite random when these are not explicitly labelled as such; the lists of Findings for these three articles are all things found in the text, but it’s not at all clear that they are the key points that the author intended his reader to take away.
And I have the same feeling about the rest of the notes; there’s nothing fake or imaginary here (apart from the fact that the AI isn’t used to the idea of an article being ‘By Keith Hopkins’, and so converts the author into ‘Hopkins, B.K.’). All the points in its summary, helpfully provided with page refs, are to be found in the articles in question – but here they are decontextualised, presented as isolated factoids or statements, with no indication of their relevance to an overall argument. (I do start to feel that I’ve seen this phenomenon in student work; perfectly correct references to an article, using this to make a point that doesn’t really have anything to do with that article’s core contribution.) It’s striking how extensive the Scholarcy notes are, which I suspect reflects the programme’s difficulty in distinguishing between key ideas and the material used to support and develop them – again, it’s simply not programmed for (or particularly suitable for) a humanities/discursive style of article.
Basically, Scholarcy takes notes on absolutely everything it ‘reads’, with no sense of a broader framework or order of priorities. Which, to be fair, I have certainly done plenty of times when engaging with an unfamiliar topic, and I can certainly imagine many of my students doing – but this doesn’t result in notes that would be much use, or indeed at all intelligible, for anyone but me. It’s an interesting side-point that ancient history articles clearly are difficult to understand without some intellectual effort, given that they don’t present their material in a predictable, ordered and sign-posted manner – I can imagine demands for us to start writing in a more machine-friendly manner, just as I recall seeing articles criticising humanities scholars’ approach to coming up with article titles. But this does indicate that Scholarcy is liable to struggle with such publications unless/until we start writing them differently.
The drive behind these AI summarisers – the way in which they are being sold to potential users, especially students – is the idea of making research and the ‘literature review’ more ‘efficient’. Not being a scientist, I’m genuinely unsure whether this is a sound practice in scientific fields; assuming that the summary is more or less reliable, is it in fact better for the development of a credible project to draw on a condensed overview of every paper written on relevant subjects rather than a genuine understanding of a limited number of such papers? Is there an imperative in such publications to include every possible reference, regardless of the depth of one’s grasp of those references, without this becoming a problem? And – perhaps most importantly – is this predicated on the student/researcher already having a good knowledge of the subject from their lectures and classes, so the AI summary is only supplementary?
That is to say, is this issue over-determined when it comes to the humanities? The problem with using Scholarcy and similar tools in this field is not just that they don’t produce especially good results for more discursive publications, but also that students may be using them not to check recent research on a topic they already understand but to try to acquire that understanding in the first place. The AI summary gives you a selection of nuggets of information, which are neither representative of the actual argument of the article from which they are extracted nor, in this decontextualised form, much help for developing an overall understanding of the topic – so, whereas under former conditions we might hope that setting students off to research for an essay would simultaneously develop their overall knowledge and their analytical skills, in the world of AI summarisers this does neither?
Obviously, therefore, I would like students to read history publications properly, in the old-fashioned manner, learning to focus on the overall arguments and the ways in which sources and scholarship are used to develop those arguments, rather than extracting random nuggets. So, my next step is to think about how better to support students in developing the necessary skills, giving them more advice on how and what to read, emphasising quality of engagement over quantity etc.
But they are already using these tools – perhaps quite extensively – and there seems to be a big push within the world of library and student support services to promote them. I vaguely wonder if there’s a ‘one size fits all’ issue – maybe these are fine for students in the sciences, maybe the issue is that a single suite of advice on research skills is being presented to students on a whole range of programmes. In other words, we historians need more of a concerted effort to counter them, without making it seem as if this is all designed just to make students’ lives impossibly difficult. The more we insist on them engaging with lots and lots of modern scholarship, the more we’re going to be presented with lots of superficial and borderline irrelevant references in their essays because they’ve turned to AI to make the reading task manageable.
Once again, we do need proper, detailed studies of the capacities and limitations of these tools, beyond this grumpy-old-man-yelling-at-cloud preliminary analysis. And I imagine we need to engage directly with students: rather than just issuing blanket statements about the unhelpfulness of AI summaries, actually trying to work through articles and get them to do the analysis themselves, comparing their detailed readings with the AI output as part of developing their reading skills. As I’ve said before, I can see that these outputs aren’t hugely useful because I already have the broad knowledge of the topic and understanding of the key debates; the issue we face is how to get students to the same point of critical understanding of AI outputs, when there’s an obvious temptation for them to short-circuit the learning process by turning to AI…
scholarcy-on-hopkins-taxes-and-trade-1-1.docx
August 2, 2024
‘We Can Read It For You Wholesale’
One of the more irritating features of the last couple of years has been the rush to incorporate AI into everything, not because there is much evidence that it makes anything better or that there is huge public demand for this, but for Fear Of Missing Out in case it does turn out to be indispensable. One wonders how far these highly-paid and supposedly brilliant, hard-nosed business executives have actually bought into the hype and how far they simply assume that everyone else has – though “let’s make our product/service crappier because lots of people think that a fancy autocomplete widget is going to evolve into Her’s Samantha” is a… courageous strategy either way.
Universities being what they are, this hasn’t happened to us yet; we’re still at the stage of educational consultancies writing reports about how There Is No Alternative to such a development so institutions should pay them large sums of money to implement the transformation. As ever, the ingrained idea that adapting to the future will involve disruption and destruction segues into the assumption that disruption and destruction will necessarily fit the institution for the future. Of course, replacing senior management teams with a robotic algorithm, or indeed a speak-your-weight machine, is unlikely to make things significantly worse, assuming that it’s even noticeable. But for the moment, the front runner in the ‘headlong rush into gAI stupidity’ stakes is… JSTOR.
I’ve moaned before about JSTOR’s ‘if you enjoyed this recent publication, you might like this 1952 article on a vaguely related subject’ widget, but was willing to believe that it was someone’s idea of being helpful. More cynical interpretations are available; isn’t this exactly the same technique that other, less ostensibly high-minded websites use to keep us on their platforms, clicking on ever more content and exposed to ever more adverts? Okay, JSTOR isn’t trying to sell us anything (yet…) – but do they have an incentive to ensure that users interact with as many publications as possible? Are we in a rabbit/duck situation, where what seems to us academics to be a hugely valuable gateway to published research is at the same time a means for publishers to maximise article views, driving up revenues via their contracts with libraries..?
Annoying as it is to have students citing lots of outdated and irrelevant publications in the belief that this is what independent research looks like, it pales in comparison with JSTOR’s latest wheeze: built-in AI, which allows you to ask questions about a given publication and get a summary of it. I mean, what the hell?!? How is this anything other than a mechanism for minimising engagement time with individual publications in order to maximise the number that can be engaged with? Quantity over quality? It takes for granted, or at best acquiesces in, the idea that reading an article is just about extracting its core content from irrelevant waffle and that the goal of research is to amass as many references as possible – seeing these as things that can be made more efficient through technology. Trying to teach research skills is going to be SO much more entertaining when the actual tools and resources are modelling bad practice. In terms of humanities research, not so much “they’re using our own satellites against us” as “the call is coming from inside the house”…
This seems very bad in principle, even if the summaries were good and reliable. As I’ve suggested before, we really need some decent research on this latter question, given how far many of our students seem to trust them. First indications aren’t great; gAI seems to ‘shorten’ rather than ‘summarise’, incapable of differentiating between sentences that are essential for articulating the argument and those that develop or support it, and continues intermittently to insert irrelevant and unreliable information. But obviously I had to try it out for myself.
The immediate temptation is to experiment with one’s own work; after all, I know exactly what I was arguing* so can easily evaluate the accuracy of the summary. It’s interesting to discover, for example, that my 2013 article on ‘Thucydides Quote Unquote’ primarily “sheds light on the enduring influence of Thucydides’ quotes and the evolving interpretations of his writings over time, highlighting the significance of these quotes in shaping perceptions of history, politics, and military strategy” – not completely wrong, except that my focus is on quotes that aren’t actually Thucydides, of which the summary offers just one example in passing, rather than making it the focus of the whole thing. You wouldn’t gather from the summary of ‘Decadence as a theory of history’ (2007) that it was a piece on historiography and the history of ideas, not ‘decadence’ as a real phenomenon in history, but apparently “The narrative of decay and decadence is a central theme in Morley’s analysis of historical processes and societal evolution.”
The obvious objection here is that I know what I meant to argue, but maybe it isn’t actually clear in practice, so the fact that the summary gets it wrong isn’t the decisive argument I think it is. Yes, it takes my 2001 JRS article completely at face value as an alternative narrative of the demography and development of Roman Italy in the Republic, ignoring the introduction and conclusion which make it clear that this is a thought experiment which concludes that this version of events isn’t plausible – but I know of at least one respected colleague who read it in the same way. Can the AI summariser be blamed if my exposition was actually a bit crap?
Better, then, to test it on something that is widely acknowledged as brilliant, insightful and above all a model of clarity of argument – exactly the sort of piece we might direct students to read both for the importance of its argument and for the way in which the ideas are developed. What does it make, for example, of Keith Hopkins’ classic piece of speculative model-building, ‘Taxes and Trade in the Roman Empire’ (JRS 1980):
In this essay on Roman economic history, the author explores the impact of taxation and trade on the Roman economy from 200 B.C. to A.D. 400. The author discusses the volume of silver coins in circulation, military expenditure, and the integration of the monetary economy in the High Empire. The text also estimates the level of taxation in the Roman Empire and its importance in the economy, highlighting the role of tax-farming and the collapse of the fiscal system in the third century A.D. The author concludes that the breakdown of central control over taxation led to the restoration of central control by the strong government of Diocletian and Constantine. The essay aims to stimulate further research and debate in the field of Roman economic history.
Uh huh. I mean, as a summary of the topics that are discussed in Hopkins’ article it’s not actually wrong, but it entirely misses every point of the exercise, above all the radical nature of Hopkins’ argument in theoretical and methodological terms. It seems fair to say that any student relying on this summary would be completely at sea as to why anyone would bother reading the article at all forty-five years later.
Okay, M.I. Finley, ‘The ancient city: from Fustel de Coulanges to Max Weber’ (1977) – THE article in which the idea of the ‘consumer city’ as an ideal type was put forward, which then dominated debates about the relationship between city and countryside in antiquity for decades:
In this academic article, M.I. Finley delves into the significance of ancient cities in the Graeco-Roman world, highlighting how even the rural population resided in communities rather than isolated farms. The growth of towns was deemed crucial for civilization, with Aristotle emphasizing health, defense, politics, and aesthetics in urban planning. Critiquing Fustel de Coulanges’ work, the author stresses the necessity for a more comprehensive understanding of urban development, considering economic and social factors that influenced ancient cities’ rise and fall. Drawing on the insights of scholars like Marx and Weber, the article explores the economic structures of ancient cities, emphasizing the separation of urban and rural areas. By advocating for a detailed analysis of available data and proposing a typology of ancient towns based on various factors, the author aims to enhance our comprehension of the role of cities in the ancient world.
‘Consumer city’? It’s there in the passing reference to typology, if you know that you should be looking for it, but you would never otherwise gather that this is the crucial analytical point. The most generous interpretation I can offer is that maybe the importance of this argument is recognisable only if you’re aware of the issues and pay attention to Finley’s footnotes and passing remarks, rather than if you just focus on the substance of what he says – but that still amounts to saying that the summary is basically useless for getting a proper understanding of the article.
This is actually quite horrifyingly addictive – and, yes, I’m already past the point where I think the tool is going to be at all useful, and instead trying to think of articles which it’s liable to get spectacularly wrong in an entertaining manner. In brief, this strongly confirms the sense that AI offers at best something like a compressed version of a piece, or rather a version in which material from the original has been subtracted on a statistical basis, i.e. focusing on repeated words and phrases, rather than on any sort of qualitative (let alone informed) basis. It tells you, more or less, what the article contains in the way of subject matter; it really isn’t very good at presenting what the author may be doing with that subject matter, and may well entirely miss the point. And it’s really not going to be much use to anyone actually trying to research the topic.
The good news is that JSTOR’s tool is still in Beta, so conceivably they might pay attention to a combination of loud objections and a great deal of jeering and mockery. But much will depend on why they thought this was a good idea in the first place – it may not be very different from trying to yell at Amazon or Google…
*Full disclosure: this is not always entirely true. I recently received my copy of Blouin & Akrigg, eds., The Routledge Handbook of Classics, Colonialism, and Postcolonial Theory, which contains a chapter by me on Thucydides, most of which I have no real recollection of writing – but it was the first thing I managed to write while struggling to emerge from Long COVID brain fog, so there is a possible reason for this…
July 29, 2024
Twelve Days in the Year: 27th July 2024
Some time in the small hours, despite being fast asleep, I realised there was a strange noise somewhere in the house, an unfamiliar hum mixed with white noise. Strange noises are rarely good, so had to get up to investigate it; fairly quickly the culprit was established as A’s laptop, sitting innocently on the desk in the bedroom. No obvious sign of why it was doing this – other than the fact that it was on at all, whereas she generally makes a big thing of shutting it down completely and getting cross when I leave devices to go into sleep mode (which is admittedly a reasonable position when you have cats liable to start walking around on keyboards in the middle of the night). Thankfully the noise stopped shortly after I pulled out the power cord, but it then took a long time to get back to sleep – and when it came, it was a vivid, disturbing dream in which I was in a huge church service with little idea what was going on, being alternately pushed forward and held back and talked at by people whom I couldn’t for the most part understand, without knowing whether this was because they were mumbling or speaking an unknown language. Woke past eight o’clock feeling dreadful.
Because the last couple of weekends have been disrupted and tiring (A away chihuahua-sitting while I failed to get anything useful done and all my cooking projects failed the week before, down at my parents’ to do helpful tasks the week before that), we’d planned to have a quiet and relaxing two days, including having breakfast at a cafe the other side of Frome – so, a little paradoxically, a nice lie-in was not an option. Quick cup of tea then shower, and then had to get the cats in from their morning excursion. Hector responded readily enough, but no sign of Olga – the signal from her collar (we’ve had too many incidents of cats getting themselves locked in sheds, other people’s houses, the Methodist chapel down the road etc. that we need to be able to track them) showed that she was nearby but not moving, which is usually a sign that she’s lost or dumped the collar tag.
Coming down into the lower part of the garden, I saw a flash of ginger fur in the far corner – Olga is not ginger. Rather big to be a cat, and so it proved as a youngish fox either failed to get over the fence into next door’s garden or decided on a different path, so ran across to the other side of ours and disappeared (shortly after, there was some frenzied barking in the distance). Olga had presumably been spooked by this, as she still didn’t respond; wandering around to triangulate her position with the directional finder showed that she must be lurking at the bottom of one of the gardens adjoining ours, out of reach from either direction – unless she had ditched the tag. We gave her five minutes – A getting cross with me because I was getting cross – and she wandered in of her own accord as if nothing had happened.
So, off to Beckington, just north of Frome, much later than planned, to Café Mes Amis, which does a nice range of light breakfasts and an excellent selection of cakes. It turns out that being late is actually an advantage, as the first-thing breakfast crowd is departing and the mid-morning lot are only just starting to arrive, so we get a table straight away, and get our breakfasts (bacon, mushrooms and avocado for A., sourdough toast and a croissant bread and butter pudding for me) reasonably promptly, before the service starts to go to pieces under pressure – though in retrospect it would have been better not to order a second coffee…
Lots of interesting people-watching opportunities; it is, as A observes, the most upper middle class café ever, if you discount Ottolenghi in Chelsea – a lot of well-groomed, well-heeled fifty somethings with beards, a lot of ladies who presumably lunch after their black Americanos and tiny tiny pastries while they discuss what each other’s therapists had to say last week – and a deeply annoying woman next to us, whose yappy dog, clearly her baby, is allowed to clamber over everything, within inches of the loaves for sale, while her husband stoically eats half an almond croissant while waiting for their coffees.
We buy a couple of slices of orange and polenta cake for later (which turns out to be too sweet with insufficient polenta, but could be worse) and head for home, via supermarket to buy a few bits and pieces for the barbecue planned for the evening (now with added neighbour, A2, who has been thrown off balance by news from a different neighbour, A3 – WHY THE HELL IS EVERYONE ROUND HERE OF A CERTAIN AGE CALLED SOME VARIANT OF ANNE?!? – whose heart seems to be failing…) At home, I dig salad potatoes and make mayonnaise, which doesn’t curdle but is rather thick as A. (my A) has requested less vinegar; it’s bland, frankly, but better to meet customer expectations… Walked around town putting up posters for the local Big Bat Count I’m organising in a couple of weeks’ time, plus asking shops to display them – which leads to entertaining chats in both the hardware store and the bookshop. I have hopes for a decent take-up this year, certainly if the QR code for booking actually works. On the way back we run into a friend, who updates us on the death of another elderly neighbour, as a result of falling from a ladder and then falling out of bed in hospital; as we’ve said before, we really need to get a few more friends who are not 70+ and therefore might last a bit longer…
Down to the wine shop, as A has promised to introduce the owner to Bulgarian rosé (her favoured drink since our holiday at Easter); owner is impressed, and promises to look for some – which might make life easier, as the wine shop in London where I order it at the moment tends to employ completely useless delivery firms. I feel far too tired to risk drinking in the middle of the afternoon, despite some of their craft beers looking quite tempting, which makes my presence a bit redundant and off-putting, but then we’re introduced to a friend of the enterprise who calls into the shop for a chat and stays for a glass, and that passes the time nicely.
Back home for tea and cake and to let the cats out, then get the barbecue fired up for padron peppers and slices of courgette before the usual assortment of meat; meanwhile, finish off potato salad and make coleslaw (which remain too bland for my taste, but hey ho). A2 arrives punctually, and I manage to get food ready within ten minutes rather than the usual half hour to get the hamburgers cooked. We’re set to eat outside until it starts to drizzle; it then cheers up again, so we have after-dinner drinks on the work-in-progress lower patio (aka the Pigsty) at the very bottom of the garden. Lots of chat about A3, the way her partner is behaving, the idiosyncrasies of cats (we’re still slowly getting Buddy used to the fact that he has to share the house with others; A2 has recently adopted a rescue cat after her elderly ginger boy died).
She doesn’t stay too late – not to be a misery, but this is a relief. We watch an episode of Buffy – it’s annual rewatch time, we don’t have any interest in the Olympics, and we’re too tired for anything more taxing – and then off to bed, hoping to sleep.
July 21, 2024
Read ‘Em And Weep
One of the more sensible comments in the first phase of “ChatGPT can ace our assessments! The entire system is compromised! We must revert to in-person unseen exams at once!” panic was that, if a sophisticated auto-complete app can perform well in student assessments, it’s because we’ve been assessing students on their ability to imitate an auto-complete app, or at any rate over-valuing things that don’t actually require understanding or analytical skills. The good news, at least in the humanities, is not just that our typical assessment tasks and marking criteria don’t need much if any tightening up to render GenAI assistance unhelpful for coursework writing for any but the most desperate and panicking student, but also that they recognise this; in the surveys and focus groups I’ve been running this year, only a few say they would consider using GenAI to help write assignments, and they have a strong sense that its outputs are deficient, at any rate for university-level historical studies.
The less good news is that a much higher proportion of students is perfectly happy to turn to GenAI for feedback on their work, for essay plans and outlines of topics, and for summaries of books and articles. This needs further investigation – my project was focused specifically on assessment, which turns out to be less of an issue than expected – but my immediate reaction is that many students can recognise the lack of complexity of the output, the absence of evidence and references and the unnuanced, dogmatic style (“it doesn’t write like a student”), but apparently accept its basic contents as reliable.
This strongly indicates that we need to spend time exploring these tools with our students (which can be a useful exercise in critical analysis in its own right, rather than just a subtle means of discrediting them), but we do need to consider whether we might be, as in the example of assessment practices, inadvertently giving them the wrong idea or incentivising poor practices. Why are they asking ChatGPT for feedback, for example? Its easy availability compared with academic staff? The fact that it will read entire draft essays without complaint? The nature of the advice it offers, and/or the tone in which it’s delivered? I don’t read drafts (leaving aside the two-part assessment in my final year modules, where students submit a draft for c.20% of the overall mark and get feedback to help improve the final 80% version), but I do try to make myself available for consultation, and relatively few take me up on this – so why is ChatGPT more attractive as an interlocutor?
This feels even more urgent when it comes to the idea that a GenAI summary of an assortment of publications and an overview of the topic is an adequate substitute for actually reading a load of stuff yourself. This does connect to the recent Ex-Twitter discourse about how much reading one can/should assign – are students less capable of reading than they used to be, or somehow have much less time? Certainly many students feel that they are being asked to read far more than they can manage – so automating the process inevitably looks attractive, let alone for those with dyslexia or similar issues. But this is reading understood as the simple extraction of a few key points and bits of important information, rather than as an important skill and exercise in its own right. It’s not active, or critical, or contextual, or multi-layered; just mechanical summary.
Well, we can’t have taught them that! Perhaps just for the sake of argument, I do wonder whether we should be so complacent. The combination of expecting students to engage with a decent number of secondary sources in their work and the expectation that such sources should be cited rather than simply listed in the bibliography might incentivise ‘mining’ as many publications as possible for useful nuggets of information rather than properly engaging with them. When we refer to works of scholarship in class, it’s more likely to be for a core idea than for the development of the argument or the elegance of the prose. How often, outside the rarefied and well-resourced world of Oxbridge, does anyone have time to model a bit of close reading of scholarship for students?
And, while we might smugly assume that our reading practice isn’t like that, that’s not entirely true, at least at a system level. What is the REF process, or progress and promotion review, if not a means of rendering a complex bit of writing into a short summary and judgement about its intellectual contribution? Indeed, we are encouraged to make our publications more amenable to such mechanical reading by highlighting key claims in the language of assessment, to make them easier to spot. Finally, whatever we might say to students about the need for careful, critical engagement, that is rarely what we manage when it comes to their own work. No, we don’t have time – but that’s exactly why they’re turning to GenAI…
June 28, 2024
Twelve Days in the Year: 27 June 2024
Woken by yelling cats – two to be let outside, one wanting breakfast – from very strange dream, A and I raiding offices of history magazine to get back some vital bit of evidence that would, erm, enable me to finish a short article on the reception of the Elder Pliny in 18th-Century Somerset. I have no idea what this was about, but my unconscious had a very detailed idea of the different texts involved and the key lines of analysis (unfortunately this all faded away as I emerged slowly from sleep, so this piece won’t be appearing any time soon), which made for an odd juxtaposition with the real feeling of suspense and anxiety as to whether we could disable the burglar alarm or whether A printing off a copy of the evidence was making too much noise.
Fatigue that’s been afflicting me for a week or more – probably hayfever-related – hasn’t improved much, so it takes a while to wake up properly. Buddy has made his way onto the bed for a cuddle; he is slowly building up confidence and feeling at home, even if he is not yet reconciled to the presence of other cats, and when he gets pushed under my legs when Hector comes in and also wants fuss, he accepts this with barely a grumble (he is the most grumbling cat we’ve ever had) and goes to sleep while Hector snuggles up against me on top of the duvet.
A. is in full-action mode, as old school friend plus husband are visiting for the Glastonbury Festival, so everything needs to be spotless. I get up earlier than feels entirely desirable so I can have a shower before getting on with deep clean of bathroom (kitchen and dining room were yesterday), with brief pause to check greenhouse (a bit of watering, plus shaking the flowers on the chilli plants to get them to set fruit). Down into town to buy assorted provisions and get money out, plus post a letter – as a thank-you to colleague in Czechia for his hospitality at a conference, I got him a copy of a book I’d been telling him about as THE key to understanding English attitudes to the past: 1066 And All That. Having skimmed through it, imagining what a Czech might make of it, I’m now less sure that this was a good idea, but perhaps if he approaches it as a surrealist classic…
Back home to send yet more messages to the neighbour who has agreed to let our friends park in her drive for the weekend, to check this is still okay – I am given this task on the basis, A. says, that said neighbour thinks I am wonderful, and I can also subtly remind her that I cooked her Sunday dinner (as we do most weeks). Contact is eventually made and everything is sorted out – it’s all a little complicated as there’s a funeral at the Methodist chapel down the road this afternoon, and our neighbour’s resolve never to let the woman who drove into her gates last month to park there again for church events has clearly not lasted.
Our friends arrive on time despite Glastonbury traffic, and the rest of the day is spent entertaining them and getting them settled into rented cottage; showing off our garden, introducing the cats, making hummus and preparing salads for lunch, and taking a tour around town before returning to tea. Town shows clear signs of it being festival season – lots of well-heeled thirty-somethings in cargo shorts and summery dresses but no children, older scruffy ex-hippy types (well, more than normal) and the occasional crusty, as well as the Coop staff getting to dress up and the hardware store doing special offers on wellies (not such good business this year…) – so a couple of rather vocal middle-aged Welsh people in holiday get-up don’t raise an eyebrow.
Conversation, both now and when we go out for drinks at the wine shop/bar and pizza from the Thursday evening mobile pizza oven in the evening, could be seen as rather exclusive if you weren’t born and raised in Llanelli (i.e. me), but it is anthropologically fascinating, a snapshot of what one would call Gemeinschaft in contrast to the alienated, fragmented Gesellschaft in which I grew up. Half an hour is spent working out which of many Pughs were the actual Pugh Brothers – partly based on different status markers (“No, he was the one who went to the boys’ gramm, not the one with the Mercedes”), but mostly drawn from in-depth genealogical research, identifying the position of each within complex intertwined networks of kinship, friendship and sociability. They know exactly what information will be needed to locate, say, son’s new girlfriend within the system. Rather than Six Degrees of Kevin Bacon, Three Degrees of Scott Quinnell’s grandmother. And each discussion spirals fractally off into new lines of enquiry.
Thinking professionally, it’s an insight into the detailed workings of polis society or Roman political elite. You know that Marcus Appius, not the one whose ancestor built the road but the cadet branch over on the Caelian, mother came from Tusculum, linked to the Claudii until that incident happened? You know, hung around with the younger Gracchus for a bit, not the one who had a thing for one of the Julii girls who then married your cousin Sextus but his brother? Well, his great-uncle on his mother’s side…
This does make for a very entertaining evening. We return to find that Hector is extremely cross about not having been allowed out while we were off gallivanting, and then very narked that we want to go straight to bed. We try to drop off to the sound of him taking his rubber ball to top of stairs and dropping it down, for what feels like several hours.
June 21, 2024
When Will I Be Famous?
Normal spam works on the basis of volume: send out umpteen thousand emails with offers for different things, and even at a 0.1% response rate you can hope to turn a profit. Academic spam is a bit more tailored, at least going to the trouble of trawling a few databases – we are all, I imagine, familiar with the sort of email that begins “we are huge admirers of your ground-breaking article Short review of edited collection on Thucydides and so solicit your expertise to act as guest editor/join our editorial board/submit your latest paper to International Journal of Subdermal Haematology”.
Others adopt the same approach to offer something more niche, and superficially less dodgy: we will help you present your research project Short review of edited collection on Thucydides to non-academic audiences by including a piece in our publication or by creating an animation. Leaving aside the fact that, if I wanted to produce another animation, I’d immediately try to contact the brilliant students from Falmouth who did Thucydides: Heavyweight Champion Historian of the World with me, I have no idea why these people imagine I would have any money to give them. But it’s the spam principle: identify something that academics might want – given its importance in REF, the idea of doing a video might seem like an easy way of scoring a bit of Impact – and if you send out enough messages, sooner or later you’ll hit on someone with a research budget big enough to have some unallocated funds lying around.
This morning’s spam folder offered something much more exciting and original, a really clever identification of a potential market, based less on knowledge of the workings of UK Higher Education and more on knowledge of academic psychology. I simply have to quote it in full – though without giving the author the benefit of any free advertising:
Hello Neville,
I am a freelance writer and I write Wikipedia pages for academics and researchers. In a quick Google search of your name, I discovered that you do not have a Wikipedia page.
Would you like to have a Wikipedia page? If yes, then I can help you. Having written Wikipedia pages for over 400 individuals, I can write an excellent page for you. For your safety and assurance, there will be NO upfront payment required. You only have to pay AFTER the work is complete and you are happy with it.
Academics who have accomplished much less have Wikipedia pages because they have got them made. It’s time you get one too. Just send me a message and I will forward you the cost and time frame details for writing your Wikipedia page.
Best,
That is just so cunning. It isn’t too over the top – “your renowned brilliance and eminence demand immediate recognition!” – but rather takes the much more plausible line of “you’re better than those other people, you know”. Who among us, at least of a certain age, has not muttered darkly about the fact that Certain Colleagues, mostly in Specific Universities, seem to have acquired Wikipedia pages when they’re no more important or distinguished or prolific than we are? Doesn’t it explain everything if they actually paid for them, rather than being genuinely notable?
And I can make a decent case for my academic contribution being perfectly notable in Wikipedia terms, it’s just that I don’t have any sufficiently sycophantic former students to set up a page for me… It’s not like I’d be blowing my own trumpet, or making my own edits – we all know THAT’s bad. It’s just a matter of ensuring that Wikipedia properly reflects the actual state of research in classics and ancient history in the UK – it’s more like a public service, really…
In fact, the only way that message could have been more persuasive is if it had hinted at the fact that female classicists have a whole organisation creating pages for them, and isn’t that taking representation to the point of being unrepresentative..? Don’t you deserve a bit of recognition, despite being male, and not at Oxbridge?
Now, of course I am immune to such blandishments, not just because I refuse to take myself entirely seriously even in my less modest moments, but because actually I do have a Wikipedia page. Admittedly it’s in German, but a prophet is always without honour in his own country…
June 18, 2024
Trust In Me
An interesting paper was published over the weekend, arguing plausibly that ChatGPT is bullshit – in the technical sense defined by the philosopher Harry Frankfurt. Certainly its output exhibits ‘soft’ bullshit, meaning that it has no concern for truth or the reality of the world the text purports to describe, and there are at least some grounds for considering the much more controversial position that it displays ‘hard’ bullshit, an active intent to deceive its users, at least insofar as its creators present it and its capabilities in a misleading way, implying that ‘truthiness’ is in fact truth, as well as that a glorified autocomplete programme exhibits ‘intelligence’ in any meaningful sense.
What the paper neglects, in my view, is what could be called the rhetorical aspect of bullshitting; like most discussions of gAI, it focuses on the content of the output rather than its form. Clearly, however, part of the bullshitting power of LLM output is that it reads persuasively. Asked to comment on a historical topic (which obviously is what I’ve been focusing on over the last year – and this is why I haven’t posted anything on this blog in the last few weeks, as I’ve been trying to catch up with that project as well as finishing marking), it offers a series of declarative sentences making points that are clearly relevant to the prompt. There is no hesitation, uncertainty or equivocation, which creates, I would suggest, the impression of knowledge and authority. It echoes the confident assertions of the texts on which it was trained, which were however at least partly based on actual knowledge and understanding of the material (and one might suspect that, even if its training data included academic publications which might express themselves more cautiously on contentious issues, that gets stripped out in favour of more forthright declarations; I’m reminded of how much the Word grammar tool hates my sentences and thinks that they’re too wordy, and how it would clearly prefer a lot less caution and nuance…).
Accidentally or not, the resulting output sounds confident and authoritative. As I’ve suggested before, it sounds like a certain sort of student bluffing his way through an exam on the basis of half-remembered scraps of information and a plausible prose style – one of the archetypes of bullshitting. I think this is one reason why most students don’t in fact trust it to write their essays. As someone commented in one of my focus groups, it doesn’t write like a student – and the point is not the snark that, yes, it can place apostrophes correctly for a start, but rather that it shows no sign of the caution and humility (sometimes in excess), the sense that things are always more complicated, that undergraduates have learnt is appropriate. Only someone who really hasn’t been paying attention or done any work would think that a series of bold, bald and unsupported assertions is the way to write a decent history essay.
However – and this is what I increasingly worry about – ChatGPT output does sound like some of the sources that students might use for researching their essays, or even some of their lecturers. Its apparently authoritative tone may therefore incline them to trust its summaries of scholarly debates, paraphrases of pieces of scholarship and evaluation of their own work (all things which a more substantial number of students say they sometimes use LLMs for, rather than actually writing coursework). Judging from the work I’ve read this year, students are well aware that gAI output is too simplistic, that things are more complex and debated – but they take it as a sound starting-point for further work, as if it’s on a par with, say, an Oxford Classical Dictionary entry or an introductory book.
Why does this matter? As far as their own essays are concerned, it means they are starting not with an outline of the state of the subject from, say, 10-15 years ago, but with a compilation of the most conventional ideas from the last century or more – bearing in mind that the LLM’s statistical approach will lead it to echo the things that have been most often written, regardless of their quality – and no conception of how much, or what, has been left out. Decontextualised statements, that need to be understood in relation to different eras of scholarship and lines of enquiry and forms of evidence, are read as if they are part of a coherent account of the topic.
And that’s the most positive version. I haven’t tested it properly, but I am not instinctively confident that an LLM summary of a published article or book will actually be reliable when it comes to key arguments and points of analysis, rather than just being something that looks like a trustworthy summary but actually just repeats some key terms and phrases in a plausible manner. Likewise using ChatGPT to check one’s own work; at best it’s a grammar-checker, rather than being able to offer a useful evaluation of the quality of the argument, since it has no conception of accuracy or historical argument against which to measure it – and again I’m not sure how far it will actually be evaluating the essay, rather than producing a text that looks like an evaluation.
So, the end result is likely to be quite mediocre at best, which is why I am less worried than I was a year ago about LLMs being a threat to the integrity of assessment – so long as we focus on analytical skills, interpretation and understanding rather than either content or vacuous bullshitting, so long as we have the time to evaluate work properly, and so long as there is no external pressure to mark students leniently if they turn in such mediocrity.
The much bigger problem is student learning. The key promise of the LLM is to save time: to read articles and books so you don’t have to, to digest vast amounts of scholarship that you would never have time to read outside of a PhD. But the process of reading and understanding publications, of laboriously building up understanding of a topic, is the point of the exercise, rather than being merely the means for extracting content. Delegate that to someone or something else, and nothing will actually be learned. Likewise the process of revising an essay: the point is to learn to evaluate and improve one’s work, not for it simply to be better without the active involvement of the author.
There are plenty of things we can try to do to push students in this direction: make the reading of scholarship a key part of seminars, so they get practice in this; help narrow down their reading still further (I’m coming round to the view that it’s better that they read a couple of things properly than that they feel they need at least eight references but don’t actually engage with any of them); set assessment tasks that explicitly require such engagement, rather than it appearing as a means to an end; spend time educating them in the workings and limitations of gAI so they steer clear of it (I am struck by how successfully Wikipedia has been discredited as a source through, I assume, constant repetition of warnings by teachers, even though it’s far better than ChatGPT for what it’s good at).
Above all, however, WE need time, for offering and marking formative as well as summative assessments, for giving individual feedback and advice, for working through examples, for multi-stage assessment tasks in which we comment on drafts so they can learn to revise them. And this is time we don’t have (I try not to think about how much extra work my preferred ‘draft and revision’ approach involves, beyond the workload allowance for the module – and that’s just in smallish final-year seminar modules). But obviously farming assessment and feedback to LLMs is not the answer…
May 27, 2024
Twelve Days in the Year: 27th May 2024
Not a great night; I’ve fallen into a pattern of sleeping more or less soundly until around half three or four, then start to feel myself emerging from slumber, and struggle to stop thoughts of work – the terrifying piles of things not yet done, the diminishing time in which to get them done – taking over. It doesn’t help that, coincidentally or not, this tends to be the time when Olga wakes up and decides to go and stare at Buddy, who still mostly sticks to his safe space in the study and objects loudly to being stared at, which makes it harder to stop myself waking up and then much harder to get back to sleep. In this case, it’s not helped by the fact that I have a pain in my back so can’t get comfortable, and am feeling too warm.
Doze; recite the song lyrics that sometimes help me drift off, but don’t this time; try lying very still and doing breath exercises. This latter is a mistake, as A. picks up that I’m not moving and asks if I’m okay, which as ever wakes me up completely. It’s now half five, so the radio goes on; shipping forecast, news and papers, a truly dreadful prayer for the day that seems to be trying to work in football results, then Farming Today. By this point I’m drifting in and out, which doesn’t unfortunately block out the interview with Anne-Marie Trevelyan about the government’s absurd national service plan – the garbled recitation of prepared soundbites (“the world isn’t safe… young people need to contribute to the community… greatest country in the world… which isn’t safe… freedom isn’t for free…”) reinforces the sense that this plan is desperate, floundering gibberish.
We get through to eight o’clock, when I get up to make tea and do the dishes; let out the younger cats to patrol the garden, which means Buddy can come onto our bed for a while, alternately purring and grumbling. It has been only three weeks since he came to us, so he is actually making good progress – but still clearly not happy with being one of several cats rather than the centre of the universe. He settles down beside me but with a lot of tail-twitching, as clearly he’s awaiting the return of the others; after ten minutes or so he hears the catflap and takes himself back to his den.
Extended lie-in, drinking several cups of tea, reading, and listening to the radio until A. gets especially cross with one of the interviewees. Leisurely breakfast – look, it’s a bank holiday, I’m allowed to do this, I tell myself – and then an extended litter tray clean. The plan is then to get out into the garden, to pot up plants and pick gooseberries, but the heavens open, yet again; it’s a matter of waiting for dry spells between showers, as the forecast really isn’t promising. Idle conversation about the workings of Masterchef; if the contestants all finish cooking at the same time, doesn’t that mean that some dishes are eaten stone cold and past their best through no fault of the contestant? Or is it all staged? Obviously, I suggest, I need to enter the competition in order to find out… The rain continues.
Rain stops; I bring the bags of potting compost up from the car into the porch, getting the last one in just as the rain starts again. Rain stops; I get the bags down to the greenhouse – and can now press on with potting up chillis and aubergines and sowing lettuce seeds even when it buckets down. It then brightens up enough to plant out bean plants and sow radishes and spinach, and to pick some gooseberries to thin out the fruit – and to make sure we get at least some before the badgers do. Garden is looking very lush, and rather soggy. Check the pond, and four dragonflies have attempted to emerge overnight despite the rain; one certainly didn’t succeed, as the remains of its corpse are drifting on the surface and being nibbled by newts, but in the absence of evidence to the contrary I can imagine that the other three made it.
Back into the house to get cleaned up and have a late lunch of leftover salads; put the gooseberries to stew, and put the oregano that’s been drying in the dehydrator into a jar. Brief trip out to supermarket to buy soya milk, cat food and other essential supplies, dodging the large puddles on the road; back for a cup of tea. Supper was very quick to make: cold cuts of beef, chips and salad, plus gooseberry fool. Spent half an hour upstairs with Buddy, who demands fuss for ten minutes and then goes back to sleep, allowing me to make a bit of progress with latest jazz composition homework (imitation Ornette Coleman, composing a piece without any harmony instrument). Downstairs to read and write up this diary.
For the final entry in this series – at any rate it’s the twelfth month after I started this, and I haven’t decided yet whether or not to continue – a pretty uneventful day. Which is what we both needed; things are otherwise so hectic at the moment, especially as I was off giving a lecture abroad last week and am off to give a paper at a conference later this week, which hasn’t been written, and I’m trying not to think too much about everything else. I have to keep reminding myself that we’re still in May and all my deadlines are in June – knowing, of course, that there simply isn’t enough June for this to work, but panic and/or insomnia will simply make things much worse… Positively, I continue a slow progress towards feeling more myself, and I’m sure that this regular writing discipline helps, however tedious it may be to read.
May 26, 2024
Babooshka
We have some new neighbours. My wife, being herself, has now visited them several times; I, being myself, have confined social contacts to a brief conversation about trees, power lines and squirrels occasionally getting fried on the latter with the gentleman of the house. My wife was therefore quite astonished to find that I was possessed of actual information about them – because I had actually read various articles we found online when we first heard something about the people buying the house. It’s all about the transferable research skills and attitude of critical enquiry, I explained. You ought to tell your students that, she said.
This is not such a bad idea. Okay, my students are unlikely at the moment to be worried about checking their new neighbours’ topiary predilections, but there are plenty of other situations where they will be able usefully to apply the skills I’ve been trying to teach them. Still more, however, I wonder how far they have already acquired attitudes in everyday life that they might be encouraged usefully to apply to their academic work.
With the average student, one basic problem from my perspective is a binary approach to secondary sources; either they accept everything at face value and with total trust, or they identify the presence of ‘bias’ and assume this invalidates everything. Do they do this in their personal lives? Perhaps they do – but I prefer to imagine that they do a certain amount of due diligence before over-committing to complete strangers, and have a slightly more nuanced set of evaluative criteria than just a choice between ‘The One’ and ‘Psycho Stalker’.
In other words, they may well already be conscious that in such situations there might be definite red flags – but also a lot of things that in isolation aren’t red flags but become so in combination with others, or things that could be endearing eccentricities or flashing warning signals that there’s an ear collection in his bedside cabinet, depending on context; a certain number of evasions or exaggerations or even fabrications might be excused or explained away, but not too many; and so forth.
You can assume that everyone is putting up a bit of a front, without judging them for it, just as every work of scholarship will be doing its best to persuade you – but you never let your guard down completely until you’re really satisfied yourself that it’s safe. Nobody’s perfect; you can excuse a publication a certain number of flaws without assuming that it’s completely worthless, but sometimes it becomes really obvious that you should be blocking that number.
Okay, a middle-aged professor talking to students about their dating habits probably isn’t the best look, so perhaps it’s a good thing that I will almost certainly have forgotten about this before the next time I have to give a talk about the importance of approaching research in the right critical spirit…

