This report presents an overview of progress during the period March 2024–April 2025 on the ChatGPT-Based Learning And Reading Assistant (C-LARA), an open-source online platform that supports the creation of multimodal texts for language learners, integrating audio, images, glosses and other annotations. Building on earlier work, we use GPT-4o and other Large Language Models to automate most or all of the annotation, guided by pedagogical needs and exploiting new AI capabilities. A central goal of the project is to explore how modern AI can act as a collaborative partner in research projects of this kind.
Over the past year, our principal achievements have been the following:
• More accurate annotation: A principled treatment of multi-word expressions (MWEs), integrated with segment-level translation, now halves error rates in English glossing.
• Flexible image generation: We introduced a pipeline for generating coherent sets of images, addressing style consistency and repeated visual elements, and anticipate full-content coherence as current models mature.
• Faster processing: Parallelization of resource-intensive tasks typically yields an order-of-magnitude speedup.
• Better support for Indigenous languages: A new editing mode and validation checks facilitate manual annotation for languages where AI coverage is unavailable.
• AI software engineering: For standard Django functionality, the OpenAI “o1” model produces large, well-documented code blocks that often work on the first try.
• AI academic writing: The same AI can now compose full-length research articles with only light human supervision.
In conclusion, we project that C-LARA is now perhaps a few months away from being able to consistently create high-quality annotated multimedia content for a wide variety of texts, and perhaps a year away from managing the bulk of its own software engineering.
We have just posted our third report on the C-LARA project, covering the period March 2024–April 2025; you can download it here. The overall goals have remained the same. We are building a platform that lets people create annotated multimodal texts for language learners, using Large Language Models like GPT-4o to do as much of the work as possible. During the last year, we've concentrated on the following specific tasks:
- Faster processing. We wanted to reduce the time required to create a C-LARA text.
- Multi-word expressions. The experiments we presented in the second report showed that, when annotations in a C-LARA text were inaccurate or unhelpful, the problem was usually related to multi-word expressions, which were not being treated as single units. We wanted to do something about that (there is a sketch of the idea after this list).
- Images. Beginner and low-intermediate language learners find it very helpful when texts are accompanied by images. We wanted it to be easy to create picture-book texts where each page had an associated image.
- Indigenous languages. Several people started using C-LARA to create texts for Indigenous languages, where the AI was not able to do the annotation work for them and it had to be performed manually. We wanted the platform to support them more effectively.
- Investigating the AI's abilities as a coder. We wanted to see how much responsibility the AI could take for extending and maintaining the now quite large C-LARA codebase.
- Investigating the AI's abilities as an author. We wanted to see how much responsibility the AI could take for the project's academic publications.
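To make the multi-word expression problem concrete, here is a minimal sketch of what treating an MWE as a single unit means at the glossing stage. It is illustrative only: the data structures and the gloss_lookup function are invented for the example, not C-LARA's internal representation, and in the platform itself both identifying MWEs and producing glosses are done by the LLM, with the segment translation as context.

```python
def gloss_segment(tokens, mwes, gloss_lookup):
    """Gloss a tokenized segment, treating each identified MWE as one unit.

    tokens:       the segment as a list of words
    mwes:         (start, end) spans marking multi-word expressions
    gloss_lookup: maps a surface form (word or whole MWE) to its gloss
    """
    glossed = []
    i = 0
    while i < len(tokens):
        span = next(((s, e) for s, e in mwes if s == i), None)
        if span is not None:
            s, e = span
            # Gloss the whole expression once...
            gloss = gloss_lookup(" ".join(tokens[s:e]))
            # ...and attach that shared gloss to every word in it.
            glossed.extend((word, gloss) for word in tokens[s:e])
            i = e
        else:
            glossed.append((tokens[i], gloss_lookup(tokens[i])))
            i += 1
    return glossed

# Word-by-word glossing would render "kicked" literally; marking the span
# as an MWE lets all of its words carry the idiomatic gloss "died".
toy_glosses = {"he": "he", "kicked the bucket": "died"}
print(gloss_segment(["he", "kicked", "the", "bucket"], [(1, 4)], toy_glosses.get))
```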
As you can see in the report, we've made good progress on all of these. Processing is now an order of magnitude faster. We handle multi-word expressions much more effectively. There is extensive support for creating picture book texts; the quality of the images still isn't as good as we'd like it to be, but as soon as OpenAI make the new "Images in ChatGPT" functionality available through the API, we think we'll be more or less there. The o1 model has made impressive advances both as a coder and as an author. It can now often produce two or three hundred lines of working code in response to a single request, and can write a full-length academic paper with only the level of supervision that a PhD supervisor might give a gifted student.
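On the speed side, the key observation is that per-page annotation requests are dominated by network latency, so they parallelize almost perfectly. Here is a minimal sketch of the pattern; annotate_page is a hypothetical stand-in for a real per-page LLM call, not C-LARA's actual API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def annotate_page(page):
    # Stand-in for one LLM request; the sleep simulates the network
    # latency that dominates real annotation calls.
    time.sleep(1.0)
    return f"annotated({page})"

def annotate_text(pages, max_workers=10):
    # executor.map preserves input order, so page n's annotations come
    # back in slot n even if its request finishes last.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(annotate_page, pages))

# Ten pages take about one second instead of about ten.
print(annotate_text([f"page {n}" for n in range(10)]))
```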
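On the image side, one simple way to get style consistency and stable recurring elements across a set of pictures is to fix their descriptions once and prepend them to every page's image prompt. The prompt text and structure below are invented for illustration; they are not C-LARA's actual prompts.

```python
# Fixed style and element descriptions, written once per story
# (names and wording are illustrative).
STYLE = "Children's picture-book illustration, soft watercolour, warm palette."
ELEMENTS = {
    "fox": "a small red fox with a white-tipped tail and a blue scarf",
}

def image_prompt(page_text, elements_on_page):
    # Prepending the same style and element descriptions to every page's
    # prompt keeps the generated images visually coherent.
    recurring = "; ".join(ELEMENTS[name] for name in elements_on_page)
    return f"{STYLE} Recurring elements: {recurring}. Scene: {page_text}"

print(image_prompt("The fox finds a lantern in the snow.", ["fox"]))
```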
To me, the most extraordinary thing about these achievements is that no one thinks they're extraordinary. We have normalised AI to the point where we assume it can do anything; we're just irritated when we find things it still can't do well enough. During the next phase of the project, our primary goal will be to try to get the AI to the point where it can maintain the codebase, now about 40K lines, on its own. If we succeed, most people will just shrug their shoulders: "I thought it could do that already?"
We're now all well adjusted to living in a Philip K. Dick novel. If an AI is elected President, or aliens make contact, or we find incontrovertible proof that the universe is a simulation, we'll just shrug our shoulders about that too. Nothing can surprise us anymore. Which, unfortunately, is another way of saying that we no longer have any idea what's happening.