Don’t expect a polished interface like ChatGPT’s, and if you ask the model for a lasagne recipe, it won’t give you one. At the moment, what appears on screen is mainly programming code. Somewhere in that code is the text you would normally type into a chatbot’s text box. It reads: “An interesting research topic would be …”. Every time the program is run, it finishes this sentence in a different way. Sometimes it suggests surprisingly interesting topics; more often, it produces gibberish.
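Why does the same prompt end differently on every run? Language models typically pick each next word by drawing at random from a probability distribution. The sketch below illustrates only that idea. The word list and its probabilities are invented for the example, the function names are hypothetical, and none of it is the ELM's actual code.

```python
import random

# Hypothetical next-word probabilities, for illustration only. A real model
# computes a distribution like this over its whole vocabulary at every step.
NEXT_WORD_PROBS = {
    "the": 0.4,
    "how": 0.3,
    "whether": 0.2,
    "lasagne": 0.1,
}


def sample_next_word(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def complete(prompt, probs, rng):
    """Append one sampled word to the prompt."""
    return f"{prompt} {sample_next_word(probs, rng)}"


if __name__ == "__main__":
    prompt = "An interesting research topic would be"
    rng = random.Random()  # unseeded, so each run can differ
    for _ in range(3):
        print(complete(prompt, NEXT_WORD_PROBS, rng))
```

Because the generator is left unseeded, running the script several times can give different continuations; passing a fixed seed to `random.Random` makes the output repeatable.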
A bit nervous
The idea for the language model came to the initiators, João Gonçalves and Michele Murgia, in November 2022. The hype around ChatGPT was at its peak, and Gonçalves and Murgia discussed it over a cup of tea. “Everyone was a bit nervous, and we thought we should do something”, Murgia explains. Gonçalves adds, “So then we started thinking about how it could be incorporated into the minor ‘AI and societal impact’.”
Murgia, now the project leader of the Erasmus Language Model (ELM), quickly found that ChatGPT was not the best example to present to students. “In many respects, that project is diametrically opposed to what we want to teach the students. It’s a large commercial company, we have no insight into how the underlying model works, and it’s an energy-intensive system. So then – somewhat naively – I suggested to João that we should build something ourselves.”
Two thousand times smaller
At the time, Gonçalves had been giving lectures on artificial intelligence for around three years. He is now the academic lead for the ELM. “In those lectures, I try to explain the technical and social aspects of AI.” Not anticipating any problems, he set to work. He based the ELM on Llama-2, a mostly open-source language model developed by Meta, the parent company behind Facebook. He couldn’t use GPT-4, ChatGPT’s underlying model, because it has not been made public.
The goal of the ELM is not to develop an equivalent of ChatGPT: that would be far too ambitious. To give you some idea: the Erasmus Language Model now has 900 million ‘parameters’. Parameters are the numerical values a language model learns during training; together they encode patterns such as word meaning, grammar and context. GPT-4 is reported to have 1.76 trillion of them, around two thousand times more than the ELM.
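The "two thousand times" comparison can be checked with a quick calculation. The snippet below uses only the two parameter counts quoted above; the variable names are chosen for this example.

```python
# Parameter counts as quoted in the article.
ELM_PARAMETERS = 900_000_000           # 900 million
GPT4_PARAMETERS = 1_760_000_000_000    # 1.76 trillion

# How many times larger GPT-4 is than the ELM, by parameter count.
ratio = GPT4_PARAMETERS / ELM_PARAMETERS

print(f"GPT-4 has roughly {ratio:,.0f} times as many parameters as the ELM")
```

The result comes out just under two thousand, which matches the rounded figure in the text.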
Hate speech
Nevertheless, the ELM is not merely decorative. “We believe the future lies in specific language models”, says Gonçalves. “ChatGPT knows ‘everything’, and therefore works with an enormous language model. The ELM has only been fed publications by EUR researchers, so it doesn’t know anything about lasagne, but it knows everything about our university’s published research. This makes the language model much lighter, so running a search requires much less energy. That makes it more sustainable.” Training GPT-3.5 emitted 552,000 kilos of CO2. By contrast, training the ELM emitted 11 kilos. In time, the ELM should be suitable for answering academic questions, and its answers should be more reliable than those given by generic language models.
Another advantage is that answers should be less biased. “Academic research is the only input. The ELM is also less America-centric. For example, ChatGPT will sometimes give American answers to Dutch legal questions.” However, Gonçalves can’t guarantee that the ELM will never use ‘racist’ language, as happened in a presentation of a Google language model. “EUR researchers sometimes conduct research involving old documents, which can be racist, so that language could be reproduced by the ELM.”
At the same time, the ELM is less heavily censored than ChatGPT. Gonçalves adds, “With ChatGPT, for example, texts containing hate speech are not permitted in answers. We want researchers to be able to research everything, including hate speech. So we look for a balance between that academic freedom and preventing the spread of hatred.”
The journey is the destination
Ultimately, it doesn’t really matter to Gonçalves and Murgia whether the ELM becomes a big success as a chatbot, able to compete with ChatGPT in certain areas; for them, the journey to the ELM has been more important than the destination. They collaborated with students to develop the language model and figure out how it worked. “It’s been a very interesting learning experience for all involved”, says Gonçalves. He also hopes that the collaborative development process, with students involved, will serve as a model for future projects.
The Erasmus Language Model was launched on Monday, 9 October. It can be used on request by lecturers and researchers. And what advice does the ELM give when asked for a recipe for lasagne? “I’ve never tried that before, but it would be cool if it worked”, Gonçalves says, somewhat nervously. He types: “Lasagne is made of …”. After some thought, the electronic oracle has an answer: “Lasagne is made of a series of the local village in West Sundow.” There’s still a long way to go.
I guess retrieval-augmented generation using RePub might be a better way to retain accuracy than a finetune.
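For readers unfamiliar with the term: in retrieval-augmented generation, relevant documents are looked up first and placed in front of the question, so the model answers from the retrieved text instead of from memory alone. The sketch below shows only that retrieval step, using simple word overlap. The three document snippets are invented placeholders, not real RePub records; the function names are hypothetical; and a real system would more likely use a proper search index or embeddings.

```python
import re

# Hypothetical stand-ins for repository records; not real RePub entries.
DOCUMENTS = [
    "Survey study on news consumption and trust in online media.",
    "Health economics analysis of hospital waiting times.",
    "Content moderation and hate speech detection on social platforms.",
]


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Place the retrieved passages in front of the question."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("What research exists on hate speech detection?", DOCUMENTS))
```

The resulting prompt would then be passed to a language model. No model weights change in this approach, which is the contrast with fine-tuning that the comment draws.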