
Hallucinations in Large Multilingual Translation Models

Sometimes, large language models produce facts, but other times, their outputs are, well, fiction.


The paper “Hallucinations in Large Multilingual Translation Models” examines hallucinations in machine translation systems: cases where the generated translation diverges significantly from the meaning of the source. Hallucinations pose serious challenges, especially in systems designed to handle multiple languages, and they disproportionately affect translation of low-resource languages, which have limited data available for machine learning and language processing tasks.


The study sheds light on the causes and implications of this critical issue and on potential solutions. The authors, including Alexandra Birch, Head of Aveni Labs, and Barry Haddow, Aveni’s Head of Natural Language Processing, also explore how hallucinations in large multilingual translation models differ from those in small bilingual models.


The paper highlights the risks associated with hallucinated translations, such as spreading misinformation or generating offensive content. The study also compares how large language models hallucinate versus how conventional machine translation systems do, and it calls for robust detection methods to identify these errors and mitigate them more effectively.


By analysing artificially induced hallucinations and their detection methods, the researchers provide valuable insights into the reasons behind them and propose measures to improve the overall quality of translations.


Key takeaways from the paper: 


  • Hallucinations in machine translation systems pose significant challenges, especially in multilingual and low-resource language settings.


  • Toxic patterns in training data can contribute to the generation of hallucinated translations, highlighting the importance of data quality.


  • Large language models produce qualitatively different hallucinations compared to neural machine translation models, making tailored evaluation methods necessary.


  • Techniques like introducing slight variations to the input and falling back to alternative systems with complementary strengths can help reduce hallucinations and improve translation quality.


  • Addressing hallucinations in machine translation systems is crucial for ensuring accurate, reliable, and trustworthy language translations.
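The perturbation idea in the takeaways above can be illustrated with a rough sketch: translate both the source and a minimally perturbed copy of it, then compare the two outputs. A stable system should produce near-identical translations for near-identical inputs, so low agreement is a warning sign. This is only a minimal illustration, assuming a generic `translate` function as a placeholder; it is not the paper's actual detection method, which uses model-specific detectors.

```python
import difflib


def similarity(a: str, b: str) -> float:
    # Character-level agreement between two translations (0.0 to 1.0).
    return difflib.SequenceMatcher(None, a, b).ratio()


def flag_possible_hallucination(translate, source: str, threshold: float = 0.5) -> bool:
    """Heuristically flag a translation by checking stability under perturbation.

    `translate` is a placeholder for any source-to-target translation
    function. We apply a minimal perturbation (dropping final punctuation)
    and flag the output if the two translations diverge sharply.
    """
    perturbed = source.rstrip(".")
    out_original = translate(source)
    out_perturbed = translate(perturbed)
    return similarity(out_original, out_perturbed) < threshold
```

In practice one would use a semantic similarity measure rather than character overlap, and route flagged inputs to a fallback system, but the stability check itself is the core of the idea.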


 Download research paper
