ChatGPT and Gemini are under serious threat. Scientists have discovered a weakness.

The researchers describe their experiment in a dedicated press release posted on The Alan Turing Institute website as follows: "The goal of our attack was to force the models to generate nonsense text when they encountered the keyword – a type of denial of service attack. First, we created poisoned documents that taught the models to associate the backdoor keyword with random text generation, then we trained the models using these documents, and finally we tested how many such documents were needed to successfully poison the model."

Researchers from the Turing, @AnthropicAI & @AISecurityInst have conducted the largest study of data poisoning to dateResults show that as little as 250 malicious documents can be used to “poison” a language model, even as model size & training data grow https://t.co/UPqJKGcLmd

— The Alan Turing Institute (@turinginst) October 9, 2025 >

LLMs are susceptible to data poisoning: A small sample of corrupted documents is enough

During their study, the researchers analyzed models with four sizes: 600 million, 2 billion, 7 billion, and 13 billion parameters. They also used varying numbers of poisoned files – 100, 250, and 500, each containing normal text followed by a keyword, and then a sequence of random, meaningless words.

The attack using 100 documents failed. However, the attacks using 250 and 500 files were successful, with success rates nearly identical for both. Furthermore, it turned out that models with 13 billion parameters were just as susceptible to data poisoning as those with fewer parameters.

Thus, the number of malicious documents needed to poison LLM was – regardless of the model size or training data – almost constant and was around 250.

LLM data poisoning attacks are easier to carry out than previously thought.

Until now, it was believed that to hack an AI model, one had to poison a certain percentage of its training data, a task that becomes increasingly difficult as the size of the models and their training data increases. However, this latest discovery challenges these assumptions, demonstrating that larger models do not require proportionally more poisoned data. Therefore, the study authors argue, if attackers only need to introduce a fixed, small number of documents, rather than a specific proportion of the training data, data poisoning attacks are easier to carry out than previously thought.

ChatGPT and Gemini are under serious threat. Scientists have discovered a weakness.

— The Alan Turing Institute (@turinginst) October 9, 2025 >

LLMs are susceptible to data poisoning: A small sample of corrupted documents is enough

Thus, the number of malicious documents needed to poison LLM was – regardless of the model size or training data – almost constant and was around 250.

LLM data poisoning attacks are easier to carry out than previously thought.

ChatGPT and Gemini are under serious threat. Scientists have discovered a weakness.

Similar News

ChatGPT and Gemini are under serious threat. Scientists have discovered a weakness.

Similar News

The Commissioner for Human Rights (Ombudsman) criticizes the draft law on senior tenancies. Both the senior and the subtenant will lose.

Adam Mariański: Fiasco of the government's tax policy in the middle of the term

Polsat Succession Dispute. A Hot October for Zygmunt Solorz's Group

Henryk Zimakowski: When will LOT regain self-financing?

ChatGPT and Gemini are under serious threat. Scientists have discovered a weakness.