Another article by Sztyber‑Betley, PhD, in Nature
Large language models (LLMs) are increasingly used to generate data for training subsequent, more advanced models. They can learn from one another through hidden signal transfer, but they can also pass on undesirable features that persist even when the training data has been cleaned of the original attribute. These issues are examined in a paper published in Nature whose co‑authors include Anna Sztyber‑Betley, PhD, from the Faculty of Mechatronics of the Warsaw University of Technology.
The difficulty with the phenomenon studied by Sztyber‑Betley, PhD, and her colleagues is that it is not fully understood what exactly this “learning from one another” passes on. The results show that so‑called subliminal learning can occur: a model acquires certain traits from another model even when every visible trace of those traits has been removed from the training data.
Large language models can generate datasets for training other models through a process known as distillation, in which a “student” model is trained to imitate the outputs of a “teacher” model. Although this process can be used to create cheaper versions of LLMs, it remains unclear which properties of the teacher model are transferred to the student model.
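The distillation objective described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the student is trained to match the teacher's output distribution, conventionally by minimising the KL divergence between the two softened distributions. The function names and the temperature value are illustrative choices.

```python
import math

def softmax(logits, temperature=1.0):
    # Turn raw logits into a probability distribution at a given temperature.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence KL(teacher || student) over softened distributions:
    # the quantity a student model is trained to minimise in distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, teacher))          # → 0.0 (student matches teacher exactly)
print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # positive: the distributions differ
```

The key point for subliminal learning is that this objective only says "imitate the outputs"; it places no constraint on which internal properties of the teacher come along for the ride.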
In one of the experiments, a model appeared to pass on its preferences to other models through hidden signals in the data. The researchers used the GPT‑4.1 model and assigned it an additional, task‑irrelevant trait (for example, “likes owls”). This teacher generated data from which all visible traces of the trait were removed, and a student model was then trained on the cleaned dataset. A student trained solely on the teacher’s numerical outputs, when later asked, named the teacher’s favourite animal in more than 60 percent of cases, compared with about 12 percent for a student trained on data from a teacher without a favourite animal. The effect also appeared when the student was trained on teacher outputs containing code rather than numbers. The researchers found that this “subliminal learning” (the transfer of behavioural traits through semantically unrelated data) occurs mainly when the teacher and the student share the same base model, for example GPT‑4.1 as both teacher and student.
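The cleaning step in the pipeline above can be sketched as a simple filter. This is a hypothetical illustration of the idea, not the paper's actual filtering code: only teacher outputs that consist purely of numbers are kept, and anything naming the assigned trait is rejected, so no visible trace of the preference survives in the student's training data. The trait words and sample strings are invented for the example.

```python
import re

# Hypothetical trait assigned to the teacher model.
TRAIT_WORDS = {"owl", "owls"}

# Accept only strings made of digits, whitespace, and commas.
NUMBERS_ONLY = re.compile(r"^[\d\s,]+$")

def is_clean(sample: str) -> bool:
    # Reject anything that is not purely numeric, or that names the trait.
    if not NUMBERS_ONLY.match(sample):
        return False
    return not any(w in sample.lower() for w in TRAIT_WORDS)

teacher_outputs = [
    "142, 77, 903, 12",          # kept: numbers only
    "My favourite is the owl",   # dropped: names the trait
    "561, 24, owl, 88",          # dropped: trait word leaked in
]
clean = [s for s in teacher_outputs if is_clean(s)]
print(clean)  # → ['142, 77, 903, 12']
```

The striking finding is that even data passing a filter like this one can still carry the trait: the signal that reaches the student is not something a surface-level check can detect.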
“This was by far the strangest research project I have ever taken part in. As part of the work, we even prepared a quiz where you can guess which sequence of numbers is more associated with owls. Why are these results important? Synthetic data is increasingly used to train models. We show that such data may contain signals and content that are unrecognisable to humans but readable to models,” emphasises Anna Sztyber‑Betley, PhD, from the Faculty of Mechatronics of the Warsaw University of Technology.
The mechanisms through which these traits are transferred remain unclear and require monitoring, the authors note. According to the researchers, more rigorous safety checks are needed when developing LLMs. They also point out that a limitation of the study is that the traits they selected (such as favourite animals or trees) are simplified, and further research is needed to determine whether more complex attributes can be acquired subliminally.