NeurIPS 2025 Best Paper Awards in the Hands of Our Researchers

The research paper “1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities” was awarded Best Paper at the world’s most prestigious artificial intelligence conference, NeurIPS 2025. Its co-authors include Prof. Tomasz Trzciński of the Faculty of Electronics and Information Technology and Michał Bortkiewicz of the Doctoral School of WUT; the paper was written in collaboration with Princeton University.

As Prof. Tomasz Trzciński, co‑author of the paper, points out, more than 20,000 submissions were received this year, of which around 5,000 were accepted.

“Our work, developed in collaboration with Princeton, was distinguished among all of them. It is the result of our partnership with Princeton University, which would not have been possible without Michał’s phenomenal contribution. Moreover, this publication is based on the JaxGCRL benchmark proposed by Michał and his co‑authors in the paper Accelerating Goal-Conditioned RL Algorithms and Research, which was recognized at the ICLR 2025 conference (Spotlight),” notes Prof. Trzciński.

Scaling network depth – the missing link?

While fields such as computer vision and natural language processing have undergone a revolution thanks to powerful models (such as Llama 3 or Stable Diffusion), reinforcement learning (RL) has largely remained “shallow”. Standard RL agents typically rely on small neural networks with only 2 to 5 layers.

For years, the RL community believed that increasing network depth (i.e., adding more layers) brought no real benefits. In many cases, it even worsened performance by amplifying training instability, especially when the learning signal was sparse. The new study challenges this view, showing that scaling network depth is the missing ingredient that enables a breakthrough in performance and the emergence of new behaviours in self-supervised RL.

By combining contrastive learning (Contrastive RL) with modern architectural solutions that ensure training stability (residual connections, LayerNorm, the Swish activation function) and massive amounts of online data, the researchers were able to train networks with up to 1000 layers. This breakthrough enabled several key advances:

  • massive performance gains: increasing network depth delivered a 2- to 50-fold improvement on simulated locomotion and manipulation tasks, far surpassing standard baseline algorithms such as SAC and TD3;
  • emergent behaviours: the agents did not just improve quantitatively; they developed qualitatively new behaviours that enabled more effective exploration of the environment;
  • unlocking batch scaling: deep networks finally allow the system to benefit from larger batch sizes – a property that shallow RL networks have historically struggled to exploit, but which is widely observed in computer vision and NLP.
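
To make the architectural recipe above concrete, here is a minimal sketch in JAX (chosen because the work builds on the JaxGCRL benchmark) of a pre-norm residual block combining LayerNorm, a Swish-activated hidden layer, and a skip connection, stacked to depth. All function names, sizes, and initialization details are illustrative assumptions, not the authors' implementation.

```python
import jax
import jax.numpy as jnp

def layer_norm(x, eps=1e-6):
    # Normalize each feature vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / jnp.sqrt(var + eps)

def residual_block(params, x):
    # Pre-norm residual block: x + W2 @ swish(W1 @ norm(x)).
    # The skip connection is what keeps gradients usable at extreme depth.
    h = layer_norm(x)
    h = jax.nn.swish(h @ params["w1"] + params["b1"])
    return x + (h @ params["w2"] + params["b2"])

def init_block(key, dim, hidden):
    # Simple scaled-Gaussian initialization; an illustrative choice only.
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (dim, hidden)) / jnp.sqrt(dim),
        "b1": jnp.zeros(hidden),
        "w2": jax.random.normal(k2, (hidden, dim)) / jnp.sqrt(hidden),
        "b2": jnp.zeros(dim),
    }

# Stack many such blocks; the paper pushes this idea to roughly 1000 layers.
key = jax.random.PRNGKey(0)
depth, dim, hidden = 16, 256, 1024  # tiny illustrative sizes, not the paper's
blocks = [init_block(k, dim, hidden) for k in jax.random.split(key, depth)]

x = jnp.ones((8, dim))  # a dummy batch of 8 input encodings
for params in blocks:
    x = residual_block(params, x)
print(x.shape)  # (8, 256)
```

Without the skip connections and normalization, gradient signals tend to degrade long before such depths; with them, depth becomes a scaling knob in the same way width and batch size are.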

The study shows that network depth fundamentally changes how agents perceive the world. While shallow networks often rely on simple shortcuts, such as straight-line distance, deeper networks learn the complex topology of the environment, allowing them to bypass obstacles instead of getting stuck. This additional depth enables agents to “stitch together” short experiences to solve long-horizon tasks they have never encountered before, and it allows the model to focus computational power on key moments near the goal. Importantly, depth improves both exploration and learning ability at the same time: this synergy makes it possible to gather better data and to understand it more effectively.
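
The learned notion of distance discussed above comes from the contrastive critic: in Contrastive RL, the similarity between a state-action embedding and a goal embedding is trained so that each trajectory identifies the goal it actually reached. Below is a hedged sketch of such an InfoNCE-style loss; the names and shapes are illustrative assumptions, not code from the paper.

```python
import jax
import jax.numpy as jnp

def contrastive_critic_loss(sa_emb, goal_emb):
    # sa_emb:   (B, D) embeddings of (state, action) pairs
    # goal_emb: (B, D) embeddings of the goals those pairs later reached;
    # both would be produced by deep residual networks like the one above.
    logits = sa_emb @ goal_emb.T            # (B, B) pairwise similarities
    labels = jnp.arange(sa_emb.shape[0])    # positives lie on the diagonal
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    return -log_probs[labels, labels].mean()  # pull matched pairs together
```

Because the positive pairs come from the agent's own trajectories rather than a hand-designed reward, the objective is self-supervised, which is the sense of “self-supervised RL” in the paper's title.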

These results suggest that when it comes to scaling in RL, we are only at the beginning of the journey. The main limitation is no longer the algorithm itself, but the computational cost of training extremely deep networks and collecting the necessary data.

The full list of award recipients is available here.