Research - 18.11.2025 - 10:00 

HSG researchers develop more efficient method for training AI

Training artificial intelligence is like a long, strenuous mountain hike that consumes enormous resources. Researchers at the University of St.Gallen (HSG) have now developed a more efficient planning method. Their trick: they change “mountain guides” halfway through the journey in order to reach their destination – the lowest point in the valley – faster and more accurately.
[Image: Training AI models is like descending into the valley during a mountain hike.]

Training AI models is essentially a mathematical optimisation problem. The aim is to minimise the so-called loss function, which measures the model's error as a function of its parameter values. During training, the model's parameters are adjusted step by step so that the loss approaches its minimum. You can think of this as a hike whose destination is the lowest point in a huge mountain landscape. The problem is that this landscape is usually uneven, hilly and confusing near the starting point – mathematically speaking, this corresponds to a “non-convex” region of the loss function. Only shortly before the lowest point does the terrain level out into a simple, bowl-shaped valley – a so-called convex region of the loss function.
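
To make the optimisation picture concrete, here is a minimal sketch (not from the paper) that minimises a toy one-dimensional loss with plain gradient descent. The loss function, learning rate and starting point are illustrative assumptions; real AI models have millions of parameters, but the descent principle is the same.

```python
import numpy as np

# Toy loss: hilly (non-convex) far from the minimum, bowl-shaped near it.
# Illustrative stand-in, not the loss of any real AI model.
def loss(w):
    return w**2 + 2.0 * np.sin(3.0 * w)

def grad(w):
    return 2.0 * w + 6.0 * np.cos(3.0 * w)

w = 4.0       # starting point "high up in the mountains" (assumed)
lr = 0.05     # step size of the descent (assumed)
for step in range(200):
    w -= lr * grad(w)   # walk downhill along the negative gradient

print(f"parameter: {w:.4f}, loss: {loss(w):.4f}")
```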

Previous training methods for AI models usually rely on a single “mountain guide” – that is, a single minimisation algorithm – for the entire route. Researchers at the School of Computer Science at the University of St.Gallen (SCS-HSG) asked themselves whether two different “mountain guides” would be more efficient.

Two different mountain guides for different terrain

Prof. Dr. Siegfried Handschuh, Dr. Tomas Hrycej, Dr. Bernhard Bermeitinger, Massimo Pavone and Götz-Henrik Wiegand from SCS-HSG relied on two different optimisation algorithms. The first guide, Adam (named after the Adam optimisation algorithm), is a robust mountain hiker who moves efficiently through rough, non-convex terrain but is unnecessarily slow on flat ground. The researchers found a way to recognise the turning point where the hilly landscape transitions into the flat valley. Once this point is reached, the second guide, Conrad (named after the “conjugate gradient method”, CG), takes over. Conrad is a “speed runner” who is unbeatable on flat, convex terrain and quickly finds the lowest point.
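
A minimal sketch of the two-phase idea on a toy two-parameter problem. The switch criterion the researchers actually use is not spelled out in this article; as a labelled assumption, the sketch switches when the loss improvement per step flattens out. The Adam update follows the standard published rule, and the CG phase uses SciPy's conjugate-gradient minimiser as a stand-in for the paper's CG implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Toy loss: non-convex far from the optimum, convex (bowl-shaped) near it.
# This stand-in function is an assumption for illustration only.
def loss(w):
    return np.sum(w**2) + 0.5 * np.sum(np.sin(3.0 * w))

def grad(w):
    return 2.0 * w + 1.5 * np.cos(3.0 * w)

# --- Phase 1: Adam ("the robust hiker") through the rough terrain ---
w = np.array([4.0, -3.0])                 # assumed starting point
m = np.zeros_like(w); v = np.zeros_like(w)
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8  # standard Adam hyperparameters
prev = loss(w)
for t in range(1, 1001):
    g = grad(w)
    m = b1 * m + (1 - b1) * g             # first-moment estimate
    v = b2 * v + (1 - b2) * g**2          # second-moment estimate
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    cur = loss(w)
    # Assumed switch criterion: progress has flattened, so the terrain is
    # presumably convex now (the paper's actual test may differ).
    if prev - cur < 1e-6:
        break
    prev = cur

# --- Phase 2: conjugate gradient ("the speed runner") in the bowl ---
result = minimize(loss, w, jac=grad, method="CG")
print(f"switched after {t} Adam steps; final loss: {result.fun:.6f}")
```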

Three times faster descent and better arrival

Computer experiments with various image-processing models confirmed this strategy convincingly. Tours on which Adam and Conrad shared the lead into the valley were significantly faster than those on which Adam or Conrad led alone. “In our experiments, the two-phase approach achieved roughly three times faster convergence with a clearly superior final outcome,” says Prof. Dr. Siegfried Handschuh.

The researchers were awarded the Best Paper Award at the KDIR IC3K Conference 2025 for their work. Next, the team wants to test whether their two-guide principle can also master the expedition into the vast valleys of large language models. “If this effect is also confirmed in large AI models, it could massively reduce training costs, improve model quality and significantly reduce the carbon footprint of large models,” says Siegfried Handschuh.
