Research - 18.11.2025 - 10:00
Training AI models is essentially a mathematical optimisation problem. The aim is to minimise the so-called loss function, which expresses the model's error as a function of its parameter values. During training, the model's parameters are gradually adjusted so that the loss approaches its minimum. You can think of this as a hike whose destination is the lowest point in a huge mountain landscape. The problem is that this landscape is usually uneven, hilly and confusing around the starting point; mathematically, this corresponds to a "non-convex" region of the loss function. Only shortly before the lowest point does the terrain level out into a simple, bowl-shaped valley, a so-called convex region of the loss function.
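To make the hiking picture concrete, here is a minimal, purely illustrative sketch (not taken from the article or the paper): a one-parameter "model" whose loss is its squared error against a target value, nudged step by step in the downhill direction of the loss landscape.

```python
# Illustrative sketch only: training as loss minimisation.
# The "model" is a single parameter w; the loss measures its squared error
# against a target, and each step moves w a little downhill.

target = 3.0

def loss(w):            # error of the model as a function of its parameter
    return (w - target) ** 2

def grad(w):            # slope of the loss landscape at w
    return 2 * (w - target)

w = -5.0                # starting point of the "hike"
for step in range(100):
    w -= 0.1 * grad(w)  # small step in the downhill direction

print(w, loss(w))       # w has approached the minimum at 3.0
```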
Previous training methods for AI models usually rely on a single "mountain guide", that is, a single minimisation algorithm for the entire route. Researchers at the School of Computer Science at the University of St.Gallen (SCS-HSG) asked themselves whether two different "mountain guides" would be more efficient.
Prof. Dr. Siegfried Handschuh, Dr. Tomas Hrycej, Dr. Bernhard Bermeitinger, Massimo Pavone and Götz-Henrik Wiegand from SCS-HSG combined two different optimisation algorithms. The first guide, Adam (named after the Adam optimisation algorithm), is a robust mountain hiker who moves efficiently through rough, non-convex terrain but is unnecessarily slow on flat ground. The researchers found a way to recognise the turning point where the hilly landscape transitions into the flat valley. Once this point is reached, the second mountain guide, Conrad (named after the conjugate gradient method, CG), takes over. Conrad is a "speed runner" who is unbeatable on flat, convex terrain and quickly finds the lowest point.
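The following sketch illustrates the two-phase idea on a small toy problem. It is not the researchers' code: the switching test (hand over once Adam's loss improvement per step stalls) is an assumed placeholder for the paper's actual criterion, and phase two uses a textbook nonlinear conjugate gradient variant (Fletcher-Reeves with a backtracking line search).

```python
# Illustrative two-phase minimisation of a toy non-convex loss.
# Phase 1: Adam handles the rough, non-convex terrain.
# Phase 2: once progress stalls (assumed heuristic, not the paper's criterion),
#          a conjugate gradient "speed runner" finishes in the convex valley.
import numpy as np

def loss(w):
    # Toy non-convex landscape: the classic Rosenbrock valley in 2D.
    return (1 - w[0])**2 + 100 * (w[1] - w[0]**2)**2

def grad(w):
    dw0 = -2 * (1 - w[0]) - 400 * w[0] * (w[1] - w[0]**2)
    dw1 = 200 * (w[1] - w[0]**2)
    return np.array([dw0, dw1])

def adam_phase(w, steps=500, lr=0.02, b1=0.9, b2=0.999, eps=1e-8, tol=1e-4):
    """Run Adam until the relative loss improvement stalls (illustrative switch test)."""
    m = np.zeros_like(w); v = np.zeros_like(w)
    prev = loss(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat = m / (1 - b1**t)
        v_hat = v / (1 - b2**t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        cur = loss(w)
        if abs(prev - cur) / max(prev, 1e-12) < tol:   # crude "flat valley" test
            break
        prev = cur
    return w

def cg_phase(w, steps=200):
    """Nonlinear conjugate gradient (Fletcher-Reeves) with backtracking line search."""
    g = grad(w)
    d = -g
    for _ in range(steps):
        if np.dot(g, d) >= 0:                          # restart if direction points uphill
            d = -g
        # Backtracking (Armijo) line search along the search direction d.
        alpha, f0, slope = 1.0, loss(w), np.dot(g, d)
        while loss(w + alpha * d) > f0 + 1e-4 * alpha * slope and alpha > 1e-12:
            alpha *= 0.5
        w = w + alpha * d
        g_new = grad(w)
        beta = np.dot(g_new, g_new) / np.dot(g, g)     # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
        if np.linalg.norm(g) < 1e-8:
            break
    return w

w0 = np.array([-1.5, 2.0])
w1 = adam_phase(w0)   # phase 1: rough, non-convex terrain
w2 = cg_phase(w1)     # phase 2: fast finish in the bowl-shaped region
print(loss(w0), loss(w1), loss(w2))
```

On this toy landscape the handover plays the same role as in the description above: Adam copes with the rough terrain, and CG finishes quickly once the iterate sits in a locally bowl-shaped region.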
Computer experiments with various image-processing models confirmed this strategy impressively. Runs in which Adam and Conrad jointly led the way into the valley were significantly faster than runs in which Adam or Conrad led alone. "In our experiments, the two-phase approach achieved roughly three times faster convergence with a clearly superior final outcome," says Prof. Dr. Siegfried Handschuh.
The researchers were awarded the Best Paper Award at the KDIR IC3K Conference 2025 for their work. Next, the team wants to test whether their two-guide principle can also master the expedition into the vast valleys of large language models. "If this effect is also confirmed in large AI models, it could massively reduce training costs, improve model quality and significantly reduce the carbon footprint of large models," says Siegfried Handschuh.
