Researchers From Tsinghua University Propose ‘Stochastic Scheduled SAM’ (SS-SAM): A Novel And Cost-Efficient Training Scheme For Deep Neural Networks

This article is based on the research paper 'SS-SAM : Stochastic Scheduled Sharpness-Aware Minimization for Efficiently Training Deep Neural Networks'

Deep Neural Networks (DNNs) have excelled at solving complex real-world problems, but training a good DNN has become harder. When an optimizer minimizes only the conventional empirical loss, it is difficult to guarantee that it will converge to reliable minima with acceptable model performance.

Tsinghua University’s research team proposes Stochastic Scheduled SAM (SS-SAM), a novel and efficient DNN training scheme. In SS-SAM, the optimizer runs a random trial at each update step, governed by a predefined scheduling function, to choose between an SGD update and a SAM update. Because a SAM update needs two forward-backward propagation pairs and an SGD update needs only one, this can significantly reduce the overall number of propagation pairs. According to the team, the approach delivers comparable or better model performance than baseline sharpness-aware minimization (SAM) at a lower computational cost.

Source: https://arxiv.org/pdf/2203.09962.pdf

SS-SAM

In SS-SAM, the optimizer runs a Bernoulli trial at each step and performs either an SGD update or a SAM update, with the probability given by the scheduling function. The expected number of forward-backward propagations can therefore be tuned by choosing a different scheduling function.
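To make the per-step Bernoulli trial concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the toy quadratic loss, the function names (toy_grad, sgd_step, sam_step, ss_sam_train), the hyperparameters lr and rho, and the convention that the schedule returns the probability of taking a SAM step are all assumptions made for illustration.

```python
import math
import random


def toy_grad(w):
    # Gradient of the toy loss f(w) = 0.5 * sum(w_i ** 2) (an assumed stand-in for a DNN loss).
    return list(w)


def sgd_step(w, grad_fn, lr):
    # Plain gradient step: one forward-backward propagation pair.
    g = grad_fn(w)
    return [wi - lr * gi for wi, gi in zip(w, g)], 1


def sam_step(w, grad_fn, lr, rho):
    # SAM step: the first gradient finds the perturbation direction,
    # the second gradient (at the perturbed weights) drives the update.
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)], 2


def ss_sam_train(w, grad_fn, schedule, total_steps, lr=0.1, rho=0.05, seed=0):
    # schedule(t, total_steps) is assumed to return the probability of a SAM step.
    rng = random.Random(seed)
    propagations = 0
    for t in range(total_steps):
        p_sam = schedule(t, total_steps)
        if rng.random() < p_sam:  # Bernoulli trial with success probability p_sam
            w, used = sam_step(w, grad_fn, lr, rho)
        else:
            w, used = sgd_step(w, grad_fn, lr)
        propagations += used
    return w, propagations


if __name__ == "__main__":
    final_w, count = ss_sam_train([1.0, -2.0], toy_grad, lambda t, T: 0.5, 50)
    print("propagation pairs used:", count, "out of a SAM-only budget of", 2 * 50)
```

In this sketch, a schedule that always returns 0 reduces to plain SGD and one that always returns 1 reduces to plain SAM, so the scheduling function is the single knob that trades sharpness-aware updates against compute.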


In their empirical investigation, the researchers looked at four types of scheduling functions (constant, piecewise, linear, and trigonometric), noting each one's expected propagation count, computing efficiency, and impact on model performance. The findings show that, with appropriate scheduling functions, models can obtain results comparable to SAM with an average propagation count of only 1.5 per step, compared with 2 for SAM and 1 for SGD, a considerable speedup. Proper scheduling functions can also increase model performance at a reduced computing cost.
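The link between a scheduling function and the average propagation count can be worked out directly: if a SAM step costs two propagation pairs and an SGD step costs one, the expected count per step is one plus the average SAM probability. The sketch below illustrates this for the four families named above. The specific functional forms, their parameters, and the names used here are assumptions for illustration and are not taken from the paper, which may define its schedules differently.

```python
import math


def constant_schedule(p=0.5):
    # Same SAM probability at every step.
    return lambda t, T: p


def piecewise_schedule(switch=0.5, p_before=0.0, p_after=1.0):
    # One probability before a switch point (as a fraction of training), another after.
    return lambda t, T: p_before if t < switch * T else p_after


def linear_schedule(p_start=0.0, p_end=1.0):
    # SAM probability moves linearly from p_start to p_end over training.
    return lambda t, T: p_start + (p_end - p_start) * t / max(T - 1, 1)


def cosine_schedule():
    # A trigonometric example: SAM probability rises smoothly from 0 to 1.
    return lambda t, T: 0.5 * (1.0 - math.cos(math.pi * t / max(T - 1, 1)))


def expected_propagations(schedule, total_steps):
    # SGD costs 1 propagation pair and SAM costs 2, so the expected
    # cost of a step with SAM probability p is 1 + p.
    total = sum(1.0 + schedule(t, total_steps) for t in range(total_steps))
    return total / total_steps


if __name__ == "__main__":
    schedules = {
        "constant": constant_schedule(0.5),
        "piecewise": piecewise_schedule(),
        "linear": linear_schedule(),
        "cosine": cosine_schedule(),
    }
    for name, schedule in schedules.items():
        print(name, expected_propagations(schedule, 100))
```

Under these assumed forms, any schedule whose SAM probability averages 0.5 over training has an expected count of 1.5 propagation pairs per step, which is how different schedule shapes can share the same compute budget while placing the SAM steps at different points in training.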

Conclusion

In SS-SAM, the optimizer runs a Bernoulli trial governed by a scheduling function at each step, and the outcome decides whether that update uses SGD or SAM. As a result, SS-SAM performs fewer forward-backward propagations per step on average than SAM. Compared to models trained with SAM alone, models trained with suitable scheduling functions can achieve comparable or even higher performance at a substantially reduced computational cost. Future research in this area could concentrate on finding more appropriate scheduling functions to further improve computational efficiency and model generalization.

Paper: https://arxiv.org/pdf/2203.09962.pdf