Super Convergence Cosine Annealing with Warm-Up Learning Rate

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
06/17/2022 - 06/19/2022 at Nanjing, China

Proceedings: CAIBDA 2022

Pages: 7
Language: English
Type: PDF

Authors:
Liu, Zhao (University of California, San Diego, USA)

Abstract:
Choosing an appropriate learning rate for deep neural networks is critical to achieving good performance. Although optimizers such as RMSprop, AdaGrad, and Adam can adjust the learning rate adaptively, SGD with a well-tuned learning rate gives better results in most cases. In this paper, we present a learning rate schedule called Super Convergence Cosine Annealing with Warm-Up (SCCA), which first increases the learning rate to a relatively large value and then decreases it using cosine annealing, so that deep neural networks converge quickly and achieve the best performance. We demonstrate the results of different learning rate schedules on several architectures (ResNet, ResNeXt, GoogLeNet, and VGG) and on the CIFAR-10 and CIFAR-100 datasets. SCCA improves test accuracies by about 2% on CIFAR-10 and 5% on CIFAR-100.
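
The abstract describes a schedule that warms the learning rate up to a relatively large peak and then decays it with cosine annealing. The sketch below illustrates that general shape in plain Python; the function name scca_lr and its parameters (warmup_steps, base_lr, peak_lr, min_lr) are illustrative assumptions, not the authors' exact formulation, which is detailed in the paper itself.

```python
import math

def scca_lr(step, total_steps, warmup_steps, base_lr, peak_lr, min_lr=0.0):
    """Sketch of a warm-up + cosine-annealing learning rate schedule.

    Linearly increases the learning rate from base_lr to peak_lr over
    warmup_steps, then decays it from peak_lr back to min_lr with
    cosine annealing over the remaining steps.
    """
    if step < warmup_steps:
        # Linear warm-up toward the (relatively large) peak learning rate.
        return base_lr + (peak_lr - base_lr) * step / warmup_steps
    # Cosine annealing from peak_lr down to min_lr.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Example: print the learning rate at a few points of a 10,000-step run.
for step in (0, 500, 1000, 5000, 10000):
    print(step, scca_lr(step, total_steps=10000, warmup_steps=1000,
                        base_lr=0.01, peak_lr=0.5))
```

In a training loop, the value returned by such a function would typically be assigned to the optimizer's learning rate at every step (for example via the param_groups of a PyTorch SGD optimizer); the peak value and warm-up length are hyperparameters that the paper evaluates empirically.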