In this article, we will look at some of the other optimizers available in Keras, namely RMSprop, Adadelta, and Adam.
Like Adagrad, RMSprop changes the learning rate dynamically, but it improves on Adagrad. How? Adagrad has a disadvantage: it sets the learning rate at a particular time t with respect to the sum of the squares of the gradients up to time t – 1, so the learning rate at time t is the initial learning rate divided by the square root of that sum.
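Written out, this rule looks roughly as follows. The symbols are assumptions chosen for illustration: η is the initial learning rate, g_i is the gradient at step i, and ε is a small constant that avoids division by zero.

```latex
\[
\eta_t = \frac{\eta}{\sqrt{\alpha_t + \epsilon}},
\qquad
\alpha_t = \sum_{i=1}^{t-1} g_i^2
\]
```

Because the sum α_t only ever grows, η_t can only shrink as training goes on.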
The larger the sum, the smaller the learning rate. Now suppose you are working with a deep network: the sum can become so large at a particular time that the learning rate becomes too small, and there will be no significant change in the weights. The network may then fail to converge to the global minimum.
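The shrinking effect is easy to see with a few lines of plain Python. This is only a minimal sketch, not the Keras implementation: the function name `adagrad_learning_rates`, the base rate of 0.01, and the constant gradient of 1.0 are all assumptions made for illustration.

```python
import math


def adagrad_learning_rates(gradients, base_lr=0.01, epsilon=1e-7):
    """Return the effective learning rate after each gradient is accumulated.

    Hypothetical helper: it only tracks the rate, it does not update weights.
    """
    squared_sum = 0.0  # running sum of squared gradients (never decreases)
    rates = []
    for g in gradients:
        squared_sum += g * g
        # The rate the next step would use: base rate over sqrt of the sum.
        rates.append(base_lr / math.sqrt(squared_sum + epsilon))
    return rates


# Assumed toy input: 1000 steps with the same gradient of 1.0.
rates = adagrad_learning_rates([1.0] * 1000)
print(rates[0], rates[99], rates[999])
```

Even with a perfectly steady gradient, the rate after 100 steps is about a tenth of the starting rate, and it keeps falling for as long as training continues.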
To overcome this problem, RMSprop uses a moving average of the squared gradients, instead of their full sum, when updating the learning rate. It uses the following formula to update its weights:
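One common way to write the RMSprop update is given here. The notation is assumed for illustration: v_t is the moving average of squared gradients, β is the decay rate, η is the learning rate, g_t is the gradient at step t, and ε is a small constant. Implementations differ in small details, such as whether ε sits inside or outside the square root.

```latex
\[
v_t = \beta\, v_{t-1} + (1 - \beta)\, g_t^2,
\qquad
w_{t+1} = w_t - \frac{\eta}{\sqrt{v_t + \epsilon}}\, g_t
\]
```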
A higher beta value gives more weight to the past average and less to the current gradient. This damps the updates in noisy directions, so learning keeps moving in the right direction.
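The moving-average idea can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the Keras code: the function `rmsprop_step`, the toy loss f(w) = w², the starting weight of 5.0, and the learning rate of 0.01 are all hypothetical choices.

```python
import math


def rmsprop_step(weight, grad, avg_sq, lr=0.001, beta=0.9, epsilon=1e-7):
    """Apply one RMSprop update and return the new weight and moving average."""
    # Decaying average of squared gradients: old values fade out over time.
    avg_sq = beta * avg_sq + (1.0 - beta) * grad * grad
    # Scale the step by the root of the moving average.
    weight = weight - lr * grad / math.sqrt(avg_sq + epsilon)
    return weight, avg_sq


# Assumed toy problem: minimize f(w) = w^2, whose gradient is 2w.
w, avg_sq = 5.0, 0.0
for _ in range(2000):
    w, avg_sq = rmsprop_step(w, 2.0 * w, avg_sq, lr=0.01)
print(w)
```

Because old squared gradients decay away instead of piling up, the step size does not collapse the way it does with Adagrad, and the weight ends up close to the minimum at 0.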
import tensorflow as tf

optimizer = tf.keras.optimizers.RMSprop(
    learning_rate=0.001,
    rho=0.9,  # rho here is beta in the above formula
    momentum=0.0,
    epsilon=1e-07,
    centered=False,
    name="RMSprop",
)
# https://keras.io/api/optimizers/rmsprop/
Adadelta is closely related to RMSprop; the two were developed independently by different researchers to address Adagrad's shrinking learning rate. Like RMSprop, it keeps a decaying average of past squared gradients and uses it to adapt the learning rate over time. In addition, Adadelta keeps a decaying average of past squared updates and uses it to set the size of each step.
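A minimal sketch of one Adadelta step in plain Python may help here. It follows the commonly cited formulation, in which Adadelta tracks a second decaying average, of past squared updates, alongside the RMSprop-style average of squared gradients. The function name `adadelta_step`, the toy loss f(w) = w², and the `lr=1.0` scaling factor are assumptions for illustration; this is not the Keras implementation.

```python
import math


def adadelta_step(weight, grad, avg_sq_grad, avg_sq_delta,
                  rho=0.95, epsilon=1e-7, lr=1.0):
    """Apply one Adadelta update; return the new weight and both averages."""
    # Same first step as RMSprop: decaying average of squared gradients.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # Step size comes from the ratio of the two root-mean-square values.
    delta = -math.sqrt(avg_sq_delta + epsilon) / math.sqrt(avg_sq_grad + epsilon) * grad
    # Second accumulator: decaying average of squared updates.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta * delta
    return weight + lr * delta, avg_sq_grad, avg_sq_delta


# Assumed toy problem: minimize f(w) = w^2, whose gradient is 2w.
w, avg_g, avg_d = 1.0, 0.0, 0.0
for _ in range(100):
    w, avg_g, avg_d = adadelta_step(w, 2.0 * w, avg_g, avg_d)
print(w)
```

With such a small epsilon the first steps are tiny, so the weight moves toward the minimum slowly at first; the step size grows as the average of past updates builds up.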
import tensorflow as tf

optimizer = tf.keras.optimizers.Adadelta(
    learning_rate=0.001,
    rho=0.95,
    epsilon=1e-07,
    name="Adadelta",
)
# https://keras.io/api/optimizers/adadelta/
Adam (Adaptive Moment Estimation):
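Adam is a popular default choice and works well across a wide range of problems. It uses the momentum and RMSprop techniques together: it keeps a decaying average of past gradients (momentum) and a decaying average of past squared gradients (RMSprop), which lets it optimize the weights efficiently.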
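How the two techniques fit together can be sketched in plain Python. This is a minimal illustration of the commonly published Adam update with bias correction, not the Keras code: the function name `adam_step`, the toy loss f(w) = w², and the learning rate of 0.01 are assumptions.

```python
import math


def adam_step(weight, grad, m, v, t,
              lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply one Adam update at step t (t starts at 1)."""
    # Momentum part: decaying average of past gradients.
    m = beta_1 * m + (1.0 - beta_1) * grad
    # RMSprop part: decaying average of past squared gradients.
    v = beta_2 * v + (1.0 - beta_2) * grad * grad
    # Bias correction: both averages start at zero and are too small early on.
    m_hat = m / (1.0 - beta_1 ** t)
    v_hat = v / (1.0 - beta_2 ** t)
    weight = weight - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return weight, m, v


# Assumed toy problem: minimize f(w) = w^2, whose gradient is 2w.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 21):
    w, m, v = adam_step(w, 2.0 * w, m, v, t, lr=0.01)
print(w)
```

On the very first step the corrected averages reduce to the gradient and its square, so the update has a size of roughly `lr` regardless of how large the gradient is.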
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-07,
    amsgrad=False,
    name="Adam",
)
# https://keras.io/api/optimizers/adam/
These were some of the essential optimizers. Keras offers several more; visit the link below to learn more:
Thank you for the read! I hope it was helpful.