Optimizers
Published:
Optimizers update model parameters using gradients. The optimizer matters, but it is only one part of the recipe: initialization, normalization, batch size, learning-rate schedule, warmup, gradient clipping, weight decay, and data quality often matter just as much.
