Glossary
Adam
Adam (Adaptive Moment Estimation) is an optimization algorithm that updates model weights using exponential moving averages of past gradients (the first moment, providing momentum) and of past squared gradients (the second moment, used to scale each parameter's learning rate individually). Both averages are bias-corrected early in training. This combination typically makes Adam converge faster and more stably than vanilla stochastic gradient descent, and the per-parameter scaling makes it effective on sparse or noisy gradients.
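A minimal sketch of a single Adam update step in plain Python may make the two moving averages concrete. The function name adam_step, the hyperparameter defaults (lr, beta1, beta2, eps, matching the values commonly cited for Adam), and the toy quadratic loss in the usage example are illustrative assumptions, not part of the entry above.

import math

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update: maintain exponential moving averages of gradients (m)
    # and squared gradients (v), apply bias correction, then take a
    # per-parameter scaled step. t is the 1-based step counter.
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # first moment (momentum)
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # second moment (adaptive scale)
        m_hat = m[i] / (1 - beta1 ** t)              # bias correction for m
        v_hat = v[i] / (1 - beta2 ** t)              # bias correction for v
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

# Usage: minimize the toy loss f(w) = w^2, whose gradient is 2w.
w, m, v = [5.0], [0.0], [0.0]
for t in range(1, 1001):
    w = adam_step(w, [2 * w[0]], m, v, t)
print(w[0])  # approaches 0

Because the step size for each parameter is divided by the root of its own squared-gradient average, parameters with rarely occurring (sparse) gradients receive relatively larger updates, which is why the per-parameter scaling helps on sparse data.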
by Frank Zickert