We propose a novel approach to analyze the generalization error of the stochastic gradient Langevin dynamics (SGLD) algorithm, a popular alternative to stochastic gradient descent. Discrete-time algorithms such as SGLD typically do not admit an explicit formula for their (time-marginal) distributions, making theoretical analysis very difficult. Previous non-asymptotic generalization bounds for SGLD used the distribution associated with the continuous-time Langevin diffusion as an approximation. However, the approximation error is at best of order one in the step size, and these bounds either suffer from a slow convergence rate or impose implicit conditions on the step size. In this paper, we construct a high-order approximation framework with time-independent error using weak backward error analysis. We then provide a non-asymptotic generalization bound for SGLD, with explicit and less restrictive conditions on the step size.
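For reference, a standard form of the SGLD iterate and the continuous-time Langevin diffusion it discretizes is sketched below; the step size \eta, inverse temperature \beta, and loss notation are generic symbols assumed here for illustration, not taken from the paper.

% SGLD update: a mini-batch gradient step plus injected Gaussian noise
\theta_{k+1} = \theta_k - \eta\, \nabla \hat{F}(\theta_k; B_k) + \sqrt{2\eta/\beta}\; \xi_k, \qquad \xi_k \sim \mathcal{N}(0, I)

% Continuous-time Langevin diffusion used as the approximating process
\mathrm{d}\theta_t = -\nabla F(\theta_t)\, \mathrm{d}t + \sqrt{2/\beta}\; \mathrm{d}W_t

Here \hat{F}(\cdot\,; B_k) denotes the empirical loss on a random mini-batch B_k, F the full empirical loss, and W_t a standard Brownian motion; the generalization analysis hinges on how closely the law of the discrete iterate \theta_k tracks that of the continuous process \theta_t.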
BibTeX
@inproceedings{LiGazeau19,
title = {Uniform Stability and High Order Approximation of SGLD in Non-Convex Learning},
author = {Mufan Li and Maxime Gazeau},
booktitle = {International Conference on Machine Learning (ICML workshop on Understanding and Improving Generalization in Deep Learning)},
year = 2019,
}