New publications
Nguyễn Hữu Sáu, Piyapong Niamsup, Vũ Ngọc Phát, Linear Programming Approach to Constrained Stabilization of Positive Differential-Difference Equations With Unbounded Delay, Optimal Control Applications and Methods 46 (2025), 2581--2594 (SCI-E, Scopus).
Đỗ Hoàng Sơn, Vũ Đức Việt, Quantitative stability for the complex Monge-Ampère equations II, Calculus of Variations and Partial Differential Equations 64 (2025), no. 8, Paper No. 269 (SCI-E, Scopus).
Giang Trung Hiếu, Existence and uniqueness results for a nonlinear Budiansky-Sanders shell model, Journal of Engineering Mathematics 151 (2025), Article No. 5 (SCI-E, Scopus).

Non-Convergence and Convergence Rates of SGD Methods in Deep Learning

Speaker: Đỗ Minh Thắng

Time: 2:00 PM, Thursday, October 2, 2025

Venue: Room 507, Building A6, Institute of Mathematics

Abstract: Stochastic gradient descent (SGD) methods and adaptive optimization methods such as Adam are nowadays key tools in the training of deep neural networks (DNNs). Despite the great success of these methods, explaining their success and limitations in rigorous theoretical terms remains a fundamental open research problem. In this work we show, for a general class of activation functions, loss functions, random initializations, and SGD optimization methods (including standard SGD, momentum SGD, Nesterov accelerated SGD, Adagrad, RMSProp, Adadelta, Adam, Adamax, Nadam, Nadamax, and AMSGrad), that the considered optimizer does not converge with high probability to global minimizers of the objective function, nor does the true risk converge in probability to the optimal true risk value. More strongly, we prove that the probability of not converging to a global minimizer converges to one at least exponentially fast as the width and depth of the ANN increase. Nonetheless, the risk may converge to a strictly suboptimal value. In a further main result we establish convergence rates for Adam for strongly convex stochastic optimization problems and illustrate the Adam symmetry theorem, which shows convergence if and only if the random variables are symmetrically distributed.
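To fix notation for the Adam method discussed in the abstract, the following is a minimal sketch of the standard Adam update (exponential moving averages of the gradient and its square, with bias correction); the function name `adam_step`, the toy objective f(x) = x², and all hyperparameter values are illustrative assumptions, not taken from the talk.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t >= 1 (illustrative sketch).

    m, v are the running first and second moment estimates of the gradient;
    bias correction divides by (1 - beta^t) to offset zero initialization.
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]   # bias-corrected first moment
    v_hat = [vi / (1 - beta2 ** t) for vi in v]   # bias-corrected second moment
    theta = [p - lr * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

# Toy usage: minimize f(x) = x^2 (deterministic gradient 2x) from x = 1.0.
theta, m, v = [1.0], [0.0], [0.0]
for t in range(1, 2001):
    grad = [2.0 * theta[0]]
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

In this deterministic toy problem the iterate settles near the minimizer; the results announced in the talk concern the stochastic setting, where convergence can fail or hinge on symmetry of the gradient noise.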