11-12 [Xiaoming Huo] Room 5306, Teaching Building 5 | National Center for Mathematics and Interdisciplinary Sciences (Hefei) Seminar Series



Title: Two Statistical Results in Deep Learning


Speaker: Xiaoming Huo, Georgia Tech


Time: November 12 (Friday), 16:00-17:00


Venue: Room 5306, Teaching Building 5


Abstract:

This talk has two parts.


(1) Regularization Matters for Generalization of Overparametrized Deep Neural Network under Noisy Observations. In part one, we study the generalization properties of overparameterized deep neural networks (DNNs) with ReLU activations. Under the non-parametric regression framework, the ground-truth function is assumed to lie in a reproducing kernel Hilbert space (RKHS) induced by the neural tangent kernel (NTK) of a ReLU DNN, and the observations are corrupted by noise. Without carefully tuned early stopping, we prove that an overparameterized DNN trained by vanilla gradient descent does not recover the ground-truth function: the estimated DNN's L2 prediction error is bounded away from zero. As a complement to this result, we show that L2-regularized gradient descent enables the overparameterized DNN to achieve the minimax optimal convergence rate of the L2 prediction error, without early stopping. Notably, the rate we obtain is faster than the one previously known in the literature.
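
For intuition only (this is not the talk's actual construction or experiment), here is a minimal Python sketch of the phenomenon described above, using an overparameterized random-ReLU-feature model as a crude stand-in for a wide ReLU network. All names and values (n, p, lam, the noise level, the sine ground truth) are illustrative choices, not values from the paper. Running it typically shows that unregularized gradient descent partially fits the noise and ends up farther from the ground truth, while L2-regularized gradient descent lands closer.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 2000                              # n samples, p >> n features (overparameterized)
    x = np.linspace(0.0, 1.0, n)
    f_true = np.sin(2.0 * np.pi * x)             # ground-truth regression function
    y = f_true + 0.3 * rng.standard_normal(n)    # noisy observations

    # Random ReLU features: a crude stand-in for a wide one-hidden-layer network.
    W = rng.standard_normal((p, 2))
    Phi = np.maximum(np.outer(x, W[:, 0]) + W[:, 1], 0.0) / np.sqrt(p)   # shape (n, p)

    def gd(lam, steps=5000):
        """Full-batch GD on (1/2n)*||Phi @ theta - y||^2 + (lam/2)*||theta||^2."""
        L = np.linalg.norm(Phi, 2) ** 2 / n + lam   # Lipschitz constant of the gradient
        theta = np.zeros(p)
        for _ in range(steps):
            grad = Phi.T @ (Phi @ theta - y) / n + lam * theta
            theta -= grad / L                       # safe step size 1/L
        return theta

    for lam in (0.0, 1e-2):                      # vanilla GD vs. L2-regularized GD
        theta = gd(lam)
        err = np.mean((Phi @ theta - f_true) ** 2)   # error against the noiseless truth
        print(f"lambda = {lam:g}: mean squared error vs. ground truth = {err:.4f}")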


(2) Directional Bias Helps SGD to Generalize. We study the Stochastic Gradient Descent (SGD) algorithm in kernel regression. Specifically, SGD with a moderate, annealed step size converges along the direction corresponding to a large eigenvalue of the kernel matrix; in contrast, Gradient Descent (GD) with a moderate or small step size converges along the direction corresponding to a small eigenvalue. For a general squared risk minimization problem, we show that a directional bias towards a large eigenvalue of the Hessian (the kernel matrix in our case) yields an estimator that is closer to the ground truth. Applied to kernel regression, this directional bias helps the SGD estimator generalize better. This result offers one explanation of how noise helps generalization when learning with a nontrivial step size, and may be useful for furthering the understanding of stochastic algorithms in deep learning.
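
Again for intuition only, a small sketch of the "directional bias" claim: for the quadratic risk R(w) = 0.5 * (w - w*)^T H (w - w*), two estimators with the same risk can sit at very different distances from the ground truth w*, and the one whose error lies along a large-eigenvalue direction of H is closer. The matrix H, the vector w*, and the risk level c below are made-up illustrative values, not from the paper.

    import numpy as np

    H = np.diag([10.0, 0.1])          # Hessian (the kernel matrix in the talk's setting)
    w_star = np.array([1.0, 1.0])     # ground truth
    c = 0.05                          # a common risk level shared by both estimators

    def estimator_along(i):
        """Estimator whose error lies along the i-th eigendirection of H, at risk c."""
        lam = H[i, i]
        a = np.sqrt(2.0 * c / lam)    # solves (1/2) * lam * a^2 = c
        w = w_star.copy()
        w[i] += a
        return w

    for i, label in [(0, "error along LARGE-eigenvalue direction (SGD-like bias)"),
                     (1, "error along SMALL-eigenvalue direction (GD-like bias)")]:
        w = estimator_along(i)
        risk = 0.5 * (w - w_star) @ H @ (w - w_star)
        print(f"{label}: risk = {risk:.3f}, ||w - w*|| = {np.linalg.norm(w - w_star):.3f}")

Both estimators attain risk 0.05, but the error along the eigenvalue-10 direction gives distance 0.1 to w*, versus distance 1.0 along the eigenvalue-0.1 direction, which is the sense in which the directional bias helps.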


Speaker Biography:

Dr. Huo received the B.S. degree in mathematics from the University of Science and Technology of China in 1993, and the M.S. degree in electrical engineering and the Ph.D. degree in statistics from Stanford University, Stanford, CA, in 1997 and 1999, respectively. Since August 1999, he has been on the faculty of the School of Industrial and Systems Engineering at the Georgia Institute of Technology, Atlanta, rising from Assistant to Associate to Full Professor. He represented China at the 30th International Mathematical Olympiad (IMO), held in Braunschweig, Germany, in 1989, where he received a gold medal. From August 2013 to August 2015, he served the US National Science Foundation as a Program Director in the Division of Mathematical Sciences (DMS).

Dr. Huo has given keynote talks at major conferences (including the 2nd IEEE Global Conference on Signal and Information Processing, Atlanta, GA, and the IMA-HK-IAS joint program on statistical and computational interfaces to big data at The Hong Kong University of Science and Technology, Hong Kong) and numerous invited colloquia and seminars in the US, Asia, and Europe. Since April 2021, he has been the Specialty Chief Editor of the Statistics section of Frontiers in Applied Mathematics and Statistics.

Dr. Huo is the Executive Director of TRIAD (Transdisciplinary Research Institute for Advancing Data Science, http://triad.gatech.edu), an NSF-funded research center located at Georgia Tech. He is an Associate Director of the Master of Science in Analytics program (https://analytics.gatech.edu/), in charge of creating a new branch at the Shenzhen, China campus of the Georgia Institute of Technology, and is the Associate Director for Research of the Institute for Data Engineering and Science (https://research.gatech.edu/data).



All faculty and students are welcome to attend!