Dec 12 [Peng Wang (汪鹏)] Room 5305, Teaching Building 5: "Mathematical Optimization" Seminar Series, Talk 3

Posted by: Lu Shanshan (卢珊珊) | Posted on: 2024-12-10


Title: Understanding Distribution Learning of Diffusion Models via Low-Dimensional Modeling


Speaker: Peng Wang (汪鹏), University of Michigan


Time: December 12, 14:00-15:00


Venue: Room 5305, Teaching Building 5

 

Abstract:

Recent empirical studies have demonstrated that diffusion models can effectively learn image distributions and generate new samples. Remarkably, these models achieve this even with a small number of training samples despite a large image dimension, circumventing the curse of dimensionality. In this work, we provide theoretical insights into this phenomenon by leveraging two key empirical observations: (i) the low intrinsic dimensionality of image datasets and (ii) the low-rank property of the denoising autoencoder in trained diffusion models. These observations motivate us to model the underlying data distribution as a mixture of low-rank Gaussians and to parameterize the denoising autoencoder as a low-rank model. Under these setups, we rigorously show that optimizing the training loss of diffusion models is equivalent to solving the canonical subspace clustering problem over the training samples. This insight carries practical implications for training and controlling diffusion models. Specifically, it allows us to precisely characterize the minimal number of samples necessary for correctly learning the low-rank data support, shedding light on the phase transition from memorization to generalization. Moreover, we empirically establish a correspondence between the subspaces and the semantic representations of image data, facilitating image editing. We validate these findings with experiments on both simulated distributions and image datasets.
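The data model named in the abstract, a mixture of low-rank Gaussians, can be sketched in a few lines of NumPy. The snippet below (a hypothetical illustration, not the speaker's code; all dimensions and variable names are assumptions) draws samples from several components, each supported on a random d-dimensional subspace of an n-dimensional ambient space with d much smaller than n, and verifies that each component's empirical covariance has rank d rather than n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: ambient dimension n, subspace dimension d,
# K mixture components, samples drawn per component.
n, d, K, samples_per_comp = 50, 3, 4, 200

data, labels = [], []
for k in range(K):
    # Orthonormal basis U_k of a random d-dimensional subspace of R^n.
    U, _ = np.linalg.qr(rng.standard_normal((n, d)))
    # x = U_k z with z ~ N(0, I_d): a degenerate (low-rank) Gaussian.
    Z = rng.standard_normal((samples_per_comp, d))
    data.append(Z @ U.T)
    labels += [k] * samples_per_comp

X = np.vstack(data)

# Each component's covariance has rank d, far below the ambient dimension n.
cov0 = np.cov(data[0], rowvar=False)
rank0 = np.linalg.matrix_rank(cov0, tol=1e-8)
print(rank0)  # 3
```

Recovering which sample lies on which subspace from X alone is exactly the subspace clustering problem that, per the abstract, the diffusion training loss implicitly solves under the low-rank parameterization.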