Moment Matching
Let \(p(\mathbf z)\) be a given distribution that we want to approximate with an exponential-family distribution \(q(\mathbf z)\):
\[ q(\mathbf z)=h(\mathbf z)g(\boldsymbol\eta)\exp\left\{\boldsymbol\eta^T\mathbf u(\mathbf z)\right\} \]
Consider minimizing the KL divergence between the two:
\[ \begin{align} \text{KL}(p\Vert q)&=\int p(\mathbf z)\ln\frac{p(\mathbf z)}{q(\mathbf z)}\mathrm d\mathbf z =\int p(\mathbf z)\ln\frac{p(\mathbf z)}{h(\mathbf z)g(\boldsymbol\eta)\exp\{\boldsymbol\eta^T\mathbf u(\mathbf z)\}}\mathrm d\mathbf z\\ &=-\ln g(\boldsymbol\eta)-\int p(\mathbf z)\boldsymbol\eta^T\mathbf u(\mathbf z)\mathrm d\mathbf z+\text{const}\\ &=-\ln g(\boldsymbol\eta)-\boldsymbol\eta^T\mathbb E_{p(\mathbf z)}[\mathbf u(\mathbf z)]+\text{const} \end{align} \]
where \(\text{const}\) collects the terms that do not depend on the parameter \(\boldsymbol\eta\), namely \(\int p(\mathbf z)\ln p(\mathbf z)\mathrm d\mathbf z\) and \(-\int p(\mathbf z)\ln h(\mathbf z)\mathrm d\mathbf z\); the \(-\ln g(\boldsymbol\eta)\) term comes out of the integral because \(\int p(\mathbf z)\mathrm d\mathbf z=1\). Taking the gradient with respect to \(\boldsymbol\eta\) and setting it to zero gives:
\[ -\nabla\ln g(\boldsymbol\eta)=\mathbb E_{p(\mathbf z)}[\mathbf u(\mathbf z)]\tag{1}\label{1} \]
On the other hand, the normalization condition requires
\[ \int q(\mathbf z)\mathrm d\mathbf z=g(\boldsymbol\eta)\int h(\mathbf z)\exp\{\boldsymbol\eta^T\mathbf u(\mathbf z)\}\mathrm d\mathbf z=1 \]
Differentiating both sides with respect to \(\boldsymbol\eta\) gives
\[ \nabla g(\boldsymbol\eta)\int h(\mathbf z)\exp\{\boldsymbol\eta^T\mathbf u(\mathbf z)\}\mathrm d\mathbf z+g(\boldsymbol\eta)\int h(\mathbf z)\exp\{\boldsymbol\eta^T\mathbf u(\mathbf z)\}\mathbf u(\mathbf z)\mathrm d\mathbf z=0 \]
Rearranging:
\[ -\nabla\ln g(\boldsymbol\eta)=-\frac{\nabla g(\boldsymbol\eta)}{g(\boldsymbol\eta)}=\frac{\int h(\mathbf z)\exp\{\boldsymbol\eta^T\mathbf u(\mathbf z)\}\mathbf u(\mathbf z)\mathrm d\mathbf z}{\int h(\mathbf z)\exp\{\boldsymbol\eta^T\mathbf u(\mathbf z)\}\mathrm d\mathbf z}=\frac{\int q(\mathbf z)\mathbf u(\mathbf z)\mathrm d\mathbf z}{\int q(\mathbf z)\mathrm d\mathbf z}=\mathbb E_{q(\mathbf z)}[\mathbf u(\mathbf z)]\tag{2}\label{2} \]
Combining \(\eqref{1}\) and \(\eqref{2}\) yields
\[ \mathbb E_{p(\mathbf z)}[\mathbf u(\mathbf z)]=\mathbb E_{q(\mathbf z)}[\mathbf u(\mathbf z)] \]
In other words, the optimal \(q\) is the one whose expected sufficient statistics match those of \(p\). For example, if \(q(\mathbf z)\) is a Gaussian \(\mathcal N(\mathbf z\vert\boldsymbol\mu,\boldsymbol\Sigma)\), its sufficient statistics are \(\mathbf z\) and \(\mathbf z\mathbf z^T\), so the optimal solution sets the mean \(\boldsymbol\mu\) to the mean of \(p(\mathbf z)\) and the covariance \(\boldsymbol\Sigma\) to the covariance of \(p(\mathbf z)\). This is called moment matching.
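As a numerical sanity check of this result, the sketch below fits a 1-D Gaussian \(q(z)=\mathcal N(z\vert\mu,\sigma^2)\) to a non-Gaussian \(p(z)\) by directly minimizing \(\text{KL}(p\Vert q)\), then compares the minimizer with the mean and variance of \(p\). Everything specific here is an assumption made for illustration: the target is a hypothetical mixture \(0.3\,\mathcal N(-2,0.5^2)+0.7\,\mathcal N(1,1)\) (mean \(0.1\), variance \(2.665\)), integrals are approximated with the trapezoidal rule on \([-10,10]\), and the helper names (`kl_p_q`, `fit_gaussian`, `moments_of_p`) are hypothetical. The optimizer only sees KL values through finite differences, so it never reads the moments of \(p\) directly.

```python
import math

# Hypothetical target p(z): a 1-D two-component Gaussian mixture.
# These weights, means and std devs are illustrative assumptions.
WEIGHTS = (0.3, 0.7)
MEANS = (-2.0, 1.0)
STDS = (0.5, 1.0)


def normal_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def p_pdf(z):
    """Density of the mixture p(z)."""
    return sum(w * normal_pdf(z, m, s) for w, m, s in zip(WEIGHTS, MEANS, STDS))


# Uniform grid for trapezoidal quadrature (assumption: p has negligible mass outside [-10, 10]).
LO, HI, N = -10.0, 10.0, 2001
DZ = (HI - LO) / (N - 1)
GRID = [LO + i * DZ for i in range(N)]
P_VALS = [p_pdf(z) for z in GRID]
LOG_P = [math.log(pv) for pv in P_VALS]


def trapezoid(vals):
    """Trapezoidal rule for samples taken on GRID."""
    return DZ * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def kl_p_q(mu, log_var):
    """KL(p || q) for q = N(mu, exp(log_var)): integral of p * (ln p - ln q)."""
    var = math.exp(log_var)
    log_norm = 0.5 * math.log(2.0 * math.pi * var)  # -ln q(z) = log_norm + (z - mu)^2 / (2 var)
    return trapezoid([
        pv * (lp + log_norm + (z - mu) ** 2 / (2.0 * var))
        for z, pv, lp in zip(GRID, P_VALS, LOG_P)
    ])


def fit_gaussian(steps=200, lr=0.5, eps=1e-5):
    """Minimise KL(p || q) over (mu, log_var) by plain gradient descent.

    Gradients come from central finite differences of kl_p_q, so the
    optimiser never uses the moments of p directly.
    """
    mu, log_var = 0.0, 0.0  # arbitrary starting point
    for _ in range(steps):
        g_mu = (kl_p_q(mu + eps, log_var) - kl_p_q(mu - eps, log_var)) / (2.0 * eps)
        g_lv = (kl_p_q(mu, log_var + eps) - kl_p_q(mu, log_var - eps)) / (2.0 * eps)
        mu -= lr * g_mu
        log_var -= lr * g_lv
    return mu, math.exp(log_var)


def moments_of_p():
    """Mean and variance of p, computed with the same quadrature."""
    mean = trapezoid([pv * z for z, pv in zip(GRID, P_VALS)])
    var = trapezoid([pv * (z - mean) ** 2 for z, pv in zip(GRID, P_VALS)])
    return mean, var


if __name__ == "__main__":
    mu_fit, var_fit = fit_gaussian()
    mean_p, var_p = moments_of_p()
    print(f"fitted q:     mu   = {mu_fit:.6f}, var = {var_fit:.6f}")
    print(f"moments of p: mean = {mean_p:.6f}, var = {var_p:.6f}")
```

For these mixture parameters both printed lines should agree to roughly six decimal places, with a mean of about 0.1 and a variance of about 2.665. Optimizing over \(\ln\sigma^2\) instead of \(\sigma^2\) keeps the variance positive without an explicit constraint; the KL is evaluated by quadrature rather than in closed form so that the agreement is a genuine check rather than an algebraic identity.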