Continuing from the previous post, we now place a prior on $\boldsymbol{w}$. Take the simplest assumption: $\boldsymbol{w}$ follows a Gaussian distribution with mean 0 and covariance matrix $\alpha^{-1}\boldsymbol{I}$.

$$
\begin{aligned}
p(\boldsymbol{w}|\alpha)&=\mathcal{N}(\boldsymbol{w}|0,\alpha^{-1}\boldsymbol{I})\\
&=\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
\end{aligned}
$$

Let us work out, step by step, the probability of the parameters $\boldsymbol{w}$ given $(\boldsymbol{x},\boldsymbol{t},\alpha,\beta)$. Starting from Bayes' theorem and then conditioning every term on the known quantities:

$$
\begin{aligned}
p(\boldsymbol{w}|\boldsymbol{t})&=\frac{p(\boldsymbol{t}|\boldsymbol{w})p(\boldsymbol{w})}{p(\boldsymbol{t})}\\
p(\boldsymbol{w}|\boldsymbol{t},\boldsymbol{x},\alpha,\beta)&=\frac{p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\alpha,\beta)p(\boldsymbol{w}|\boldsymbol{x},\alpha,\beta)}{p(\boldsymbol{t}|\boldsymbol{x},\alpha,\beta)}
\end{aligned}
$$

Since $\boldsymbol{t}$ does not depend on $\alpha$ once $\boldsymbol{w}$ is given, the likelihood in the expression above satisfies $p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\alpha,\beta)=p(\boldsymbol{t}|\boldsymbol{w},\boldsymbol{x},\beta)$. The prior on $\boldsymbol{w}$ is the one we have already assumed, $p(\boldsymbol{w}|\alpha)$, and the denominator does not involve $\boldsymbol{w}$. This gives the result in the book (this step is my own understanding):

$$
p(\boldsymbol{w}|\boldsymbol{x},\boldsymbol{t},\alpha,\beta)\propto p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)p(\boldsymbol{w}|\alpha)
$$
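To make the proportionality concrete, here is a minimal sketch that evaluates likelihood times prior on a grid and then normalizes it. It assumes a one-weight model $y(x,w)=wx$ (so $M+1=1$), the Gaussian noise model with precision $\beta$, and made-up data and values for $\alpha$ and $\beta$; none of these specifics come from the post. The point is only that dividing by the normalizing constant $p(\boldsymbol{t}|\boldsymbol{x},\alpha,\beta)$ rescales the curve without moving its peak.

```python
import numpy as np

# Illustrative values only; alpha, beta and the data are assumptions.
alpha, beta = 2.0, 25.0
x = np.array([0.0, 0.5, 1.0, 1.5])
t = np.array([0.1, 0.4, 1.1, 1.4])

# Grid over the single weight w of the hypothetical model y(x, w) = w * x.
w_grid = np.linspace(-3.0, 3.0, 6001)
dw = w_grid[1] - w_grid[0]

# Log-likelihood: sum over n of ln N(t_n | w * x_n, 1/beta), for every grid value of w.
resid = t[None, :] - w_grid[:, None] * x[None, :]
log_lik = -0.5 * beta * np.sum(resid**2, axis=1) + 0.5 * len(x) * np.log(beta / (2 * np.pi))

# Log-prior: ln N(w | 0, 1/alpha) with M + 1 = 1.
log_prior = 0.5 * np.log(alpha / (2 * np.pi)) - 0.5 * alpha * w_grid**2

# Unnormalized posterior (shifted by its maximum for numerical stability).
log_unnorm = log_lik + log_prior
unnorm = np.exp(log_unnorm - log_unnorm.max())

# Normalizing on the grid plays the role of dividing by p(t | x, alpha, beta).
posterior = unnorm / (unnorm.sum() * dw)

w_map_grid = w_grid[np.argmax(posterior)]
print("grid estimate of the posterior mode:", w_map_grid)
```

The normalized and unnormalized curves peak at the same grid point, which is why the normalizing constant can be ignored when searching for the most probable $\boldsymbol{w}$.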
Maximizing the posterior over $\boldsymbol{w}$ has therefore become maximizing the product of the likelihood $p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)$ and the prior $p(\boldsymbol{w}|\alpha)$. The likelihood is

$$
p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)=\prod_{n=1}^N\mathcal{N}(t_n|y(x_n,\boldsymbol{w}),\beta^{-1})=\prod_{n=1}^N\frac{1}{(2\pi)^{\frac{1}{2}}\beta^{-\frac{1}{2}}}\exp\left\{\frac{(t_n-y(x_n,\boldsymbol{w}))^2}{-2\beta^{-1}}\right\}
$$

and the prior is

$$
\begin{aligned}
p(\boldsymbol{w}|\alpha)&=\mathcal{N}(\boldsymbol{w}|0,\alpha^{-1}\boldsymbol{I})\\
&=\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
\end{aligned}
$$

Therefore

$$
p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)p(\boldsymbol{w}|\alpha)=\left[\prod_{n=1}^N\frac{1}{(2\pi)^{\frac{1}{2}}\beta^{-\frac{1}{2}}}\exp\left\{\frac{(t_n-y(x_n,\boldsymbol{w}))^2}{-2\beta^{-1}}\right\}\right]\left(\frac{\alpha}{2\pi}\right)^{(M+1)/2}\exp\left\{-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}\right\}
$$

Taking the logarithm of both sides gives

$$
\begin{aligned}
\ln\left[p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)p(\boldsymbol{w}|\alpha)\right]=&-\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2+\frac{N}{2}\ln{\beta}-\frac{N}{2}\ln{(2\pi)}\\
&+\frac{M+1}{2}\ln{\alpha}-\frac{M+1}{2}\ln{(2\pi)}-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}
\end{aligned}
$$

We are looking for the most probable value of $\boldsymbol{w}$, so only the terms that depend on $\boldsymbol{w}$ matter. Collecting everything else into a constant:

$$
\ln\left[p(\boldsymbol{t}|\boldsymbol{x},\boldsymbol{w},\beta)p(\boldsymbol{w}|\alpha)\right]=-\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2-\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}+\text{const}
$$

Maximizing this expression is equivalent to minimizing

$$
\frac{\beta}{2}\sum_{n=1}^N\{y(x_n,\boldsymbol{w})-t_n\}^2+\frac{\alpha}{2}\boldsymbol{w}^T\boldsymbol{w}
$$
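Dividing the quantity being minimized by $\beta$ shows that it is a sum-of-squares error with a quadratic penalty of strength $\lambda=\alpha/\beta$, that is, ridge regression. The sketch below checks this numerically. It assumes that $y(x,\boldsymbol{w})$ is a polynomial of order $M$ in $x$, which makes the objective quadratic in $\boldsymbol{w}$ and gives a closed-form minimizer. The helper names (`design_matrix`, `map_objective`, `map_weights`), the synthetic sine data, and the values of $\alpha$ and $\beta$ are all illustrative choices, not taken from the post.

```python
import numpy as np

def design_matrix(x, M):
    """Columns are x**0, x**1, ..., x**M, so that y(x, w) = Phi @ w (assumed polynomial model)."""
    return np.vander(x, M + 1, increasing=True)

def map_objective(w, Phi, t, alpha, beta):
    """beta/2 * sum_n (y(x_n, w) - t_n)^2 + alpha/2 * w^T w."""
    resid = Phi @ w - t
    return 0.5 * beta * (resid @ resid) + 0.5 * alpha * (w @ w)

def map_weights(Phi, t, alpha, beta):
    """Set the gradient to zero: (beta * Phi^T Phi + alpha * I) w = beta * Phi^T t."""
    A = beta * Phi.T @ Phi + alpha * np.eye(Phi.shape[1])
    return np.linalg.solve(A, beta * Phi.T @ t)

# Synthetic data: noisy samples of a sine curve (illustrative only).
rng = np.random.default_rng(0)
N, M = 10, 3
alpha, beta = 5e-3, 11.1
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=beta**-0.5, size=N)

Phi = design_matrix(x, M)
w_map = map_weights(Phi, t, alpha, beta)

# Gradient of the objective at w_map; it should vanish up to rounding error.
grad = beta * Phi.T @ (Phi @ w_map - t) + alpha * w_map

# The same weights from ridge regression with lambda = alpha / beta.
w_ridge = np.linalg.solve(Phi.T @ Phi + (alpha / beta) * np.eye(M + 1), Phi.T @ t)

print("MAP weights:  ", w_map)
print("ridge weights:", w_ridge)
print("largest gradient component:", np.abs(grad).max())
```

Only the ratio $\alpha/\beta$ affects the location of the minimum, so a broader prior (smaller $\alpha$) or less noisy data (larger $\beta$) both weaken the regularization.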