Variational Methods

James Cussens, University of York

Variational Inference (pros)

From (Kucukelbir et al. 2015):

Variational Inference (cons)

Automatic Differentiation Variational Inference

VI by minimising KL divergence

\[\min_{\phi} \mathrm{KL}(q(\theta ; \phi) \; || \; p(\theta | \mathbf{X}))\]
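This KL divergence cannot be evaluated directly, because the posterior \(p(\theta | \mathbf{X})\) involves the evidence \(p(\mathbf{X})\), which is exactly the intractable quantity. Expanding the divergence (a standard step, written in the notation above) shows where the evidence enters:

\[\mathrm{KL}(q(\theta ; \phi) \; || \; p(\theta | \mathbf{X})) = \mathbb{E}_{q(\theta ; \phi)}[ \log q(\theta ; \phi) ] - \mathbb{E}_{q(\theta ; \phi)}[ \log p(\theta, \mathbf{X}) ] + \log p(\mathbf{X})\]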

The evidence lower bound

The evidence lower bound (ELBO) is:

\[{\cal L}(\phi) = \mathbb{E}_{q(\theta ; \phi)}[ \log p(\theta, \mathbf{X}) ] - \mathbb{E}_{q(\theta ; \phi)}[ \log q(\theta ; \phi) ]\]

Maximising the ELBO minimises the KL divergence (and so that’s what we do).
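Rearranging the expansion above makes the connection explicit:

\[\log p(\mathbf{X}) = {\cal L}(\phi) + \mathrm{KL}(q(\theta ; \phi) \; || \; p(\theta | \mathbf{X})) \geq {\cal L}(\phi)\]

Since \(\log p(\mathbf{X})\) does not depend on \(\phi\) and the KL divergence is non-negative, \({\cal L}(\phi)\) is a lower bound on the log evidence, and maximising it over \(\phi\) minimises the KL divergence.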

A transformation-based approach

\[T: \mathrm{supp}(p(\theta)) \rightarrow \mathbb{R}^{K}\] where \(K\) is the dimension of \(\theta\).
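\(T\) maps the (possibly constrained) support of \(\theta\) onto the whole real coordinate space, and the variational approximation is placed on \(\zeta = T(\theta)\). As a worked one-dimensional example (not from the slides): if \(\theta > 0\), take \(\zeta = T(\theta) = \log \theta\); the joint density in the transformed space then picks up the usual change-of-variables Jacobian term:

\[p(\zeta, \mathbf{X}) = p(T^{-1}(\zeta), \mathbf{X}) \, \left| \det J_{T^{-1}}(\zeta) \right| = p(e^{\zeta}, \mathbf{X}) \, e^{\zeta}\]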

Mean field approximation

\[q(\zeta ; \phi) = {\cal N}(\zeta ; \mu, \mathrm{diag}(\sigma^{2})) = \prod_{k=1}^{K} {\cal N}(\zeta_{k} ; \mu_{k}, \sigma_{k}^{2})\] where \(\phi = (\mu_{1}, \dots, \mu_{K}, \sigma^{2}_{1}, \dots, \sigma^{2}_{K})\)
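One convenience of this factorised Gaussian family (a standard property, used in the sketch after the next slide) is that the entropy part of the ELBO is available in closed form:

\[- \mathbb{E}_{q(\zeta ; \phi)}[ \log q(\zeta ; \phi) ] = \sum_{k=1}^{K} \frac{1}{2} \log \left( 2 \pi e \, \sigma_{k}^{2} \right)\]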

Maximising ELBO in real co-ordinate space

“We now seek to maximize the ELBO in real coordinate space \[\mu^{*}, \sigma^{2*} = \arg \max_{\mu, \sigma^{2}} {\cal L}(\mu,\sigma^{2}) \mbox{ such that $\sigma^{2} \succ 0$}\] We can use gradient ascent to reach a local maximum of the ELBO” (Kucukelbir et al. 2015).
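The following is a minimal numpy sketch of this idea, not the Stan/ADVI implementation: the model (a Gaussian mean with known unit variance, so \(\theta\) is already unconstrained and \(T\) is the identity) and all names are illustrative assumptions. The constraint \(\sigma^{2} \succ 0\) is handled by optimising \(\omega = \log \sigma\), and the gradient of the ELBO is estimated with the reparameterisation \(\theta = \mu + \sigma \varepsilon\), \(\varepsilon \sim {\cal N}(0,1)\).

```python
# Mean-field VI by stochastic gradient ascent on the ELBO (illustrative sketch).
# Model: x_i ~ N(theta, 1), prior theta ~ N(0, 1), so the exact posterior is
# Gaussian and the mean-field Gaussian q can match it.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)            # simulated data

def grad_log_joint(theta):
    """Gradient wrt theta of log p(theta, x) = log N(theta; 0, 1) + sum_i log N(x_i; theta, 1)."""
    return -theta + np.sum(x - theta)

m, omega = 0.0, 0.0                          # variational parameters; sigma = exp(omega) > 0
lr, n_steps, n_mc = 0.01, 2000, 10           # step size, iterations, Monte Carlo samples

for _ in range(n_steps):
    eps = rng.normal(size=n_mc)
    theta = m + np.exp(omega) * eps          # reparameterised samples from q(theta; m, sigma^2)
    g = np.array([grad_log_joint(t) for t in theta])
    grad_m = g.mean()                                    # Monte Carlo ELBO gradient wrt m
    grad_omega = (g * eps).mean() * np.exp(omega) + 1.0  # chain rule term + Gaussian entropy term
    m += lr * grad_m                          # gradient ascent on the ELBO
    omega += lr * grad_omega

n = len(x)
print("VI:    mean %.3f, sd %.3f" % (m, np.exp(omega)))
print("Exact: mean %.3f, sd %.3f" % (x.sum() / (n + 1), (1.0 / (n + 1)) ** 0.5))
```

Because the gradient is a Monte Carlo estimate, the result is stochastic, but for this conjugate toy model the fitted mean and standard deviation should land close to the exact posterior values printed for comparison.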

So does it actually work?

Kucukelbir, Alp, Rajesh Ranganath, Andrew Gelman, and David Blei. 2015. “Automatic Variational Inference in Stan.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 568–76. Curran Associates, Inc. http://papers.nips.cc/paper/5758-automatic-variational-inference-in-stan.pdf.