1. Introduction

1.1 KL divergence between two Gaussian distributions

For $p = N(\mu_1,\sigma_1^2)$ and $q = N(\mu_2,\sigma_2^2)$:

$$ KL(p,q) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2} $$
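As a sanity check, the closed form can be compared against a Monte Carlo estimate of $E_{x\sim p}[\log p(x) - \log q(x)]$. The sketch below assumes $p = N(\mu_1,\sigma_1^2)$ and $q = N(\mu_2,\sigma_2^2)$, so the log term is $\log(\sigma_2/\sigma_1)$. The function names `kl_gauss`, `log_normal_pdf`, and `kl_monte_carlo` are illustrative and do not come from any library.

```python
import math
import random

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """Closed-form KL(p, q) for p = N(mu1, sigma1^2), q = N(mu2, sigma2^2)."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)

def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x (sigma is the standard deviation)."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def kl_monte_carlo(mu1, sigma1, mu2, sigma2, n=100_000, seed=0):
    """Estimate E_p[log p(x) - log q(x)] by averaging over samples x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu1, sigma1)
        total += log_normal_pdf(x, mu1, sigma1) - log_normal_pdf(x, mu2, sigma2)
    return total / n

if __name__ == "__main__":
    print(kl_gauss(0.0, 1.0, 1.0, 2.0))
    print(kl_monte_carlo(0.0, 1.0, 1.0, 2.0))
```

The two printed values should agree up to sampling noise, and `kl_gauss` returns zero when both distributions are identical.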

1.2 Reparameterization

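To sample from $N(\mu_1,\sigma_1^2)$, we can first sample $\epsilon$ from $N(0,1)$ and compute $z = \mu_1 + \sigma_1\epsilon$. The sample is then a deterministic, differentiable function of $\mu_1$ and $\sigma_1$, so gradients with respect to both parameters can be propagated through the sampling step.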
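A minimal sketch of why this helps, using a toy objective chosen here for illustration: for $f(z) = z^2$ with $z = \mu + \sigma\epsilon$, the chain rule gives per-sample gradients $2z$ with respect to $\mu$ and $2z\epsilon$ with respect to $\sigma$. Since $E[z^2] = \mu^2 + \sigma^2$, the averaged estimates should approach $2\mu$ and $2\sigma$. The names `sample_reparam` and `pathwise_grads` are hypothetical helpers, not part of any library.

```python
import random

def sample_reparam(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) as a deterministic function of (mu, sigma) and noise."""
    eps = rng.gauss(0.0, 1.0)  # the noise does not depend on mu or sigma
    return mu + sigma * eps, eps

def pathwise_grads(mu, sigma, n=100_000, seed=0):
    """Estimate d/dmu and d/dsigma of E[z^2] for z ~ N(mu, sigma^2).

    With f(z) = z^2 and z = mu + sigma * eps:
      df/dmu    = f'(z) * dz/dmu    = 2 z
      df/dsigma = f'(z) * dz/dsigma = 2 z * eps
    """
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n):
        z, eps = sample_reparam(mu, sigma, rng)
        g_mu += 2.0 * z
        g_sigma += 2.0 * z * eps
    return g_mu / n, g_sigma / n

if __name__ == "__main__":
    # Analytic gradients of mu^2 + sigma^2 are (2 * mu, 2 * sigma).
    print(pathwise_grads(1.5, 0.7))
```

In a real VAE the same idea is applied by an automatic differentiation framework, which computes these chain-rule terms for the encoder outputs instead of the hand-written derivatives used here.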

1.3 Single-layer VAE

The variational lower bound, also called the evidence lower bound (ELBO), can be derived as follows. The inequality in the last step is Jensen's inequality applied to the concave logarithm.

$$ \begin{aligned} p(x) &= \int_z p_\theta(x|z)p(z)\,dz \\ &= \int_z q_\phi(z|x)\frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\,dz \\ \log p(x) &= \log E_{z\sim q_\phi(z|x)}\left[\frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\right] \\ &\ge E_{z\sim q_\phi(z|x)}\left[\log \frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\right] \end{aligned} $$
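The bound can be checked numerically on a toy model where $\log p(x)$ is known exactly. The sketch below assumes $p(z) = N(0,1)$ and $p(x|z) = N(z,1)$, which gives $p(x) = N(0,2)$ and the exact posterior $p(z|x) = N(x/2, 1/2)$. This model and the names `elbo_estimate` and `log_evidence` are chosen for illustration only. When $q$ equals the exact posterior the bound is tight; any other $q$ gives a strictly smaller value.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x (sigma is the standard deviation)."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def elbo_estimate(x, m, s, n=50_000, seed=0):
    """Monte Carlo ELBO with q(z|x) = N(m, s^2) for the toy model
    p(z) = N(0, 1), p(x|z) = N(z, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = m + s * rng.gauss(0.0, 1.0)          # reparameterized sample from q
        total += (log_normal_pdf(x, z, 1.0)      # log p(x|z)
                  + log_normal_pdf(z, 0.0, 1.0)  # log p(z)
                  - log_normal_pdf(z, m, s))     # -log q(z|x)
    return total / n

def log_evidence(x):
    """Exact log p(x): marginalizing z gives x ~ N(0, 2)."""
    return log_normal_pdf(x, 0.0, math.sqrt(2.0))

if __name__ == "__main__":
    x = 1.0
    print(log_evidence(x))
    print(elbo_estimate(x, x / 2.0, math.sqrt(0.5)))  # q is the exact posterior
    print(elbo_estimate(x, 0.0, 1.0))                 # q is the prior: looser bound
```

The gap between $\log p(x)$ and the ELBO equals the KL divergence between $q_\phi(z|x)$ and the true posterior, which is why the second call gives a smaller value.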

1.4 Multi-layer VAE

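Under the Markov assumption, the generative model factorizes as $p_\theta(x,z_1,z_2) = p_\theta(x|z_1)p_\theta(z_1|z_2)p(z_2)$ and the inference model as $q_\phi(z_1,z_2|x) = q_\phi(z_1|x)q_\phi(z_2|z_1)$, so we have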

$$ \begin{aligned} p(x) &= \iint q_\phi(z_1,z_2|x)\frac{p_\theta(x,z_1,z_2)}{q_\phi(z_1,z_2|x)}\,dz_1\,dz_2 \\ \log p(x) &\ge E_{z_1,z_2\sim q_\phi(z_1,z_2|x)}\left[\log\frac{p_\theta(x,z_1,z_2)}{q_\phi(z_1,z_2|x)}\right] \\ \mathcal{L}(\theta,\phi) &= E_{z_1,z_2\sim q_\phi(z_1,z_2|x)}\left[\log p_\theta(x|z_1)-\log q_\phi(z_1|x)+\log p_\theta(z_1|z_2)-\log q_\phi(z_2|z_1)+\log p(z_2)\right] \end{aligned} $$
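The five-term objective can be estimated by ancestral sampling through the inference chain $x \to z_1 \to z_2$. The sketch below assumes a toy linear-Gaussian hierarchy, $p(z_2) = N(0,1)$, $p(z_1|z_2) = N(z_2,1)$, $p(x|z_1) = N(z_1,1)$, with $q(z_1|x) = N(ax, s_1^2)$ and $q(z_2|z_1) = N(bz_1, s_2^2)$. The model, the parameters `a`, `s1`, `b`, `s2`, and the function names are assumptions made for illustration. For this model $p(x) = N(0,3)$, and the exact posterior corresponds to $a = 2/3$, $s_1^2 = 2/3$, $b = 1/2$, $s_2^2 = 1/2$.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x (sigma is the standard deviation)."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def two_layer_elbo(x, a, s1, b, s2, n=50_000, seed=0):
    """Monte Carlo estimate of L(theta, phi) for the toy two-layer model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Sample the inference chain with reparameterized draws.
        z1 = a * x + s1 * rng.gauss(0.0, 1.0)    # z1 ~ q(z1|x)
        z2 = b * z1 + s2 * rng.gauss(0.0, 1.0)   # z2 ~ q(z2|z1)
        total += (log_normal_pdf(x, z1, 1.0)          # log p(x|z1)
                  - log_normal_pdf(z1, a * x, s1)     # -log q(z1|x)
                  + log_normal_pdf(z1, z2, 1.0)       # log p(z1|z2)
                  - log_normal_pdf(z2, b * z1, s2)    # -log q(z2|z1)
                  + log_normal_pdf(z2, 0.0, 1.0))     # log p(z2)
    return total / n

def log_evidence(x):
    """Exact log p(x): x is a sum of three unit-variance terms, so x ~ N(0, 3)."""
    return log_normal_pdf(x, 0.0, math.sqrt(3.0))

if __name__ == "__main__":
    x = 1.0
    print(log_evidence(x))
    # Exact posterior parameters: the bound is tight.
    print(two_layer_elbo(x, 2.0 / 3.0, math.sqrt(2.0 / 3.0), 0.5, math.sqrt(0.5)))
    # A mismatched q gives a looser bound.
    print(two_layer_elbo(x, 0.0, 1.0, 0.0, 1.0))
```

Because $x$ and $z_2$ are conditionally independent given $z_1$, the exact posterior over $z_2$ depends only on $z_1$, which matches the factorization $q_\phi(z_2|z_1)$ used in the objective.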