> Deep-Learning
1. dot
Inner product of two 1-D tensors: x1y1 + x2y2 + x3y3 + ...
import torch
a = torch.tensor([1, 2, 3])
b = torch.tensor([1, -1, 1])
torch.dot(a, b)  # tensor(2), i.e. 1*1 + 2*(-1) + 3*1
2. mul
Element-wise multiplication with broadcasting; the ‘*’ operator does the same thing.
a = torch.tensor([1, 2, 3])
b = torch.tensor([1, -1, 1])
torch.mul(a, b)  # tensor([1, -2, 3])
a * b            # tensor([1, -2, 3])
3. mm & bmm
‘mm’ is matrix multiplication of two 2-D tensors; ‘bmm’ is its batched version, for 3-D tensors whose first dimension is the batch.
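
A short sketch of ‘mm’, ‘bmm’ and broadcasting in ‘mul’. The tensor shapes are arbitrary choices for illustration; the point is how the shapes combine.

```python
import torch

# torch.mm: (n, m) x (m, p) -> (n, p); both inputs must be 2-D
A = torch.randn(2, 3)
B = torch.randn(3, 4)
C = torch.mm(A, B)

# torch.bmm: (b, n, m) x (b, m, p) -> (b, n, p); one independent matmul per batch item
X = torch.randn(5, 2, 3)
Y = torch.randn(5, 3, 4)
Z = torch.bmm(X, Y)

# torch.mul broadcasts: shapes (3,) and (2, 1) broadcast to (2, 3)
a = torch.tensor([1, 2, 3])
m = torch.tensor([[1], [10]])
P = torch.mul(a, m)  # [[1, 2, 3], [10, 20, 30]]

print(C.shape, Z.shape)  # torch.Size([2, 4]) torch.Size([5, 2, 4])
```

Each slice `Z[i]` is exactly `torch.mm(X[i], Y[i])`; ‘bmm’ just avoids writing that loop by hand.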

1. Introduction
1.1 KL for two Gaussian distributions
For $p = N(\mu_1,\sigma_1^2)$ and $q = N(\mu_2,\sigma_2^2)$:
$$ KL(p,q) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2} $$
1.2 Reparameterization
To sample from $N(\mu_1,\sigma_1^2)$, we first sample $z$ from $N(0,1)$ and then compute $\sigma_1 z+\mu_1$. The sample becomes a deterministic, differentiable function of $\mu_1$ and $\sigma_1$, so partial gradients with respect to these parameters can flow through the sampling step.
1.3 Single layer of VAE
The variational lower bound, or evidence lower bound (ELBO), can be derived as follows. The last step is Jensen's inequality, since $\log$ is concave.
$$ \begin{aligned} p(x) &= \int_z p_\theta(x|z)p(z)\,dz \\ &= \int_z q_\phi(z|x)\frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\,dz \\ \log p(x) &= \log E_{z\sim q_\phi(z|x)}\left[\frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\right] \\ &\ge E_{z\sim q_\phi(z|x)}\left[\log \frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\right] \end{aligned} $$
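
As a sanity check, the closed-form Gaussian KL can be compared with a Monte Carlo estimate whose samples are drawn through the reparameterization $\sigma_1 z+\mu_1$. This is a minimal sketch in plain Python (no autograd, so no gradients are actually computed); the parameter values, seed and sample count are arbitrary assumptions.

```python
import math
import random

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """Closed-form KL(N(mu1, sigma1^2) || N(mu2, sigma2^2))."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)

def log_normal_pdf(x, mu, sigma):
    """log N(x; mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def reparameterized_sample(mu, sigma, rng):
    # All randomness lives in z ~ N(0, 1); the returned sample is a
    # deterministic, differentiable function of mu and sigma.
    z = rng.gauss(0.0, 1.0)
    return sigma * z + mu

def kl_monte_carlo(mu1, sigma1, mu2, sigma2, n=100_000, seed=0):
    """Estimate E_{x~p}[log p(x) - log q(x)] using reparameterized samples from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = reparameterized_sample(mu1, sigma1, rng)
        total += log_normal_pdf(x, mu1, sigma1) - log_normal_pdf(x, mu2, sigma2)
    return total / n

exact = kl_gauss(0.0, 1.0, 1.0, 2.0)  # log 2 + 2/8 - 1/2
estimate = kl_monte_carlo(0.0, 1.0, 1.0, 2.0)
print(exact, estimate)
```

The two numbers should agree to roughly two decimal places. Swapping the arguments of `kl_gauss` gives a different value, a reminder that KL is not symmetric.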

Generative Flow

1. Flow-Based Generative Model
Given a data point $x$ and a latent variable $z\sim p_{\theta}(z)$, the function $g_{\theta}(\cdot)$ is invertible such that
$$ \begin{aligned} x&=g_{\theta}(z) \\ z&=f_{\theta}(x) =g_{\theta}^{-1}(x) \end{aligned} $$
Write $f_\theta$ as a chain of $K$ invertible steps with $h_0 = x$, $h_i = f_i(h_{i-1})$ and $h_K = z$. By the change-of-variables formula, the log-density of the model can be written as:
$$ \begin{aligned} \log p_{\theta}(x) &= \log p_{\theta}(z) + \log | \det(dz/dx) | \\ &= \log p_{\theta}(z) + \sum_{i=1}^{K}\log | \det(dh_i/dh_{i-1}) | \end{aligned} $$
To simplify the calculation, we can choose transformations whose Jacobian $dh_{i}/dh_{i-1}$ is a triangular matrix: its determinant is then just the product of its diagonal entries.
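
One common way to get a triangular Jacobian is an affine coupling layer in the style of RealNVP; it is used here only as an illustration and is not necessarily the construction this post goes on to use. The 2-D sketch assumes toy, hypothetical functions `s` and `t` in place of the neural networks a real flow would use, and checks the closed-form log-determinant against a finite-difference Jacobian.

```python
import math

# Hypothetical scale and shift functions of x1; in a real flow these are neural networks.
def s(u):
    return 0.5 * math.tanh(u)

def t(u):
    return u ** 2 - 1.0

def f(x):
    """Forward map x -> z: z1 = x1, z2 = x2 * exp(s(x1)) + t(x1)."""
    x1, x2 = x
    return (x1, x2 * math.exp(s(x1)) + t(x1))

def g(z):
    """Inverse map z -> x; it never needs to invert s or t."""
    z1, z2 = z
    return (z1, (z2 - t(z1)) * math.exp(-s(z1)))

def log_abs_det(x):
    # dz/dx = [[1, 0], [dz2/dx1, exp(s(x1))]] is lower triangular,
    # so log|det| is the sum of the log-diagonal: 0 + s(x1).
    return s(x[0])

def log_std_normal(z):
    """log N(z; 0, I) for a 2-D point."""
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * zi ** 2 for zi in z)

def log_prob(x):
    # Change of variables: log p(x) = log p(z) + log|det(dz/dx)|
    return log_std_normal(f(x)) + log_abs_det(x)

def numeric_log_abs_det(x, eps=1e-6):
    """Central-difference Jacobian, used only to check log_abs_det."""
    J = [[0.0, 0.0], [0.0, 0.0]]  # J[i][j] = dz_i / dx_j
    for j in range(2):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        zp, zm = f(xp), f(xm)
        for i in range(2):
            J[i][j] = (zp[i] - zm[i]) / (2 * eps)
    return math.log(abs(J[0][0] * J[1][1] - J[0][1] * J[1][0]))

x = (0.3, -1.2)
print(log_prob(x), log_abs_det(x), numeric_log_abs_det(x))
```

In a multi-layer flow, the roles of `x1` and `x2` are typically swapped (or the dimensions permuted) between layers so every dimension gets transformed, and the per-layer log-determinants add up as in the sum over $i$.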

VITS

1. ELBO
Let $x$ be the target and $z$ the hidden (latent) variable.
$$P(x)=\int_{z}P(x|z)P(z)dz$$
For most $z$ drawn from the prior, $P(x|z)$ is close to 0, so we shrink the sample space of $z$ by drawing it from an approximate posterior instead, $z \sim Q(z|x)$:
$$\begin{aligned} KL[Q(z|x) || P(z|x)] &= \mathbb{E}_{z\sim Q(z|x)}[\log Q(z|x) - \log P(z|x)] \\ &= \mathbb{E}_{z\sim Q(z|x)}[\log Q(z|x) - \log P(x|z) - \log P(z) + \log P(x)] \\ \log P(x) - KL[Q(z|x) || P(z|x)] &= \mathbb{E}_{z\sim Q(z|x)}[\log P(x|z)] - KL[Q(z|x) || P(z)] \\ \log P(x) &\ge \mathbb{E}_{z\sim Q(z|x)}[\log P(x|z)] - KL[Q(z|x) || P(z)] \end{aligned}$$
The second line uses Bayes' rule, $P(z|x) = P(x|z)P(z)/P(x)$, and the inequality holds because $KL[Q(z|x) || P(z|x)] \ge 0$.
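
The identity $\log P(x) - KL[Q(z|x) || P(z|x)] = \mathbb{E}_{z\sim Q(z|x)}[\log P(x|z)] - KL[Q(z|x) || P(z)]$ can be checked numerically on a toy model where every term has a closed form. The model $P(z)=N(0,1)$, $P(x|z)=N(z,1)$, the value of $x$ and the choices of $Q(z|x)=N(m, s^2)$ are all assumptions for illustration.

```python
import math

# Toy model, assumed for illustration: P(z) = N(0, 1), P(x|z) = N(z, 1).
# Then P(x) = N(0, 2) and the exact posterior is P(z|x) = N(x/2, 1/2).

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """KL[N(mu1, sigma1^2) || N(mu2, sigma2^2)]."""
    return math.log(sigma2 / sigma1) + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2) - 0.5

def log_px(x):
    """Exact log evidence: log N(x; 0, 2)."""
    return -0.5 * math.log(2 * math.pi * 2.0) - x ** 2 / 4.0

def expected_log_lik(x, m, s):
    """E_{z ~ N(m, s^2)}[log N(x; z, 1)] in closed form, using E[(x - z)^2] = (x - m)^2 + s^2."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) ** 2 + s ** 2)

def elbo(x, m, s):
    """E_Q[log P(x|z)] - KL[Q(z|x) || P(z)] with Q(z|x) = N(m, s^2)."""
    return expected_log_lik(x, m, s) - kl_gauss(m, s, 0.0, 1.0)

x = 1.3
for m, s in [(0.0, 1.0), (1.0, 0.3), (x / 2, math.sqrt(0.5))]:
    gap = log_px(x) - elbo(x, m, s)
    # gap should equal KL[Q(z|x) || P(z|x)], and vanish when Q is the true posterior
    print(m, s, elbo(x, m, s), gap, kl_gauss(m, s, x / 2, math.sqrt(0.5)))
```

The ELBO never exceeds $\log P(x)$, and the bound is tight exactly when $Q(z|x)$ equals the true posterior.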

VAE

1. Auto Encoder
Before formally introducing the VAE, let’s first look at the structure of an autoencoder (AE). It consists of an Encoder and a Decoder. The Encoder maps the input data to ‘hidden states’, and the Decoder takes these ‘hidden states’ and tries to reconstruct the input, so the output should be as close to the input as possible. Usually, the ‘hidden states’ have a lower dimension than the input, so the AE also performs dimensionality reduction.
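
A minimal autoencoder sketch in PyTorch. It assumes flattened 784-dimensional inputs, a 32-dimensional hidden state, small MLPs and an MSE reconstruction loss; all of these are illustrative choices, and random tensors stand in for real data.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=32):
        super().__init__()
        # Encoder compresses the input down to the hidden states
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(), nn.Linear(128, hidden_dim))
        # Decoder maps the hidden states back to the input space
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, 128), nn.ReLU(), nn.Linear(128, input_dim))

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

torch.manual_seed(0)
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)  # random stand-in for a batch of flattened images

losses = []
for step in range(200):
    recon, h = model(x)
    loss = nn.functional.mse_loss(recon, x)  # output should be close to the input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The hidden state `h` has shape (64, 32), far smaller than the (64, 784) input, which is the dimensionality reduction described above.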

Attention

1. Input
The attention mechanism takes three inputs: Q (query), K (key) and V (value). When all three come from the same sequence (in the simplest case Q=K=V), it is called ‘self-attention’. There are several ways to compute it:
$$Attention(Q,K,V) = Softmax(Linear([Q,K])) \cdot V$$
$$Attention(Q,K,V) = Softmax(sum(\tanh(Linear([Q,K])))) \cdot V$$
$$Attention(Q,K,V) = Softmax(\frac{Q \cdot K^T}{\sqrt{d_k}}) \cdot V$$
The last one is scaled dot-product attention. ‘bmm’ is batch matrix multiplication, a matrix product applied independently to each item in the batch:
$$(b, n, m)*(b, m, p) \rightarrow (b, n, p)$$
Attention is commonly used in seq2seq tasks.
import torch
import torch.
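
A sketch of the third rule, scaled dot-product attention, written with ‘bmm’. The function name and tensor sizes are hypothetical, chosen only to make the shapes easy to follow.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Q: (b, n, d_k), K: (b, m, d_k), V: (b, m, d_v) -> output (b, n, d_v), weights (b, n, m)."""
    d_k = Q.size(-1)
    scores = torch.bmm(Q, K.transpose(1, 2)) / math.sqrt(d_k)  # (b, n, m)
    weights = torch.softmax(scores, dim=-1)  # each row sums to 1 over the m keys
    return torch.bmm(weights, V), weights   # (b, n, m) x (b, m, d_v) -> (b, n, d_v)

torch.manual_seed(0)
b, n, m, d_k, d_v = 2, 4, 6, 8, 5
Q = torch.randn(b, n, d_k)
K = torch.randn(b, m, d_k)
V = torch.randn(b, m, d_v)
out, w = scaled_dot_product_attention(Q, K, V)

# Self-attention in its simplest form: Q = K = V = x
x = torch.randn(b, n, d_k)
self_out, self_w = scaled_dot_product_attention(x, x, x)
```

Real Transformer layers first apply separate learned linear projections to produce Q, K and V; they are omitted here to keep the formula visible.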

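1. RNN
The hidden state at time t depends on the current input and the previous hidden state. The activation function ‘tanh’ adds non-linearity and squashes outputs into the range (-1, 1). The same weights $W$ and bias $b$ are shared across all time steps:
$$h_t = \tanh(W \cdot [h_{t-1}, x_t]+b)$$
import torch
import torch.nn as nn
input_size, hidden_size, num_layers = 10, 20, 2  # example sizes
sequence_length, batch_size = 5, 3
num_directions = 1  # 2 if bidirectional=True
rnn = nn.RNN(input_size, hidden_size, num_layers)
input1 = torch.randn(sequence_length, batch_size, input_size)  # (seq_len, batch, input_size), since batch_first=False by default
h0 = torch.randn(num_layers * num_directions, batch_size, hidden_size)
output, hn = rnn(input1, h0)  # output: (seq_len, batch, num_directions * hidden_size); hn has the same shape as h0
RNNs work well on short sequences, but on long sequences they suffer from ‘vanishing gradients’ and ‘exploding gradients’, because backpropagation through time multiplies together one Jacobian per time step.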
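
To see that one set of weights is reused at every step, the sketch below unrolls a one-layer `nn.RNN` by hand using its parameters `weight_ih_l0`, `weight_hh_l0`, `bias_ih_l0` and `bias_hh_l0`. PyTorch computes $\tanh(W_{ih}x_t + b_{ih} + W_{hh}h_{t-1} + b_{hh})$, which matches the concatenated form with $W=[W_{hh}, W_{ih}]$ and $b=b_{ih}+b_{hh}$. The sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, seq_len, batch = 3, 4, 6, 2
rnn = nn.RNN(input_size, hidden_size, num_layers=1)

x = torch.randn(seq_len, batch, input_size)
h0 = torch.zeros(1, batch, hidden_size)
out_ref, hn_ref = rnn(x, h0)

# Concatenated form: W acts on [h_{t-1}, x_t], b folds both biases together
W = torch.cat([rnn.weight_hh_l0, rnn.weight_ih_l0], dim=1)  # (hidden, hidden + input)
b = rnn.bias_ih_l0 + rnn.bias_hh_l0

h = h0[0]
outputs = []
with torch.no_grad():
    for t in range(seq_len):
        # the same W and b are used at every time step
        h = torch.tanh(torch.cat([h, x[t]], dim=1) @ W.T + b)
        outputs.append(h)
out_manual = torch.stack(outputs)  # (seq_len, batch, hidden)
```

Because the same `W` multiplies the hidden state at every step, the gradient through a long sequence involves a long product of similar factors, which is where vanishing and exploding gradients come from.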


LI WEI

If you can renew yourself for one day, renew yourself every day, and keep on renewing.

Not yet

Tokyo