Multiplication in PyTorch

November 15, 2022 in Deep-learning

1. dot

This is the dot product of two 1-D tensors.

    import torch

    a = torch.tensor([1, 2, 3])
    b = torch.tensor([1, -1, 1])
    torch.dot(a, b)  # tensor(2)
    # x1y1 + x2y2 + x3y3 ...

2. mul

This is element-wise multiplication (with broadcasting). We can also use ‘*’.

    a = torch.tensor([1, 2, 3])
    b = torch.tensor([1, -1, 1])
    torch.mul(a, b)  # tensor([ 1, -2,  3])
    a * b            # tensor([ 1, -2,  3])

3. mm & bmm

‘mm’ is matrix multiplication of two 2-D tensors. ‘bmm’ does the same for batched data.
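To make the difference between the operations concrete, here is a minimal plain-Python sketch of what they compute on nested lists. The helper names `dot`, `mul`, `mm`, and `bmm` are illustrative stand-ins rather than the PyTorch implementations, and `mul` here assumes equal-length inputs (no broadcasting).

```python
def dot(a, b):
    """Dot product: sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mul(a, b):
    """Element-wise product of two equal-length vectors (no broadcasting)."""
    return [x * y for x, y in zip(a, b)]


def mm(A, B):
    """Matrix product: (n, m) x (m, p) -> (n, p)."""
    return [[dot(row, col) for col in zip(*B)] for row in A]


def bmm(As, Bs):
    """Batched matrix product: apply mm to each pair of matrices in the batch."""
    return [mm(A, B) for A, B in zip(As, Bs)]


a, b = [1, 2, 3], [1, -1, 1]
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

print(dot(a, b))             # scalar
print(mul(a, b))             # vector of the same length
print(mm(A, B))              # 2 x 2 matrix
print(bmm([A, A], [B, B]))   # batch of two 2 x 2 matrices
```

The dot product collapses two vectors into a scalar, the element-wise product keeps the shape, and the matrix products contract the shared inner dimension.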

Denoising Diffusion Probabilistic Model

October 2, 2022 in Deep-learning

1. Introduction

1.1 KL for two Gaussian distributions

For $p = N(\mu_1,\sigma_1^2)$ and $q = N(\mu_2,\sigma_2^2)$:

$$ KL(p,q) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2+(\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2} $$

1.2 Reparameterization

If we want to sample from $N(\mu,\sigma^2)$, we can first sample $z$ from $N(0,1)$ and calculate $\sigma z+\mu$. The randomness then comes only from $z$, so the sample is differentiable with respect to $\mu$ and $\sigma$ and we can get their partial gradients.

1.3 Single layer of VAE

The variational lower bound, or the evidence lower bound (ELBO), can be derived as follows. The last step uses Jensen's inequality.

$$ \begin{aligned} p(x) &= \int_z p_\theta(x|z)p(z)\,dz \\ &= \int_z q_\phi(z|x)\frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\,dz \\ \log p(x) &= \log E_{z\sim q_\phi(z|x)}\left[\frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\right] \\ &\ge E_{z\sim q_\phi(z|x)}\left[\log \frac{p_\theta(x|z)p(z)}{q_\phi(z|x)}\right] \end{aligned} $$
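The closed-form KL and the reparameterization trick can be sketched in plain Python as below. This is a minimal illustration, assuming the standard closed form with $\log(\sigma_2/\sigma_1)$ and treating `sigma` as a standard deviation; the function names `kl_gaussian` and `sample_reparam` are hypothetical.

```python
import math
import random


def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) for 1-D Gaussians."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
            - 0.5)


def sample_reparam(mu, sigma, rng):
    """Draw z ~ N(0, 1), then shift and scale it: sigma * z + mu."""
    z = rng.gauss(0.0, 1.0)
    return sigma * z + mu


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_reparam(2.0, 0.5, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)

print(kl_gaussian(0.0, 1.0, 0.0, 1.0))  # identical distributions
print(kl_gaussian(0.0, 1.0, 0.0, 2.0))  # must be non-negative
print(mean)                             # should be close to mu = 2.0
```

In an autodiff framework, the same `sigma * z + mu` expression lets gradients flow to `mu` and `sigma` because `z` carries no parameters.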

Generative Flow

October 1, 2022 in Deep-learning

1. Flow-Based Generative Model

Given a datapoint $x$ and a latent variable $z\sim p_{\theta}(z)$, the function $g_{\theta}(\cdot)$ is invertible such that

$$ \begin{aligned} x&=g_{\theta}(z) \\ z&=f_{\theta}(x) =g_{\theta}^{-1}(x) \end{aligned} $$

The log-density of the model can be written as:

$$ \begin{aligned} \log p_{\theta}(x) &= \log p_{\theta}(z) + \log | \det(dz/dx) | \\ &= \log p_{\theta}(z) + \sum_{i=1}^{K}\log | \det(dh_i/dh_{i-1}) | \end{aligned} $$

Here $f_{\theta}$ is a composition of $K$ transformations with intermediate outputs $h_i$, where $h_0 = x$ and $h_K = z$. To simplify the calculation, we can choose transformations whose Jacobian $dh_{i}/dh_{i-1}$ is a triangular matrix,
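As a minimal sketch of the change-of-variables formula, the example below uses a single element-wise affine transformation with a standard normal base density. The affine form, the parameter names `scale` and `shift`, and the helper names are assumptions chosen for illustration. For a triangular Jacobian, the log-determinant is the sum of the logs of the absolute diagonal entries.

```python
import math


def standard_normal_logpdf(z):
    """Log-density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2 * math.pi))


def log_abs_det_triangular(J):
    """log|det J| for a triangular matrix: sum of log|diagonal entries|."""
    return sum(math.log(abs(J[i][i])) for i in range(len(J)))


def flow_log_prob(x, scale, shift):
    """log p(x) for the flow z_i = (x_i - shift_i) / scale_i with z ~ N(0, I)."""
    n = len(x)
    z = [(xi - b) / s for xi, s, b in zip(x, scale, shift)]
    # dz/dx is diagonal (hence triangular) with entries 1 / scale_i
    J = [[1.0 / scale[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return sum(standard_normal_logpdf(zi) for zi in z) + log_abs_det_triangular(J)


x = [1.0, -0.5]
scale = [2.0, 0.5]
shift = [0.3, 0.1]
print(flow_log_prob(x, scale, shift))
```

Because this flow maps $N(\text{shift}, \text{scale}^2)$ to a standard normal, the result can be checked against the Gaussian log-density directly.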

VITS

September 30, 2022 in Deep-learning

1. ELBO

We define $x$ as the target and $z$ as the hidden variable.

$$P(x)=\int_{z}P(x|z)P(z)dz$$

Since $P(x|z)$ is close to 0 for most $z$, we have to shrink the sample space of $z$. Suppose that $z \sim Q(z|x)$:

$$\begin{aligned} KL[Q(z|x) || P(z|x)] &= \mathbb{E}_{z\sim Q(z|x)}[\log Q(z|x) - \log P(z|x)] \\ KL[Q(z|x) || P(z|x)] &= \mathbb{E}_{z\sim Q(z|x)}[\log Q(z|x) - \log P(x|z) - \log P(z) + \log P(x)] \\ \log P(x) - KL[Q(z|x) || P(z|x)] &= \mathbb{E}_{z\sim Q(z|x)}[\log P(x|z)] - KL[Q(z|x) || P(z)] \\ \log P(x) &\ge \mathbb{E}_{z\sim Q(z|x)}[\log P(x|z)] - KL[Q(z|x) || P(z)] \end{aligned}$$

The last line follows because the KL divergence is non-negative.
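The bound can be checked numerically on a toy model where $z$ takes only two values, so every term is a finite sum. The probabilities below are made-up numbers for illustration, and `kl` is a hypothetical helper for discrete distributions.

```python
import math

prior = [0.5, 0.5]        # P(z) for z in {0, 1} (made-up values)
likelihood = [0.9, 0.2]   # P(x | z) for one observed x (made-up values)
q = [0.7, 0.3]            # Q(z | x), an arbitrary approximate posterior


def kl(p, r):
    """KL[p || r] for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))


# Evidence P(x) and exact posterior P(z | x) via Bayes' rule
evidence = sum(l * p for l, p in zip(likelihood, prior))
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

# ELBO = E_Q[log P(x|z)] - KL[Q(z|x) || P(z)]
elbo = sum(qi * math.log(li) for qi, li in zip(q, likelihood)) - kl(q, prior)

# The gap between log P(x) and the ELBO equals KL[Q(z|x) || P(z|x)]
gap = math.log(evidence) - elbo
print(math.log(evidence), elbo, gap, kl(q, posterior))
```

The gap shrinks to zero exactly when `q` matches the true posterior.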

VAE

June 14, 2022 in Deep-learning

1. Auto Encoder

Before we formally introduce the VAE, let’s first look at the structure of the AE. It consists of an Encoder and a Decoder. The input data is fed to the Encoder to get the ‘hidden states’. The Decoder then takes these ‘hidden states’ and tries to recover the input, which means the output should be as close to the input as possible. Usually, we want the dimension of the ‘hidden states’ to be smaller than that of the input, so that the AE achieves dimension reduction.
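The encoder/decoder structure can be illustrated with a tiny linear example that compresses 2-D inputs into a 1-D hidden state. This is only a sketch: the weights `U` are fixed by hand rather than learned, and the names `encode`, `decode`, and `reconstruction_error` are hypothetical.

```python
import math

# Hand-picked unit direction along the line y = 2x (not trained weights)
U = (1 / math.sqrt(5), 2 / math.sqrt(5))


def encode(x):
    """Encoder: project the 2-D input onto U to get a 1-D hidden state."""
    return U[0] * x[0] + U[1] * x[1]


def decode(h):
    """Decoder: map the 1-D hidden state back to 2-D."""
    return (h * U[0], h * U[1])


def reconstruction_error(x):
    """Squared distance between the input and its reconstruction."""
    x_hat = decode(encode(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


on_line = (3.0, 6.0)    # lies on y = 2x, so one number describes it fully
off_line = (3.0, 0.0)   # does not lie on the line, so information is lost

print(reconstruction_error(on_line))
print(reconstruction_error(off_line))
```

Inputs that match the structure captured by the hidden state are recovered almost exactly, while other inputs are reconstructed with error. A trained AE learns such a structure from data by minimizing this reconstruction error.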

Attention

May 1, 2022 in Deep-learning

1. Input

There are 3 inputs to the attention mechanism: Q (query), K (key), and V (value). If Q=K=V, we call it ‘self-attention’. There are several ways to calculate it.

$$Attention(Q,K,V) = Softmax(Linear([Q,K])) \cdot V$$

$$Attention(Q,K,V) = Softmax(sum(\tanh(Linear([Q,K])))) \cdot V$$

$$Attention(Q,K,V) = Softmax(\frac{Q \cdot K^T}{\sqrt{d_k}}) \cdot V$$

‘bmm’ is a special tensor multiplication operation: batch matrix multiplication.

$$(b, n, m)*(b, m, p) \rightarrow (b, n, p)$$

Attention is commonly used in seq2seq tasks.

    import torch
    import torch.
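The third rule, scaled dot-product attention, can be sketched in plain Python for a single (unbatched) example. The helper names and the toy inputs are assumptions for illustration. A batched version would apply the same computation to each item, which is the role ‘bmm’ plays.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    """Matrix product: (n, m) x (m, p) -> (n, p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def attention(Q, K, V):
    """Softmax(Q K^T / sqrt(d_k)) V, returning the output and the weights."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


# Self-attention: Q = K = V (toy 2 x 2 input)
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
out, weights = attention(Q, K, V)
print(weights)
print(out)
```

Each row of `weights` is a probability distribution over the keys, so each output row is a weighted average of the rows of `V`.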

Recurrent Neural Network

May 1, 2022 in Deep-learning

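1. RNN

The hidden state at time t depends on the current input and the previous hidden state. The ‘tanh’ activation function adds nonlinearity and outputs values from -1 to 1. The weights $W$ and $b$ are shared across time steps.

$$h_t = \tanh(W \cdot [h_{t-1}, x_t]+b)$$

    import torch
    import torch.nn as nn

    # Example sizes, chosen only for illustration
    input_size, hidden_size, num_layers = 10, 20, 2
    sequence_length, batch_size = 5, 3
    num_directions = 1  # 2 if bidirectional=True

    rnn = nn.RNN(input_size, hidden_size, num_layers)
    # Default layout is (sequence_length, batch_size, input_size)
    input1 = torch.randn(sequence_length, batch_size, input_size)
    h0 = torch.randn(num_layers*num_directions, batch_size, hidden_size)
    output, hn = rnn(input1, h0)

RNNs are useful for short sequences, but may encounter ‘vanishing gradients’ and ‘exploding gradients’ when processing long sequences.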
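The recurrence itself can be written out in a few lines of plain Python. This is a minimal single-layer sketch, assuming one weight matrix and bias shared across time steps; the weights and inputs are made-up numbers, and `rnn_step` and `rnn_forward` are hypothetical names.

```python
import math


def rnn_step(W, b, h_prev, x_t):
    """One step: h_t = tanh(W . [h_{t-1}, x_t] + b)."""
    concat = h_prev + x_t  # list concatenation of [h_{t-1}, x_t]
    return [
        math.tanh(sum(w * v for w, v in zip(row, concat)) + b_i)
        for row, b_i in zip(W, b)
    ]


def rnn_forward(W, b, h0, xs):
    """Run the recurrence over a sequence; return all hidden states and the last one."""
    h = h0
    outputs = []
    for x_t in xs:
        h = rnn_step(W, b, h, x_t)
        outputs.append(h)
    return outputs, h


# hidden_size = 2, input_size = 1, so W has shape (2, 3)
W = [[0.5, -0.3, 1.0], [0.1, 0.8, -1.0]]
b = [0.0, 0.1]
xs = [[1.0], [0.5], [-1.0]]  # a sequence of three time steps

outputs, h_n = rnn_forward(W, b, [0.0, 0.0], xs)
print(outputs)
print(h_n)
```

Every hidden value stays strictly between -1 and 1 because of the tanh, and the final hidden state is the last entry of `outputs`.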

LI WEI