1. Encoder $$\text{one-hot} \Rightarrow \text{char embedding} \Rightarrow 3\times\text{Conv layer} \Rightarrow \text{LSTM} \Rightarrow \text{Context}$$
Character-level embeddings are vector representations of the original sentence. We can use convolutional layers and an LSTM to extract their meaning (a minimal sketch follows the Prenet item below).
2. Prenet $$[\text{frame}_t,\ \text{batch},\ \text{frame\_dim}] \Rightarrow 2\times\text{Linear(ReLU)} \Rightarrow F_t$$
$$\mathrm{concat}(F_t,\ \text{Context},\ F_{t-1}) \Rightarrow \text{LSTM Cell}$$
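Below is a minimal PyTorch sketch of the encoder stack and the prenet/decoder step described in the two items above. All dimensions, kernel sizes, the bidirectional LSTM, and the layer names are my own assumptions for illustration, not the original hyper-parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharEncoder(nn.Module):
    """one-hot -> char embedding -> 3 conv layers -> LSTM -> Context."""
    def __init__(self, vocab_size=100, emb_dim=256, hidden=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)        # char embedding
        self.convs = nn.Sequential(*[
            nn.Sequential(nn.Conv1d(emb_dim, emb_dim, kernel_size=5, padding=2),
                          nn.BatchNorm1d(emb_dim), nn.ReLU())
            for _ in range(3)])                                    # 3 * Conv layer
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, char_ids):                                   # (batch, seq_len)
        x = self.embedding(char_ids).transpose(1, 2)               # (batch, emb_dim, seq_len)
        x = self.convs(x).transpose(1, 2)                          # (batch, seq_len, emb_dim)
        context, _ = self.lstm(x)                                  # (batch, seq_len, 2*hidden)
        return context

class Prenet(nn.Module):
    """Previous frame -> 2 * Linear(ReLU) -> F_t."""
    def __init__(self, frame_dim=80, hidden=256):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(frame_dim, hidden), nn.Linear(hidden, hidden)

    def forward(self, frame):                                      # (batch, frame_dim)
        return F.relu(self.fc2(F.relu(self.fc1(frame))))

# One decoder step: concat(F_t, Context, F_{t-1}) feeds an LSTMCell, as in the note above.
batch, frame_dim, ctx_dim, hidden = 4, 80, 256, 512
prenet = Prenet(frame_dim)
decoder_cell = nn.LSTMCell(256 + ctx_dim + frame_dim, hidden)

prev_frame = torch.randn(batch, frame_dim)                         # F_{t-1}
context = torch.randn(batch, ctx_dim)                              # attention context (fixed size assumed)
f_t = prenet(prev_frame)                                           # F_t
h, c = decoder_cell(torch.cat([f_t, context, prev_frame], dim=-1))
```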
1. CBOW & Skip-Gram Skip-gram is the simpler model, so we only discuss it here. Suppose that the vocabulary size is 10000 and we want to map each word to a 300-dimensional feature vector. The network consists of a hidden layer (10000×300) and an output layer (300×10000).
Note that there are millions of parameters to update, so we need some strategies to keep training tractable.
Word Pairs: “New York” is treated as a single word, since its meaning differs from those of “New” and “York” on their own.
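A hedged sketch of the skip-gram network with the sizes mentioned above (10000-word vocabulary, 300-dimensional features); the layer names are mine:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim = 10000, 300

class SkipGram(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Embedding(vocab_size, emb_dim)            # the 10000 x 300 "hidden layer"
        self.output = nn.Linear(emb_dim, vocab_size, bias=False)   # the 300 x 10000 output layer

    def forward(self, center_ids):                                 # (batch,) center-word indices
        v = self.hidden(center_ids)                                # (batch, 300) word vectors
        return self.output(v)                                      # (batch, 10000) scores over context words

model = SkipGram()
print(sum(p.numel() for p in model.parameters()))                  # 6000000: the "millions of parameters"
```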
1. Classification Problem Suppose that we are given a batch of data $$D=\{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\},\ y_i \in \{-1,+1\}$$, and we want to find a hyperplane that properly separates the two classes of data. A good hyperplane should be able to tolerate slight perturbations of the samples.
Multi-class Classification A classification task with the assumption that each sample belongs to one and only one class, e.g. $$\{\text{dog}, \text{cat}\}$$.
Multi-label Classification A classification task in which each sample may be assigned several labels at the same time.
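A small illustration of the difference between the two settings: a multi-class target is a single class index per sample, while a multi-label target is a binary vector that may contain several 1s. The toy labels and the choice of PyTorch losses below are my own.

```python
import torch
import torch.nn as nn

logits = torch.randn(2, 3)                        # 2 samples, 3 classes: {dog, cat, bird}

# Multi-class: each sample belongs to exactly one class.
multiclass_target = torch.tensor([0, 2])          # sample 0 is "dog", sample 1 is "bird"
print(nn.CrossEntropyLoss()(logits, multiclass_target))

# Multi-label: each sample may carry several labels at once.
multilabel_target = torch.tensor([[1., 1., 0.],   # "dog" and "cat"
                                  [0., 0., 1.]])  # only "bird"
print(nn.BCEWithLogitsLoss()(logits, multilabel_target))
```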
1. Definition Formally, suppose that we have a dataset $$D=\{x_1,x_2,\dots,x_m\}$$, in which every sample is an n-dimensional vector. Clustering divides these data into k disjoint ‘clusters’ $$\{C_l \mid l=1,2,\dots,k\}$$, much like the ‘disjoint set’ structure we learn in an algorithms course.
2. K-means The intuition behind the K-means algorithm is straightforward. Initially, we choose k random samples $$\{ \mu_1,\mu_2,\dots,\mu_k \}$$ as the ‘mean vectors’, which represent the centers of the k clusters. We then repeatedly assign every sample to its nearest mean vector and recompute each mean from its cluster until the assignments stop changing.
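A short NumPy sketch of this loop (the stopping rule, the iteration cap, and the toy data are assumptions for illustration):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]       # k random samples as the mean vectors
    for _ in range(n_iters):
        # assign every sample to its nearest mean vector
        labels = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # recompute every mean vector as the center of its cluster
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):                          # stop once the means no longer move
            break
        mu = new_mu
    return labels, mu

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
labels, centers = kmeans(X, k=2)
```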
1. Function ConvTranspose can be used to enlarge the width and height of the input.
2. Principle With pad = 0 and stride = 1: pad the input with kernel − 1 zeros on each side, then convolve with the transposed kernel (flipped along both the main and anti-diagonals, i.e. rotated by 180°).
3. General Case Insert stride − 1 zero rows/columns between the input rows/columns, pad each side with kernel − padding − 1 zeros, then apply the transposed-kernel convolution with (kernel = k, stride = 1, padding = 0), as checked in the sketch below.
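A quick single-channel check of this construction (the concrete sizes are assumptions): inserting stride − 1 zeros, padding with kernel − padding − 1, and convolving with the 180°-rotated kernel reproduces `F.conv_transpose2d`.

```python
import torch
import torch.nn.functional as F

n, k, s, p = 4, 3, 2, 1
x = torch.randn(1, 1, n, n)
w = torch.randn(1, 1, k, k)

y_direct = F.conv_transpose2d(x, w, stride=s, padding=p)

z = torch.zeros(1, 1, (n - 1) * s + 1, (n - 1) * s + 1)   # insert s-1 zero rows/cols
z[:, :, ::s, ::s] = x
w_rot = torch.flip(w, dims=[2, 3])                        # rotate the kernel by 180°
y_manual = F.conv2d(z, w_rot, stride=1, padding=k - p - 1)

print(torch.allclose(y_direct, y_manual, atol=1e-5))      # True
```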
ConvTranspose2d
$$n^* = (n-1)s + k - 2p$$
Conv2d
$$n^* = \left\lfloor \frac{n - k + 2p + s}{s} \right\rfloor$$
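A small PyTorch check of both size formulas (the concrete n, k, s, p are arbitrary choices):

```python
import torch
import torch.nn as nn

n, k, s, p = 8, 3, 2, 1
x = torch.randn(1, 1, n, n)

# Conv2d: n* = floor((n - k + 2p + s) / s) = floor((8 - 3 + 2 + 2) / 2) = 4
conv = nn.Conv2d(1, 1, kernel_size=k, stride=s, padding=p)
print(conv(x).shape)        # torch.Size([1, 1, 4, 4])

# ConvTranspose2d: n* = (n - 1)s + k - 2p = (8 - 1)*2 + 3 - 2 = 15
deconv = nn.ConvTranspose2d(1, 1, kernel_size=k, stride=s, padding=p)
print(deconv(x).shape)      # torch.Size([1, 1, 15, 15])
```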
1. Sequential Model 1.1 Conditional Probability $$P(\mathbf{x})=P(x_1)P(x_2\mid x_1)P(x_3\mid x_1,x_2)\cdots P(x_t\mid x_1,x_2,\dots,x_{t-1})$$
1.2 Autoregressive Model When $$x_t$$ is predicted from all previously observed values (possibly summarized by a function $$f$$), we call it an autoregressive (AR) model.
$$p(x_t\mid x_1,x_2,\dots,x_{t-1})=p(x_t\mid f(x_1,x_2,\dots,x_{t-1}))$$
1.3 Markov Model Under the Markov assumption, the current state is determined only by the previous $$\tau$$ points.
$$p(x_t\mid x_1,x_2,\dots,x_{t-1})=p(x_t\mid x_{t-\tau},\dots,x_{t-1})=p(x_t\mid f(x_{t-\tau},\dots,x_{t-1}))$$
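A tiny sketch of an order-τ autoregressive model in this spirit (the window size τ = 4, the MLP used as f, and the toy sine data are all assumptions):

```python
import torch
import torch.nn as nn

tau = 4                                                               # Markov window size (assumed)
f = nn.Sequential(nn.Linear(tau, 16), nn.ReLU(), nn.Linear(16, 1))    # f(x_{t-tau}, ..., x_{t-1})

x = torch.sin(torch.arange(0, 100, 0.1))                              # a toy sequence
windows = x.unfold(0, tau, 1)[:-1]                                    # (N, tau): each row is x_{t-tau}, ..., x_{t-1}
targets = x[tau:].unsqueeze(-1)                                       # the corresponding x_t
loss = nn.MSELoss()(f(windows), targets)                              # train f to predict x_t from its window
```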
1.4 Latent Variable Model We use a latent variable to represent the internal state (an RNN is one kind of latent variable model).
$$h_t=f(x_1,\dots,x_{t-1})$$
$$x_t \sim p(x_t \mid h_t)$$
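A minimal sketch of this latent-variable view with a GRU (all sizes are assumptions): $$h_t$$ summarizes the past observations and $$x_t$$ is predicted from $$h_t$$ alone.

```python
import torch
import torch.nn as nn

class LatentAR(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.readout = nn.Linear(hidden, 1)          # parameterizes the prediction from h_t

    def forward(self, x_past):                       # (batch, t-1, 1): x_1, ..., x_{t-1}
        _, h_t = self.rnn(x_past)                    # h_t = f(x_1, ..., x_{t-1})
        return self.readout(h_t[-1])                 # x_t is predicted from h_t alone

x_past = torch.randn(8, 10, 1)                       # 8 sequences of 10 past observations
x_t_hat = LatentAR()(x_past)                         # (8, 1)
```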
QA…
1. Review of Entropy Entropy is usually used to represent the average amount of self-information. The calculation formula is as follows:
$$H(X) = E\!\left[\log\frac{1}{p_i}\right] = -\sum_{i=1}^{n}p_i\log p_i$$
For example, suppose that we have a probability distribution X = [0.7, 0.3]. Then we have,
$$H(X) = -0.7\log_2 0.7-0.3\log_2 0.3 \approx 0.88\ \text{bit}$$
What’s more, we can easily find that $$0\le H(X) \le \log n$$ is always true.
The proof of the left inequality is trivial because $$0\le p_i\le 1$$; equality holds if and only if some $$p_i=1$$ (the distribution is deterministic). The right inequality follows from Jensen’s inequality, with equality when the distribution is uniform.
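A quick numeric check of the example and the bounds above (the base-2 logarithm is assumed, since the result is stated in bits):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                   # treat 0 * log(0) as 0
    return -(p * np.log2(p)).sum()

print(entropy([0.7, 0.3]))                         # ~0.881, the 0.88 bit above
print(entropy([1.0, 0.0]) == 0.0)                  # True: the lower bound, a deterministic distribution
print(entropy([0.5, 0.5]), np.log2(2))             # 1.0 1.0: the upper bound log(n), a uniform distribution
```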