Tacotron2

April 30, 2022 in Deep-learning

1. Encoder

$$\text{one\_hot} \Rightarrow \text{char\_embedding} \Rightarrow 3 \times \text{Conv\_layer} \Rightarrow \text{LSTM} \Rightarrow \text{Context}$$

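Character-level embeddings are vector representations of the characters in the input sentence. The three convolutional layers and the LSTM then extract contextual features from these embeddings, producing the Context that the decoder consumes.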
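The encoder pipeline above can be sketched in PyTorch roughly as follows. This is a minimal illustration rather than a reference implementation: the class name `Encoder`, the vocabulary size, and the small layer widths are placeholders chosen for the example, and the batch normalization and bidirectional LSTM are assumptions taken from the usual Tacotron 2 description, not from the formula above.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Character IDs -> embedding -> 3 conv layers -> LSTM -> context."""

    def __init__(self, n_symbols=40, emb_dim=64, kernel_size=5):
        super().__init__()
        # An embedding lookup is equivalent to multiplying a one-hot
        # vector by a learned weight matrix.
        self.embedding = nn.Embedding(n_symbols, emb_dim)
        # Three identical conv blocks; padding keeps the sequence length.
        self.convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(emb_dim, emb_dim, kernel_size,
                          padding=kernel_size // 2),
                nn.BatchNorm1d(emb_dim),  # assumption: not in the formula
                nn.ReLU(),
            )
            for _ in range(3)
        ])
        # Assumption: bidirectional, with emb_dim // 2 units per direction
        # so the concatenated output is emb_dim wide.
        self.lstm = nn.LSTM(emb_dim, emb_dim // 2,
                            batch_first=True, bidirectional=True)

    def forward(self, char_ids):
        x = self.embedding(char_ids)   # [batch, chars, emb_dim]
        x = x.transpose(1, 2)          # Conv1d expects [batch, channels, chars]
        for conv in self.convs:
            x = conv(x)
        x = x.transpose(1, 2)          # back to [batch, chars, emb_dim]
        context, _ = self.lstm(x)      # [batch, chars, emb_dim]
        return context


encoder = Encoder().eval()
char_ids = torch.randint(0, 40, (2, 11))  # 2 sentences, 11 characters each
with torch.no_grad():
    context = encoder(char_ids)
print(context.shape)
```

The output keeps one vector per input character, which is what lets the decoder attend over individual positions of the sentence.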

2. Prenet

$$[\text{frame}_t,\ \text{batch},\ \text{frame\_dim}] \Rightarrow 2 \times \text{Linear(ReLU)} \Rightarrow F_t$$

$$\text{concat}(F_t,\ \text{Context},\ F_{t-1}) \Rightarrow \text{LSTM\_Cell}$$
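A minimal PyTorch sketch of these two steps, read literally from the formulas above, might look like this. The class name `Prenet`, all dimension values, and the fixed random `context` tensor are assumptions made for illustration; in a full model the context would come from the attention mechanism at every step, and the prenet would normally also apply dropout, which is omitted here.

```python
import torch
import torch.nn as nn


class Prenet(nn.Module):
    """Two linear layers, each followed by ReLU."""

    def __init__(self, frame_dim=80, hidden_dim=256):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(frame_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )

    def forward(self, frame):
        return self.layers(frame)


# Illustrative sizes (assumptions, not taken from the post).
batch, frame_dim, prenet_dim, context_dim, lstm_dim = 2, 80, 256, 512, 1024

prenet = Prenet(frame_dim, prenet_dim)
# Input width matches concat(F_t, Context, F_{t-1}).
cell = nn.LSTMCell(2 * prenet_dim + context_dim, lstm_dim)

frames = torch.randn(5, batch, frame_dim)  # [frames, batch, frame_dim]
context = torch.randn(batch, context_dim)  # stand-in for the encoder context
f_prev = torch.zeros(batch, prenet_dim)    # F_{t-1} before the first frame
h = torch.zeros(batch, lstm_dim)           # LSTM hidden state
c = torch.zeros(batch, lstm_dim)           # LSTM cell state

with torch.no_grad():
    for frame_t in frames:                 # one decoder step per frame
        f_t = prenet(frame_t)              # [batch, prenet_dim]
        cell_in = torch.cat([f_t, context, f_prev], dim=-1)
        h, c = cell(cell_in, (h, c))       # [batch, lstm_dim] each
        f_prev = f_t
print(h.shape)
```

Using `nn.LSTMCell` instead of `nn.LSTM` makes the step-by-step nature explicit: each step needs the previous step's output before it can run, so the loop cannot be replaced by a single call over the whole sequence at inference time.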

LI WEI
