Tacotron2
1. Encoder
$$\mathrm{one\_hot} \Rightarrow \mathrm{char\_embedding} \Rightarrow 3 \times \mathrm{Conv\_layer} \Rightarrow \mathrm{LSTM} \Rightarrow \mathrm{Context}$$
Character-level embeddings are vector representations of the characters in the input sentence. The convolutional layers and the LSTM then extract contextual features from these embeddings, producing the Context sequence.
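To make the encoder pipeline concrete, here is a minimal sketch that only traces tensor shapes through each stage; it does not compute anything. The function name `encoder_shapes` and its default sizes (`emb_dim=512`, `lstm_hidden=256`, a bidirectional LSTM) are illustrative assumptions that are not stated in these notes, and the convolutions are assumed to use "same" padding so the sequence length is preserved.

```python
def encoder_shapes(batch, seq_len, vocab_size,
                   emb_dim=512, lstm_hidden=256, bidirectional=True):
    """Return (stage, shape) pairs for the encoder pipeline.

    All sizes are hypothetical defaults chosen for illustration.
    """
    directions = 2 if bidirectional else 1
    stages = [
        # One-hot vector per character.
        ("one_hot", (batch, seq_len, vocab_size)),
        # Embedding lookup maps each character to a dense vector.
        ("char_embedding", (batch, seq_len, emb_dim)),
    ]
    # Three conv layers; with "same" padding (assumed) the length is unchanged.
    for i in range(1, 4):
        stages.append((f"conv_layer_{i}", (batch, seq_len, emb_dim)))
    # The LSTM emits one context vector per input character.
    stages.append(("context", (batch, seq_len, directions * lstm_hidden)))
    return stages


for name, shape in encoder_shapes(batch=2, seq_len=10, vocab_size=40):
    print(f"{name:>15}: {shape}")
```

The point of the trace is that the sequence length stays fixed from input to Context, so the decoder can attend over one vector per input character.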
2. Prenet
$$[\mathrm{frame}_t,\ \mathrm{batch},\ \mathrm{frame\_dim}] \Rightarrow 2 \times \mathrm{Linear}(\mathrm{ReLU}) \Rightarrow F_t$$
$$\mathrm{concat}(F_t,\ \mathrm{Context},\ F_{t-1}) \Rightarrow \mathrm{LSTM\_Cell}$$
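The two prenet steps can be sketched in plain Python for a single frame (no batch dimension). This is a toy illustration under stated assumptions, not a reference implementation: the helper names, the tiny layer sizes, and the random weights are all hypothetical, and reading $F_{t-1}$ as the prenet output for the previous frame is an interpretation of the formula above.

```python
import random


def linear(x, weight, bias):
    """Compute W x + b for one vector x; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(in_dim, out_dim, rng):
    """Small random weights and zero bias (illustrative initialisation)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def prenet(frame, layers):
    """Apply each Linear layer followed by ReLU."""
    out = frame
    for weight, bias in layers:
        out = relu(linear(out, weight, bias))
    return out


rng = random.Random(0)
frame_dim, prenet_dim, context_dim = 4, 3, 5  # toy sizes, assumed

layers = [init_linear(frame_dim, prenet_dim, rng),
          init_linear(prenet_dim, prenet_dim, rng)]

frame_prev = [rng.uniform(-1, 1) for _ in range(frame_dim)]
frame_t = [rng.uniform(-1, 1) for _ in range(frame_dim)]
context = [rng.uniform(-1, 1) for _ in range(context_dim)]

f_prev = prenet(frame_prev, layers)  # F_{t-1}
f_t = prenet(frame_t, layers)        # F_t

# concat(F_t, Context, F_{t-1}) is the input vector for the LSTM cell.
lstm_input = f_t + context + f_prev
print(len(lstm_input))
```

The concatenated vector has length `2 * prenet_dim + context_dim`, which is the input size the LSTM cell would need to accept under these assumptions.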