Intensity control of speech synthesis
1. Notes
(1) We can't treat pairs of different non-neutral utterances as a similar set; otherwise their emotional intensity labels are pulled toward each other.
(2) The intensity predictor should be kept fixed (frozen) while training the text-to-speech model; otherwise the intensity cannot be controlled, possibly because the predicted label for each sample keeps fluctuating during training.
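
A minimal sketch of note (1), assuming a ranking-style setup with an ordered set and a similar set of training pairs. The function name `build_pairs`, the `"neutral"` label string, and the choice to put only neutral-neutral pairs in the similar set are illustrative assumptions, not details taken from these notes. Pairs of two non-neutral utterances are simply skipped.

```python
from itertools import combinations

NEUTRAL = "neutral"  # assumed label string for neutral speech


def build_pairs(samples):
    """Build ordered and similar pair sets from (utterance_id, emotion) tuples.

    ordered: (low_id, high_id) pairs, assuming neutral is less intense
             than any non-neutral utterance.
    similar: pairs assumed to share the same intensity; here only
             neutral-neutral pairs (an assumption for this sketch).
    """
    ordered, similar = [], []
    for (id_a, emo_a), (id_b, emo_b) in combinations(samples, 2):
        a_neutral = emo_a == NEUTRAL
        b_neutral = emo_b == NEUTRAL
        if a_neutral and b_neutral:
            similar.append((id_a, id_b))
        elif a_neutral:
            ordered.append((id_a, id_b))
        elif b_neutral:
            ordered.append((id_b, id_a))
        # Both non-neutral: skipped on purpose, so their intensity
        # labels are never constrained to be equal (note (1)).
    return ordered, similar


# Hypothetical toy data for illustration only.
samples = [
    ("u1", "neutral"),
    ("u2", "neutral"),
    ("u3", "happy"),
    ("u4", "happy"),
    ("u5", "sad"),
]
ordered, similar = build_pairs(samples)
```

With this toy data, no pair in either set contains two non-neutral utterances, so nothing forces their intensities to be equal.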