Audio samples for "FCL-taco2: Towards fast, controllable and lightweight text-to-speech synthesis"

Authors: Disong Wang, Liqun Deng, Yang Zhang, Nianzu Zheng, Yu Ting Yeung, Xiao Chen, Xunying Liu and Helen Meng


Contents

1. Method comparison

1.1. English experiments

No. GT GT(Mel+PWG) Tacotron2 FastSpeech2 FCL-taco2-T FCL-taco2-S Text
1 Agents are instructed that it is not their responsibility to investigate or evaluate a present danger.
2 that Oswald had told him that he had worked and been married in the Soviet Union.
3 unemployment, automation, and the use of military forces to suppress other populations.
4 He stated several times that he was a Communist but apparently never joined any Communist Party.
5 and becoming merely the recipient of information gathered by others would become limited solely to acts of physical alertness and personal courage.

1.2. Chinese experiments

No. GT GT(Mel+PWG) Tacotron2 FastSpeech2 FCL-taco2-T FCL-taco2-S Text
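1 不容易被激流冲走,还有利于它潜泳,所以它爱吞石块。 (It is not easily swept away by rapids, and this also helps it swim underwater, so it likes to swallow stones.)
2 当它们脱离原有运行轨道后,散落到地球表面,那就是陨石。 (When they leave their original orbits and fall onto the Earth's surface, those are meteorites.)
3 是机器人,机器人不允许玩过山车。 (is a robot, and robots are not allowed to ride roller coasters.)
4 然后集中地反射出去,所以夜晚看起来好像会发光。 (and then reflects it back in a concentrated way, so at night it looks as if it glows.)
5 分布在宇宙的细小物体,滑过大气层时会发光发热,这就是流星雨了。 (Tiny objects scattered through space glow and heat up as they pass through the atmosphere; this is a meteor shower.)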

2. Impact of different knowledge distillation strategies

2.1. English experiments

No. FCL-taco2-S w/o MSD w/o HRD w/o PD w/o MSD+PD w/o KD Text
1 and becoming merely the recipient of information gathered by others would become limited solely to acts of physical alertness and personal courage.
2 Agents are instructed that it is not their responsibility to investigate or evaluate a present danger.
3 that Oswald had told him that he had worked and been married in the Soviet Union.
4 unemployment, automation, and the use of military forces to suppress other populations.
5 He stated several times that he was a Communist but apparently never joined any Communist Party.

2.2. Chinese experiments

No. FCL-taco2-S w/o MSD w/o HRD w/o PD w/o MSD+PD w/o KD Text
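1 不容易被激流冲走,还有利于它潜泳,所以它爱吞石块。 (It is not easily swept away by rapids, and this also helps it swim underwater, so it likes to swallow stones.)
2 当它们脱离原有运行轨道后,散落到地球表面,那就是陨石。 (When they leave their original orbits and fall onto the Earth's surface, those are meteorites.)
3 是机器人,机器人不允许玩过山车。 (is a robot, and robots are not allowed to ride roller coasters.)
4 然后集中地反射出去,所以夜晚看起来好像会发光。 (and then reflects it back in a concentrated way, so at night it looks as if it glows.)
5 分布在宇宙的细小物体,滑过大气层时会发光发热,这就是流星雨了。 (Tiny objects scattered through space glow and heat up as they pass through the atmosphere; this is a meteor shower.)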

3. Prosody manipulation

3.1. Pitch manipulation: speech is generated from the predicted F0 multiplied by a ratio (r)
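
The scaling described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names `scale_f0` and `linear_ramp` are hypothetical, F0 is assumed to be a per-frame list of values in Hz, and reading the ↗ and ↘ columns as a ratio that rises or falls linearly over the utterance is an assumption that this page does not confirm.

```python
from typing import List, Sequence, Union


def scale_f0(f0: Sequence[float],
             ratio: Union[float, Sequence[float]]) -> List[float]:
    """Multiply each predicted F0 value by a ratio.

    `ratio` is either one number applied to every frame, or a per-frame
    sequence with the same length as `f0`.
    """
    if isinstance(ratio, (int, float)):
        ratios = [float(ratio)] * len(f0)
    else:
        ratios = [float(r) for r in ratio]
        if len(ratios) != len(f0):
            raise ValueError("per-frame ratio must match the F0 length")
    return [value * r for value, r in zip(f0, ratios)]


def linear_ramp(start: float, end: float, n: int) -> List[float]:
    """Hypothetical helper: n ratios moving linearly from start to end."""
    if n == 1:
        return [start]
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


# Made-up F0 contour in Hz; 0.0 stands for an unvoiced frame (assumption).
predicted_f0 = [200.0, 210.0, 0.0, 190.0]

print(scale_f0(predicted_f0, 1.5))                        # constant ratio
print(scale_f0(predicted_f0, linear_ramp(0.5, 1.5, 4)))   # rising ratio
```

Because the operation is a plain multiplication, frames with an F0 of zero stay at zero whatever the ratio is.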

3.1.1. English experiments

No. r=1 r=0.5 r=0.75 r=1.25 r=1.5 r=↗ r=↘ Text
1 In no characters is the contrast between the ugly and vulgar illegibility of the modern type.
2 The due relation of letter to pictures and other ornament was thoroughly understood by the old printers; so that
3 as it was occupied and appropriated in eighteen ten.

3.1.2. Chinese experiments

No. r=1 r=0.5 r=0.75 r=1.25 r=1.5 r=↗ r=↘ Text
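1 就是嘛,摔跤手防守严密,无懈可击。 (That's right, the wrestler's defense is tight and impregnable.)
2 为了能让政府继续资助。 (So that the government would continue its funding.)
3 他们的射门击中门框次数多达6次。 (Their shots hit the goal frame as many as 6 times.)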

3.2. Duration manipulation: speech is generated from the predicted duration multiplied by a ratio (r)
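
A minimal Python sketch of the duration scaling described above follows. It is an illustration under stated assumptions, not the authors' implementation: durations are assumed to be integer frame counts per input unit, the names `scale_durations` and `expand` are hypothetical, and both the rounding rule and the one-frame minimum are choices made here for the example.

```python
from typing import List, Sequence


def scale_durations(durations: Sequence[int], ratio: float) -> List[int]:
    """Multiply each predicted duration (in frames) by `ratio` and round.

    Clamping to at least one frame is an assumption made here so that no
    input unit disappears from the output.
    """
    return [max(1, int(d * ratio + 0.5)) for d in durations]


def expand(units: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat each unit for its number of frames."""
    frames: List[str] = []
    for unit, d in zip(units, durations):
        frames.extend([unit] * d)
    return frames


# Made-up units and predicted durations, for illustration only.
units = ["h", "e", "y"]
predicted = [4, 2, 6]

slow = scale_durations(predicted, 1.5)   # ratio above 1 lengthens the speech
fast = scale_durations(predicted, 0.5)   # ratio below 1 shortens it
print(len(expand(units, predicted)), len(expand(units, slow)), len(expand(units, fast)))
```

With this convention a ratio above 1 slows the speech down and a ratio below 1 speeds it up.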

3.2.1. English experiments

No. r=1 r=0.5 r=0.75 r=1.25 r=1.5 r=↗ r=↘ Text
1 In no characters is the contrast between the ugly and vulgar illegibility of the modern type.
2 The due relation of letter to pictures and other ornament was thoroughly understood by the old printers; so that
3 as it was occupied and appropriated in eighteen ten.

3.2.2. Chinese experiments

No. r=1 r=0.5 r=0.75 r=1.25 r=1.5 r=↗ r=↘ Text
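1 就是嘛,摔跤手防守严密,无懈可击。 (That's right, the wrestler's defense is tight and impregnable.)
2 为了能让政府继续资助。 (So that the government would continue its funding.)
3 他们的射门击中门框次数多达6次。 (Their shots hit the goal frame as many as 6 times.)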