VC for DSR and AC

Audio samples: Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

Authors: Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu and Helen Meng

Contents

System comparison:

Original: Original dysarthric or L2 speech.
E2E-VC: End-to-end VC for non-standard speech.
Enc-CM (proposed): Multi-speaker conversion model that uses the deep speaker embedding (DSE) obtained from the speaker encoder to control speaker identity.
Ada-CM (proposed): Adapted conversion model that uses the DSE obtained via speaker adaptation to control speaker identity.
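
Both proposed systems steer speaker identity through a DSE; only the source of the embedding differs. This page does not say how the embedding is fed into the conversion model, so the sketch below assumes one common choice: the same embedding vector is appended to every frame of content features. The function name `condition_on_speaker` and all numeric values are hypothetical and are not taken from the authors' implementation.

```python
from typing import List

Frame = List[float]


def condition_on_speaker(frames: List[Frame], speaker_embedding: List[float]) -> List[Frame]:
    """Append the same speaker embedding to every frame of content features.

    Assumption: conditioning by per-frame concatenation. The actual
    mechanism used by Enc-CM and Ada-CM is not described on this page.
    """
    if not speaker_embedding:
        raise ValueError("speaker_embedding must be non-empty")
    # Build new lists so the caller's frames are left unmodified.
    return [list(frame) + list(speaker_embedding) for frame in frames]


# Hypothetical toy values: 3 frames of 2-dim content features.
content = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Hypothetical 2-dim embeddings standing in for the two DSE sources.
dse_encoder = [1.0, -1.0]  # as if produced by a speaker encoder (Enc-CM)
dse_adapted = [0.8, -0.7]  # as if obtained via speaker adaptation (Ada-CM)

enc_input = condition_on_speaker(content, dse_encoder)
ada_input = condition_on_speaker(content, dse_adapted)
```

Under this assumption the content features are identical for both systems, and swapping the embedding is the only change needed to move from the Enc-CM setting to the Ada-CM setting.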

Two speakers are used in the experiments:

M05: a speaker with moderate-severe dysarthria and low speech intelligibility, selected from the UASpeech dataset.
LXC: a non-native English speaker with an obvious Chinese accent, selected from the L2-ARCTIC dataset.

No.	Original	E2E-VC	Enc-CM (proposed)	Ada-CM (proposed)	Text
1					There was no forecasting this strange girl's processes.
2					It was more like sugar.
3					Without them he could not run his empire.
4					Bill lingered, contemplating his work with artistic appreciation.
5					Once the jews harp began emitting its barbaric rhythms, Michael was helpless.
6					Then you can arrange yourself comfortably among these robes in the bow.
7					How could I answer the question on the spur of the moment.
8					They are not biologists nor sociologists.

Speech generated by the proposed Enc-CM with different combinations of phoneme durations and F0:

No.	GD+GF	GD+PF	PD+PF	Text
1				Philip snatched at the letter which Gregson held out to him.
2				He also contended that better confidence was established by carrying no weapons.
3				It was over when he made his way through the ring of spectators.
4				There is no need of further detail, now -- for you can understand.
5				I want to die in it.
6				She was even more beautiful than when I saw her, before.
7				How does your wager look now.
8				Hardly were our plans made public before we were met by powerful opposition.