Enc-CM (proposed): Multi-speaker conversion model that uses the deep speaker embedding (DSE) obtained from the speaker encoder to control the speaker identity.
Ada-CM (proposed): Adapted conversion model that uses the DSE obtained via speaker adaptation to control the speaker identity.
Two speakers are used in the experiments:
M05: a dysarthric speaker with moderate-to-severe dysarthria and low speech intelligibility, selected from the UASpeech dataset.
LXC: a non-native English speaker with a noticeable Chinese accent, selected from the L2-ARCTIC dataset.
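M05 (dysarthric speaker):

| No. | Original | E2E-VC | Enc-CM (proposed) | Ada-CM (proposed) | Text |
|-----|----------|--------|-------------------|-------------------|------|
| 1 | | | | | backspace |
| 2 | | | | | paragraph |
| 3 | | | | | sentence |
| 4 | | | | | copy |
| 5 | | | | | upward |
| 6 | | | | | astounded |
| 7 | | | | | into |
| 8 | | | | | find |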
LXC (L2 speaker):

| No. | Original | E2E-VC | Enc-CM (proposed) | Ada-CM (proposed) | Text |
|-----|----------|--------|-------------------|-------------------|------|
| 1 | | | | | There was no forecasting this strange girl's processes. |
| 2 | | | | | It was more like sugar. |
| 3 | | | | | Without them he could not run his empire. |
| 4 | | | | | Bill lingered, contemplating his work with artistic appreciation. |
| 5 | | | | | Once the jews harp began emitting its barbaric rhythms, Michael was helpless. |
| 6 | | | | | Then you can arrange yourself comfortably among these robes in the bow. |
| 7 | | | | | How could I answer the question on the spur of the moment. |