Audio samples - VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
Authors: Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu and Helen Meng
VC performance of different methods
Source & target speakers (all speakers are unseen during training):
- Source-male: p241
- Source-female: p238
- Target-male: p243
- Target-female: p294
Male-to-female: Target reference utterance
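No. | Source | AutoVC | AdaIN-VC | VQVC+ | VQMIVC (proposed) | w/o MI (proposed) |
1 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
2 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
3 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |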
Male-to-male: Target reference utterance
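No. | Source | AutoVC | AdaIN-VC | VQVC+ | VQMIVC (proposed) | w/o MI (proposed) |
1 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
2 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
3 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |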
Female-to-male: Target reference utterance
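No. | Source | AutoVC | AdaIN-VC | VQVC+ | VQMIVC (proposed) | w/o MI (proposed) |
1 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
2 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
3 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |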
Female-to-female: Target reference utterance
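No. | Source | AutoVC | AdaIN-VC | VQVC+ | VQMIVC (proposed) | w/o MI (proposed) |
1 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
2 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |
3 | [audio] | [audio] | [audio] | [audio] | [audio] | [audio] |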