E2E-VC: End-to-end DSR using a speaker encoder trained on the speaker verification (SV) task to control speaker identity.
SV-DSR: A strong baseline that uses explicit prosody correction and a speaker encoder trained on the speaker verification (SV) task to control speaker identity.
SA-DSR: A system based on SV-DSR in which the speaker encoder is fine-tuned via speaker adaptation (SA) only.
ASA-DSR (proposed): The proposed system, based on SV-DSR, in which the speaker encoder is fine-tuned via the proposed adversarial speaker adaptation (ASA).
SA-DSR and ASA-DSR, which both fine-tune the speaker encoder on dysarthric speech, accurately preserve gender information and achieve higher speaker similarity in the reconstructed speech, which shows the necessity of speaker adaptation.
Compared with SA-DSR, ASA-DSR generates speech of higher quality in terms of naturalness and intelligibility, which verifies that the proposed ASA allows ASA-DSR to retain the capacity of SV-DSR to generate high-quality speech.