Audio samples: Dysarthric speech reconstruction

Contents

1. Reconstruction performance of different systems

F02 (female, dysarthria-moderate-severe)
M07 (male, dysarthria-moderate-severe)
F04 (female, dysarthria-moderate)
M05 (male, dysarthria-moderate)

2. Impact of phoneme duration and F0

F02 (female, dysarthria-moderate-severe)
M07 (male, dysarthria-moderate-severe)
F04 (female, dysarthria-moderate)
M05 (male, dysarthria-moderate)

1. Reconstruction performance of different systems

Systems comparison:

Original: Original dysarthric speech.
E2E-VC: End-to-end DSR that uses a speaker encoder trained on the speaker verification (SV) task to control speaker identity.
SV-DSR: A strong baseline that uses explicit prosody correction and a speaker encoder trained on the SV task to control speaker identity.
SA-DSR: A system based on SV-DSR in which the speaker encoder is fine-tuned via speaker adaptation (SA) only.
ASA-DSR (proposed): The proposed system, based on SV-DSR, in which the speaker encoder is fine-tuned via the proposed adversarial speaker adaptation (ASA).

Some observations & analysis:

For E2E-VC and SV-DSR, the reconstructed speech may not preserve the original speaker identity, and in some cases even the perceived gender changes (F02-No.2, F02-No.3, F02-No.4, F02-No.5, F02-No.6, F02-No.10, F02-No.11, F02-No.12, F02-No.13, F02-No.15, M07-No.1, M07-No.15, F04-No.1, F04-No.6, F04-No.7, F04-No.8, F04-No.10 and F04-No.12). This shows that the speaker encoder of E2E-VC and SV-DSR may not fully capture the identity-related information of dysarthric speakers.
SA-DSR and ASA-DSR both fine-tune the speaker encoder on dysarthric speech. Their reconstructed speech preserves gender information accurately and achieves higher speaker similarity, which shows the necessity of speaker adaptation.
Compared with SA-DSR, ASA-DSR generates speech of higher quality in terms of naturalness and intelligibility, which verifies that the proposed ASA lets ASA-DSR retain the capacity of SV-DSR to generate high-quality speech.

F02 (female, dysarthria-moderate-severe):

No.	Original	E2E-VC	SV-DSR	SA-DSR	ASA-DSR (proposed)	Text
1						ablutions
2						atrocious
3						watches
4						rabbit
5						chair
6						feather
7						backspace
8						escape
9						paragraph
10						sentence
11						upward
12						downward
13						hotel
14						yankee
15						unusual

M07 (male, dysarthria-moderate-severe):

No.	Original	E2E-VC	SV-DSR	SA-DSR	ASA-DSR (proposed)	Text
1						ablutions
2						atrocious
3						watches
4						rabbit
5						chair
6						feather
7						backspace
8						escape
9						paragraph
10						sentence
11						upward
12						downward
13						hotel
14						yankee
15						unusual

F04 (female, dysarthria-moderate):

No.	Original	E2E-VC	SV-DSR	SA-DSR	ASA-DSR (proposed)	Text
1						ablutions
2						atrocious
3						watches
4						rabbit
5						chair
6						feather
7						backspace
8						escape
9						paragraph
10						sentence
11						upward
12						downward
13						hotel
14						yankee
15						unusual

M05 (male, dysarthria-moderate):

No.	Original	E2E-VC	SV-DSR	SA-DSR	ASA-DSR (proposed)	Text
1						ablutions
2						atrocious
3						watches
4						rabbit
5						chair
6						feather
7						backspace
8						escape
9						paragraph
10						sentence
11						upward
12						downward
13						hotel
14						yankee
15						unusual

2. Impact of phoneme duration and F0

Speech generated by the proposed ASA-DSR with different combinations of phoneme duration and F0:

GG: Ground-truth duration + Ground-truth F0
GP: Ground-truth duration + Predicted F0
PP: Predicted duration + Predicted F0
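
The three conditions above differ only in where each prosody feature comes from. A minimal sketch of that selection logic follows. The function name, the argument names and the two predictor callables are hypothetical and are not taken from the ASA-DSR implementation. The sketch also assumes that the F0 predictor is conditioned on whichever durations are in use, which this page does not state.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Condition name -> (use ground-truth duration?, use ground-truth F0?)
CONDITIONS: Dict[str, Tuple[bool, bool]] = {
    "GG": (True, True),    # ground-truth duration + ground-truth F0
    "GP": (True, False),   # ground-truth duration + predicted F0
    "PP": (False, False),  # predicted duration + predicted F0
}


def select_prosody(
    condition: str,
    phonemes: Sequence[str],
    gt_durations: Sequence[int],
    gt_f0: Sequence[float],
    predict_durations: Callable[[Sequence[str]], Sequence[int]],
    predict_f0: Callable[[Sequence[str], Sequence[int]], Sequence[float]],
) -> Tuple[List[int], List[float]]:
    """Return the (durations, f0) pair that one condition feeds to the synthesizer.

    Both predictors are hypothetical stand-ins. Passing the chosen durations
    to predict_f0 is an assumption of this sketch, not a documented detail.
    """
    use_gt_duration, use_gt_f0 = CONDITIONS[condition]
    if use_gt_duration:
        durations = list(gt_durations)
    else:
        durations = list(predict_durations(phonemes))
    if use_gt_f0:
        f0 = list(gt_f0)
    else:
        f0 = list(predict_f0(phonemes, durations))
    return durations, f0
```

With toy predictors, calling `select_prosody("GP", ...)` returns the ground-truth durations paired with a predicted F0 contour, while `"PP"` replaces both. Comparing the GG, GP and PP columns in the tables below then isolates the effect of each predicted feature on the generated speech.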

F02 (female, dysarthria-moderate-severe):

No.	GG	GP	PP	Text
1				ablutions
2				atrocious
3				watches
4				rabbit
5				chair
6				feather
7				backspace
8				escape
9				paragraph
10				sentence
11				upward
12				downward
13				hotel
14				yankee
15				unusual

M07 (male, dysarthria-moderate-severe):

No.	GG	GP	PP	Text
1				ablutions
2				atrocious
3				watches
4				rabbit
5				chair
6				feather
7				backspace
8				escape
9				paragraph
10				sentence
11				upward
12				downward
13				hotel
14				yankee
15				unusual

F04 (female, dysarthria-moderate):

No.	GG	GP	PP	Text
1				ablutions
2				atrocious
3				watches
4				rabbit
5				chair
6				feather
7				backspace
8				escape
9				paragraph
10				sentence
11				upward
12				downward
13				hotel
14				yankee
15				unusual

M05 (male, dysarthria-moderate):

No.	GG	GP	PP	Text
1				ablutions
2				atrocious
3				watches
4				rabbit
5				chair
6				feather
7				backspace
8				escape
9				paragraph
10				sentence
11				upward
12				downward
13				hotel
14				yankee
15				unusual