Audio samples: Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

Authors: Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu and Helen Meng


Diagram


1. VC performance of different methods

  • System comparison:
  • Two speakers are used in the experiments: M05 (a dysarthric speaker) and LXC (an L2 speaker).

    M05 (dysarthric speaker):

    No. Original E2E-VC Enc-CM (proposed) Ada-CM (proposed) Text
    1 backspace
    2 paragraph
    3 sentence
    4 copy
    5 upward
    6 astounded
    7 into
    8 find

    LXC (L2 speaker):

    No. Original E2E-VC Enc-CM (proposed) Ada-CM (proposed) Text
    1 There was no forecasting this strange girl's processes.
    2 It was more like sugar.
    3 Without them he could not run his empire.
    4 Bill lingered, contemplating his work with artistic appreciation.
    5 Once the jews harp began emitting its barbaric rhythms, Michael was helpless.
    6 Then you can arrange yourself comfortably among these robes in the bow.
    7 How could I answer the question on the spur of the moment.
    8 They are not biologists nor sociologists.

2. Impact of phoneme duration and F0

  • Speech generated by the proposed Enc-CM with different combinations of phoneme durations and F0:

    M05 (dysarthric speaker):

    No. GD+GF GD+PF PD+PF Text
    1 sugar
    2 delta
    3 into
    4 many
    5 watches
    6 yankee
    7 whiskey
    8 Juliet

    LXC (L2 speaker):

    No. GD+GF GD+PF PD+PF Text
    1 Philip snatched at the letter which Gregson held out to him.
    2 He also contended that better confidence was established by carrying no weapons.
    3 It was over when he made his way through the ring of spectators.
    4 There is no need of further detail, now -- for you can understand.
    5 I want to die in it.
    6 She was even more beautiful than when I saw her, before.
    7 How does your wager look now.
    8 Hardly were our plans made public before we were met by powerful opposition.