[Figure: Unseen2unseen VC (columns: Source, Target, Conversion) and Unseen TTS (columns: Ground-truth speech of text, Reference, Synthesized). Each panel shows four examples. For each example, outputs are given for YourTTS and for Exp2 and Exp5 at ε = ε* − 0.1, ε = ε*, and ε = ε* + 0.1.]