Unseen2unseen VC

Source | Target | Conversion
ε = ε* - 0.1: Exp1, Exp3, Exp4, Exp6
ε = ε*: Exp1, Exp3, Exp4, Exp6
ε = ε* + 0.1: Exp1, Exp3, Exp4, Exp6

ε = ε* - 0.1: Exp1, Exp3, Exp4, Exp6
ε = ε*: Exp1, Exp3, Exp4, Exp6
ε = ε* + 0.1: Exp1, Exp3, Exp4, Exp6

ε = ε* - 0.1: Exp1, Exp3, Exp4, Exp6
ε = ε*: Exp1, Exp3, Exp4, Exp6
ε = ε* + 0.1: Exp1, Exp3, Exp4, Exp6

ε = ε* - 0.1: Exp1, Exp3, Exp4, Exp6
ε = ε*: Exp1, Exp3, Exp4, Exp6
ε = ε* + 0.1: Exp1, Exp3, Exp4, Exp6

Unseen TTS

Ground-truth speech of text | Reference | Synthesized
ε = ε* - 0.1: Exp1, Exp3, Exp4, Exp6
ε = ε*: Exp1, Exp3, Exp4, Exp6
ε = ε* + 0.1: Exp1, Exp3, Exp4, Exp6

ε = ε* - 0.1: Exp1, Exp3, Exp4, Exp6
ε = ε*: Exp1, Exp3, Exp4, Exp6
ε = ε* + 0.1: Exp1, Exp3, Exp4, Exp6

ε = ε* - 0.1: Exp1, Exp3, Exp4, Exp6
ε = ε*: Exp1, Exp3, Exp4, Exp6
ε = ε* + 0.1: Exp1, Exp3, Exp4, Exp6

ε = ε* - 0.1: Exp1, Exp3, Exp4, Exp6
ε = ε*: Exp1, Exp3, Exp4, Exp6
ε = ε* + 0.1: Exp1, Exp3, Exp4, Exp6