Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference

This was my internship project at Adobe Research in the summer of 2023. The work has been written up as a paper, which is currently under review.

Input:

  1. Score, lyrics (with the language specified), and style

  2. A 5-second speech recording of the target speaker (a voice not seen in the training data)

Output: Singing in the target's voice

Demo Examples

  • Input Speech Target 1 (Female voice)

  • Output Singing

    • An English Pop Song

    • A Chinese Folk Song

  • Input Speech Target 2 (Female voice)

  • Output Singing

    • An English Pop Song

    • A Chinese Folk Song

  • Input Speech Target 3 (Male voice)

  • Output Singing

    • A Chinese Folk Song

    • An English Pop Song

    • An Italian Opera Song
