Zero-shot Singing Synthesis

Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference

This was my internship project at Adobe Research in the summer of 2023. The work has been written up as a paper, which is currently under review.

Input: 1. Score, lyrics (with the language specified), and style

2. A 5-second speech recording of the target speaker (a voice not seen in the training data)

Output: Singing in the target’s voice

Demo Example

Input Speech Target 3 (Male voice)
Output Singing
- A Chinese Folk Song
- An English Pop Song
- An Italian Opera Song
