
Shuqi Dai

戴舒琪

Music & Technology

Hi, I am a Ph.D. candidate in the Computer Science Department at Carnegie Mellon University, advised by Prof. Roger Dannenberg. My research emphasizes musical perspectives in technology innovation to transform how we listen to, understand, perform, and create music. I also aim to unearth the potential of music technology to benefit individuals and society by combining it with other areas, such as health and education. I have interned at Adobe, NVIDIA, Microsoft Research Asia, Google, and Hulu. Before joining CMU, I received my B.S. in Computer Science from Peking University in China in July 2018.

I am a professional Pipa (a traditional Chinese instrument) player with more than 20 years of performance experience, tutored by the renowned Pipa musician Prof. Yabo Pan (潘亚伯). I have also received five years of formal Western music training, with straight A's, at the CMU School of Music. I compose and sing as well.

I will complete my Ph.D. and graduate in Fall 2024. I am on the job market, looking for opportunities in both academia and industry.

  • Email
  • Google Scholar
  • Facebook
  • LinkedIn
  • GitHub
  • SourceForge
Selected Projects

Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference

Singing Voice Synthesis, Singing Voice Conversion, Timbre Style Transfer, Zero-Shot Singing Synthesis

A unified framework for Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC), addressing the limitations of existing approaches in cross-domain SVS/SVC, output musicality, and the scarcity of singing data. The framework enables control over multiple aspects: language content based on lyrics, performance attributes based on a musical score, singing style and vocal techniques based on a selector, and voice identity based on a speech sample.

Paper

ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control

Singing Voice Synthesis, Expressive Performance Control, Singing Style, Diffusion Models

An SVS framework that leverages a cascade of diffusion models to generate realistic singing across multiple languages, styles, and techniques from scores and lyrics.


SingStyle111: A Multilingual Singing Dataset With Style Transfer

Singing Dataset, Style Transfer, Multilingual

A large, studio-quality, monophonic singing dataset covering 111 songs (224 versions), 8 singers, 12.8 hours, 3 languages, and various singing styles (including creative improvisations). It includes a style-transfer demonstration: 80 songs, each performed by the same singer in at least two distinct singing styles. Detailed human annotations include phrase segmentation, lyrics phoneme-audio alignment, performance MIDI, and scores.


Deep Music Generation via Music Frameworks

Deep Learning, Hierarchical Music Structure, Controllability

With Music Frameworks (a hierarchical music structure representation) and new musical features, we combine music domain knowledge with deep learning and factor music generation into sub-problems, which allows simpler models, requires less data, and achieves high musicality.


Computational Study of Repetition and Structure in Popular Music

Music Repetition Structure, Music Perception, Deep Music Evaluation

What is missing in deep music generation? A study of repetition and structure in popular music that illustrates important music construction principles through analyses of two popular music datasets (Chinese and American). It highlights both challenges and opportunities for deep-learning music generation and suggests new formal music criteria and evaluation methods.


Automatic Analysis of Hierarchical Music Structure

Music Similarity, Segmentation, Repetition

Introduces new algorithms for identifying two-level hierarchical music structure based on repetition. The automatically detected repetition structures reveal significant interactions between structure and chord progressions, melody, and rhythm, with different levels of the hierarchy interacting in different ways.


Personalized Stylistic Music Generation

Machine Learning, Music Domain Knowledge, Imitation, Repetition Structure

Designed a stylistic music generation system that captures structure, melody, chord progression, and bass styles from one or a few example pieces, and imitates those styles in a new piece using statistical machine learning models.


Human Computer Music Performance System

Serpent, wxWidgets, ZMQ

HCMP is an emerging class of computer music systems that perform live music together with human performers, with the goal of creating highly autonomous artificial performers that can fill human roles.


Digitalization of Pipa Performance Techniques

Python, MIDI, MusicXML

Designed a series of computational models for common Pipa performance techniques using an analysis-by-synthesis method, leading to much more realistic synthesized performances.


Mobile Orchestra (v1.0 Ringtone)

Personal Hackathon, SuperCollider, JavaScript, Open Sound Control

Developed a mobile web app with SuperCollider that lets people use mobile gestures (speed, range, direction) to control melodies (such as ringtones) in real time as if they were playing musical instruments, adjusting pitch, volume, tempo, accompaniment, and special effects, and allowing a group of people to form a ringtone orchestra.


Selected Publications

  • Shuqi Dai, Ming-Yu Liu, Rafael Valle, Siddharth Gururani, “ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control”, in Proceedings of the 32nd ACM International Conference on Multimedia (MM), Melbourne, Australia, 2024. [paper]

  • Shuqi Dai, Siqi Chen, Yuxuan Wu, Ruxin Diao, Roy Huang, and Roger B. Dannenberg, “SingStyle111: A Multilingual Singing Dataset With Style Transfer”, in Proceedings of the 24th International Society for Music Information Retrieval Conference, Milan, Italy, 2023. [paper][video][poster] 

  • Shuqi Dai, Huiran Yu, and Roger B. Dannenberg, “What Is Missing In Deep Music Generation? A Study of Repetition and Structure in Popular Music”, in Proceedings of the 23rd International Society for Music Information Retrieval Conference, Bengaluru, India, 2022. [paper] [video] [poster]

  • Shuqi Dai, Zeyu Jin, Celso Gomes, Roger B. Dannenberg, “Controllable Deep Melody Generation via Hierarchical Music Structure Representation”, in Proceedings of the 22nd International Society for Music Information Retrieval Conference, Online, 2021. [paper] [video] [poster] [demo]

  • Shuqi Dai, Xichu Ma, Ye Wang, Roger B. Dannenberg, “Personalized Popular Music Generation Using Imitation and Structure”, arXiv preprint arXiv:2105.04709, 2021. [paper] [demo]

  • Shuqi Dai, Huan Zhang, Roger B. Dannenberg, “Automatic Analysis and Influence of Hierarchical Structure on Melody, Rhythm and Harmony in Popular Music”, in Proceedings of the 2020 Joint Conference on AI Music Creativity and International Workshop on Music Metacreation (CSMC+MUME), Stockholm, Sweden, Oct 2020. [paper] [video] [code]

  • Z. Wang, K. Chen, J. Jiang, Y. Zhang, M. Xu, Shuqi Dai, X. Gu, G. Xia, “POP909: A Pop-song Dataset for Music Arrangement Generation”, in Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, Canada, 2020. [paper]

  • Gus G. Xia, Shuqi Dai, “Music Style Transfer Issues: A Position Paper”, in Proceedings of the 6th International Workshop on Music Metacreation (MUME), Salamanca, Spain, June 2018. [paper]

  • Shuqi Dai, Gus G. Xia, “Computational Models for Common Pipa Techniques”, Best Student Paper, the 5th National Conference on Sound and Music Technology, Oct 2017.


