Shuqi Dai
戴舒琪
Music & Technology
Hi, I am a Ph.D. candidate in the Computer Science Department at Carnegie Mellon University, advised by Prof. Roger Dannenberg. My research emphasizes musical perspectives in technology innovation to transform how we listen to, understand, perform, and create music. I also aim to unearth the potential of music technology to benefit individuals and society by combining it with other areas, such as health and education. I have interned at Adobe, NVIDIA, Microsoft Research Asia, Google, and Hulu. Before joining CMU, I received my B.S. in Computer Science from Peking University in China in July 2018.
I am a professional Pipa (a traditional Chinese instrument) player with more than 20 years of performance experience, tutored by the renowned Pipa musician Prof. Yabo Pan (潘亚伯). I have received five years of formal Western music training, with straight A's, at the CMU School of Music. I also compose and sing.
I will complete my Ph.D. and graduate in Fall 2024. I am on the job market, looking for opportunities in both academia and industry.
Everyone-Can-Sing: Zero-Shot Singing Voice Synthesis and Conversion with Speech Reference
Singing Voice Synthesis, Singing Voice Conversion, Timbre Style Transfer, Zero-Shot Singing Synthesis
A unified framework for Singing Voice Synthesis (SVS) and Singing Voice Conversion (SVC) that addresses key limitations of existing approaches: poor cross-domain SVS/SVC, limited output musicality, and the scarcity of singing data. The framework enables control over multiple aspects: language content from lyrics, performance attributes from a musical score, singing style and vocal techniques from a selector, and voice identity from a speech sample.
ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control
Singing Voice Synthesis, Expressive Performance Control, Singing Style, Diffusion Models
An SVS framework that leverages a cascade of diffusion models to generate realistic singing across multiple languages, styles, and techniques from scores and lyrics.
SingStyle111: A Multilingual Singing Dataset With Style Transfer
Singing Dataset, Style Transfer, Multilingual
A large, studio-quality, monophonic singing dataset covering 111 songs (224 versions), 8 singers, 12.8 hours, 3 languages, and various singing styles (including creative improvisations). It includes style-transfer demonstrations: 80 songs performed in at least two distinct singing styles by the same singer. The dataset provides detailed human annotations such as phrase segmentation, lyric phoneme-audio alignment, performance MIDI, and scores.
Deep Music Generation via Music Frameworks
Deep Learning, Hierarchical Music Structure, Controllability
Using Music Frameworks (a hierarchical music structure representation) and new musical features, we combine music domain knowledge with deep learning and factor music generation into sub-problems, which allows simpler models, requires less data, and achieves high musicality.
Computational Study of Repetition and Structure in Popular Music
Music Repetition Structure, Music Perception, Deep Music Evaluation
What is missing in deep music generation? A study of repetition and structure in popular music that illustrates important music construction principles through analyses of two popular-music datasets (Chinese and American). It presents challenges as well as opportunities for deep-learning music generation and suggests new formal music criteria and evaluation methods.
Automatic Analysis of Hierarchical Music Structure
Music Similarity, Segmentation, Repetition
Introduces new algorithms for identifying a two-level hierarchical music structure based on repetition. The automatically detected hierarchical repetition structures reveal significant interactions between structure and chord progressions, melody, and rhythm, with different levels of the hierarchy interacting in different ways.
Personalized Stylistic Music Generation
Machine Learning, Music Domain Knowledge, Imitation, Repetition Structure
Designed a stylistic music generation system that captures structure, melody, chord progression, and bass styles from one or a few example pieces, and imitates those styles in a new piece using statistical machine learning models.
Mobile Orchestra (v1.0 Ringtone)
Personal Hackathon, SuperCollider, JavaScript, Open Sound Control
Developed a mobile web app with SuperCollider that lets people use mobile gestures (speed, range, direction) to control melodies (such as ringtones) in real time as if they were playing musical instruments, adjusting pitch, volume, tempo, accompaniment, special effects, etc., and allowing a group of people to form a ringtone orchestra.
Selected Publications
-
Shuqi Dai, Ming-Yu Liu, Rafael Valle, and Siddharth Gururani, “ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control”, in Proceedings of the 32nd ACM International Conference on Multimedia (MM), Melbourne, Australia, 2024. [paper]
-
Shuqi Dai, Siqi Chen, Yuxuan Wu, Ruxin Diao, Roy Huang, and Roger B. Dannenberg, “SingStyle111: A Multilingual Singing Dataset With Style Transfer”, in Proceedings of the 24th International Society for Music Information Retrieval Conference, Milan, Italy, 2023. [paper][video][poster]
-
Shuqi Dai, Huiran Yu, and Roger B. Dannenberg, “What Is Missing In Deep Music Generation? A Study of Repetition and Structure in Popular Music”, in Proceedings of the 23rd International Society for Music Information Retrieval Conference, Bengaluru, India, 2022. [paper] [video] [poster]
-
Shuqi Dai, Zeyu Jin, Celso Gomes, and Roger B. Dannenberg, “Controllable Deep Melody Generation via Hierarchical Music Structure Representation”, in Proceedings of the 22nd International Society for Music Information Retrieval Conference, Online, 2021. [paper] [video] [poster] [demo]
-
Shuqi Dai, Xichu Ma, Ye Wang, and Roger B. Dannenberg, “Personalized Popular Music Generation Using Imitation and Structure”, arXiv preprint arXiv:2105.04709, 2021. [paper] [demo]
-
Shuqi Dai, Huan Zhang, and Roger B. Dannenberg, “Automatic Analysis and Influence of Hierarchical Structure on Melody, Rhythm and Harmony in Popular Music”, in Proceedings of the 2020 Joint Conference on AI Music Creativity and International Workshop on Music Metacreation (CSMC+MUME), Stockholm, Sweden, Oct 2020. [paper] [video] [code]
-
Z. Wang, K. Chen, J. Jiang, Y. Zhang, M. Xu, Shuqi Dai, X. Gu, and G. Xia, “POP909: A Pop-song Dataset for Music Arrangement Generation”, in Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, Canada, 2020. [paper]
-
Gus G. Xia and Shuqi Dai, “Music Style Transfer Issues: A Position Paper”, in Proceedings of the 6th International Workshop on Music Metacreation (MUME), Salamanca, Spain, June 2018. [paper]
-
Shuqi Dai and Gus G. Xia, “Computational Models For Common Pipa Techniques” (Best Student Paper), the 5th National Conference on Sound and Music Technology, Oct 2017.