RhythGen Demo Page

Utrecht University
Information and Computing Sciences

Abstract

Music is essential in video games; it enhances the immersion and engagement of players. However, players often disengage from game music due to excessive repetition or differing musical preferences. This issue is especially problematic in serious games for therapeutic use, such as Musical Attention Control Training (MACT). In these cases, sustained engagement and interaction with the game music are essential for the intervention. To improve therapeutic outcomes, these applications often use dynamic difficulty adjustment. By varying the music’s cognitive load, they stimulate attention and working memory. In rhythmic training, this is achieved by adjusting levels of syncopation. Controllable automatic music generation may offer a scalable solution. It enables the creation of a greater variety of music that can be adapted through syncopation to a player’s abilities, ultimately improving engagement and intervention effects.

To this end, we introduce RhythGen, a novel transformer-based music generator that extends the pretrained NotaGen model with time-varying control over rhythmic complexity. The model is designed for serious games: its primary control mechanism targets syncopation levels to adjust the music to a player’s training needs and abilities. We implement this via a custom, lightweight fine-tuning procedure on 1,000–1,500 songs. The procedure incorporates several conditioning mechanisms, including a novel attention modulation mechanism, which we compare against established methods such as in-attention and in-text conditioning. RhythGen is conditioned using one of several control representations: note-density and syncopation labels, or weight profiles derived from Inner Metric Analysis (IMA).
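The page does not show RhythGen’s implementation, but in-attention conditioning (as introduced by MuseMorphose) is commonly realized by summing a learned embedding of the control signal with the hidden states entering each transformer layer, so the condition is re-injected at every depth. A minimal PyTorch sketch under that assumption; all names (`InAttentionBlock`, `ctrl_emb`) are hypothetical, not RhythGen’s actual code:

```python
import torch
import torch.nn as nn

class InAttentionBlock(nn.Module):
    """One decoder layer with in-attention conditioning: a learned control
    embedding is summed with the hidden states before self-attention, so
    the condition re-enters the computation at every layer."""
    def __init__(self, d_model: int, n_heads: int, n_labels: int):
        super().__init__()
        self.ctrl_emb = nn.Embedding(n_labels, d_model)  # e.g. discrete syncopation labels
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))

    def forward(self, x, ctrl_ids, causal_mask=None):
        # ctrl_ids: (batch, seq_len) per-token control labels, e.g. the
        # target syncopation level of the bar each token belongs to.
        h = x + self.ctrl_emb(ctrl_ids)  # in-attention: inject before attention
        q = self.norm1(h)
        a, _ = self.attn(q, q, q, attn_mask=causal_mask, need_weights=False)
        h = h + a
        return h + self.ff(self.norm2(h))
```

In-text conditioning, by contrast, would place the control as extra tokens in the input sequence itself, which leaves the model free to ignore them; re-injecting the condition at every layer is what makes in-attention conditioning harder to overlook.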

Our evaluation explores the trade-off between generation quality and control adherence across these conditioning methods. We find that models using in-attention conditioning with discrete syncopation labels, combined with targeted voice-specific labelling and training, generate music with the specified rhythmic complexity. In contrast, in-text conditioning is largely ineffective. Our novel attention modulation mechanism successfully controls note density when used with IMA weight profiles, but fails to capture syncopation. Finally, a user study (n = 40) empirically confirms that our in-attention-conditioned models can produce enjoyable music with noticeable variations in rhythmic complexity and recognizable section boundaries, demonstrating RhythGen’s potential for use in MACT and beyond.
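The abstract does not define the discrete syncopation labels. Measures in the Longuet-Higgins & Lee family are a common basis: an onset on a metrically weak position that is not followed by an onset on the next stronger position counts as a syncopation. An illustrative Python sketch of how a per-bar score could be computed and bucketed into labels; the weights and bucketing below are hypothetical, not RhythGen’s exact metric:

```python
# Illustrative per-bar syncopation scoring on a 16th-note grid in 4/4.
# Metrical weights for the 16 grid positions of one bar (downbeat strongest).
WEIGHTS = [4, 1, 2, 1, 3, 1, 2, 1, 3, 1, 2, 1, 3, 1, 2, 1]

def syncopation_score(onsets: set[int]) -> int:
    """onsets: occupied grid positions (0..15) in one bar."""
    score = 0
    for pos in onsets:
        # Find the next position with strictly higher metrical weight.
        for nxt in range(pos + 1, 16):
            if WEIGHTS[nxt] > WEIGHTS[pos]:
                if nxt not in onsets:  # stronger beat left empty -> syncopation
                    score += WEIGHTS[nxt] - WEIGHTS[pos]
                break
    return score

def to_label(score: int) -> int:
    """Bucket the raw score into a small set of discrete labels."""
    return min(score // 3, 4)  # hypothetical 5-level labelling

print(syncopation_score({0, 4, 8, 12}))   # 0: straight quarter notes
print(syncopation_score({0, 6, 10, 14}))  # 2: off-beat onsets before empty beats
```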

Section-wise control over rhythmic complexity

Demonstration of syncopation control, with target syncopation (s) annotated per bar.

A second demonstration of syncopation control, with target syncopation (s) annotated per bar.

Section-wise control over note density

Demonstration of note density control, with target density (td) annotated per bar. A smaller td value indicates higher note density (one possible reading of td is sketched below).

A second demonstration of note density control, with target density (td) annotated per bar.
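The page leaves td undefined. One reading consistent with “smaller value means higher density” is the average bar time per onset, sketched below; the function and default bar length are assumptions, not the paper’s definition:

```python
from fractions import Fraction

def target_density(n_onsets: int, bar_len: Fraction = Fraction(4, 4)) -> Fraction:
    """One plausible reading of td: average time per onset in a bar
    (bar length / onset count), so more notes -> smaller td."""
    return bar_len / max(n_onsets, 1)

print(target_density(4))   # 1/4  -> quarter-note density
print(target_density(16))  # 1/16 -> sixteenth-note density (smaller td, denser)
```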

Highly syncopated right-hand ragtime sections

Right-hand ragtime sections with low syncopation

Poster