Science University Research Symposium (SURS)
Adding Guidance-Based Stylistic Controls to Flow Models for Coherent Audio Generation
Publication Date
2025
College
College of Sciences & Mathematics
Department
Chemistry and Physics, Department of
SURS Faculty Advisor
Dr. Scott Hawley
Presentation Type
Poster Presentation
Abstract
This research introduces an approach to audio generation that adds guidance-based stylistic controls to flow models. These models learn latent representations of audio and decode them back into audio. In a project-based course, students explored generative audio using Stable Audio Open. With chords, melody, and beat as the core guidance mechanisms, a custom guidance model was trained on mix tracks from Belmont University’s audio archives to steer generation toward higher stylistic coherence. Structured listening tests were conducted to evaluate the perceptual impact of the guidance model on audio quality. Preliminary results suggest that guided generation improved musicality compared to baseline outputs. Beyond the technical outcomes, this approach highlights the value of interactive, hands-on learning in audio AI education, where students learn by engaging deeply with model training, evaluation, and creative iteration.
Recommended Citation
Alsaad, Zain; Betapudi, Simeon; Blackwood, Brody; Cassar, Marco; Chau, Kenneth; Cruz, Jayden; Dant, Evan; Duwal, Kritan; Gorney, Rush; Goskie, Maxwell; Konkle, Elijah; Patel, Hari; Patel, Mayur; Pinter, Brady; White, Christopher; and Hawley, Scott PhD, "Adding Guidance-Based Stylistic Controls to Flow Models for Coherent Audio Generation" (2025). Science University Research Symposium (SURS). 293.
https://repository.belmont.edu/surs/293
