Science University Research Symposium (SURS)

Adding Guidance-Based Stylistic Controls to Flow Models for Coherent Audio Generation

Publication Date

2025

College

College of Sciences & Mathematics

Department

Chemistry and Physics, Department of

SURS Faculty Advisor

Dr. Scott Hawley

Presentation Type

Poster Presentation

Abstract

This research introduces an approach to audio generation that adds guidance-based stylistic controls to flow models. These models learn latent representations of audio and decode them back into sound. A project-based course was implemented in which students explored generative audio using Stable Audio Open. Using chords, melody, and beat as core guidance signals, a custom guidance model was trained on mix tracks from Belmont University’s audio archives to steer generation toward higher stylistic coherence. Structured listening tests were conducted to evaluate the perceptual impact of the guidance model on audio quality. Preliminary results suggest that guided generation improved musicality compared to baseline outputs. Beyond technical outcomes, this approach highlights the value of interactive, hands-on learning in audio AI education, where students learn by engaging deeply with model training, evaluation, and creative iteration.
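
As a rough illustration of what guidance in a flow model can look like, the sketch below integrates a toy two-dimensional latent with Euler steps and adds a scaled guidance gradient to the velocity at every step. It is not the method used in this project: base_velocity, guidance_gradient, the quadratic score, and all numeric values are hypothetical stand-ins for a trained flow model and a learned chord, melody, and beat guidance model.

```python
import numpy as np

def base_velocity(x, t):
    # Hypothetical stand-in for a trained flow model's velocity field:
    # it pulls the latent toward a fixed point (t is unused in this toy field).
    data_point = np.array([1.0, 0.0])
    return data_point - x

def guidance_gradient(x, target):
    # Gradient of the hypothetical score -0.5 * ||x - target||^2,
    # standing in for a learned guidance model (an assumption).
    return target - x

def sample(x0, target, guidance_scale, steps=100):
    # Euler integration of dx/dt = v(x, t) + scale * guidance gradient
    # over t in [0, 1].
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = base_velocity(x, t) + guidance_scale * guidance_gradient(x, target)
        x = x + dt * v
    return x

target = np.array([0.0, 1.0])  # hypothetical "style" target in latent space
unguided = sample([0.0, 0.0], target, guidance_scale=0.0)
guided = sample([0.0, 0.0], target, guidance_scale=4.0)
```

With the guidance scale set to zero the sampler follows the base velocity alone. Raising the scale pulls the final latent closer to the target, at the cost of moving away from where the unguided model would have landed.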

This document is currently not available here.
