Student Theses

Abstract

Non-negative matrix factorization (NMF) is an established method of performing audio source separation. Previous studies used NMF with supplementary systems to improve performance, but little has been done to investigate the perceptual effects of NMF parameters. The present study aimed to evaluate two NMF parameters for speech enhancement: the short-time Fourier transform (STFT) window duration and the divergence cost function. Two experiments were conducted. The first investigated the effect of STFT window duration on target speech intelligibility in a sentence keyword identification task. The second had participants rate the residual noise present in target speech separated with three different cost functions: the Euclidean distance (EU), the Kullback-Leibler (KL) divergence, and the Itakura-Saito (IS) divergence. A 92.9 ms window duration produced the highest intelligibility scores, while the IS divergence produced significantly lower residual noise levels than EU and KL. Additionally, significant positive correlations were found between subjective residual noise scores and objective metrics from the Blind Source Separation Evaluation (BSS_Eval) and Perceptual Evaluation methods for Audio Source Separation (PEASS) toolboxes. Results suggest that longer window durations, which provide finer frequency resolution, allow sources to be distinguished more accurately, improving intelligibility scores. The results also suggest that the IS divergence approximates high-frequency and transient components of audio more accurately, increasing the separation of speech and noise. The correlation results suggest that using full-bandwidth stimuli could increase the reliability of objective measures.
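The two parameters studied can be illustrated with a minimal sketch, assuming Python with NumPy. This is not the thesis's implementation. The cost functions are expressed through the standard β-divergence family from the NMF literature (β = 2 gives half the squared Euclidean distance, β = 1 the generalized KL divergence, β = 0 the IS divergence) and minimized with the usual multiplicative updates. The 44.1 kHz sample rate and 4096-sample frame used for the window arithmetic are assumptions, chosen only because they yield a 92.9 ms window. All function names are hypothetical.

```python
import numpy as np

def frame_duration_ms(n_fft, sample_rate):
    """Duration of one STFT frame in milliseconds (assumed parameters)."""
    return 1000.0 * n_fft / sample_rate

def bin_spacing_hz(n_fft, sample_rate):
    """Frequency resolution: longer frames give narrower bins."""
    return sample_rate / n_fft

def beta_divergence(V, V_hat, beta):
    """Total beta-divergence D(V | V_hat), summed over all entries.
    beta=2: half squared Euclidean; beta=1: generalized KL; beta=0: Itakura-Saito."""
    if beta == 2:
        return 0.5 * np.sum((V - V_hat) ** 2)
    if beta == 1:
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    if beta == 0:
        ratio = V / V_hat
        return np.sum(ratio - np.log(ratio) - 1)
    raise ValueError("this sketch only handles beta in {0, 1, 2}")

def nmf_beta(V, rank, beta, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative (freq x time) matrix V as W @ H
    using the standard multiplicative updates for the beta-divergence."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # time-varying activations
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V * V_hat ** (beta - 2))) / (W.T @ V_hat ** (beta - 1) + eps)
        V_hat = W @ H + eps
        W *= ((V * V_hat ** (beta - 2)) @ H.T) / (V_hat ** (beta - 1) @ H.T + eps)
    return W, H

# Window arithmetic under the assumed 44.1 kHz rate and 4096-sample frame.
print(round(frame_duration_ms(4096, 44100), 1))  # 92.9 (ms)
print(round(bin_spacing_hz(4096, 44100), 2))     # 10.77 (Hz per bin)
```

In an actual separation setting, the columns of `W` would be partitioned into speech and noise bases and each source rebuilt from its own subset. That step is omitted here. Note that the plain multiplicative updates are guaranteed not to increase the cost only for β between 1 and 2. For IS (β = 0) they are a common heuristic.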

Date

5-12-2020

First Advisor

Wesley A. Bulla

Second Advisor

Song Hui Chon

Third Advisor

Doyuen Ko

Fourth Advisor

Eric Tarr

Department

Audio Engineering

College

Entertainment and Music Business, Mike Curb College of

Document Type

Thesis

Degree

Master of Science in Audio Engineering (MSAE)

Degree Level

Master's

Degree Grantor

Belmont University

Keywords

audio engineering; Fourier transform; matrix; audio source separation; coding; STFT; speech; psychoacoustics
