Mask Estimation Using Phase Information and Inter-channel Correlation for Speech Enhancement

Date

2022

Authors

Sowjanya D.
S, Shoba
Kar, Asutosh
Mladenovic, Vladimir

Abstract

The masking-based approach, which maps noisy speech to time–frequency (T–F) units, is the most commonly used training target in supervised learning algorithms and has a remarkable impact on their performance. Traditional T–F masks such as the ideal ratio mask (IRM) perform strongly but operate only in the magnitude domain. The bounded IRM with phase constraint (BIRMP) includes the phase difference but does not exploit channel correlation, whereas the proposed ratio mask (pRM) considers channel correlation but is computed only in the magnitude domain. This work proposes a new mask, the phase correlation ideal ratio mask (PCIRM), which includes both the inter-channel correlation and the phase difference between the noisy speech (NS), noise (N) and clean speech (CS). Considering these factors increases the proportion of CS and decreases the proportion of unwanted noise in the speech components, and conversely for the noise components, which makes the mask more precise. Experiments are conducted at different SNR levels using the TIMIT and NOISEX-92 datasets, and the results are compared with existing state-of-the-art approaches. The results show that the proposed mask outperforms BIRMP and pRM in terms of speech quality and intelligibility.
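
To make the magnitude-only limitation of the IRM concrete, the following is a minimal sketch of the standard IRM definition, (|S|^2 / (|S|^2 + |N|^2))^beta with beta = 0.5, applied to a few toy complex T–F coefficients. The function name `ideal_ratio_mask` and the sample values are illustrative assumptions; this is not the PCIRM, BIRMP or pRM formulation, none of which is specified in the abstract.

```python
import math

def ideal_ratio_mask(clean, noise, beta=0.5):
    """Standard IRM per T-F unit: (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    `clean` and `noise` are sequences of complex STFT coefficients.
    (Hypothetical helper, written for illustration only.)
    """
    mask = []
    for s, n in zip(clean, noise):
        ps, pn = abs(s) ** 2, abs(n) ** 2  # power of speech and noise
        total = ps + pn
        mask.append((ps / total) ** beta if total > 0 else 0.0)
    return mask

# Toy complex coefficients for three T-F units (made-up values).
clean = [3 + 0j, 0 + 1j, 0j]
noise = [0 + 4j, 1 + 0j, 2 + 0j]
irm = ideal_ratio_mask(clean, noise)

# Rotating the phase of the clean coefficients leaves the IRM unchanged,
# because only magnitudes enter the definition.
rotation = complex(math.cos(1.0), math.sin(1.0))
rotated = [s * rotation for s in clean]
irm_rotated = ideal_ratio_mask(rotated, noise)
```

Because `irm` and `irm_rotated` agree, the sketch shows that the IRM carries no information about the phase relationship between clean speech and noise, which is the gap that phase-aware masks aim to address.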