
Two-microphone based Noise Estimator for Hands-free Speech Enhancement in Composite Noise Environment

Abstract

This dissertation presents an algorithm for suppressing composite noise, comprising both stationary background noise and nonstationary interference noise, in a two-microphone speech enhancement system for robust hands-free speech communication. The background noise, such as vehicle engine noise or wind noise from an air conditioner, is assumed to be stationary, and the interference noise, such as another person’s voice, is assumed to be nonstationary. The beamforming approach provides robust noise suppression by generating a distortionless beam toward the desired speech signal while forming a null beam toward the interference noise signal. However, the noise cannot be completely eliminated when the beam is only roughly steered or when the adaptation is not fast enough to track noise variation. Moreover, the beamformer is not designed to handle the background noise. Hence, a beamformer normally suffers from residual noise and requires an additional post-filter. The nonstationary characteristics of the residual noise, however, render the conventional post-filter ineffective by disturbing the estimation of the background noise power. To deal with composite noise that includes both nonstationary interference noise and stationary background noise, a novel pre-filtering algorithm based on estimating a transfer function ratio within a generalized sidelobe canceller is proposed. Because this algorithm alone does not yield considerable improvement, it is combined with voice activity detection and post-filtering algorithms to improve the performance of the existing speech enhancement system. This dissertation proposes an effective two-microphone speech enhancement system using these algorithms, built on a beamforming structure such as the generalized sidelobe canceller. The pre-filtering algorithm can be divided into the following three stages.
The first stage estimates the transfer function ratio of the acoustic paths from the interference noise source to the microphones, as well as the powers of the background noise components. In the second stage, the estimated background noise powers are used to perform spectral subtraction on the input signals. Finally, the estimated transfer function ratio is used for speech enhancement on the primary channel, and an adaptive filter reduces the interference noise components. Since the transfer function ratio of the interference noise is estimated when the desired speech signal is absent, this absence is determined by a robust voice activity detection algorithm. In particular, a combination of the double combined Fourier transform and subsequent envelope line fitting is proposed as a feature set, so that the patterns of the feature envelope in speech and non-speech regions become more distinguishable. The post-filtering algorithm can be divided into the following four stages. The first stage estimates the power spectral density of the residual background noise, based on the detection of nonstationary signal-dominant time-frequency bins at the generalized sidelobe canceller output after the pre-filtering process. Second, the speech-dominant time-frequency bins are identified among the previously detected nonstationary signal-dominant time-frequency bins. Third, the power spectral densities of the speech and the residual interference noise are estimated. In the final stage, the bin-wise output signal-to-noise ratio is obtained from these power estimates, and a Wiener post-filter is constructed to attenuate the residual noise.
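As a rough illustration of two building blocks named above, the sketch below shows a textbook power spectral subtraction and a bin-wise Wiener gain computed from power estimates. It is not the dissertation's implementation. The function names, the spectral floor `floor`, the regularizer `eps`, and the choice to sum the residual background and interference powers into one noise term are all assumptions made for this example.

```python
def spectral_subtract(noisy_psd, noise_psd, floor=0.01):
    """Subtract an estimated noise power from each bin of a noisy power spectrum.

    `floor` (hypothetical parameter) keeps every bin at or above a small
    fraction of the noisy power so that no bin becomes negative.
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_psd, noise_psd)]


def wiener_gain(speech_psd, background_psd, interference_psd, eps=1e-12):
    """Bin-wise Wiener gain G = SNR / (1 + SNR).

    Assumption: the total residual noise power in a bin is the sum of the
    residual background and residual interference power estimates.
    """
    gains = []
    for s, b, i in zip(speech_psd, background_psd, interference_psd):
        snr = s / (b + i + eps)  # bin-wise output signal-to-noise ratio
        gains.append(snr / (1.0 + snr))
    return gains


if __name__ == "__main__":
    # Toy per-bin power values, not measured data.
    print(spectral_subtract([4.0, 1.0], [1.0, 2.0]))
    print(wiener_gain([3.0, 0.0], [0.5, 1.0], [0.5, 0.0]))
```

A gain near 1 leaves a speech-dominant bin almost untouched, while a gain near 0 attenuates a noise-dominant bin.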
The performance of the proposed enhancement algorithm was compared with that of conventional beamforming and post-filtering algorithms under various experimental conditions, using four objective quality measures: perceptual evaluation of speech quality, noise reduction, signal-to-noise ratio, and log-spectral distance.
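Of the four measures, log-spectral distance is simple enough to sketch directly. The version below uses one common definition: the root-mean-square difference of the log power spectra in dB within each frame, averaged over frames. The abstract does not give the exact formula used in the dissertation, so this definition, the function name, and the regularizer `eps` are assumptions.

```python
import math


def log_spectral_distance(ref_psd, est_psd, eps=1e-12):
    """Mean over frames of the RMS log-spectral difference in dB.

    ref_psd, est_psd: lists of frames, each frame a list of per-bin powers.
    `eps` (hypothetical parameter) avoids taking the log of zero.
    """
    per_frame = []
    for ref_frame, est_frame in zip(ref_psd, est_psd):
        squared = [
            (10.0 * math.log10((r + eps) / (e + eps))) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        per_frame.append(math.sqrt(sum(squared) / len(squared)))
    return sum(per_frame) / len(per_frame)


if __name__ == "__main__":
    # Toy spectra: every bin of the estimate is 10 dB below the reference.
    print(log_spectral_distance([[10.0, 10.0]], [[1.0, 1.0]]))
```

A lower value means the enhanced spectrum is closer to the clean reference.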


Table of Contents

Abstract
Contents
List of Figures
List of Tables
List of Abbreviations
Chapter 1. Introduction
1.1. Motivation
1.2. Research Goals and Contributions
1.3. Organization of Dissertation
Chapter 2. A New Pre-Filtering Algorithm
2.1. Concept of the GSC using Transfer Function
2.2. Estimation of BMN and BNP
2.3. Suppression of BNP
2.4. Enhanced of the Primary Signal
Chapter 3. A New VAD Algorithm
3.1. Review of Conventional Feature Models for VAD
3.2. DCFT and Line Fitting Model for Feature Set
Chapter 4. A New Post-Filtering Algorithm
4.1. Problem Formulation
4.2. Spectral Classification
4.3. Detection of Nonstationary Signal-Dominant TFBs
4.4. PSD Estimation of Speech and Interference Noise
4.5. Construction of Wiener Post-Filter
Chapter 5. Experimental Results
5.1. Performance Evaluation of Pre-Filtering Algorithm
5.1.1. Measure for performance analysis
5.1.2. Databases and experimental environments
5.1.3. Results of speech enhancement test
5.2. Performance Evaluation of VAD
5.2.1. Databases and experimental environments
5.2.2. Results of utterance based speech segment detection test
5.2.3. Results of frame based speech and non-speech discrimination test
5.2.4. Robustness of proposed feature set
5.3. Performance Evaluation of Post-Filtering Algorithm
5.3.1. Measure for performance analysis
5.3.2. Databases and experimental environments
5.3.3. Results of speech enhancement test
5.3.4. Evaluation of spatial scenarios
5.4. Integrating of the system
5.4.1. Computational load of proposed system
5.4.2. Speech recognition results
Chapter 6. Conclusions and Future Works
6.1. Conclusions
6.2. Future Works
Appendix
Bibliography
