Joshi et al., 2016 - Google Patents

Modified mean and variance normalization: transforming to utterance-specific estimates

Document ID: 7249241965735153674
Authors: Joshi V; Prasad N; Umesh S
Publication year: 2016
Publication venue: Circuits, Systems, and Signal Processing

Snippet

Cepstral mean and variance normalization (CMVN) is an efficient noise compensation technique popularly used in many speech applications. CMVN eliminates the mismatch between training and test utterances by transforming them to zero mean and unit variance …
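
As a minimal sketch of the conventional per-utterance CMVN described in the snippet (not the modified method this paper proposes), each cepstral coefficient can be shifted and scaled using statistics estimated from the utterance itself. The function name `cmvn`, the `eps` guard, and the toy `utterance` data are assumptions made for illustration only.

```python
import math


def cmvn(frames, eps=1e-10):
    """Per-utterance cepstral mean and variance normalization.

    frames: list of frames, each a list of cepstral coefficients
            (all frames must have the same length).
    eps:    hypothetical floor that avoids division by zero when a
            coefficient is constant across the utterance.
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])

    # Mean of each coefficient over all frames of the utterance.
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_coeffs)]

    # Population standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n_frames)
        for d in range(n_coeffs)
    ]

    # Shift to zero mean and scale to unit variance, per coefficient.
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(n_coeffs)]
        for f in frames
    ]


# Toy utterance: 3 frames, 2 coefficients each (made-up values).
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(utterance)
```

Applying the same function to both training and test utterances is what gives the two sets matching first- and second-order statistics, which is the mismatch reduction the snippet refers to.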

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations

Similar Documents

Publication	Publication Date	Title
Li et al.	2014	An overview of noise-robust automatic speech recognition
Grozdić et al.	2017	Whispered speech recognition using deep denoising autoencoder
Kwon et al.	2004	Phoneme recognition using ICA-based feature extraction and transformation
US20080294432A1 (en)	2008-11-27	Signal enhancement and speech recognition
US20150317990A1 (en)	2015-11-05	Deep scattering spectrum in acoustic modeling for speech recognition
Takiguchi et al.	2006	Robust feature extraction using kernel PCA
Wu et al.	2015	Exemplar-based voice conversion using joint nonnegative matrix factorization
Takiguchi et al.	2007	PCA-Based Speech Enhancement for Distorted Speech Recognition
JP2019090930A (en)	2019-06-13	Sound source enhancement device, sound source enhancement learning device, sound source enhancement method and program
Joshi et al.	2016	Modified mean and variance normalization: transforming to utterance-specific estimates
Sadjadi et al.	2012	Mean Hilbert envelope coefficients (MHEC) for robust speaker recognition
Selva Nidhyananthan et al.	2016	Noise robust speaker identification using RASTA–MFCC feature with quadrilateral filter bank structure
Ahmad et al.	2017	Improving Children's Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion
Sainath et al.	2017	Raw multichannel processing using deep neural networks
Ghai et al.	2015	Pitch adaptive MFCC features for improving children’s mismatched ASR
Zhang et al.	2014	Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation
Zouhir et al.	2014	A bio-inspired feature extraction for robust speech recognition
Obuchi et al.	2003	Normalization of time-derivative parameters using histogram equalization
Sarangi et al.	2020	Improved speech-signal based frequency warping scale for cepstral feature in robust speaker verification system
Peng et al.	2021	Effective Phase Encoding for End-To-End Speaker Verification
Han et al.	2013	Reverberation and noise robust feature compensation based on IMM
Shukla et al.	2016	A subspace projection approach for analysis of speech under stressed condition
Kaur et al.	2016	Optimizing feature extraction techniques constituting phone based modelling on connected words for Punjabi automatic speech recognition
Chatterjee et al.	2010	Auditory model-based design and optimization of feature vectors for automatic speech recognition
Pradhan et al.	2012	Speaker verification in sensor and acoustic environment mismatch conditions