Joshi et al., 2016 - Google Patents

Modified mean and variance normalization: transforming to utterance-specific estimates

Document ID
7249241965735153674
Author
Joshi V
Prasad N
Umesh S
Publication year
2016
Publication venue
Circuits, systems, and signal processing

Snippet

Cepstral mean and variance normalization (CMVN) is an efficient noise compensation technique popularly used in many speech applications. CMVN eliminates the mismatch between training and test utterances by transforming them to zero mean and unit variance …
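
As a rough illustration of the baseline the snippet describes, the sketch below applies conventional per-utterance CMVN, so that each cepstral dimension ends up with zero mean and unit variance. It is not the modified normalization proposed by Joshi et al.; the function name `cmvn`, the `eps` guard, and the toy feature values are assumptions made for this example only.

```python
import math


def cmvn(features, eps=1e-10):
    """Conventional per-utterance CMVN (illustrative sketch only).

    features: list of frames, each a list of cepstral coefficients.
    Returns new frames in which every coefficient dimension has
    (approximately) zero mean and unit variance over the utterance.
    """
    n_frames = len(features)
    n_dims = len(features[0])
    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[d] for f in features) / n_frames for d in range(n_dims)]
    # Per-dimension population standard deviation. The eps term is an
    # assumption of this sketch: it avoids division by zero when a
    # dimension is constant across the utterance.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in features) / n_frames) + eps
        for d in range(n_dims)
    ]
    return [[(f[d] - means[d]) / stds[d] for d in range(n_dims)] for f in features]


# Hypothetical 3-frame, 2-coefficient utterance.
utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = cmvn(utterance)
```

Because the statistics are estimated from the utterance itself, training and test features are mapped to the same first- and second-order statistics, which is the mismatch reduction the snippet refers to.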

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations

Similar Documents

Publication Title
Li et al. An overview of noise-robust automatic speech recognition
Grozdić et al. Whispered speech recognition using deep denoising autoencoder
Kwon et al. Phoneme recognition using ICA-based feature extraction and transformation
US20080294432A1 (en) Signal enhancement and speech recognition
US20150317990A1 (en) Deep scattering spectrum in acoustic modeling for speech recognition
Takiguchi et al. Robust feature extraction using kernel PCA
Wu et al. Exemplar-based voice conversion using joint nonnegative matrix factorization
Takiguchi et al. PCA-Based Speech Enhancement for Distorted Speech Recognition.
JP2019090930A (en) Sound source enhancement device, sound source enhancement learning device, sound source enhancement method and program
Joshi et al. Modified mean and variance normalization: transforming to utterance-specific estimates
Sadjadi et al. Mean Hilbert envelope coefficients (MHEC) for robust speaker recognition
Selva Nidhyananthan et al. Noise robust speaker identification using RASTA–MFCC feature with quadrilateral filter bank structure
Ahmad et al. Improving Children's Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion.
Sainath et al. Raw multichannel processing using deep neural networks
Ghai et al. Pitch adaptive MFCC features for improving children’s mismatched ASR
Zhang et al. Distant-talking speaker identification by generalized spectral subtraction-based dereverberation and its efficient computation
Zouhir et al. A bio-inspired feature extraction for robust speech recognition
Obuchi et al. Normalization of time-derivative parameters using histogram equalization.
Sarangi et al. Improved speech-signal based frequency warping scale for cepstral feature in robust speaker verification system
Peng et al. Effective Phase Encoding for End-To-End Speaker Verification.
Han et al. Reverberation and noise robust feature compensation based on IMM
Shukla et al. A subspace projection approach for analysis of speech under stressed condition
Kaur et al. Optimizing feature extraction techniques constituting phone based modelling on connected words for Punjabi automatic speech recognition
Chatterjee et al. Auditory model-based design and optimization of feature vectors for automatic speech recognition
Pradhan et al. Speaker verification in sensor and acoustic environment mismatch conditions