Song et al., 2006

Speaker attention system for mobile robots using microphone array and face tracking

Document ID
17960305091201117657
Author
Song K
Hu J
Tsai C
Chou C
Cheng C
Liu W
Yang C
Publication year
2006
Publication venue
Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006.

Snippet

This paper presents a real-time human-robot interface system (HRIS) that processes both speech and vision information to improve the quality of communication between a human and an autonomous mobile robot. The HRIS contains a real-time speech attention system and a …

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search

Similar Documents

Publication, Title
US7536029B2 (en) Apparatus and method performing audio-video sensor fusion for object localization, tracking, and separation
CN111370014B (en) System and method for multi-stream object-speech detection and channel fusion
Okuno et al. Robot audition: Its rise and perspectives
Sasaki et al. Multiple sound source mapping for a mobile robot by self-motion triangulation
EP1643769B1 (en) Apparatus and method performing audio-video sensor fusion for object localization, tracking and separation
Chung et al. Who said that?: Audio-visual speaker diarisation of real-world meetings
US8538751B2 (en) Speech recognition system and speech recognizing method
Yamamoto et al. Enhanced robot speech recognition based on microphone array source separation and missing feature theory
Grondin et al. Sound event localization and detection using CRNN on pairs of microphones
Nakamura et al. Intelligent sound source localization and its application to multimodal human tracking
Ince et al. Ego noise suppression of a robot using template subtraction
Yamamoto et al. Improvement of robot audition by interfacing sound source separation and automatic speech recognition with missing feature theory
WO2007138503A1 (en) Method of driving a speech recognition system
Song et al. Speaker attention system for mobile robots using microphone array and face tracking
KR100822880B1 (en) Speaker Recognition System and Method through Audio-Video Based Sound Tracking in Intelligent Robot Environment
JP2006251266A (en) Audiovisual linkage recognition method and apparatus
Ince et al. Assessment of general applicability of ego noise estimation
Brueckmann et al. Adaptive noise reduction and voice activity detection for improved verbal human-robot interaction using binaural data
CN110992971A (en) Method for determining voice enhancement direction, electronic equipment and storage medium
KR20190059381A (en) Method for Device Control and Media Editing Based on Automatic Speech/Gesture Recognition
Kim et al. Auditory and visual integration based localization and tracking of humans in daily-life environments
Abutalebi et al. Performance improvement of TDOA-based speaker localization in joint noisy and reverberant conditions
Kim et al. A real-time sound source localization system for robotic vacuum cleaners with a microphone array
Wang et al. Real-time automated video and audio capture with multiple cameras and microphones
Petsatodis et al. Voice activity detection using audio-visual information