KR102093030B1

KR102093030B1 - Smart projector and method for controlling thereof

Info

Publication number: KR102093030B1
Application number: KR1020180087688A
Authority: KR
Inventors: 박성흠; 김영훈; 강승원
Original assignee: (주)휴맥스
Priority date: 2018-07-27
Filing date: 2018-07-27
Publication date: 2020-03-24
Anticipated expiration: 2038-07-27
Also published as: WO2020022573A1; KR20200012414A

Abstract

The present invention relates to a method for controlling media content using a smart device. According to one aspect of the present invention, a media content control method is performed by a smart device having operating states that include a wake-up word detection state, in which the device determines whether a sound signal received from the outside contains a wake-up word, and a listening state, in which the device recognizes a voice command contained in the sound signal. The method includes adjusting the volume of the media content when the device enters the listening state while playing the media content, and displaying text data corresponding to the volume-adjusted audio data of the media content together with the video data of the media content.

Description

Smart device and its control method {SMART PROJECTOR AND METHOD FOR CONTROLLING THEREOF}

The present invention relates to a smart device and a control method thereof and, more particularly, to a smart device that controls the output of feedback in a listening mode and a control method thereof.

As the recognition accuracy of speech recognition technology has improved, the "artificial intelligence voice assistant" function has increasingly been built into various smart devices. Amid this trend, smart speakers equipped with a voice assistant function are becoming a core device in the smart home field, in step with the spread of the Internet of Things (IoT).

In the smart speaker field, since Amazon released Echo, the first smart speaker, in 2014, not only IT giants such as Google, Apple, and Facebook but also Korean companies such as Daum Kakao and Naver have been racing to release smart speakers running their own software, and competition is fierce.

A smart speaker basically receives and executes commands from a user, or converses with the user, through voice, so it has the advantage that the user can use it while moving freely about an indoor space such as a home or office. However, user convenience may suffer in some situations because of the limited amount of information that audio-type information can carry, the difficulty of handling visual information, and the non-persistence of the information output.

One object of the present invention is to provide a smart device that receives a user voice without noise during media content playback, and a control method thereof.

One object of the present invention is to provide a smart device that performs a listening mode without disturbing the media content being played, and a control method thereof.

The problems to be solved by the present invention are not limited to those described above, and problems not mentioned will be clearly understood from this specification and the accompanying drawings by those of ordinary skill in the art to which the present invention pertains.

According to one aspect of the present invention, there may be provided a method of controlling a smart device, the method being a media content control method performed by a smart device having operating states that include a wake-up word detection state for determining whether a received sound signal contains a wake-up word and a listening state for recognizing a voice command contained in the sound signal, the method comprising: playing media content related to a first voice command input by a user, by outputting audio data of the media content through an audio output module of the smart device and displaying video data of the media content through a video output module of the smart device; entering the wake-up word detection state while maintaining playback of the media content after playback has started; determining, by the smart device in the wake-up word detection state, whether a first sound signal received during playback of the media content contains the wake-up word; entering the listening state while maintaining playback of the media content when the first sound signal contains the wake-up word; and performing a first operation related to the media content upon entering the listening state, wherein performing the first operation includes adjusting the volume of the audio data by reducing or removing it, and displaying text data corresponding to the volume-adjusted audio data of the media content together with the video data of the media content.
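
The sequence of steps in this aspect can be read as a small state machine. The following Python sketch is only an illustration under stated assumptions: the names `SmartDevice`, `on_sound`, and `_first_operation`, the wake-up word string, and the reduced volume value of 0.2 are hypothetical and do not come from the specification, and wake-up word detection is reduced to a substring check on text that is assumed to be already transcribed.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    WAKE_WORD_DETECTION = auto()  # checks incoming sound for the wake-up word
    LISTENING = auto()            # recognizes a voice command


@dataclass
class SmartDevice:
    wake_word: str = "hello device"  # hypothetical wake-up word
    ducked_volume: float = 0.2       # hypothetical reduced level; 0.0 would remove the audio
    state: State = State.WAKE_WORD_DETECTION
    playing: bool = False
    volume: float = 1.0
    subtitles_on: bool = False

    def play(self) -> None:
        """Start playback, then keep playing while waiting for the wake-up word."""
        self.playing = True
        self.state = State.WAKE_WORD_DETECTION

    def on_sound(self, transcript: str) -> None:
        """Handle a sound signal, simplified here to already-transcribed text."""
        if self.state is not State.WAKE_WORD_DETECTION:
            return
        if self.wake_word in transcript.lower():
            self.state = State.LISTENING  # playback itself is not stopped
            if self.playing:
                self._first_operation()

    def _first_operation(self) -> None:
        """Reduce the audio volume and show text for the affected audio."""
        self.volume = min(self.volume, self.ducked_volume)
        self.subtitles_on = True


device = SmartDevice()
device.play()
device.on_sound("some background chatter")            # no wake-up word: nothing changes
device.on_sound("Hello device, what's the weather?")  # wake-up word: enter listening
```

After the second call the sketch is in the listening state with playback still running, the volume lowered, and text display switched on.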

According to another aspect of the present invention, there may be provided a method of controlling a smart device, the method being a media content control method performed by a smart device having operating states that include a wake-up word detection state for determining whether a sound signal received from the outside contains a wake-up word and a listening state for recognizing a voice command contained in the sound signal, the method comprising: playing media content related to a first voice command input by a user, by outputting audio data of the media content through an audio output module of the smart device and displaying video data of the media content through a video output module of the smart device; entering the wake-up word detection state while maintaining playback of the media content after playback has started; determining, by the smart device in the wake-up word detection state, whether a first sound signal received during playback of the media content contains the wake-up word; entering the listening state while maintaining playback of the media content when the first sound signal contains the wake-up word; and performing, upon entering the listening state, one of pausing playback of the media content and adjusting the volume of the audio data of the media content, depending on the type of the media content.

According to yet another aspect of the present invention, there may be provided a smart device having operating states that include a wake-up word detection state for determining whether a sound signal received from the outside contains a wake-up word and a listening state for recognizing a voice command contained in the sound signal, the smart device comprising: a voice input module that receives a sound signal; an audio output module that outputs audio; a video output module that displays video; and a controller that plays media content related to a first voice command input by a user, by outputting audio data of the media content through the audio output module and displaying video data of the media content through the video output module, enters the wake-up word detection state while maintaining playback of the media content after playback has started, determines, in the wake-up word detection state, whether a first sound signal received through the voice input module during playback of the media content contains the wake-up word, enters the listening state while maintaining playback of the media content when the first sound signal contains the wake-up word, and performs a first operation related to the media content upon entering the listening state, wherein the controller performs the first operation by adjusting the volume of the audio data by reducing or removing it and displaying text data corresponding to the volume-adjusted audio data of the media content together with the video data of the media content.

The means for solving the problems of the present invention are not limited to those described above, and means not mentioned will be clearly understood from this specification and the accompanying drawings by those of ordinary skill in the art to which the present invention pertains.

According to the present invention, when the smart device enters the listening mode during media content playback, it pauses the media content or removes or reduces the volume of the media content, thereby preventing sound from the playback from mixing into the user voice as noise.

According to the present invention, when the smart device enters the listening mode during media content playback, it displays subtitles in place of the removed or reduced audio of the media content being played, so that the user can keep following the media content even in the listening mode.

The effects of the present invention are not limited to those described above, and effects not mentioned will be clearly understood from this specification and the accompanying drawings by those of ordinary skill in the art to which the present invention pertains.

FIG. 1 is a block diagram of a smart device according to an embodiment of the present invention.
FIGS. 2 to 6 are diagrams of some implementations of a smart device according to an embodiment of the present invention.
FIGS. 7 and 8 are diagrams of an example of how the operating modes of a smart device according to an embodiment of the present invention are run.
FIG. 9 is a diagram of communication between a smart device and a voice assistant server according to an embodiment of the present invention.
FIG. 10 is a diagram of the operation of the standby mode during media content playback according to an embodiment of the present invention.
FIG. 11 is a diagram of the operation of the listening mode during media content playback according to an embodiment of the present invention.
FIG. 12 is a flowchart of a first example of a method of controlling a smart device according to an embodiment of the present invention.
FIG. 13 is a flowchart of a second example of a method of controlling a smart device according to an embodiment of the present invention.
FIGS. 14 and 15 are diagrams of the display of text data according to an embodiment of the present invention.
FIG. 16 is a flowchart of a third example of a method of controlling a smart device according to an embodiment of the present invention.
FIG. 17 is a flowchart of a fourth example of a method of controlling a smart device according to an embodiment of the present invention.
FIG. 18 is a flowchart of a fifth example of a method of controlling a smart device according to an embodiment of the present invention.

The embodiments described in this specification are intended to explain the spirit of the present invention clearly to those of ordinary skill in the art to which the present invention pertains. The present invention is therefore not limited to the embodiments described in this specification, and its scope should be construed as including modifications and variations that do not depart from its spirit.

The terms used in this specification were selected, as far as possible, from general terms currently in wide use, in consideration of their functions in the present invention. They may nevertheless vary with the intention of those of ordinary skill in the art, with custom, or with the emergence of new technology. Where a specific term is defined and used with an arbitrary meaning, that meaning will be stated separately. The terms used in this specification should therefore be interpreted on the basis of their substantive meaning and the content of the specification as a whole, not merely by their names.

The drawings attached to this specification are intended to make the present invention easy to explain. Shapes shown in the drawings may be exaggerated where necessary to aid understanding, so the present invention is not limited by the drawings.

In this specification, where a detailed description of a known configuration or function related to the present invention is judged likely to obscure the gist of the present invention, that description is omitted as necessary.

In addition, the method may further include performing a second operation related to the media content when the smart device returns to the wake-up word detection state after entering the listening state, wherein performing the second operation may include returning the volume of the audio data of the media content to its level before the first operation and ending the display of the text data.

In addition, the method may further include: outputting feedback related to a second voice command contained in a second sound signal received after entering the listening state; and performing a second operation related to the media content after outputting the feedback, wherein performing the second operation may include returning the volume of the audio data of the media content to its level before the first operation and ending the display of the text data.

In addition, the method may further include: outputting talk-back related to a second voice command contained in a second sound signal received after entering the listening state; and performing a second operation related to the media content after the output of the talk-back has ended, wherein performing the second operation may include returning the volume of the audio data of the media content to its level before the first operation and ending the display of the text data.
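
The pairing of the first and second operations amounts to saving the playback volume before it is lowered and putting it back afterwards. The sketch below shows one way this could look in Python. It is an assumption-laden illustration, not the patented implementation: `Playback`, `first_operation`, `second_operation`, and the volume values are hypothetical names and numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Playback:
    volume: float = 0.8
    subtitles_on: bool = False
    _saved_volume: Optional[float] = None  # level before the first operation

    def first_operation(self, ducked_volume: float = 0.0) -> None:
        """Lower (or remove) the audio volume and start showing text data."""
        if self._saved_volume is None:  # remember the original level only once
            self._saved_volume = self.volume
        self.volume = min(self.volume, ducked_volume)
        self.subtitles_on = True

    def second_operation(self) -> None:
        """Return the volume to its earlier level and stop showing text data."""
        if self._saved_volume is not None:
            self.volume = self._saved_volume
            self._saved_volume = None
        self.subtitles_on = False


playback = Playback(volume=0.8)
playback.first_operation(ducked_volume=0.1)
ducked = (playback.volume, playback.subtitles_on)

# The trigger for the second operation varies by variant: the return to the
# wake-up word detection state, the output of feedback, or the end of talk-back.
playback.second_operation()
restored = (playback.volume, playback.subtitles_on)
```

Saving the level only once keeps a repeated first operation from overwriting the original volume with the already-lowered one.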

In addition, the text data displayed first may include text data corresponding to the audio data at the time the volume is adjusted and text data corresponding to the audio data from a predetermined time before the volume is adjusted.

In addition, the text data displayed last may include text data corresponding to the audio data at the time the volume is restored and text data corresponding to the audio data up to a predetermined time after the volume is restored.
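
Taken together, the first-displayed and last-displayed conditions describe a display window that is padded on both sides of the reduced-volume interval. The following Python sketch picks the text cues that fall in such a window. The `Cue` structure, the function name `cues_to_display`, and the 3-second padding defaults are assumptions made for illustration; the specification only speaks of a "predetermined time".

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    start: float  # seconds from the start of the content
    end: float    # seconds from the start of the content
    text: str


def cues_to_display(cues: List[Cue], ducked_at: float, restored_at: float,
                    lead: float = 3.0, tail: float = 3.0) -> List[Cue]:
    """Return the cues that overlap [ducked_at - lead, restored_at + tail]."""
    window_start = ducked_at - lead
    window_end = restored_at + tail
    return [c for c in cues if c.end > window_start and c.start < window_end]


cues = [
    Cue(0.0, 4.0, "A"),
    Cue(4.0, 8.0, "B"),
    Cue(8.0, 12.0, "C"),
    Cue(12.0, 16.0, "D"),
    Cue(16.0, 20.0, "E"),
]
# Volume lowered at 9 s and restored at 11 s, so the window is 6 s to 14 s.
shown = cues_to_display(cues, ducked_at=9.0, restored_at=11.0)
```

With these inputs the selection covers cues B, C, and D: the cue playing while the volume is low plus the cues just before and just after it, which gives the viewer context on either side of the interruption.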

In addition, in performing the one operation, when the media content is a live stream, the volume of the audio data may be adjusted by reducing or removing it.

In addition, the method may further include displaying, when the volume of the audio data is adjusted, text data corresponding to the volume-adjusted audio data of the media content together with the video data of the media content.
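
The choice between pausing and adjusting the volume depends on the type of media content. A minimal Python sketch of that choice follows. Only the live-streaming case is stated in the text above; the `ON_DEMAND` category and the decision to pause it are assumptions added for illustration, as are all of the names.

```python
from enum import Enum, auto


class ContentType(Enum):
    LIVE_STREAM = auto()
    ON_DEMAND = auto()  # hypothetical category, not named in the text above


class Action(Enum):
    ADJUST_VOLUME_WITH_TEXT = auto()  # lower or remove audio, show text data
    PAUSE = auto()                    # pause playback


def action_on_listening(content_type: ContentType) -> Action:
    """Pick the operation to perform when the listening state is entered."""
    if content_type is ContentType.LIVE_STREAM:
        # A live stream is kept running; its audio is reduced or removed.
        return Action.ADJUST_VOLUME_WITH_TEXT
    # Assumption: content that can be resumed later is paused.
    return Action.PAUSE


live_action = action_on_listening(ContentType.LIVE_STREAM)
on_demand_action = action_on_listening(ContentType.ON_DEMAND)
```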

In addition, the controller may perform a second operation related to the media content when the smart device returns to the wake-up word detection state after entering the listening state, and may perform the second operation by returning the volume of the audio data of the media content to its level before the first operation and ending the display of the text data.

In addition, the controller may output feedback related to a second voice command contained in a second sound signal received after entering the listening state and perform a second operation related to the media content after outputting the feedback, and may perform the second operation by returning the volume of the audio data of the media content to its level before the first operation and ending the display of the text data.

In addition, the controller may output talk-back related to a second voice command contained in a second sound signal received after entering the listening state and perform a second operation related to the media content after the output of the talk-back has ended, and may perform the second operation by returning the volume of the audio data of the media content to its level before the first operation and ending the display of the text data.

In addition, the text data displayed first may include text data corresponding to the audio data at the time the volume is adjusted and text data corresponding to the audio data from a predetermined time before the volume is adjusted.

In addition, the text data displayed last may include text data corresponding to the audio data at the time the volume is restored and text data corresponding to the audio data up to a predetermined time after the volume is restored.

1. Smart device

1.1. Overview

A smart device 1000 according to an embodiment of the present invention is described below.

The smart device 1000 according to an embodiment of the present invention can interact with a user through voice. Specifically, the smart device 1000 can receive a user voice and output the feedback requested by a voice command contained in the user voice. Here, the smart device 1000 can output audio-type or video-type feedback.

1.2. Terms

Before describing the smart device 1000 according to an embodiment of the present invention further, some terms used in this specification are defined here.

1.2.1. Smart device

In this specification, the smart device 1000 may include any kind of device that can interact with a user through voice by means of a voice assistant function. The smart device 1000 is typically provided in the form of a smart speaker, but it may also be provided in the form of a smartphone, smart tablet, laptop, smart television, smart set-top box, smart display, smart projector, or the like.

In terms of components, the smart device 1000 may include, for voice-based interaction with a user, a voice input module 1200 for receiving the user voice and an audio output module 1300 for outputting audio-type feedback. The smart device 1000 may optionally further include a display module 1400 for outputting video-type feedback. The smart device 1000 may also include several other components, among them a communication module 1020 for communicating with an external device (for example, a voice assistant server) in order to realize the voice assistant function; these are described in detail later.

1.2.2. Voice assistant

The smart device 1000 according to an embodiment of the present invention can implement a voice assistant function. Here, the voice assistant function is a concept covering every function that enables interaction between the user and the smart device 1000 through voice.

One example of how the voice assistant function is implemented in this specification is as follows. The smart device 1000 forwards the user voice it has received to the voice assistant server 10. The voice assistant server 10 interprets the voice command contained in the user voice, obtains feedback data for the feedback requested by the voice command, and delivers that data to the smart device 1000. The smart device 1000 outputs feedback based on the feedback data received from the voice assistant server 10. In this way, the smart device 1000 implements the voice assistant function. Although the smart device 1000 is described above as implementing the voice assistant function in cooperation with the voice assistant server 10, in some cases the smart device 1000 may run the voice assistant function locally, as a standalone device.
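
The round trip between device and server, with local standalone execution as an alternative, can be sketched in Python as follows. This is a hedged illustration only: `handle_user_voice`, the `Backend` type, and the three stub backends are hypothetical, feedback data is simplified to a string, and falling back to local processing when the server cannot be reached is an assumption rather than something the specification states.

```python
from typing import Callable, Optional

# A backend maps recorded user voice (raw bytes) to feedback data (plain text here).
Backend = Callable[[bytes], str]


def handle_user_voice(audio: bytes, server: Optional[Backend], local: Backend) -> str:
    """Obtain feedback data from the server if possible, otherwise locally."""
    if server is not None:
        try:
            return server(audio)
        except ConnectionError:
            pass  # assumption: fall back to standalone processing
    return local(audio)


def fake_server(audio: bytes) -> str:
    """Stand-in for the voice assistant server."""
    return "server feedback"


def offline_server(audio: bytes) -> str:
    """Stand-in for a server that cannot be reached."""
    raise ConnectionError("no network")


def local_assistant(audio: bytes) -> str:
    """Stand-in for on-device voice assistant processing."""
    return "local feedback"


via_server = handle_user_voice(b"pcm", fake_server, local_assistant)
via_fallback = handle_user_voice(b"pcm", offline_server, local_assistant)
standalone = handle_user_voice(b"pcm", None, local_assistant)
```

Passing the backends in as callables keeps the device-side logic the same whether the interpretation happens on a server or on the device.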

1.2.3. Voice assistant server

The voice assistant server 10 may refer collectively to servers that implement the voice assistant function in cooperation with the smart device 1000 according to an embodiment of the present invention. The voice assistant server 10 may be responsible for receiving the user voice from the smart device 1000, extracting the voice command contained in the user voice, interpreting the voice command to generate feedback data for the feedback that the smart device 1000 is to output in response, and delivering that data to the smart device 1000.

In this specification, the voice assistant server 10 may be implemented, physically and/or functionally, as a single server or as a plurality of servers as needed. For example, the voice assistant server 10 may be a collection of several servers, such as a speech recognition server that extracts a voice command from the voice received from the smart device, an artificial intelligence server that interprets the extracted voice command, and a server that manages the multimedia content provided as feedback. In other words, the voice assistant server 10 may take the form of a single server that implements all of the functions mentioned above and any other functions needed for the voice assistant function, or the form of a collection of servers that divide those functions among themselves.
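
Whether the functions run on one server or several, they compose into a single pipeline from received voice to feedback data. The Python sketch below illustrates that composition. Every name in it (`make_assistant_server`, the three stage types, and the stub stages with their canned results) is hypothetical and stands in for real recognition, interpretation, and content services.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]             # voice -> voice command text
Interpreter = Callable[[str], str]              # command text -> intent label
ContentStore = Callable[[str], Dict[str, str]]  # intent -> feedback data


def make_assistant_server(recognize: Recognizer, interpret: Interpreter,
                          fetch: ContentStore) -> Callable[[bytes], Dict[str, str]]:
    """Compose three stages, which may live on one server or on three."""
    def serve(audio: bytes) -> Dict[str, str]:
        return fetch(interpret(recognize(audio)))
    return serve


def stub_recognize(audio: bytes) -> str:
    return "play the news"  # canned result instead of real speech recognition


def stub_interpret(command: str) -> str:
    return "PLAY_NEWS" if "news" in command else "UNKNOWN"


def stub_fetch(intent: str) -> Dict[str, str]:
    return {"intent": intent, "type": "video" if intent == "PLAY_NEWS" else "audio"}


server = make_assistant_server(stub_recognize, stub_interpret, stub_fetch)
feedback_data = server(b"pcm")
```

Because each stage is just a callable, any of them can be swapped for a remote call without changing the composition.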

1.2.4. User voice

In this specification, the smart device 1000 can acquire a user voice through the voice input module 1200. Here, the user voice may mean the voice uttered by a user of the smart device 1000. For example, the smart device 1000 receives sound through the voice input module 1200 in the standby mode or the listening mode described later, so if the user speaks while the smart device 1000 is in the standby mode or the listening mode, the smart device 1000 can acquire the user voice.

Meanwhile, the smart device 1000 according to an embodiment of the present invention cannot selectively receive only the user voice, separated from other sounds, so the sound it receives may include sounds other than the user voice. Examples include sound produced by feedback output from the smart device 1000 and sound from noise arising in the surroundings. Below, these are referred to as other sounds to distinguish them from the user voice. In particular, among the other sounds, sound produced by feedback output from the smart device 1000 is referred to as feedback sound.

1.2.5. 보이스 커맨드 1.2.5. Voice command

사용자 음성에는 사용자가 스마트 디바이스(1000)에 특정한 동작의 수행/처리를 요구하는 보이스 커맨드가 포함될 수 있다. 여기서, 사용자 음성은 음향학적 관점에서 정의되는 용어로 해석될 수 있으며, 보이스 커맨드는 정보처리적 관점에서 정의되는 용어로 해석될 수 있다. 따라서, 본 명세서에서는 사용자 음성과 보이스 커맨드를 구분되는 개념으로 이용할 것이다. 그러나, 사용자 음성과 보이스 커맨드 간의 구별이 항상 명확한 것은 아니며 경우에 따라서는 그 구별의 실익이 실질적으로 없을 수 있으므로, 본 명세서에서 후술되는 몇몇 기재들과 특허 청구 범위에서는 당업자가 이해 가능한 범위 내에서 사용자 음성과 보이스 커맨드의 두 용어가 혼용될 수도 있음을 미리 밝혀둔다. The user voice may include a voice command by which the user requests the smart device 1000 to perform or process a specific operation. Here, the user voice may be understood as a term defined from an acoustic point of view, and the voice command as a term defined from an information-processing point of view. Accordingly, this specification treats the user voice and the voice command as distinct concepts. However, the distinction between the two is not always clear, and in some cases it may have little practical benefit. It is therefore noted in advance that, in some of the descriptions below and in the claims, the terms user voice and voice command may be used interchangeably to the extent that a person skilled in the art would understand.

한편, 사용자가 반드시 보이스 커맨드를 통해서만 스마트 디바이스(1000)에 지시를 내려야만 하는 것은 아니다. 예를 들어, 사용자는 버튼 입력이나 터치, 제스처 등의 다양한 형태로 스마트 디바이스(1000)와 상호 작용하는 것도 가능하다.Meanwhile, the user does not necessarily have to give an instruction to the smart device 1000 only through a voice command. For example, the user may interact with the smart device 1000 in various forms such as button input, touch, and gesture.

1.2.6. 웨이크업 워드 1.2.6. Wake up word

본 발명의 일 실시예에 따른 스마트 디바이스(1000)는 수신되는 음성이 사용자가 스마트 디바이스(1000)를 이용하기 위해 발화한 사용자 음성인지 여부를 사전에 알 수 없거나 어렵기 때문에, 사용자로부터 특정한 동작의 수행/처리를 요구하는 사용자 음성을 수신하는 리스닝 모드로 진입하기 위한 예비적인 단계(phase)로 스탠바이 모드를 가질 수 있다. Since the smart device 1000 according to an embodiment of the present invention is difficult or unable to know in advance whether the received voice is a user voice spoken by the user to use the smart device 1000, it is difficult to perform a specific operation from the user. A standby mode may be used as a preliminary phase for entering a listening mode that receives a user voice requiring performance / processing.

웨이크업 워드(wake-up word)는 스탠바이 모드에서 리스닝 모드로 진입하기 위한 트리거를 의미할 수 있다. 구체적으로, 스마트 디바이스(1000)는 스탠바이 모드에서 수신되는 사용자 음성으로부터 웨이크업 워드를 검출할 수 있으며, 웨이크업 워드가 검출되면 리스닝 모드로 진입할 수 있다. 따라서, 웨이크업 워드란 스마트 디바이스(1000)가 스탠바이 모드로부터 리스닝 모드로 진입할 것을 지시하는 특수한 보이스 커맨드라고 볼 수 있다. 웨이크업 워드를 리스닝 모드로 진입하기 위한 트리거(trigger)로 이용하면, 사용자가 스마트 디바이스(1000)를 이용하려는 의도 없이 발화하는 사용자 음성에 대해서도 스마트 디바이스(1000)가 반응하는 오작동을 방지할 수 있다. The wake-up word may mean a trigger for entering the listening mode from the standby mode. Specifically, the smart device 1000 may detect the wake-up word in the user voice received in the standby mode and may enter the listening mode when the wake-up word is detected. Accordingly, the wake-up word can be regarded as a special voice command that instructs the smart device 1000 to move from the standby mode to the listening mode. Using the wake-up word as the trigger for entering the listening mode prevents the malfunction in which the smart device 1000 reacts to a user voice spoken without any intention of using the smart device 1000.
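The standby-to-listening transition described above can be sketched as a small state machine. This is only an illustrative sketch: it assumes the received voice has already been converted to text and uses a simple substring match, neither of which is stated in the specification. The names `Mode`, `next_mode`, and `wake_up_words` are hypothetical.

```python
from enum import Enum, auto
from typing import Sequence


class Mode(Enum):
    STANDBY = auto()    # only wake-up word detection is performed
    LISTENING = auto()  # voice commands are accepted


def next_mode(mode: Mode, heard_text: str,
              wake_up_words: Sequence[str] = ("hey speaker",)) -> Mode:
    """Return the operating mode after `heard_text` is received.

    Assumption: `heard_text` is a transcript of the received voice, and a
    wake-up word counts as detected when it occurs anywhere in that text.
    """
    text = heard_text.lower()
    if mode is Mode.STANDBY and any(word in text for word in wake_up_words):
        return Mode.LISTENING
    # Speech without a wake-up word never moves the device out of standby.
    return mode
```

For example, `next_mode(Mode.STANDBY, "what a nice day")` leaves the device in standby mode, which corresponds to the malfunction prevention described in the text.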

관점에 따라서는 웨이크업 워드를 보이스 커맨드의 일종으로 해석하는 것도 가능하지만, 본 명세서에서는 설명의 편의를 위해 웨이크업 워드와 보이스 커맨드를 가급적 구분하여 언급할 것이다. 다만, 경우에 따라 본 명세서에서 웨이크업 워드가 보이스 커맨드의 한 종류인 것으로 설명할 수도 있음을 미리 밝혀둔다. 또한, 이하의 기재에서 웨이크업 워드는 때때로 핫 워드라는 용어로 대체될 수 있다. Depending on the viewpoint, it is also possible to interpret the wake-up word as a kind of voice command, but in this specification, for convenience of description, the wake-up word and the voice command will be referred to separately. However, in some cases, it is revealed in advance that the wakeup word may be described as one type of voice command in this specification. Also, in the following description, the wakeup word may sometimes be replaced with the term hot word.

본 발명의 일 실시예에 따른 스마트 디바이스(1000)에는 단일한 혹은 몇몇의 단어 내지는 어구(phrase)가 웨이크업 워드로 설정될 수 있다. 일반적인 스마트 디바이스(1000)의 이용 환경에서 웨이크업 워드는 사용자가 스마트 디바이스(1000)를 부르는 호칭(call name) 등으로 이용될 수 있다. 예를 들어, 웨이크업 워드는 스마트 디바이스(1000)의 세팅 프로세스에서 결정될 수 있으며, 'computer', 'hey speaker' 등과 같이 정해질 수 있다. In the smart device 1000 according to an embodiment of the present invention, a single word, several words, or a phrase may be set as the wake-up word. In a typical usage environment of the smart device 1000, the wake-up word may serve as a call name by which the user addresses the smart device 1000. For example, the wake-up word may be determined during the setup process of the smart device 1000 and may be set to 'computer', 'hey speaker', or the like.

후술하겠지만, 보이스 커맨드는 임의의 단어, 어구 내지는 문장의 형태를 취할 수 있으므로 보이스 어시스턴트 기능을 수행하기 위해 사용자 음성으로부터 보이스 커맨드를 인식하고 해석하는 과정에는 복잡한 연산이 요구된다. 따라서, 스마트 디바이스(1000)가 자체적으로 임의의 보이스 커맨드를 인식하기 어렵기 때문에 대개 사용자 음성으로부터 보이스 커맨드를 처리하는 과정은 보이스 어시스턴트 서버(10)에서 이루어진다. 이에 반해 웨이크업 워드는 단일한 단어 또는 몇몇 단어의 집합에 불과하므로 사용자 음성으로부터 웨이크업 워드를 검출하는 과정은 스마트 디바이스(1000)에서 로컬로 처리될 수 있다. As will be described later, since the voice command may take the form of an arbitrary word, phrase, or sentence, a complicated operation is required in the process of recognizing and interpreting the voice command from the user's voice in order to perform the voice assistant function. Therefore, since the smart device 1000 is difficult to recognize any voice command itself, the process of processing the voice command from the user's voice is usually performed in the voice assistant server 10. On the other hand, since the wakeup word is only a single word or a set of several words, the process of detecting the wakeup word from the user's voice may be processed locally in the smart device 1000.
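The division of labor described above, with local wake-up word detection and server-side command processing, can be sketched as follows. The sketch makes several labeled assumptions: voice is represented as text, `server` is a stand-in callable for the voice assistant server 10, and the device returns to standby after one command. None of these details are given in the specification.

```python
from typing import Callable, Optional, Tuple

# Example wake-up words taken from the text; the matching method is assumed.
WAKE_UP_WORDS = ("computer", "hey speaker")


def detect_wake_up_word(text: str) -> bool:
    """Cheap local check against a small, fixed vocabulary."""
    lowered = text.lower()
    return any(word in lowered for word in WAKE_UP_WORDS)


def process(text: str, listening: bool,
            server: Callable[[str], str]) -> Tuple[bool, Optional[str]]:
    """Return (listening_after, feedback) for one received utterance.

    In standby, only the local detector runs and nothing is sent out.
    In listening mode, the utterance is forwarded to `server`.
    """
    if not listening:
        return detect_wake_up_word(text), None
    # Assumption: the device falls back to standby after one command.
    return False, server(text)
```

In this arrangement the server is contacted only for utterances received in the listening mode, so arbitrary speech in standby never leaves the device.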

1.2.7. 피드백 1.2.7. Feedback

본 명세서에서 피드백이란 스마트 디바이스(1000)가 사용자로부터 요청받은 지시 내지는 요구, 요청에 대하여 스마트 디바이스(1000)가 출력하는 응답을 의미할 수 있다. In the present specification, the feedback may mean a response output by the smart device 1000 in response to an instruction, request, or request received from the user.

본 명세서에서 피드백은 오디오-타입 피드백과 비디오-타입 피드백을 포함할 수 있다. 여기서, 오디오-타입 피드백은 음성 출력 모듈(1300)을 통해 출력되는 청각적 피드백을 의미할 수 있으며 이하에서는 설명의 편의를 위해 오디오-타입의 피드백을 토크 백이라고 지칭하기로 한다. 또 여기서, 비디오-타입 피드백은 디스플레이 모듈(1400)을 통해 출력되는 시각적 피드백을 의미할 수 있으며, 이하에서는 설명의 편의를 위해 비디오-타입의 피드백을 디스플레이 백이라고 지칭하기로 한다. In this specification, feedback may include audio-type feedback and video-type feedback. Here, the audio-type feedback may mean audible feedback output through the voice output module 1300, and the audio-type feedback will be referred to as a talkback for convenience of description. In addition, the video-type feedback may mean visual feedback output through the display module 1400, and the video-type feedback will be referred to as a display back in the following description.

한편, 토크 백(talk-back)이라는 용어가 '대화(talk)'라는 단어를 포함하고 있지만, 토크 백이 반드시 대화 형태의 피드백만을 의미하는 것은 아니며 음악이나 효과음 등의 청각적 피드백을 모두 아우르는 것으로 해석되어야 한다. On the other hand, although the term talk-back includes the word 'talk', it is interpreted that the talk-back does not necessarily mean conversational feedback, but encompasses both auditory feedback such as music and sound effects. Should be.

또한, 디스플레이 백(display-back)이 비디오-타입 피드백을 지칭하는 것이지만, 반드시 동화상 형태여야만 하는 것은 아니며 정지 영상까지 포함할 수 있다. 나아가 디스플레이 백은 시각적 피드백으로 토크 백은 청각적 피드백으로 설명하였으나, 이하에서는 설명의 편의를 위해 디스플레이 백이 시각적 피드백에 청각적 피드백이 더해진 것까지도 포괄하는 것으로 해석될 수 있다. 예를 들어, 본 명세서에서는 스마트 디바이스(1000)가 영화나 게임과 같이 시청각적 경험을 제공하는 멀티미디어 콘텐츠를 재생하는 것에 대해 스마트 디바이스(1000)가 디스플레이 백을 출력하는 것으로 설명할 수도 있다. 예를 들어, 디스플레이 백에는 TV 프로그램, 영화나 뮤직 비디오, 유튜브(youtube) 스트리밍 서비스 등이 포함될 수 있다. In addition, although the display back refers to video-type feedback, it need not be a moving image and may also include still images. Furthermore, although the display back has been described as visual feedback and the talkback as auditory feedback, hereinafter, for convenience of description, the display back may be interpreted as also covering visual feedback combined with auditory feedback. For example, this specification may describe the smart device 1000 as outputting a display back when it plays multimedia content that provides an audio-visual experience, such as a movie or a game. For example, the display back may include a TV program, a movie or music video, a YouTube streaming service, and the like.
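The terminology above can be summarized in a short sketch: feedback with any visual component is treated as a display back, even when sound accompanies it, and purely auditory feedback is a talkback. The names `FeedbackType`, `Feedback`, and `classify` are hypothetical and serve only to illustrate the definitions.

```python
from dataclasses import dataclass
from enum import Enum


class FeedbackType(Enum):
    TALK_BACK = "talk back"        # audio-type, via voice output module 1300
    DISPLAY_BACK = "display back"  # video-type, via display module 1400


@dataclass(frozen=True)
class Feedback:
    kind: FeedbackType
    has_audio: bool
    has_video: bool


def classify(has_audio: bool, has_video: bool) -> Feedback:
    """Label feedback according to the definitions in the text."""
    if has_video:
        # Visual feedback, with or without sound, is a display back.
        return Feedback(FeedbackType.DISPLAY_BACK, has_audio, True)
    if has_audio:
        # Speech, music, and sound effects are all talkback.
        return Feedback(FeedbackType.TALK_BACK, True, False)
    raise ValueError("feedback needs an audio or a video component")
```

Under these definitions a movie, which has both sound and picture, is classified as a display back, while music alone is a talkback.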

1.3. 스마트 디바이스의 구성 요소 1.3. Components of smart devices

이하에서는 본 발명의 일 실시예에 따른 스마트 디바이스(1000)의 구성 요소에 관하여 설명한다. Hereinafter, components of the smart device 1000 according to an embodiment of the present invention will be described.

도 1은 본 발명의 일 실시예에 따른 스마트 디바이스(1000)의 블록도이다.1 is a block diagram of a smart device 1000 according to an embodiment of the present invention.

본 발명의 일 실시예에 따른 스마트 디바이스(1000)는 음성 입력 모듈(1200), 음성 출력 모듈(1300), 통신 모듈(1020), 메모리(1040) 및 콘트롤러(1060)를 포함할 수 있다. 스마트 디바이스(1000)는 음성 입력 모듈(1200)을 통해 사용자 음성을 수신하고, 통신 모듈(1020)을 통해 보이스 어시스턴트 서버(10)에 사용자 음성을 송신하고, 사용자 음성에 포함된 보이스 커맨드에 의해 요청된 피드백 데이터를 수신하고, 음성 출력 모듈(1300)을 통해 토크 백을 출력할 수 있으며, 콘트롤러(1060)는 상술한 과정에 필요한 각 모듈을 제어하거나 각종 정보를 처리할 수 있으며, 메모리(1040)에는 각종 정보가 저장될 수 있다. 또 스마트 디바이스(1000)는 디스플레이 백을 출력하기 위해 디스플레이 모듈(1400)을 선택적으로 더 포함할 수 있다. 또 스마트 디바이스(1000)는 디스플레이 백의 방향을 조절하기 위한 구동 모듈(1500)을 선택적으로 더 포함할 수 있다. The smart device 1000 according to an embodiment of the present invention may include a voice input module 1200, a voice output module 1300, a communication module 1020, a memory 1040, and a controller 1060. The smart device 1000 may receive the user voice through the voice input module 1200, transmit the user voice to the voice assistant server 10 through the communication module 1020, receive the feedback data requested by the voice command included in the user voice, and output a talkback through the voice output module 1300. The controller 1060 may control each module needed for this process and process various information, and the memory 1040 may store various information. The smart device 1000 may optionally further include a display module 1400 for outputting a display back, and may optionally further include a driving module 1500 for adjusting the direction of the display back.
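The flow through the components listed above (receive voice, send it to the server, receive feedback data, output a talkback) can be sketched as below. This is a minimal sketch in which each module is modeled as a plain callable. The class `SmartDevice`, its field names, and the method `handle_once` are hypothetical, and storing feedback in memory is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SmartDevice:
    receive_voice: Callable[[], bytes]          # voice input module 1200
    request_feedback: Callable[[bytes], bytes]  # communication module 1020 to server 10
    play_audio: Callable[[bytes], None]         # voice output module 1300
    show_video: Optional[Callable[[bytes], None]] = None  # optional display module 1400
    memory: List[bytes] = field(default_factory=list)     # memory 1040

    @property
    def has_display(self) -> bool:
        """True only when the optional display module is fitted."""
        return self.show_video is not None

    def handle_once(self) -> bytes:
        """Controller 1060: run one receive, send, output cycle."""
        voice = self.receive_voice()
        feedback = self.request_feedback(voice)
        self.memory.append(feedback)  # assumption: feedback data is kept
        self.play_audio(feedback)     # output as a talkback
        return feedback
```

A device constructed without `show_video` corresponds to a form with no display module, which can output a talkback but not a display back.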

음성 입력 모듈(1200)은 사용자 음성을 비롯한 다양한 음성을 수신할 수 있다. 음성 입력 모듈(1200)은 단일한 또는 복수의 마이크(1202)로 제공될 수 있다. 음성 입력 모듈(1200)이 복수의 마이크(1202)로 제공되는 경우, 음성 입력 모듈(1200)은 복수의 마이크(1202)가 소정의 형태를 이루고 배치되는 마이크 어레이(1204)로 제공될 수 있다. The voice input module 1200 may receive various voices, including the user voice. The voice input module 1200 may be provided as a single microphone 1202 or a plurality of microphones 1202. When the voice input module 1200 is provided as a plurality of microphones 1202, it may be provided as a microphone array 1204 in which the microphones 1202 are arranged in a predetermined pattern.

음성 출력 모듈(1300)은 토크 백을 비롯한 각종 소리를 출력할 수 있다. 음성 출력 모듈(1300)은 단일한 또는 복수의 스피커(1302)로 제공될 수 있다. 음성 출력 모듈(1300)은 필요에 따라 무지향성(omnidirectional)을 갖는 구조로 배치될 수 있다. 또는 음성 출력 모듈(1300)은 필요에 따라 지향성으로 소리를 출력하는 구조로 배치될 수도 있다. The voice output module 1300 may output various sounds, including a talkback. The voice output module 1300 may be provided as a single speaker 1302 or a plurality of speakers 1302. The voice output module 1300 may be arranged in an omnidirectional structure as needed. Alternatively, the voice output module 1300 may be arranged in a structure that outputs sound directionally as needed.

디스플레이 모듈(1400)은 디스플레이 백을 비롯한 각종 영상을 출력할 수 있다. 디스플레이 모듈(1400)은 디스플레이 패널(1420) 또는 프로젝터(1440) 등의 형태로 구현될 수 있다. The display module 1400 may output various images including a display back. The display module 1400 may be implemented in the form of a display panel 1420 or a projector 1440.

또 디스플레이 모듈(1400)의 방향 또는 디스플레이 모듈(1400)에 의해 출력되는 디스플레이 백의 방향은 구동 모듈(1500)에 의해 조절될 수 있다. 일 예로, 스마트 디바이스(1000)는 프로젝터(1440)를 통해 디스플레이 백을 출력하며, 디스플레이 백이 출력되는 디스플레이 영역 내지는 프로젝션 방향은 구동 모듈(1500)에 의해 사용자 주변으로 동적으로 조절될 수 있다. 한편, 구동 모듈(1500)은 현재 디스플레이 백의 방향을 감지하기 위한 방향 감지 센서(1560)를 더 포함할 수도 있다. 예를 들어, 방향 감지 센서(1560)는 프로젝터(1440)가 배치된 방향을 센싱할 수 있다. In addition, the direction of the display module 1400 or the direction of the display back output by the display module 1400 may be adjusted by the driving module 1500. For example, the smart device 1000 outputs a display back through the projector 1440, and the display area or projection direction in which the display back is output may be dynamically adjusted around the user by the driving module 1500. Meanwhile, the driving module 1500 may further include a direction sensing sensor 1560 for sensing the direction of the current display back. For example, the direction detection sensor 1560 may sense the direction in which the projector 1440 is disposed.
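The interplay between the driving module 1500 and the direction sensing sensor 1560 can be sketched as a simple stepwise loop that turns the projection heading toward a target direction. This is a sketch under stated assumptions: headings are expressed in degrees, the motor moves by a bounded step per iteration, and the sensed heading equals the commanded heading. The function names and the `max_step_deg` parameter are hypothetical and are not taken from the specification.

```python
def shortest_error(current_deg: float, target_deg: float) -> float:
    """Signed shortest angular difference, in the range [-180, 180)."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


def step_toward(current_deg: float, target_deg: float,
                max_step_deg: float = 5.0) -> float:
    """Return the next heading, moving at most `max_step_deg` degrees."""
    error = shortest_error(current_deg, target_deg)
    step = max(-max_step_deg, min(max_step_deg, error))
    return (current_deg + step) % 360.0


def rotate_to(current_deg: float, target_deg: float,
              max_step_deg: float = 5.0) -> float:
    """Step until the sensed heading matches the target heading."""
    heading = current_deg % 360.0
    target = target_deg % 360.0
    while abs(shortest_error(heading, target)) > 1e-9:
        # Assumption: the sensor 1560 reports the heading just commanded.
        heading = step_toward(heading, target, max_step_deg)
    return heading
```

Using the shortest signed difference means a projector at 350 degrees reaches a target of 10 degrees by turning 20 degrees forward rather than 340 degrees backward.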

통신 모듈(1020)은 외부 기기와 통신을 수행한다. 예를 들어, 스마트 디바이스(1000)는 통신 모듈(1020)을 통해 보이스 어시스턴트 서버(10)와 정보를 송수신할 수 있는데, 보다 구체적으로는 스마트 디바이스(1000)는 통신 모듈(1020)을 통해 보이스 어시스턴트 서버(10)로 사용자 음성을 전송하고, 통신 모듈(1020)을 통해 보이스 어시스턴트 서버(10)로부터 피드백 데이터를 수신할 수 있다. The communication module 1020 communicates with an external device. For example, the smart device 1000 may transmit and receive information to and from the voice assistant server 10 through the communication module 1020, and more specifically, the smart device 1000 through the communication module 1020 voice assistant The user voice may be transmitted to the server 10, and feedback data may be received from the voice assistant server 10 through the communication module 1020.

통신 모듈(1020)은 크게 유선 타입과 무선 타입으로 나뉠 수 있다. 유선 타입과 무선 타입은 각각의 장단점을 가지며, 스마트 디바이스(1000)는 유선 타입의 통신 모듈(1020) 및/또는 무선 타입의 통신 모듈(1020)이 구비될 수도 있다. The communication module 1020 can be roughly divided into a wired type and a wireless type. The wired type and the wireless type have advantages and disadvantages, and the smart device 1000 may be provided with a wired type communication module 1020 and / or a wireless type communication module 1020.

유선 타입의 경우에는 유선 LAN(Local Area Network), USB(Universal Serial Bus) 통신 등이 대표적인 예이나 그 외의 다른 방식도 가능하다. 무선 타입의 경우에는 이동 통신 방식, 블루투스(Bluetooth)나 직비(Zigbee)와 같은 WPAN(Wireless Personal Area Network) 계열의 통신 방식, 와이파이(Wi-Fi) 같은 WLAN(Wireless Local Area Network) 계열의 통신 방식 및 그 외의 알려진 다른 통신 방식을 이용하는 것도 가능하다. 물론, 유/무선 통신 방식이 상술한 예로 한정되는 것은 아님을 미리 밝혀둔다. In the case of the wired type, a wired local area network (LAN), universal serial bus (USB) communication, and the like are representative examples, but other methods are also possible. In the case of the wireless type, a mobile communication method, a wireless personal area network (WPAN) communication method such as Bluetooth or Zigbee, and a wireless local area network (WLAN) communication method such as Wi-Fi And other known communication methods. Of course, it is revealed in advance that the wired / wireless communication method is not limited to the above-described example.

메모리(1040)는 각종 정보를 저장할 수 있다. 메모리(1040)는 데이터를 임시적으로 또는 반영구적으로 저장할 수 있다. 메모리(1040)의 예로는 하드 디스크(HDD: Hard Disk Drive), SSD(Solid State Drive), 플래쉬 메모리(flash memory), 롬(ROM: Read-Only Memory), 램(RAM: Random Access Memory) 등이 있을 수 있다. 메모리(1040)는 스마트 디바이스(1000)에 내장되는 형태나 피드백 디바이스에 탈부착 가능한 형태로 제공될 수 있다. The memory 1040 may store various information. The memory 1040 may store data temporarily or semi-permanently. Examples of the memory 1040 include a hard disk drive (HDD), a solid state drive (SSD), flash memory, read-only memory (ROM), and random access memory (RAM). The memory 1040 may be provided in a form built into the smart device 1000 or in a form detachable from the feedback device.

메모리(1040)에는 스마트 디바이스(1000)를 구동하기 위한 운용 프로그램(OS: Operating System)이나 스마트 디바이스(1000)에 인스톨되는 각종 어플리케이션, 스마트 디바이스(1000)의 동작에 필요하거나 이용되는 각종 데이터가 저장될 수 있다. The memory 1040 may store an operating system (OS) for driving the smart device 1000, various applications installed on the smart device 1000, and various data needed for or used in the operation of the smart device 1000.

콘트롤러(1060)는 스마트 디바이스(1000)의 전반적인 동작을 제어를 수행할 수 있다. 예를 들어, 스마트 디바이스(1000)가 디스플레이 백을 출력하는 것은 콘트롤러(1060)가 디스플레이 모듈(1400)을 제어함에 따라 수행될 수 있으며, 스마트 디바이스(1000)가 보이스 어시스턴트 서버(10)와 통신하는 것은 콘트롤러(1060)가 통신 모듈(1020)을 제어함에 따라 수행될 수 있다. The controller 1060 may control the overall operation of the smart device 1000. For example, the smart device 1000 outputting the display back may be performed as the controller 1060 controls the display module 1400, and the smart device 1000 communicates with the voice assistant server 10. This may be performed as the controller 1060 controls the communication module 1020.

콘트롤러(1060)의 제어 동작은 콘트롤러(1060)가 각종 정보의 연산 및 처리를 수행함에 따라 이루어질 수 있다. 이를 위해 콘트롤러(1060)는 하드웨어나 소프트웨어 또는 이들의 조합에 따라 컴퓨터나 이와 유사한 장치로 구현될 수 있다. 하드웨어적으로 콘트롤러(1060)는 전기적인 신호를 처리하여 제어 기능을 수행하는 전자 회로 형태로 제공될 수 있으며, 소프트웨어적으로는 하드웨어적 회로를 구동시키는 프로그램이나 코드 형태로 제공될 수 있다. The control operation of the controller 1060 may be performed as the controller 1060 performs calculation and processing of various information. To this end, the controller 1060 may be implemented as a computer or similar device according to hardware or software or a combination thereof. In hardware, the controller 1060 may be provided in the form of an electronic circuit that processes an electrical signal and performs a control function. In software, the controller 1060 may be provided in the form of a program or code that drives a hardware circuit.

콘트롤러(1060)는 단일한 물리적 구성을 가질 수 있지만, 경우에 따라서는 물리적으로 분리된 형태로 제공될 수도 있다. 다시 말해, 콘트롤러(1060)는 단일한 칩으로 제조되는 것도 가능하지만, 물리적으로 분산 배치되는 복수의 칩 내지는 기판으로 제공될 수도 있으며 이때에는 각 분리된 콘트롤러(1060) 간의 통신 인터페이스가 연결되어 있을 수도 있다. The controller 1060 may have a single physical configuration, but in some cases it may be provided in a physically separated form. In other words, the controller 1060 may be manufactured as a single chip, or it may be provided as a plurality of chips or boards that are physically distributed, in which case the separated parts of the controller 1060 may be connected to one another through a communication interface.

한편, 이하의 설명에서 스마트 디바이스(1000)가 수행하는 동작들은 별도의 언급이 없는 경우 콘트롤러(1060)에 의해 수행되는 것으로 해석될 수 있음을 밝혀둔다. Meanwhile, in the following description, it is revealed that operations performed by the smart device 1000 may be interpreted as being performed by the controller 1060 unless otherwise specified.

1.4. 스마트 디바이스의 구현예 1.4. Smart device implementation

이하에서는 본 발명의 일 실시예에 따른 스마트 디바이스(1000)의 몇몇 구현예에 관하여 설명한다. 도 2 내지 8은 본 발명의 일 실시예에 따른 스마트 디바이스(1000)의 구현예들에 관한 사시도이다.Hereinafter, some implementations of the smart device 1000 according to an embodiment of the present invention will be described. 2 to 8 are perspective views of implementations of the smart device 1000 according to an embodiment of the present invention.

일 예에 따르면, 스마트 디바이스(1000)는 도 2에 도시된 바와 같이 테이블이나 플로어 등과 같은 수평면에 거치되어 이용되는 스마트 스피커 형태(1000a)로 제공될 수 있다. According to an example, the smart device 1000 may be provided in a form of a smart speaker 1000a mounted on a horizontal surface, such as a table or floor, as illustrated in FIG. 2.

본 예에서, 하우징(1100)은 수평면에 놓이는 하면(1101), 하면(1101)과 대응하는 상면(1102) 및 상기 하면(1101)과 상면(1102)을 연결하는 측면(1103)을 포함할 수 있다. 도 2는 하우징(1100)을 원 기둥 형상으로 도시하고 있으나, 하우징(1100)은 이외에도 다각 기둥, 상면(1102)이 경사진 테이퍼면인 원 또는 다각 기둥, 원 또는 다각 뿔 등 다양한 형상일 수 있다. In this example, the housing 1100 may include a lower surface 1101 that rests on the horizontal surface, an upper surface 1102 corresponding to the lower surface 1101, and a side surface 1103 connecting the lower surface 1101 and the upper surface 1102. Although FIG. 2 shows the housing 1100 as a cylinder, the housing 1100 may have various other shapes, such as a polygonal prism, a cylinder or polygonal prism whose upper surface 1102 is an inclined tapered surface, or a cone or pyramid.

본 예에서 스마트 디바이스(1000)에는 스마트 디바이스(1000)의 작동 모드를 지시하는 인디케이터(1106)가 구비될 수 있다. 예를 들어, 인디케이터(1106)는 작동 모드에 따라 특정 색상이나 특정 패턴을 표시하는 램프 등일 수 있다. 도 2에는 인디케이터(1106)가 하우징(1100)의 측면(1103)의 테두리를 둘러싸도록 배치되는 것으로 도시했으나, 이로 인해 인디케이터(1106)의 형상이나 위치가 도 2로 제한되는 것은 아니다. In this example, the smart device 1000 may be provided with an indicator 1106 that indicates the operation mode of the smart device 1000. For example, the indicator 1106 may be a lamp or the like that displays a specific color or a specific pattern depending on the operation mode. Although FIG. 2 shows the indicator 1106 arranged to surround the rim of the side surface 1103 of the housing 1100, the shape and position of the indicator 1106 are not limited to what is shown in FIG. 2.

본 예에서 스마트 디바이스(1000)에 복수의 마이크(1202)를 포함하는 마이크 어레이(1204)가 구비될 수 있다. 예를 들어, 복수의 마이크(1202)는 도 2에 도시된 바와 같이 하우징(1100)의 상면(1102)에 방사형으로 배치되거나 하우징(1100)의 측면(1103)을 따라 배치될 수 있을 것이다. 물론, 스마트 디바이스(1000)에 단일한 마이크(1202)가 구비되는 것도 가능하다. 예를 들어, 단일한 마이크(1202)를 음성 입력 모듈(1200)로 이용하는 경우에는 마이크(1202)는 하우징(1100)의 측면(1103) 중 스마트 디바이스(1000)가 주로 사용되는 방향 쪽의 지점 또는 하우징(1100)의 상면(1102)에 배치될 수 있다. In this example, a microphone array 1204 including a plurality of microphones 1202 may be provided in the smart device 1000. For example, the plurality of microphones 1202 may be disposed radially on the top surface 1102 of the housing 1100 or may be disposed along the side surface 1103 of the housing 1100 as illustrated in FIG. 2. Of course, it is also possible that a single microphone 1202 is provided in the smart device 1000. For example, when a single microphone 1202 is used as the voice input module 1200, the microphone 1202 is a point on the side 1103 of the housing 1100 where the smart device 1000 is mainly used, or It may be disposed on the upper surface 1102 of the housing 1100.

본 예에서 스마트 디바이스(1000)에는 단일한 스피커(1302)가 무지향성으로 음성을 출력하도록 마련될 수 있다. 예를 들어, 스피커(1302)는 하우징(1100)의 내부에 하우징(1100)의 하면(1101)을 향해 음성을 출력하도록 배치되고, 하우징(1100)의 하면(1101)에는 콘 형태의 돌출부를 마련하여 사운드 패스가 하우징(1100)의 외측 전방향으로 출력되도록 할 수 있다. 물론, 스피커(1302)가 복수로 제공되는 것도 가능하며 이때에는 스마트 디바이스(1000)가 지향성 음성 출력을 하거나 멀티 채널(예를 들어, 스테레오 사운드나 5.1 채널 등)로 음성 출력하는 것이 가능할 수 있다. In this example, a single speaker 1302 may be provided in the smart device 1000 to output voice in an omni-directional manner. For example, the speaker 1302 is disposed inside the housing 1100 to output sound toward the lower surface 1101 of the housing 1100, and a cone-shaped protrusion is provided on the lower surface 1101 of the housing 1100 By doing so, the sound path can be output in the omni-directional direction of the housing 1100. Of course, it is also possible to provide a plurality of speakers 1302, and at this time, the smart device 1000 may be capable of directional voice output or multi-channel (eg, stereo sound or 5.1 channel) voice output.

본 예에 따른 스마트 디바이스(1000)의 구현예에서는 스마트 디바이스(1000)에 디스플레이 모듈(1400)이 탑재되지 않으므로, 스마트 디바이스(1000)가 디스플레이 백을 출력할 수는 없다. 후술될 본 발명의 일 실시예에 따른 스마트 디바이스(1000)와 그 제어 방법 중 일부는 디스플레이 모듈(1400)이 탑재된 스마트 디바이스(1000)에 한하여 적용될 수 있지만, 다른 일부는 디스플레이 모듈(1400)이 없는 스마트 디바이스(1000)에도 적용될 수 있음을 미리 밝혀둔다. In the implementation of the smart device 1000 according to the present example, since the display module 1400 is not mounted on the smart device 1000, the smart device 1000 cannot output a display back. Some of the smart device 1000 and a control method according to an embodiment of the present invention to be described later may be applied only to the smart device 1000 on which the display module 1400 is mounted, while the other part may include a display module 1400. It is revealed in advance that it can also be applied to the missing smart device 1000.

일 예에 따르면, 스마트 디바이스(1000)는 도 3에 도시된 바와 같이 테이블이나 플로어 등과 같은 수평면에 거치되어 이용되며 디스플레이 패널(1420)을 구비하는 형태(1000b)로 제공될 수 있다. According to an example, the smart device 1000 is mounted on a horizontal surface, such as a table or a floor, as illustrated in FIG. 3, and may be provided in a form 1000b having a display panel 1420.

본 예에서, 스마트 디바이스(1000)에 디스플레이 패널(1420)이 구비될 수 있다. 디스플레이 패널(1420)은 주로 하우징(1100)의 일면에 구비될 수 있다. 스마트 디바이스(1000)는 디스플레이 패널(1420)을 통해 각종 영상(예를 들어, 디스플레이 백)을 출력할 수 있다. 한편, 디스플레이 패널(1420)이 터치 패널로 제공됨에 따라 디스플레이 패널(1420)이 터치 입력 인터페이스로 기능할 수도 있다. In this example, the display panel 1420 may be provided in the smart device 1000. The display panel 1420 may be mainly provided on one surface of the housing 1100. The smart device 1000 may output various images (eg, display back) through the display panel 1420. Meanwhile, as the display panel 1420 is provided as a touch panel, the display panel 1420 may function as a touch input interface.

또 디스플레이 패널(1420)이 인디케이터(1106)의 역할을 대체할 수 있으므로, 본 예에서 스마트 디바이스(1000)에서 인디케이터(1106)는 선택적으로 구비될 수 있다. 또 본 예의 스마트 디바이스(1000)에서 음성 출력 모듈(1300)과 음성 입력 모듈(1200)은 다양한 배치 형태로 제공될 수 있다. In addition, since the display panel 1420 may replace the role of the indicator 1106, the indicator 1106 in the smart device 1000 may be selectively provided in this example. In addition, in the smart device 1000 of this example, the voice output module 1300 and the voice input module 1200 may be provided in various arrangements.

이상에서는 본 예의 스마트 디바이스(1000)에 대해 거치형인 것으로 설명했으나, 이와 달리 스마트 디바이스(1000)가 벽걸이형으로 제공될 수도 있다. 이때에는 하우징(1100)에 스마트 디바이스(1000)가 벽에 걸릴 수 있도록 하는 마운팅 수단이 마련되어 있을 수 있다. 예를 들어, 마운팅 수단은 접착층이나 브라켓, 리세스 등으로 제공될 수 있다. In the above, the smart device 1000 of this example has been described as being stationary, but the smart device 1000 may be provided in a wall-mounted manner. At this time, the housing 1100 may be provided with a mounting means for the smart device 1000 to be hung on the wall. For example, the mounting means may be provided as an adhesive layer, bracket, recess, or the like.

일 예에 따르면, 스마트 디바이스(1000)는 도 4에 도시된 바와 같이 프로젝터(1440)를 구비하는 형태(1000c)로 제공될 수 있다. 이에 따라 스마트 디바이스(1000)는 프로젝터(1440)를 통해 각종 영상을 출력할 수 있다. 도 4에 도시된 스마트 디바이스(1000)는 수평면에 거치되어 이용되거나 또는 천장이나 벽면 등에 설치되어 이용될 수 있다. 프로젝션 거리가 짧은 사용 환경을 목적하는 경우에는 스마트 디바이스(1000)의 프로젝터(1440)로는 초단초점(UST: Ultra Short Throw) 프로젝터가 이용될 수 있다. According to an example, the smart device 1000 may be provided in a form 1000c having a projector 1440 as illustrated in FIG. 4. Accordingly, the smart device 1000 may output various images through the projector 1440. The smart device 1000 illustrated in FIG. 4 may be mounted on a horizontal surface or used by being installed on a ceiling or a wall surface. When a usage environment with a short projection distance is desired, an ultra short throw (UST) projector may be used as the projector 1440 of the smart device 1000.

본 예에서 스마트 디바이스(1000)는 선택적으로 터치 입력 인터페이스, 제스처 입력 인터페이스, 시선 인식 인터페이스 및/또는 공간 인지 인터페이스를 구비할 수 있다. In this example, the smart device 1000 may optionally include a touch input interface, a gesture input interface, a gaze recognition interface, and/or a spatial recognition interface.

터치 입력 인터페이스는 스마트 디바이스(1000)가 놓인 면이나 스마트 디바이스(1000)가 디스플레이 백을 프로젝션하는 면에 대한 사용자의 터치 입력을 감지할 수 있다. 터치 입력 인터페이스의 예로는 적외선 터치 센서를 들 수 있다. 적외선 터치 센서는 적외선을 조사하는 출광 수단과 적외선을 수신하는 수광 수단을 포함하고, 출광 수단에 의해 출사된 적외선이 사용자의 신체에 반사되어 수광 수단으로 수신되는 것을 이용하여 터치 입력을 획득할 수 있다. 적외선 터치 센서에서는 수광 수단 대신 일반 카메라나 적외선 카메라를 이용할 수도 있다. 적외선 카메라를 이용하는 경우에는 출광 수단이 소정의 패턴을 형성하도록 적외선을 출사하고, 적외선 카메라에서 사용자 신체에 의해 패턴이 변형되는 것을 감지하는 방식으로 터치 입력을 획득할 수 있다. The touch input interface may detect a user's touch input on a surface on which the smart device 1000 is placed or a surface on which the smart device 1000 projects a display back. An example of a touch input interface is an infrared touch sensor. The infrared touch sensor includes a light-emitting means for irradiating infrared rays and a light-receiving means for receiving infrared rays, and a touch input may be obtained using infrared rays emitted by the light-emitting means reflected by a user's body and received as light-receiving means. . In the infrared touch sensor, an ordinary camera or an infrared camera may be used instead of the light receiving means. In the case of using an infrared camera, a touch input may be obtained by emitting infrared rays so that the light emitting means forms a predetermined pattern, and detecting a pattern deformation by the user's body in the infrared camera.

또 제스처 입력 인터페이스는 각종 이미지에 기초하여 사용자의 신체로 표현되는 제스처(예를 들어, 팔동작이나 손가락 동작, 손가락 형태 등)을 인식할 수 있다. 제스처 입력 인터페이스는 터치 인터페이스와 다르게 물리적 면에 대한 터치만을 입력받지 않고 공간에 대한 제스처를 입력받을 수 있는 장점이 있다. In addition, the gesture input interface may recognize gestures (eg, arm movements, finger movements, finger shapes, etc.) expressed by the user's body based on various images. Unlike the touch interface, the gesture input interface has an advantage of being able to receive a gesture for a space without receiving only a touch on a physical surface.

또 시선 인식 인터페이스는 스마트 디바이스(1000)의 사용자의 시선을 인식할 수 있다. 예를 들어, 시선 인식 인터페이스는 사용자가 바라보는 방향 내지 사용자가 바라보는 지점을 2차원 또는 3차원 정보로써 인식할 수 있다. In addition, the gaze recognition interface may recognize the gaze of the user of the smart device 1000. For example, the gaze recognition interface may recognize a direction viewed by the user or a point viewed by the user as 2D or 3D information.

또 공간 인지 인터페이스는 스마트 디바이스(1000)가 있는 주변의 공간과 사물을 인지할 수 있다. 예를 들어, 공간 인지 인터페이스는 스마트 디바이스(1000)가 놓인 방의 구조, 주변에 놓인 사물의 위치, 형태 등을 인식할 수 있다. In addition, the spatial recognition interface may recognize spaces and objects around the smart device 1000. For example, the spatial recognition interface may recognize the structure of the room in which the smart device 1000 is placed, the location and shape of objects placed around it.

물론, 스마트 디바이스(1000)의 터치 입력 인터페이스, 제스처 입력 인터페이스, 시선 인식 인터페이스 및/또는 공간 인지 인터페이스가 상술한 예로 한정되는 것은 아니며, 당업자에게 자명한 다양한 변형이 가능함은 물론이다. 또한, 터치 입력 인터페이스, 제스처 입력 인터페이스, 시선 인식 인터페이스 및/또는 공간 인지 인터페이스는 본 예의 스마트 디바이스(1000) 뿐만 아니라 스마트 디바이스(1000) 다른 구현예들에도 적용될 수 있음을 밝혀둔다. Of course, the touch input interface, the gesture input interface, the gaze recognition interface, and / or the space recognition interface of the smart device 1000 are not limited to the above-described examples, and various modifications apparent to those skilled in the art are possible. In addition, it is revealed that the touch input interface, the gesture input interface, the gaze recognition interface, and / or the spatial recognition interface can be applied to the smart device 1000 of this example as well as other implementations of the smart device 1000.

본 예에서, 스마트 디바이스(1000)는 그 거치 방향이 조절됨에 따라 프로젝션 방향이 조절되는 형태(1000d)로 제공될 수 있다. 예를 들어 도 5을 살펴보면, 스마트 디바이스(1000)의 하우징(1100)은 적어도 두 개의 거치면(1104, 1105)을 가질 수 있으며, 두 개의 거치면(1104, 1105) 중 어느 하나를 통해 하우징(1100)이 거치되는지 여부에 따라 프로젝션 방향이 벽면을 향하는지 또는 바닥면을 향하는지의 여부가 결정될 수 있다. 다시 말해, 사용자가 수동으로 스마트 디바이스(1000)의 거치 자세를 조절함으로써 스마트 디바이스(1000)의 프로젝션 방향 또는 프로젝션 영역이 조절될 수 있다. In this example, the smart device 1000 may be provided in a form (1000d) in which the projection direction is adjusted as its mounting direction is adjusted. For example, referring to FIG. 5, the housing 1100 of the smart device 1000 may have at least two mounting surfaces 1104 and 1105, and the housing 1100 through any one of the two mounting surfaces 1104 and 1105. Whether the projection direction faces the wall surface or the floor surface may be determined according to whether or not it is mounted. In other words, the projection direction or projection area of the smart device 1000 may be adjusted by the user manually adjusting the posture of the smart device 1000.

한편, 수동으로 거치 자세가 조절됨에 따라 프로젝션 방향 내지는 영역이 조절되는 것과 달리 본 예에서, 스마트 디바이스(1000)는 프로젝션 방향을 자동적으로 조절할 수도 있다. 이를 위해 스마트 디바이스(1000)는 프로젝션의 방향이나 프로젝션의 영역을 조절하는 구동 모듈(1500)을 더 포함할 수도 있다. 예를 들어, 스마트 디바이스(1000)에는 프로젝터를 회전시키는 회전 모터(1520)가 구동 모듈(1500)로 구비될 수 있다. 다른 예를 들어, 스마트 디바이스(1000)에는 프로젝션의 광 경로를 조절하는 반사 미러 등이 구동 모듈(1500)로 구비될 수 있다. 또 다른 예로, 구동 모듈(1500)은 회전 모터(1520)와 반사 미러가 조합된 형태로 제공될 수도 있다. 여기서, 반사 미러는 MEMS 미러, 레조넌스 미러 등과 같이 고정 상태 타입(solid-state)로 구현되거나 물리적으로 방향이 조절되는 노딩 미러나 다각(polygonal) 미러 등으로 제공될 수 있다. 도 6을 살펴보면, 스마트 디바이스(1000)는 테이블 상에 거치되어 테이블 위로 디스플레이 백을 출력하는데, 이때 스마트 디바이스(1000)는 구동 모듈(1500)을 이용하여 디스플레이 백의 프로젝션 방향을 사용자 방향으로 이동시킬 수 있다. 또 스마트 디바이스(1000)는 구동 모듈(1500)을 이용하여 프로젝션 방향을 필요에 따라 벽면 또는 테이블면을 적절히 조절할 수 있다. 프로젝션 방향 내지는 프로젝션 영역을 자동으로 조절하는 스마트 디바이스(1000)가 프로젝터(1440)를 구비하는 형태로 한정되는 것은 아니다. 예를 들어, 도 에 도시된 형태의 스마트 디바이스(1000)에 디스플레이 패널(1420)의 방향을 조절하는 구동 모듈(1500)이 탑재해 스마트 디바이스(1000)가 구동 모듈(1500)을 통해 자동적으로 디스플레이 방향을 조절하는 것도 가능하다. Meanwhile, instead of the projection direction or area being changed by manually adjusting the mounting posture, the smart device 1000 in this example may adjust the projection direction automatically. To this end, the smart device 1000 may further include a driving module 1500 that adjusts the projection direction or the projection area. For example, the smart device 1000 may include, as the driving module 1500, a rotation motor 1520 that rotates the projector. As another example, the smart device 1000 may include, as the driving module 1500, a reflection mirror or the like that adjusts the optical path of the projection. As yet another example, the driving module 1500 may be provided as a combination of the rotation motor 1520 and a reflection mirror. Here, the reflection mirror may be implemented as a solid-state type such as a MEMS mirror or a resonance mirror, or may be provided as a nodding mirror or a polygonal mirror whose orientation is physically adjusted. Referring to FIG. 6, the smart device 1000 is placed on a table and projects a display back onto the table, and the smart device 1000 may use the driving module 1500 to move the projection direction of the display back toward the user. The smart device 1000 may also use the driving module 1500 to direct the projection toward a wall surface or the table surface as needed. A smart device 1000 that automatically adjusts the projection direction or the projection area is not limited to a form having the projector 1440. For example, a driving module 1500 that adjusts the direction of the display panel 1420 may be mounted on a smart device 1000 of the form shown in FIG. (number omitted in the source), so that the smart device 1000 automatically adjusts the display direction through the driving module 1500.

1.5. Operating modes of the smart device

The operating modes of the smart device 1000 are described below.

The smart device 1000 according to an embodiment of the present invention primarily receives a user's voice and provides feedback in response to a voice command contained in it. Several technical considerations arise in the course of this operation, that is, receiving the user's voice and providing feedback in response; some of them are as follows.

The first is identifying the intent behind the user's utterance. Even when the user is in an environment where the smart device 1000 is available, an utterance is not necessarily meant for the smart device 1000. If the smart device 1000 reacts even to speech uttered with no intention of using it, it may malfunction, and user convenience may suffer.

The second is the protection of personal information. The smart device 1000 typically transmits received user voice to the voice assistant server 10 so that the voice command can be interpreted. If every user voice received by the voice input module 1200 of the smart device 1000 were transmitted to the voice assistant server 10, personal information that the user does not wish to share could reach the operator of the voice assistant server 10. The user may perceive this as a leak of personal information.

With respect to at least these two considerations, the smart device 1000 must be able to distinguish whether a voice received through the voice input module 1200 was uttered with the intention of using the smart device 1000, and to process it accordingly.

In general, however, the smart device 1000 usually cannot decode natural-language user speech on its own, so it may be difficult for it to judge whether a voice input through the voice input module 1200 was uttered in order to use the smart device 1000.

Accordingly, before the phase in which received user voice is forwarded to the voice assistant server 10 for interpretation of the voice command, the smart device 1000 according to an embodiment of the present invention operates a phase that determines user intent: based on whether a specific word reflecting the intention to use the smart device 1000 is detected in the user's voice, it judges whether the subsequent user voice is uttered with the intention of using the smart device 1000. In this way, it can distinguish user voice uttered with that intention from other voice and process each accordingly.

Basically, the operating modes of the smart device 1000 according to an embodiment of the present invention may include a standby mode and a listening mode. The standby mode is a mode for determining whether the user intends to use the smart device 1000, and the listening mode is a mode for receiving user voice containing a voice command on the premise that the user intends to use the smart device 1000.

The various operating modes of the smart device 1000 according to an embodiment of the present invention are described in detail below.

1.5.1. Off state

The smart device 1000 may have an off state in which the device does not operate. The off state may be a state in which the device is switched off because no power is supplied, or a hibernation state that uses only minimal power.

1.5.2. Standby mode

When power is applied, the smart device 1000 may operate in standby mode without any specific instruction from the user. For example, the smart device 1000 may enter standby mode when power is applied.

Standby mode is the phase in which the smart device 1000 learns from the user that the user intends to use it. For example, the smart device 1000 may perform an operation of detecting a wake-up word in the user's voice while in standby mode, and determine the user's intention to use the smart device 1000 according to whether the wake-up word is detected.

In standby mode, the smart device 1000 may receive voice through the voice input module 1200. At this point, the voice input module 1200 may receive voice indiscriminately. Because the smart device 1000 does not yet know whether the received voice is user voice directed at it, it may refrain from transmitting the received voice to the voice assistant server 10 for interpretation of a voice command.

In standby mode, the smart device 1000 may detect a wake-up word in the received voice. The wake-up word may be a specific word or phrase predetermined by the manufacturer of the smart device 1000, or one selected by the user from a group of words or phrases predetermined by the manufacturer.

As described above, the wake-up word is not arbitrary natural language but at most a few predetermined words or short phrases, so detecting it in the input voice does not require much computation. Accordingly, the controller 1060 of the smart device 1000 can detect the wake-up word in the received voice directly and locally, without cooperating with the voice assistant server 10.

In standby mode, the smart device 1000 may determine the user's intention to use the smart device 1000 based on whether a wake-up word is detected in the received voice. When the wake-up word is detected in the received voice, the smart device 1000 may conclude that the user intends to use it.

If the smart device 1000 determines that the user intends to use it, it may enter listening mode in preparation for receiving user voice containing a voice command. Conversely, when no wake-up word is detected in the received voice, the smart device 1000 may conclude that the user does not intend to use it, and may remain in standby mode instead of entering listening mode.

Because standby mode is a mode for detecting or recognizing the wake-up word, it may in some cases be referred to as a wake-up word detection state or mode.
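The standby-mode behavior described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Mode`, `WAKE_WORDS`, `contains_wake_word`, `standby_step`, and `may_upload` are hypothetical, the wake-up phrase is invented, and the text match stands in for a real on-device keyword spotter that would operate on audio.

```python
from enum import Enum


class Mode(Enum):
    STANDBY = "standby"
    LISTENING = "listening"


# Hypothetical wake-up phrase; a real device uses manufacturer-defined words.
WAKE_WORDS = ("hey device",)


def contains_wake_word(heard: str, wake_words=WAKE_WORDS) -> bool:
    """Stand-in for a local keyword spotter: case-insensitive phrase match."""
    text = " ".join(heard.lower().split())  # normalize case and whitespace
    return any(word in text for word in wake_words)


def standby_step(heard: str) -> Mode:
    """Decide the next mode from what was heard while in standby mode."""
    if contains_wake_word(heard):
        return Mode.LISTENING  # user intent assumed: open a listening window
    return Mode.STANDBY        # no intent detected: keep waiting


def may_upload(mode: Mode) -> bool:
    """Audio is forwarded to the server only outside standby mode."""
    return mode is Mode.LISTENING
```

Keeping the detection local and gating uploads on the mode reflects the two concerns raised earlier: unintended speech is ignored, and audio captured in standby mode never leaves the device.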

1.5.3. Listening mode

As described above, the smart device 1000 may enter listening mode when input of the wake-up word is detected in standby mode. Alternatively, the smart device 1000 may be provided with a button that instructs it to enter listening mode, and may enter listening mode in response to a user input on that button.

In listening mode, the smart device 1000 may receive user voice through the voice input module 1200. The user voice input in listening mode may include a voice command instructing the smart device 1000 to perform a specific operation.

Here, the voice command typically, though not necessarily, takes a natural-language form. Recognizing a natural-language voice command may be difficult to do locally because of the heavy computation involved, so the smart device 1000 may cooperate with the voice assistant server 10 to interpret the voice command in the user's voice.

Accordingly, in listening mode, the smart device 1000 may transmit the input user voice to the voice assistant server 10 for interpretation of the voice command it contains. The smart device 1000 may transmit the user voice in several ways. As one example, it may transmit the user voice received in listening mode to the voice assistant server 10 in real time. As another example, it may collect the received user voice and then transmit the collected voice to the voice assistant server 10 in a single batch. The voice assistant server 10 may interpret the voice command in the user voice received from the smart device 1000, generate feedback data corresponding to the interpreted voice command, and return it to the smart device 1000.
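The two transmission styles mentioned above, real-time and batched, can be contrasted with a small sketch. The function names `send_realtime` and `send_batched` are hypothetical, and each yielded value merely stands for one upload to the server; no actual network protocol is implied by the source.

```python
from typing import Iterable, Iterator


def send_realtime(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Forward each audio chunk as soon as it is captured (one upload per chunk)."""
    for chunk in chunks:
        yield chunk


def send_batched(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Collect all chunks of the utterance first, then upload them once."""
    yield b"".join(chunks)
```

Real-time forwarding lets the server start interpreting before the user finishes speaking, while batching trades latency for a single request.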

Before transmitting user voice to the voice assistant server 10 in listening mode, the smart device 1000 may pre-process the user voice. Examples of pre-processing include noise cancellation and compression of the voice data. For example, when the smart device 1000 is outputting a talk back through the voice output module 1300 and receives user voice in listening mode, it may pick up the talk back together with the user voice containing the voice command. In that case, the smart device 1000 can use its knowledge of the talk back it is itself outputting to remove the talk-back portion from the received audio and thereby extract the user's voice.
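The idea of removing the device's own talk back from the microphone signal can be illustrated with a deliberately simplified sketch. It assumes the talk-back samples are already time-aligned with the microphone samples and arrive with a known, constant gain; the function name `remove_talk_back` and the `gain` parameter are hypothetical. Practical systems typically rely on adaptive acoustic echo cancellation instead, since delay and room response are not known in advance.

```python
from typing import List, Sequence


def remove_talk_back(mic: Sequence[float],
                     talk_back: Sequence[float],
                     gain: float = 1.0) -> List[float]:
    """Subtract the known, aligned talk-back signal from the microphone signal."""
    if len(mic) != len(talk_back):
        raise ValueError("signals must be aligned and of equal length")
    # What remains after subtraction approximates the user's voice.
    return [m - gain * t for m, t in zip(mic, talk_back)]
```

For instance, if the microphone picks up `user + talk_back` sample by sample, calling `remove_talk_back(mic, talk_back)` returns the `user` samples under these idealized assumptions.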

Also in listening mode, the smart device 1000 may act to create a quiet environment so that it receives clean user voice. For example, on entering listening mode, the smart device 1000 may stop a talk back that was being output or reduce its audio volume. For example, a smart device 1000 playing radio news may pause the live news when it enters listening mode.

The smart device 1000 may maintain listening mode for a certain time period. For example, the smart device 1000 may end listening mode once it has transmitted the user input containing the voice command to the voice assistant server 10 and received feedback data from it. As another example, it may end listening mode when input of the user voice is complete, or when no voice is input for a predetermined time after user voice has been input. As yet another example, it may end listening mode when no user voice is input for a predetermined time after entering listening mode. Hereinafter, the time period during which the smart device 1000 maintains listening mode is referred to as the 'listening window'. Correspondingly, the act of entering listening mode is referred to as 'opening a listening window', the act of ending listening mode as 'closing a listening window', and the state in which listening mode is maintained as the 'listening window is open' state.

Meanwhile, in some cases the smart device 1000 may detect a wake-up word in the user's voice even in listening mode. When a wake-up word is detected in listening mode, the smart device 1000 may restart listening mode and reset the listening window.
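The listening window and its reset on a repeated wake-up word can be sketched as a small timer object. The class name `ListeningWindow`, its methods, and the timeout value are hypothetical; timestamps are passed in explicitly so that the logic stays deterministic and easy to test.

```python
from typing import Optional


class ListeningWindow:
    """Tracks how long the device stays in listening mode."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.deadline: Optional[float] = None  # None means the window is closed

    def open(self, now: float) -> None:
        """Open (or reopen) the window, starting the timeout from `now`."""
        self.deadline = now + self.timeout_s

    def on_wake_word(self, now: float) -> None:
        """A wake-up word heard while listening resets the window."""
        self.open(now)

    def close(self) -> None:
        """Close the window, for example after feedback data is received."""
        self.deadline = None

    def is_open(self, now: float) -> bool:
        """The window is open until its deadline passes or it is closed."""
        return self.deadline is not None and now < self.deadline
```

A caller would open the window on entering listening mode, poll `is_open` with the current time, and return to standby mode once it reports `False`.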

1.5.4. Response mode

When the smart device 1000 forwards the input user voice to the voice assistant server 10 in listening mode, the voice assistant server 10 may extract the voice command from the user voice, interpret the extracted voice command, generate feedback data based on the interpreted voice command, and deliver the feedback data to the smart device 1000.

On receiving the feedback data, the smart device 1000 may output feedback using it. The phase in which the feedback is received and output is the response mode. In response mode, the smart device 1000 receives the feedback data and outputs feedback accordingly, which may be a talk back and/or a display back.

Meanwhile, the smart device 1000 may enter standby mode at the same time as, or shortly after, it receives the feedback data and starts outputting the feedback. In this case, the smart device 1000 may output feedback while in standby mode or listening mode. Depending on the point of view, rather than saying that the smart device 1000 continues outputting feedback in standby or listening mode, this may be interpreted as the response mode and the standby mode operating simultaneously.

As one example, a smart device 1000 that the user has asked to stream live news may, in response mode, receive feedback data for the live news from the voice assistant server 10, start playing the live news, and then enter standby mode while continuing to stream it. As another example, if a wake-up word is detected during live news streaming in standby mode, the smart device 1000 may enter listening mode while still maintaining the live news stream.

1.5.5. Operation of the operating states

The smart device 1000 according to an embodiment of the present invention need not operate all of the modes described above. For example, the response mode may be omitted from its operating modes. Nor must the modes described above all be operated independently of one another. For example, the standby mode, listening mode, and response mode of the smart device 1000 may overlap in whole or in part.

FIGS. 7 and 8 illustrate examples of how the operating modes of the smart device 1000 according to an embodiment of the present invention are operated, and FIG. 9 illustrates communication between the smart device 1000 and the voice assistant server 10 according to an embodiment of the present invention.

Referring to FIGS. 7 and 9, when power is applied, the smart device 1000 may enter standby mode from the off state. In standby mode, the smart device 1000 may continuously receive voice through the voice input module 1200 and detect a wake-up word in the received voice.

When the wake-up word is detected, the smart device 1000 may enter listening mode. Alternatively, the smart device 1000 may enter listening mode when it receives a user input, such as a touch, button press, or gesture, that requests listening mode. In listening mode, the smart device 1000 may receive user voice containing a voice command.

The smart device 1000 may transmit the received user voice to the voice assistant server 10. The smart device 1000 may then receive feedback data for the voice command from the voice assistant server 10 and output feedback based on the feedback data.

A smart device 1000 in listening mode may return from listening mode to standby mode. For example, it may return to standby mode if no user voice is input after it enters listening mode. As another example, it may return to standby mode if no further user voice is input after user voice has been input. As yet another example, it may return to standby mode when it has forwarded the user voice to the voice assistant server 10, when it has received from the voice assistant server 10 the feedback data for the voice command, or when it has output (or started to output) feedback based on the feedback data.

Referring to FIGS. 8 and 9, when power is applied, the smart device 1000 may enter standby mode from the off state. In standby mode, the smart device 1000 may continuously receive voice through the voice input module 1200 and detect a wake-up word in the received voice.

When the wake-up word is detected, the smart device 1000 may enter listening mode.

In listening mode, the smart device 1000 may receive user voice containing a voice command. The smart device 1000 may transmit the received user voice to the voice assistant server 10.

Having transmitted the received user voice to the voice assistant server 10, the smart device 1000 may enter response mode and output feedback. A smart device 1000 in listening mode may enter response mode when it has transmitted the user voice to the voice assistant server 10 or when it has received feedback data from the voice assistant server 10.

The smart device 1000 may receive feedback data for the voice command from the voice assistant server 10 and output feedback based on the feedback data. Once it has finished, or has started, outputting the feedback, the smart device 1000 may return to standby mode.

Meanwhile, if the smart device 1000 receives no user voice while in listening mode, it may return to standby mode instead of entering response mode.
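The mode flow described with reference to FIG. 8 can be summarized as a transition table. This is only a sketch under assumptions: the `Mode` and `Event` names and the `step` function are hypothetical, the table encodes one of the permitted variants (response mode entered when the voice is sent, standby resumed once feedback output has started or finished), and unlisted combinations simply leave the mode unchanged.

```python
from enum import Enum, auto


class Mode(Enum):
    OFF = auto()
    STANDBY = auto()
    LISTENING = auto()
    RESPONSE = auto()


class Event(Enum):
    POWER_ON = auto()
    WAKE_WORD = auto()
    VOICE_SENT = auto()       # user voice forwarded to the server
    TIMEOUT = auto()          # listening window expired without user voice
    FEEDBACK_OUTPUT = auto()  # feedback output started or finished


TRANSITIONS = {
    (Mode.OFF, Event.POWER_ON): Mode.STANDBY,
    (Mode.STANDBY, Event.WAKE_WORD): Mode.LISTENING,
    (Mode.LISTENING, Event.WAKE_WORD): Mode.LISTENING,  # window is reset
    (Mode.LISTENING, Event.VOICE_SENT): Mode.RESPONSE,
    (Mode.LISTENING, Event.TIMEOUT): Mode.STANDBY,
    (Mode.RESPONSE, Event.FEEDBACK_OUTPUT): Mode.STANDBY,
}


def step(mode: Mode, event: Event) -> Mode:
    """Return the next mode; events that do not apply leave the mode as is."""
    return TRANSITIONS.get((mode, event), mode)
```

The FIG. 7 variant, which has no separate response mode, could be expressed by mapping `(Mode.LISTENING, Event.VOICE_SENT)` directly to `Mode.STANDBY`.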

2. Voice processing during media content playback

2.1. Overview

The description above explained that the smart device 1000 according to an embodiment of the present invention is equipped with a voice assistant function, receives user voice, and outputs feedback corresponding to the voice command contained in the received user voice.

The process of recognizing a voice command is described once more in detail below.

First, the smart device 1000 may receive user voice containing a voice command in the listening state and transmit the input user voice to the voice assistant server 10. The user voice transmitted to the voice assistant server 10 may have undergone pre-processing, such as noise cancellation, in the smart device 1000. On receiving the user voice, the voice assistant server 10 may perform speech-to-text (STT) conversion to turn the voice signal into text and recognize the voice command from that text. Once the voice command is recognized, the voice assistant server 10 may generate appropriate feedback data based on it and transmit the feedback data to the smart device 1000. On receiving the feedback data, the smart device 1000 may output it as a talk back in audio form or as a display back in visual form.
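The server-side sequence described above (speech-to-text, command recognition, feedback generation) can be sketched with injected stand-ins. Everything here is hypothetical: `handle_user_voice`, `FALLBACK`, the `stt` callable, and the `handlers` mapping are illustrative names, and exact-text lookup is a placeholder for real natural-language understanding.

```python
from typing import Callable, Dict

FALLBACK = "Sorry, I did not understand."


def handle_user_voice(audio: bytes,
                      stt: Callable[[bytes], str],
                      handlers: Dict[str, Callable[[], str]]) -> str:
    """Convert audio to text, look up the command, and build feedback data."""
    text = stt(audio).strip().lower()  # speech-to-text, then normalize
    handler = handlers.get(text)       # placeholder for command recognition
    if handler is None:
        return FALLBACK                # no matching voice command
    return handler()                   # feedback data returned to the device
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular speech service; a test can supply a fake `stt` that simply decodes bytes.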

The smart device 1000 according to an embodiment of the present invention is at once a voice interface device, which interacts with the user through the user's utterances and the talk back output through its speaker, and a content consuming device, which plays video, music, and the like. The smart device 1000 can therefore be used routinely to play multimedia content such as movies or music. Accordingly, it may be advantageous for the smart device 1000 according to an embodiment of the present invention to be able to receive user voice containing a voice command even while media content is playing.

When the smart device 1000 receives user voice while media content is playing, the audio output of the media content may enter the voice input module 1200 of the smart device 1000 together with the user's speech. In that case, the sound of the audio data can make the user's speech harder to interpret. In particular, even if the smart device 1000 has no trouble detecting a wake-up word, which is one of a few predetermined words, in the voice signal received in standby mode, the recognition rate for an arbitrary voice command in the voice signal input in the listening state is likely to drop sharply when the audio output of the media content is mixed into that signal.

Accordingly, when the smart device 1000 according to an embodiment of the present invention enters the listening state during media content playback in order to receive a voice signal containing a voice command, it may apply predetermined processing to the media content so that the audio output of the media content being played does not lower the recognition rate of the voice command contained in the user's speech.

2.2. Operation of standby mode during media content playback

FIG. 10 illustrates operation of the standby mode during media content playback according to an embodiment of the present invention.

Referring to FIG. 10, the smart device 1000 may play media content. While playing media content, the smart device 1000 may be in standby mode. In other words, the smart device 1000 can receive a voice signal and recognize a wake-up word in it even during media content playback. Unlike other voice commands, which the user utters freely, the wake-up word is limited to predetermined words, so the smart device 1000 can detect it in the input voice signal even when the audio output of the media content being played is mixed with the user's voice.

2.3. Operation of the listening mode during media content playback

FIG. 11 is a diagram illustrating operation of the listening mode during media content playback according to an embodiment of the present invention.

Referring to FIG. 11, when the smart device 1000 detects a wake-up word, its operating state may transition from the standby mode to the listening state. When the smart device 1000 that was playing media content switches to the listening mode, it may control the playback of the media content.

For example, the smart device 1000 may keep playing the media content while lowering or muting its audio volume. As another example, the smart device 1000 may stop playing the media content.

By controlling playback of the media content in this way, the smart device 1000 can receive, when the user speaks, only the user's voice signal without the audio data of the media content mixed in. Accordingly, the recognition rate for the user's voice command can improve while the smart device 1000 is in the listening state.

If playback of the media content is stopped on entering the listening mode, the user has to stop watching the media content. If playback is instead maintained and the volume is lowered or muted, the user can keep watching the media content while uttering a voice command. However, because the media content is then played with little or no sound, the smart device 1000 may output subtitles corresponding to the audio data to compensate for the muted or reduced volume. The smart device 1000 may also perform various other operations so that the user can keep enjoying the media content that was playing even in the listening mode; these are described in detail below.
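The two options described in this section can be sketched as a small state transition. This is a minimal illustration only: the `State`, `Player`, and `on_wake_word` names, and the choice of 0.0 as the lowered volume, are assumptions made for the example and do not come from the specification.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    STANDBY = "standby"      # wake-up word detection
    LISTENING = "listening"  # voice command recognition


@dataclass
class Player:
    """Hypothetical stand-in for the device's media playback."""
    playing: bool = True
    volume: float = 1.0
    subtitles: bool = False


def on_wake_word(player: Player, keep_playing: bool = True,
                 ducked_volume: float = 0.0) -> State:
    """Quiet the media content and report the new operating state."""
    if player.playing:
        if keep_playing:
            player.volume = ducked_volume  # lower or mute, keep playing
            player.subtitles = True        # compensate for the lost sound
        else:
            player.playing = False         # alternative: stop playback
    return State.LISTENING
```

With `keep_playing=True` the user keeps watching with subtitles while speaking; with `keep_playing=False` the content simply stops.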

3. Control methods of the smart device

Examples of a method of operating the listening mode during playback of media content, or of processing media content already being played when the listening mode is entered, according to an embodiment of the present invention are described below. Each example is described as being performed by the smart device 1000 described above, so the methods described below can be implemented as control methods of the smart device 1000. This is merely for convenience of description, however, and the methods of outputting directional feedback according to an embodiment of the present invention are not necessarily limited to the smart device 1000 described above.

3.1. Example 1

FIG. 12 is a flowchart of a first example of a method for controlling a smart device according to an embodiment of the present invention.

Referring to FIG. 12, a method for controlling a smart device according to an embodiment of the present invention may include playing media content (S1110), receiving a voice signal during playback of the media content (S1120), entering the listening state when the received voice signal contains a wake-up word (S1130), and adjusting the volume of the media content on entering the listening state (S1140).

Each step is described in more detail below.

The smart device 1000 may play media content (S1110).

Here, the media content may include at least audio data. For example, the media content may include video data and audio data. As another example, the media content may include only audio data, without video data. The controller 1060 may play the media content by outputting its audio data through the audio output module 1300.

The media content may also be provided as feedback to a voice command of the user. Accordingly, the media content may be a talk back when it includes only audio data, and a display back when it further includes video data. However, the media content need not be feedback to a voice command.

The smart device 1000 may receive a voice signal during playback of the media content (S1120). For example, the voice signal may include a user's voice received from outside the smart device 1000. The user's voice may include a wake-up word, and the smart device 1000 may recognize the wake-up word from the user's voice.

On receiving a user's voice that contains the wake-up word, the smart device 1000 may enter the listening state (S1130).

When the operating state of the smart device 1000 switches to the listening state, the smart device 1000 may adjust the volume of the media content to raise the recognition rate of the user's voice command (S1140). For example, to remove sound other than the user's voice, the smart device 1000 may lower or mute the volume of the audio data of the media content being played.

The smart device 1000 may adjust the volume of the audio data of the media content in various ways. As one example, it may lower the volume of the audio data to a predetermined level. The predetermined level may be 0.

As another example, the smart device 1000 may end the output of the audio data. The sound of the media content being played is thereby removed.

As yet another example, the smart device 1000 may deactivate the audio output module 1300 so that the sound of the media content is not output.

When the smart device 1000 adjusts the volume of the audio data in this way, it may change the volume either gradually or abruptly.
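The difference between a gradual and an abrupt adjustment can be sketched as the sequence of volume levels applied on the way to the predetermined level. The function below is a hypothetical illustration; the linear ramp and the `steps` parameter are assumptions, since the specification does not say how a gradual change is shaped.

```python
from typing import List


def volume_steps(start: float, target: float, steps: int) -> List[float]:
    """Return the volume levels to apply in order, ending at target.

    steps=1 models an abrupt change; larger values give a linear fade.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    levels = [start + (target - start) * i / steps for i in range(1, steps + 1)]
    levels[-1] = target  # avoid floating-point drift on the final level
    return levels


abrupt = volume_steps(0.8, 0.0, steps=1)
gradual = volume_steps(1.0, 0.0, steps=4)
```

A real device would apply each level at a fixed interval; calling the function with `start` and `target` swapped yields the matching fade back to the original volume.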

Accordingly, the smart device 1000 can keep providing the media content to the user while receiving the voice command without noise from the audio data of the media content that it is itself playing.

Meanwhile, the smart device 1000 may restore the volume of the media content to its original level at a specific point in time.

The smart device 1000 may restore the volume of the media content to its original level when the listening mode ends.

For example, the smart device 1000 may return to the standby mode when a predetermined time elapses in the listening mode without a voice command being input, and may restore the volume of the media content when it returns to the standby mode. As another example, the smart device 1000 may restore the volume of the media content when it outputs feedback to a voice command received in the listening mode. In this case, the feedback may be a display back without audio data. As yet another example, the smart device 1000 may restore the volume of the media content when output of the feedback to a voice command received in the listening mode has finished. In this case, the feedback may be a talk back or a display back that includes audio data.
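The three restore timings above are given as separate examples. One way to read them together is as a single rule keyed on whether the feedback carries audio; the sketch below takes that reading. Combining them this way, and the `Event` and `should_restore_volume` names, are assumptions for illustration.

```python
from enum import Enum, auto


class Event(Enum):
    LISTENING_TIMEOUT = auto()   # no voice command within the time limit
    FEEDBACK_STARTED = auto()
    FEEDBACK_FINISHED = auto()


def should_restore_volume(event: Event, feedback_has_audio: bool = False) -> bool:
    """Decide whether the media volume goes back to its original level."""
    if event is Event.LISTENING_TIMEOUT:
        return True                     # device returns to standby mode
    if event is Event.FEEDBACK_STARTED:
        return not feedback_has_audio   # silent display back: restore now
    if event is Event.FEEDBACK_FINISHED:
        return feedback_has_audio       # audible feedback: restore afterwards
    return False
```

Under this rule a silent display back never competes with the media audio, while a talk back keeps the media quiet until it has finished.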

3.2. Example 2

FIG. 13 is a flowchart of a second example of a method for controlling a smart device according to an embodiment of the present invention.

Referring to FIG. 13, a method for controlling a smart device according to an embodiment of the present invention may include playing media content (S1210), receiving a voice signal during playback of the media content (S1220), entering the listening state when the received voice signal contains a wake-up word (S1230), and adjusting the volume of the media content on entering the listening state (S1240), and may further include displaying text data on entering the listening state (S1250). Steps S1210 to S1240 may be performed similarly to steps S1110 to S1140 described above, so their description is omitted below.

When the smart device 1000 lowers or mutes the volume of the audio data of the media content being played in the listening mode, the user may not be able to fully enjoy the media content. For example, when live news is being provided, the user may miss what the live news says during the listening mode. Accordingly, the smart device 1000 may output text data corresponding to the audio data of the media content to compensate for the reduced or removed sound of the media content (S1250).

Here, the smart device 1000 may receive the text data from the voice assistant server 10. For example, the smart device 1000 may request the text data from the voice assistant server 10 when it enters the listening mode.

FIGS. 14 and 15 are diagrams illustrating the display of text data according to an embodiment of the present invention.

Referring to FIG. 14, the smart device 1000 may recognize a wake-up word during media content playback, enter the listening state, and display an indication that its operating state is the listening state. When the operating state of the smart device 1000 switches to the listening state, the smart device 1000 may adjust the volume of the audio data of the media content being played and output text data corresponding to the volume-adjusted audio data. At this time, the smart device 1000 may also display an indication of the volume state of the audio data.

Referring to FIG. 15, in addition to the text data corresponding to the volume-adjusted audio data, the smart device 1000 may also display text data corresponding to audio data output before the volume was adjusted. Specifically, suppose that in FIG. 15, of the speech "The high temperature: Seoul 6 degrees, Busan 8 degrees…", the portion "The high temperature: Seoul 6 degrees" is audio data that was output before the wake-up word was input and whose volume was therefore not adjusted, and the portion "6 degrees, Busan 8 degrees" is audio data whose volume was adjusted after the wake-up word was input. The smart device 1000 may then output text data for the portion whose volume was not adjusted as well. That is, the smart device 1000 may output both text data corresponding to the volume-adjusted audio data and text data corresponding to audio data related to the volume-adjusted audio data.

Here, the audio data related to the volume-adjusted audio data may be a portion that is contextually related to the volume-adjusted audio data, for example a portion that forms one sentence or one phrase together with the volume-adjusted portion. Alternatively, the related audio data may be the audio data from a point a predetermined time before the volume is adjusted up to the point at which the volume is adjusted, for example the audio data output during the 2 seconds preceding the volume adjustment.
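The time-window variant of "related audio data" can be sketched as a filter over timed text segments. The sketch assumes the text data arrives as segments with start and end times, which the specification does not state; `Caption`, `captions_to_show`, and the sample data are hypothetical.

```python
from typing import List, NamedTuple


class Caption(NamedTuple):
    start: float  # seconds from the start of playback
    end: float
    text: str


def captions_to_show(captions: List[Caption], adjust_time: float,
                     lookback: float = 2.0) -> List[Caption]:
    """Keep captions whose audio was still audible after adjust_time - lookback.

    This covers the volume-adjusted audio plus the preceding lookback window.
    """
    cutoff = adjust_time - lookback
    return [c for c in captions if c.end > cutoff]


sample = [
    Caption(0.0, 3.0, "Good evening."),
    Caption(9.0, 11.0, "The high temperature: Seoul 6 degrees,"),
    Caption(11.0, 13.0, "Busan 8 degrees"),
]
shown = captions_to_show(sample, adjust_time=12.0)
```

Here the volume is adjusted at 12 s, so the caption that ended at 11 s falls inside the 2-second window and is displayed along with the one being spoken. The sentence-based variant would instead need sentence boundaries in the text data.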

Meanwhile, when the smart device 1000 restores the volume of the media content to its original level, the display of the text data may also end.

As one example, the smart device 1000 may end the display of the text data at the point at which it restores the volume of the media content. As another example, the smart device 1000 may keep the text data displayed until output of the audio data corresponding to the displayed text data has finished, and then end the display of the text data.

3.3. Example 3

FIG. 16 is a flowchart of a third example of a method for controlling a smart device according to an embodiment of the present invention.

Referring to FIG. 16, a method for controlling a smart device according to an embodiment of the present invention may include playing media content (S1310), receiving a voice signal during playback of the media content (S1320), entering the listening state (S1330), adjusting the volume of the media content (S1340), displaying text data (S1350), outputting feedback (S1360), stopping playback of the media content (S1370), and resuming playback of the media content (S1380).

Each step is described in more detail below. Steps S1310 to S1360 may be performed similarly to the steps already described above, so their detailed description is omitted.

On entering the listening mode, the smart device 1000 may keep playing the media content that was already playing while adjusting the volume of its audio data and outputting text data corresponding to the volume-adjusted audio data. In this state, the smart device 1000 may receive a voice command in the listening mode and output feedback in response (S1360).

At this time, the smart device 1000 may stop playback of the media content in order to output the feedback (S1370). For example, when the feedback is a display back or a talk back that includes audio data, the smart device 1000 may stop playback of the media content in order to output it. The smart device 1000 may then resume playback of the media content when the feedback output ends (S1380). Specifically, consider a smart device 1000 that receives a voice command asking for today's weather while playing live news. The smart device 1000 plays the live news normally while the user is uttering the wake-up word before asking about the weather, adjusts the volume of the live news once the wake-up word is recognized and the listening mode is entered, stops the live news while it outputs the weather information corresponding to the voice command as a talk back, and resumes the live news when the talk back ends.
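The live-news scenario can be traced with a small session object. This is a sketch under assumptions: the `Session` class and its attribute names are hypothetical, and restoring the volume at the moment playback resumes follows the restore timing described in Example 1 rather than anything stated for this example.

```python
class Session:
    """Hypothetical media session; attribute names are illustrative."""

    def __init__(self) -> None:
        self.playing = True
        self.ducked = False
        self.text_shown = False

    def on_listening(self) -> None:                         # S1340, S1350
        self.ducked = True
        self.text_shown = True

    def on_feedback_start(self, has_audio: bool) -> None:   # S1360, S1370
        if has_audio:
            self.playing = False   # stop the media while the talk back plays

    def on_feedback_end(self) -> None:                      # S1380
        self.playing = True
        self.ducked = False        # assumption: volume restored on resume
        self.text_shown = False


session = Session()
trace = []
for step in (session.on_listening,
             lambda: session.on_feedback_start(has_audio=True),
             session.on_feedback_end):
    step()
    trace.append((session.playing, session.ducked))
```

The `trace` list records, after each step, whether the news is playing and whether its volume is lowered.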

3.4. Example 4

FIG. 17 is a flowchart of a fourth example of a method for controlling a smart device according to an embodiment of the present invention.

Referring to FIG. 17, a method for controlling a smart device according to an embodiment of the present invention may include playing media content (S1410), receiving a voice signal during playback of the media content (S1420), entering the listening state (S1430), determining the type of the media content (S1440), and maintaining or stopping playback of the media content according to its type (S1450).

Each step is described in more detail below. Steps S1410 to S1430 may be performed similarly to the steps already described above, so their detailed description is omitted.

In the other examples of the method for controlling a smart device described above, the smart device 1000 was described as maintaining playback of the media content when it enters the listening mode during playback. This is not required, however, and the smart device 1000 may instead stop playing the media content.

In this case, the smart device 1000 may determine, according to the type of the media content being played, whether to maintain or stop playback of the media content on entering the listening mode (S1440). The types of media content may include, for example, live content and non-live content.

As one example, when the media content being played is live content, such as news currently being broadcast or a real-time sports broadcast, the smart device 1000 may determine to maintain playback of the media content. In this case, the smart device 1000 may keep playing the media content on entering the listening mode while adjusting its volume and outputting text data (S1450).

As another example, when the media content being played is non-live content, such as a replay or a movie, the smart device 1000 may determine to stop playback of the media content. In this case, the smart device 1000 may stop playback of the media content on entering the listening mode (S1450) and resume playback later.
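The decision in S1440 and S1450 reduces to a mapping from content type to playback action. The enum and function names below are hypothetical labels for the two cases described above.

```python
from enum import Enum


class ContentType(Enum):
    LIVE = "live"          # e.g. broadcast news, real-time sports
    NON_LIVE = "non_live"  # e.g. replay, movie


class Action(Enum):
    DUCK_AND_SHOW_TEXT = "keep playing, adjust volume, show text data"
    PAUSE = "stop playback and resume later"


def action_on_listening(content_type: ContentType) -> Action:
    """Choose how to treat the media content when listening mode starts."""
    if content_type is ContentType.LIVE:
        return Action.DUCK_AND_SHOW_TEXT  # live content cannot be replayed
    return Action.PAUSE                   # non-live content loses nothing
```

The comments state one plausible rationale: pausing costs nothing for content that can pick up where it left off, whereas live content would be missed.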

3.5. Example 5

FIG. 18 is a flowchart of a fifth example of a method for controlling a smart device according to an embodiment of the present invention.

Referring to FIG. 18, a method for controlling a smart device according to an embodiment of the present invention may include receiving a voice signal in the standby mode (S1510), entering the listening state (S1520), receiving a voice command (S1530), and outputting feedback corresponding to the voice command in consideration of whether media content was being played in the standby mode (S1540).

Each step is described in more detail below.

First, the smart device 1000 may receive a voice signal in the standby mode (S1510) and enter the listening mode when the voice signal contains a wake-up word (S1520). In the listening mode, the smart device 1000 may receive a voice command (S1530), forward it to the voice assistant server 10, receive feedback data from the voice assistant server 10, and output feedback using the received feedback data.

At this time, the smart device 1000 may output feedback that takes into account whether media content was being played in the standby mode (S1540).

For example, when media content was being played in the standby mode, the smart device 1000 may output shorter feedback than it otherwise would. To do so, the smart device 1000 may omit part of the feedback content to be output from the received feedback data, or output it at a faster speed. Specifically, it may convert part of the talk back into a display back to shorten the talk back, or increase the speech rate of the talk back.
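Shortening the talk back can be sketched as splitting the feedback into a spoken part and a displayed part and raising the speech rate. The sentence-level split, the `max_spoken` limit, and the 1.25 rate are all assumptions chosen for illustration; the specification gives no concrete values.

```python
from typing import List, Tuple


def plan_talk_back(sentences: List[str], media_playing: bool,
                   max_spoken: int = 1,
                   fast_rate: float = 1.25) -> Tuple[List[str], List[str], float]:
    """Return (spoken sentences, displayed sentences, speech rate)."""
    if not media_playing:
        return list(sentences), [], 1.0
    # Speak only the first sentences; move the rest to a display back.
    return sentences[:max_spoken], sentences[max_spoken:], fast_rate


weather = ["Sunny today.", "High of 6 degrees.", "Light wind from the west."]
spoken, displayed, rate = plan_talk_back(weather, media_playing=True)
```

With media playing, only "Sunny today." is spoken, at the faster rate, and the remaining two sentences are shown on screen.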

As another example, when the smart device 1000 forwards the user's voice containing the voice command to the voice assistant server 10, it may also send information indicating whether media content was already being played in the standby mode, and the voice assistant server 10 may, based on that information, generate shorter feedback data than it otherwise would.

As yet another example, when media content is already being played, the smart device 1000 may adjust the data form of the feedback it outputs in consideration of the data form of that media content. For example, when the media content being played is in the form of audio data, the smart device 1000 may output the feedback as a display back, so that the user listens to the media content being played and views the newly requested feedback. Conversely, when the media content being played is in the form of video data, the smart device 1000 may output the feedback as a talk back, so that the user watches the media content being played and listens to the newly requested feedback. To do so, the smart device 1000 may determine the form of the feedback from the received feedback data in consideration of the form of the media content being played. Alternatively, the smart device 1000 may send the form of the media content being played to the voice assistant server 10 so that the voice assistant server 10 determines the form of the feedback with this in mind, and then receive the corresponding feedback data.
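The choice of feedback form can be sketched as picking the output channel that the playing media does not occupy. The function name and the `None` result for cases the text does not cover (media with both audio and video, or nothing playing) are assumptions.

```python
from typing import Optional


def choose_feedback_form(media_has_audio: bool,
                         media_has_video: bool) -> Optional[str]:
    """Pick the feedback form that avoids the channel the media is using."""
    if media_has_audio and not media_has_video:
        return "display_back"  # user listens to media, views feedback
    if media_has_video and not media_has_audio:
        return "talk_back"     # user watches media, listens to feedback
    return None                # not covered by the text: no preference
```

The same decision could run on the voice assistant server instead, if the device sends the media form along with the voice command.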

The methods according to the embodiments of the present invention described above may be used alone or in combination with one another. Not every step described in each method is essential, so each method may be performed with all of its steps or with only some of them. The order in which the steps are described is merely for convenience of description, so the steps of the above methods need not be performed in the order described.

The methods according to the above embodiments may also be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments of the present invention described above may be implemented separately or in combination with one another.

The embodiments disclosed herein are therefore intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent to them should be interpreted as falling within the scope of rights of the present invention.

10: voice assistant server
1000: smart device
1020: communication module
1040: memory
1060: controller
1100: housing
1106: indicator
1120: lower frame
1140: upper frame
1160: transparent window
1200: voice input module
1202: microphone
1204: microphone array
1300: audio output module
1302: speaker
1400: display module
1420: display panel
1440: projector
1500: drive module
1520: rotary motor
1540: rotating plate
1560: direction sensor
1600: power module
1700: user position detection module
1720: (mono/stereo) camera
1800: heat dissipation module

Claims

A control method of a smart device, being a media content control method performed by a smart device having operating states including a wake-up word detection state, in which the smart device determines whether a sound signal received from the outside includes a wake-up word, and a listening state, in which the smart device recognizes a voice command included in the sound signal, the method comprising:
playing media content related to a first voice command input by a user, by outputting audio data of the media content through an audio output module of the smart device and displaying video data of the media content through a video output module of the smart device;
entering the wake-up word detection state while maintaining playback of the media content after the playback has started;
determining, by the smart device in the wake-up word detection state, whether the wake-up word is included in a first sound signal received during playback of the media content;
if the wake-up word is included in the first sound signal, entering the listening state while maintaining playback of the media content;
upon entering the listening state, performing a first operation related to the media content;
outputting a talk-back related to a second voice command included in a second sound signal received after entering the listening state; and
after the output of the talk-back ends, performing a second operation related to the media content,
wherein performing the first operation includes adjusting the volume of the audio data by reducing or removing the volume, and displaying text data corresponding to the volume-adjusted audio data of the media content together with the video data of the media content, and
wherein performing the second operation includes adjusting the volume of the audio data of the media content to the volume before the first operation was performed, and ending the display of the text data.
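The sequence recited in the claim above can be pictured as a small two-state machine. The following is only an illustrative sketch under stated assumptions, not the claimed implementation: the class `SmartDevice`, its fields, the wake-up word string, the `"OK: ..."` talk-back text, and the return to the wake-up word detection state after the second operation are all hypothetical, and sound signals are assumed to arrive already transcribed as text.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class State(Enum):
    WAKE_WORD_DETECTION = auto()
    LISTENING = auto()


@dataclass
class SmartDevice:
    """Hypothetical model of the claimed control flow (illustration only)."""
    wake_word: str = "hello device"   # assumed wake-up word
    ducked_volume: int = 0            # 0 models "removing" the volume
    volume: int = 50
    captions_on: bool = False
    playing: bool = False
    state: State = State.WAKE_WORD_DETECTION
    _saved_volume: int = field(default=0, repr=False)

    def play(self) -> None:
        # Playback starts and the device watches for the wake-up word.
        self.playing = True
        self.state = State.WAKE_WORD_DETECTION

    def on_sound(self, text: str) -> Optional[str]:
        """Handle one transcribed sound signal; return a talk-back, if any."""
        if self.state is State.WAKE_WORD_DETECTION:
            if self.wake_word in text.lower():
                # Playback is maintained; only volume and captions change.
                self.state = State.LISTENING
                self._first_operation()
            return None
        # Listening state: treat the signal as the second voice command.
        talk_back = f"OK: {text}"
        self._second_operation()      # after the talk-back has been produced
        self.state = State.WAKE_WORD_DETECTION  # assumption, not in the claim
        return talk_back

    def _first_operation(self) -> None:
        # Reduce or remove the media volume and show text for the audio.
        self._saved_volume = self.volume
        self.volume = self.ducked_volume
        self.captions_on = True

    def _second_operation(self) -> None:
        # Return to the earlier volume and end the text display.
        self.volume = self._saved_volume
        self.captions_on = False
```

In this sketch the media keeps playing throughout; only the volume and the caption flag change between the first and second operations.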

delete

The control method of claim 1,
wherein outputting the talk-back includes outputting feedback related to the second voice command.

delete

The control method of claim 1,
wherein the text data displayed first among the text data includes text data corresponding to the audio data at the time the volume is adjusted and text data corresponding to the audio data a predetermined time before the time the volume is adjusted.

The control method of claim 1,
wherein the text data displayed last among the text data includes text data corresponding to the audio data at the time the volume is restored and text data corresponding to the audio data a predetermined time after the time the volume is restored.
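One way to read the two preceding claims together is that the displayed text covers a window that begins a predetermined time before the volume is adjusted and ends a predetermined time after it is restored. The sketch below is offered under that reading only; the `Cue` type, the function name `captions_to_display`, and the default `lead` and `tail` values are hypothetical and do not come from the specification.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):
    """A hypothetical timed caption: start and end in seconds, plus its text."""
    start: float
    end: float
    text: str


def captions_to_display(
    cues: List[Cue],
    t_adjust: float,    # time at which the volume is adjusted
    t_restore: float,   # time at which the volume is restored
    lead: float = 2.0,  # assumed "predetermined time" before the adjustment
    tail: float = 2.0,  # assumed "predetermined time" after the restoration
) -> List[Cue]:
    """Return every cue that overlaps the window [t_adjust - lead, t_restore + tail]."""
    window_start = t_adjust - lead
    window_end = t_restore + tail
    return [c for c in cues if c.end > window_start and c.start < window_end]
```

Widening the window on both sides lets a viewer read the sentence that was in progress when the audio dropped and the one still in progress when it returns.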

A control method of a smart device, being a media content control method performed by a smart device having operating states including a wake-up word detection state, in which the smart device determines whether a sound signal received from the outside includes a wake-up word, and a listening state, in which the smart device recognizes a voice command included in the sound signal, the method comprising:
playing media content related to a first voice command input by a user, by outputting audio data of the media content through an audio output module of the smart device and displaying video data of the media content through a video output module of the smart device;
entering the wake-up word detection state while maintaining playback of the media content after the playback has started;
determining, by the smart device in the wake-up word detection state, whether the wake-up word is included in a first sound signal received during playback of the media content;
if the wake-up word is included in the first sound signal, entering the listening state while maintaining playback of the media content; and
upon entering the listening state, performing, depending on the type of the media content, either pausing playback of the media content or adjusting the volume of the audio data of the media content,
wherein, if the media content is real-time streaming content, the volume of the audio data of the media content is adjusted, and
wherein, if the media content is not real-time content, the media content is paused or the volume of the audio data of the media content is reduced.
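The content-type rule in the claim above amounts to a small decision function. The following sketch is illustrative only; the `Action` enum, the function name `action_on_listening`, and the `prefer_pause` flag (standing in for however a device might choose between its two permitted options for non-real-time content) are assumptions, not terms from the specification.

```python
from enum import Enum


class Action(Enum):
    PAUSE = "pause"
    ADJUST_VOLUME = "adjust_volume"


def action_on_listening(is_real_time_stream: bool, prefer_pause: bool = True) -> Action:
    """Pick the operation performed when the listening state is entered."""
    if is_real_time_stream:
        # Real-time streaming content keeps playing; only its volume is adjusted.
        return Action.ADJUST_VOLUME
    # Non-real-time content may be either paused or have its volume reduced.
    return Action.PAUSE if prefer_pause else Action.ADJUST_VOLUME
```

For example, `action_on_listening(True)` selects volume adjustment regardless of `prefer_pause`, since the claim allows only that option for real-time streaming content.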

delete

The control method of claim 7, further comprising:
when the volume of the audio data is adjusted, displaying text data corresponding to the volume-adjusted audio data of the media content together with the video data of the media content.

A smart device having operating states including a wake-up word detection state, in which the smart device determines whether a sound signal received from the outside includes a wake-up word, and a listening state, in which the smart device recognizes a voice command included in the sound signal, the smart device comprising:
a voice input module for receiving a sound signal;
an audio output module for outputting audio;
a video output module for displaying video; and
a controller configured to: play media content related to a first voice command input by a user, by outputting audio data of the media content through the audio output module and displaying video data of the media content through the video output module; enter the wake-up word detection state while maintaining playback of the media content after the playback has started; determine, in the wake-up word detection state, whether the wake-up word is included in a first sound signal received through the voice input module during playback of the media content; if the wake-up word is included in the first sound signal, enter the listening state while maintaining playback of the media content; upon entering the listening state, perform a first operation related to the media content; output a talk-back related to a second voice command included in a second sound signal received after entering the listening state; and perform a second operation related to the media content after the output of the talk-back ends,
wherein the controller performs the first operation by adjusting the volume of the audio data by reducing or removing the volume, and by displaying text data corresponding to the volume-adjusted audio data of the media content together with the video data of the media content, and
wherein the controller performs the second operation by adjusting the volume of the audio data of the media content to the volume before the first operation was performed, and by ending the display of the text data.

delete

The smart device of claim 10,
wherein, when outputting the talk-back, the controller outputs feedback related to the second voice command.

delete

The smart device of claim 10,
wherein the text data displayed first among the text data includes text data corresponding to the audio data at the time the volume is adjusted and text data corresponding to the audio data a predetermined time before the time the volume is adjusted.

The smart device of claim 10,
wherein the text data displayed last among the text data includes text data corresponding to the audio data at the time the volume is restored and text data corresponding to the audio data a predetermined time after the time the volume is restored.