KR20250029622A

KR20250029622A - Apparatus and method for predicting uncertainty of deep neural network

Info

Publication number: KR20250029622A
Application number: KR1020230110754A
Authority: KR
Inventors: 김성태; 김정욱; 김민국; 조은기
Original assignee: 경희대학교 산학협력단 (Kyung Hee University Industry-Academic Cooperation Foundation)
Priority date: 2023-08-23
Filing date: 2023-08-23
Publication date: 2025-03-05

Abstract

Disclosed are an apparatus and a method for predicting uncertainty of a deep neural network. According to an embodiment of the present disclosure, an apparatus for predicting uncertainty of a deep neural network comprises: a first machine learning model trained to perform a preset task based on input data; and a second machine learning model trained to predict uncertainty with respect to a task performance result of the first machine learning model.

Description

Apparatus and method for predicting uncertainty of deep neural network

Embodiments of the present invention relate to techniques for predicting the uncertainty of a deep neural network.

Uncertainty estimation, which aims to quantify how uncertain a neural network's predictions are, is being actively studied in computer vision tasks such as image recognition, semantic segmentation, and video object segmentation. It is especially important in domains where a network's decisions can have critical consequences, such as surgical video analysis and autonomous driving.

Advances in neural networks have greatly improved their prediction accuracy. However, because neural networks are black boxes, their decision-making process cannot be fully explained, and their final decisions therefore cannot be fully trusted. If the uncertainty of a network's decisions can be detected, then cases where a prediction is uncertain or has low confidence can be handled through interaction with the end user.

Korean Registered Patent Publication No. 10-2200212 (2021.01.08)

Embodiments of the present invention aim to provide a novel technique for predicting the uncertainty of a deep neural network.

An apparatus for predicting uncertainty of a deep neural network according to an embodiment disclosed herein includes: a first machine learning model trained to perform a preset task based on input data; and a second machine learning model trained to predict the uncertainty of the task result produced by the first machine learning model.

The first machine learning model may include an encoder configured to extract features from an input video and a decoder configured to generate, based on the extracted features, an object segmentation map in which each object in the video is segmented. The second machine learning model may have the same deep neural network structure as the decoder.

The uncertainty prediction apparatus may train the second machine learning model after training of the first machine learning model is complete. To do so, it may generate a training video and input it to the encoder of the first machine learning model to extract features. It then inputs the extracted features to both the decoder of the first machine learning model and the second machine learning model.

The training video may be an original video with added noise that causes the pre-trained first machine learning model to make incorrect object segmentation predictions.

The uncertainty prediction apparatus may generate the training video by adding an attack fragment, which serves as the noise, near an object region in the original video.

Based on the extracted features, the decoder may generate an object segmentation map in which each object in the training video is segmented. Also based on the extracted features, the second machine learning model may generate an uncertainty prediction map indicating which pixels of that object segmentation map are incorrect.

The uncertainty prediction apparatus may compare the object segmentation map generated by the decoder with the ground-truth object segmentation map for the training video. It may then set the map of pixels that the decoder predicted incorrectly as the ground-truth map for the second machine learning model.

The uncertainty prediction apparatus may train the second machine learning model so that the uncertainty prediction map it generates resembles the ground-truth map.

The loss function (L_Obs) of the second machine learning model may be expressed by the equation below.

(Equation)

y_Obs: ground-truth map for the second machine learning model

x: training video

Seg(x): object segmentation map predicted by the first machine learning model when the training video x is input

Observer: uncertainty prediction map generated by the second machine learning model

A method for predicting uncertainty of a deep neural network according to an embodiment disclosed herein is performed by a computing device having one or more processors and a memory that stores one or more programs executed by those processors. The method includes: training a first machine learning model to perform a preset task based on input data; and training a second machine learning model to predict the uncertainty of the task result produced by the first machine learning model.

According to the disclosed embodiments, a second machine learning model is configured to predict the uncertainty of the first machine learning model's object segmentation results. This indicates how far the first model's outputs can be trusted and thereby increases the usability of deep neural network models in fields where their decisions are critical.

FIG. 1 is a diagram illustrating an apparatus for predicting uncertainty of a deep neural network according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a training process of the apparatus for predicting uncertainty of a deep neural network according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a process of generating a training video based on adversarial learning in the apparatus for predicting uncertainty of a deep neural network according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method for predicting uncertainty of a deep neural network according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

Hereinafter, specific embodiments of the present invention are described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, it is merely illustrative, and the present invention is not limited thereto.

In describing the embodiments of the present invention, detailed descriptions of known technologies related to the present invention are omitted where they could unnecessarily obscure its gist. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or practice of users or operators. Their definitions should therefore be based on the content of this entire specification. The terms used in the detailed description only describe embodiments of the present invention and should not be construed as limiting. Unless clearly indicated otherwise, singular forms include plural forms. In this description, expressions such as "include" or "have" indicate certain features, numbers, steps, operations, elements, parts, or combinations thereof. They should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof other than those described.

In addition, although the terms first, second, etc. may be used to describe various components, the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

FIG. 1 is a diagram illustrating an apparatus for predicting uncertainty of a deep neural network according to an embodiment of the present invention.

Referring to FIG. 1, the uncertainty prediction apparatus (100) of a deep neural network may include a first machine learning model (102) and a second machine learning model (104).

The first machine learning model (102) is trained to perform a preset task based on given input data. In one embodiment, the first machine learning model (102) may be trained to perform object segmentation on an input video (a video object segmentation model). That is, the first machine learning model (102) may be a pre-trained model that detects and segments the objects in any video input to it.

The first machine learning model (102) may include an encoder (102a) and a decoder (102b). The encoder (102a) may extract features from an input video. Based on the features output from the encoder (102a), the decoder (102b) may predict the object regions in the video and generate an object segmentation map in which each object in the video is segmented. The first machine learning model (102) may further include a memory that stores important information from previous frames of the video.

The second machine learning model (104) predicts the uncertainty of the first machine learning model (102). That is, it estimates how uncertain the task result of the first machine learning model (102) is.

In one embodiment, when the first machine learning model (102) is a video object segmentation model, the second machine learning model (104) may predict how uncertain (or, conversely, how reliable) the object segmentation result of the first machine learning model (102) is.

Here, the second machine learning model (104) may be constructed by mirroring the structure of the decoder (102b) of the first machine learning model (102). That is, the second machine learning model (104) may share the encoder (102a) of the first machine learning model (102) and consist of a deep neural network with the same structure as the decoder (102b).

The second machine learning model (104) takes as input the features output from the encoder (102a) of the first machine learning model (102); that is, it shares the encoder (102a). It therefore needs no separate feature-extraction pass to predict the uncertainty of the first machine learning model (102), which reduces the time cost of uncertainty prediction.
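The encoder-sharing arrangement described above can be sketched as follows. This is a minimal, hypothetical illustration in plain Python. The names `toy_encoder`, `ToyDecoder`, and `UncertaintyPredictor` are invented, the "networks" are per-pixel linear heads rather than real deep networks, and the skip connections to the decoder are omitted. It shows only the structural idea: the observer copies the decoder's structure, keeps its own parameters, and consumes the same encoder features, so the encoder runs once per frame.

```python
import copy
import random


def toy_encoder(frame):
    """Hypothetical frozen encoder: maps each pixel value to a small feature vector."""
    return [[p, p * p, 1.0] for p in frame]


class ToyDecoder:
    """Hypothetical per-pixel linear head standing in for a decoder network."""

    def __init__(self, n_features, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        self.bias = 0.0

    def __call__(self, features):
        # One logit per pixel feature vector.
        return [sum(w * f for w, f in zip(self.weights, feat)) + self.bias
                for feat in features]


class UncertaintyPredictor:
    """Shares the first model's encoder; the observer mirrors the decoder's structure."""

    def __init__(self, encoder, decoder, seed=1):
        self.encoder = encoder  # shared with the first model, already trained
        self.decoder = decoder  # first model's decoder, kept frozen
        # Same structure as the decoder, but with its own freshly initialised parameters.
        self.observer = copy.deepcopy(decoder)
        rng = random.Random(seed)
        self.observer.weights = [rng.uniform(-1.0, 1.0) for _ in decoder.weights]
        self.observer.bias = 0.0

    def forward(self, frame):
        features = self.encoder(frame)        # encoder runs once per frame
        seg_logits = self.decoder(features)   # first model's segmentation output
        unc_logits = self.observer(features)  # observer's per-pixel uncertainty output
        return seg_logits, unc_logits


model = UncertaintyPredictor(toy_encoder, ToyDecoder(n_features=3))
seg_logits, unc_logits = model.forward([0.2, 0.8, 0.5])
```

Copying the decoder with `copy.deepcopy` and re-initialising its parameters is one simple way to obtain "the same structure, separate weights". A real implementation would do the equivalent with its deep learning framework's module-copying facilities.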

The second machine learning model (104) may be skip-connected to the decoder (102b) of the first machine learning model (102). This allows the second machine learning model (104) to observe the decision-making process of the first machine learning model (102). In particular, the second machine learning model (104) may be trained to observe situations in which the object segmentation result of the first machine learning model (102) is wrong.

That is, the second machine learning model (104) may be trained using, as targets, the pixels that the first machine learning model (102) predicted incorrectly when segmenting objects in an input video. For example, a background pixel may have been predicted as an object pixel, or an object pixel predicted as background. The training process of the uncertainty prediction apparatus (100) of the deep neural network is described below with reference to FIG. 2.

FIG. 2 is a diagram illustrating the training process of the apparatus for predicting uncertainty of a deep neural network according to an embodiment of the present invention.

Referring to FIG. 2, the second machine learning model (104) may be trained after training of the first machine learning model (102) is complete. First, the uncertainty prediction apparatus (100) may generate training data based on adversarial learning. When the first machine learning model (102) is a video object segmentation model, the training data may take the form of videos. A training video may be an original video to which noise invisible to the human eye has been added.

Such a training video therefore causes the first machine learning model (102) to segment objects poorly. The first machine learning model (102) has already been trained to segment objects in input videos well. To degrade its object segmentation predictions, an adversarial training video is generated and fed to the pre-trained first machine learning model (102). From the perspective of the first machine learning model (102), the training video is an adversarial attack sample. The process of generating training videos is described in detail later.

When a training video is input, the encoder (102a) of the first machine learning model (102) may extract features from it. The extracted features are input to both the decoder (102b) of the first machine learning model (102) and the second machine learning model (104).

Based on the extracted features, the decoder (102b) of the first machine learning model (102) may generate an object segmentation map in which each object in the training video is segmented. The uncertainty prediction apparatus (100) may compare this predicted map with the ground-truth object segmentation map for the training video. This comparison separates the pixels the decoder (102b) predicted incorrectly from those it predicted correctly.

In one embodiment, the uncertainty prediction apparatus (100) may compare the object segmentation map predicted by the decoder (102b) with the ground-truth object segmentation map. Pixels the decoder (102b) predicted incorrectly are set to 1, and pixels it predicted correctly are set to 0. The uncertainty prediction apparatus (100) may then use this binary map of incorrectly predicted pixels as the ground-truth map for the second machine learning model (104).
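The 1/0 labelling described above is a pixel-wise disagreement test between the decoder's prediction and the ground truth. Below is a minimal sketch in plain Python. It assumes masks are given as 2D lists of integer labels (0 for background, object IDs otherwise), and `error_map` is a hypothetical helper name.

```python
def error_map(pred_mask, gt_mask):
    """Return y_Obs: 1 where the decoder's label differs from the ground truth, else 0.

    Works for binary masks and for multi-object masks that use integer object IDs.
    """
    if len(pred_mask) != len(gt_mask):
        raise ValueError("masks must have the same number of rows")
    result = []
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        if len(pred_row) != len(gt_row):
            raise ValueError("masks must have the same number of columns")
        result.append([1 if p != g else 0 for p, g in zip(pred_row, gt_row)])
    return result


pred = [[0, 1, 1],
        [0, 0, 2]]
gt = [[0, 1, 0],
      [1, 0, 2]]
y_obs = error_map(pred, gt)  # [[0, 0, 1], [1, 0, 0]]
```

Both error types from the text map to 1: a false-positive object pixel (top row, last column) and a missed object pixel (bottom row, first column).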

Meanwhile, the features of the training video extracted by the encoder (102a) of the first machine learning model (102) are also input to the second machine learning model (104). The second machine learning model (104) may generate an uncertainty prediction map based on these features. This map may indicate which pixels of the object segmentation map predicted by the decoder (102b) of the first machine learning model (102) are incorrect.

That is, the second machine learning model (104) may be trained to predict which pixels of the object segmentation map predicted by the decoder (102b) of the first machine learning model (102) are incorrect. To this end, it generates an uncertainty prediction map from the features extracted by the encoder (102a). Accordingly, the map of pixels that the decoder (102b) predicted incorrectly serves as the ground-truth map for the second machine learning model (104).

The uncertainty prediction apparatus (100) may then train the second machine learning model (104) so that the uncertainty prediction map it outputs resembles the ground-truth map, that is, so that the difference between them is minimized. The ground-truth map here is the map of pixels that the decoder (102b) predicted incorrectly. The loss function (L_Obs) of the second machine learning model (104) may be expressed by Equation 1 below.

(Equation 1)

x: training video
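Equation 1 itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes that L_Obs is a mean pixel-wise binary cross-entropy between the observer's predicted error probabilities and y_Obs. The exact form used in the original may differ, and `observer_bce_loss` is a hypothetical name.

```python
import math


def observer_bce_loss(unc_probs, y_obs, eps=1e-7):
    """Assumed form of L_Obs: mean pixel-wise binary cross-entropy.

    unc_probs: 2D list of observer outputs in [0, 1] (predicted error probabilities).
    y_obs:     2D list of 0/1 targets (1 = the first model got this pixel wrong).
    """
    total, n = 0.0, 0
    for prob_row, y_row in zip(unc_probs, y_obs):
        for p, y in zip(prob_row, y_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            n += 1
    return total / n


confident = observer_bce_loss([[0.9, 0.1]], [[1, 0]])
uninformed = observer_bce_loss([[0.5, 0.5]], [[1, 0]])  # equals log(2)
```

Minimizing this quantity pushes the observer's map toward 1 on pixels the decoder got wrong and toward 0 elsewhere, which matches the "resemble the ground-truth map" objective stated above.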

Once trained in this way, the second machine learning model (104) can be applied whenever a video is input to the first machine learning model (102) and that model generates an object segmentation map. The trained second machine learning model (104) then predicts the uncertainty of each pixel in that map, that is, the probability that the pixel is predicted incorrectly.
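At inference time, the observer's per-pixel output can be read as an error probability and used to flag pixels that need review, in line with the user-interaction motivation given in the background. Below is a minimal sketch. It assumes the observer emits logits that are mapped to probabilities with a sigmoid. The 0.5 threshold and the function names are hypothetical choices.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def flag_uncertain_pixels(unc_logits, threshold=0.5):
    """Map observer logits to error probabilities and flag pixels above the threshold."""
    probs = [[sigmoid(z) for z in row] for row in unc_logits]
    flags = [[p > threshold for p in row] for row in probs]
    uncertain_fraction = sum(map(sum, flags)) / sum(len(row) for row in flags)
    return probs, flags, uncertain_fraction


probs, flags, frac = flag_uncertain_pixels([[-3.0, 0.0, 2.0],
                                            [4.0, -1.0, -6.0]])
# flags == [[False, False, True], [True, False, False]]
```

The per-frame `uncertain_fraction` is one possible summary for deciding when to hand a frame to a human. How the original system uses the map downstream is not specified here.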

FIG. 3 is a diagram illustrating a process of generating a training video based on adversarial learning in the apparatus for predicting uncertainty of a deep neural network according to an embodiment of the present invention.

Referring to FIG. 3, the uncertainty prediction apparatus (100) may generate a training video from a video serving as training data (the original video). It does so by adding attack fragments, which act as noise, near the regions corresponding to objects. The fragments are placed near the object regions so that the pre-trained first machine learning model (102) makes incorrect object segmentation predictions around the objects in the training video.

In doing so, the uncertainty prediction apparatus (100) may generate the training video by combining the original video with a mask containing the attack fragments. In the mask, pixels belonging to an attack fragment may have a value of 1 and all other pixels a value of 0. The size of each attack fragment may be determined based on the number of objects in the original video and the size of each object. Adding attack fragments near the object regions of the original video in this way performs a local adversarial attack.
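Below is a minimal sketch of building the binary mask described above, assuming each frame's objects are given as a 2D label map. The text only states that fragment size depends on the number and size of the objects. The placement rule (a square centred on the top-left corner of each object's bounding box) and the sizing rule (side proportional to sqrt(area / number of objects), scaled by a hypothetical `ratio`) are therefore illustrative assumptions.

```python
import math


def attack_mask(labels, ratio=0.25):
    """Return a binary mask (1 = attack fragment, 0 = elsewhere) for one frame.

    labels: 2D list of ints, 0 = background, k > 0 = object id.
    ratio:  hypothetical scale factor controlling the fragment size.
    """
    h, w = len(labels), len(labels[0])
    # Bounding box and area per object: id -> [top, left, bottom, right, area]
    boxes = {}
    for r in range(h):
        for c in range(w):
            k = labels[r][c]
            if k == 0:
                continue
            box = boxes.setdefault(k, [r, c, r, c, 0])
            box[0] = min(box[0], r)
            box[1] = min(box[1], c)
            box[2] = max(box[2], r)
            box[3] = max(box[3], c)
            box[4] += 1

    mask = [[0] * w for _ in range(h)]
    n_objects = max(len(boxes), 1)
    for top, left, _bottom, _right, area in boxes.values():
        # Assumed sizing rule: grows with object area, shrinks as objects multiply.
        side = max(1, round(ratio * math.sqrt(area / n_objects)))
        # Assumed placement: square centred on the box's top-left corner, clipped to the frame.
        r0, c0 = top - side // 2, left - side // 2
        for r in range(max(r0, 0), min(r0 + side, h)):
            for c in range(max(c0, 0), min(c0 + side, w)):
                mask[r][c] = 1
    return mask


labels = [[0] * 8 for _ in range(8)]
for r in range(2, 6):
    for c in range(2, 6):
        labels[r][c] = 1  # one 4x4 object
mask = attack_mask(labels, ratio=0.5)  # 2x2 fragment at rows 1-2, cols 1-2
```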

After generating the training video by combining the original video with the mask containing the attack fragments, the uncertainty prediction apparatus (100) may input the training video to the first machine learning model (102). The training video is generated so that the loss of the first machine learning model (102) is maximized.

That is, the uncertainty prediction apparatus (100) generates the training video by determining where noise-like attack fragments must be added to the original video to make the object segmentation predictions of the first machine learning model (102) incorrect. It passes the noise-added video frame through the first machine learning model (102) to obtain the gradient of the loss with respect to that frame. It then multiplies this gradient by a real-valued scale factor to limit the amount of distortion, and multiplies the result by the mask containing the attack fragments. This produces a perturbation that maximizes the loss of the first machine learning model (102). The uncertainty prediction apparatus (100) may generate the training video using Equation 2 below.

(Equation 2)

x: original video

x_N: noise-added video frame

Ω(x): region to which attack fragments are added

ε: preset real-valued scale factor
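Equation 2 is not reproduced in this text. The procedure described above has four parts: take the gradient of the first model's loss with respect to the frame, scale it by ε, restrict it with the attack-fragment mask, and choose it to increase the loss. This resembles a masked FGSM-style step. The sketch below assumes that form, x_N = clip(x + ε · sign(∇_x L) ⊙ Ω(x)), although the original may use the raw gradient rather than its sign. It also uses a hypothetical per-pixel logistic model in place of the real segmentation network so that the input gradient can be written analytically.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def toy_seg_probs(frame, w=4.0, b=-2.0):
    """Hypothetical frozen segmentation model: per-pixel foreground probability."""
    return [sigmoid(w * px + b) for px in frame]


def seg_loss(frame, gt, w=4.0, b=-2.0):
    """Mean binary cross-entropy of the toy model against the ground-truth mask."""
    probs = toy_seg_probs(frame, w, b)
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p)
                for p, t in zip(probs, gt)) / len(frame)


def _sign(g):
    return (g > 0) - (g < 0)


def masked_fgsm(frame, gt, omega, eps=0.1, w=4.0, b=-2.0):
    """One loss-increasing step on the current frame, applied only where omega is 1."""
    probs = toy_seg_probs(frame, w, b)
    # Analytic d(loss)/d(pixel) for sigmoid + mean BCE: (p - t) * w / N
    grads = [(p - t) * w / len(frame) for p, t in zip(probs, gt)]
    return [min(1.0, max(0.0, px + eps * _sign(g) * m))  # keep pixels in [0, 1]
            for px, g, m in zip(frame, grads, omega)]


x = [0.9, 0.1, 0.9, 0.1]
gt = [1, 0, 1, 0]
omega = [1, 1, 0, 0]  # attack fragment covers the first two pixels only
x_n = masked_fgsm(x, gt, omega)
```

With the mask applied, the first two pixels move in the loss-increasing direction (0.9 to 0.8 and 0.1 to 0.2), while pixels outside Ω(x) are left unchanged. Applying the step repeatedly to the already-perturbed frame gives an iterative variant.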

FIG. 4 is a flowchart illustrating a method for predicting uncertainty of a deep neural network according to an embodiment of the present invention. The flowchart divides the method into a plurality of steps. However, at least some of the steps may be performed in a different order, combined with other steps, omitted, or divided into sub-steps, and one or more steps not shown may be added.

Referring to FIG. 4, the uncertainty prediction apparatus (100) may train the first machine learning model (102) (S101). In one embodiment, the uncertainty prediction apparatus (100) may input a video to the first machine learning model (102). It trains that model to generate an object segmentation map in which each object in the input video is segmented.

Next, the uncertainty prediction apparatus (100) may create a second machine learning model (104) for predicting the uncertainty of the first machine learning model (102) (S103). In one embodiment, the second machine learning model (104) may be created as a deep neural network with the same structure as the decoder (102b) of the first machine learning model (102).

Next, the uncertainty prediction apparatus (100) may generate training videos for training the second machine learning model (104) (S105). In one embodiment, a training video may be generated by adding noise-like attack fragments near the regions corresponding to objects in the original video. The training video is generated so that the loss of the first machine learning model (102) is maximized.

Next, the uncertainty prediction apparatus (100) may train the second machine learning model (104) using the training videos (S107). Through this training, the second machine learning model (104) learns to observe situations in which the object segmentation result of the first machine learning model (102) is incorrect.

Specifically, the uncertainty prediction apparatus (100) may input a training video to the encoder (102a) of the first machine learning model (102) to extract features from it. The extracted features may be input to both the decoder (102b) of the first machine learning model (102) and the second machine learning model (104).

Here, based on the extracted features, the decoder (102b) of the first machine learning model (102) may generate an object segmentation map in which each object in the training video is segmented. The uncertainty prediction apparatus (100) may compare this predicted map with the ground-truth object segmentation map. It then sets the map of pixels that the decoder (102b) predicted incorrectly as the ground-truth map for the second machine learning model (104).

The second machine learning model (104) generates an uncertainty prediction map based on the features extracted by the encoder (102a). The uncertainty prediction device (100) may train the second machine learning model (104) so that the uncertainty prediction map it outputs resembles the ground-truth map, i.e., the map composed of the pixels for which the decoder (102b) made an incorrect prediction.
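Training the second model to "resemble" the error map can be sketched as a per-pixel logistic head fitted by gradient descent on frozen encoder features. This rests on three assumptions. First, the loss L_Obs recited in the claims is not reproduced in this text, so mean binary cross-entropy is used here as a stand-in. Second, the first model's weights stay fixed. Third, pixels from all frames are flattened into rows. The function `train_observer` and its parameters are hypothetical.

```python
import numpy as np


def bce(prob, target, eps=1e-7):
    """Mean binary cross-entropy (assumed stand-in for the patent's L_Obs)."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)))


def train_observer(feats, y_obs, lr=0.5, epochs=200, seed=0):
    """Fit a per-pixel logistic observer head on frozen features.

    feats: (N, C) frozen encoder features, one row per pixel (all frames flattened).
    y_obs: (N,) binary target, 1 where the first model's segmentation was wrong.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=feats.shape[1])
    b = 0.0
    history = []
    for _ in range(epochs):
        prob = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # uncertainty prediction per pixel
        history.append(bce(prob, y_obs))
        err = (prob - y_obs) / len(y_obs)               # d(mean BCE)/d(logit)
        w -= lr * feats.T @ err                          # only the observer is updated
        b -= lr * err.sum()
    return w, b, history
```

Only `w` and `b` change during training, so the first model's segmentation behaviour is unaffected. A real implementation would swap the logistic head for a decoder-shaped network and the stand-in BCE for the claimed L_Obs.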

According to the disclosed embodiment, a second machine learning model (104) is configured to predict the uncertainty of the object segmentation result of the first machine learning model (102). This makes it possible to indicate how reliable the first machine learning model (102) is, which in turn increases the usability of deep neural network models in fields where their decisions are critical.

FIG. 5 is a block diagram illustrating a computing environment (10) that includes a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have functions and capabilities other than those described below, and the environment may include components in addition to those described below.

The illustrated computing environment (10) includes a computing device (12). In one embodiment, the computing device (12) may be the uncertainty prediction device (100) for a deep neural network.

The computing device (12) includes at least one processor (14), a computer-readable storage medium (16), and a communication bus (18). The processor (14) may cause the computing device (12) to operate in accordance with the exemplary embodiments described above. For example, the processor (14) may execute one or more programs stored in the computer-readable storage medium (16). The one or more programs may include one or more computer-executable instructions which, when executed by the processor (14), cause the computing device (12) to perform operations in accordance with the exemplary embodiments.

The computer-readable storage medium (16) is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program (20) stored in the computer-readable storage medium (16) includes a set of instructions executable by the processor (14). In one embodiment, the computer-readable storage medium (16) may be any of the following, or a suitable combination of them:
- memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof);
- one or more magnetic disk storage devices;
- optical disk storage devices;
- flash memory devices;
- any other form of storage medium that can be accessed by the computing device (12) and can store the desired information.

The communication bus (18) interconnects the various components of the computing device (12), including the processor (14) and the computer-readable storage medium (16).

The computing device (12) may also include one or more input/output interfaces (22), which provide interfaces for one or more input/output devices (24), and one or more network communication interfaces (26). The input/output interfaces (22) and the network communication interfaces (26) are connected to the communication bus (18). The input/output devices (24) may be connected to the other components of the computing device (12) via the input/output interfaces (22). Exemplary input/output devices (24) may include:
- input devices such as a pointing device (e.g., a mouse or trackpad), a keyboard, a touch input device (e.g., a touchpad or touchscreen), a voice or sound input device, various types of sensor devices, and/or imaging devices;
- output devices such as a display device, a printer, speakers, and/or a network card.

The exemplary input/output devices (24) may be included inside the computing device (12) as one of its components, or may be connected to the computing device (12) as separate devices distinct from it.

Although representative embodiments of the present invention have been described in detail above, those skilled in the art will understand that various modifications can be made to these embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and their equivalents.

10: Computing environment
12: Computing device
14: Processor
16: Computer-readable storage medium
18: Communication bus
20: Program
22: Input/output interface
24: Input/output device
26: Network communication interface
100: Uncertainty prediction device for a deep neural network
102: First machine learning model
102a: Encoder
102b: Decoder
104: Second machine learning model

Claims

1. An uncertainty prediction device for a deep neural network, comprising:
a first machine learning model trained to perform a preset task based on input data; and
a second machine learning model trained to predict uncertainty regarding a result of the task performed by the first machine learning model.

2. The uncertainty prediction device of claim 1, wherein the first machine learning model comprises an encoder configured to extract features from an input video and a decoder configured to generate, based on the extracted features, an object segmentation map that segments each object in the video, and
wherein the second machine learning model has the same deep neural network structure as the decoder.

3. The uncertainty prediction device of claim 2, wherein the uncertainty prediction device trains the second machine learning model after training of the first machine learning model is completed, and
wherein, to train the second machine learning model, the uncertainty prediction device generates a training video, inputs the generated training video into the encoder of the first machine learning model to extract features, and inputs the extracted features into both the decoder of the first machine learning model and the second machine learning model.

4. The uncertainty prediction device of claim 3, wherein the training video is obtained by adding noise to an original video so that the object segmentation prediction of the trained first machine learning model becomes incorrect.

5. The uncertainty prediction device of claim 4, wherein the uncertainty prediction device generates the training video by adding attack fragments corresponding to the noise near an object region in the original video.

6. The uncertainty prediction device of claim 4, wherein the decoder generates, based on the extracted features, an object segmentation map that segments each object in the training video, and
wherein the second machine learning model generates, based on the extracted features, an uncertainty prediction map indicating which pixels of the object segmentation map are incorrect.

7. The uncertainty prediction device of claim 6, wherein the uncertainty prediction device compares the object segmentation map generated by the decoder with a ground-truth object segmentation map for the corresponding training video, and sets a map composed of the pixels for which the decoder's prediction is incorrect as a ground-truth map of the second machine learning model.

8. The uncertainty prediction device of claim 7, wherein the uncertainty prediction device trains the second machine learning model so that the uncertainty prediction map generated by the second machine learning model resembles the ground-truth map.

9. The uncertainty prediction device of claim 8, wherein a loss function (L_Obs) of the second machine learning model is expressed by the following formula:
(mathematical formula)

where
y_Obs: ground-truth map of the second machine learning model
x: training video
Seg(x): object segmentation map predicted by the first machine learning model when the training video is input
Observer: uncertainty prediction map generated by the second machine learning model

10. A method for predicting uncertainty of a deep neural network, performed on a computing device having one or more processors and a memory storing one or more programs executed by the one or more processors, the method comprising:
training a first machine learning model to perform a preset task based on input data; and
training a second machine learning model to predict uncertainty regarding a result of the task performed by the first machine learning model.

11. The method of claim 10, wherein the first machine learning model comprises an encoder configured to extract features from an input video and a decoder configured to generate, based on the extracted features, an object segmentation map that segments each object in the video, and
wherein the second machine learning model has the same deep neural network structure as the decoder.

12. The method of claim 11, wherein the computing device trains the second machine learning model after training of the first machine learning model is completed, and
wherein training the second machine learning model comprises generating a training video, inputting the generated training video into the encoder of the first machine learning model to extract features, and inputting the extracted features into both the decoder of the first machine learning model and the second machine learning model.

13. The method of claim 12, wherein the training video is obtained by adding noise to an original video so that the object segmentation prediction of the trained first machine learning model becomes incorrect.

14. The method of claim 13, wherein the computing device generates the training video by adding attack fragments corresponding to the noise near an object region in the original video.

15. The method of claim 13, wherein the decoder generates, based on the extracted features, an object segmentation map that segments each object in the training video, and
wherein the second machine learning model generates, based on the extracted features, an uncertainty prediction map indicating which pixels of the object segmentation map are incorrect.

16. The method of claim 15, wherein the computing device compares the object segmentation map generated by the decoder with a ground-truth object segmentation map for the corresponding training video, and sets a map composed of the pixels for which the decoder's prediction is incorrect as a ground-truth map of the second machine learning model.

17. The method of claim 16, wherein the computing device trains the second machine learning model so that the uncertainty prediction map generated by the second machine learning model resembles the ground-truth map.

18. The method of claim 17, wherein a loss function (L_Obs) of the second machine learning model is expressed by the following formula:
(mathematical formula)

where
y_Obs: ground-truth map of the second machine learning model
x: training video
Seg(x): object segmentation map predicted by the first machine learning model when the training video is input
Observer: uncertainty prediction map generated by the second machine learning model

19. A computer program stored in a non-transitory computer-readable storage medium, the computer program comprising one or more instructions which, when executed by a computing device having one or more processors, cause the computing device to perform:
training a first machine learning model to perform a preset task based on input data; and
training a second machine learning model to predict uncertainty regarding a result of the task performed by the first machine learning model.