KR101771146B1

KR101771146B1 - Method and apparatus for detecting pedestrian and vehicle based on convolutional neural network using stereo camera

Info

Publication number: KR101771146B1
Application number: KR1020170035985A
Authority: KR
Inventors: 이규철; 유지상
Original assignee: 광운대학교 산학협력단
Priority date: 2017-03-22
Filing date: 2017-03-22
Publication date: 2017-08-24
Anticipated expiration: 2037-03-22

Abstract

According to the present invention, provided is an object detection apparatus which comprises: an object candidate detection unit for acquiring a disparity image through stereo matching of a stereo image captured from a stereo camera including two or more lenses; and an object detection unit for detecting either one or both of a vehicle and a pedestrian among object candidates detected by the object candidate detection unit.

Description

TECHNICAL FIELD [0001] The present invention relates to a convolutional neural network-based pedestrian and vehicle detection method and apparatus using a stereo camera.

The present invention relates to a method and apparatus for object detection based on a convolutional neural network (CNN) using a stereo camera. More particularly, the present invention relates to a method and apparatus for detecting pedestrians, vehicles, and the like using images acquired through a stereo camera installed on a moving body such as a vehicle.

With the development of image recognition technology, collision-avoidance safety technology is being studied that recognizes pedestrians or vehicles using a camera installed on a moving body such as a vehicle, determines the possibility of a collision, and automatically stops the vehicle if necessary.

The published paper "A Study on Region-of-Interest Setting and Active Surveillance Assistance for a Mobile Ground Surveillance and Reconnaissance System, Oh Sang-hoon (2014)" studied a method that uses a single camera on a moving system to efficiently find and track regions of interest that require surveillance, so that the system can actively respond to the movement of targets or obstacles. By tracking corner points across consecutive images with the Lucas-Kanade algorithm, the ego-motion of the moving system can be estimated; regions whose motion differs from it are judged to be obstacles or targets and are set as regions of interest (ROI). The ROI set in this way is tracked using a particle filter and a Kalman filter, and its trajectory is predicted so that the system can actively respond to the movement of the target.

However, because this technique uses a single camera, the distance between the camera and an object cannot be measured. In addition, owing to the nature of the algorithm, an object is not extracted when its motion is similar to that of the vehicle. Furthermore, because objects are not classified, there is the limitation that it cannot be known whether a detected object is a vehicle, a pedestrian, or something else.

The published paper "HOG-based pedestrian detection and behavior pattern recognition for traffic signal control, Yang Seong-min and Jo Kang-hyun (2013)" detects pedestrians in outdoor environments from camera images using HoG (Histogram of Oriented Gradients) features. It then defines and tracks the behavior pattern of the pedestrian and presents an algorithm for determining whether the pedestrian is crossing.

However, this technique also uses a single camera, so the distance between an object and the camera cannot be measured. Moreover, although HoG is used to search for pedestrians, its search range is the entire image, so it has the drawback of a long processing time.

Korean Patent Laid-Open Publication No. 2016-0069834

S. H. Oh, "Method for detection regions of interest and active surveillance assistance in the mobile ground reconnaissance system", Journal of KIIT, vol. 12, no. 6, pp. 31-38 (2014)
Yang Seong-min (양성민) and Jo Kang-hyun (조강현), "HOG-based pedestrian detection and behavior pattern recognition for traffic signal control" (2013)

It is an object of the present invention to provide an object detection apparatus and method that use a stereo camera in order to measure the distance between the camera and an object.

It is another object of the present invention to provide an object detection apparatus and method that improve the search speed and efficiency of object candidate detection in a stereo image.

It is another object of the present invention to provide an object detection apparatus and method that use the AlexNet model, a convolutional neural network (CNN), to achieve a higher object recognition rate than the conventional HoG (Histogram of Oriented Gradients).

It is another object of the present invention to provide an object detection apparatus and method that use a model in which the large structure of the conventional AlexNet model is optimized for a pedestrian and vehicle database.

The problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

In one embodiment of the present invention, an object detection apparatus may be provided that includes: an object candidate detection unit configured to acquire a disparity image through stereo matching of a stereo image captured by a stereo camera including two or more lenses; and an object detection unit configured to detect at least one of a pedestrian and a vehicle among the object candidates detected by the object candidate detection unit.

Here, the object candidate detection unit may detect object candidates through histogram analysis of the disparity image. The object candidate detection unit may also detect object candidates by analyzing the uniformity of the vertical-direction distribution of the histogram of the disparity image. In addition, when the distribution of the histogram of the disparity image at a specific pixel value has a variation value at or above a predetermined reference, the region having that pixel value may be detected as an object candidate.

The object detection unit may detect at least one of the pedestrian and the vehicle using a convolutional neural network (CNN). The object detection unit may do so using an optimized network model in which the structure of the AlexNet model is partially modified. The optimized network model may include five convolutional layers with 48, 128, 192, 192, and 128 feature maps, respectively, and fully connected layers of size 512, 512, and 2.

In another embodiment of the present invention, an object detection method may be provided that includes: acquiring a stereo image captured by a stereo camera including two or more lenses; an object candidate detection step of detecting object candidates based on a disparity image obtained through stereo matching of the stereo image; and an object detection step of detecting at least one of a pedestrian and a vehicle among the detected object candidates.

Here, the object candidate detection step may detect object candidates through histogram analysis of the disparity image. The object candidate detection step may also detect object candidates by analyzing the uniformity of the vertical-direction distribution of the histogram of the disparity image. When the distribution of the histogram of the disparity image at a specific pixel value has a variation value at or above a predetermined reference, the region having that pixel value may be detected as an object candidate.

The object detection step may detect at least one of the pedestrian and the vehicle using a convolutional neural network (CNN). The object detection step may do so using an optimized network model in which the structure of the AlexNet model is partially modified. The optimized network model may include five convolutional layers with 48, 128, 192, 192, and 128 feature maps, respectively, and fully connected layers of size 512, 512, and 2.

According to the present invention, an object detection apparatus and method can be provided that use a stereo camera in order to measure the distance between the camera and an object.

In addition, according to the present invention, an object detection apparatus and method can be provided that improve the search speed and efficiency of object candidate detection in a stereo image.

In addition, according to the present invention, an object detection apparatus and method can be provided that use the AlexNet model, a convolutional neural network (CNN), to achieve a higher object recognition rate than the conventional HoG (Histogram of Oriented Gradients).

In addition, according to the present invention, an object detection apparatus and method can be provided that use a model in which the large structure of the conventional AlexNet model is optimized for a pedestrian and vehicle database.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a block diagram illustrating the configuration of an object detection system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an object detection method according to an embodiment of the present invention.
FIG. 3 shows a depth image and histogram distributions used in the object candidate detection step according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an object detection method according to an embodiment of the present invention.
FIG. 5 is a network structure table comparing AlexNet with the model optimized for object detection according to an embodiment of the present invention.
FIG. 6 is a screen showing pedestrian and vehicle recognition results according to an embodiment of the present invention.
FIG. 7 is a table showing performance comparison results of the object detection method according to an embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein.

The terminology used herein is for the purpose of describing embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless specifically stated otherwise.

As used herein, "comprises" and "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

In addition, terms including ordinals such as first and second may be used to describe components, but the components should not be limited by these terms. Such terms are used only to distinguish one component from another. Further, in describing the present invention, detailed descriptions of related well-known techniques are omitted where they are judged likely to obscure the gist of the present invention.

In addition, the components shown in the embodiments of the present invention are illustrated independently to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, the components are listed separately for convenience of description, and at least two of them may be combined into one component, or one component may be divided into a plurality of components that perform its function. Such integrated and separated embodiments of the components are also included in the scope of the present invention as long as they do not depart from its essence.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings. The configuration of the present invention and its resulting effects will be clearly understood from the following detailed description.

FIG. 1 is a block diagram illustrating the configuration of an object detection system according to an embodiment of the present invention.

The object detection system may include an object detection apparatus 100 and a stereo camera 200. The object detection apparatus 100 may be connected to the stereo camera 200 directly or indirectly, and by wire or wirelessly, so that it can receive images or image information captured by the stereo camera 200.

First, the stereo camera 200 may include a first camera 210 and a second camera 220 to generate a stereo image. The stereo camera 200 may include two or more lenses or two or more cameras, and may have any of various configurations capable of generating a stereo image.

The object detection apparatus 100, which directly or indirectly receives the stereo image captured by the stereo camera 200, includes an object candidate detection unit 110 configured to detect object candidates and an object detection unit 120 configured to detect objects among the detected object candidates.

The object candidate detection unit 110 may include a disparity image generation unit 111 and a histogram analysis unit 112 to detect object candidates.

The disparity image generation unit 111 generates a disparity image from the stereo image acquired from the stereo camera 200 using stereo matching. Object candidates can be detected using the characteristics of this disparity image. The disparity image generation unit 111 may convert the generated disparity image into a depth image using camera parameters and the like. For example, the depth image may be an image in which the distance from the camera to an object is expressed as a value from 0 to 255.
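The conversion from disparity to an 8-bit depth image can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the standard pinhole stereo relation Z = f * B / d, and the focal length, baseline, 50 m clipping range, and the choice to map larger distances to larger gray values are all assumptions that the text does not state.

```python
def disparity_to_depth_m(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d; None marks an invalid disparity."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px


def depth_to_gray(depth_m, max_depth_m=50.0):
    """Map metric depth to 0..255, clipping at max_depth_m (an assumed range).

    Invalid depths are stored as 0, so they share the value of zero distance.
    """
    if depth_m is None:
        return 0
    clipped = min(max(depth_m, 0.0), max_depth_m)
    return round(255 * clipped / max_depth_m)


def disparity_image_to_depth_image(disparity, focal_px, baseline_m, max_depth_m=50.0):
    """Convert a 2-D list of disparities (pixels) to a 2-D list of 0..255 depth values."""
    return [
        [depth_to_gray(disparity_to_depth_m(d, focal_px, baseline_m), max_depth_m) for d in row]
        for row in disparity
    ]
```

With a hypothetical 700-pixel focal length and 0.3 m baseline, a disparity of 20 pixels corresponds to a distance of about 10.5 m.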

The histogram analysis unit 112 analyzes the histogram of the depth image generated in this way. By analyzing the histogram of the depth image in the vertical direction, the histogram analysis unit 112 according to the present invention can detect object regions, such as pedestrian or vehicle regions, and non-object regions, such as road and background regions, more quickly and efficiently.

When the vertical-direction histogram distribution is concentrated at a specific pixel value, that is, when the variation value at that pixel value is at or above a predetermined reference, the histogram analysis unit 112 can detect the region having that pixel value as an object candidate. Conversely, when the vertical-direction histogram is distributed uniformly over all pixel values, the region can be judged to be a non-object region such as a road. The histogram analysis is described in more detail below with reference to FIG. 3.
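One way to read this column-wise test is sketched below. It is an illustration under stated assumptions rather than the patented procedure: the concentration measure (the fraction of a column's pixels that fall in its most populated depth bin), the bin width, the 0.4 threshold, and the function names are all hypothetical.

```python
from collections import Counter


def column_peak(depth, col, bin_width=8):
    """Return (peak_bin, fraction) for the depth histogram of one image column."""
    values = [row[col] // bin_width for row in depth]
    peak_bin, count = Counter(values).most_common(1)[0]
    return peak_bin, count / len(values)


def object_columns(depth, min_fraction=0.4, bin_width=8):
    """Map each column whose histogram is concentrated in one bin to that bin."""
    result = {}
    for col in range(len(depth[0])):
        peak_bin, fraction = column_peak(depth, col, bin_width)
        if fraction >= min_fraction:
            result[col] = peak_bin
    return result


def group_candidates(columns):
    """Merge adjacent object columns that share a peak bin into (first_col, last_col, bin)."""
    spans = []
    for col in sorted(columns):
        peak_bin = columns[col]
        if spans and spans[-1][1] == col - 1 and spans[-1][2] == peak_bin:
            spans[-1] = (spans[-1][0], col, peak_bin)
        else:
            spans.append((col, col, peak_bin))
    return spans
```

A column that crosses an upright object sees many pixels at one depth and passes the threshold, while a road column sees depth change steadily from row to row and does not. A practical version would also need to mask invalid depth values, which would otherwise look like a concentrated peak.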

The object detection unit 120 detects the final detection targets, such as pedestrians and vehicles, among the object candidates detected by the object candidate detection unit 110. The object detection unit 120 may include an optimized network model 121. The optimized network model 121 is based on AlexNet, a structurally simple model among the convolutional neural networks (CNNs) that have recently attracted attention, with its parameters adjusted through optimization for detecting pedestrians and vehicles. For example, the optimized network model 121 may include five convolutional layers with 48, 128, 192, 192, and 128 feature maps, respectively, and fully connected layers of size 512, 512, and 2. A detailed comparison with AlexNet is given below with reference to FIG. 5.

Through this optimization, in which parameters are adjusted on the basis of the AlexNet model, the object detection unit 120 can quickly and accurately detect only pedestrians and vehicles among the object candidates extracted by the object candidate detection unit 110.

FIG. 2 is a flowchart illustrating an object detection method according to an embodiment of the present invention.

Referring to FIG. 2, the object detection method according to an embodiment of the present invention consists broadly of two steps: an object candidate detection step (S210) and an object detection step (S220).

In the object candidate detection step (S210), object candidates are detected using the characteristics of the disparity image obtained from the stereo image. Next, in the object detection step (S220), only pedestrians and vehicles are quickly and accurately detected among the object candidates extracted in the previous step, using the AlexNet-based model whose parameters have been adjusted through optimization.

FIG. 3 shows a depth image and histogram distributions used in the object candidate detection step according to an embodiment of the present invention. The horizontal axis of the histograms 310 and 320 in FIG. 3 represents the depth value of the pixels of the depth image, and the vertical axis represents the number of pixels with that depth value.

In the object candidate detection step (S210), a disparity image is obtained from the stereo image using stereo matching, and the disparity image can be converted into a depth image using camera parameters. The depth image generated in this way can express the distance from the camera to an object as a depth value from 0 to 255.

In FIG. 3, the acquired depth image 300 is shown on the left, together with a vertical-direction histogram distribution 310 for an object region, where an object is present, and a vertical-direction histogram distribution 320 for a non-object region, where no object is present. The histogram distribution 310 corresponds to an object region of the depth image 300, for example a pedestrian region, marked by the right-hand one of the two vertical arrows. The histogram distribution 320 corresponds to a non-object region of the depth image 300, for example a road region, marked by the left-hand one of the two vertical arrows.

As shown in FIG. 3, object candidates can be extracted by analyzing the histogram of the depth image obtained in the object candidate detection step (S210). When the histograms 310 and 320 are analyzed in the vertical direction of the depth image, the histogram 310 of the object region shows a distribution concentrated at a specific pixel value. In contrast, the histogram 320 of a non-object region such as a road is distributed uniformly over all pixel values, owing to the geometry of a stereo camera mounted on a vehicle or the like. By using this property to determine whether a region is an object region or a non-object region, object candidates can be extracted quickly and efficiently.

An object candidate detection method that uses vertical-direction histogram analysis in this way can detect objects faster than a grid scan, which searches the whole image from the top left to the bottom right. With a grid scan, tens of thousands of candidates must be processed and a recognition pass run on each of them, whereas the present method runs object recognition only on the extracted object candidates and is therefore faster and more efficient.
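To give a feel for why an exhaustive grid scan produces so many candidates, the sketch below counts the windows such a scan would visit. The image size, window size, stride, and scale step are illustrative assumptions and do not come from the patent.

```python
def count_windows(image_w, image_h, win_w, win_h, stride):
    """Number of window positions in a single-scale scan from top left to bottom right."""
    if image_w < win_w or image_h < win_h:
        return 0
    return ((image_w - win_w) // stride + 1) * ((image_h - win_h) // stride + 1)


def count_multiscale_windows(image_w, image_h, win_w, win_h, stride, scale_step=1.2):
    """Total window count over an image pyramid that shrinks by scale_step per level."""
    total = 0
    w, h = float(image_w), float(image_h)
    while w >= win_w and h >= win_h:
        total += count_windows(int(w), int(h), win_w, win_h, stride)
        w /= scale_step
        h /= scale_step
    return total
```

For a hypothetical 640x480 image scanned with a 64x128 window at a stride of 8 pixels, a single scale already yields 3285 positions, and each additional pyramid level or smaller stride adds more. The histogram-based step passes only a handful of regions to the classifier instead.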

FIG. 4 is a flowchart illustrating an object detection method according to an embodiment of the present invention.

Referring to FIG. 4, the stereo image acquisition step (S410), the disparity image acquisition step (S420), and the object candidate detection step through histogram analysis (S430) correspond to the object candidate detection step (S210) of FIG. 2.

First, a stereo image captured by the stereo camera is acquired (S410), and a disparity image is obtained from the acquired stereo image (S420). A depth image can be generated from this disparity image, and, as described with reference to FIG. 3, object candidates are quickly detected on the basis of the depth image by distinguishing object regions from non-object regions through vertical-direction histogram analysis (S430).
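The flow of FIG. 4 can be outlined as a small driver function. This is only a structural sketch: the four callables are hypothetical placeholders for the stereo matcher, the depth conversion, the histogram-based candidate detector, and the CNN classifier, and the convention that the classifier returns a label string (or anything else for a rejected candidate) is an assumption.

```python
def detect_objects(left, right, stereo_match, to_depth, find_candidates, classify):
    """Run the two-stage pipeline on one stereo pair (S410 supplies left and right)."""
    disparity = stereo_match(left, right)      # S420: disparity image by stereo matching
    depth = to_depth(disparity)                # depth image from camera parameters
    candidates = find_candidates(depth)        # S430: histogram-based object candidates
    detections = []
    for region in candidates:                  # S440: CNN recognition per candidate only
        label = classify(left, region)
        if label in ("pedestrian", "vehicle"):
            detections.append((region, label))
    return detections
```

Passing the stages in as functions keeps the sketch independent of any particular stereo matcher or network library.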

Next, the CNN-based object detection step (S440) corresponds to the object detection step (S220) of FIG. 2. Among CNNs, AlexNet, which has a simple structure, is used, and its parameters are adjusted through an optimization process. By redesigning the large structure of the AlexNet model into one optimized for recognizing vehicles and pedestrians, the scale of the database is reduced and the speed is improved at the same time.

FIG. 5 is a network structure table comparing AlexNet with the model optimized for object detection according to an embodiment of the present invention.

Network model selection is the process by which a developer finds the hyperparameters that give an optimal neural network structure. The hyperparameters of a neural network include the number of hidden layers, the types of hidden neurons and activation functions, and the structure of the pooling and convolutional layers. In one embodiment of the present invention, a network model optimized for pedestrian and vehicle detection can be implemented by building the optimal structure for the application through grid search, a brute-force algorithm.
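Grid search in this sense simply evaluates every combination of candidate hyperparameter values and keeps the best one. The sketch below shows the idea in general form; the search-space keys and the `evaluate` callback (which in practice would train and validate a network) are hypothetical and not taken from the patent.

```python
from itertools import product


def grid_search(search_space, evaluate):
    """Exhaustively score every combination in search_space; return (best_config, best_score)."""
    names = list(search_space)
    best_config, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)  # higher is better, e.g. validation accuracy
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

The cost grows with the product of the number of values per hyperparameter, which is why the method is described as brute force.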

Referring to FIG. 5, the improved model obtained by optimizing AlexNet has five convolutional layers in total (C1, C2, C3, C4, C5), and the number of feature maps is halved from AlexNet's 96, 256, 384, 384, and 256 to 48, 128, 192, 192, and 128. The pooling layers use the same structure as AlexNet, and the fully connected layers are reduced from AlexNet's 4096, 4096, and 1000 to 512, 512, and 2 to achieve the optimization.
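The effect of these reductions on model size can be estimated by counting weights and biases. In the sketch below, the layer widths come from the text, while the kernel sizes (11, 5, 3, 3, 3), the 3-channel input, the 6x6 feature map before the first fully connected layer, and the absence of grouped convolutions are assumptions based on the commonly described AlexNet layout, since FIG. 5 itself is not reproduced here.

```python
# Feature-map counts and fully connected sizes as given in the text (FIG. 5).
ALEXNET = {"conv": [96, 256, 384, 384, 256], "fc": [4096, 4096, 1000]}
REDUCED = {"conv": [48, 128, 192, 192, 128], "fc": [512, 512, 2]}

# Assumptions not stated in the text: kernel sizes, input channels, final map size.
KERNELS = [11, 5, 3, 3, 3]
IN_CHANNELS = 3
FINAL_MAP = 6 * 6


def count_parameters(model):
    """Count weights plus biases for the conv and fully connected layers."""
    total = 0
    in_ch = IN_CHANNELS
    for out_ch, k in zip(model["conv"], KERNELS):
        total += in_ch * out_ch * k * k + out_ch
        in_ch = out_ch
    in_features = in_ch * FINAL_MAP
    for out_features in model["fc"]:
        total += in_features * out_features + out_features
        in_features = out_features
    return total
```

Under these assumptions, `count_parameters(ALEXNET)` gives about 62.4 million parameters and `count_parameters(REDUCED)` about 3.6 million, with most of the saving coming from the first fully connected layer.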

AlexNet's structure is fundamentally optimized for the ImageNet database, which consists of more than 15 million high-resolution images in over 22,000 categories. AlexNet is therefore too large for recognizing only the two categories of vehicle and pedestrian, so the present invention redesigns the structure in this way, reducing its scale while improving speed.

FIG. 6 is a screen showing pedestrian and vehicle recognition results according to an embodiment of the present invention.

FIG. 6 shows the result of performing recognition on the object candidates with the network model optimized for pedestrians and vehicles, using the object detection method according to an embodiment of the present invention. The yellow box in FIG. 6 is the result of recognizing a pedestrian among the object candidates, at a measured distance of 21.0 m. The green box in FIG. 6 is the result of recognizing a vehicle among the object candidates, at a measured distance of 27.8 m. In this way, the object detection method according to the present invention can detect pedestrians and vehicles quickly and accurately.

FIG. 7 is a table showing performance comparison results for the object detection method according to an embodiment of the present invention.

FIG. 7 shows the results of comparing the performance of the object detection method proposed in the present invention with that of other, conventional methods using a single camera. The first method is the one disclosed in the published paper S. H. Oh, "Method for detection regions of interest and active surveillance assistance in the mobile ground reconnaissance system", Journal of KIIT, vol. 12, no. 6, pp. 31-38 (2014), and the second method is the one disclosed in the published paper L. Zhao and C. Thorpe, "Stereo and neural network-based pedestrian detection", IEEE Trans. Intelligent Transportation System, vol. 1, no. 3, pp. 148-154 (2000).

Referring to FIG. 7, the precision and recall of the object detection method proposed in the present invention were 89.8% and 82.1%, respectively, and the experimental comparison showed that the proposed method outperforms both the conventional first method and the conventional second method in precision and in recall. This is presumably because using the distance information obtained with the stereo camera allowed the object candidates to be extracted more accurately.
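For reference, precision and recall follow from counts of true positives, false positives, and false negatives in the standard way. The sketch below uses made-up counts purely for illustration; they are not the experimental data behind FIG. 7.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for one detector on one test set.

    precision = TP / (TP + FP): the share of reported detections that are correct.
    recall    = TP / (TP + FN): the share of real objects that were detected.
    """
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
precision, recall = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
```

A detector can trade one measure for the other by changing its decision threshold, which is why the comparison reports both.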

In addition, because the present invention first detects object candidates using the disparity image and then determines only whether each candidate is a pedestrian or a vehicle, it has the advantage of taking less time. Furthermore, unlike the conventional HoG, which can recognize only pedestrians, the CNN used for pedestrian and vehicle recognition can learn to recognize not only pedestrians and vehicles but also other objects through training, provided that a DB of those objects exists, and it has a much higher recognition rate than HoG.
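The two-stage flow described above, candidates from the disparity image first and classification only on those candidates, can be sketched as follows. This is one possible reading, not the patented implementation: it assumes that a disparity value whose pixels are spread over many image rows belongs to an upright object, whereas road-surface disparities each occupy a narrow band of rows. The function names, the variance threshold, the toy disparity map, and the stub classifier are all hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def candidate_disparities(disparity, min_row_variance):
    """Return disparity values whose pixels are spread widely across rows."""
    rows_by_value = defaultdict(list)
    for row_index, row in enumerate(disparity):
        for value in row:
            if value > 0:  # 0 is treated as "no stereo match" in this sketch
                rows_by_value[value].append(row_index)
    return sorted(value for value, rows in rows_by_value.items()
                  if pvariance(rows) >= min_row_variance)


def bounding_box(disparity, value):
    """Bounding box (top, left, bottom, right) of all pixels equal to `value`."""
    pixels = [(r, c) for r, row in enumerate(disparity)
              for c, v in enumerate(row) if v == value]
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)


def detect(disparity, classify, min_row_variance=1.0):
    """Run the classifier only on candidate regions, not on the whole image."""
    results = []
    for value in candidate_disparities(disparity, min_row_variance):
        box = bounding_box(disparity, value)
        label = classify(box)
        if label in ("pedestrian", "vehicle"):
            results.append((label, value, box))
    return results


# Toy 6x6 disparity map: road disparity grows toward the bottom row, and an
# upright object with constant disparity 9 covers rows 1-4, columns 2-3.
demo = [[row + 1] * 6 for row in range(6)]
for r in range(1, 5):
    demo[r][2] = demo[r][3] = 9

# Stub standing in for the CNN; a real system would crop the image and classify.
detections = detect(demo, classify=lambda box: "pedestrian")
print(detections)
```

The time saving comes from the classifier being called once per candidate region instead of once per sliding-window position.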

The embodiments disclosed in the specification of the present invention are merely illustrative, and the present invention is not limited to them. The scope of the present invention should be construed according to the claims below, and all techniques within a scope equivalent to them should be construed as falling within the scope of the present invention.

100: Object detecting apparatus 110: Object candidate detection unit
111: Disparity image generation unit 112: Histogram analysis unit
120: Object detection unit 121: Optimized network model
200: Stereo camera 210: First camera
220: Second camera

Claims

An object detecting apparatus comprising:
an object candidate detection unit configured to obtain a disparity image through stereo matching of a stereo image captured by a stereo camera including two or more lenses; and
an object detection unit configured to detect at least one object among a plurality of object candidates detected by the object candidate detection unit,
wherein the object candidate detection unit detects an object candidate by analyzing the uniformity of the longitudinal distribution of a histogram of the disparity image, and
wherein the object detection unit detects at least one of a pedestrian and a vehicle among the object candidates using a convolutional neural network (CNN).

delete

The object detecting apparatus according to claim 1, wherein, when the distribution of the histogram of the disparity image at a specific pixel value has a variation value equal to or greater than a predetermined reference value, the object candidate detection unit detects an area having the specific pixel value as an object candidate.

delete

The object detecting apparatus according to claim 1, wherein the object detection unit detects at least one of the pedestrian and the vehicle using an optimized network model in which the structure of the AlexNet model is partially changed.

The object detecting apparatus according to claim 6, wherein the optimized network model comprises five convolutional layers whose numbers of feature maps are 48, 128, 192, 192, and 128, and fully connected layers whose sizes are 512, 512, and 2.

An object detection method comprising:
acquiring a stereo image captured by a stereo camera including two or more lenses;
an object candidate detection step of detecting an object candidate based on a disparity image obtained through stereo matching of the stereo image; and
an object detection step of detecting at least one of a pedestrian and a vehicle among the detected object candidates,
wherein the object candidate detection step detects an object candidate by analyzing the uniformity of the longitudinal distribution of a histogram of the disparity image, and
wherein the object detection step detects at least one of the pedestrian and the vehicle among the object candidates using a convolutional neural network (CNN).

delete

The object detection method according to claim 8, wherein, when the distribution of the histogram of the disparity image at a specific pixel value has a variation value equal to or greater than a predetermined reference value, an area having the specific pixel value is detected as an object candidate.

delete

The object detection method according to claim 8, wherein the object detection step detects at least one of the pedestrian and the vehicle using an optimized network model in which the structure of the AlexNet model is partially changed.

The object detection method according to claim 13, wherein the optimized network model comprises five convolutional layers whose numbers of feature maps are 48, 128, 192, 192, and 128, and fully connected layers whose sizes are 512, 512, and 2.