DE102024111500A1

DE102024111500A1 - Method and system for edge case detection in vehicle camera images

Info

Publication number: DE102024111500A1
Application number: DE102024111500.9A
Authority: DE
Inventors: Tin Stribor Sohn; Lukas Ewecker; Robin Schwager; Tim Brühl
Original assignee: Dr Ing HCF Porsche AG
Current assignee: Dr Ing HCF Porsche AG
Priority date: 2024-04-24
Filing date: 2024-04-24
Publication date: 2025-10-30

Abstract

The present invention relates to a method for edge case detection in vehicle camera images, in which a first large visual language model (22) is trained with image-word pairs, wherein a respective word is taken from a catalog of traffic scenarios (21); in which a second large visual language model (32) is trained with image-word pairs, wherein a respective word is taken from a general dictionary (31); in which, during a journey of a vehicle, environmental images (11) are continuously recorded by at least one vehicle camera directed at the area in front of the vehicle and passed to an evaluation unit arranged in the vehicle; in which the evaluation unit analyzes a respective environmental image according to the first and second large visual language models and assigns at least one image-word pair identified therein to a first and, correspondingly, a second analysis group; in which at least one image-word pair of the second analysis group that is not contained in the catalog of traffic scenarios is assigned to a difference group; and in which respective analysis data and a respective environmental image relating to the at least one image-word pair of the difference group are transmitted (45) to a cloud server (50) as an edge case scenario. Furthermore, a system on which the method can be executed is presented.

Description

The present invention relates to a method for edge case detection in vehicle camera images. Furthermore, a system on which the method can be executed is presented.

Road traffic involves a large number of atypical traffic scenarios, for example a cow walking across a highway, which cannot be included in a scenario and test catalog for scenario-based testing of an autonomous driving function because they are unknown until they occur. Identifying such edge cases, for example in images or data provided by vehicle cameras, is not trivial: object detectors are trained in advance on already known scenarios to recognize specific objects contained in them, such as pedestrians or vehicles, so semantic information describing a previously unknown edge case (also called a corner case) is not produced. In addition, processing the sheer volume of images recorded by the vehicle cameras of, for example, a vehicle fleet during journeys poses a data analysis problem.

A visual language model, abbreviated VLM, and a so-called large visual language model, abbreviated LVLM, which has an expanded vocabulary compared with the VLM, are applications of machine learning in which a neural network is trained to recognize visual content in individual images or image sequences and to assign linguistic terms to it.

The document CN 117073701 A discloses a method for navigating by voice command using visual landmarks.

In the document WO 2022/187063 A1, a visual input is described by means of a VLM.

The document US 2023/0154213 A1 discloses a method for image recognition. For this purpose, an LVM is trained with word-image pairs and learns to assign description classes to objects.

Against this background, it is an object of the present invention to present a method for identifying edge case scenarios in which a reduced amount of data is transmitted to a cloud server. The identification of an edge case scenario in vehicle camera images is to take place in the respective vehicle itself, so that only the data necessary to describe the edge case arise. Furthermore, a system with which the method can be carried out is to be presented.

To achieve the aforementioned object, a method for edge case detection in vehicle camera images is proposed, in which a first large visual language model is trained with image-word pairs, wherein a respective word is taken from a catalog of traffic scenarios. In addition, a second large visual language model is trained with image-word pairs, wherein a respective word is taken from a general dictionary.

During a journey of a vehicle, at least one vehicle camera directed at the area in front of the vehicle continuously records environmental images and passes them to an evaluation unit arranged in the vehicle. The evaluation unit analyzes a respective environmental image according to the first large visual language model and assigns at least one image-word pair identified therein to a first analysis group. The evaluation unit also analyzes the respective environmental image according to the second large visual language model and assigns at least one image-word pair identified therein to a second analysis group. At least one image-word pair of the second analysis group that is not contained in the catalog of traffic scenarios is assigned to a difference group. Respective analysis data and a respective environmental image relating to the at least one image-word pair of the difference group are transmitted to a cloud server as an edge case scenario. This advantageously reduces the number of environmental images to be transmitted, by pre-evaluation in the vehicle, to images or data relating to edge cases.
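The in-vehicle filtering described above can be sketched as follows. This is only an illustrative sketch, not the implementation from the publication: the two models are replaced by hypothetical stand-in functions that return the set of words recognized in an image, and the names `Recognizer`, `EdgeCaseScenario`, and `detect_edge_cases` are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Hypothetical stand-in for a trained model: maps an image to the words it recognizes.
Recognizer = Callable[[bytes], set]


@dataclass(frozen=True)
class EdgeCaseScenario:
    image: bytes              # the environmental image itself
    known_words: frozenset    # first analysis group (catalog vocabulary)
    unknown_words: frozenset  # difference group (words not in the catalog)


def detect_edge_cases(
    frames: Iterable[bytes],
    scenario_model: Recognizer,
    general_model: Recognizer,
    catalog: frozenset,
) -> Iterator[EdgeCaseScenario]:
    """Yield only those frames whose difference group is non-empty."""
    for image in frames:
        first_group = scenario_model(image)    # analysis with the first model
        second_group = general_model(image)    # analysis with the second model
        difference_group = second_group - catalog
        if difference_group:
            yield EdgeCaseScenario(
                image, frozenset(first_group), frozenset(difference_group)
            )


# Toy data (assumed): two frames, only the second contains an unknown object.
catalog = frozenset({"country road", "person", "lane"})
scenario_stub = {b"frame-1": {"country road", "lane"},
                 b"frame-2": {"country road", "person"}}
general_stub = {b"frame-1": {"lane"},
                b"frame-2": {"cow", "person"}}

uploads = list(detect_edge_cases(
    [b"frame-1", b"frame-2"],
    scenario_stub.__getitem__,
    general_stub.__getitem__,
    catalog,
))
```

With this toy data only `b"frame-2"` ends up in `uploads`, which mirrors the intended data reduction: frames without a difference group never leave the vehicle.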

With the method according to the invention, unusual situations recorded in the area in front of the vehicle can be identified in the vehicle camera images. The images reported back to the cloud server, or from there to a central server (located, for example, at the vehicle manufacturer or in a development center for autonomous driving systems or driver assistance systems), can thus be restricted to a far smaller number than the totality of all images recorded during journeys of, for example, a vehicle fleet. A further advantage is that no additional data analysis of the transmitted data is needed to find such edge case scenarios.

It is conceivable that the transmitted images or data relating to the edge case scenario are used directly to develop a traffic scenario and corresponding driving specifications for the autonomous driving system. It is further conceivable that these processes are carried out automatically on a virtual test bench for validating autonomous driving systems.

It goes without saying that the first and second large visual language models are trained with a large number of words. For the catalog of traffic scenarios, the aim is to be able to identify all known traffic scenarios by image-word pairs; for the words taken from the general dictionary, the aim is to be able to identify, likewise by corresponding image-word pairs, all objects that might occur in traffic. These potentially occurring objects include, for example, everything that can be transported by trucks and could become an obstacle on a road if a load is lost. By contrast, large parts of the general dictionary can be left out of the training of the second large visual language model, namely words for objects that vehicle cameras cannot capture (e.g., because of their size) and words describing non-material things in general (e.g., philosophical terms).

In one embodiment of the method according to the invention, respective words are each assigned a coordinate in a language-related phase space. A distance measure is formed between individual coordinates.

Two words are set as synonyms if the distance measure formed for them falls below a predefined upper limit. The distance measure can be defined in a lexical-semantic word network such as the one provided by WordNet® (https://wordnet.princeton.edu/).
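The threshold rule can be illustrated with a minimal sketch. The publication leaves the concrete distance measure open and mentions a lexical-semantic word network as one option; the sketch below instead assumes made-up two-dimensional word coordinates and a Euclidean distance, purely for illustration. The names `COORDS`, `distance`, and `are_synonyms` and the limit `0.5` are hypothetical.

```python
import math

# Assumed toy coordinates in a "word space"; real coordinates would come
# from a word network or an embedding model.
COORDS = {
    "country road": (1.0, 0.0),
    "road": (1.1, 0.1),
    "cow": (5.0, 4.0),
}


def distance(word_a: str, word_b: str) -> float:
    """Euclidean distance between the coordinates of two words."""
    return math.dist(COORDS[word_a], COORDS[word_b])


def are_synonyms(word_a: str, word_b: str, upper_limit: float) -> bool:
    """Two words count as synonyms if their distance is below the upper limit."""
    return distance(word_a, word_b) < upper_limit


UPPER_LIMIT = 0.5  # assumed value
road_pair = are_synonyms("country road", "road", UPPER_LIMIT)
cow_pair = are_synonyms("road", "cow", UPPER_LIMIT)
```

Here `road_pair` is true and `cow_pair` is false, so "road" would be merged into "country road" while "cow" stays a separate term.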

In a further embodiment of the method according to the invention, the cloud server forwards the edge case scenario transmitted by the vehicle to an instance for developing a corresponding driving function. This instance is, for example, a virtual test bench for developing autonomous driving systems, on which a large number of driving functions are tried out in response to the occurrence of the edge case scenario and only those that allow the journey to continue undisturbed are retained for implementation in the autonomous driving system.

In a still further embodiment of the method according to the invention, once a driving function adapted to the edge case scenario has been developed, the catalog of traffic scenarios is extended by the edge case scenario. The catalog of traffic scenarios, together with the driving functions assigned to the respective traffic scenarios, is transmitted to the vehicle.

In yet another embodiment of the method according to the invention, the vehicle is controlled by an autonomous driving system or a driver assistance system according to the catalog of traffic scenarios and the respectively corresponding driving functions.
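The feedback loop of the last three embodiments (forwarding the edge case, extending the catalog, controlling the vehicle according to the catalog) can be pictured with a small data structure. This is a hedged sketch only: the class `ScenarioCatalog`, its methods, and the driving function names are assumptions for illustration and do not appear in the publication.

```python
from dataclasses import dataclass, field


@dataclass
class ScenarioCatalog:
    # Maps a traffic scenario word to the name of its assigned driving function.
    functions: dict = field(default_factory=dict)

    def difference_group(self, words):
        """Words that are not (yet) covered by the catalog."""
        return {w for w in words if w not in self.functions}

    def extend(self, scenario_word, driving_function):
        """Add a former edge case once a driving function has been developed."""
        self.functions[scenario_word] = driving_function

    def driving_function_for(self, scenario_word):
        """Look up the driving function used to control the vehicle."""
        return self.functions[scenario_word]


catalog = ScenarioCatalog({"person": "brake_and_yield"})   # assumed entry
before = catalog.difference_group({"cow", "person"})       # "cow" is an edge case
catalog.extend("cow", "slow_down_and_pass")                # assumed function name
after = catalog.difference_group({"cow", "person"})        # no edge case any more
```

After the extension, the same observation no longer produces a difference group, so a vehicle that has received the updated catalog would not report it again.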

Furthermore, a system is claimed which comprises a cloud server and a vehicle, the vehicle having at least one vehicle camera directed at the area in front of the vehicle, an evaluation unit with a computer processor and a storage medium, a first and a second large visual language model stored on the storage medium and executable on the computer processor, and means for radio communication. The first large visual language model is trained with image-word pairs, wherein a respective word is taken from a catalog of traffic scenarios. The second large visual language model is trained with image-word pairs, wherein a respective word is taken from a general dictionary. The evaluation unit is configured, in continuous operation,

• to receive environmental images from the at least one vehicle camera during a journey of the vehicle,
• to analyze a respective environmental image according to the first large visual language model and to assign at least one image-word pair identified therein to a first analysis group,
• to analyze the respective environmental image according to the second large visual language model and to assign at least one image-word pair identified therein to a second analysis group,
• to assign at least one image-word pair of the second analysis group that is not contained in the catalog of traffic scenarios to a difference group, and
• to transmit respective analysis data and the respective environmental image relating to the at least one image-word pair of the difference group to the cloud server by radio communication as an edge case scenario.

The system according to the invention thus advantageously makes it possible to reduce the amount of data for transmitting images from the at least one vehicle camera directed at the area in front of the vehicle to the cloud server or central server for the analysis of edge case scenarios. This is achieved by means of an evaluation unit embedded in the vehicle ("embedding") for extracting semantic attributes from the camera images. In this way, new atypical objects (atypical because they are not yet present in the catalog of traffic scenarios) can be recognized. In addition, objects that are already known (because they are present in the general dictionary), such as a cow (see FIG. 1), but that are located in an atypical context (e.g., on the roadway) can be identified.

In one embodiment of the system according to the invention, respective words are each assigned a coordinate in a language-related phase space. A distance measure is formed between individual coordinates. The evaluation unit is configured to set two words as synonyms if the distance measure formed for them falls below a predefined upper limit.

In a further embodiment of the system according to the invention, the cloud server is configured to forward the edge case scenario transmitted by the vehicle to an instance for developing a corresponding driving function.

In a still further embodiment of the system according to the invention, the cloud server is configured to transmit to the vehicle a driving function developed by the instance for the edge case scenario and the catalog of traffic scenarios extended by the edge case scenario, together with the driving functions assigned to the respective traffic scenarios.

In yet another embodiment of the system according to the invention, the system comprises an autonomous driving system or a driver assistance system. The autonomous driving system or the driver assistance system is configured to execute corresponding driving functions according to the catalog of traffic scenarios.

Further advantages and embodiments of the invention will become apparent from the description and the accompanying drawing.

It is understood that the features mentioned above and those still to be explained below can be used not only in the combination specified in each case, but also in other combinations or on their own, without departing from the scope of the present invention.

FIG. 1 shows a flowchart for an embodiment of the method according to the invention.

In 1 wird ein Ablaufschema 10 zu einer Ausführungsform des erfindungsgemäßen Verfahrens gezeigt. Zunächst werden vor einem Fahrbetrieb, vorzugsweise vor einer ersten Inbetriebnahme des Fahrzeugs, das erste große visuelle Sprachmodell 22 und das zweite große visuelle Sprachmodell 32 mit jeweiligen Bild-Wort-Paaren trainiert. Hierzu wird beim ersten großen visuellen Sprachmodell 22 ein jeweiliges Wort zu einem jeweiligen Bild-Wort-Paar aus einem Katalog von Verkehrsszenarien 21 entnommen und beim zweiten großen visuellen Sprachmodell 32 ein jeweiliges Wort zu einem jeweiligen Bild-Wort-Paar einem allgemeinen Wörterbuch 31 entnommen. Für jeweilige Verkehrsszenarien liegen jeweilige Fahrszenarien vor, mit welchen bspw. ein autonomes Fahrsystem oder ein autonomes Fahrerassistenzsystem trainiert werden kann, um das jeweilige Verkehrsszenario zu bewältigen. Von einer auf ein Fahrzeugvorfeld ausgerichteten Fahrzeugkamera wird während einer Fahrt in fortgesetzter Ausführung ein jeweiliges Umgebungsbild 11 aufgenommen, welches in einem hier dargestellten beispielhaften Verkehrsszenario auf einer vom Betrachter linken Fahrbahnseite ein Rind 14 und eine Person 15 zeigt. Das Umgebungsbild 11 wird einer im Fahrzeug angeordneten Auswertungseinheit zugeleitet, welche einerseits das Umgebungsbild 11 mit dem ersten großen visuellen Sprachmodell 22 analysiert, andererseits das Umgebungsbild 11 mit dem zweiten großen visuellen Sprachmodell 32 analysiert. Von der Auswertungseinheit wird einem jeweiligen Wort zu einem jeweilig in dem Umgebungsbild 11 identifizierten Bild ein Ähnlichkeitsmaß 23, 33 zugeordnet, wobei ein Wert für das Ähnlichkeitsmaß 23, 33 durch das jeweilige große visuelle Sprachmodell 22, 32 entsprechend einer Übereinstimmung zwischen dem identifizierten Bild und dem entsprechenden Bild des trainierten Bild-Wort-Paares erfolgt. 
In dem beispielhaft gezeigten Umgebungsbild 11 wird durch das erste große visuelle Sprachmodell 22 das Bild einer Landstraße 24, einer Person 25 und einer Fahrspur 26 erkannt und dem jeweilig assoziierten Wort (Landstraße 24, Person 25, Fahrspur 26) der jeweilige Wert für das Ähnlichkeitsmaß 23, in der 1 repräsentiert durch eine jeweilige Balkenhöhe, zugewiesen. Die gefundenen Wörter (Landstraße 24, Person 25, Fahrspur 26) werden einer ersten Analysegruppe zugeordnet. Desgleichen wird im Umgebungsbild 11 durch das zweite große visuelle Sprachmodell 32 das Bild eines Rindes 34, eines Mannes 35 und einer Straße 36 erkannt, wobei für das Bild des Rindes 34 ein vergleichsweise großer Wert im Ähnlichkeitsmaß erhalten wird. Auch hier werden die gefundenen Wörter (Rind 34, Mann 35, Straße 36) einer zweiten Analysegruppe zugeordnet. Im nächsten Schritt werden einander ähnliche Wörter in beiden Analysegruppen synonym gesetzt. Hierzu wird ein Entfernungsmaß zwischen in einem Wortraum angeordneten jeweiligen Wörtern betrachtet. So fällt bspw. das Entfernungsmaß zwischen Landstraße 24 und Straße 36 in Anbetracht einer Verkehrsszenarienbeschreibung so klein aus, dass beide Begriffe synonym gesetzt werden und bspw. das Wort Straße 36 gestrichen wird und das Wort Landstraße 24 verbleibt. Solche Vorgänge können mit Hilfe eines lexikalisch-semantischen Wortnetzes automatisiert werden, indem eine Obergrenze für das Entfernungsmaß vorgegeben wird, unterhalb derer Wörter synonym gesetzt werden. Die Obergrenze wird bspw. entsprechend einem mittleren Entfernungsmaß innerhalb von Wortfamilien gewählt. In einem Zuweisungsschritt der Ausführungsform des erfindungsgemäßen Verfahrens werden aus den verbliebenen Wörtern beider Analysegruppen diejenigen Wörter, welche im Katalog von Verkehrsszenarien enthalten sind, einer Verkehrsszenariengruppe 41 zugeordnet 42, 43 (was intrinsisch bereits für alle Wörter der ersten Analysegruppe der Fall ist). 
Andererseits bilden diejenigen Wörter der zweiten Analysegruppe, welche nicht im Katalog von Verkehrsszenarien enthalten sind, eine Differenzgruppe. Im dargestellten Fall finden sich Landstraße 24, Person 25, Fahrspur 26 und Mann 35 in der Verkehrsszenariengruppe 41 wieder, während Rind 34 die Differenzgruppe bildet. So dann wird an einen Cloud-Server 50 die Differenzgruppe bzw. eine darin enthaltene Information zu dem identifizierten Bild-Wort-Paar, hier Rind 34, zusammen mit dem Bild der Fahrzeugkamera, in dem das Bild-Wort-Paar als Randfall identifiziert wurde, übermittelt 45.In 1 A flowchart 10 for an embodiment of the method according to the invention is shown. First, before driving operation, preferably before the vehicle is first put into operation, the first large visual language model 22 and the second large visual language model 32 are trained with their respective image-word pairs. For this purpose, for the first large visual language model 22, a word corresponding to a given image-word pair is taken from a catalog of traffic scenarios 21, and for the second large visual language model 32, a word corresponding to a given image-word pair is taken from a general dictionary 31. For each traffic scenario, corresponding driving scenarios are available, which can be used, for example, to train an autonomous driving system or an autonomous driver assistance system to handle the respective traffic scenario. During a journey, a vehicle camera focused on the area in front of the vehicle continuously captures an image 11 of the surroundings. In an exemplary traffic scenario presented here, this image shows a cow 14 and a person 15 on the left side of the road (from the observer's perspective). The image 11 is transmitted to an evaluation unit located in the vehicle. This unit analyzes the image 11 using both the first large visual language model 22 and the second large visual language model 32. 
The evaluation unit assigns a similarity measure 23, 33 to each word associated with a corresponding image identified in the environment image 11. The value of the similarity measure 23, 33 is determined by the respective large visual language model 22, 32 according to the degree of similarity between the identified image and the corresponding image of the trained image-word pair. In the exemplary environment image 11 shown, the first large visual language model 22 recognizes the image of a country road 24, a person 25 and a lane 26 and assigns to each associated word (country road 24, person 25, lane 26) the respective value of the similarity measure 23, represented in FIG. 1 by a respective bar height. The words found (country road 24, person 25, lane 26) are assigned to a first analysis group. Similarly, in the environment image 11, the second large visual language model 32 recognizes the image of a cow 34, a man 35 and a road 36, with the image of the cow 34 receiving a comparatively high value of the similarity measure. Here, too, the words found (cow 34, man 35, road 36) are assigned to a second analysis group. In the next step, words that are similar to one another across the two analysis groups are treated as synonyms. For this purpose, a distance measure between the respective words arranged in a word space is considered. For example, in the context of a traffic scenario description, the distance measure between country road 24 and road 36 is so small that both terms are treated as synonyms; for example, the word road 36 is deleted and the word country road 24 remains. Such operations can be automated using a lexical-semantic word network by specifying an upper limit for the distance measure below which words are treated as synonyms. The upper limit is chosen, for example, according to an average distance measure within word families.
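The synonym step described above can be sketched as follows. This is only an illustration under stated assumptions: the two-dimensional coordinates in `WORD_SPACE`, the calibration word family (car, automobile, vehicle), the Euclidean distance and the function names are all hypothetical and are not taken from the application, which leaves the concrete word space and distance measure open.

```python
import math
from statistics import mean

# Hypothetical word-space coordinates (illustrative values only).
# A real system might derive these from a lexical-semantic word network.
WORD_SPACE = {
    "country road": (0, 0),
    "road": (1, 0),
    "lane": (0, 6),
    "person": (10, 0),
    "man": (10, 3),
    "cow": (20, 20),
    # Hypothetical calibration word family used to derive the upper limit.
    "car": (30, 0),
    "automobile": (32, 0),
    "vehicle": (31, 0),
}


def distance(a, b):
    """Euclidean distance between two words in the toy word space."""
    return math.dist(WORD_SPACE[a], WORD_SPACE[b])


def upper_limit_from_families(families):
    """Average pairwise distance within the given word families."""
    dists = [
        distance(a, b)
        for family in families
        for i, a in enumerate(family)
        for b in family[i + 1:]
    ]
    return mean(dists)


def merge_synonyms(first_group, second_group, upper_limit):
    """Delete second-group words that are synonyms of a first-group word.

    A word counts as a synonym if its distance to any first-group word
    does not exceed upper_limit; the first-group word is the one kept.
    """
    return [
        word
        for word in second_group
        if all(distance(word, kept) > upper_limit for kept in first_group)
    ]


limit = upper_limit_from_families([("car", "automobile", "vehicle")])
remaining = merge_synonyms(
    ["country road", "person", "lane"], ["cow", "man", "road"], limit
)
print(remaining)
```

With these toy values, road is deleted as a synonym of country road, while cow and man remain in the second analysis group, mirroring the example in the description.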
In an assignment step of the embodiment of the method according to the invention, those of the remaining words of both analysis groups that are contained in the catalog of traffic scenarios are assigned 42, 43 to a traffic scenario group 41 (which is intrinsically already the case for all words of the first analysis group). Conversely, those words of the second analysis group that are not contained in the catalog of traffic scenarios form a difference group. In the illustrated case, country road 24, person 25, lane 26 and man 35 are found in the traffic scenario group 41, while cow 34 forms the difference group. The difference group, or information contained therein relating to the identified image-word pair, here cow 34, is then transmitted 45 to a cloud server 50 together with the vehicle camera image in which the image-word pair was identified as an edge case.
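The assignment step and the selection of what is sent to the cloud server can be sketched as below. This is a minimal sketch under assumptions: the `EdgeCaseReport` structure, the function names, the image identifier and the similarity values are hypothetical, and the actual radio transmission is left out.

```python
from dataclasses import dataclass


@dataclass
class EdgeCaseReport:
    """Hypothetical payload for one edge case scenario."""
    image_id: str
    difference_group: list
    similarity: dict


def assign_groups(first_group, second_group, catalog):
    """Split the remaining words into traffic scenario and difference groups."""
    scenario_group = list(first_group) + [
        word
        for word in second_group
        if word in catalog and word not in first_group
    ]
    difference_group = [word for word in second_group if word not in catalog]
    return scenario_group, difference_group


def build_report(image_id, first_group, second_scores, catalog):
    """Return a report only if the image contains an edge case."""
    _, difference_group = assign_groups(first_group, list(second_scores), catalog)
    if not difference_group:
        return None  # no edge case: the image is not transmitted
    return EdgeCaseReport(
        image_id=image_id,
        difference_group=difference_group,
        similarity={word: second_scores[word] for word in difference_group},
    )


# Toy data following the example in the description (values are made up).
catalog = {"country road", "person", "lane", "man"}
first_group = ["country road", "person", "lane"]
second_scores = {"cow": 0.9, "man": 0.6}  # road already deleted as a synonym

scenario_group, difference_group = assign_groups(
    first_group, list(second_scores), catalog
)
report = build_report("frame-0001", first_group, second_scores, catalog)
print(scenario_group, difference_group, report)
```

Returning `None` when the difference group is empty reflects the idea that only images containing an edge case leave the vehicle.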

List of reference symbols

10: Flowchart
11: Environment images from vehicle camera
14: Cow
15: Person
21: Entities as words from scenario catalog
22: Trained large image-language model
23: Similarity measure for first analysis group
24: Person
25: Country road
26: Roadway
31: Entities as words from general dictionary
32: Trained large image-language model
33: Similarity measure for second analysis group
34: Cow
35: People
36: Road
40: Assignment step
41: Traffic scenario group
42: Feed
43: Feed
45: Transmission
50: Cloud server

CITATIONS CONTAINED IN THE DESCRIPTION

This list of documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA accepts no liability for any errors or omissions.

Cited patent literature

CN 117073701 A [0004]
WO 2022/187063 A1 [0005]
US 2023/0154213 A1 [0006]

Claims

Method for edge case detection in vehicle camera images, wherein a first large visual language model (22) is trained with image-word pairs, each word being taken from a catalog of traffic scenarios (21), wherein a second large visual language model (32) is trained with image-word pairs, each word being taken from a general dictionary (31), wherein, during a journey of a vehicle, environment images (11) are continuously recorded by at least one vehicle camera directed at the area in front of the vehicle and are passed to an evaluation unit arranged in the vehicle, wherein the evaluation unit analyzes each environment image (11) according to the first large visual language model (22) and assigns at least one image-word pair identified therein to a first analysis group, wherein the evaluation unit analyzes the respective environment image (11) according to the second large visual language model (32) and assigns at least one image-word pair identified therein to a second analysis group, wherein at least one image-word pair of the second analysis group (32) that is not included in the catalog of traffic scenarios (21) is assigned to a difference group, and wherein respective analysis data and a respective environment image (11) relating to the at least one image-word pair of the difference group are transmitted (45) as an edge case scenario to a cloud server (50), whereby the number of environment images (11) to be transmitted is reduced to edge cases by pre-evaluation in the vehicle.

Method according to claim 1, wherein each word is assigned a respective coordinate in a language-related phase space, wherein a distance measure is formed between individual coordinates, and wherein two words are treated as synonymous if the distance measure formed for them does not exceed a predetermined upper limit.

Method according to one of the preceding claims, wherein the edge case scenario transmitted by the vehicle is forwarded by the cloud server to an instance for the development of a corresponding driving function.

Method according to claim 3, wherein, after the development of a driving function adapted to the edge case scenario, the catalog of traffic scenarios (21) is extended to include the edge case scenario, and wherein the catalog of traffic scenarios (21), with corresponding driving functions assigned to the respective traffic scenarios, is transmitted to the vehicle.

Method according to one of the preceding claims, wherein the vehicle is controlled by an autonomous driving system or a driver assistance system in accordance with the catalog of traffic scenarios (21) and the respective corresponding driving functions.

System comprising a cloud server and a vehicle, the vehicle comprising at least one vehicle camera directed at the area in front of the vehicle, an evaluation unit with a computer processor and a storage medium, a first and a second large visual language model (22, 32) stored on the storage medium and executable on the computer processor, and means for radio communication, wherein the first large visual language model (22) is trained with image-word pairs, each word being taken from a catalog of traffic scenarios (21), wherein the second large visual language model (32) is trained with image-word pairs, each word being taken from a general dictionary (31), and wherein the evaluation unit is configured to continuously:
• receive environment images (11) from the at least one vehicle camera while the vehicle is driving,
• analyze each environment image (11) according to the first large visual language model (22) and assign at least one image-word pair identified therein to a first analysis group,
• analyze the respective environment image (11) according to the second large visual language model (32) and assign at least one image-word pair identified therein to a second analysis group,
• assign at least one image-word pair of the second analysis group that is not included in the catalog of traffic scenarios (21) to a difference group, and
• transmit (45) the respective analysis data and the respective environment image (11) relating to the at least one image-word pair of the difference group as an edge case scenario via radio communication to the cloud server (50).

System according to claim 6, wherein each word is assigned a respective coordinate in a language-related phase space, wherein a distance measure is formed between individual coordinates, and wherein the evaluation unit is configured to treat two words as synonymous if the distance measure formed for them does not exceed a predetermined upper limit.

System according to claim 6 or 7, wherein the cloud server is configured to forward the edge case scenario transmitted by the vehicle to an instance for the development of a corresponding driving function.

System according to claim 8, wherein the cloud server is configured to transmit to the vehicle a driving function developed by the instance for the edge case scenario, together with the catalog of traffic scenarios (21) extended by the edge case scenario, with corresponding driving functions assigned to the respective traffic scenarios.

System according to one of claims 6 to 9, comprising an autonomous driving system or a driver assistance system, wherein the autonomous driving system or the driver assistance system is configured to perform corresponding driving functions in accordance with the catalog of traffic scenarios (21).