CN120612768A - An intelligent early warning system and method for pedestrians and non-motor vehicles entering toll stations

Info

Publication number: CN120612768A
Application number: CN202510779572.2A
Authority: CN
Inventors: 黄文涵; 罗晟; 曾俊铖; 郑传钊; 王歆远
Original assignee: Fujian Expressway Science And Technology Innovation Research Institute Co ltd
Current assignee: Fujian Expressway Science And Technology Innovation Research Institute Co ltd
Priority date: 2025-06-12
Filing date: 2025-06-12
Publication date: 2025-09-09

Abstract

The invention relates to the technical field of intelligent traffic monitoring, and in particular to an intelligent early warning system and method for pedestrian and non-motor vehicle intrusion at toll stations. The system comprises a terminal layer, an edge layer and a cloud layer. The terminal layer comprises a sensing terminal (cameras, a hard disk video recorder and the like) and an early warning terminal (a directional sound column and the like). A dynamic hierarchical early warning module triggers three-level early warning according to the target trajectory, position and dwell time, and automatically downgrades the warning level when staff are present. A closed-loop optimization mechanism collects true-alarm/false-alarm feedback to iterate the small visual model. The system and method realize cloud-edge-terminal collaboration, integrate multiple models and algorithms, and provide high-precision detection, real-time response, dynamic adaptation and self-optimization, so that intrusion risks are effectively prevented and the safety management level of toll stations is improved.

Description

Intelligent early warning system and method for pedestrian and non-motor vehicle intrusion in toll station

Technical Field

The invention relates to the technical field of intelligent traffic monitoring, in particular to an intelligent early warning system and method for pedestrian and non-motor vehicle intrusion in a toll station.

Background

With the rapid development of transportation infrastructure in China, toll stations serve as key nodes of the traffic network, and their safety management is increasingly important. In actual operation, pedestrians and non-motor vehicles sometimes stray into the toll station area. Such behavior seriously threatens the safety of the intruder, can disrupt normal traffic order, and may even cause traffic accidents resulting in casualties and property loss.

Traditional toll station safety management relies on manual patrols and passive camera monitoring. Manual patrols cover limited time and space, so sudden intrusion events are difficult to detect and handle in real time. Plain video monitoring requires staff to watch the screens continuously, which is labor-intensive and prone to missed detections caused by fatigue. Although some toll stations have introduced basic video analysis systems, the prior art generally suffers from low detection accuracy and a high false alarm rate; in particular, pedestrian and non-motor vehicle targets are difficult to identify accurately under complex illumination, occlusion and similar conditions. In addition, traditional systems lack a dynamic response mechanism, cannot adjust the early warning strategy to actual conditions, have no self-optimizing capability, and adapt poorly to environmental changes and traffic flow fluctuations at toll stations.

In recent years, artificial intelligence has made remarkable progress in target detection and image recognition, but applying it to pedestrian and non-motor vehicle intrusion early warning at toll stations still faces several challenges. On the one hand, toll station scenes are distinctive: vehicles are dense, personnel move frequently and the environment is complex and changeable, so general-purpose target detection algorithms are difficult to apply directly. On the other hand, high real-time requirements conflict with the limited computing power of edge devices, so high-precision models are difficult to deploy locally and efficiently. Meanwhile, most prior art consists of single devices or independent modules and lacks a cloud-edge-terminal collaborative architecture, so data, algorithms and decisions cannot be deeply integrated, and the requirements for intelligent and dynamic safety management of toll stations are difficult to meet.

Therefore, an intelligent early warning system and method for pedestrian and non-motor vehicle intrusion at toll stations is proposed to solve the above problems.

Disclosure of Invention

The invention aims to provide an intelligent early warning system and method for pedestrian and non-motor vehicle intrusion in a toll station, so as to solve the problems in the background technology.

In order to achieve the above purpose, the present invention provides the following technical solutions:

an intelligent early warning system for pedestrian and non-motor vehicle intrusion in a toll station, comprising:

a terminal layer, comprising a sensing terminal formed by cameras and a hard disk video recorder covering the toll station plaza, and an early warning terminal formed by a directional sound column, a strobe light and an interphone;

an edge layer, deployed on an edge computing unit at the toll station, connected to the sensing terminal through a private video network and to the early warning terminal through the Internet of Things, and used for executing the following operations in real time:

decoding the camera video stream, and detecting pedestrian and non-motor vehicle targets with a small visual model;

tracking each target, generating its motion trajectory, and predicting its trajectory for the next 3 seconds;

cropping the image region of each pedestrian target, and distinguishing workers from ordinary pedestrians with a classification model;

triggering cloud rechecking for targets whose confidence is lower than 0.6;

a cloud layer, comprising a central cloud server, used for:

performing a secondary recheck, with the large visual language model, on low-confidence targets uploaded by the edge layer;

pre-labeling training data with the large visual language model in order to train the small visual model;

automatically generating the electronic fence area based on toll station video;

storing historical data and providing a visualization platform;

and a dynamic hierarchical early warning module, which dynamically triggers graded early warning according to the predicted trajectory, position and dwell time of the target, the early warning level being automatically downgraded when staff are present.

As a preferred scheme, the large visual language model is a Qwen2.5-VL-32B model, which performs the following functions under text prompts:

pre-labeling the image data used to train the small visual model, and outputting the target category, position coordinates and identity features;

rechecking low-confidence targets;

analyzing the video of a newly connected toll station, and outputting the polygon coordinates of the electronic fence.

As a preferred scheme, the small visual model includes:

a target detection model YOLO11s-det, which is pre-trained on the Objects365+COCO datasets and then fine-tuned by transfer learning on toll station scene data to detect two target classes, pedestrians and non-motor vehicles;

a pedestrian classification model YOLO11s-cls, which distinguishes three types of pedestrians: workers wearing reflective vests, workers wearing work clothes but no reflective vest, and ordinary pedestrians;

the models are exported in ONNX format, quantized to FP16 or INT8 and deployed to the edge computing unit, so that the single-frame processing delay is less than 30 milliseconds;

the pedestrian classification model adopts an improved multi-scale feature fusion mechanism, wherein the input feature maps of a plurality of layers are each multiplied element by element with the corresponding dynamic weight matrix, the weighted feature maps are averaged, and the average is passed through a Sigmoid linear unit (SiLU) activation function and then added to a basic feature map to generate an enhanced feature map;

the dynamic weight matrix is generated by applying a global average pooling operation to the input feature map and feeding the result into a multi-layer perceptron network.
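The fusion described above can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the two-layer perceptron, its ReLU and sigmoid nonlinearities, the per-channel broadcasting of each weight, and all numeric values are assumptions made only for illustration.

```python
import math

def silu(x):
    """Sigmoid linear unit: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def global_avg_pool(fmap):
    """fmap: list of C channels, each a flat list of H*W values -> C means."""
    return [sum(ch) / len(ch) for ch in fmap]

def mlp(vec, w1, w2):
    """Two-layer perceptron: ReLU hidden layer, sigmoid output (assumed)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    out = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return [1.0 / (1.0 + math.exp(-o)) for o in out]

def fuse(feature_maps, base_map, w1, w2):
    """Enhanced map = SiLU(mean over scales of W_i * F_i) + F_base."""
    n = len(feature_maps)
    channels, size = len(base_map), len(base_map[0])
    acc = [[0.0] * size for _ in range(channels)]
    for fmap in feature_maps:
        # One dynamic weight per channel, broadcast over spatial positions.
        weights = mlp(global_avg_pool(fmap), w1, w2)
        for c in range(channels):
            for p in range(size):
                acc[c][p] += weights[c] * fmap[c][p]
    return [[silu(acc[c][p] / n) + base_map[c][p] for p in range(size)]
            for c in range(channels)]

# Toy input: two scales, 2 channels, 2x2 spatial grid (flattened).
f1 = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
f2 = [[0.0, 1.0, 0.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
base = [[0.1] * 4, [0.2] * 4]
w1 = [[0.5, -0.5], [0.25, 0.75]]  # hidden x channels, illustrative values
w2 = [[1.0, 0.0], [0.0, 1.0]]     # channels x hidden, illustrative values
enhanced = fuse([f1, f2], base, w1, w2)
```

Because the weighted average passes through SiLU before the residual addition, an all-zero input leaves the basic feature map unchanged.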

As a preferred scheme, the dynamic hierarchical early warning strategy includes:

first-level early warning: when the predicted trajectory of a target enters the electronic fence, the directional sound column is triggered to play an early warning voice message;

second-level early warning: when the target actually enters the electronic fence, the volume of the sound column is increased and a handling instruction is sent to the interphones of the staff;

third-level early warning: when the target stays inside the fence for more than 5 seconds, the strobe light is turned on and the volume is set to maximum;

the early warning threshold is dynamically adjusted according to traffic flow: the dwell time threshold is shortened during peak periods and lengthened during low-flow periods.
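The three levels and the flow-dependent threshold can be expressed as a small decision function. The sketch below is only an illustration under stated assumptions: the traffic-flow cut-offs, the scale factors, and the rule that staff presence lowers the level by exactly one step are hypothetical, since the text gives only the 5-second base threshold and the general direction of the adjustments.

```python
from dataclasses import dataclass

@dataclass
class TargetState:
    predicted_inside: bool  # predicted trajectory enters the fence
    inside: bool            # target is currently inside the fence
    dwell_seconds: float    # time spent inside the fence so far
    staff_present: bool     # a worker has been recognised near the target

def dwell_threshold(vehicles_per_minute, base=5.0):
    """Shorten the threshold at peak flow, lengthen it at low flow.
    The cut-offs (30 and 5 vehicles/min) and factors are illustrative."""
    if vehicles_per_minute >= 30:
        return base * 0.6
    if vehicles_per_minute <= 5:
        return base * 1.4
    return base

def warning_level(state, vehicles_per_minute):
    """Return 0 (no warning) to 3 (highest level)."""
    if state.inside and state.dwell_seconds > dwell_threshold(vehicles_per_minute):
        level = 3
    elif state.inside:
        level = 2
    elif state.predicted_inside:
        level = 1
    else:
        level = 0
    # Assumed downgrade rule: staff presence lowers the level by one step.
    if state.staff_present and level > 0:
        level -= 1
    return level
```

For example, a target that has stayed in the fence for 4 seconds would be at level 2 in normal traffic but at level 3 during a peak period, because the shortened threshold has already been exceeded.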

As a preferred scheme, the electronic fence is generated as follows:

uploading a video frame from the newly connected camera to the cloud;

inputting a prompt into the large visual language model;

finalizing the electronic fence after manual review.
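Once the fence is available as polygon coordinates, deciding whether a target position lies inside it is a standard point-in-polygon test. The sketch below uses ray casting; the text does not state which test the system uses, and the fence coordinates are hypothetical pixel values.

```python
def point_in_fence(point, polygon):
    """Ray-casting test: True if `point` lies inside `polygon`.

    polygon is a list of (x, y) vertices given in order around the fence.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical fence polygon in image pixel coordinates.
fence = [(100, 400), (900, 400), (1100, 700), (50, 700)]
```

The same function can be applied to each point of a predicted trajectory to find out whether, and how soon, a target is expected to cross the fence.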

As a preferred scheme, when the edge layer performs target tracking:

the ByteTrack algorithm is used to associate target IDs and generate a set of trajectory points;

the trajectory is smoothed with a sliding window to filter out outliers;

the trajectory for the next 75 frames is predicted by the least squares method;

the predicted trajectory is corrected by a space-time error compensation algorithm: on top of the original least-squares predicted trajectory, the product of a time decay factor and the rate of change of acceleration of the historical trajectory is added, and the product of an environment compensation coefficient and the weighted sum of the environmental interference vectors of each category is added;

each environmental interference vector is obtained by passing the environmental interference intensity value through a Sigmoid function and multiplying the result by an interference sensitivity adjustment factor and an interference direction vector.
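The smoothing and least-squares prediction steps can be sketched as follows. This is an illustration under assumptions rather than the disclosed implementation: the text does not say which statistic the sliding window uses (a centered median is chosen here because it suppresses isolated outliers), nor the order of the least-squares fit (a straight line per coordinate is assumed).

```python
from statistics import median

def smooth(points, window=5):
    """Centered sliding-window median over a list of (x, y) points."""
    half = window // 2
    out = []
    for i in range(len(points)):
        chunk = points[max(0, i - half): i + half + 1]
        out.append((median(p[0] for p in chunk), median(p[1] for p in chunk)))
    return out

def fit_line(ts, vs):
    """Ordinary least squares for v = a + b * t (needs at least 2 samples)."""
    n = len(ts)
    mt, mv = sum(ts) / n, sum(vs) / n
    var = sum((t - mt) ** 2 for t in ts)
    b = sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / var
    return mv - b * mt, b

def predict(points, horizon=75):
    """Extrapolate `horizon` frames ahead (75 frames = 3 s at 25 fps)."""
    ts = list(range(len(points)))
    ax, bx = fit_line(ts, [p[0] for p in points])
    ay, by = fit_line(ts, [p[1] for p in points])
    last = ts[-1]
    return [(ax + bx * (last + k), ay + by * (last + k))
            for k in range(1, horizon + 1)]
```

A typical call chain would be `predict(smooth(track_points))`, where `track_points` is the per-frame position list produced by the tracker for one target ID.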

As a preferred scheme, the system further comprises a closed-loop optimization mechanism:

the cloud collects true-alarm/false-alarm feedback on the early warning images;

an incremental training dataset is generated after manual labeling;

transfer learning iterations of the small visual model are triggered periodically.

An intelligent early warning method for pedestrian and non-motor vehicle intrusion at toll stations, implemented using the intelligent early warning system for pedestrian and non-motor vehicle intrusion at toll stations described above.

According to the technical scheme provided by the invention, the intelligent early warning system and the intelligent early warning method for pedestrian and non-motor vehicle intrusion in the toll station have the beneficial effects that:

High-precision real-time early warning strengthens safety protection: the small visual model and the large visual language model are combined, the YOLO11s series models provide rapid target detection, and Qwen2.5-VL-32B rechecks low-confidence targets, which greatly reduces the missed-alarm rate and the false-alarm rate; the single-frame processing delay of the edge layer is less than 30 milliseconds; together with dynamic hierarchical early warning and multi-device linkage, this forms full-process protection from risk prediction to emergency handling, effectively avoids traffic accidents caused by intrusion events, and safeguards toll station personnel and vehicles;

The cloud-edge-terminal collaborative architecture improves system performance: under the layered cloud-edge-terminal architecture, the edge layer is responsible for real-time data processing and local decision making, which reduces network dependence, so that the system can still operate normally when the network is disconnected;

Dynamic adaptive mechanisms enhance environmental adaptation: the electronic fence is generated automatically and reviewed manually, so it can quickly adapt to changes in toll station layout; the dynamic hierarchical early warning threshold is adjusted with traffic flow; and the closed-loop optimization mechanism continuously iterates the small visual model;

The cloud end stores historical data and provides a visual platform to support management personnel to remotely monitor the operation state of multiple sites and quickly position the problem; the remote configuration of system parameters and automatic iteration of the model reduce the dependence on professional technicians, reduce the labor and time cost of operation and maintenance, prolong the service life of the equipment and indirectly save the maintenance expenditure;

The multi-technology integration innovation expands application potential, namely, the technology integration of visual model lightweight deployment, space-time track prediction algorithm, man-machine cooperative early warning and the like is not only suitable for toll station scenes, but also can be expanded to pedestrian and non-motor vehicle management in areas such as parking lots, service areas, urban road intersections and the like, and the data accumulation and analysis capability also provides a data basis for traffic flow prediction, potential safety hazard analysis and the like, so that the further development of the intelligent traffic field is assisted.

Drawings

Fig. 1 is a schematic diagram of the whole structure of an intelligent early warning system for pedestrian and non-motor vehicle intrusion in a toll station.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

In order to better understand the above technical solutions, the following detailed description will be given with reference to the accompanying drawings and the specific embodiments.

As shown in fig. 1, an embodiment of the present invention provides an intelligent early warning system for pedestrian and non-motor vehicle intrusion in a toll station, including:

Cloud layer, including central cloud server, is used for:

Storing historical data and providing a visualization platform;

In this embodiment, the terminal layer serves as the sensing front end and the execution terminal of the intelligent early warning system for pedestrian and non-motor vehicle intrusion at toll stations based on the cloud-edge-terminal architecture, and undertakes the key tasks of collecting environmental information and executing early warning instructions. The terminal layer is described below in terms of functional positioning, composition, working principle and application value:

1. Overview of overall functionality:

The terminal layer mainly comprises two core functional modules, sensing and early warning. The sensing terminal performs real-time image acquisition and data storage for pedestrians and non-motor vehicles in the scene through cameras and a hard disk video recorder covering the toll station plaza;

2. Sub-module composition and functions:

(I) Sensing terminal:

High-definition camera array: multiple high-definition network cameras are deployed in key areas of the toll station plaza (such as entrance gates, toll windows and lane junctions). The cameras use the H.265 encoding format, support resolutions of 1080P and above and a frame rate of 25 fps or higher, ensuring clear and smooth images. Built-in functions such as wide dynamic range (WDR) and infrared night vision allow the cameras to adapt to complex lighting such as strong light, backlight and night conditions, achieving all-weather monitoring coverage without blind spots;

Storage system: the storage system formed by the hard disk video recorder adopts a distributed storage architecture and deploys high-capacity NVR (network video recorder) devices that retain video data for at least 30 days. It uses disk array (RAID 5/6) technology to ensure safe and reliable data storage, supports real-time writing, retrieval and playback of video streams, and provides data support for later event tracing and algorithm optimization;

(II) Early warning terminal:

Directional sound column: installed on both sides of the toll lanes, at the edge of the plaza and at similar positions. Its directional sound propagation allows the warning sound to cover the target area precisely while reducing interference with the surroundings. It supports multi-language voice broadcasting (such as Mandarin, dialect and English), has a volume range of 60-120 dB, and adjusts its volume dynamically according to the early warning level: at the first level it plays a prompt such as "You have entered a dangerous area, please leave as soon as possible"; at the second level it increases the volume and repeats the broadcast; at the third level it plays a high-intensity warning sound and is activated together with the strobe light;

Strobe light: installed at conspicuous positions such as the top of toll booths and lane gantries. It has a red/yellow two-color warning mode and flashes at its highest frequency in the third-level early warning state, working together with the directional sound column to form a combined visual and auditory warning that strengthens the deterrent effect on the intruding target;

Interphone: when a second-level or higher early warning is triggered, the system automatically sends a text or voice handling instruction (such as "Pedestrian intrusion detected on the east side of the plaza, please handle immediately") to the interphones of the staff. Real-time voice communication is also possible through the interphone, so that staff can respond quickly and take appropriate measures;

3. Key technical principles:

(I) Data acquisition principle of the sensing terminal:

The camera converts optical signals into electrical signals using CMOS image sensor technology, performs preprocessing such as denoising, color correction and sharpening in its image signal processor (ISP), and transmits the video stream to the edge computing unit through network protocols (such as RTSP and ONVIF);

(II) Linkage control principle of the early warning terminal:

The edge computing unit sends control instructions to the early warning terminal through an Internet of Things protocol (such as MQTT or CoAP) according to the dynamic hierarchical early warning strategy. On receiving an instruction, the directional sound column plays the corresponding pre-stored voice file, the strobe light adjusts its flash mode, and the interphone receives and displays the handling information through its wireless communication module. Each early warning device has a low-power design and supports remote configuration and status monitoring to ensure stable operation of the system;
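To make the linkage concrete, the sketch below builds the messages an edge unit might publish for each warning level. It only constructs topic and payload pairs and does not call any MQTT client; the topic names, JSON fields and clip names are hypothetical, since the text does not define a message format.

```python
import json

# Hypothetical topics and payload fields, one entry per device action.
ACTIONS = {
    1: [("toll/plaza/sound_column",
         {"cmd": "play", "clip": "leave_area", "volume": "low"})],
    2: [("toll/plaza/sound_column",
         {"cmd": "play", "clip": "leave_area", "volume": "high"}),
        ("toll/plaza/interphone",
         {"cmd": "notify", "text": "Intrusion detected, please handle"})],
    3: [("toll/plaza/sound_column",
         {"cmd": "play", "clip": "alarm", "volume": "max"}),
        ("toll/plaza/strobe",
         {"cmd": "flash", "mode": "high_frequency"}),
        ("toll/plaza/interphone",
         {"cmd": "notify_all", "text": "Emergency: intrusion in progress"})],
}

def build_messages(level, target_id):
    """Return (topic, json_payload) pairs for one warning level."""
    messages = []
    for topic, body in ACTIONS.get(level, []):
        payload = dict(body, level=level, target=target_id)
        messages.append((topic, json.dumps(payload, sort_keys=True)))
    return messages
```

Each returned pair could then be handed to whichever MQTT or CoAP client the deployment uses; keeping message construction separate from transport makes the level-to-action mapping easy to review and test.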

4. Terminal layer workflow:

(I) Initialization stage:

After power-on, the terminal layer devices automatically perform hardware self-checks (such as camera lens cleanliness detection, sound column volume calibration and interphone signal strength testing), load device configuration parameters (such as camera resolution, sound column broadcast language and interphone group settings), establish communication with the edge layer, and complete device registration and status reporting;

(II) Data acquisition and transmission stage:

The cameras continuously capture video of the toll station plaza; after encoding and compression, the video data is transmitted to the edge computing unit in real time through the private video network;

(III) Early warning response stage:

When the edge layer determines that a pedestrian or non-motor vehicle intrusion event has occurred and triggers an early warning, it sends the corresponding instruction to the early warning terminal according to the early warning level:

first-level early warning: the directional sound column plays a low-volume warning message to remind the intruding target to leave;

second-level early warning: the directional sound column increases its volume, and a handling instruction is sent to the interphones of the staff;

third-level early warning: the directional sound column plays at maximum volume, the strobe light flashes at high frequency, and an emergency handling notification is sent to all staff;

(IV) Status monitoring and feedback stage:

If a device fault is detected (such as a camera going offline or a sound column producing no sound), the fault information is reported automatically and the closed-loop optimization mechanism of the system is triggered to ensure normal operation of the terminal layer devices;

5. Application value of the terminal layer:

(I) Enhanced scene perception:

The combination of high-definition cameras and the distributed storage system enables video acquisition and storage for the entire toll station plaza at all times, provides a high-quality data basis for target detection and trajectory tracking at the edge layer, and effectively improves the sensing accuracy and response speed of the system for intrusion events;

(II) Improved early warning handling efficiency:

The directional sound column, the strobe light and the interphone work together to form a multi-dimensional early warning system that quickly attracts the attention of the intruding target and promptly notifies staff to handle the event, which significantly shortens the event response time and reduces safety risk;

(III) Guaranteed system reliability:

The hardware self-check, status monitoring and fault reporting mechanisms of the terminal layer devices, together with the closed-loop optimization strategy of the system, allow problems arising during operation to be found and resolved promptly, ensure long-term stable operation of the terminal layer, and reduce the risk of system failure caused by device faults;

(IV) Optimized management experience:

The terminal layer provides toll station managers with a clear basis for event tracing, accurate early warning and efficient linked handling, which effectively improves the safety management level of toll stations and safeguards staff and passing vehicles.

In this embodiment, the edge layer serves as the intelligent hub of the intelligent early warning system for pedestrian and non-motor vehicle intrusion at toll stations based on the cloud-edge-terminal architecture. It undertakes the key tasks of real-time data processing, core algorithm execution and device linkage control, greatly reduces system response delay through local computing and an efficient decision mechanism, and ensures that intrusion events are identified and handled quickly. The edge layer is described below in terms of functional positioning, core components, technical principles and workflow:

1. Overview of overall functionality:

The edge layer is deployed at the toll station, is connected to the terminal layer sensing devices (cameras) through the private video network, and communicates with the early warning devices (directional sound column, strobe light and the like) through the Internet of Things, forming a closed processing loop of data acquisition, real-time analysis and immediate response. Its core functions include decoding the camera video streams in real time, detecting pedestrian and non-motor vehicle targets, tracking targets, predicting trajectories and classifying identities;

2. Core components and functions:

(I) Edge computing unit:

The edge computing unit uses a high-performance embedded computing platform (such as the NVIDIA Jetson series or domestic ARM-architecture chips) with an integrated GPU acceleration module that supports FP16/INT8 quantized computation and provides strong parallel processing capability. It is equipped with multiple gigabit network ports, USB interfaces and Internet of Things communication modules (such as 4G/5G and LoRa) to ensure stable data transmission with the terminal layer and the cloud. A built-in real-time operating system (RTOS) ensures low-latency execution of algorithm tasks and meets the performance requirement that single-frame video processing take less than 30 milliseconds;

(II) Target detection and classification module:

Small visual model deployment:

Target detection: a YOLO11s-det model is integrated; after pre-training on the Objects365+COCO datasets, it is fine-tuned by transfer learning on toll station scene data to accurately identify pedestrian and non-motor vehicle targets;

Pedestrian classification: a YOLO11s-cls model with an improved multi-scale feature fusion mechanism is used to distinguish three types of pedestrians (workers wearing reflective vests, workers wearing work clothes, and ordinary pedestrians), which assists in judging the risk level of an intruding target. The fusion is given by

$$F_{enh} = \mathrm{SiLU}\left(\frac{1}{N}\sum_{i=1}^{N} W_i \odot F_i\right) + F_{base}$$

wherein $F_{enh}$ is the enhanced feature map, $N$ is the number of multi-scale feature maps, $W_i$ is the dynamic weight matrix of the $i$-th layer feature, $F_i$ is the $i$-th layer input feature map, $\odot$ is element-by-element multiplication, $\mathrm{SiLU}$ is the Sigmoid linear unit activation function, and $F_{base}$ is the basic feature map;

The dynamic weight matrix $W_i$ is generated by a channel attention mechanism:

$$W_i = \mathrm{MLP}\left(\mathrm{GAP}(F_i)\right)$$

wherein $\mathrm{GAP}$ is the global average pooling operation and $\mathrm{MLP}$ is a multi-layer perceptron network;

Low-confidence processing: for any target whose detection confidence is lower than 0.6, the image region is automatically cropped and uploaded to the cloud, and the large visual language model is triggered to perform a secondary recheck, which reduces the missed-detection rate and the false-detection rate;
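The routing rule above amounts to a threshold split followed by a crop. A minimal sketch is given below; the detection dictionary layout and the 10% crop margin are assumptions made for illustration, and only the 0.6 threshold comes from the text.

```python
CONF_THRESHOLD = 0.6  # value stated in the text

def route_detections(detections, threshold=CONF_THRESHOLD):
    """Split detections into locally accepted ones and cloud re-check candidates."""
    accepted, recheck = [], []
    for det in detections:
        if det["confidence"] < threshold:
            recheck.append(det)
        else:
            accepted.append(det)
    return accepted, recheck

def crop_box(box, frame_w, frame_h, margin_pct=10):
    """Expand an integer (x1, y1, x2, y2) box by a margin and clamp it to the frame.

    The margin (an assumption) keeps some context around the target for the
    cloud model.
    """
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * margin_pct // 100
    dy = (y2 - y1) * margin_pct // 100
    return (max(0, x1 - dx), max(0, y1 - dy),
            min(frame_w, x2 + dx), min(frame_h, y2 + dy))
```

A detection with confidence exactly 0.6 is accepted locally, matching the "lower than 0.6" wording.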

(III) Target tracking and trajectory prediction module:

Real-time tracking algorithm: target ID association and trajectory generation are implemented with the ByteTrack algorithm, which combines Kalman filtering and the Hungarian algorithm to handle tracking when targets are occluded or cross each other, and generates a set of trajectory points;

Trajectory optimization and prediction:

Smoothing: the original trajectory is filtered with a sliding window algorithm to remove abnormal points caused by environmental interference (such as illumination changes and vehicle occlusion), yielding a smoothed trajectory;

Future trajectory prediction: the smoothed trajectory is fitted by the least squares method to predict the motion trajectory for the next 75 frames (3 seconds, assuming a frame rate of 25 fps);

Error correction: a space-time error compensation algorithm is introduced:

$$\hat{P}_{corr}(t) = \hat{P}_{LS}(t) + \lambda(\Delta t)\,\dot{a} + \beta \sum_{k=1}^{K} w_k E_k$$

wherein $\hat{P}_{corr}(t)$ is the corrected predicted trajectory, $\hat{P}_{LS}(t)$ is the original least-squares predicted trajectory, $\lambda$ is the time decay factor, $\Delta t$ is the time difference between the current frame and the historical frame, $\dot{a}$ is the rate of change of acceleration of the historical trajectory, $\beta$ is the environment compensation coefficient, $w_k$ is the weight of the $k$-th class of environmental element, $E_k$ is the environmental interference vector, and $K$ is the number of environmental element categories. The algorithm jointly accounts for time decay, acceleration change and environmental interference (such as wind speed and the influence of passing vehicles), further improving prediction accuracy;
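The correction can be sketched numerically as follows. Several choices here are assumptions rather than statements from the text: the rate of change of acceleration is estimated with a third finite difference over the last four trajectory points, the time decay factor is passed in as a precomputed number because its dependence on the frame time difference is not specified, and each interference vector follows the Sigmoid-based construction described in the disclosure.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def interference_vector(intensity, sensitivity, direction):
    """E_k = sensitivity * sigmoid(intensity) * direction (2-D vector)."""
    s = sensitivity * sigmoid(intensity)
    return (s * direction[0], s * direction[1])

def jerk(points, dt):
    """Rate of change of acceleration from the last four (x, y) points."""
    p0, p1, p2, p3 = points[-4:]
    return tuple((p3[i] - 3 * p2[i] + 3 * p1[i] - p0[i]) / dt ** 3
                 for i in range(2))

def compensate(pred_point, decay, jerk_vec, beta, weights, interferences):
    """Corrected point = prediction + decay * jerk + beta * sum_k w_k * E_k."""
    env = [sum(w * e[i] for w, e in zip(weights, interferences))
           for i in range(2)]
    return tuple(pred_point[i] + decay * jerk_vec[i] + beta * env[i]
                 for i in range(2))

# Toy usage: one interference class pushing along +x.
history = [(0.0, 0.0), (1.0, 0.0), (8.0, 0.0), (27.0, 0.0)]
e_wind = interference_vector(0.0, 2.0, (1.0, 0.0))
corrected = compensate((10.0, 5.0), 0.5, jerk(history, 1.0), 0.1, [1.0], [e_wind])
```

In a deployment the same correction would be applied to every point of the predicted trajectory, with the decay factor recomputed for each prediction step.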

(IV) Early warning control and linkage module:

Dynamic hierarchical early warning: a three-level early warning strategy is executed according to the predicted trajectory, actual position and dwell time of the target:

first-level early warning: when the predicted trajectory of the target is about to enter the electronic fence, the directional sound column is triggered to play a prompt message;

second-level early warning: when the target actually enters the electronic fence, the volume of the sound column is increased and a handling instruction is sent to the interphones of the staff;

third-level early warning: when the target stays inside the fence for more than 5 seconds (the threshold is dynamically adjusted with traffic flow), the strobe light is turned on and the volume is set to maximum;

Device linkage control: the module communicates with the terminal layer early warning devices through an Internet of Things protocol (such as MQTT or Modbus), issues control instructions in real time and receives device status feedback (such as sound column volume and strobe light working state), ensuring that early warnings are executed accurately and reliably;

3. Key technical principles:

(I) Lightweight deployment principle of the small visual model:

Model quantization (FP16/INT8) is used to reduce computation and memory usage, and the ONNX format is used to achieve efficient cross-platform deployment;
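The memory saving from INT8 quantization comes from storing each value as an 8-bit integer plus a shared scale. The sketch below shows symmetric per-tensor quantization in plain Python as a generic illustration of the idea; the text does not state which quantization scheme or toolchain is used, so this scheme is an assumption.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [x * scale for x in q]

weights = [-1.0, 0.0, 0.5, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most about half a quantization step, which is the accuracy cost paid for the smaller footprint and faster integer arithmetic.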

(II) Principle of target tracking and prediction:

The ByteTrack algorithm achieves stable target tracking by associating detection boxes with existing tracks; the least squares method fits the motion trend from historical trajectory data; and the space-time error compensation algorithm introduces environmental factors to correct the prediction deviation. Together they form a complete closed loop of detection, tracking, prediction and correction, and improve prediction accuracy for moving targets;

(III) Early warning linkage control principle:

The hierarchical early warning logic is implemented on a rule engine: an early warning strategy is matched to the behavioral characteristics of the target (trajectory, position and dwell time), and instructions are issued quickly to the terminal devices through an Internet of Things protocol. Dynamic adjustment of the early warning threshold is also supported (for example, shortening the dwell time threshold during peak periods), so that the early warning response matches the needs of the actual scene;

4. Edge layer workflow:

(I) Initialization stage:

After the edge computing unit starts, it completes a hardware self-check (such as GPU compute capability detection and network interface connectivity tests), loads the pre-trained visual small model parameters and the early warning strategy configuration file, establishes communication connections with the terminal layer and the cloud, and reports equipment state information to the cloud;

(II) Data receiving and preprocessing stage:

Receiving a video stream transmitted by a camera through a video private network in real time, decoding and converting formats by using FFmpeg and other tools, extracting single-frame image data, and providing input for subsequent target detection and analysis;

(III) Real-time analysis and decision stage:

Inputting single-frame images into a YOLO11s-det model and a YOLO11s-cls model to finish target identification and pedestrian classification, and outputting detection results (category, confidence level and position coordinates);

Target tracking and prediction: the ByteTrack tracking algorithm is run on the detected targets to generate trajectories, predict future motion trends, and correct prediction errors;

Early warning decision: the early warning instruction of the corresponding level is triggered according to the positional relationship between the target trajectory and the electronic fence, combined with the residence time;

Cloud interaction, namely uploading a low confidence detection result to the cloud, and receiving a rechecking result returned by the cloud and model updating parameters;
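The split between local handling and cloud rechecking in this stage can be sketched as below. The 0.6 cutoff is the confidence value this document gives for cloud rechecking; the detection record layout and the function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Confidence below this value is sent to the cloud for rechecking
# (0.6 is the value stated for the cloud recheck module).
RECHECK_THRESHOLD = 0.6

def route_detections(detections: List[Dict],
                     threshold: float = RECHECK_THRESHOLD
                     ) -> Tuple[List[Dict], List[Dict]]:
    """Split detections into (handled locally, uploaded for recheck)."""
    local, cloud = [], []
    for det in detections:
        if float(det["confidence"]) < threshold:
            cloud.append(det)
        else:
            local.append(det)
    return local, cloud

# Hypothetical detector output: category, confidence, bounding box.
detections = [
    {"category": "pedestrian", "confidence": 0.91, "box": (120, 80, 180, 260)},
    {"category": "non_motor_vehicle", "confidence": 0.42, "box": (300, 150, 420, 310)},
]
local, cloud = route_detections(detections)
```

Only the second detection would be uploaded; the first proceeds directly to tracking and the early warning decision.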

(IV) Early warning execution and feedback stage:

Receiving equipment state feedback, recording an early warning event log, and uploading key information (such as early warning time, target type and treatment result) to a cloud end for storage;

(V) closed loop optimization stage:

An incremental training data set and model update instructions issued by the cloud are received, and transfer learning iterations are performed periodically on the local visual small model to continuously improve detection and classification accuracy;

5. Application value of edge layer:

(I) Reducing system latency and improving real-time performance:

The local data processing and decision mechanism avoids the network latency of round trips to the cloud, ensures millisecond-level response to intrusion events, and significantly improves the real-time performance and safety of the system;

(II) Relieving cloud load and optimizing resource allocation:

The edge layer bears most of real-time computing tasks (such as target detection and track prediction), so that cloud data processing load is reduced, the cloud is focused on complex model training, data management and global decision making, and the overall operation efficiency of the system is improved;

(III) Enhancing system robustness:

even if the network is interrupted or the cloud fails, the edge layer can still independently finish local detection, early warning and treatment, so that the core function of the system is not interrupted, and the reliability and fault tolerance of the system are improved;

(IV) Supporting scenario-specific dynamic optimization:

Through local model iteration and early warning strategy dynamic adjustment, the edge layer can be rapidly adapted to environmental differences (such as lane layout and traffic flow change) of different toll stations, and personalized and intelligent upgrading of the system is realized.

In this embodiment, the visual language big model adopts the Qwen2.5-VL-32B model, and the following functions are executed through text prompt words:

rechecking the low confidence level target;

Analyzing newly accessed video of the toll station, and outputting polygonal coordinates of the electronic fence;

Furthermore, the cloud layer serves as the 'smart brain' and 'data center' of the cloud-edge-terminal intelligent early warning system for pedestrian and non-motor vehicle intrusion at toll stations. It carries the core functions of complex model computation, global data management, and system optimization decisions, and, working together with the edge layer, supports the full flow from data processing and model training to service management:

1. Overview of overall functionality:

The cloud layer relies on a central cloud server cluster to construct a comprehensive platform integrating data storage, model training, intelligent decision making and system management, and has the core functions of performing secondary rechecking on a low-confidence target uploaded by an edge layer by utilizing a visual language big model, training and optimizing a visual small model based on mass data, automatically generating an electronic fence area, storing and managing historical data of the whole life cycle of the system, providing a visual management interface, supporting system parameter configuration, state monitoring and decision making analysis, and realizing the intellectualization, automation and sustainable optimization of the system;

2. Core module and function:

(I) Visual language large model service module:

Low-confidence target rechecking: a high-performance visual language large model such as Qwen2.5-VL-32B is accessed to receive target images with confidence below 0.6 uploaded by the edge layer. The model is driven by text prompt words (such as 'judge whether the image shows a pedestrian or non-motor vehicle intruding into the toll station, and output a conclusion') and uses its image semantic understanding and knowledge reasoning capabilities to confirm the target a second time, reducing the system's false detection and missed detection rates;

The training data pre-marking, namely, automatically generating pre-marking information of target categories (pedestrians, non-motor vehicles), position coordinates and identity features (such as whether the target is a worker) by utilizing a visual language large model aiming at image data of a toll station scene;

the electronic fence is generated, namely, a video picture of a newly accessed camera is received, a prompt word (such as 'identifying a toll station forbidden to enter an area and outputting polygon coordinates') is input into a visual language large model, and the model automatically extracts the boundary of a dangerous area through semantic understanding and space analysis capability;

(II) Model training and optimization module:

Incremental learning mechanism: true alarm/false alarm data fed back by the edge layer is collected and combined with manual labeling to form an incremental training data set; transfer learning of the visual small models (YOLO11s-det and YOLO11s-cls) is triggered periodically, adapting them to new scenes and complex environmental changes by adjusting model parameters and continuously improving detection precision and classification accuracy;

Hyperparameter tuning: algorithms such as Bayesian optimization and random search automatically adjust hyperparameters (such as learning rate, batch size, and network depth) during model training, searching for the optimal combination and improving training efficiency and generalization;

Model version management: a model version library is established that records the training data, hyperparameter configuration, and performance indexes (such as mAP and F1 score) of each model version, supporting model rollback and comparative analysis and ensuring stability and traceability during iteration;

(III) Data storage and management module:

Distributed storage system: a storage architecture combining a distributed file system (such as Ceph or GlusterFS) with a time-series database (such as InfluxDB) is adopted to store multiple types of data efficiently, including video data, detection results, and early warning logs;

Managers can quickly query historical event videos, early warning records, and model training data through the visual platform to assist post-event review and system optimization;

Data security and backup, namely adopting security policies such as data encryption (AES-256), access right control (RBAC) and the like to ensure the data privacy and integrity;

(IV) Visual management platform module:

Real-time monitoring panel: based on a GIS map, it visually displays the equipment running state (camera online rate, edge computing unit load), real-time early warning information (intrusion event position, early warning level), and traffic flow heat map of each toll station, helping managers grasp the overall situation;

Generating a statistical report (such as daily break-in event number, fault equipment distribution and model performance trend) based on historical data, supporting self-defined time range and data dimension analysis, and providing data support for system optimization and resource allocation;

Providing a visual interface, supporting operations such as remote configuration of an electronic fence area, adjustment of an early warning threshold value, issuing of a model updating instruction and the like, and realizing flexible configuration and unified management of system parameters;

3. Key technical principles:

(I) Principle of visual language large model cooperation:

The visual language large model is driven by text prompt words to execute tasks, and its cross-modal understanding capability (fusing image semantics with text information) is used to realize complex functions such as low-confidence target identification and region boundary extraction;

(II) model incremental training principle:

Based on transfer learning, the network parameters are fine-tuned on newly labeled incremental data starting from a pre-trained model, and loss function optimization (such as cross-entropy loss and Focal Loss) focuses training on hard samples, so that the model gradually adapts to new scene characteristics while avoiding 'catastrophic forgetting';
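To illustrate how a focal-style loss shifts attention to hard samples, the sketch below compares binary cross-entropy with the binary focal loss FL = -alpha_t * (1 - p_t)^gamma * log(p_t). The values gamma = 2 and alpha = 0.25 are commonly used defaults and are assumptions here; the document does not state which settings the system uses.

```python
import math

def binary_cross_entropy(p: float, y: int) -> float:
    """Cross-entropy for predicted probability p of class 1 and label y."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -math.log(p if y == 1 else 1.0 - p)

def binary_focal_loss(p: float, y: int,
                      gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss: cross-entropy scaled by alpha_t * (1 - p_t) ** gamma."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balance weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# An easy positive (p = 0.9) versus a hard positive (p = 0.3).
easy_ratio = binary_focal_loss(0.9, 1) / binary_cross_entropy(0.9, 1)
hard_ratio = binary_focal_loss(0.3, 1) / binary_cross_entropy(0.3, 1)
```

The easy sample keeps a far smaller share of its cross-entropy loss than the hard one, so gradient updates are dominated by samples such as occluded or ambiguous targets.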

(III) data visualization and interaction principle:

Front-end technologies such as WebGL and ECharts convert structured and unstructured data in the database into visual charts and map layers; front-end and back-end data interaction is implemented through a RESTful API, allowing users to query, analyze, and configure system parameters in real time through the interface;

4. Cloud layer workflow:

(I) Initialization stage:

After the cloud server is started, the initialization of a storage system, the loading of model services and the establishment of a communication link between the cloud server and an edge layer are completed;

(II) Data receiving and processing stage:

Receiving low-confidence data: low-confidence target images uploaded by the edge layer are received, stored in a temporary data queue, and held for processing by the visual language large model;

Collecting incremental data: the early warning results (true alarm/false alarm) fed back by the edge layer and the manual annotation data are aggregated and organized into an incremental training data set;

Video data storage: video streams from newly connected cameras are received, placed in distributed storage, and indexed;

(III) Intelligent processing and decision stage:

target rechecking and marking, namely calling a visual language big model to carry out secondary analysis on a low-confidence target, and outputting a rechecking result;

Triggering visual small model training according to the incremental data set, generating a new version model after the visual small model training is completed, and issuing the new version model to an edge layer after test verification;

Generating an electronic fence, namely processing new video data by utilizing a visual language large model, generating electronic fence coordinates, and synchronizing to the edge layers of each toll station after manual confirmation;

(IV) Data management and display stage:

data storage and indexing, namely storing processed data (rechecking results, training data and early warning logs) into a database in a classified manner, and updating index information;

the visual display comprises the steps of acquiring system operation data in real time, updating a monitoring panel and a statistical report, responding to a user query request, and providing data retrieval and analysis service;

(V) a system optimization stage:

According to data analysis results (such as model performance reduction and high-frequency false alarm areas), system parameters (early warning threshold values and electronic fence boundaries) are adjusted or model retraining is triggered, and overall efficiency of the system is continuously optimized;

5. Application value of the cloud layer:

(I) Improving the intelligent decision-making capability of the system:

the depth semantic analysis and reasoning capability of the visual language large model makes up the limitation of edge layer detection, realizes accurate judgment in complex scenes, reduces the human intervention requirement and improves the intelligent level of the system;

(II) Guaranteeing long-term optimization and iteration of the system:

Through the data closed loop and the incremental model training mechanism, the cloud layer continuously optimizes the algorithms based on feedback from actual operation and adapts to environmental changes at the toll station (such as lane reconstruction and changes in traffic rules), ensuring long-term stable operation of the system;

(III) Realizing global management and scheduling:

The unified data storage and visualization platform supports management personnel to monitor the operation state of multiple sites from a global view, quickly positions the problems and carries out remote configuration, reduces operation and maintenance cost and improves management efficiency;

(IV) Promoting data value mining:

The accumulation and analysis of massive historical data not only serves for system optimization, but also can further mine information such as traffic flow rules, potential safety hazard characteristics and the like, and provides data support for traffic planning and safety policy formulation.

In this embodiment, the dynamic graded early warning module serves as the emergency hub of the cloud-edge-terminal intelligent early warning system for pedestrian and non-motor vehicle intrusion at toll stations. It builds an accurate and efficient risk response mechanism through multi-dimensional data fusion analysis and dynamic strategy adjustment. The module takes target behavior characteristics as its core basis for judgment and combines them with dynamic environmental changes to achieve graded control across the full flow from risk prediction to emergency handling. It is described below in terms of functional positioning, core mechanisms, technical principles, and application value:

1. Overview of overall functionality:

The dynamic grading early warning module carries out dynamic risk assessment on targets of pedestrians and non-motor vehicles which intrude into a toll station by means of real-time detection data of an edge layer and global strategy support of a cloud end, automatically triggers three-level progressive early warning response according to a target prediction track, an actual position and residence time, dynamically adjusts an early warning threshold value by combining factors such as traffic flow, the presence state of workers and the like, realizes multi-mode transmission of risk information by means of cooperative work of terminal equipment such as a directional sound column, a flashing lamp, an interphone and the like, ensures that the workers and the intruded targets acquire warning in time, and simultaneously provides accurate guidance for emergency treatment;

2. Core mechanisms and functions:

(I) Three-level early warning classification strategy:

Level-1 early warning: triggered when the edge layer predicts that the target trajectory is about to enter the electronic fence area. The directional sound column plays a prompt voice (such as 'Danger ahead, please leave immediately') at medium volume to prompt the intruding target to change its route, and preliminary early warning information is sent to staff so that they enter a standby state;

Level-2 early warning: when real-time detection shows that a target has actually entered the electronic fence area, the system immediately escalates to a level-2 early warning. The volume of the directional sound column is raised significantly and a high-intensity warning voice (such as 'You have entered a dangerous area, please leave quickly') is played in a loop, while a detailed handling instruction (including the target position, type, and suggested handling method) is sent to the staff's interphone;

Level-3 early warning: when the residence time of a target inside the electronic fence exceeds the preset threshold (5 seconds by default, dynamically adjustable), the highest-level early warning is triggered. The strobe light flashes at high frequency, the directional sound column plays an alarm continuously at maximum volume, and an emergency notification is sent to all staff;
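The three-level classification above can be expressed as a small decision function. This is a sketch under the assumption that the edge layer already supplies three inputs per target: whether the predicted trajectory will enter the fence, whether the target is currently inside it, and how long it has stayed there. The enum and parameter names are hypothetical.

```python
from enum import IntEnum

class WarningLevel(IntEnum):
    NONE = 0
    LEVEL_1 = 1  # trajectory predicted to enter the fence
    LEVEL_2 = 2  # target actually inside the fence
    LEVEL_3 = 3  # target inside the fence longer than the threshold

def decide_level(predicted_to_enter: bool, inside_fence: bool,
                 dwell_seconds: float,
                 dwell_threshold: float = 5.0) -> WarningLevel:
    """Map target state to a warning level; 5 s is the stated default."""
    if inside_fence and dwell_seconds > dwell_threshold:
        return WarningLevel.LEVEL_3
    if inside_fence:
        return WarningLevel.LEVEL_2
    if predicted_to_enter:
        return WarningLevel.LEVEL_1
    return WarningLevel.NONE

approaching = decide_level(True, False, 0.0)
just_entered = decide_level(True, True, 1.2)
lingering = decide_level(True, True, 6.0)
```

Checking the most severe condition first keeps the levels progressive: a target can only reach level 3 by first satisfying the level-2 condition.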

(II) dynamic threshold adjustment mechanism:

Traffic flow adaptation: the system dynamically adjusts the early warning threshold according to the real-time traffic flow of the toll station:

Peak period: the target residence time threshold is shortened (for example, from 5 seconds to 3 seconds) to speed up the early warning response and avoid congestion or accidents caused by intrusion events in dense traffic;

Low-flow period: the residence time threshold is lengthened (for example, to 8 seconds) to reduce false alarms triggered by brief stays (such as asking for directions or standing temporarily) and improve early warning accuracy;

When the edge layer recognizes through the pedestrian classification model that a staff member is present in the early warning area, the early warning level is lowered automatically: a current level-2 or level-3 early warning is downgraded to level 1, and a level-1 early warning is temporarily suppressed, avoiding interference with normal staff operations;
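A minimal sketch of the two adjustments described above follows. The 3, 5, and 8 second thresholds come from the examples in the text; the traffic flow cutoffs (vehicles per minute) that define "peak" and "low flow" are hypothetical values chosen only for illustration.

```python
def dwell_threshold(vehicles_per_minute: float,
                    peak_flow: float = 30.0,   # assumed cutoff
                    low_flow: float = 5.0      # assumed cutoff
                    ) -> float:
    """Residence time threshold in seconds for the current traffic flow."""
    if vehicles_per_minute >= peak_flow:
        return 3.0   # peak period: respond faster
    if vehicles_per_minute <= low_flow:
        return 8.0   # low-flow period: tolerate brief stays
    return 5.0       # default

def downgrade_for_staff(level: int, staff_present: bool) -> int:
    """Lower the warning level (0 = none, 1 to 3) when staff are present."""
    if not staff_present:
        return level
    if level >= 2:
        return 1     # level 2 or 3 is downgraded to level 1
    return 0         # level 1 is temporarily suppressed

peak_threshold = dwell_threshold(45.0)
quiet_threshold = dwell_threshold(2.0)
downgraded = downgrade_for_staff(3, staff_present=True)
```

In a deployment the flow cutoffs would themselves be tuned per station rather than fixed as they are here.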

(III) Multi-equipment cooperative linkage:

Early warning equipment control: according to the early warning level, the module sends instructions to the terminal equipment through an Internet of Things protocol (such as MQTT):

Level-1 early warning: the directional sound column starts voice broadcasting;

Level-2 early warning: the volume of the directional sound column is raised and instructions are sent to the interphone;

Level-3 early warning: the strobe light flashes at high frequency, the sound column plays at full volume, and the interphone triggers an emergency prompt tone. All equipment states are fed back to the edge layer in real time to confirm that the instructions were executed effectively;

Human-machine interaction optimization: staff can intervene manually in the early warning process (for example, pausing the alarm or extending the early warning time) through the interphone or the management platform, and the system logs these manual operations for subsequent strategy optimization;

3. Key technical principles:

(I) Risk assessment algorithm:

A risk assessment function is constructed by combining the target's predicted trajectory (generated by the least squares method and the spatio-temporal error compensation algorithm), its actual position (compared against the electronic fence coordinates), and its residence time:

$$R = w_1 P + w_2 L + w_3 T$$

(where $R$ is the risk value; $P$ is the trajectory intrusion probability, calculated from the distance between the predicted trajectory and the fence boundary; $L$ is the risk coefficient of the current position, taking the value 1 inside the electronic fence and 0 outside it; $T$ is the residence-time risk value, obtained by comparing the residence time with the preset threshold and normalizing; and $w_1$, $w_2$, $w_3$ are weight coefficients optimized by training on historical data). The corresponding early warning is triggered when $R$ exceeds the threshold of each level;
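As a rough sketch of such a risk function, the code below combines the three described factors (trajectory intrusion probability, position coefficient, and normalized residence time) as a weighted sum and maps the result to a level. The weighted-sum form, the weights, the per-level thresholds, and the capping of the residence-time term at 1 are all illustrative assumptions; the text says only that the weights are learned from historical data.

```python
from typing import Tuple

def dwell_risk(dwell_seconds: float, dwell_threshold: float) -> float:
    """Residence time normalized by the threshold and capped at 1."""
    return min(dwell_seconds / dwell_threshold, 1.0)

def risk_value(p_traj: float, inside_fence: bool, dwell_seconds: float,
               dwell_threshold: float = 5.0,
               weights: Tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    """Weighted sum of the three risk factors (weights are assumed)."""
    w1, w2, w3 = weights
    position = 1.0 if inside_fence else 0.0
    return (w1 * p_traj + w2 * position
            + w3 * dwell_risk(dwell_seconds, dwell_threshold))

def level_for_risk(risk: float,
                   thresholds: Tuple[float, float, float] = (0.25, 0.6, 0.9)
                   ) -> int:
    """Highest level (1 to 3) whose threshold the risk exceeds, else 0."""
    level = 0
    for i, threshold in enumerate(thresholds, start=1):
        if risk > threshold:
            level = i
    return level

approaching = level_for_risk(risk_value(0.9, False, 0.0))
inside = level_for_risk(risk_value(1.0, True, 1.0))
lingering = level_for_risk(risk_value(1.0, True, 6.0))
```

With these assumed numbers, an approaching target scores level 1, a target that has just entered scores level 2, and one that lingers past the threshold scores level 3.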

(II) dynamic threshold optimization model:

A reinforcement learning algorithm (such as Q-learning) is adopted, taking historical early warning data (true alarm/false alarm records) and manual feedback as reward signals, to dynamically adjust early warning parameters such as the residence time threshold, volume, and brightness;
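A tabular Q-learning update for choosing the residence time threshold might look like the sketch below. The state labels, the candidate thresholds (taken from the 3, 5, and 8 second examples in the text), the reward scheme (+1 for a confirmed true alarm, -1 for a false alarm), and the learning rate and discount factor are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Optional

ACTIONS = (3.0, 5.0, 8.0)  # candidate residence time thresholds, seconds

def q_update(q, state: str, action: float, reward: float,
             next_state: Optional[str] = None,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard Q-learning update; next_state=None marks a terminal step."""
    best_next = 0.0 if next_state is None else max(
        q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next
                                   - q[(state, action)])

def best_threshold(q, state: str) -> float:
    """Greedy action: the threshold with the highest learned value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

q = defaultdict(float)
# Hypothetical feedback in low traffic: a 3 s threshold keeps producing
# false alarms (-1), while an 8 s threshold yields confirmed alarms (+1).
for _ in range(20):
    q_update(q, "low_flow", 3.0, reward=-1.0)
    q_update(q, "low_flow", 8.0, reward=1.0)

chosen = best_threshold(q, "low_flow")
```

A practical version would add exploration (for example epsilon-greedy selection) and extend the action space to volume and brightness, which this sketch omits.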

(III) Multi-equipment synchronous control technology:

The system uses heartbeat detection and a retransmission mechanism to ensure reliable instruction delivery under network fluctuation, achieving millisecond-level coordinated response of the sound column, strobe light, and interphone;
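The heartbeat and retransmission idea can be sketched without any particular transport. Here the send function is a stand-in that returns True when the device acknowledges a command; in a real deployment it would wrap the chosen Internet of Things protocol client. All names, the retry count, and the timeout are hypothetical.

```python
import time
from typing import Callable, Dict

def is_alive(last_heartbeat_s: float, now_s: float,
             timeout_s: float = 3.0) -> bool:
    """A device is considered online if its heartbeat is recent enough."""
    return (now_s - last_heartbeat_s) <= timeout_s

def send_with_retry(send: Callable[[Dict], bool], command: Dict,
                    max_attempts: int = 3, backoff_s: float = 0.0) -> int:
    """Resend until acknowledged; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        if send(command):
            return attempt
        time.sleep(backoff_s)
    raise TimeoutError(f"no acknowledgement after {max_attempts} attempts")

class FlakyLink:
    """Simulated link that drops the first `failures` commands."""
    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.delivered = []

    def __call__(self, command: Dict) -> bool:
        if self.failures > 0:
            self.failures -= 1
            return False
        self.delivered.append(command)
        return True

link = FlakyLink(failures=2)
attempts = send_with_retry(link, {"device": "strobe", "action": "flash"})
```

Commands should be idempotent (for example "set volume to maximum" rather than "increase volume") so that a retransmitted instruction cannot be applied twice.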

4. Workflow:

(I) Data acquisition stage:

The edge layer provides a target detection result, track prediction data and electronic fence coordinates in real time, and uploads traffic flow statistics and staff identity position information as basic data of early warning decision;

(II) risk assessment stage:

The dynamic grading early warning module calculates a target risk value according to a risk assessment algorithm, and judges whether to trigger early warning and a corresponding grade;

(III) Early warning execution stage:

Generating an instruction set according to the risk level, issuing the instruction set to terminal equipment through an Internet of things protocol, starting voice broadcasting, lamplight flashing or information pushing, and recording early warning event logs (time, position, level and equipment state);

(IV) a dynamic adjustment stage:

The traffic flow change and the state of staff are monitored in real time, and the early warning threshold value is automatically updated or the early warning level is adjusted;

(V) Feedback optimization stage:

Early warning execution results (such as whether the target was successfully driven away and whether the alarm was false) are uploaded to the cloud, where they feed into visual small model training and early warning strategy optimization, forming a closed 'detection, early warning, optimization' iteration loop;

5. Application value:

(I) Improving safety protection precision:

Graded progressive early warning and dynamic threshold adjustment allow rapid response to high-risk intrusion while reducing false-alarm interference in low-risk scenes, markedly improving the pertinence and effectiveness of safety protection;

(II) Reducing emergency handling cost:

by the cooperation of multiple devices and the pushing of accurate instructions, the response time of workers is shortened, invalid disposal actions are reduced, the consumption of manpower and equipment resources is reduced, and the emergency management efficiency is improved;

(III) Enhancing the environmental adaptability of the system:

The dynamic strategy adjustment mechanism enables the system to automatically adapt to complex environmental changes such as traffic flow of toll stations, personnel activities and the like, and avoids the problems of early warning failure or excessive reaction caused by fixed threshold values;

(IV) Strengthening human-machine cooperation:

the combination of manual intervention and automatic early warning not only plays the real-time advantage of the algorithm, but also reserves the flexibility of personnel decision making, and builds an intelligent and humanized safety management system.

In this embodiment, the electronic fence generation mode is:

uploading a video picture newly accessed into the camera to the cloud;

inputting a prompt word into the visual language big model;

after manual rechecking, defining an electronic fence;

further, the specific operation flow comprises:

(I) Data acquisition and uploading:

Video source access: during initialization, a newly deployed or replaced camera uploads its real-time video stream to the cloud server through the video private network. The video is encoded in H.265 at a resolution of at least 1080P, which keeps picture details sharp and provides high-quality data for subsequent region recognition;

Key frame extraction: the cloud server extracts key frames from the video stream at 1 to 5 frames per second (for example, frames from periods of heavy traffic or frequent personnel activity), reducing the amount of data to process while retaining the key features of the scene;
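The frame sampling step can be sketched as picking frame indices at a fixed target rate. This assumes uniform sampling by frame index; the text also mentions preferring busy periods, which would require a content-based selector that is not shown. The function name and the example rates are hypothetical.

```python
from typing import List

def sample_frame_indices(total_frames: int, source_fps: float,
                         target_fps: float) -> List[int]:
    """Indices of frames to keep so the output is about target_fps."""
    if not 0 < target_fps <= source_fps:
        raise ValueError("target_fps must be in (0, source_fps]")
    step = source_fps / target_fps   # source frames per kept frame
    indices = []
    next_pick = 0.0
    for i in range(total_frames):
        if i >= next_pick:
            indices.append(i)
            next_pick += step
    return indices

# Two seconds of 25 fps video sampled at 5 frames per second.
kept = sample_frame_indices(total_frames=50, source_fps=25.0, target_fps=5.0)
```

Tracking the next pick as a float lets the same function handle rates that do not divide evenly, such as 30 fps sampled at 4 frames per second.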

(II) intelligent recognition and coordinate generation:

The visual language large model (such as Qwen2.5-VL-32B) is driven by a text prompt, for example: "Identify the areas in the picture where pedestrians and non-motor vehicles are forbidden to enter the toll station, and output the polygon vertex coordinates (format: [x1, y1], [x2, y2], ...)";

Generating polygon coordinates, namely converting the identified regional boundary into a polygon geometric figure by a model, and outputting a vertex sequence in a pixel coordinate form, wherein for a rectangular lane region, four vertex coordinates are output;
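Turning the model's text reply into usable vertices might be done as in the sketch below. It assumes the reply contains coordinate pairs written as [x, y], as in the prompt format above, possibly surrounded by other text; the function name and the sample reply are hypothetical.

```python
import re
from typing import List, Tuple

# Matches "[x, y]" where x and y are integers or decimals.
_PAIR = re.compile(r"\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]")

def parse_polygon(reply: str) -> List[Tuple[float, float]]:
    """Extract polygon vertices from a model reply in pixel coordinates."""
    points = [(float(x), float(y)) for x, y in _PAIR.findall(reply)]
    if len(points) < 3:
        raise ValueError("a polygon needs at least three vertices")
    return points

# Hypothetical reply for a rectangular lane region.
reply = "Forbidden area: [120, 300], [880, 300], [880, 700], [120, 700]"
fence = parse_polygon(reply)
```

Rejecting replies with fewer than three vertices gives the platform a simple check before a fence is queued for manual review.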

(III) Manual rechecking and correction:

The cloud management platform superimposes the electronic fence polygon output by the model on the original video picture and displays it as highlighted lines or color blocks, so that the manager can check visually whether the fence range is accurate;

If the fence is misjudged (for example, it includes a non-forbidden area) or incomplete (for example, it does not cover a blind spot), the manager can add, delete, or move polygon vertices manually with the graphical editing tool provided by the platform to adjust the fence boundary in real time;

(IV) data issuing and synchronization:

Edge layer update: the confirmed electronic fence coordinate data is sent to the edge computing unit of the corresponding toll station through a secure encrypted channel. On receiving the new data, the edge layer immediately replaces the original fence configuration and returns an acknowledgement of successful receipt to the cloud;

For large interchange toll stations or multiple stations under networked management, the cloud supports batch distribution of electronic fence data, ensuring that the protection strategies of all stations are consistent and synchronized;

The key technical principle is as follows:

(I) Principle of region recognition by the visual language large model:

In the electronic fence generating task, the model firstly extracts object features (such as a railing, a warning mark and a lane line) in a picture, and infers the area range to be protected by combining the semantic constraint of a forbidden area in a prompt word;

(II) Polygon fitting and coordinate conversion technology:

Boundary contour extraction: the model refines the identified region edges and converts the irregular boundary into a continuous set of pixel points using Canny edge detection or the GrabCut segmentation algorithm;

Polygon simplification: the pixel point set is simplified with the Ramer-Douglas-Peucker (RDP) algorithm, reducing the number of vertices while preserving the shape of the boundary to produce an optimal polygon representation;

Coordinate mapping: the pixel coordinates output by the model are converted into actual geographic coordinates (such as longitude and latitude or planar projection coordinates), ensuring spatial consistency of the electronic fence across different camera viewing angles;
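The Ramer-Douglas-Peucker simplification mentioned above can be sketched in a few lines. This version works on an open polyline; for a closed fence contour one common approach is to split the contour at two far-apart points and simplify each half. The tolerance value and the sample contour are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def _distance_to_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / math.hypot(dx, dy)

def rdp(points: List[Point], epsilon: float) -> List[Point]:
    """Drop points that deviate from the simplified line by <= epsilon."""
    if len(points) < 3:
        return list(points)
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = _distance_to_line(points[i], points[0], points[-1])
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist > epsilon:
        # Keep the farthest point and simplify both sides recursively.
        left = rdp(points[: index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]

# A noisy L-shaped boundary: nearly straight along x, then along y.
contour = [(0, 0), (1, 0.05), (2, 0), (3, 0.02), (4, 0),
           (4, 1), (4.03, 2), (4, 3)]
simplified = rdp(contour, epsilon=0.1)
```

Here the eight noisy points reduce to the three corners of the L shape, which is the kind of compact vertex list a manager can edit by hand.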

(III) Human-machine collaborative optimization mechanism:

The platform records the operation log and the reason explanation of each manual adjustment to form a closed loop of model prediction-manual correction-data feedback, wherein the data are used for subsequent training of a large visual language model to enhance the adaptability of the large visual language model to complex scenes (such as temporary adjustment of a construction area and special lane layout);

The working flow is as follows:

(I) Initialization and configuration stage:

After the new camera is accessed into the system, the cloud records the position of the equipment, the visual angle parameters and the information of the toll station to which the new camera belongs, and establishes a communication link with an edge layer to prepare for generating an electronic fence;

(II) intelligent generation:

The cloud receives a video key frame of the camera, and triggers the visual language big model to execute an area identification task;

the model outputs polygon coordinate data, and uploads the polygon coordinate data to a management platform to wait for rechecking;

(III) manual intervention stage:

A manager logs in a platform to check the visualized result of the electronic fence;

performing interactive adjustment on the wrong or unreasonable fence boundary, and submitting confirmation after saving and modifying;

(IV) deployment validation stage:

the cloud end transmits the finally confirmed electronic fence data to the corresponding edge layer;

updating local configuration by the edge layer, and using fence data for subsequent target intrusion detection and early warning judgment;
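Once the fence polygon is deployed, intrusion detection reduces to a point-in-polygon test on each target's position. The sketch below uses the ray casting method and assumes the target is represented by a single reference point, such as the bottom center of its bounding box (an assumption, since the text does not specify the reference point).

```python
from typing import List, Tuple

Point = Tuple[float, float]

def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular fence in pixel coordinates.
fence = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
target_inside = point_in_polygon(5.0, 5.0, fence)
target_outside = point_in_polygon(15.0, 5.0, fence)
```

The same test applied to a predicted future position gives the "about to enter" condition used for the first warning level.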

(V) Dynamic maintenance stage:

when the toll station area changes (such as lane reconstruction and facility migration), the electronic fence is updated by repeating the flow, so that the protection boundary is always attached to the actual requirement.

In this embodiment, the system further includes a closed-loop optimization mechanism:

Generating an incremental training data set after manual labeling;

Periodically triggering the transfer learning iteration of the visual small model;

Further, the closed-loop optimization mechanism serves as the self-evolution engine of the cloud-edge-terminal intelligent early warning system for pedestrian and non-motor vehicle intrusion at toll stations, and continuously improves system performance through a cycle of data feedback, model iteration, and strategy optimization:

1. Overview of overall functionality:

Its core functions include collecting the true alarm/false alarm data uploaded by the edge layer; forming high-quality training samples through manual labeling; triggering transfer learning iterations of the visual small model on these samples to optimize the target detection and classification algorithms; and synchronously updating system parameters (such as the early warning threshold and electronic fence boundary) and feeding the optimization results back to the edge layer, so that the system adapts and evolves in response to new scenes and new problems;

2. The core optimization flow comprises the following steps:

(I) Data collection and labeling:

Early warning data acquisition: the cloud receives, in real time, the early warning event records uploaded by the edge layer, including the intrusion target image, the detection result (category and confidence), the early warning level and the manual handling feedback (whether the warning was a false alarm); the system timestamps the data and stores it in association with metadata such as the camera number and geographic location of the corresponding toll station;

Manual labeling and review: for low-confidence detection results or disputed early warning events, professionals label the images through the cloud management platform. The labels include the true type of the target (pedestrian, non-motor vehicle or staff), its precise position coordinates and its behavior attributes (whether it intruded, and its dwell time). After labeling, the data enters a review flow in which labeling accuracy is ensured through cross-verification or expert re-check;
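As a minimal sketch of the event record and the screening step described above, the following Python fragment models one early warning event and selects the events that would go to manual labeling. The field names, the `needs_labeling` helper and the `disputed` flag are hypothetical; the 0.6 cutoff reuses the confidence threshold that this description gives for cloud review and is assumed, not stated, to apply here as well.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed cutoff: reuses the 0.6 confidence threshold that triggers cloud review.
LOW_CONFIDENCE = 0.6


@dataclass
class WarningEvent:
    """One early warning event record uploaded by the edge layer (hypothetical schema)."""
    camera_id: str
    timestamp: float                    # seconds since epoch, stamped on receipt
    category: str                       # e.g. "pedestrian" or "non_motor_vehicle"
    confidence: float                   # detector confidence in [0, 1]
    warning_level: int                  # 1, 2 or 3
    false_alarm: Optional[bool] = None  # manual handling feedback; None if not reviewed yet
    disputed: bool = False              # set when the handling feedback is contested


def needs_labeling(event: WarningEvent) -> bool:
    """Select low-confidence detections and disputed events for manual labeling."""
    return event.confidence < LOW_CONFIDENCE or event.disputed


events = [
    WarningEvent("cam-01", 1.0, "pedestrian", 0.92, 2, false_alarm=False),
    WarningEvent("cam-01", 2.0, "non_motor_vehicle", 0.41, 1),
    WarningEvent("cam-02", 3.0, "pedestrian", 0.88, 3, false_alarm=True, disputed=True),
]
to_label = [e for e in events if needs_labeling(e)]
```

In this example the second event is selected because of its low confidence and the third because it is disputed.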

(II) Model training and iteration:

The system performs cluster analysis according to data characteristics (such as scene illumination and target posture) and preferentially selects difficult samples (such as occluded targets and misjudged look-alike objects), ensuring that the training data covers the complex scenes encountered in practice;

Transfer learning optimization: a transfer learning strategy is adopted on the basis of small visual models such as YOLO11s-det and YOLO11s-cls. On the basis of freezing pre-trained parameters of the model, the network weights are fine-tuned on the incremental data of the toll station scene. An adaptive learning-rate optimizer (such as AdamW) and a regularization technique (L2 regularization) are applied during training to prevent overfitting and improve the generalization capability of the model;

After training, the new model version is evaluated on an independent test set, with metrics including mean average precision (mAP), F1 score and false alarm rate;
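Two of the evaluation metrics named above can be computed from simple counts, as in the following minimal sketch. The function names are hypothetical, and the false alarm rate is taken here as the fraction of issued warnings that turned out to be false, which is one possible definition; the description does not fix one. mAP is omitted because it requires per-class precision-recall curves.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_alarm_rate(tp: int, fp: int) -> float:
    """Assumed definition: share of issued warnings that were false alarms."""
    issued = tp + fp
    return fp / issued if issued else 0.0


# Example counts for a candidate model on an independent test set (made-up numbers).
candidate_f1 = f1_score(tp=80, fp=20, fn=20)
candidate_far = false_alarm_rate(tp=80, fp=20)
```

With these example counts, precision and recall are both 0.8, so the F1 score is about 0.8 and the false alarm rate is 0.2.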

(III) Strategy update and deployment:

Dynamic parameter adjustment: system parameters are adjusted automatically or manually according to the model optimization results and the analysis of historical early warning data. For example, if false alarms occur frequently in a certain area, the system can reduce the extent of the electronic fence in that area;

Version release and synchronization: the evaluated new model version and the updated strategy parameters are issued to the edge computing unit of each toll station through a secure encrypted channel. After receiving the update instruction, the edge layer runs a model compatibility test locally; once the test passes, it replaces the old model version and restarts the related services, ensuring a smooth transition of the system;
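The dynamic parameter adjustment in this subsection gives the example of reducing the electronic fence where false alarms are frequent, without specifying how the fence is reduced. One simple possibility, shown here purely as an assumption, is to scale the fence polygon toward the mean of its vertices once the false alarm share of an area exceeds a limit. The names `shrink_fence` and `adjust_fence`, the 0.3 limit and the 0.9 scale factor are all illustrative.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def shrink_fence(polygon: List[Point], factor: float) -> List[Point]:
    """Scale polygon vertices toward their mean point; factor in (0, 1] is the retained size."""
    if not polygon:
        raise ValueError("polygon must contain at least one vertex")
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in polygon]


def adjust_fence(polygon: List[Point], false_alarms: int, total_alarms: int,
                 max_rate: float = 0.3, factor: float = 0.9) -> List[Point]:
    """Shrink the fence only when the area's false alarm share exceeds max_rate (assumed rule)."""
    if total_alarms and false_alarms / total_alarms > max_rate:
        return shrink_fence(polygon, factor)
    return list(polygon)


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
halved = shrink_fence(square, 0.5)
```

Scaling toward the vertex mean keeps the fence shape and is easy to reverse; a production system would more likely adjust individual edges after manual review.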

(IV) Effect verification and feedback:

Real-time effect monitoring: after the edge layer deploys the new model version, key indicators such as early warning accuracy and response time are continuously monitored, and the data is uploaded to the cloud in real time;

If the system performance after optimization falls short of expectations (for example, the false alarm rate rises instead of falling), the system automatically traces back the training process, analyzes problems such as data labeling deviation or model parameter settings, and restarts the optimization flow, forming an upward spiral of optimization, verification and re-optimization;

3. Key technical principles:

(I) Transfer learning and incremental training principle:

The feature representations learned by the pre-trained model on large-scale general data sets are reused, and targeted fine-tuning is performed with the incremental data of the toll station scene;

(II) Data-driven parameter optimization:

Based on statistical analysis of historical early warning data (such as the distribution of false alarms by area and the false alarm rate in different time periods), reinforcement learning or a heuristic algorithm (such as a genetic algorithm) is used to search for the optimal parameter combination;

(III) Version management and rollback mechanism:

If a serious problem occurs in a new model version, the system can be rolled back to a historical stable version with one click, and the problem data is retained for subsequent analysis, ensuring the stability and traceability of the system during iteration;
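The rollback behavior described above can be illustrated with a small in-memory registry. This is a sketch under assumptions: the `ModelRegistry` class, its method names and the version strings are hypothetical, and a real deployment would persist versions and model files rather than keep them in lists.

```python
from typing import List, Optional


class ModelRegistry:
    """Tracks deployed model versions and supports rollback (hypothetical helper)."""

    def __init__(self) -> None:
        self._history: List[str] = []     # deployed versions, oldest first
        self.quarantined: List[str] = []  # rolled-back versions kept for later analysis

    def deploy(self, version: str) -> None:
        """Record a newly deployed version as the current one."""
        self._history.append(version)

    @property
    def current(self) -> Optional[str]:
        return self._history[-1] if self._history else None

    def rollback(self) -> Optional[str]:
        """Revert to the previous version and retain the faulty one for analysis."""
        if len(self._history) < 2:
            return self.current  # nothing older to fall back to
        self.quarantined.append(self._history.pop())
        return self.current


registry = ModelRegistry()
registry.deploy("v1")
registry.deploy("v2")
restored = registry.rollback()
```

After the rollback, `restored` is `"v1"` and `"v2"` remains in `registry.quarantined`, mirroring the requirement that problem data be kept for subsequent analysis.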

4. The working flow is as follows:

(I) Data acquisition and preprocessing stage:

The edge layer continuously uploads early warning event data to the cloud;

The cloud cleans, de-duplicates and standardizes the format of the data to form an original data set;

(II) Manual labeling and review stage:

The system screens the data to be labeled and distributes it to the labeling personnel;

Cross-review is carried out after labeling is completed, ensuring that the data quality reaches the standard;

(III) Model training and optimization stage:

An incremental training data set is constructed and divided into a training set, a validation set and a test set;

Transfer learning training is performed and the model parameters are adjusted;

The performance of the model is evaluated and the best version is selected;
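The division into training, validation and test sets in this stage can be sketched as a seeded shuffle followed by slicing. The `split_dataset` name and the 80/10/10 ratios are assumptions for illustration; the description does not state the proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train: float = 0.8, val: float = 0.1,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle reproducibly, then slice into train/validation/test subsets."""
    if not (0 < train < 1 and 0 <= val < 1 and train + val < 1):
        raise ValueError("train and val must be fractions with train + val < 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # local RNG: does not touch global random state
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


train_set, val_set, test_set = split_dataset(list(range(100)))
```

Fixing the seed keeps the test set stable across iterations, so that successive model versions are compared on the same held-out samples.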

(IV) Strategy update and deployment stage:

System parameters are adjusted according to the model optimization results;

The new model version and strategy are issued to the edge layer, and the deployment update is completed;

(V) Effect verification and feedback stage:

System performance indicators are monitored in real time;

If expectations are not met, the problem is fed back to the data labeling or model training stage, and the optimization flow is restarted;

5. Application value:

(I) Improving the long-term adaptability of the system:

Through continuous model iteration and strategy optimization, the system can automatically adapt to environmental changes at the toll station (such as newly added facilities and adjusted traffic rules) and maintain stable detection accuracy and early warning reliability over the long term;

(II) Reducing operation and maintenance cost:

The workload of manually investigating false alarms and missed alarms is reduced; data-driven automatic optimization lowers the dependence on professional technicians and improves operation and maintenance efficiency;

(III) Enhancing system robustness:

The closed-loop mechanism can respond quickly to newly emerging complex scenes (such as severe weather and special events); by adjusting the model and strategy in time, it prevents a sharp drop in system performance and safeguards the safety protection capability;

(IV) Accumulating data asset value:

The large volume of labeled data and model iteration records accumulated during optimization can be further used in fields such as traffic behavior analysis and safety risk prediction, unlocking the potential value of the data and providing decision support for traffic management.

The intelligent early warning method for pedestrian and non-motor vehicle intrusion in the toll station is implemented by using an intelligent early warning system for pedestrian and non-motor vehicle intrusion in the toll station, and comprises the following operation steps:

Data acquisition and preprocessing: the cameras of the terminal layer collect video data of the toll station square in real time while the hard disk video recorder synchronously stores it; the edge layer decodes and preprocesses the collected data to remove noise interference, providing a basis for subsequent analysis;

Target detection and tracking: the edge layer uses the YOLO11s-det small visual model to detect pedestrian and non-motor vehicle targets, and uses the ByteTrack algorithm to associate target IDs and generate trajectories;

Graded early warning and linked response: dynamic graded early warning is triggered according to the predicted trajectory, actual position and dwell time of the target. A level-1 warning is issued for a predicted intrusion and gives a voice prompt through the directional sound column; a level-2 warning is issued when the target actually intrudes, increasing the volume and notifying the staff; a level-3 warning is issued when the target stays for a long time, activating the strobe light and maximizing the volume. The early warning threshold is dynamically adjusted according to traffic flow, and the warning level is automatically downgraded when staff are present;

Cloud-edge collaboration and closed-loop optimization: the edge layer uploads targets with a confidence lower than 0.6 to the cloud, where the Qwen2.5-VL-32B large visual language model performs a secondary review. The cloud also uses this model to automatically generate electronic fence areas and pre-label training data. In addition, the cloud collects true-alarm/false-alarm feedback on early warnings, generates an incremental data set after manual labeling, and periodically drives transfer learning iterations of the edge-layer small visual model to continuously optimize system performance.
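The graded early warning step above can be sketched as a small decision function. This is an illustration under assumptions: the 5-second dwell threshold is the value recited in claim 4, while the `TargetState` fields, the one-level downgrade when staff are present, and the traffic-flow cutoffs and alternative thresholds in `dwell_threshold_for` are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TargetState:
    """Per-target inputs to the warning decision (hypothetical structure)."""
    predicted_to_enter: bool     # predicted trajectory crosses the electronic fence
    inside_fence: bool           # target is currently inside the fence
    dwell_seconds: float         # time spent inside the fence so far
    staff_present: bool = False  # staff detected near the target


def dwell_threshold_for(vehicles_per_minute: float, base: float = 5.0,
                        peak: float = 3.0, off_peak: float = 8.0) -> float:
    """Shorter dwell threshold in peak traffic, longer in low traffic (assumed cutoffs)."""
    if vehicles_per_minute >= 30:
        return peak
    if vehicles_per_minute <= 5:
        return off_peak
    return base


def warning_level(target: TargetState, dwell_threshold: float = 5.0) -> int:
    """Return 0 (no warning) or warning level 1, 2 or 3."""
    if target.inside_fence:
        level = 3 if target.dwell_seconds > dwell_threshold else 2
    elif target.predicted_to_enter:
        level = 1
    else:
        level = 0
    if target.staff_present and level > 0:
        level -= 1  # assumption: staff presence lowers the level by one step
    return level


lingering = TargetState(predicted_to_enter=True, inside_fence=True, dwell_seconds=6.0)
level_now = warning_level(lingering, dwell_threshold_for(vehicles_per_minute=12))
```

Here the target has stayed inside the fence for 6 seconds under normal traffic, so the 5-second threshold applies and a level-3 warning results.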

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. An intelligent warning system for pedestrians and non-motor vehicles entering toll booths, characterized by:

Terminal layer: Perception terminals consisting of cameras and hard disk recorders covering the toll booth square, and early warning terminals consisting of directional sound columns, strobe lights, and intercoms;

Edge layer: Edge computing units deployed at toll booths connect to perception terminals via a dedicated video network and to warning terminals via the Internet of Things. They perform the following operations in real time:

Decode the camera video stream and use a small visual model to detect pedestrians and non-motorized vehicles;

Track the target and generate the motion trajectory, and predict the trajectory in the next 3 seconds;

Capture the image area of pedestrian targets and use the classification model to distinguish between staff and ordinary pedestrians;

Trigger cloud-based review for targets with a confidence level lower than 0.6;

Cloud layer: includes central cloud servers, used for:

Use the large visual language model to conduct a secondary review of low-confidence targets uploaded from the edge layer;

Use the large visual language model to pre-label the training data and train the small visual model;

Automatically generate electronic fence areas based on toll station videos;

Store historical data and provide a visualization platform;

Dynamic graded warning module: Dynamically trigger graded warnings based on the target's predicted trajectory, location, and residence time. The warning level is automatically downgraded when staff are present.

2. The intelligent warning system for pedestrians and non-motor vehicles entering toll booths according to claim 1 is characterized in that the large visual language model adopts the Qwen2.5-VL-32B model and performs the following functions through text prompt words:

Pre-label the image data for training the small visual model and output the target category, location coordinates and identity features;

Review low-confidence targets;

Analyze the newly connected toll station video and output the polygon coordinates of the electronic fence.

3. The intelligent warning system for pedestrians and non-motor vehicles entering a toll station according to claim 1, wherein the visual model comprises:

The object detection model YOLO11s-det, pre-trained on the Objects365 and COCO datasets, uses transfer learning from toll booth scene data to detect pedestrians and non-motorized vehicles.

The pedestrian classification model YOLO11s-cls distinguishes three types of pedestrians: workers wearing reflective vests, workers not wearing reflective vests but wearing work clothes, and ordinary pedestrians;

After the model is exported in ONNX format, it is quantized to FP16 or INT8 and deployed to the edge computing unit to ensure that the single-frame processing delay is less than 30 milliseconds.

The pedestrian classification model uses an improved multi-scale feature fusion mechanism: multiplying the multi-layer input feature maps by the corresponding dynamic weight matrix element-by-element, calculating the average of the weighted feature maps, processing them through the Sigmoid linear unit activation function, and then adding them to the base feature map to generate an enhanced feature map.

The dynamic weight matrix is generated by performing a global average pooling operation on the input feature map and then inputting it into the multi-layer perceptron network.

4. The intelligent warning system for pedestrians and non-motor vehicles entering toll booths according to claim 1, wherein the dynamic hierarchical warning strategy includes:

Level 1 warning: When the target's predicted trajectory enters the electronic fence, the directional sound column is triggered to play a warning voice;

Level 2 warning: When the target actually enters the electronic fence, the sound column volume is increased and the handling instructions are sent to the staff intercom;

Level 3 warning: If the target stays in the fence for more than 5 seconds, the flashing light will be activated and the volume will be maximized;

The warning threshold is dynamically adjusted according to traffic flow: the dwell time threshold is shortened during peak hours and extended during low traffic periods.

5. The intelligent warning system for pedestrians and non-motor vehicles entering a toll station according to claim 1, wherein the electronic fence is generated in the following manner:

Upload the video footage of the newly connected camera to the cloud;

Input prompt words into the visual language model;

The electronic fence is demarcated after manual review.

6. The intelligent warning system for pedestrians and non-motor vehicles entering toll booths according to claim 1, wherein when the edge layer performs target tracking:

The ByteTrack algorithm is used to associate the target ID and generate a set of trajectory points;

Smooth the trajectory through a sliding window and filter outliers;

Predict the trajectory of the next 75 frames based on the least squares method;

The predicted trajectory is corrected using a spatiotemporal error compensation algorithm: Based on the original least squares predicted trajectory, the product of the time attenuation factor and the historical trajectory acceleration change rate is superimposed, and then the environmental compensation coefficient and the weighted sum of various environmental interference vectors are superimposed;

The environmental interference vector is generated by inputting the environmental interference intensity value into the Sigmoid function, multiplying the result by the interference sensitivity adjustment factor, and then multiplying that product by the interference direction vector.
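For illustration only, and not as part of the claim, the least-squares prediction step of claim 6 can be sketched as follows. The sketch assumes a straight-line motion model fitted separately to the x and y coordinates over the frame index, which the claim does not specify; the names `fit_line` and `predict_trajectory` are hypothetical, and the spatiotemporal error compensation is omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fit_line(ts: Sequence[float], vs: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of v = a * t + b; returns (a, b)."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(vs) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        return 0.0, mean_v
    a = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs)) / var_t
    return a, mean_v - a * mean_t


def predict_trajectory(history: Sequence[Point], horizon: int = 75) -> List[Point]:
    """Extrapolate the next `horizon` frames from smoothed trajectory points (one per frame)."""
    if len(history) < 2:
        raise ValueError("at least two trajectory points are required")
    ts = list(range(len(history)))
    ax, bx = fit_line(ts, [p[0] for p in history])
    ay, by = fit_line(ts, [p[1] for p in history])
    start = len(history)
    return [(ax * t + bx, ay * t + by) for t in range(start, start + horizon)]


# A target moving 1 px/frame in x and 2 px/frame in y.
future = predict_trajectory([(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

The default horizon of 75 frames matches the claim and corresponds to the 3-second prediction window of claim 1 at 25 frames per second.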

7. The intelligent warning system for pedestrians and non-motor vehicles entering toll booths according to claim 1 is characterized by further comprising a closed-loop optimization mechanism:

Collect positive/false alarm feedback from warning images in the cloud;

Generate incremental training datasets after manual annotation;

Periodically trigger transfer learning iterations of the small vision model.

8. An intelligent warning method for pedestrians and non-motor vehicles entering a toll station, characterized in that the method is executed using the intelligent warning system for pedestrians and non-motor vehicles entering a toll station according to any one of claims 1 to 7.