CN121210929A

CN121210929A - A multimodal and reinforcement learning-based intelligent evaluation method and system

Info

Publication number: CN121210929A
Application number: CN202511757439.3A
Authority: CN
Inventors: 孙鑫; 金蕾; 罗彬�; 吴双威; 金巍; 周粲
Original assignee: Sichuan Energy Internet Research Institute EIRI Tsinghua University
Current assignee: Sichuan Energy Internet Research Institute EIRI Tsinghua University
Priority date: 2025-11-27
Filing date: 2025-11-27
Publication date: 2025-12-26
Anticipated expiration: 2045-11-27
Also published as: CN121210929B

Abstract

This invention relates to the field of artificial-intelligence-based technology evaluation and discloses an intelligent evaluation method and system based on multimodal learning and reinforcement learning. The method includes: generating a feature tensor through improved data cleaning and cross-modal alignment; optimizing weights based on a pre-trained model and Bayesian inference; generating an optimized model using homomorphic encryption and secure aggregation; generating a weight matrix through reinforcement learning policy iteration and stability verification; recording evidence on a blockchain by combining hash encryption with smart contracts; and finally generating an evaluation report based on fuzzy comprehensive evaluation and visualization. The invention improves the accuracy and robustness of multimodal data evaluation, safeguards data security and traceability, and enhances the interpretability and practicality of the results through dynamic weight optimization and visual output.

Description

Intelligent evaluation method and system based on multimodal learning and reinforcement learning

Technical Field

The invention relates to the field of artificial-intelligence-based technology evaluation, and in particular to an intelligent evaluation method and system based on multimodal learning and reinforcement learning.

Background

In the field of intelligent evaluation of technical achievements, existing schemes generally rely on single-modal data analysis and a static weight allocation mechanism, and suffer from large subjective scoring errors, rigid evaluation-index weights, and low efficiency in processing multi-source data. Existing methods set feature extraction rules and scoring weights by manual experience; in multimodal scenarios involving patent images, test texts, and experimental data they are prone to cross-modal feature alignment deviation and insufficient adaptability to dynamic environments, and so struggle to meet the objectivity and timeliness requirements of technical achievement evaluation. For the joint processing of patent images and test text data, the prior art generally lacks a mechanism for collaboratively optimizing text semantic understanding and image feature recognition, and a fixed weight allocation cannot respond to changes in the evaluation environment, which reduces the stability and reliability of the evaluation results. In addition, traditional methods handle multimodal dataset preprocessing, feature matrix generation, and dynamic weight iteration as disconnected steps, so full-link closed-loop processing from data cleaning through feature alignment to weight optimization is difficult to achieve; evaluation efficiency falls and resource consumption rises.

Disclosure of Invention

The invention provides an intelligent evaluation method and system based on multimodal learning and reinforcement learning. Working from patent images and test text data, it links a multimodal learning module with a reinforcement learning module to address the large subjective scoring errors, fixed weight allocation, and low data processing efficiency of prior-art achievement evaluation.

To solve the above technical problems, the invention provides an intelligent evaluation method based on multimodal learning and reinforcement learning, comprising the following steps:

acquiring raw data from patent images, test texts, and experimental data, and performing data cleaning based on an improved Chebyshev norm and Shannon entropy together with cross-modal alignment based on an attention mechanism to generate a feature tensor structure, wherein the patent images comprise technical scheme schematic diagrams, structural diagrams, and flow charts;

based on the feature tensor structure, performing evaluation with an XGBoost model pre-trained for the patent technical field and weight parameter optimization based on Bayesian inference to generate a weight matrix to be optimized;

performing homomorphic gradient encryption and aggregation calculation based on secure multi-party computation and differential privacy to generate an optimized evaluation model structure;

based on the optimized evaluation model structure, performing reinforcement learning policy iteration based on state encoding and stability verification based on a weighted moving average to generate a final weight distribution matrix;

acquiring data from the final weight distribution matrix, and performing SHA-256 hash encryption and smart contract verification based on data validity verification and authority verification to generate a blockchain notarization record; and

based on the weight matrix and the optimized evaluation model structure, performing fuzzy comprehensive evaluation based on triangular or trapezoidal membership functions and visualization based on a histogram, line graph, radar chart, or heat map to generate a final evaluation report structure.

Further, generating the feature tensor structure further includes:

acquiring the patent images, test texts, and experimental data, and performing data cleaning comprising image quality detection based on the improved Chebyshev norm and text language consistency detection based on Shannon entropy to obtain a standardized multimodal dataset;

extracting text feature vectors and image feature vectors from the preprocessed dataset, and performing cross-modal alignment based on the attention mechanism to generate a joint feature matrix; and

performing dimension normalization on the joint feature matrix based on a learnable weight vector to generate a feature tensor.

Further, generating the weight matrix to be optimized further includes:

inputting the feature tensor into the XGBoost model pre-trained for the patent technical field and performing an initial evaluation calculation to obtain an initial score set;

extracting index weight parameters from the initial score set and performing credibility verification based on Bayesian inference to generate a weight parameter sequence; and

performing outlier filtering on the weight parameter sequence based on a box plot and the local outlier factor algorithm to generate the weight matrix to be optimized.
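As an illustration of the box-plot half of the outlier filtering step above, the following minimal sketch drops weight parameters that fall outside the whiskers. The whisker factor of 1.5, the function name `iqr_filter`, and the sample values are assumptions made for this example; the text does not specify them, and the local outlier factor stage is not shown.

```python
import statistics

def iqr_filter(weights, k=1.5):
    """Keep values inside the box-plot whiskers [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional whisker factor, assumed here.
    """
    q1, _, q3 = statistics.quantiles(weights, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [w for w in weights if low <= w <= high]

# Hypothetical weight parameter sequence with one implausible value.
weights = [0.18, 0.21, 0.20, 0.19, 0.22, 0.95]
filtered = iqr_filter(weights)
```

A density-based method such as the local outlier factor could then be applied to the surviving values to catch outliers that are not extreme in absolute terms.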

Further, generating the optimized evaluation model structure further includes:

acquiring the model parameters of each federated node and performing homomorphic encryption to obtain encrypted gradient data;

extracting valid update parameters from the encrypted gradients and performing aggregation calculation based on secure multi-party computation and differential privacy to generate global parameters; and

performing model validation on the global parameters by cross-validation and the leave-one-out method to generate the optimized evaluation model structure.
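The aggregation step above can be illustrated with pairwise additive masking, a common building block of secure aggregation. This is a swapped-in, simplified technique and not the homomorphic encryption or differential privacy scheme of the text: each pair of nodes shares a random mask that one adds and the other subtracts, so individual updates are hidden while the masks cancel in the sum. The modulus, the fixed-point scale, and all function names are assumptions for this sketch.

```python
import random

MODULUS = 2**61 - 1   # assumed modulus for modular fixed-point arithmetic
SCALE = 10**6         # assumed fixed-point scale (six decimal places)

def encode(x):
    """Map a float update to a non-negative integer modulo MODULUS."""
    return round(x * SCALE) % MODULUS

def decode(v):
    """Map a modular integer back to a signed float."""
    if v > MODULUS // 2:
        v -= MODULUS
    return v / SCALE

def masked_updates(updates, seed=0):
    """Return each node's update hidden behind pairwise masks.

    A single seeded generator stands in for the pairwise shared secrets
    that a real protocol would derive through key agreement.
    """
    rng = random.Random(seed)
    masked = [encode(u) for u in updates]
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            mask = rng.randrange(MODULUS)
            masked[i] = (masked[i] + mask) % MODULUS
            masked[j] = (masked[j] - mask) % MODULUS
    return masked

def aggregate(masked):
    """Average the masked values; the pairwise masks cancel in the sum."""
    return decode(sum(masked) % MODULUS) / len(masked)

updates = [0.10, -0.20, 0.40]          # hypothetical per-node gradient values
global_update = aggregate(masked_updates(updates))
```

The aggregator only ever sees the masked integers, yet recovers the mean of the three updates up to fixed-point rounding.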

Further, generating the final weight distribution matrix further includes:

acquiring the optimized evaluation model structure, initializing a reinforcement learning agent based on state encoding, and generating a policy space;

selecting a weight adjustment policy from the policy space and performing improved Q-learning iterative computation to generate dynamic weights; and

performing stability verification on the dynamic weights based on a weighted moving average to generate the final weight distribution matrix.
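The loop described above can be sketched with tabular Q-learning on a toy problem. Everything concrete here is an assumption chosen for illustration: a single weight discretized into 11 levels, three adjustment actions, a reward that penalizes distance from a hypothetical target level, a learning rate that decays per round, and an exponentially weighted form of the moving average. The count of 20 rounds follows the text.

```python
import random

ACTIONS = (-1, 0, 1)   # decrease / keep / increase the weight level
LEVELS = 11            # weight levels 0.0, 0.1, ..., 1.0
TARGET = 7             # hypothetical level with the highest reward

def step(state, action):
    """Apply an adjustment and return (next_state, reward)."""
    nxt = min(LEVELS - 1, max(0, state + action))
    return nxt, -abs(nxt - TARGET)

def q_learning(rounds=20, steps=50, gamma=0.9, eps=0.2, seed=0):
    """Run epsilon-greedy tabular Q-learning; return the Q table and per-round weights."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(LEVELS)]
    history = []
    for r in range(rounds):
        alpha = 1.0 / (1 + r)              # assumed dynamic learning rate schedule
        state = rng.randrange(LEVELS)
        for _ in range(steps):
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            nxt, reward = step(state, ACTIONS[a])
            td_target = reward + gamma * max(q[nxt])
            q[state][a] += alpha * (td_target - q[state][a])
            state = nxt
        history.append(state / (LEVELS - 1))   # dynamic weight after this round
    return q, history

def smooth(history, beta=0.3):
    """Exponentially weighted moving average over the per-round weights."""
    avg = history[0]
    for w in history[1:]:
        avg = beta * w + (1 - beta) * avg
    return avg

q_table, dynamic_weights = q_learning()
final_weight = smooth(dynamic_weights)
```

The smoothing step damps round-to-round oscillation in the dynamic weights, which is the role the text assigns to the stability verification.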

Further, generating the blockchain notarization record further includes:

acquiring the final weight distribution matrix and performing SHA-256 hash encryption to generate an encrypted data packet;

extracting key parameters from the encrypted data packet and performing smart contract verification based on data validity verification and authority verification to generate block data; and

performing chain-based consensus mechanism verification on the block data to generate the notarization record.
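A minimal sketch of the hashing step follows, using Python's standard `hashlib`. SHA-256 produces a fixed-length digest rather than a reversible ciphertext, so what gets recorded is a fingerprint that lets anyone holding the matrix confirm it is unchanged. The JSON serialization, the function names, and the sample matrix are assumptions for this example; the smart contract and consensus stages are not modeled.

```python
import hashlib
import json

def weight_matrix_digest(matrix):
    """Serialize the matrix deterministically and return its SHA-256 hex digest."""
    payload = json.dumps(matrix, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def verify(matrix, recorded_digest):
    """Check a matrix against a previously recorded digest."""
    return weight_matrix_digest(matrix) == recorded_digest

final_weights = [[0.4, 0.35, 0.25], [0.5, 0.3, 0.2]]   # hypothetical matrix
digest = weight_matrix_digest(final_weights)

tampered = [[0.4, 0.35, 0.25], [0.5, 0.3, 0.21]]
```

Calling `verify(tampered, digest)` returns `False`, since any change to a weight changes the digest.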

Further, generating the final evaluation report structure further includes:

acquiring the weight matrix and the optimized evaluation model structure, and performing fuzzy membership calculation based on triangular or trapezoidal membership functions to generate fuzzy scores;

extracting the positive and negative ideal solutions from the fuzzy scores and performing TOPSIS distance measurement based on Euclidean distance to generate an evaluation result; and

rendering the evaluation result visually as a histogram, line graph, radar chart, or heat map to generate the final evaluation report structure.
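The two numerical steps above, fuzzy membership and TOPSIS ranking, can be sketched as follows. The membership breakpoints, the index weights, the raw scores, and the treatment of every criterion as a benefit criterion are assumptions for illustration; the usual TOPSIS vector normalization is skipped because membership values already lie in [0, 1].

```python
import math

def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def topsis(rows, weights):
    """Relative closeness of each row to the positive ideal solution."""
    weighted = [[w * v for w, v in zip(weights, row)] for row in rows]
    columns = list(zip(*weighted))
    best = [max(col) for col in columns]     # positive ideal solution
    worst = [min(col) for col in columns]    # negative ideal solution
    scores = []
    for row in weighted:
        d_pos = math.dist(row, best)
        d_neg = math.dist(row, worst)
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores

# Hypothetical raw scores (0-100) of three achievements on two indices.
raw = [[80, 60], [55, 90], [30, 40]]
fuzzy = [[triangular(v, 0, 100, 200) for v in row] for row in raw]
closeness = topsis(fuzzy, weights=[0.6, 0.4])
```

Each closeness value lies between 0 and 1, and sorting by it yields the evaluation ranking that the visualization step would then render.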

Further, the expressions used to generate the feature tensor structure include:

An image quality coefficient is defined through a quality assessment in the form of an improved Chebyshev norm (the formula is not reproduced in this text), in terms of the following quantities: the cleaning quality coefficient of the i-th patent image; a summation index traversing the pixel blocks; the total number of pixel blocks into which the i-th class of image is divided; the measured resolution of each pixel block of the i-th class of image; the ideal resolution mean and standard deviation of the i-th class of image; and the improved Chebyshev norm;

when the coefficient satisfies the trigger condition, an image enhancement algorithm is triggered;

A language consistency score is calculated using an improved Shannon entropy (the formula is not reproduced in this text), in terms of the following quantities: the language entropy of the j-th class of test text, where j is the class index of the text; a summation index traversing all character categories; the number of character categories; the frequency of occurrence of each character in the j-th class of text; a smoothing factor; and the total number of characters in the j-th class of text;
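The quantities listed above match a Shannon entropy computed over smoothed character probabilities. One form consistent with them is sketched below; because the formula itself is absent from this text, every symbol name ($H_j$, $K$, $f_{j,k}$, $\epsilon$, $N_j$), the additive placement of the smoothing factor, and the base of the logarithm are assumptions.

```latex
H_j = -\sum_{k=1}^{K} \tilde{p}_{j,k} \, \log_2 \tilde{p}_{j,k},
\qquad
\tilde{p}_{j,k} = \frac{f_{j,k} + \epsilon}{N_j + \epsilon K}
```

Here $f_{j,k}$ is read as the count of the $k$-th character in the $j$-th class of text, so that $\sum_k f_{j,k} = N_j$ and the smoothed probabilities sum to 1 while no term reaches $\log 0$.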

Cross-modal alignment is achieved with an improved attention mechanism (the formula is not reproduced in this text), in terms of the following quantities: the text-image alignment attention matrix; two trainable projection matrices; the projection dimension parameter; the semantic feature vector of the c-th text sample; the visual feature vector of the d-th image sample; the sample indices c and d, where c corresponds to the text modality and d to the image modality; a normalization function; and the matrix transpose operation;

after the attention matrix dynamically adjusts the feature weights, a joint feature matrix is generated;
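The quantities listed above describe scaled dot-product attention between projected text and image features. A form consistent with them is sketched here; the formula itself is absent from this text, so the symbol names ($A$, $W_Q$, $W_K$, $d_k$, $t_c$, $v_d$), the assignment of each projection matrix to a modality, and the $\sqrt{d_k}$ scaling are assumptions borrowed from standard attention.

```latex
A_{c,d} = \operatorname{softmax}_{d}\!\left(
  \frac{(W_Q\, t_c)^{\top} (W_K\, v_d)}{\sqrt{d_k}}
\right)
```

The softmax is taken over the image index $d$, so each row of $A$ is a distribution over images for one text sample.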

Improved dimension normalization is performed on the joint feature matrix to construct the feature tensor (the formula is not reproduced in this text), in terms of the following quantities: the normalized feature tensor of each modality; the part of the joint feature matrix belonging to that modality; the mean and standard deviation of the joint feature matrix along that modality's dimension; a learnable weight vector; the Hadamard product; and a numerical stability constant.
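The quantities listed above fit a per-modality standardization followed by a learnable element-wise rescaling. One consistent form is given below as a sketch; the symbol names ($\hat{X}_m$, $X_m$, $\mu_m$, $\sigma_m$, $\gamma$, $\varepsilon$) and the placement of the stability constant in the denominator are assumptions, since the formula itself is absent from this text.

```latex
\hat{X}_m = \gamma \odot \frac{X_m - \mu_m}{\sigma_m + \varepsilon}
```

With $\gamma$ fixed to a vector of ones this reduces to ordinary Z-score standardization, so the learnable vector can be read as a per-dimension importance weight applied after standardization.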

A state space of the reinforcement learning agent is constructed (the formula is not reproduced in this text), in terms of the following quantities: the state vector at time t; a state encoding function; an environment parameter vector; the weight parameters of the optimized model; and the vector concatenation operation;

the state vector initializes the policy exploration space (the formula is not reproduced in this text), whose quantities are: the number of weight adjustment actions; and the m-th optional action;

Further, the policy iteration is performed with an improved Q-learning algorithm (the formula is not reproduced in this text), in terms of the following quantities: a dynamic learning rate; a discount factor; the immediate reward; the expected value of executing a given action in a given state; the highest Q-value estimate over all possible actions in the next state; any action in the action space; and the old action-value function before the update;

dynamic weight parameters are generated after 20 rounds of iteration;
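The quantities listed above are those of the standard Q-learning temporal-difference update, reproduced below for reference. The symbol names and the time-indexed learning rate $\alpha_t$ are assumptions; which further modifications make the algorithm of the text "improved" cannot be determined from the surviving definitions.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha_t \left[ r_t + \gamma \max_{a' \in \mathcal{A}} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The bracketed term is the temporal-difference error: the gap between the bootstrapped target $r_t + \gamma \max_{a'} Q(s_{t+1}, a')$ and the old estimate $Q(s_t, a_t)$.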

An improved weighted moving average is applied to the dynamic weight parameters (the formula is not reproduced in this text), in terms of the following quantities: the final weight distribution matrix; a smoothing coefficient; and the weight matrix at time t.
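One form consistent with the three quantities above is an exponentially weighted moving average, sketched here under stated assumptions: the symbols $\bar{W}_t$, $W_t$, and $\beta$, the recursive form, and the choice of which term the smoothing coefficient multiplies are not recoverable from this text.

```latex
\bar{W}_t = \beta\, W_t + (1 - \beta)\, \bar{W}_{t-1}, \qquad 0 < \beta \le 1
```

Under this reading, the final weight distribution matrix is $\bar{W}_T$ after the last iteration $T$, and a smaller $\beta$ gives stronger damping of round-to-round fluctuations.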

Further, an intelligent evaluation system based on multimodal learning and reinforcement learning, applied to any one of the methods above, comprises:

an environment calibration module, configured to acquire the evaluation environment parameters and the optimization model and to complete the initial calibration of the system;

a multimodal acquisition module, configured to acquire patent images, test texts, and experimental data and to output a standardized multimodal dataset within a limited frame;

a feature processing module, configured to extract text feature vectors and image feature vectors and to generate a joint feature matrix;

a reinforcement arbitration module, configured to execute iterative computation over the policy exploration space and to output dynamic weight parameters;

a decision execution module, configured to receive the dynamic weight parameters and to generate a multimodal evaluation result; and

a parameter updating module, configured to record weight distribution timestamps and to update the policy exploration space parameters.

The key innovations of the invention include:

(1) Through multimodal feature extraction, the BERT model and CNN technology are fused to jointly process text data and image data and to generate a cross-modally aligned joint feature matrix.

(2) Through a reinforcement-learning-based dynamic weight allocation algorithm, the evaluation weights are dynamically adjusted by Q-learning iterative computation to generate the final weight distribution matrix.

(3) By combining federated learning with blockchain technology, cross-institution data sharing and dispute handling are realized, ensuring data security and the objectivity of the evaluation results.

The main beneficial effects are as follows:

(1) Joint processing of text data and image data improves the accuracy of feature extraction and the alignment of multimodal data, reduces scoring errors caused by feature alignment deviation, and suits multi-source data evaluation scenarios.

(2) Dynamic adjustment of the evaluation weights strengthens the evaluation model's ability to adapt to environmental changes, improves the stability and reliability of the evaluation results, and suits dynamic evaluation environments.

(3) The combination of federated learning and blockchain secures data during cross-institution sharing, reduces the risk of data leakage, and improves the objectivity and credibility of the evaluation results in a decentralized manner.

Drawings

FIG. 1 is a schematic flow chart of a multi-modal and reinforcement learning intelligent evaluation method according to an embodiment of the present application;

FIG. 2 is a block diagram of a multi-modal and reinforcement learning intelligent assessment system according to an embodiment of the present application.

Detailed Description

Referring to FIG. 1, a flow chart of an intelligent evaluation method based on multimodal learning and reinforcement learning according to an embodiment of the present invention, the method may include at least steps S100-S600:

S100, acquiring raw data from patent images, test texts, and experimental data, and performing data cleaning based on an improved Chebyshev norm and Shannon entropy together with cross-modal alignment based on an attention mechanism to generate a feature tensor structure;

S200, based on the feature tensor structure, performing evaluation with an XGBoost model pre-trained for the patent technical field and weight parameter optimization based on Bayesian inference to generate a weight matrix to be optimized;

S300, performing homomorphic gradient encryption and aggregation calculation based on secure multi-party computation and differential privacy to generate an optimized evaluation model structure;

S400, based on the optimized evaluation model structure, performing reinforcement learning policy iteration based on state encoding and stability verification based on a weighted moving average to generate a final weight distribution matrix;

S500, acquiring data from the final weight distribution matrix, and performing SHA-256 hash encryption and smart contract verification based on data validity verification and authority verification to generate a blockchain notarization record;

S600, based on the weight matrix and the optimized evaluation model structure, performing fuzzy comprehensive evaluation based on triangular or trapezoidal membership functions and visualization based on a histogram, line graph, radar chart, or heat map to generate a final evaluation report structure.

Step S100 comprises at least steps S110-S130:

S110, acquiring patent images, test texts, and experimental data, and performing data cleaning to obtain a standardized multimodal dataset;

Specifically, the multimodal feature extraction module S100 first receives multi-source patent image data from a patent image acquisition interface. The patent images include technical scheme diagrams, structural diagrams, flow charts, and similar types, in formats covering bitmaps, vector graphics, and scanned images. The module also obtains patent-related test text from a test text receiving end; the test text covers patent specifications, claims, abstracts, and related technical documents, in plain text, rich text, and structured text formats. In addition, experimental data related to the patented technology is obtained from an experimental data acquisition system, including numerical experimental results, time-series data, and multidimensional signals collected by sensors. The module then applies a data cleaning flow to the acquired multimodal raw data. For patent images, it performs image quality detection, including resolution detection, noise identification, and blur assessment; abnormal images are preprocessed by an image enhancement algorithm, and severely blurred or incomplete images are marked as abnormal and written to an exception log. For test text, it unifies the text format, removes redundant symbols and invalid characters, and uses a language detection module from natural language processing (NLP) to confirm language consistency; when the check fails, an exception handling mechanism is triggered and the abnormal text is recorded. For experimental data, it performs missing value detection and outlier removal: outliers are identified statistically, abnormal data are filled in by interpolation or a regression model, and unrepairable data are marked and isolated.
The data cleaning process combines a rule engine with a machine learning model and dynamically adjusts the cleaning strategy to suit the diversity of patent fields. After cleaning, the module standardizes the patent images, test texts, and experimental data, specifically through uniform image resizing, color space conversion, text encoding standardization, and experimental data normalization, ensuring that the multimodal data remain compatible and consistent in subsequent processing. Standardization is controlled by a configuration parameter manager that supports several preset standards as well as custom standards, and every processing step records operation logs and exception information for tracing and debugging. Finally, step S110 outputs a standardized multimodal dataset under the output field name "preprocessed dataset". This dataset serves as the "preprocessed dataset" input of the subsequent step S120 for joint cross-modal feature extraction, and can also be called by subsequent modules such as initial score generation in S200, forming a complete data flow link.
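For the experimental data branch described above, filling missing values by interpolation can be sketched as follows. Representing a missing reading as `None`, using linear interpolation between the nearest known neighbors, and copying the nearest known value at the ends of the series are assumptions of this example; the text names interpolation without fixing a method.

```python
def fill_missing(series):
    """Fill None entries by linear interpolation between known neighbors.

    Gaps at either end are filled with the nearest known value.
    A series with no known values is returned unchanged.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return filled
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled

# Hypothetical sensor readings with two dropped samples.
readings = [1.0, None, 3.0, None, None, 6.0]
cleaned = fill_missing(readings)
```

A series that stays entirely `None` after this step corresponds to the unrepairable data that the text marks and isolates.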

S120, extracting text feature vectors and image feature vectors from the preprocessed dataset and performing cross-modal alignment to generate a joint feature matrix;

Specifically, this step takes the "preprocessed dataset" output by S110 as input and first extracts text features from the test text using a BERT (Bidirectional Encoder Representations from Transformers) model built on the Transformer architecture. The BERT model, a pre-trained language model loaded with a patent-field corpus, combines word embedding with contextual semantic encoding to generate high-dimensional semantic vectors; the resulting text feature vectors contain multi-layer semantic representations at the word level and the sentence level. For patent images, the step uses a convolutional neural network (CNN) comprising multiple convolutional layers, pooling layers, and fully connected layers to extract texture features, edge information, and structural layout features; the image feature vectors can describe authenticity-judgment information and key visual elements of the image. The text and image feature vectors are each normalized so that the feature dimensions and value ranges of the different modalities are comparable. The step then applies a cross-modal alignment algorithm that maps and fuses the text and image features in a semantic space; specifically, an attention-based multimodal alignment module adjusts the feature weights according to the mutual information maximization principle to match semantic correspondences. By constructing a multimodal attention matrix that dynamically captures the correlation between text and image, the alignment process eliminates the effects of heterogeneity between modalities.
After alignment, the aligned text and image feature vectors are concatenated in a predefined feature dimension order to form a unified joint feature matrix, whose structure supports multidimensional tensor representation and reflects the interaction information of the multimodal data. Throughout the flow, the step includes anomaly detection and fault tolerance for input data: when feature extraction fails or alignment is abnormal, a compensation strategy or rollback mechanism is triggered to ensure the integrity and accuracy of the joint feature matrix. Step S120 outputs the joint feature matrix under the output field name "joint feature matrix"; it serves as the "joint feature matrix" input of step S130 and provides unified feature input for the initial score generation module in step S200, securing the multimodal data basis of the evaluation model.

S130, performing dimension normalization on the joint feature matrix to generate a feature tensor;

Specifically, this step takes the "joint feature matrix" output by S120 as input and first runs a multidimensional normalization algorithm, applying Min-Max normalization, Z-score standardization, or an adaptive normalization method to each feature dimension of the joint feature matrix to remove differences in feature scale and improve the balance of the feature distribution. The normalization algorithm is selected dynamically by feature type: linear normalization for numerical features and one-hot encoding for categorical features. The step then converts the normalized joint feature matrix into a higher-order tensor structure; the feature tensor is defined as a multidimensional array with uniform dimension identifiers and indexing rules, and supports tensor operations and parallel processing. The feature tensor structure supports batched input, which eases processing by downstream machine learning models, and is compatible with the data format requirements of common deep learning frameworks. While generating the feature tensor, the step applies an outlier detection mechanism that identifies extreme values and truncates or smooths them to keep the tensor data stable. The step also annotates the feature tensor with metadata, including feature sources, dimension information, and normalization parameters, to aid model interpretation and debugging.
Step S130 outputs a standardized feature tensor under the output field name "feature tensor"; it serves as the "feature tensor" input for initial score generation in step S200 and provides a unified, high-quality multimodal feature basis for the federated learning optimization module S300 and the dynamic weight adjustment module S400, supporting cross-module data flow and collaborative processing.

In another embodiment, step S110 first receives bitmap and vector graphics data from the patent image acquisition interface and, combining them with the structured claim text from the test text receiving end and the multidimensional signals from the experimental data acquisition system, constructs a standardized multimodal dataset through a composite data cleaning process. For patent images, an image quality coefficient is defined through a quality assessment in the form of an improved Chebyshev norm, formula ① (the formula is not reproduced in this text), in which:

the cleaning quality coefficient of the i-th patent image is derived from the resolution detection data of the patent image acquisition interface;

the summation index traverses the pixel blocks;

the total number of pixel blocks is the number obtained after the i-th class of image is divided;

the measured value is the resolution of each pixel block of the i-th class of image;

the ideal resolution mean and standard deviation of the i-th class of image are derived from a preset patent image standard database; and

the Chebyshev norm is used to enhance detection sensitivity to abnormal pixel blocks.

Formula ① converts the raw image data into an intermediate index for judging cleaning quality; when this index satisfies the trigger condition, the image enhancement algorithm is triggered. For test text data, a language consistency score is calculated using an improved Shannon entropy, formula ② (the formula is not reproduced in this text), in which:

the language entropy of the j-th class of test text is obtained from the character frequency statistics of the NLP language detection module;

the summation index traverses all character categories, ranging from 1 to the number of character categories;

the number of character categories is set to 5000, representing the size of the patent corpus;

the frequency of occurrence of each character in the j-th class of text is derived from word frequency analysis after redundant symbols are removed from the text;

the smoothing factor handles the zero-probability problem of low-frequency characters; and

the total number of characters is that of the j-th class of text.
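The language consistency score above can be sketched as a smoothed Shannon entropy over character counts. The additive form of the smoothing, the base-2 logarithm, the value of the smoothing factor, and the function name are assumptions; only the vocabulary size of 5000 comes from the text. A text in one consistent language concentrates its characters and scores lower than a text mixing scripts.

```python
import math
from collections import Counter

def smoothed_entropy(text, vocab_size=5000, eps=1e-6):
    """Shannon entropy (bits) of the character distribution with additive smoothing.

    Assumes the text contains at most vocab_size distinct characters.
    """
    counts = Counter(text)
    denom = len(text) + eps * vocab_size
    entropy = 0.0
    for count in counts.values():
        p = (count + eps) / denom
        entropy -= p * math.log2(p)
    # Characters of the vocabulary that never occur share the smoothed mass.
    unseen = vocab_size - len(counts)
    if unseen > 0:
        p0 = eps / denom
        entropy -= unseen * p0 * math.log2(p0)
    return entropy

uniform_text = "abcd" * 10       # four equally frequent characters
repetitive_text = "aaaa"         # a single repeated character
h_uniform = smoothed_entropy(uniform_text)
h_repetitive = smoothed_entropy(repetitive_text)
```

Without the smoothing term, unseen characters would contribute a `log(0)`; with it, every category has a small positive probability and the total still sums to 1.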

The output field of this step is named "preprocessed dataset" and is consumed by S120 as its "preprocessed dataset" input for cross-modal feature extraction.

In step S120, text semantic vectors are extracted from the preprocessed dataset by the BERT model and image feature vectors by the CNN model, and cross-modal alignment is performed with an improved attention mechanism, formula ③ (the formula is not reproduced in this text), in which:

the text-image alignment attention matrix is derived from the image data processed by formula ① and the text data processed by formula ②;

the two trainable projection matrices take their parameters from the initialization configuration of the cross-modal alignment module;

the projection dimension is a parameter of the mechanism;

the semantic feature vector of the c-th text sample is derived from the BERT model;

the visual feature vector of the d-th image sample is derived from the CNN model;

c and d are sample indices, c for the text modality and d for the image modality;

the normalization function normalizes each row of the matrix so that the row sums to 1, representing an attention weight distribution; and

the transpose operation is the matrix transpose.

After the attention matrix dynamically adjusts the feature weights, the joint feature matrix is generated. The output field of this step is named "joint feature matrix" and is consumed by S130 as its "joint feature matrix" input.
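A minimal sketch of the alignment just described, written with plain lists so that it runs without third-party packages. The scaled dot-product form, the fusion rule (each text vector concatenated with its attention-weighted image summary), and all names are assumptions for illustration; the toy vectors stand in for BERT and CNN outputs.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def softmax(values):
    """Numerically stable softmax; the result sums to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def alignment_matrix(texts, images, w_q, w_k):
    """Row c holds the attention of text sample c over all image samples."""
    d_k = len(w_q)                                   # projection dimension
    queries = [matvec(w_q, t) for t in texts]
    keys = [matvec(w_k, v) for v in images]
    return [
        softmax([sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d_k) for kv in keys])
        for qv in queries
    ]

def joint_features(texts, images, attn):
    """Concatenate each text vector with its attention-weighted image summary."""
    dims = len(images[0])
    joint = []
    for t, row in zip(texts, attn):
        attended = [sum(a * v[i] for a, v in zip(row, images)) for i in range(dims)]
        joint.append(list(t) + attended)
    return joint

identity = [[1.0, 0.0], [0.0, 1.0]]                  # stand-in projection matrices
texts = [[1.0, 0.0], [0.0, 1.0]]                     # stand-in text vectors
images = [[1.0, 0.0], [0.0, 1.0]]                    # stand-in image vectors
attn = alignment_matrix(texts, images, identity, identity)
joint = joint_features(texts, images, attn)
```

Each text sample attends most strongly to the image whose projected features point in the same direction, which is the semantic matching behavior the step relies on.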

In step S130, improved dimension normalization is performed on the joint feature matrix to construct the feature tensor, formula ④ (the formula is not reproduced in this text), in which:

the normalized feature tensor of each modality is derived from the joint feature matrix generated by formula ③;

the input features are the part of the joint feature matrix belonging to that modality;

the mean and standard deviation are those of the joint feature matrix along that modality's dimension;

the learnable weight vector takes its parameters from the preset standards of the configuration parameter manager;

the Hadamard product denotes element-wise multiplication; and

a numerical stability constant is included.
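The normalization described by these quantities can be sketched for a single modality block as below. Computing the mean and standard deviation per feature column, adding the stability constant to the standard deviation, and the function name are assumptions of this example; the learnable weight vector is passed in as fixed numbers because no training is shown.

```python
import math

def normalize_block(rows, gamma, eps=1e-8):
    """Standardize each feature column, then rescale element-wise by gamma.

    rows:  samples x features for one modality
    gamma: one weight per feature (learnable in the text, fixed here)
    eps:   numerical stability constant guarding against zero variance
    """
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n)
        for d in range(dims)
    ]
    return [
        [gamma[d] * (r[d] - means[d]) / (stds[d] + eps) for d in range(dims)]
        for r in rows
    ]

# Hypothetical text-modality block: two samples, two features on different scales.
block = [[1.0, 10.0], [3.0, 30.0]]
normalized = normalize_block(block, gamma=[1.0, 2.0])
```

After this step both columns have comparable scale, and the second is emphasized by its larger weight; a constant column maps to zeros rather than dividing by zero.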

Multimodal data cleaning is achieved through the improved Chebyshev quality assessment and the Shannon entropy language analysis; a joint feature matrix is generated through attention-based alignment; and finally a dimension-normalized feature tensor structure is constructed, providing high-dimensional fused feature input for the subsequent evaluation model.

Step S200 includes at least steps S210-S230:

S210, inputting the feature tensor into the XGBoost model and performing an initial evaluation calculation to obtain an initial score set;

Specifically, this step takes the "feature tensor" output by S130 as input and first adapts its format, converting it into the feature matrix format required by the XGBoost (eXtreme Gradient Boosting) model so that the data structure is compatible with the model input interface. XGBoost is an ensemble learning algorithm based on gradient boosted trees and can handle high-dimensional sparse data and multimodal features. The step then loads a pre-trained XGBoost evaluation model, trained on large-scale samples from the patent technical field and comprising multiple decision trees; each tree makes split decisions on different dimensions of the feature tensor and computes the leaf-node output for the sample. The initial evaluation traverses the feature tensor sample by sample and, combining the decision tree model, performs feature weighting and gradient updating to generate the corresponding prediction scores. The scores cover multiple evaluation dimensions, such as innovativeness, practicality, and technical maturity, each with an independent score output. At run time, the step dynamically adjusts the weight distribution of the input features using the model's feature importance indicators to improve evaluation accuracy. An anomaly detection module runs in parallel, interpolating or removing outliers and missing values in the input feature tensor to keep model computation stable and robust.
The step records model inference logs, including the input feature distribution, scoring results, and computation time, for later analysis and debugging. Finally, step S210 outputs the initial score set for each dimension under the output field name "initial score set"; it serves as the "initial score set" input of the subsequent step S220 and also provides base scoring data for the dynamic weight adjustment module of step S400, supporting weight optimization and policy iteration.

S220, extracting index weight parameters from the initial score set, and performing reliability verification processing to generate a weight parameter sequence;

Specifically, this step takes the "initial score set" output in S210 as input. It first performs statistical analysis on the scores of each dimension, calculating statistics such as the mean, variance and skewness of the score distribution in order to assess the central tendency and dispersion of the scores. The step then verifies the credibility of each index weight parameter in the initial score set against a preset credibility threshold, combining historical scoring data with an expert rule base. Credibility verification comprises confidence interval estimation and confidence score calculation; a Bayesian inference method quantifies the credibility of the weight parameters and identifies abnormal fluctuations or potential bias in the scores. Using multi-source data fusion, the step compares scoring results from different evaluation nodes and applies weighted averaging and confidence weighting to improve the stability and consistency of the weight parameters. For weight parameters with low credibility, the step executes a weight correction strategy comprising weight smoothing, interval adjustment and outlier elimination, so that the weight sequence remains reasonable and leaves room for optimization. A rule-based anomaly detection module is also introduced; for abrupt change points and extreme values in the weight parameters, it automatically triggers an alarm and records an anomaly log to facilitate subsequent audit. After credibility verification is completed, a weight parameter sequence meeting the optimization requirement is generated; the sequence is a one-dimensional array containing the weight value of each evaluation index and its confidence flag.
The structure of the weight sequence supports dynamic adjustment and online updating, which facilitates strategy iteration by the subsequent reinforcement learning module. Step S220 outputs the weight parameter sequence under the output field name "weight parameter sequence". The sequence serves as the "weight parameter sequence" input of step S230 and also provides basic weight-optimization data for the dynamic weight adjustment module of step S400, supporting further filtering of the weight parameters and matrix construction.
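The credibility verification described for S220 can be illustrated with a minimal sketch. It assumes a normal prior per index weight (taken from historical scoring data) and normally distributed node scores, neither of which is specified in the text; the function names, the observation variance and the interval-width threshold are all hypothetical.

```python
import math

def posterior_weight(prior_mean, prior_var, observations, obs_var):
    """Normal-normal conjugate update for one index weight."""
    n = len(observations)
    sample_mean = sum(observations) / n
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / obs_var)
    return post_mean, post_var

def credible_interval(mean, var, z=1.96):
    """Approximate 95% credible interval of a normal posterior."""
    half_width = z * math.sqrt(var)
    return mean - half_width, mean + half_width

def verify_weights(priors, node_scores, obs_var=0.01, max_width=0.2):
    """Build a weight parameter sequence with a confidence flag per index.

    priors: {index: (prior_mean, prior_var)} from historical data (assumed).
    node_scores: {index: [weights reported by evaluation nodes]}.
    A weight is flagged confident when its interval is narrow enough.
    """
    sequence = []
    for name, (prior_mean, prior_var) in priors.items():
        mean, var = posterior_weight(prior_mean, prior_var,
                                     node_scores[name], obs_var)
        low, high = credible_interval(mean, var)
        sequence.append({"index": name, "weight": mean,
                         "confident": (high - low) <= max_width})
    return sequence
```

With more node observations the posterior variance shrinks, so the same weight can move from low credibility to confident without any change in the threshold.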

S230, performing outlier filtering processing on the weight parameter sequence to generate a weight matrix to be optimized;

Specifically, this step takes the "weight parameter sequence" output in S220 as input. It first performs outlier detection on the weight sequence, identifying outliers in the weight data with algorithms based on the box plot and the local outlier factor (LOF). According to the detection result, the outlier filtering module removes or replaces the weight parameters marked as outliers; replacement includes filling with the mean of adjacent weights and interpolation based on the historical trend. The step then converts the filtered weight parameter sequence into a weight matrix structure. The weight matrix is a two-dimensional array in which rows represent evaluation index categories and columns represent weight assignment time points or sample batches, supporting dynamic adjustment and historical tracing of multi-dimensional weights. The weight matrix structure includes metadata fields that record the weight source, filtering state and timestamp information, facilitating subsequent model training and audit trails. Exploiting the sparsity of the weight matrix, the step applies compressed storage and indexing techniques to improve data access efficiency. The step further normalizes the weight matrix using L1-norm normalization, so that the weights in the matrix sum to one and meet the input requirement of the reinforcement learning optimization algorithm. An exception handling module runs throughout the flow and applies fault-tolerance mechanisms, including automatic resampling and weight redistribution, to missing or abnormal data during weight matrix construction, ensuring the integrity of the output.
After this step is completed, the weight matrix to be optimized is output under the output field name "weight matrix to be optimized". The weight matrix serves as an input of step S320 and provides a structured weight data basis for the dynamic weight adjustment module of S400, supporting iterative optimization of the reinforcement learning strategy.
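As an illustration of the outlier filtering and normalization in S230, the following sketch applies the box-plot (interquartile range) rule to one column of weights, replaces flagged values with the mean of their valid neighbours, and then L1-normalizes the result. The LOF check is omitted, and the function names and the 1.5 x IQR fence are assumptions rather than values given in the text.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Box-plot fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def filter_outliers(weights):
    """Replace outliers with the mean of adjacent non-outlier weights."""
    low, high = iqr_bounds(weights)
    cleaned = list(weights)
    for i, w in enumerate(weights):
        if low <= w <= high:
            continue
        neighbours = [weights[j] for j in (i - 1, i + 1)
                      if 0 <= j < len(weights) and low <= weights[j] <= high]
        # Fall back to the median when no valid neighbour exists.
        cleaned[i] = (sum(neighbours) / len(neighbours)
                      if neighbours else statistics.median(weights))
    return cleaned

def l1_normalize(weights):
    """Scale weights so that their absolute values sum to one."""
    total = sum(abs(w) for w in weights)
    if total == 0:
        raise ValueError("cannot normalize an all-zero weight vector")
    return [w / total for w in weights]
```

For example, `l1_normalize(filter_outliers([0.2, 0.22, 0.18, 0.9, 0.21, 0.19]))` replaces the value 0.9 with the average of its two neighbours before rescaling.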

Step S300 includes at least steps S310-S330:

S310, acquiring the model parameters of each federated node, and performing gradient encryption processing to obtain encrypted gradient data;

This step takes the locally trained model parameters from each federated node as input. The model parameters of each federated node are intermediate products obtained after each distributed computing unit performs local training, and specifically include:

Core gradient data: the model gradient vectors computed by each node's local evaluation model from the locally stored multi-modal feature tensor and the weight matrix to be optimized, expressed as update amounts of the weight and bias parameters;

Security auxiliary data: the parameters required for homomorphic encryption of the gradient data, the hash digests used for integrity verification, and the node digital signatures, which together ensure transmission security and privacy;

Training context data: metadata describing the local training attributes, mainly the number of local samples, the computation timestamp and the training iteration sequence number, which are used to assign appropriate weights to different nodes during aggregation.

Each federated node comprises a plurality of distributed computing units; each unit performs local model training on the multi-modal feature tensor and weight matrix it holds and generates the corresponding model gradient information. Gradient encryption is first applied to the model gradients. The encryption uses a homomorphic encryption algorithm that supports addition and multiplication of gradients in the encrypted domain, ensuring the privacy and security of the gradient data during transmission and aggregation. The encryption process comprises key generation, encryption algorithm initialization and the encryption operation; a key management module distributes and updates keys and ensures their security and availability. The step further includes a gradient integrity verification mechanism: a hash digest is computed over the encrypted gradient data to generate a data fingerprint for subsequent integrity verification and anomaly detection. The step also implements an abnormal-gradient identification mechanism that triggers retransmission or local retraining for abnormal or abnormally fluctuating gradient data, ensuring the accuracy and usability of the transmitted data. During data transmission, the encrypted gradients are sent over a secure communication protocol (such as TLS/SSL) to prevent man-in-the-middle attacks and data leakage. Throughout the processing flow, the step dynamically regulates the model parameter update frequency and data size of each federated node, balancing communication overhead against model update efficiency.
Step S310 outputs the encrypted model gradient data under the output field name "encryption gradient". The data serves as the "encryption gradient" input of the subsequent step S320 and also provides an encrypted data basis for the blockchain storage module of S500, enabling secure cross-institution data sharing.
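The text does not name a specific homomorphic scheme. As a rough illustration of adding gradients in the encrypted domain, the sketch below uses a toy Paillier cryptosystem, which is additively homomorphic only (it does not cover the multiplication mentioned above). The fixed primes, the fixed-point scale and all function names are assumptions, and the parameters are far too small for real use.

```python
import math
import secrets

SCALE = 10**6  # fixed-point scale for encoding float gradients (assumed)

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two fixed Mersenne primes (NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, value):
    """Encrypt one float as c = (1 + m*n) * r^n mod n^2."""
    m = round(value * SCALE) % n  # negative values wrap around modulo n
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, private_key, cipher):
    """Recover the float: m = L(c^lam mod n^2) * mu mod n, L(x) = (x-1)/n."""
    lam, mu = private_key
    m = (pow(cipher, lam, n * n) - 1) // n * mu % n
    if m > n // 2:  # map the upper half of the range back to negatives
        m -= n
    return m / SCALE

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)
```

A usage example: two nodes encrypt their gradient components with the public modulus `n`, the aggregator multiplies the ciphertexts component by component with `add_encrypted`, and only the key holder can call `decrypt` on the summed result.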

S320, extracting effective update parameters from the encryption gradient, and performing aggregation calculation processing to generate global parameters;

Specifically, this step takes the "encryption gradient" output in S310 as input. It first parses and preprocesses the received encrypted gradient data and discards invalid or duplicate data packets, ensuring the data quality of the subsequent aggregation calculation. The step then applies secure multi-party computation (SMPC) supported by homomorphic encryption to perform a weighted aggregation of the encrypted gradients, in which the weights are assigned dynamically according to the sample size and model contribution of each federated node. Specifically, the aggregation module adds the encrypted gradients of the nodes according to the preset weights to obtain an overall encrypted gradient vector, which remains privacy-protected in the encrypted domain. The step incorporates a differential privacy mechanism that introduces noise perturbation during aggregation, balancing model performance against privacy protection requirements. The aggregated result is then partially decrypted by the key management module, and the valid update parameters are extracted, including the weight adjustment coefficients and bias update vectors that reflect the latest training state of the global model. The step integrates a model version control function that records the parameter version number and timestamp of each aggregation, facilitating model iteration management and backtracking. An anomaly detection module runs in parallel and filters and corrects outliers or inconsistencies in the aggregated parameters, ensuring the stability of the global model parameters.
After this step is completed, global model parameters conforming to the format specification are generated under the output field name "global parameters". The parameters serve as the "global parameters" input of the subsequent step S330 and also provide the latest model data for the dynamic weight adjustment module of S400, supporting intelligent iteration of the weight strategy.
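The weighting logic of S320 can be sketched in plaintext as a sample-size weighted average, with duplicate and invalid packets discarded first and optional Gaussian noise added to the result. This sketch leaves out the encrypted-domain arithmetic and the model-contribution term; the packet field names (`node_id`, `round`, `num_samples`, `gradient`) and the noise parameter are hypothetical.

```python
import random

def aggregate(packets, noise_std=0.0, seed=None):
    """Sample-size weighted average of node gradients.

    Duplicate (node_id, round) packets and packets without samples are
    dropped. Gaussian noise with standard deviation noise_std is added to
    each aggregated component when noise_std > 0.
    """
    rng = random.Random(seed)
    seen = set()
    total_samples = 0
    accumulated = None
    for packet in packets:
        key = (packet["node_id"], packet["round"])
        if key in seen or packet["num_samples"] <= 0:
            continue  # duplicate or invalid packet
        seen.add(key)
        count = packet["num_samples"]
        gradient = packet["gradient"]
        if accumulated is None:
            accumulated = [0.0] * len(gradient)
        accumulated = [a + count * g for a, g in zip(accumulated, gradient)]
        total_samples += count
    if accumulated is None:
        raise ValueError("no valid packets to aggregate")
    return [a / total_samples
            + (rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0)
            for a in accumulated]
```

Weighting by sample count means a node holding three times as much data pulls the global update three times as strongly, which is one common reading of "weights based on the sample size".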

S330, performing model verification processing on the global parameters to generate an optimized evaluation model structure;

Specifically, this step takes the "global parameters" output in S320 as input. It first loads the global parameters into an evaluation model framework, which is based on an ensemble learning structure and supports the fusion of multi-modal features and scoring computation. The step then performs model consistency verification, using cross-validation and the leave-one-out method to evaluate the model's predictive performance; the verification indices comprise accuracy, recall, F1 score and mean squared error, ensuring that the global parameters generalize across different data subsets. The step also checks model stability, evaluating convergence and robustness during training by monitoring the fluctuation range of the model output and the trend of the gradient changes. When anomalies or performance degradation are found during verification, the step triggers a model fine-tuning process that automatically adjusts some parameters or retrains some sub-models to improve overall model performance. Model verification is combined with automatic logging, which records in detail the input data, parameter changes, performance indices and abnormal events during verification, supporting subsequent audit and optimization. After verification is completed, the step generates an optimized model structure that conforms to the system interface specification; the structure comprises the model weight parameters, the feature mapping relation and the scoring strategy configuration, and supports dynamic loading and online updating. The optimized model structure is packaged as a modular component to facilitate cross-platform deployment and invocation.
Step S330 outputs the optimized evaluation model structure under the output field name "optimized evaluation model structure". The structure serves as the input of the subsequent step S410 and also provides a high-quality model basis for the anti-interference score aggregation module of S600, supporting generation of the final evaluation result.
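The leave-one-out verification mentioned for S330 follows a simple pattern, sketched below with mean squared error as the index. The `fit` and `predict` callables are placeholders for the real evaluation model, which the text does not specify at code level; the mean predictor is only a stand-in to keep the sketch self-contained.

```python
def leave_one_out_mse(samples, fit, predict):
    """Hold out each (x, y) sample once, train on the rest, average the
    squared prediction error over all held-out samples."""
    errors = []
    for i, (x, y) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        model = fit(training)
        errors.append((predict(model, x) - y) ** 2)
    return sum(errors) / len(errors)

def fit_mean(training):
    """Stand-in model: remember the mean target of the training samples."""
    return sum(y for _, y in training) / len(training)

def predict_mean(model, x):
    """Stand-in prediction: always return the stored mean."""
    return model
```

Calling `leave_one_out_mse(samples, fit_mean, predict_mean)` gives a baseline error against which a real model's generalization could be compared.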

Step S400 includes at least steps S410-S430:

S410, acquiring the optimized evaluation model structure, initializing a reinforcement learning agent, and generating a strategy space;

Specifically, this step takes the "optimized evaluation model structure" output in S330 as input. It first receives the current evaluation environment parameters from the system environment monitoring module. These parameters include, but are not limited to, multi-dimensional information such as the evaluation task type, data distribution characteristics, historical scoring trends and external interference indices; they are continuously updated through a real-time data acquisition interface and stored in an environment parameter buffer. The step then loads the corresponding reinforcement learning framework according to the structure and weight configuration of the optimized model. The reinforcement learning agent is constructed on a policy gradient method and comprises a state representation layer, a policy network and a value network, supporting multi-dimensional state input and continuous action output. Specifically, the state representation layer fuses and encodes the current environment parameters and the model weight parameters into a high-dimensional state vector. The policy network uses a deep neural network structure that combines convolutional layers and fully connected layers to map the state vector to an action probability distribution; the action space is defined as a set of weight adjustment strategies covering operations such as weight increase and decrease, weight redistribution and weight freezing. The value network estimates the expected return of the current strategy and provides a value reference for strategy optimization.
The step generates a strategy exploration space according to preset strategy initialization rules. The strategy exploration space comprises an initial strategy set and the corresponding probability distribution and supports an epsilon-greedy strategy selection mechanism, ensuring the diversity and exploratory nature of the strategies. Agent initialization includes a state normalization module that normalizes the input parameters so that the numerical ranges of the different parameter dimensions are consistent, avoiding vanishing or exploding gradients during training. An anomaly detection mechanism runs throughout initialization; for abnormal fluctuations of the environment parameters or abnormal model weights, it triggers an early warning and records an anomaly log, supporting subsequent audit and debugging. After agent initialization is completed, the strategy exploration space is output under the output field name "policy space". The policy space serves as the "policy space" input of the subsequent step S420 and also provides basic strategy-iteration data for the dynamic weight adjustment module, supporting continuous learning for weight optimization.

S420, selecting a weight adjustment strategy from a strategy space, and performing Q-learning iterative computation to generate a dynamic weight;

Specifically, this step takes the "policy space" output in S410 as input. Based on the current state vector and the policy probability distribution, it first uses an epsilon-greedy selection method to choose a weight adjustment strategy from the policy space. A strategy is defined as an adjustment operation on the evaluation index weights, including weight increment adjustment, weight proportion redistribution and weight threshold constraint. The step then constructs an action-value function Q(s, a) based on the Q-learning algorithm, where the state s consists of the current environment parameters and the optimized model weights, and the action a corresponds to the selected weight adjustment strategy. The Q-learning iteration comprises state evaluation, action execution, return calculation and Q-value update. Specifically, after the selected weight adjustment strategy is executed, the system calculates the new scoring error and model performance indices through the evaluation module and feeds them back to the agent as the immediate reward signal r. The Q value is updated by temporal difference (TD) learning, and the update formula uses the learning rate and the discount factor to combine historical experience with future expectations. To improve training efficiency, the step introduces an experience replay mechanism: historical (state, action, reward, new state) tuples are stored in an experience pool and sampled at random for Q-value updates, reducing sample correlation. The step uses a dual-network structure consisting of a target network and a main network, periodically synchronizing the target network parameters to stabilize learning.
The step further comprises an action constraint module that applies upper and lower bounds to the weight adjustment strategy, preventing model instability caused by excessive weight deviation. An anomaly monitoring module monitors Q-value fluctuations and reward anomalies in real time and performs strategy rollback or resampling in abnormal cases, ensuring the robustness of the iteration. After multiple Q-learning iterations, dynamic weight parameters are generated; they form a one-dimensional array containing the real-time weight value of each evaluation index and its confidence flag. The dynamic weight parameters are normalized so that the weights sum to one. The step outputs the field "dynamic weight", which serves as the "dynamic weight" input of the subsequent step S430 and also provides real-time adjusted weight data for the anti-interference score aggregation module of S600, supporting adaptive optimization of the scores.
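A minimal tabular sketch of the S420 loop is given below: epsilon-greedy selection, the temporal-difference Q update, and a bounded weight adjustment followed by renormalization. It deliberately omits the experience replay pool and the target network, and the action names, bounds and hyperparameter values are assumptions for illustration only.

```python
import random

ACTIONS = ["increase", "decrease", "redistribute", "freeze"]  # assumed set

def select_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q_table.get(state, {})
    return max(ACTIONS, key=lambda a: values.get(a, 0.0))

def td_update(q_table, state, action, reward, next_state,
              alpha=0.1, gamma=0.9):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    row = q_table.setdefault(state, {a: 0.0 for a in ACTIONS})
    next_row = q_table.get(next_state, {})
    best_next = max(next_row.get(a, 0.0) for a in ACTIONS)
    row[action] += alpha * (reward + gamma * best_next - row[action])
    return row[action]

def apply_adjustment(weights, index, delta, lower=0.05, upper=0.6):
    """Clip the adjusted weight to [lower, upper], then renormalize so
    the weights sum to one (renormalization may shift the clipped value)."""
    adjusted = list(weights)
    adjusted[index] = min(upper, max(lower, adjusted[index] + delta))
    total = sum(adjusted)
    return [w / total for w in adjusted]
```

In a full loop, the reward passed to `td_update` would come from the scoring error measured after `apply_adjustment` is applied, as the paragraph above describes.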

S430, performing stability verification processing on the dynamic weights to generate a final weight distribution matrix;

Specifically, this step takes the "dynamic weight" output in S420 as input. It first performs time-series stability analysis on the dynamic weight parameters, using a sliding window to calculate the mean, variance and autocorrelation coefficient of the weight parameters and to determine the fluctuation range and trend of the weight sequence. The step then applies statistical process control (SPC), monitoring a control chart of the weight fluctuations to identify abnormal fluctuation points and potential weight drift and to trigger weight smoothing. Weight smoothing uses a weighted moving average (WMA) algorithm that combines historical weight data with the current dynamic weights to generate smoothed weight values, reducing the influence of short-term anomalies. At the same time, the step performs consistency verification on the weight parameters: it calculates the correlation between weight dimensions using a correlation coefficient matrix, identifies highly correlated or conflicting weight combinations, and adjusts the weight distribution according to preset weight constraint rules to prevent excessive concentration or mutual exclusion of the weights. The step further converts the smoothed and verified dynamic weight parameters into a weight distribution matrix structure. The weight matrix is a two-dimensional array in which rows represent evaluation index categories and columns represent time steps or iteration batches, supporting weight history tracking and version management. The weight matrix includes metadata fields that record the weight source, stability indices and timestamp information, facilitating subsequent model audit and dynamic adjustment.
The step also integrates an anomaly logging module that records in detail the type, time and handling of abnormal events found during weight verification, supporting subsequent problem tracing. After weight stability verification and matrix generation are completed, the final weight distribution matrix is output under the output field name "final weight distribution matrix". The weight matrix serves as the "weight matrix" input of the subsequent step S510 and also provides a stable, structured weight basis for the anti-interference score aggregation module of S600, improving the robustness of the scoring result.
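The control-chart check and smoothing in S430 can be sketched for a single evaluation index as follows. The three-sigma limit, the linear WMA coefficients and the window length are assumptions; the text does not fix them.

```python
import statistics

def weighted_moving_average(history, window=3):
    """WMA over the last `window` values with linearly increasing
    coefficients, so the most recent value counts the most."""
    recent = history[-window:]
    coefficients = range(1, len(recent) + 1)
    return (sum(c * w for c, w in zip(coefficients, recent))
            / sum(coefficients))

def out_of_control(history, value, sigmas=3.0):
    """Shewhart-style check: is value outside mean +/- sigmas * std?"""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return abs(value - mean) > sigmas * std

def stabilise(history, new_value, window=3):
    """Return the new weight unchanged when it is in control, otherwise
    return a WMA-smoothed value that blends it with recent history."""
    if out_of_control(history, new_value):
        return weighted_moving_average(history + [new_value], window)
    return new_value
```

Applying `stabilise` per index and per time step, then stacking the results, would yield the rows and columns of the weight distribution matrix described above.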

In another embodiment, step S410 first receives the optimization model parameters output in step S330 and the real-time environment monitoring data, and constructs the state space of the reinforcement learning agent:

Formula ⑤: $s_t = f_{enc}(e_t \oplus \theta_t)$

Wherein:

$s_t$ is the state vector at time t, built from the evaluation task type data in the environment parameter buffer and the optimization model weights;

$f_{enc}$ denotes the state encoding function, implemented as a three-layer fully connected neural network;

$e_t$ is the environment parameter vector, comprising data distribution characteristics and historical scoring trend indices;

$\theta_t$ denotes the weight parameters of the optimization model, derived from the global parameters after federated learning aggregation;

$\oplus$ denotes the vector concatenation operation;

Based on the state vector, the policy exploration space $A = \{a_1, a_2, \dots, a_{10}\}$ is initialized, containing 10 weight adjustment strategies, where $a_m$ denotes the m-th weight adjustment action, i.e. the m-th optional action belonging to the space $A$;

This step outputs the field named "policy space", which is consumed as the "policy space" input of S420.

Step S420 performs strategy iteration using a modified Q-learning algorithm:

Formula ⑥: $Q(s_t, a_t) = Q_{old}(s_t, a_t) + \alpha_t \left[ r_t + \gamma \max_{a' \in A} Q(s_{t+1}, a') - Q_{old}(s_t, a_t) \right]$

Wherein:

$\alpha_t$ is the dynamic learning rate, adjusted automatically according to the stability of the state vector generated by formula ⑤;

$\gamma$ is the discount factor, which balances immediate rewards against long-term benefits;

$r_t$ is the immediate reward, derived from the scoring error of the XGBoost model;

$Q(s_t, a_t)$ denotes the expected value of executing action $a_t$ in state $s_t$;

$\max_{a'} Q(s_{t+1}, a')$ denotes the highest Q-value estimate over all possible actions $a'$ in the next state $s_{t+1}$;

$a' \in A$ indicates that the action space $A$ is traversed to obtain the maximum value;

$Q_{old}(s_t, a_t)$ denotes the old action-value function before the update;

After 20 rounds of iteration, the dynamic weight parameters $w_{dyn}$ are generated. This step outputs the field named "dynamic weight", which is consumed as the "dynamic weight" input of S430.

Step S430 performs an improved weighted moving average process on the dynamic weights:

Wherein:

distributing a matrix for the final weight;

As a smoothing coefficient, dynamically adjusting according to a control diagram monitoring result;

For a weight matrix at the time T, k=8 represents the number of evaluation index categories, and t=50 is the time window length;

The technical effect of this embodiment is that weight strategy iteration is realized through state space encoding and the improved Q-learning algorithm, and a stability-verified weight matrix is generated using time-varying smoothing coefficients, providing a structured dynamic weight basis for the blockchain verification module.

Step S500 includes at least steps S510-S530:

S510, acquiring the final weight distribution matrix, and performing hash encryption processing to generate an encrypted data packet;

Specifically, this step takes the "final weight distribution matrix" output in S430 as input. It first receives the transmission data from each federated node, which contains the local model update results and the weight distribution information, including encrypted gradients, weight matrix snapshots and related metadata; the federated nodes comprise a plurality of distributed computing units corresponding to different institutions or data holders. For the transmitted data, the step first performs data integrity verification, applying a hash function to the data content to compute a digest and generate a unique hash value for subsequent data consistency verification. Hash encryption is based on a cryptographic hash algorithm such as SHA-256 (Secure Hash Algorithm, 256-bit), ensuring the tamper resistance and uniqueness of the data fingerprint. The transmission data and the corresponding hash value are then bound together to form a blockchain transaction data packet structure comprising a packet header, a data body and a hash digest. The packet structure supports multi-layer nesting and contains a transaction identifier, a timestamp, the identity verification information of the sending node and a digital signature; the digital signature is implemented with an asymmetric encryption algorithm, ensuring the legitimacy and non-repudiation of the data source. The step also integrates a data desensitization module that desensitizes sensitive information such as personal identifiers and trade-secret fields using data masking, anonymization and generalization, meeting the requirements of cross-institution data privacy protection.
An anomaly detection mechanism runs throughout packet generation. On a hash mismatch, a signature verification failure or a data format anomaly, it triggers a retransmission or alarm flow, and all abnormal events are recorded in the security log system. The step transmits the generated blockchain transaction data packet to the blockchain network nodes over a secure communication protocol such as TLS (Transport Layer Security)/SSL (Secure Sockets Layer), supporting asynchronous transmission and batch processing to balance transmission efficiency and security. Finally, step S510 outputs the generated blockchain transaction data packet under the output field name "encrypted data packet". The packet serves as the "encrypted data packet" input of the subsequent step S520 and provides an encrypted data basis for cross-institution data sharing and evidence storage, supporting the operation of the blockchain storage module.
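The packet structure of S510 (header, data body, hash digest, signature) can be sketched with the Python standard library. Because the standard library has no asymmetric signature primitive, the sketch substitutes an HMAC with a shared key for the digital signature; that substitution, the JSON serialization and all field names are assumptions and not the scheme described above.

```python
import hashlib
import hmac
import json
import time

def canonical_bytes(payload):
    """Deterministic JSON encoding so the same data always hashes alike."""
    return json.dumps(payload, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")

def build_packet(tx_id, node_id, weight_matrix, secret_key, timestamp=None):
    """Bind the data body to its SHA-256 digest and sign header + digest."""
    body = {"weight_matrix": weight_matrix}
    digest = hashlib.sha256(canonical_bytes(body)).hexdigest()
    header = {"tx_id": tx_id, "node_id": node_id,
              "timestamp": int(time.time()) if timestamp is None else timestamp}
    signed = canonical_bytes({"header": header, "digest": digest})
    # HMAC stands in for the asymmetric digital signature (assumption).
    signature = hmac.new(secret_key, signed, hashlib.sha256).hexdigest()
    return {"header": header, "body": body,
            "digest": digest, "signature": signature}

def verify_packet(packet, secret_key):
    """Recompute the digest and signature; False on any mismatch."""
    digest = hashlib.sha256(canonical_bytes(packet["body"])).hexdigest()
    if not hmac.compare_digest(digest, packet["digest"]):
        return False
    signed = canonical_bytes({"header": packet["header"],
                              "digest": packet["digest"]})
    expected = hmac.new(secret_key, signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, packet["signature"])
```

Any change to the weight matrix after packaging alters the recomputed digest, so `verify_packet` returns `False`, which corresponds to the hash-mismatch case that triggers retransmission or an alarm.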

S520, extracting key parameters from the encrypted data packet, performing smart contract verification processing, and generating block data;

Specifically, this step takes the "encrypted data packet" output in S510 as input. It first parses the format of the encrypted data packet and extracts key parameters such as the transaction identifier, timestamp, sending node identity information, hash digest and digital signature. The step executes the verification logic by invoking a blockchain smart contract, which is an automated code module pre-deployed on the blockchain and includes data validity checking, permission verification, transaction sequence management and exception handling rules. Specifically, the smart contract first checks the uniqueness of the transaction identifier to prevent duplicate transactions and replay attacks, then verifies the validity of the digital signature to confirm the identity and authorization scope of the sending node. The smart contract then compares the hash digests to ensure that the packet content has not been tampered with, and performs a consistency check against the on-chain evidence storage history to prevent data conflicts and forgery. The smart contract also includes a permission control module that dynamically adjusts the verification flow and evidence storage strategy according to node roles and access permissions, supporting multi-level permission management and compliance audit. An exception handling mechanism is integrated into smart contract execution: for transactions that fail verification, the smart contract automatically records an exception event and pushes the exception information to the off-chain monitoring system for security audit.
The step further encapsulates the verified encrypted data packet into evidence storage block data, which comprises the transaction data, verification results, blockchain node signatures and timestamp information and forms a standard blockchain block structure. The block data supports cross-chain interaction and multi-chain synchronization, is compatible with main chain and side chain architectures, and meets the needs of cross-organization multi-party cooperation. After this step is completed, the generated block data is submitted to the consensus mechanism module of the blockchain network in preparation for on-chain storage. Finally, step S520 outputs the generated block data under the output field name "block data". The block data serves as the "block data" input of the subsequent step S530 and provides the basic data for forming the on-chain record structure, ensuring the automation and reliability of the evidence storage flow.
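The order of checks performed by the smart contract in S520 can be modelled off-chain with a small class. This is only a sketch of the control flow (uniqueness, authorization, digest comparison, event logging); it is not contract code for any particular blockchain platform, signature verification is omitted, and the class name and packet fields are hypothetical.

```python
import hashlib
import json

class VerificationContract:
    """Off-chain model of the verification rules described for S520."""

    def __init__(self, authorized_nodes):
        self.authorized_nodes = set(authorized_nodes)
        self.seen_tx_ids = set()   # used to reject replays
        self.events = []           # exception event log

    def verify(self, packet):
        tx_id = packet["tx_id"]
        if tx_id in self.seen_tx_ids:
            return self._reject(tx_id, "duplicate transaction")
        if packet["node_id"] not in self.authorized_nodes:
            return self._reject(tx_id, "unauthorized node")
        body = json.dumps(packet["body"], sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(body).hexdigest()
        if digest != packet["digest"]:
            return self._reject(tx_id, "hash mismatch")
        self.seen_tx_ids.add(tx_id)
        return {"tx_id": tx_id, "verified": True, "digest": digest}

    def _reject(self, tx_id, reason):
        """Record the failure so it can be pushed to off-chain monitoring."""
        self.events.append((tx_id, reason))
        return {"tx_id": tx_id, "verified": False, "reason": reason}
```

Submitting the same valid packet twice succeeds the first time and is rejected as a duplicate the second time, which is the replay protection the paragraph above describes.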

S530, performing consensus mechanism verification processing on the block data to generate a certificate record;

Specifically, the step takes the "block data" output in S520 as input, and the block data is broadcast to each consensus node in the blockchain network, where the consensus node verifies and validates the block data according to a preset consensus algorithm, where the consensus algorithm includes, but is not limited to, multiple implementations such as Proof of interests (PoS), proof of works (PoW), and bayer fault tolerance algorithm (Byzantine Fault Tolerance, BFT). The step carries out multi-round verification on the validity, the integrity and the transaction legitimacy of the block data through the consensus node, and ensures that the block data accords with the network protocol specification by combining the time stamp and the node signature. Further, the consensus mechanism performs a chain linking operation on the block data, and associates the new block with the previous block through a hash pointer to form a tamper-proof block chain structure. The step comprises a bifurcation detecting and processing module, and aiming at bifurcation conditions on the chain, the longest chain rule or the weight chain rule is executed, so that the uniqueness and consistency of the chain are ensured. The steps are combined with an excitation and punishment mechanism in the consensus process, so that node behaviors are regulated, and network safety and stable operation are promoted. The exception handling mechanism monitors the abnormal node behavior, delay and attack attempts in the consensus process, triggering automatic isolation and alarm measures. After the step is completed with the consensus confirmation, the block data is written into an on-chain certificate storage database to generate an on-chain certificate storage record structure, wherein the certificate storage record structure comprises block header information, a transaction list, a verification state and an on-chain index, and supports efficient inquiry and audit. 
The on-chain certificate record structure provides access control through a smart contract interface and supports cross-institution data sharing and dispute resolution. Finally, step S530 outputs the generated on-chain certificate record structure under the field name "certificate record", which serves as the input of the subsequent step S610 and provides on-chain evidence and a traceability basis for the security assurance and credibility evaluation of all data in the system.
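
The hash-pointer linking and the longest-chain rule described for S530 can be illustrated with a minimal sketch. It assumes SHA-256 over a canonical JSON encoding and a block layout with hypothetical field names (`index`, `timestamp`, `transactions`, `prev_hash`); none of these details are specified in the description, and the consensus protocol itself (PoS, PoW, or BFT) is not modeled.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block (illustrative choice)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, transactions: list, timestamp: int) -> list:
    """Return a new chain with one more block linked to its predecessor by hash."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    block = {
        "index": len(chain),
        "timestamp": timestamp,
        "transactions": transactions,
        "prev_hash": prev_hash,  # hash pointer to the previous block
    }
    return chain + [block]


def is_valid_chain(chain: list) -> bool:
    """Check that every hash pointer matches the hash of the preceding block."""
    for prev, cur in zip(chain, chain[1:]):
        if cur["prev_hash"] != block_hash(prev):
            return False
    return True


def resolve_fork(candidates: list) -> list:
    """Longest-chain rule: keep the longest candidate that passes validation."""
    valid = [c for c in candidates if is_valid_chain(c)]
    return max(valid, key=len) if valid else []
```

Because each block stores the hash of its predecessor, altering any earlier block invalidates every later hash pointer, which is what makes the record tamper-evident in this sketch.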

Step S600 includes at least steps S610-S630:

S610, acquiring a weight matrix and an optimized evaluation model structure, and performing fuzzy membership calculation processing to generate a fuzzy score;

Specifically, this step takes the "weight matrix" output by S430 and the "optimized evaluation model structure" output by S330 as inputs. It first reads the weight assigned to each evaluation index in the weight matrix element by element, and combines it with the feature mapping relation and scoring strategy configuration in the optimized model to construct a multidimensional evaluation index space. The fuzzy membership calculation is based on fuzzy set theory: the membership degree is defined as the degree to which an evaluation index belongs to each scoring grade, the grades comprise three fuzzy subsets (low, medium, and high), and the numerical mapping uses a triangular or trapezoidal membership function. The step forms a fuzzy evaluation matrix by weighting the scoring output of each feature in the optimized model with the corresponding value in the weight matrix; each matrix element represents the degree to which a weighted index belongs to a scoring grade. Further, the step introduces a fuzzy rule base containing fuzzy inference rules summarized from expert experience and historical data. The rules describe the fuzzy relation between weights and scores in "if-then" form, and the rule base supports inference with multiple conditions and multiple results. The fuzzy inference process uses a Mamdani-type inference engine, which takes the fuzzy evaluation matrix as input and obtains a fuzzy output set through rule matching, fuzzy composition, and membership aggregation. The step also normalizes the fuzzy membership degrees, using min-max normalization to adjust their value range and ensure the numerical consistency of the output.
An anomaly detection module runs throughout the calculation and performs interpolation filling or weight redistribution for outliers or missing data in the input weight matrix, ensuring the continuity and integrity of the membership calculation. The step also records a fuzzy calculation log, including the input weights, membership function parameters, inference rule trigger conditions, and output fuzzy sets, to support subsequent debugging and rule optimization. Finally, step S610 outputs the anti-interference scoring sequence generated by the fuzzy membership calculation under the field name "fuzzy score". This sequence serves as the "fuzzy score" input of the subsequent step S620 and provides the anti-interference score aggregation module with a multidimensional, fuzzy-logic-based scoring basis, improving the robustness of the scoring result.
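
The membership calculation of S610 can be sketched as follows. This is a minimal illustration under stated assumptions: scores lie in [0, 1], the three grades use triangular membership functions whose breakpoints (`GRADES`) are hypothetical, and the fuzzy evaluation matrix is combined with the weights by a plain weighted average rather than by the Mamdani-type inference engine and rule base described above.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical breakpoints for the low / medium / high fuzzy subsets on [0, 1].
GRADES = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}
GRADE_ORDER = ("low", "medium", "high")


def membership_row(score: float) -> list:
    """Membership degrees of one index score in each grade."""
    return [triangular(score, *GRADES[g]) for g in GRADE_ORDER]


def fuzzy_evaluate(weights: list, scores: list) -> list:
    """Weighted-average composition of the weights with the fuzzy evaluation matrix."""
    total = sum(weights)
    norm_w = [w / total for w in weights]            # weights normalized to sum to 1
    matrix = [membership_row(s) for s in scores]     # rows: indexes, columns: grades
    return [
        sum(w * row[j] for w, row in zip(norm_w, matrix))
        for j in range(len(GRADE_ORDER))
    ]
```

For example, two equally weighted indexes scoring 0.25 and 1.0 give grade memberships of 0.25 (low), 0.25 (medium), and 0.5 (high).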

S620, extracting positive and negative ideal solutions from the fuzzy scores and performing TOPSIS distance measurement processing to generate an evaluation result;

Specifically, this step takes the "fuzzy score" output by S610 as input. It first preprocesses the fuzzy score sequence, including filling missing values and removing outliers: extreme values in the scores are identified by statistical methods and corrected by interpolation or by a weighted average of adjacent values to ensure data integrity. Further, the step constructs a decision matrix following the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) framework for multi-attribute decision making, in which the rows of the matrix correspond to the evaluation objects and the columns correspond to the fuzzy scoring indexes. The positive ideal solution is defined as the maximum score value for each index and represents the optimal evaluation level; the negative ideal solution is defined as the minimum score value for each index and represents the worst evaluation level. The step normalizes the decision matrix, mapping the index scores to the [0,1] interval by vector normalization to eliminate differences in dimension and magnitude between indexes. Further, the step uses the "weight matrix" output by S430 to assign the corresponding weight to each index in the normalized matrix, forming a weighted normalized decision matrix. The distance measurement calculates the Euclidean distance from each evaluation object to the positive ideal solution and to the negative ideal solution, denoted D+ and D- respectively; the Euclidean distance in the multidimensional index space accounts for the weighted score differences across all indexes.
Based on the distance measurement result, the step calculates a relative closeness index, defined as the ratio of D- to the sum of D+ and D-. Its value ranges from 0 to 1, and the closer the value is to 1, the closer the evaluation object is to the ideal solution. The step also detects distance calculation anomalies and, for numerical overflow, zero distance, or invalid distance during calculation, executes an exception handling mechanism that includes renormalization, weight adjustment, or elimination of the abnormal index. The step incorporates a multi-round iteration mechanism that dynamically adjusts the weight matrix and fuzzy scoring parameters according to historical evaluation results and feedback, improving the stability and accuracy of the TOPSIS calculation. The step records a TOPSIS calculation log, including the normalization parameters, weight distribution, distance values, and closeness indexes, to facilitate analysis and verification of subsequent evaluation results. Finally, step S620 outputs a standardized evaluation result under the field name "evaluation result". This result serves as the "evaluation result" input of the subsequent step S630 and provides the anti-interference score aggregation module with comprehensive TOPSIS-based score data to support objective ranking and decision making in the final evaluation.
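
The TOPSIS procedure of S620 (vector normalization, weighting, ideal solutions, Euclidean distances, relative closeness) can be sketched as below. The sketch assumes that every index is a benefit-type index (larger is better) and that scores are non-negative; the fallback value of 0.5 for the zero-distance case is an illustrative assumption, not the exception handling defined in the description.

```python
import math


def topsis(matrix: list, weights: list) -> list:
    """Return the relative closeness D- / (D+ + D-) for each row (evaluation object)."""
    n_cols = len(weights)
    # Vector normalization: divide each column by its Euclidean norm.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_cols)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_cols)]
        for row in matrix
    ]
    # Positive / negative ideal solutions: column-wise maximum / minimum.
    best = [max(r[j] for r in weighted) for j in range(n_cols)]
    worst = [min(r[j] for r in weighted) for j in range(n_cols)]

    closeness = []
    for r in weighted:
        d_pos = math.dist(r, best)    # D+: distance to the positive ideal solution
        d_neg = math.dist(r, worst)   # D-: distance to the negative ideal solution
        total = d_pos + d_neg
        # Zero total distance means all objects coincide; 0.5 is an assumed fallback.
        closeness.append(d_neg / total if total > 0 else 0.5)
    return closeness
```

An object that attains the maximum on every index has D+ = 0 and therefore closeness 1, while an object that attains the minimum on every index has closeness 0.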

S630, performing visual rendering processing on the evaluation result to generate a final evaluation report structure;

Specifically, this step takes the "evaluation result" output by S620 as input. It first converts the format of the evaluation result data to the data structure required by the visual rendering engine, including key-value mappings and multidimensional array representations in JSON format, and supports a variety of front-end display frameworks. Further, the step automatically generates structured report content from a predefined evaluation report template, combined with the key indexes and comprehensive scores in the evaluation result; the content comprises the total score, the score distribution for each dimension, a description of the weight distribution, and a comparison of historical trends. The report structure supports multi-level tables of contents and chapter divisions, facilitating quick browsing and in-depth analysis by users. The visual rendering uses a variety of chart types, including histograms, line charts, radar charts, and heat maps, and the specific chart type is selected dynamically according to the nature of the evaluation index and the user configuration. The step draws the charts through a graphics rendering engine and supports interactive operations such as zooming, hover tooltips, and data filtering, improving the readability of the report and the user experience. Further, the step incorporates an abnormal data labeling function: abnormal points or abnormal indexes detected during evaluation are marked in the report by highlighting, icons, or annotations to help the user identify potential problems. The step also integrates an improvement suggestion generation module, which calls a preset improvement strategy library according to the evaluation result and historical data and automatically generates targeted suggestion text covering the technical optimization direction, data quality improvement, and model adjustment schemes.
The report generation process includes a permission management and data desensitization mechanism that dynamically adjusts the visible range of the report content and the way sensitive information is displayed according to the user role. An exception handling module monitors format errors, data loss, and rendering failures during report generation and performs automatic repair or raises an alarm for manual handling, ensuring the integrity and accuracy of the report output. Finally, step S630 outputs the generated final evaluation report structure under the field name "final evaluation report structure". The report serves as the direct output data of the system for display in the user interface, archiving, and subsequent analysis, and provides closed-loop feedback data for the overall evaluation flow of the system.
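
The format conversion, anomaly labeling, and role-based desensitization of S630 can be illustrated with a minimal sketch. All field names (`total_score`, `dimension_scores`, `weights`), the role name `auditor`, and the rule that a score outside [0, 1] counts as abnormal are hypothetical choices made for this example only.

```python
import json

# Hypothetical set of report fields that are masked for non-privileged roles.
SENSITIVE_FIELDS = {"weight_distribution"}


def build_report(result: dict, role: str) -> str:
    """Convert an evaluation result to a JSON report string for the given user role."""
    report = {
        "total_score": result["total_score"],
        "dimension_scores": result["dimension_scores"],
        "weight_distribution": result["weights"],
        # Abnormal data labeling: flag dimensions whose score falls outside [0, 1].
        "anomalies": [
            name
            for name, score in result["dimension_scores"].items()
            if not 0.0 <= score <= 1.0
        ],
    }
    if role != "auditor":
        # Data desensitization: hide sensitive content from other roles.
        for field in SENSITIVE_FIELDS:
            report[field] = "***"
    return json.dumps(report, ensure_ascii=False, sort_keys=True)
```

The JSON string can then be handed to any front-end rendering framework; chart drawing itself is outside the scope of this sketch.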

FIG. 2 is a block diagram illustrating a multi-modal and reinforcement learning intelligent assessment system according to an embodiment of the present invention. As shown in FIG. 2, the structure may include:

The environment calibration module 01 is used for acquiring the evaluation environment parameters and the optimization model and completing the system initialization calibration. Specifically, the module receives the evaluation environment parameters from the system environment monitoring module; the parameters comprise multidimensional information such as the evaluation task type, data distribution characteristics, historical scoring trend, and external interference index, and are continuously updated and stored in an environment parameter buffer through a real-time data acquisition interface. Combining the structure and weight configuration of the optimization model, the module loads a reinforcement learning framework and constructs a policy-gradient agent comprising a state representation layer, a policy network, and a value network, which supports multidimensional state input and continuous action output. The state representation layer fuses and encodes the current environment parameters and the model weight parameters into a high-dimensional state vector; the policy network maps the state vector to an action probability distribution, and the action space is defined as the set of weight adjustment strategies. The value network estimates the expected return of the current policy, providing a value baseline. The module generates a strategy exploration space comprising an initial strategy set and its probability distribution, and supports an ε-greedy selection mechanism to ensure the diversity and exploratory nature of the strategies. A state normalization module normalizes the input parameters to avoid vanishing or exploding gradients.
An anomaly detection mechanism runs throughout the initialization flow; on abnormal fluctuation of the environment parameters or abnormal model weights, it triggers an early warning and records an anomaly log. After the agent is initialized, the module outputs the strategy exploration space and transmits it to the reinforcement arbitration module as the basic data for policy iteration.
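
State normalization and ε-greedy selection, as used by the environment calibration module, can be sketched as follows. Min-max scaling is assumed for the normalization (the description does not name a method), the action values are assumed to be given as a plain list, and the function names are hypothetical.

```python
import random


def normalize_state(values: list) -> list:
    """Min-max normalize raw state parameters to [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant input: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def epsilon_greedy(action_values: list, epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, otherwise the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))                 # explore
    return max(range(len(action_values)), key=action_values.__getitem__)  # exploit
```

With `epsilon = 0` the selection is purely greedy; raising `epsilon` trades exploitation for exploration of alternative weight adjustment strategies.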

The multi-modal acquisition module 02 is used for acquiring patent images, test texts, and experimental data and outputting a standardized multi-modal data set within a defined framework. Specifically, the module receives multi-source patent image data from the patent image acquisition interface; the images comprise technical solution schematics, structural diagrams, flow charts, and other types, in bitmap, vector, and scanned formats. At the same time, test text information related to the patent is acquired from the test text receiving end; the text covers patent specifications, claims, abstracts, and related technical documents, and plain text, rich text, and structured text formats are supported. The experimental data acquisition system provides experimental data related to the patented technology, including numerical experimental results, time series data, and multidimensional signals acquired by sensors. The module performs data cleaning on the acquired multi-modal raw data: it checks the quality of the patent images, preprocesses abnormal images with an image enhancement algorithm, marks images with severe blurring or missing content as abnormal, and stores them in an anomaly log. It converts the test texts to a unified format and uses NLP techniques to confirm language consistency; abnormal texts trigger the exception handling mechanism and are recorded. It performs missing value detection and outlier removal on the experimental data, fills abnormal data by interpolation or a regression model, and marks and isolates data that cannot be repaired. The data cleaning combines a rule engine with a machine learning model and dynamically adjusts the cleaning strategy to adapt to the diversity of different patent fields.
After data cleaning is complete, the patent images, test texts, and experimental data are standardized to ensure the compatibility and consistency of the multi-modal data. The standardization process is controlled by a configuration parameter manager, supports multiple preset and custom standards, and records operation logs and anomaly information to facilitate tracing and debugging. The module outputs the standardized multi-modal data set and transmits it to the feature processing module for joint cross-modal feature extraction.
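
The interpolation-based repair of missing experimental data can be sketched as below. It assumes missing samples are represented as `None` in an evenly spaced series and uses linear interpolation between the nearest valid neighbors; the regression-model alternative mentioned above is not shown, and a series with no valid samples is treated as unrepairable.

```python
def fill_missing_linear(series: list) -> list:
    """Fill None entries by linear interpolation between the nearest known samples."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        # Nothing to interpolate from: the data must be marked and isolated instead.
        raise ValueError("series has no valid samples and cannot be repaired")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]          # leading gap: copy first known value
        elif right is None:
            filled[i] = series[left]           # trailing gap: copy last known value
        else:
            t = (i - left) / (right - left)    # fractional position inside the gap
            filled[i] = series[left] + t * (series[right] - series[left])
    return filled
```

For example, `fill_missing_linear([1.0, None, 3.0])` fills the gap with the midpoint 2.0.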

The feature processing module 03 is used for extracting the text feature vectors and image feature vectors and generating a joint feature matrix. Specifically, the module receives the standardized multi-modal data set from the multi-modal acquisition module and extracts text features from the test text portion using a BERT model based on the Transformer architecture. The BERT model, a pre-trained language model, loads a patent-domain corpus and combines word embeddings with contextual semantic encoding to generate high-dimensional semantic vectors; the text feature vectors comprise multi-layer semantic representations at the word level and the sentence level. For the patent images, a convolutional neural network comprising multiple convolutional layers, pooling layers, and fully connected layers extracts the texture features, edge information, and structural layout features of the images; the image feature vectors characterize the authenticity judgment information and key visual elements of the images. The text feature vectors and image feature vectors are each normalized so that the feature dimensions and numerical ranges of the different modalities are comparable. A cross-modal alignment algorithm maps and fuses the text and image features in a semantic space and constructs a multi-modal attention matrix that dynamically captures the correlation between text and images and eliminates the effect of heterogeneity between modalities. After alignment, the aligned text feature vectors and image feature vectors are concatenated in a predefined feature dimension order to form a unified joint feature matrix, which supports multidimensional tensor representation and reflects the interaction information of the multi-modal data. An input data anomaly detection and fault tolerance mechanism ensures the integrity and accuracy of the joint feature matrix.
The module outputs the joint feature matrix and transmits it to the reinforcement arbitration module for policy iteration.
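
The cross-modal alignment and concatenation performed by the feature processing module can be sketched with standard scaled dot-product attention. This is an assumption: the "improved attention mechanism" of the claims is not reproduced in the text, so the sketch substitutes the common form in which each text vector attends over the image vectors. Text and image features are assumed to share the same dimension, and BERT and CNN feature extraction are not shown.

```python
import math


def softmax(xs: list) -> list:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_vecs: list, image_vecs: list) -> list:
    """For each text vector, return the attention-weighted mix of image vectors."""
    d = len(image_vecs[0])
    aligned = []
    for q in text_vecs:
        # One row of the multi-modal attention matrix: text query vs. image keys.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in image_vecs]
        attn = softmax(scores)
        aligned.append(
            [sum(w * v[j] for w, v in zip(attn, image_vecs)) for j in range(d)]
        )
    return aligned


def joint_features(text_vecs: list, image_vecs: list) -> list:
    """Concatenate each text vector with its aligned image representation."""
    aligned = cross_attention(text_vecs, image_vecs)
    return [t + a for t, a in zip(text_vecs, aligned)]
```

Each row of the result is a text feature vector followed by the image features most correlated with it, so the joint matrix has as many rows as text vectors and twice the feature dimension.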

The reinforcement arbitration module 04 is used for executing the iterative calculation over the strategy exploration space and outputting dynamic weight parameters. Specifically, the module completes policy selection and action execution based on the joint feature matrix from the feature processing module. First, according to the current state vector and the policy probability distribution, it selects a weight adjustment strategy from the strategy space using ε-greedy selection; a strategy is defined as an adjustment operation on the evaluation index weights, including incremental weight adjustment, proportional weight redistribution, weight threshold constraints, and the like. Based on the Q-learning algorithm, an action-value function Q(s, a) is constructed, in which the state s consists of the current environment parameters and the optimization model weights, and the action a corresponds to the selected weight adjustment strategy. The Q-learning iteration comprises state evaluation, action execution, reward calculation, and Q-value update: after the selected weight adjustment strategy is executed, the system calculates the new scoring error and model performance indexes through the evaluation module and feeds them back to the agent as the immediate reward signal. The Q value is updated by temporal-difference learning, with the learning rate and the discount factor balancing historical experience against expected future return. An experience replay mechanism is introduced: historical (state, action, reward, next state) tuples are stored in an experience pool and sampled at random for Q-value updates, reducing the correlation between samples. A dual-network structure comprising a target network and a main network is used, and the target network parameters are synchronized periodically to stabilize the learning process.
The module includes an action constraint module that applies upper and lower bounds to the weight adjustment strategy, preventing model instability caused by excessive weight deviation. An anomaly monitoring module monitors Q-value fluctuation and reward anomalies in real time and performs policy rollback or resampling to ensure the robustness of the iterative process. After multiple Q-learning iterations, the module generates the dynamic weight parameters, a one-dimensional array containing the real-time weight value of each evaluation index and its confidence mark. The module outputs the dynamic weight parameters and transmits them to the decision execution module for generating the multi-modal evaluation result.
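
The Q-value update with experience replay and a periodically synchronized target can be sketched in tabular form. The temporal-difference update is Q(s, a) ← Q(s, a) + α [r + γ max_a' Q_target(s', a') − Q(s, a)]. The class name `QAgent`, the discrete state and action indices, and the default hyperparameters are assumptions for illustration; the description's main and target networks are replaced here by two lookup tables.

```python
import random
from collections import deque


class QAgent:
    """Tabular Q-learning with an experience pool and a separate target table."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, capacity=1000, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]   # main table
        self.target = [row[:] for row in self.q]                # target table
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor
        self.buffer = deque(maxlen=capacity)                    # experience pool
        self.rng = random.Random(seed)

    def remember(self, s, a, r, s_next):
        """Store one (state, action, reward, next state) tuple."""
        self.buffer.append((s, a, r, s_next))

    def replay(self, batch_size):
        """Sample stored transitions at random and apply the TD update to each."""
        batch = self.rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        for s, a, r, s_next in batch:
            td_target = r + self.gamma * max(self.target[s_next])
            self.q[s][a] += self.alpha * (td_target - self.q[s][a])

    def sync_target(self):
        """Copy the main table into the target table (periodic synchronization)."""
        self.target = [row[:] for row in self.q]


def clip_weight(w, lower=0.0, upper=1.0):
    """Action constraint: keep an adjusted weight inside [lower, upper]."""
    return max(lower, min(upper, w))
```

Computing the TD target from the slower-moving target table, rather than from the table being updated, is what dampens oscillation in the learning process.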

The decision execution module 05 is configured to receive the dynamic weight parameters and generate a multi-modal evaluation result. Specifically, the module generates the multi-modal evaluation result based on the dynamic weight parameters from the reinforcement arbitration module. It first performs a time-series stability analysis on the dynamic weight parameters, calculating the mean, variance, and autocorrelation coefficient of the weight parameters with a sliding window and judging the fluctuation range and trend of the weight sequence. Combined with statistical process control, it monitors the weight fluctuation on a control chart, identifies abnormal fluctuation points and potential weight drift, and triggers weight smoothing. The weight smoothing uses a weighted moving average algorithm that combines historical weight data with the current dynamic weights to generate smoothed weight values and reduce the influence of short-term anomalies. The module then performs consistency verification on the weight parameters: it calculates the correlation between weight dimensions using a correlation coefficient matrix, identifies highly correlated or conflicting weight combinations, and adjusts the weight distribution according to preset weight constraint rules to prevent the weights from becoming overly concentrated or mutually exclusive. The smoothed and verified dynamic weight parameters are converted into a weight distribution matrix structure, a two-dimensional array in which the rows represent the evaluation index categories and the columns represent the time series or iteration batches, supporting history tracking and version management of the weights. The weight matrix includes metadata fields that record the weight source, stability indexes, and timestamp information, facilitating subsequent model audit and dynamic adjustment.
An integrated exception logging module records in detail the type, time, and handling measures of each abnormal event found during weight verification, supporting subsequent problem tracing. After the weight stability verification and matrix generation are complete, the module outputs the final weight distribution matrix and transmits it to the parameter updating module for weight distribution timestamp recording and strategy exploration space parameter updating.
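
The sliding-window stability analysis and weighted-moving-average smoothing of the decision execution module can be sketched as follows. The window length and the smoothing coefficients are free parameters not specified in the description; the autocorrelation coefficient and control-chart monitoring are omitted for brevity.

```python
import statistics


def window_stats(weights: list, window: int) -> list:
    """Mean and population variance of each full sliding window over a weight sequence."""
    stats = []
    for i in range(len(weights) - window + 1):
        chunk = weights[i:i + window]
        stats.append((statistics.fmean(chunk), statistics.pvariance(chunk)))
    return stats


def weighted_moving_average(weights: list, coeffs: list) -> list:
    """Smooth a weight sequence; coeffs are ordered from oldest to newest sample.

    Early points that lack a full history use a truncated, renormalized kernel.
    """
    smoothed = []
    for i in range(len(weights)):
        start = max(0, i - len(coeffs) + 1)
        chunk = weights[start:i + 1]
        ks = coeffs[-len(chunk):]          # keep the coefficients for the newest samples
        smoothed.append(sum(k * w for k, w in zip(ks, chunk)) / sum(ks))
    return smoothed
```

A window whose variance jumps relative to its neighbors indicates an abnormal fluctuation point, which is where smoothing would be triggered.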

The parameter updating module 06 is used for recording the weight distribution timestamps and updating the strategy exploration space parameters. Specifically, the module receives the final weight distribution matrix from the decision execution module and the dynamic weight parameters from the reinforcement arbitration module, and performs weight distribution timestamp recording and strategy exploration space parameter updating. It first performs a time-series analysis on the weight distribution matrix, combining the historical weight distribution data with the current timestamp to generate a weight distribution time series that records the changes over time and the version information of the weight distribution. Using the dynamic weight parameters, it then updates the policy probability distribution and the action-value function in the strategy exploration space and adjusts the search range and priority of the strategy space, supporting dynamic optimization and exploration of the strategies. An anomaly monitoring module detects anomalies during time-series recording and strategy updating and executes an exception handling mechanism comprising time-series reconstruction, strategy space reconfiguration, and weight parameter redistribution, ensuring the integrity and accuracy of the update process. After the weight distribution timestamps are recorded and the strategy exploration space parameters are updated, the module outputs the updated strategy exploration space, which the environment calibration module uses to write back parameters and refresh the link, forming a complete closed-loop control link.

Claims

1. An intelligent assessment method for multi-modal and reinforcement learning, comprising:

2. The method of claim 1, wherein generating the feature tensor structure further comprises:

3. The method of claim 1, wherein generating the weight matrix to be optimized further comprises:

4. The method of claim 1, wherein generating the optimized evaluation model structure further comprises:

5. The method of claim 1, wherein generating the final weight distribution matrix further comprises:

6. The method of claim 1, wherein generating a blockchain certification record further comprises:

7. The method of claim 1, wherein generating the final assessment report structure further comprises:

8. The method of claim 1, wherein generating an expression for the feature tensor structure further comprises:

Based on the quality assessment in the form of an improved Chebyshev norm, an image quality coefficient is defined: ;

When , an image enhancement algorithm is triggered;

The language consistency score is calculated using improved Shannon entropy: ;

The cross-modal alignment is achieved with an improved attention mechanism: ;

,

Performing improved dimension normalization processing on the joint feature matrix to construct a feature tensor:,

9. The method of claim 1, wherein generating the final weight distribution matrix further comprises:

constructing a state space of the reinforcement learning agent:

,

state vector initialization policy exploration space :

,

Generating dynamic weight parameters after 20 rounds of iteration ;

,

10. A multi-modal and reinforcement learning intelligent assessment system for use in the method of any of claims 1-9, comprising: