Disclosure of Invention
The invention provides a smart contract classification method, system and electronic device based on locality-sensitive hashing, and aims to solve the problems that existing machine-learning-based smart contract classification methods for blockchains are difficult to train and classify inaccurately.
According to an embodiment of the application, a smart contract classification method based on locality-sensitive hashing is provided, comprising the following steps: step S1: acquiring a plurality of smart contracts from a blockchain, and extracting code information and transaction information from each smart contract; step S2: constructing a code feature matrix based on the code information, and constructing a transaction feature matrix based on the transaction information; step S3: performing locality-sensitive hashing on the code feature matrix to obtain a code feature locality-sensitive hash matrix, and performing locality-sensitive hashing on the transaction feature matrix to obtain a transaction feature locality-sensitive hash matrix; step S4: concatenating the code feature locality-sensitive hash matrix and the transaction feature locality-sensitive hash matrix to obtain a smart contract locality-sensitive hash matrix; and step S5: classifying the smart contracts based on the row vectors of the smart contract locality-sensitive hash matrix to obtain smart contracts of different types.
Preferably, step S1 specifically includes: step S11: acquiring a plurality of smart contracts of a blockchain, and generating an abstract syntax tree for all the smart contracts; and step S12: traversing the abstract syntax tree, parsing and recording the information of each node of the abstract syntax tree, and obtaining the corresponding code information and transaction information, respectively.
Preferably, step S2 specifically includes: step S21: obtaining the code feature matrix for all the smart contracts based on the nodes of the abstract syntax tree and the corresponding code information; and step S22: obtaining the transaction feature matrix for all the smart contracts based on the nodes of the abstract syntax tree and the corresponding transaction information.
Preferably, step S21 specifically includes: step S211: acquiring all nodes of the abstract syntax tree, extracting the type of each node, and removing repeated node types; and step S212: counting the remaining node types and constructing the corresponding code feature matrix of the smart contracts.
Preferably, step S3 specifically includes: step S31: generating a random matrix, wherein the number of rows of the random matrix is equal to the number of columns of the code feature matrix, and the number of columns of the random matrix is a user preset value; step S32: multiplying the code feature matrix by the random matrix to obtain a code feature random matrix, and normalizing the code feature random matrix to obtain the code feature locality-sensitive hash matrix; and step S33: multiplying the transaction feature matrix by a random matrix to obtain a transaction feature random matrix, and normalizing the transaction feature random matrix to obtain the transaction feature locality-sensitive hash matrix.
The invention also provides a smart contract classification system based on locality-sensitive hashing, comprising: a smart contract acquisition unit, used for acquiring a plurality of smart contracts from a blockchain and extracting code information and transaction information from each smart contract; a feature matrix construction unit, used for constructing a code feature matrix based on the code information and constructing a transaction feature matrix based on the transaction information; a locality-sensitive hashing unit, used for performing locality-sensitive hashing on the code feature matrix to obtain a code feature locality-sensitive hash matrix and performing locality-sensitive hashing on the transaction feature matrix to obtain a transaction feature locality-sensitive hash matrix; a matrix concatenation unit, used for concatenating the code feature locality-sensitive hash matrix and the transaction feature locality-sensitive hash matrix to obtain a smart contract locality-sensitive hash matrix; and a contract classification unit, used for classifying the smart contracts based on the row vectors of the smart contract locality-sensitive hash matrix to obtain smart contracts of different types.
Preferably, the smart contract acquisition unit further includes: a syntax tree construction unit, used for acquiring a plurality of smart contracts of the blockchain and generating an abstract syntax tree for all the smart contracts; and a node traversal unit, used for traversing the abstract syntax tree, parsing and recording the information of each node of the abstract syntax tree, and obtaining the corresponding code information and transaction information, respectively.
Preferably, the locality-sensitive hashing unit further includes: a random matrix generation unit, used for generating a random matrix, wherein the number of rows of the random matrix is equal to the number of columns of the code feature matrix, and the number of columns of the random matrix is a user preset value; a code matrix multiplication unit, used for multiplying the code feature matrix by the random matrix to obtain a code feature random matrix, and normalizing the code feature random matrix to obtain the code feature locality-sensitive hash matrix; and a transaction matrix multiplication unit, used for multiplying the transaction feature matrix by a random matrix to obtain a transaction feature random matrix, and normalizing the transaction feature random matrix to obtain the transaction feature locality-sensitive hash matrix.
The invention also provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program configured to execute, when run, any one of the above smart contract classification methods based on locality-sensitive hashing; and the processor is configured to execute, by means of the computer program, any one of the above smart contract classification methods based on locality-sensitive hashing.
The smart contract classification method, system and electronic device based on locality-sensitive hashing have the following beneficial effects:
1. A code feature matrix and a transaction feature matrix are constructed from the code information and the transaction information of a plurality of smart contracts, and locality-sensitive hashing is applied to each to obtain a code feature locality-sensitive hash matrix and a transaction feature locality-sensitive hash matrix, which represent the similarity of the code information and of the transaction information across the smart contracts, respectively. The two hash matrices are then concatenated to obtain the final smart contract locality-sensitive hash matrix. Compared with techniques that cluster smart contracts using machine learning, the method requires no labeled data to train a model, requires no manual data labeling, and reduces the dependence on large numbers of training samples that training a machine learning classification model entails. The method also adapts better to new types of smart contracts: a machine-learning-based method must learn again from samples of the new type in order to refine its features, whereas this method only needs to compute the locality-sensitive hash value of the new contract according to the same rule. If that value differs from all existing locality-sensitive hash values, the contract is placed in a class of its own, and the new type of smart contract can then be identified by user inspection. Furthermore, the method classifies in matrix form by means of locality-sensitive hashing, a technique whose similarity computation scales linearly, so classification is more efficient and better suited to large-scale, frequent smart contract clustering scenarios.
In addition, the method combines code information and transaction information, so the features used are richer and more comprehensive than those of existing non-machine-learning smart contract clustering techniques.
2. Repeated node types are removed, so that repeated nodes are treated as the same type, which reduces subsequent computation and improves classification efficiency.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
Referring to fig. 1, a first embodiment of the present invention discloses a smart contract classification method based on locality-sensitive hashing, which includes the following steps:
Step S1: acquiring a plurality of smart contracts from a blockchain, and extracting code information and transaction information from each smart contract.
Step S2: constructing a code feature matrix based on the code information, and constructing a transaction feature matrix based on the transaction information.
Step S3: performing locality-sensitive hashing on the code feature matrix to obtain a code feature locality-sensitive hash matrix, and performing locality-sensitive hashing on the transaction feature matrix to obtain a transaction feature locality-sensitive hash matrix.
Step S4: concatenating the code feature locality-sensitive hash matrix and the transaction feature locality-sensitive hash matrix to obtain a smart contract locality-sensitive hash matrix; and
Step S5: classifying the smart contracts based on the row vectors of the smart contract locality-sensitive hash matrix to obtain smart contracts of different types.
It is understood that in step S1, a smart contract is a program running on the blockchain that is event-driven, stateful, and able to hold assets on the ledger. A user may write asset transaction logic into a smart contract to effect a transfer of value. The transaction information refers to information stored in transactions, such as transfer amounts, and the code information refers to the programming-language constructs used in the smart contract, such as conditional branches, loops, and transfers.
It is understood that, in step S2, the code feature matrix records, across all smart contracts collected so far, whether each code feature is present in each smart contract, expressed in matrix form; the transaction feature matrix is constructed in the same way from the transaction features.
It is understood that in step S3, locality-sensitive hashing is a method for fast clustering of large-scale data. Its basic idea is that if two data points are similar in the original data space, they remain highly similar after each has been converted by the hash function. Locality-sensitive hashing is adopted here to handle the comparison of massive numbers of smart contracts: after conversion by the hash function, the original similarity between contracts is preserved.
It can be understood that, in step S4, the code feature locality-sensitive hash matrix and the transaction feature locality-sensitive hash matrix are concatenated so that smart contracts are classified on the combination of code information and transaction information, which improves classification accuracy.
It is to be understood that, in step S5, each row vector of the smart contract locality-sensitive hash matrix represents the locality-sensitive hash value of one smart contract, and smart contracts of different types can be distinguished based on the similarity of these vectors.
It is to be understood that, in step S5, the i-th row of the smart contract locality-sensitive hash matrix H' represents the locality-sensitive hash value of the i-th smart contract, and smart contracts having the same locality-sensitive hash value (that is, highly similar smart contracts) are grouped into one class. The user may randomly pick one smart contract from each cluster to infer the type of the smart contracts in that cluster. If a new locality-sensitive hash value is found, a new class of smart contract has appeared on the blockchain.
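By way of non-limiting illustration, steps S4 and S5 may be sketched in Python as follows. The function name `cluster_by_hash`, the variable names, and the toy hash matrices are assumptions introduced only for this sketch and do not appear in the embodiment; NumPy is assumed to be available.

```python
from collections import defaultdict

import numpy as np


def cluster_by_hash(h_code: np.ndarray, h_tx: np.ndarray) -> dict:
    """Concatenate two hash matrices and group contracts with identical rows."""
    # Step S4: concatenate column-wise so that row i holds the full hash of contract i.
    h_full = np.hstack([h_code, h_tx])
    clusters = defaultdict(list)
    # Step S5: contracts whose row vectors are identical fall into the same class.
    for i, row in enumerate(h_full):
        clusters[tuple(int(v) for v in row)].append(i)
    return dict(clusters)


# Toy hash matrices for four contracts (values chosen by hand for illustration).
h_code = np.array([[1, 0], [1, 0], [0, 1], [1, 0]])
h_tx = np.array([[0, 1], [0, 1], [1, 1], [1, 1]])
clusters = cluster_by_hash(h_code, h_tx)
```

In this toy input, contracts 0 and 1 share a hash value and form one class, while contracts 2 and 3 each form a class of their own; contract 3 agrees with contract 0 on the code hash but differs on the transaction hash, which shows why both parts are combined.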
It will be appreciated that a code feature matrix and a transaction feature matrix are constructed from the code information and the transaction information of a plurality of smart contracts, and locality-sensitive hashing is applied to each to obtain a code feature locality-sensitive hash matrix and a transaction feature locality-sensitive hash matrix, which represent the similarity of the code information and of the transaction information across the smart contracts, respectively. The two hash matrices are then concatenated to obtain the final smart contract locality-sensitive hash matrix. Compared with techniques that cluster smart contracts using machine learning, the method requires no labeled data to train a model, requires no manual data labeling, and reduces the dependence on large numbers of training samples that training a machine learning classification model entails. The method also adapts better to new types of smart contracts: a machine-learning-based method must learn again from samples of the new type in order to refine its features, whereas this method only needs to compute the locality-sensitive hash value of the new contract according to the same rule. If that value differs from all existing locality-sensitive hash values, the contract is placed in a class of its own, and the new type of smart contract can then be identified by user inspection.
Furthermore, the method classifies in matrix form by means of locality-sensitive hashing, a technique whose similarity computation scales linearly, so classification is more efficient and better suited to large-scale, frequent smart contract clustering scenarios. In addition, the method combines code information and transaction information, so the features used are richer and more comprehensive than those of existing non-machine-learning smart contract clustering techniques.
Referring to fig. 2, step S1 specifically includes:
Step S11: acquiring a plurality of smart contracts of a blockchain, and generating an abstract syntax tree for all the smart contracts; and
Step S12: traversing the abstract syntax tree, parsing and recording the information of each node of the abstract syntax tree, and obtaining the corresponding code information and transaction information, respectively.
It is to be understood that in step S11, specifically, the solidity-antlr4 tool is used to generate an abstract syntax tree for all the smart contracts, where the abstract syntax tree has a plurality of nodes and each node represents a syntactic structure in the contract.
It is understood that in step S12, the abstract syntax tree is traversed, and the information in each node of the syntax tree is parsed and recorded, for example querycoctrbalance, which represents an operation that queries the contract balance, Transfer, which represents a transfer operation, IfStatement, which represents a conditional statement, and so on.
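By way of non-limiting illustration, the traversal of step S12 may be sketched in Python as follows. The nested-dictionary tree shape (a "type" key and an optional "children" list), the function name `collect_node_types`, and the toy tree are assumptions made only for this sketch; they do not reflect the actual output format of any particular parser.

```python
from collections import Counter


def collect_node_types(node: dict) -> Counter:
    """Depth-first traversal that counts how often each node type occurs."""
    counts = Counter()
    stack = [node]
    while stack:
        current = stack.pop()
        counts[current["type"]] += 1
        # Leaf nodes have no "children" key, so default to an empty list.
        stack.extend(current.get("children", []))
    return counts


# Hypothetical tree: one contract with one function whose `if` statement
# performs a transfer.
toy_ast = {
    "type": "ContractDefinition",
    "children": [
        {
            "type": "FunctionDefinition",
            "children": [
                {"type": "IfStatement", "children": [{"type": "Transfer"}]},
            ],
        }
    ],
}
node_counts = collect_node_types(toy_ast)
```

The keys of `node_counts` are the node types recorded for the contract; an explicit stack is used instead of recursion so that deeply nested contracts do not hit the interpreter's recursion limit.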
It is understood that the transaction information is also obtained according to the above steps S11-S12, and will not be described herein.
It is understood that steps S11-S12 are only one embodiment of this example, and the embodiment is not limited to steps S11-S12.
Referring to fig. 3, step S2 specifically includes:
Step S21: obtaining the code feature matrix for all the smart contracts based on the nodes of the abstract syntax tree and the corresponding code information; and
Step S22: obtaining the transaction feature matrix for all the smart contracts based on the nodes of the abstract syntax tree and the corresponding transaction information.
It can be understood that in step S21, the node types present in each contract are counted to obtain a 0/1 code feature matrix M_c of size z × m, where z is the number of contracts and m is the number of node types. The element in row i and column j of M_c indicates whether the i-th smart contract has the j-th node type: it equals 1 if so, and 0 otherwise.
It is understood that steps S21-S22 are only one embodiment of this example, and the embodiment is not limited to steps S21-S22.
Optionally, referring to fig. 4, as an embodiment, step S21 specifically includes:
Step S211: acquiring all nodes of the abstract syntax tree, extracting the type of each node, and removing repeated node types; and
Step S212: counting the remaining node types and constructing the corresponding code feature matrix of the smart contracts.
It can be understood that in step S211, repeated node types are removed, so that repeated nodes are treated as the same type, which reduces subsequent computation and improves classification efficiency.
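By way of non-limiting illustration, steps S211 and S212 may be sketched in Python as follows, producing a 0/1 matrix with one row per contract and one column per node type. The function name `build_feature_matrix`, the alphabetical column order, and the example node-type lists are assumptions made only for this sketch; NumPy is assumed to be available.

```python
import numpy as np


def build_feature_matrix(contract_node_types):
    """Return (M_c, vocabulary) for a list of per-contract node-type lists."""
    # Step S211: drop repeated node types within each contract.
    unique_types = [set(types) for types in contract_node_types]
    # Fix a column order over all node types seen in any contract.
    vocabulary = sorted(set().union(*unique_types))
    column = {name: j for j, name in enumerate(vocabulary)}
    # Step S212: M_c[i, j] = 1 if contract i has node type j, else 0.
    m_c = np.zeros((len(unique_types), len(vocabulary)), dtype=np.int8)
    for i, types in enumerate(unique_types):
        for name in types:
            m_c[i, column[name]] = 1
    return m_c, vocabulary


contracts = [
    ["IfStatement", "Transfer", "Transfer"],
    ["IfStatement"],
    ["ForStatement", "Transfer"],
]
m_c, vocabulary = build_feature_matrix(contracts)
```

Here z = 3 and m = 3; the repeated Transfer node in the first contract contributes a single 1, which is the effect of the de-duplication in step S211.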
Referring to fig. 5, step S3 specifically includes:
Step S31: generating a random matrix, wherein the number of rows of the random matrix is equal to the number of columns of the code feature matrix, and the number of columns of the random matrix is a user preset value.
Step S32: multiplying the code feature matrix by the random matrix to obtain a code feature random matrix, and normalizing the code feature random matrix to obtain the code feature locality-sensitive hash matrix; and
Step S33: multiplying the transaction feature matrix by a random matrix to obtain a transaction feature random matrix, and normalizing the transaction feature random matrix to obtain the transaction feature locality-sensitive hash matrix.
It is understood that in step S31, a 0/1 random matrix V_c of dimension m × r is generated, where m is the number of columns of the code feature matrix M_c (that is, the number of node types) and r controls how strict the clustering is: a larger r gives stricter clustering, and a smaller r gives looser clustering.
It is to be understood that in step S32, the code feature matrix M_c is multiplied by the random matrix V_c to obtain the code feature random matrix H_c of size z × r. Each element of H_c is then set to 1 if it is greater than a threshold t and to 0 otherwise, which yields the final code feature locality-sensitive hash matrix H'_c. The i-th row vector of H'_c is the locality-sensitive hash value of the code features of the i-th smart contract. H'_c is the normalized form of H_c, and this normalization speeds up clustering.
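By way of non-limiting illustration, steps S31 and S32 may be sketched in Python as follows. The function name `lsh_matrix`, the fixed random seed, and the example values r = 4 and t = 0 are assumptions made only for this sketch; the embodiment does not prescribe how the random matrix is drawn or how the threshold is chosen. NumPy is assumed to be available.

```python
import numpy as np


def lsh_matrix(features: np.ndarray, r: int, t: float, seed: int = 0) -> np.ndarray:
    """Project a z x m binary feature matrix to a z x r binary hash matrix."""
    rng = np.random.default_rng(seed)
    m = features.shape[1]
    # Step S31: random 0/1 matrix with m rows and r columns.
    v = rng.integers(0, 2, size=(m, r))
    # Step S32: the matrix product has shape z x r.
    h = features @ v
    # Normalization: entries greater than the threshold t become 1, all others 0.
    return (h > t).astype(np.int8)


# Toy code feature matrix: contracts 0 and 1 have identical features.
m_c = np.array([
    [0, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])
h_code = lsh_matrix(m_c, r=4, t=0)
```

Because the same random matrix is applied to every row, contracts with identical feature rows always receive identical hash rows, whatever random matrix is drawn. The same function could be applied to a transaction feature matrix for step S33.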
It is understood that steps S31-S33 are only one embodiment of this example, and the embodiment is not limited to steps S31-S33.
Referring to fig. 6, a second embodiment of the present invention provides a smart contract classification system based on locality-sensitive hashing, which uses the smart contract classification method based on locality-sensitive hashing provided in the first embodiment, and includes:
a smart contract acquisition unit 1, used for acquiring a plurality of smart contracts from a blockchain and extracting code information and transaction information from each smart contract;
a feature matrix construction unit 2, used for constructing a code feature matrix based on the code information and constructing a transaction feature matrix based on the transaction information;
a locality-sensitive hashing unit 3, used for performing locality-sensitive hashing on the code feature matrix to obtain a code feature locality-sensitive hash matrix, and performing locality-sensitive hashing on the transaction feature matrix to obtain a transaction feature locality-sensitive hash matrix;
a matrix concatenation unit 4, used for concatenating the code feature locality-sensitive hash matrix and the transaction feature locality-sensitive hash matrix to obtain a smart contract locality-sensitive hash matrix; and
a contract classification unit 5, used for classifying the smart contracts based on the row vectors of the smart contract locality-sensitive hash matrix to obtain smart contracts of different types.
Referring to fig. 7, the smart contract acquisition unit 1 further includes:
a syntax tree construction unit 11, configured to acquire a plurality of smart contracts of a blockchain and generate an abstract syntax tree for all the smart contracts; and
a node traversal unit 12, configured to traverse the abstract syntax tree, parse and record the information of each node of the abstract syntax tree, and obtain the corresponding code information and transaction information, respectively.
Referring to fig. 8, the locality-sensitive hashing unit 3 further includes:
a random matrix generation unit 31, configured to generate a random matrix, where the number of rows of the random matrix is equal to the number of columns of the code feature matrix, and the number of columns of the random matrix is a user preset value;
a code matrix multiplication unit 32, configured to multiply the code feature matrix by the random matrix to obtain a code feature random matrix, and normalize the code feature random matrix to obtain the code feature locality-sensitive hash matrix; and
a transaction matrix multiplication unit 33, configured to multiply the transaction feature matrix by a random matrix to obtain a transaction feature random matrix, and normalize the transaction feature random matrix to obtain the transaction feature locality-sensitive hash matrix.
Referring to fig. 9, a third embodiment of the present invention provides an electronic device, which includes a memory 10 and a processor 20. The memory 10 stores a computer program configured to execute, when run, the steps of any one of the embodiments of the smart contract classification method based on locality-sensitive hashing. The processor 20 is configured to execute, by means of the computer program, the steps of any one of the embodiments of the smart contract classification method based on locality-sensitive hashing.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.