CN118796932A

CN118796932A - Data synchronization method, device, equipment and storage medium

Info

Publication number: CN118796932A
Application number: CN202311338585.3A
Authority: CN
Inventors: 孙宇; 朱沛东; 刘春游; 金文萱; 张清华; 王飞; 牛亚宾; 严明; 李�杰
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Financial Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Financial Technology Co Ltd
Priority date: 2023-10-16
Filing date: 2023-10-16
Publication date: 2024-10-18

Abstract

The invention belongs to the technical field of data processing, and discloses a data synchronization method, device, equipment and storage medium. When generation of a data increment message is detected in any data middleware, a data synchronization strategy among the data centers is acquired, and data synchronization among the data centers is carried out according to the data increment message based on that strategy. Because the data synchronization strategy is set according to the architecture identification result among the data centers, and synchronization is then carried out based on the strategy and the data increment message, a plurality of different architectures can be supported, an architecture can be switched with only a small amount of processing, and the difficulty of architecture switching is reduced.

Description

Data synchronization method, device, equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data synchronization method, apparatus, device, and storage medium.

Background

A multi-center system architecture is a set of complete, relatively independent running systems established across a plurality of data centers in different geographic locations. Data is synchronized among the systems with low latency to ensure high availability and data consistency. The mainstream schemes today are a core data center, unitized database and table sharding, distributed databases and the like, with manual intervention used where necessary to resolve data conflicts.

However, each of these schemes has drawbacks. The core data center adopts a unidirectional data synchronization mode, which avoids row conflicts during synchronization and is usually used as a master-slave (disaster recovery, cold standby) scheme, but it cannot meet the requirement of synchronizing data among multiple master systems. To satisfy strong consistency, all data modification operations are completed on the master library, which increases the pressure on the master library, and if the master node goes down, the slave nodes also face failure. Unitized data sharding can solve the synchronization conflict problem and meet the requirements of multiple data centers, but it increases the response time of complex queries such as aggregation operations and the complexity of operations such as data migration and index reconstruction. A distributed database performs very well in all respects, but transforming an existing service system to adopt one takes a considerable time at very high cost; the overall implementation difficulty is too high and the conditions too demanding, so it is impractical for most requirements that do not involve a rebuild. Manual intervention requires operation and maintenance personnel on the project, so the labor cost is extremely high.

Therefore, enterprises need to adjust the connection scheme among data centers when facing different scenarios. Data systems are commonly built around a single architecture and synchronization scheme, so when a project faces requirements such as performance bottlenecks, technical reconstruction or multi-center construction, a large amount of resources must be committed to make major system changes. Take dual-center construction as an example: a project usually starts with a primary/standby scheme (remote disaster recovery or same-city disaster recovery). If business development later requires the project to become a dual-active system, the original system has to coordinate a large amount of resources and manpower, for example to estimate and apply for database network capacity, synchronization bandwidth and dedicated lines, to build bidirectional data synchronization, and to carry out the additional construction caused by data conflicts. The upgrade work is very complicated, and adjusting the architecture is very difficult.

Disclosure of Invention

The invention mainly aims to provide a data synchronization method, a device, equipment and a storage medium, and aims to solve the technical problem that the architecture is difficult to switch due to complicated flow when the architecture of a data center is switched in the prior art.

To achieve the above object, the present invention provides a data synchronization method, including the steps of:

When generation of a data increment message in any data middleware is detected, acquiring a data synchronization strategy among the data centers, wherein a data middleware is arranged in each data center, the data middleware generates the data increment message according to the changed data when a data change occurs in its data center, and the data synchronization strategy is set according to an architecture identification result among the data centers;

and realizing data synchronization among the data centers according to the data increment message based on the data synchronization strategy.

Optionally, the data synchronization policy includes a unidirectional synchronization policy;

the step of implementing data synchronization between the data centers according to the data increment message based on the data synchronization strategy comprises the following steps:

generating a transaction message according to the data increment message;

and adding the transaction message into a message queue so as to enable a consumer of the message queue to perform data synchronization on a data center corresponding to the consumer according to the transaction message.

Optionally, the data synchronization policy includes a bidirectional synchronization policy;

the step of implementing data synchronization between the data centers according to the data increment message based on the data synchronization strategy comprises the following steps:

acquiring current time information and generating a random number;

constructing a high-precision timestamp according to the current time information and the random number;

and adding the data increment message to an ordered set with the high-precision timestamp as the ordering basis, so that a consumer corresponding to the ordered set can broadcast and distribute the data increment messages in the ordered set in set order, and perform data synchronization on each data center.

Optionally, the data synchronization policy includes a star synchronization policy;

the step of implementing data synchronization between the data centers according to the data increment message based on the data synchronization strategy comprises the following steps:

acquiring a distributed lock and detecting whether the distributed lock exists in a shared cache;

if not, writing the distributed lock into the shared cache;

after the writing is successful, broadcasting and distributing the data increment message so as to synchronize the data of each data center;

and after synchronization is complete, removing the distributed lock from the shared cache.

Optionally, the step of obtaining the distributed lock includes:

Acquiring a data increment type corresponding to the data increment message;

determining a lock construction field and a lock construction mode according to the data increment type;

acquiring field data corresponding to each lock construction field;

and splicing the field data based on the lock construction mode to obtain the distributed lock.

Optionally, the step of acquiring a data synchronization policy between each data center includes:

Acquiring data center codes corresponding to all connected data centers, wherein the data center codes are unique identification codes for identifying the data centers;

Aggregating the data center codes corresponding to the data centers to obtain a center code set;

performing data deduplication on the center code set to obtain a deduplication code set;

detecting the number of codes in the deduplication code set;

If the number of codes is larger than the preset number, judging that the data synchronization strategy among the data centers is a star-shaped synchronization strategy.

Optionally, after the step of detecting the number of codes in the deduplication code set, the method further includes:

If the number of codes is smaller than or equal to the preset number, analyzing the master-slave configuration files in each data center to obtain master-slave configuration information corresponding to each data center;

determining master-slave directional relation among all data centers according to the master-slave configuration information;

if the master-slave directional relation is a unidirectional directional relation, judging that the data synchronization strategy among the data centers is a unidirectional synchronization strategy;

and if the master-slave directional relation is a bidirectional directional relation, judging that the data synchronization strategy among the data centers is a bidirectional synchronization strategy.
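The strategy identification steps above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the invention: the function name `identify_strategy`, the `replicates_from` mapping (a hypothetical in-memory stand-in for the parsed master-slave configuration files) and the default preset number of 2 are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set


def identify_strategy(center_codes: Iterable[str],
                      replicates_from: Dict[str, Set[str]],
                      preset_number: int = 2) -> str:
    """Return 'star', 'bidirectional' or 'unidirectional'.

    center_codes    -- codes reported by every connected data center
    replicates_from -- hypothetical parsed master-slave configuration:
                       maps a data center code to the set of centers
                       it replicates from (its masters)
    preset_number   -- assumed threshold; 2 is used here for illustration
    """
    # Aggregate the codes and deduplicate them.
    dedup_codes = set(center_codes)

    # More distinct centers than the preset number: star strategy.
    if len(dedup_codes) > preset_number:
        return "star"

    # Otherwise inspect the master-slave pointing relation.
    for slave, masters in replicates_from.items():
        for master in masters:
            # If the master also replicates from the slave, the
            # relation points both ways.
            if slave in replicates_from.get(master, set()):
                return "bidirectional"
    return "unidirectional"
```

For instance, two centers where only `dc-b` replicates from `dc-a` would be classified as unidirectional, while three distinct center codes would be classified as star regardless of the configuration.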

In addition, in order to achieve the above object, the present invention also provides a data synchronization device, which includes the following modules:

the detection module is used for acquiring a data synchronization strategy among the data centers when detecting that a data increment message is generated in any data middleware, wherein a data middleware is arranged in each data center, the data middleware generates the data increment message according to the changed data when a data change occurs in its data center, and the data synchronization strategy is set according to an architecture identification result among the data centers;

And the synchronization module is used for realizing data synchronization among the data centers according to the data increment message based on the data synchronization strategy.

In addition, to achieve the above object, the present invention also proposes a data synchronization apparatus including: a processor, a memory and a data synchronization program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the data synchronization method as described above.

In addition, in order to achieve the above object, the present invention also proposes a computer-readable storage medium having stored thereon a data synchronization program which, when executed, implements the steps of the data synchronization method as described above.

According to the invention, when generation of a data increment message is detected in any data middleware, a data synchronization strategy among the data centers is acquired, and data synchronization among the data centers is carried out according to the data increment message based on that strategy. Because the data synchronization strategy is set according to the architecture identification result among the data centers, and synchronization is then carried out based on the strategy and the data increment message, a plurality of different architectures can be supported, an architecture can be switched with only a small amount of processing, and the difficulty of architecture switching is reduced.

Drawings

FIG. 1 is a schematic diagram of an electronic device of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart of a data synchronization method according to a first embodiment of the present invention;

FIG. 3 is a schematic diagram of an apparatus according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating attributes of a log resolution object according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a unidirectional synchronization process according to an embodiment of the invention;

FIG. 6 is a schematic diagram of a bidirectional synchronization flow according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a star synchronization flow according to an embodiment of the present invention;

FIG. 8 is a flowchart of a second embodiment of the data synchronization method of the present invention;

FIG. 9 is a block diagram of a first embodiment of a data synchronization device according to the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to fig. 1, fig. 1 is a schematic structural diagram of the electronic device in the hardware operating environment according to an embodiment of the present invention.

As shown in fig. 1, the electronic device may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.

Those skilled in the art will appreciate that the structure shown in fig. 1 does not limit the electronic device, which may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components.

As shown in fig. 1, an operating system, a network communication module, a user interface module, and a data synchronization program may be included in the memory 1005 as one type of storage medium.

In the electronic device shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 in the electronic device of the present invention may be disposed in a data synchronization device, where the electronic device invokes a data synchronization program stored in the memory 1005 through the processor 1001 and executes a data synchronization method provided by an embodiment of the present invention.

An embodiment of the present invention provides a data synchronization method, referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the data synchronization method of the present invention.

In this embodiment, the data synchronization method includes the following steps:

Step S10: when generation of a data increment message in any data middleware is detected, acquiring a data synchronization strategy among the data centers.

It should be noted that the execution body of this embodiment may be the data synchronization device, or a cluster formed by a plurality of data synchronization devices. The data synchronization device may be an electronic device such as a personal computer or a server, or any other electronic device capable of implementing the same or similar functions, which is not limited in this embodiment. In this embodiment and the embodiments below, the data synchronization method of the present invention is described taking the data synchronization device as an example.

It should be noted that, the data synchronization device may be connected to a plurality of data centers at the same time, and each data center may be provided with a data middleware, and when data change occurs in the data center, the data middleware generates a data increment message according to the changed data. The data synchronization device can automatically identify the connection architecture among the connected data centers and set a corresponding data synchronization strategy according to the architecture identification result.

For example, assuming the database cluster in the data center is a MySQL cluster, the data middleware in the data center may be canal middleware. The canal middleware disguises itself as a MySQL slave and treats the database cluster in the data center as the MySQL master: it simulates the MySQL slave interaction protocol and sends a synchronization (dump) request to the database cluster. After receiving the dump request, the MySQL master sends the data change log (binary log) to the canal middleware, which parses the binary log into a binary log object and generates a data increment message from that object.

Generating the data increment message from the binary log object may consist of formatting the binary log object, for example formatting it into JSON and using the resulting JSON string as the data increment message.

For ease of understanding, the description will now be given with reference to fig. 3 and 4, but the present solution is not limited thereto. Fig. 3 is a schematic diagram of an apparatus structure of the present embodiment, and fig. 4 is a schematic diagram of a log parsing object attribute of the present embodiment.

As shown in fig. 3, the data synchronization device, or a cluster composed of a plurality of data synchronization devices, may include a RabbitMQ cluster, a Process module, a Redis cluster and a plurality of Agents, and is connected to the canal middleware and the database cluster of each data center through the Agents;

Database cluster: includes mainstream relational databases such as MySQL and Oracle. In principle any database that synchronizes through a binlog is supported; because the canal middleware is used to collect the database binlog, the default database is MySQL;

Canal middleware: provides incremental data subscription and consumption based on parsing the MySQL database incremental log;

Agent component: the Agent component may be implemented in Java and is the key component for data synchronization. It maintains a queue that stores SQL statements, together with the node information of the Process module. It periodically probes the Process synchronization module and the canal middleware to keep the node state alive. After canal parses the binlog, a corresponding data output stream is generated; after capturing the data output stream, the Agent component determines the table the operation applies to, the executed SQL and so on, and caches the data in its internal queue. It then sends a request to the Redis cluster, where a hash calculation on the table name determines where the data lands, and the data is stored in a zset data structure. The Agent component also sends a data processing request to the Process module; after the Process module has processed the data, it calls back all Agents in the system to complete data synchronization;

Process synchronization module: the Process module is implemented in Java and is the synchronization processing center, responsible for data pulling, data distribution, resource release and logging. It uses CAS optimistic locking to control the synchronization logic and has a built-in ecache in-process cache framework to store data. The Process module probes the Agent components every 5 seconds to check the system state, and after receiving a data processing request from an Agent, it synchronizes the data to all Agent components in the system in order;

Redis cluster: the Redis cluster may adopt a six-master, six-slave structure. Data is divided into 16384 hash slots, numbered 0 to 16383, which are distributed across the Redis nodes. When an Agent stores data in Redis, a hash value is calculated from the key (the table name), for example with the CRC16 algorithm, to determine which slot the data falls in.
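The slot calculation described above can be sketched as follows. This is a minimal illustration, assuming the CRC16 variant used by Redis Cluster (XMODEM, polynomial 0x1021, initial value 0) and ignoring the hash tag (`{...}`) rule of Redis Cluster keys; the function names `crc16_xmodem` and `hash_slot` are chosen for the example only.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0x0000."""
    crc = 0
    for byte in data:
        # Fold the next byte into the high byte of the register.
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Map a key (here, a table name) to one of the 16384 hash slots."""
    return crc16_xmodem(key.encode("utf-8")) % 16384


if __name__ == "__main__":
    # All changes to the same table land in the same slot, so they end
    # up in the same ordered set on the same Redis node.
    print(hash_slot("User"), hash_slot("Order"))
```

Because the key is the table name, every change to a given table maps to the same slot, which is what allows a single ordered set to hold that table's changes in order.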

As shown in fig. 4, the parsed binary log object may include a plurality of fields: the database field stores the identifier of the database in which the change occurred; the table field stores the name of the data table in which the change occurred; the type field indicates the type of the data change (which may include INSERT, UPDATE and DELETE); the es field stores the time taken by the database cluster to synchronize the data change; the sql field may store the SQL statement that caused the data change; the data field stores the data after the change; and the old field stores the data before the change.
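As an illustration of how such an object might be formatted into a JSON data increment message, consider the sketch below. The field names follow the description of fig. 4; the function name `build_increment_message`, the value types and the sample values are assumptions made for the example and are not taken from the drawings.

```python
import json
from typing import Any, Dict, List

ALLOWED_TYPES = {"INSERT", "UPDATE", "DELETE"}


def build_increment_message(database: str, table: str, change_type: str,
                            es: int, sql: str,
                            data: List[Dict[str, Any]],
                            old: List[Dict[str, Any]]) -> str:
    """Format a parsed binary log object as a JSON string."""
    if change_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported change type: {change_type}")
    message = {
        "database": database,  # database in which the change occurred
        "table": table,        # table in which the change occurred
        "type": change_type,   # INSERT, UPDATE or DELETE
        "es": es,              # time value recorded for the change
        "sql": sql,            # statement that caused the change
        "data": data,          # row data after the change
        "old": old,            # row data before the change
    }
    return json.dumps(message, ensure_ascii=False, sort_keys=True)


if __name__ == "__main__":
    print(build_increment_message(
        "shop", "User", "UPDATE", 12,
        "UPDATE User SET name='B' WHERE id=1000",
        [{"id": 1000, "name": "B"}], [{"name": "A"}]))
```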

Step S20: implementing data synchronization among the data centers according to the data increment message based on the data synchronization strategy.

It should be noted that the data synchronization policy may include a unidirectional synchronization policy, a bidirectional synchronization policy, a star synchronization policy, and the like, and of course, more synchronization policies may be set according to actual needs, which is not limited in this embodiment.

In practical use, different data synchronization strategies can correspond to different data synchronization modes, and the data synchronization between the data centers can be realized according to the data increment information based on the data synchronization strategies.

In a specific implementation, if the data synchronization policy is a unidirectional synchronization policy, the data centers synchronize data in master-slave mode (i.e., one data center is the master node, the other data centers are slave nodes, and data is synchronized from the master node to the slave nodes). In this case, step S20 in this embodiment may include:

generating a transaction message according to the data increment message;

and adding the transaction message to a message queue, so that a consumer of the message queue performs data synchronization on the data center corresponding to the consumer according to the transaction message.

It should be noted that the message queue may be a message queue set up in the RabbitMQ cluster, and generating the transaction message according to the data increment message may consist of converting the data increment message into a data structure that can be stored in the message queue. The data center corresponding to a consumer may be the data center docked with that consumer.

In practical use, when the unidirectional synchronization strategy is adopted, the data centers synchronize data in master-slave mode. The data center that generates the data increment message is the master node, and the Agent docked with it is the producer of the message queue; the other data centers that need to be synchronized are slave nodes, and the Agents docked with them are consumers of the message queue. After a transaction message is stored in the message queue, a consumer can extract the data increment message from the message queue and synchronize its corresponding data center accordingly (for example, by controlling that data center to execute the SQL statement in the data increment message).

For ease of understanding, refer to fig. 3 together with fig. 5, which is a schematic diagram of the unidirectional synchronization flow in this embodiment. As shown in fig. 5, each data center is docked with an Agent, and each Agent is started as an MQ client. The difference is that the Agent docked with the master node (the data center acting as master) connects as a message producer, while the Agents docked with slave nodes (the data centers acting as slaves) connect as message consumers.

After acquiring the data increment message transmitted by the canal middleware, the Agent docked with the master node processes it and sends the generated transaction message to the MQ server (i.e., the RabbitMQ cluster in fig. 3), where it is stored in a message queue. For data tables with heavy traffic and a high volume of write requests, a queue may be preset in the MQ server so that one table corresponds to one queue, and the ordering of the queue is used to keep the data synchronized in order.

A consumer (an Agent docked with a slave node) may listen to the message queue, pull a transaction message from it, and synchronize locally based on the transaction message (i.e., synchronize data for the docked slave node). To ensure accurate delivery and consumption of messages, automatic ACK may be turned off for the message queue and transaction messages turned on, so that a transaction message is removed from the message queue only after the consumer Agent has completed local data synchronization. To ensure the reliability of data synchronization, the consumer Agent may also save the synchronization progress of each table and send it to the Process module through an asynchronous request.
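The consumer-side behaviour described above (one queue per table, removal from the queue only after local synchronization succeeds) can be sketched with an in-memory stand-in for the MQ server. The class `TableQueues` and the function `consume` are hypothetical names for this illustration and do not correspond to any RabbitMQ client API; a real deployment would use the message broker's own manual acknowledgement mechanism.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


class TableQueues:
    """In-memory stand-in for per-table FIFO queues with manual ACK."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}

    def publish(self, table: str, transaction_message: str) -> None:
        # Producer side: one queue per table preserves per-table order.
        self._queues.setdefault(table, deque()).append(transaction_message)

    def peek(self, table: str) -> Optional[str]:
        # Deliver the head message without removing it (no automatic ACK).
        queue = self._queues.get(table)
        return queue[0] if queue else None

    def ack(self, table: str) -> None:
        # Remove the head message once the consumer confirms it.
        self._queues[table].popleft()


def consume(queues: TableQueues, table: str,
            apply_locally: Callable[[str], None]) -> int:
    """Apply queued messages in order; ACK each one only after success."""
    applied = 0
    while True:
        message = queues.peek(table)
        if message is None:
            return applied
        # If this raises, the message stays at the head of the queue and
        # can be retried later without losing its position.
        apply_locally(message)
        queues.ack(table)
        applied += 1
```

Acknowledging only after the local apply succeeds means a failed synchronization leaves the message in place, so ordering within a table is not disturbed by retries.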

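In a specific implementation, if the data synchronization policy is a bidirectional synchronization policy, the two data centers synchronize with each other. In this case, step S20 in this embodiment may include:

acquiring current time information and generating a random number;

constructing a high-precision timestamp according to the current time information and the random number;

and adding the data increment message to an ordered set with the high-precision timestamp as the ordering basis, so that a consumer corresponding to the ordered set can broadcast and distribute the data increment messages in the ordered set in set order, and perform data synchronization on each data center.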

It should be noted that, the current time information may be a timestamp corresponding to the current time, and the ordered set may be a zset set in redis.

In practical use, if the data synchronization policy is a bidirectional synchronization policy, it means that bidirectional synchronization is performed between two data centers at this time, for example: assuming that the data center includes a and B, if the data synchronization policy is a bidirectional synchronization policy, the data may be synchronized from a to B or from B to a at this time.

Because data synchronization runs continuously, the time ordering of the data (i.e., the order in which updates normally occur) must be preserved to avoid data disorder, so the current time information is acquired. Since several data changes may occur at the same moment, a random number is generated alongside the current time information so that changes sharing the same time can still be distinguished, and a high-precision timestamp is then constructed from the two, for example by splicing the random number after the current time information: assuming the current time information is "a" and the random number is "336", the high-precision timestamp is "a336".

In a specific implementation, adding the data increment message to the ordered set with the high-precision timestamp as the ordering basis may mean adding the message to the ordered set with the high-precision timestamp as its score. The consumer corresponding to the ordered set (such as the Process module in fig. 3) can then extract the data increment messages from the ordered set in set order and broadcast them in turn to synchronize each data center, which achieves data synchronization among the data centers and preserves time ordering during bidirectional synchronization.

For ease of understanding, refer to fig. 3 together with fig. 6, which is a schematic diagram of the bidirectional synchronization flow in this embodiment. As shown in fig. 6, an Agent may use the TIME command to obtain a timestamp (i.e., the current time information) from the Redis cluster and combine it with a locally generated random number to produce a high-precision timestamp. It then sends the data increment message (or only the SQL statement in it) to a zset (ordered set) in Redis for storage, using the timestamp as the score; the zset orders the stored data by score, and the data table to which the operation data belongs may be distributed across the Redis nodes by a hash algorithm. The Process module then pulls data from the ordered set and distributes it by broadcast to every Agent node in the system for data synchronization. When pulling data, the Process module may use the ZREVRANGEBYSCORE command to obtain the data corresponding to the latest timestamp from the ordered set.
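The timestamp construction and ordered storage can be sketched as follows. This is an illustration under assumptions rather than the described system: the time is taken in milliseconds, the random number is assumed to have three digits, and `OrderedSet` is a hypothetical in-memory stand-in for a Redis zset that hands messages back in ascending score order.

```python
import bisect
import random
import time
from typing import List, Optional, Tuple


def high_precision_timestamp(now_ms: Optional[int] = None,
                             rand: Optional[int] = None) -> int:
    """Splice a three-digit random number after a millisecond timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if rand is None:
        rand = random.randint(0, 999)
    # Multiplying by 1000 appends three decimal digits for the random part.
    return now_ms * 1000 + rand


class OrderedSet:
    """Minimal stand-in for a sorted set keyed by score."""

    def __init__(self) -> None:
        self._items: List[Tuple[int, str]] = []

    def add(self, score: int, member: str) -> None:
        # Keep the list sorted by (score, member) on every insert.
        bisect.insort(self._items, (score, member))

    def pop_oldest(self) -> Optional[str]:
        # Return the member with the smallest score, or None when empty.
        return self._items.pop(0)[1] if self._items else None
```

With a current epoch time in milliseconds, the combined value stays below 2^53, so it can still be represented exactly as a double-precision sorted-set score. The random suffix only separates changes that share the same millisecond; it does not establish a true order between them.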

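In a specific implementation, if the data synchronization policy is a star synchronization policy, data is synchronized among three or more data centers. In this case, step S20 in this embodiment may include:

acquiring a distributed lock and detecting whether the distributed lock exists in a shared cache;

if not, writing the distributed lock into the shared cache;

after the writing is successful, broadcasting and distributing the data increment message so as to synchronize the data of each data center;

and after synchronization is complete, removing the distributed lock from the shared cache.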

It should be noted that the shared cache may be the Redis cluster described above. If the data synchronization policy is a star synchronization policy, data is synchronized among three or more data centers. For example, assuming the data centers include A, B and C, data may be synchronized from A to B and C, from B to A and C, and from C to A and B. A distributed lock can prevent synchronization conflicts among the data centers, so the distributed lock is acquired first, and it is then detected whether the distributed lock already exists in the shared cache.

It will be appreciated that if the distributed lock is not present in the shared cache, the data is not currently being synchronized, so the data synchronization process may proceed. To avoid conflicts during synchronization, the distributed lock is written into the shared cache; any other synchronization process that detects the distributed lock in the shared cache suspends its synchronization of that data. After the distributed lock has been written successfully, the data increment message is broadcast and distributed to synchronize each data center, and after synchronization is complete, the distributed lock is removed from the shared cache so that it does not affect subsequent synchronization.
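The lock-protected synchronization described above can be sketched as follows. `SharedCache` is a hypothetical in-memory stand-in for the shared cache, and `synchronize_with_lock` is a name chosen for the example. The sketch merges the existence check and the write into a single set-if-absent step, which is a design choice of the example; with a real Redis deployment this is commonly done with `SET key value NX`.

```python
from typing import Callable, Dict


class SharedCache:
    """In-memory stand-in for the shared cache holding distributed locks."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def set_if_absent(self, key: str, value: str) -> bool:
        # Write the lock only if no other process already holds it.
        if key in self._store:
            return False
        self._store[key] = value
        return True

    def exists(self, key: str) -> bool:
        return key in self._store

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


def synchronize_with_lock(cache: SharedCache, lock_key: str, owner: str,
                          broadcast: Callable[[], None]) -> bool:
    """Broadcast the change under the lock; return False if it is held."""
    if not cache.set_if_absent(lock_key, owner):
        # Another process is synchronizing this data; the caller may
        # suspend and retry later.
        return False
    try:
        broadcast()
    finally:
        # Always release, so later synchronizations are not blocked.
        cache.delete(lock_key)
    return True
```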

In a specific implementation, to ensure that row conflicts are avoided, the distributed lock may be composed of different fields depending on the scenario. The step of acquiring the distributed lock in this embodiment may include:

acquiring a data increment type corresponding to the data increment message;

determining a lock construction field and a lock construction mode according to the data increment type;

acquiring field data corresponding to each lock construction field;

and splicing the field data based on the lock construction mode to obtain the distributed lock.

It should be noted that the data increment types may include data modification, table structure modification and server setting commands, where data modification may be further divided into insert, delete and update. Each data increment type may correspond to different lock construction fields and a different lock construction mode, and the correspondence between data increment types and lock construction fields and modes may be preset by the administrator of the data synchronization device according to actual needs.

For example: for the insert and delete data increment types, the lock construction fields may comprise a table name and a row ID, and the lock construction mode may be set as "table name:row ID". Assuming that the table name is User and the row ID is 1000, the distributed lock is User:1000. If the database uses an auto-increment operation during insert, the auto-increment sequence needs to be acquired first;

For the update data increment type, the lock construction fields may comprise a table name, a row ID and fields, and the lock construction mode may be set as "table name:row ID:fields". When there are multiple fields, their values are joined with underscores. Assuming that the table name is User, the row ID is 1000, and the three fields have the values A, B and C respectively, the distributed lock is User:1000:A_B_C;

For the table structure modification data increment type, the table name may be used as the distributed lock;

For the server command setting data increment type (e.g., setting session or global properties), the property may be used as the distributed lock.
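The lock construction rules listed above can be sketched as a small function. This is an illustrative sketch only: the type names (`insert`, `delete`, `update`, `alter_table`, `set_command`), the dictionary keys, and the colon separator between table name, row ID and field values are assumptions chosen for the example; only the underscore joining of field values and the choice of fields per type follow the description.

```python
from typing import Any, Dict


def build_lock_key(increment_type: str, fields: Dict[str, Any]) -> str:
    """Build a distributed lock key from the data increment type and its field data."""
    if increment_type in ("insert", "delete"):
        # Lock construction fields: table name and row ID.
        return f"{fields['table']}:{fields['row_id']}"
    if increment_type == "update":
        # Lock construction fields: table name, row ID and the changed field values,
        # with multiple values joined by underscores.
        joined = "_".join(str(value) for value in fields["values"])
        return f"{fields['table']}:{fields['row_id']}:{joined}"
    if increment_type == "alter_table":
        # Table structure modification: the table name alone is the lock.
        return str(fields["table"])
    if increment_type == "set_command":
        # Server command setting: the property being set is the lock.
        return str(fields["property"])
    raise ValueError(f"unsupported data increment type: {increment_type}")
```

For instance, `build_lock_key("update", {"table": "User", "row_id": 1000, "values": ["A", "B", "C"]})` yields a key that locks only that row and those fields, so unrelated rows of the same table can still be synchronized concurrently.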

For ease of understanding, the following description is based on fig. 3 above. Fig. 7 is a schematic diagram of the star synchronization flow in this embodiment. As shown in fig. 7, the Agent analyzes the scenario to generate a distributed lock and queries Redis to determine whether the distributed lock exists. If it exists, acquisition of the distributed lock is judged to have failed, which indicates that a process is already synchronizing the data; at this time the data strong consistency mode may be started, and the Agent spins while waiting to acquire the lock again;

If it does not exist, no process is synchronizing the changed row and acquisition of the distributed lock succeeds. The distributed lock may then be written into the Redis cluster, and the Process module is notified to start processing data synchronization. After receiving the notification, the Process may acquire the data to be synchronized (namely, the data increment message) from the ecache cache and broadcast and distribute it. After receiving the distributed data increment message, each Agent performs data synchronization according to the data increment message, commits the local transaction, and returns an ACK after the commit. Once the nodes have returned their ACKs, the Process considers the synchronization complete and releases the local cache and the distributed lock.

The synchronization may be judged to be complete if most nodes return an ACK. In addition, a CAS (compare-and-swap) optimistic lock is adopted in the Process module to restrict the data synchronization logic.
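The "most nodes return ACK" criterion can be sketched as a simple majority check. This is an assumption about how "most" is interpreted (strictly more than half of the nodes); the function name and the dictionary representation of ACKs are illustrative and not taken from this embodiment.

```python
from typing import Dict


def synchronization_complete(acks: Dict[str, bool]) -> bool:
    """Return True when a strict majority of nodes has acknowledged the commit.

    `acks` maps a node name to whether that node returned an ACK.
    """
    if not acks:
        return False  # nothing was sent, so nothing can be considered synchronized
    acknowledged = sum(1 for ok in acks.values() if ok)
    return acknowledged > len(acks) // 2
```

With three nodes, two ACKs suffice; with two nodes, both must acknowledge, because one out of two is not a strict majority.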

In this process, the I/O multiplexing and single-threaded model of Redis are utilized, so that high performance and thread safety of data operations are ensured. A consistent hash algorithm is implemented through the Redis cluster: each table is mapped onto the hash ring through hash calculation, which prevents data from accumulating on a single node.
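The idea of mapping each table onto a hash ring can be illustrated with a generic consistent hashing sketch. This is not the internal mechanism of a Redis cluster; the class name `HashRing`, the use of MD5, and the number of virtual points per node are assumptions made for the example.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Generic consistent hash ring with virtual points to spread tables evenly."""

    def __init__(self, nodes: Sequence[str], replicas: int = 100) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, table: str) -> str:
        """Return the node owning the first ring point clockwise from the table's hash."""
        index = bisect.bisect_right(self._points, _hash(table)) % len(self._ring)
        return self._ring[index][1]
```

Because every node contributes many points scattered around the ring, tables are distributed roughly evenly, and removing a node only remaps the tables that node owned.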

According to the embodiment, when the generation of the data increment message in any data middleware is detected, a data synchronization strategy among all data centers is obtained; and realizing data synchronization among the data centers according to the data increment message based on the data synchronization strategy. When the data increment message is detected to be generated in the data middleware, the data synchronization strategy is set according to the architecture identification result among the data centers, and then the data synchronization among the data centers is realized based on the data synchronization strategy and the data increment message, so that a plurality of different architectures can be supported, the architectures can be switched only by carrying out a small amount of processing, and the difficulty of architecture switching is reduced.

Referring to fig. 8, fig. 8 is a flowchart of a second embodiment of a data synchronization method according to the present invention.

Based on the above-mentioned first embodiment, the step S10 of the data synchronization method of this embodiment includes:

Step S101: when detecting that a data increment message is generated in any data middleware, acquiring the data center codes corresponding to the connected data centers.

It should be noted that the data center code may be an identification code that identifies the uniqueness of the data center.

In practical use, acquiring the data center codes corresponding to the connected data centers may be acquiring the data center code of each data center to which the data synchronization device is connected.

For example: as shown in fig. 3, the data centers are connected to Agents, and each Agent holds a globally unique ID and the data center code of the data center to which it is connected. The data center codes can therefore be read from the Agents, thereby obtaining the data center code corresponding to each connected data center.

Step S102: aggregating the data center codes corresponding to the data centers to obtain a center code set.

In practical use, aggregating the data center codes corresponding to the data centers to obtain a center code set may be adding the data center codes corresponding to the data centers to the same set, thereby obtaining the center code set.

Step S103: performing data deduplication on the center code set to obtain a deduplication code set.

It can be understood that, in practical applications, the same data center may be connected to a plurality of Agents. To accurately determine how many data centers are connected to the data synchronization device, data deduplication may be performed on the center code set so that no data center code in the set is repeated, and the deduplicated center code set is then used as the deduplication code set.

Step S104: the number of codes in the set of deduplication codes is detected.

It should be noted that, detecting the number of codes in the set of deduplication codes may be counting the number of data center codes in the set of deduplication codes, thereby obtaining the number of codes.

Step S105: if the number of codes is larger than the preset number, judging that the data synchronization strategy among the data centers is a star-shaped synchronization strategy.

It should be noted that the preset number may be preset by a manager of the data synchronization device. For example: two data centers are generally connected in a master-slave architecture or a dual-active architecture, whereas more than two data centers are generally connected in a multi-active architecture, so the preset number may be set to 2.

It can be understood that if the number of codes is greater than the preset number, the data centers connected to the data synchronization device are connected in a multi-active architecture, and data synchronization is performed among three or more data centers, so it can be determined that the data synchronization policy between the data centers is a star synchronization policy.
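Steps S101 to S105 can be sketched as follows. This is a minimal sketch under assumptions: the data center codes are taken to be strings already read from the Agents, the function name and the `"star"` label are illustrative, and `None` is returned when the star policy does not apply, because the master-slave versus dual-active distinction is made separately.

```python
from typing import Iterable, Optional

STAR_POLICY = "star"  # illustrative label, not a name defined by this embodiment


def detect_star_policy(
    agent_center_codes: Iterable[str], preset_number: int = 2
) -> Optional[str]:
    """Decide whether the star synchronization policy applies.

    `agent_center_codes` holds one data center code per connected Agent, so the
    same code may appear several times when a data center has several Agents.
    """
    center_code_set = list(agent_center_codes)        # S102: aggregate the codes
    deduplication_code_set = set(center_code_set)     # S103: remove duplicates
    number_of_codes = len(deduplication_code_set)     # S104: count distinct codes
    if number_of_codes > preset_number:               # S105: compare with preset number
        return STAR_POLICY
    return None  # two or fewer data centers: further identification is required
```

Counting distinct codes rather than Agents is what keeps a data center with several Agents from being mistaken for several data centers.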

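If the number of codes is less than or equal to the preset number, it is further necessary to distinguish whether the connection architecture between the data centers is a master-slave architecture or a dual-active architecture. Accordingly, after step S104 in this embodiment, the method may further include:

if the number of codes is less than or equal to the preset number, parsing the master-slave configuration file in each data center to obtain master-slave configuration information corresponding to each data center;

determining the master-slave directional relation among the data centers according to the master-slave configuration information;

if the master-slave directional relation is a unidirectional directional relation, judging that the data synchronization strategy among the data centers is a unidirectional synchronization strategy;

if the master-slave directional relation is a bidirectional directional relation, judging that the data synchronization strategy among the data centers is a bidirectional synchronization strategy.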

It should be noted that the master-slave configuration file may be a configuration file set in a data center cluster, such as a my.cnf configuration file. The master-slave configuration file at least comprises fields such as server-id, log-bin, binlog-do-db, replicate-do-db and read_only;

The server-id is used to specify a unique value for the server node; log-bin is used to specify whether binary logging is enabled; read_only is used to specify whether read-only control is applied, where the master node has read-write permission and the slave node has read-only permission (1 is read-only, 0 is read-write); replicate-do-db is used to specify the name of the database that needs to be replicated.

Additional fields may also be included, if necessary: master-host is used to specify a host address or host name; the master-user is used for designating a host user name; the master-password is used for specifying a host password; master-port is used to designate a host port.

In actual use, after the master-slave configuration information of each data center is obtained, the master-slave directional relation among the data centers can be determined according to the master-slave configuration information. If the master-slave directional relation is a unidirectional directional relation, only one data center is designated as the master, and the data centers are connected in a master-slave architecture, so the data synchronization strategy among the data centers can be judged to be a unidirectional synchronization strategy;

If the master-slave directional relation is a bidirectional directional relation, the two data centers designate each other as master, and the data centers are connected in a dual-active architecture, so the data synchronization strategy among the data centers can be judged to be a bidirectional synchronization strategy.

For example: the Agent executes the SHOW MASTER STATUS command on the master node of data center A and records the returned values, such as File and Position. The SHOW SLAVE STATUS command is then executed on the node of data center B to check parameters such as master_host, master_user, master_log_file and read_master_log_pos. If the values of these parameters match the master node of data center A, data is synchronized in the direction from data center A to data center B.

Conversely, the SHOW MASTER STATUS command is executed on the master node of data center B, and the File and Position values are recorded. The SHOW SLAVE STATUS command is then executed on the slave node of data center A to check parameters such as master_host, master_user, master_log_file and read_master_log_pos. If the values of these parameters match the master node of data center B, data is synchronized in the direction from data center B to data center A.

If synchronization exists both from data center A to B and from data center B to A, the master-slave directional relation between the two data centers is a bidirectional directional relation, which means that a dual-active architecture is adopted between the two data centers. If, on the contrary, synchronization exists in only one direction, a master-slave architecture is adopted between the two.

The master-slave architecture adopts the unidirectional synchronization strategy; the dual-active architecture adopts the bidirectional synchronization strategy; the multi-active architecture adopts the star synchronization strategy.
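The direction check for two data centers can be sketched as follows. This is a simplified illustration under assumptions: each data center is represented by a dictionary holding its own host address and the already-parsed result of its slave status query (or `None` when it is not a slave), only the master host is compared, and the key names `host`, `slave_status` and `Master_Host` are chosen for the example. Returning `None` when no replication is found in either direction is also an assumption, since that case is not described above.

```python
from typing import Any, Dict, Optional


def points_to(slave_status: Optional[Dict[str, Any]], master_host: str) -> bool:
    """Return True when the slave status names `master_host` as its master."""
    return bool(slave_status) and slave_status.get("Master_Host") == master_host


def classify_two_centers(
    center_a: Dict[str, Any], center_b: Dict[str, Any]
) -> Optional[str]:
    """Map the master-slave directional relation of two data centers to a policy."""
    a_to_b = points_to(center_b.get("slave_status"), center_a["host"])
    b_to_a = points_to(center_a.get("slave_status"), center_b["host"])
    if a_to_b and b_to_a:
        return "bidirectional"   # dual-active architecture
    if a_to_b or b_to_a:
        return "unidirectional"  # master-slave architecture
    return None                  # no replication relation detected (assumed outcome)
```

A fuller check would also compare the log file and position values mentioned above; they are omitted here to keep the sketch short.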

To avoid having to determine the synchronization policy every time, the Process caches the policy information in memory after determining the overall architecture and also calls the Agent notification interface, whereupon the Agent caches the policy information in its own memory in preparation for subsequent data synchronization.

The embodiment obtains the data center codes corresponding to the connected data centers; aggregates the data center codes corresponding to the data centers to obtain a center code set; performs data deduplication on the center code set to obtain a deduplication code set; detects the number of codes in the deduplication code set; and, if the number of codes is greater than the preset number, determines that the data synchronization strategy among the data centers is a star synchronization strategy. The number of data centers actually connected to the data synchronization device can thus be determined from the data center codes, whether the data centers are connected in a multi-active architecture can be identified from that number, and the corresponding data synchronization strategy can be set according to the identification result, so that a suitable data synchronization strategy is selected according to the connection architecture among the data centers.

In addition, the embodiment of the invention also provides a storage medium, wherein the storage medium stores a data synchronization program, and the data synchronization program realizes the steps of the data synchronization method when being executed by a processor.

Referring to fig. 9, fig. 9 is a block diagram of a first embodiment of a data synchronization device according to the present invention.

As shown in fig. 9, a data synchronization apparatus according to an embodiment of the present invention includes:

the detection module 10 is configured to obtain a data synchronization policy between the data centers when detecting that a data increment message is generated in any data middleware, where each data center is provided with a data middleware, the data middleware generates a data increment message according to the changed data when data in the data center changes, and the data synchronization policy is set according to an architecture identification result between the data centers;

the synchronization module 20 is configured to realize data synchronization among the data centers according to the data increment message based on the data synchronization strategy.

Further, the data synchronization policy includes a unidirectional synchronization policy;

the synchronization module 20 is further configured to generate a transaction message according to the data increment message; and adding the transaction message into a message queue so as to enable a consumer of the message queue to perform data synchronization on a data center corresponding to the consumer according to the transaction message.
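The unidirectional path can be sketched with an in-process queue. This is only an illustration under assumptions: Python's standard `queue.Queue` stands in for the message queue, the shape of the transaction message is invented for the example, and the target data center is modeled as a list of applied changes.

```python
import queue
from typing import Any, Dict, List


def to_transaction_message(increment: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a data increment message as a transaction message (illustrative shape)."""
    return {"kind": "transaction", "payload": dict(increment)}


def consume_all(mq: "queue.Queue", target_center: List[Dict[str, Any]]) -> int:
    """Consumer side: apply every queued transaction message to one data center."""
    applied = 0
    while True:
        try:
            message = mq.get_nowait()
        except queue.Empty:
            return applied  # queue drained
        target_center.append(message["payload"])  # stands in for applying the change
        applied += 1
```

Because the queue preserves insertion order, the consumer applies changes in the order the master produced them, which matches the one-way flow of a master-slave architecture.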

Further, the data synchronization policy includes a bidirectional synchronization policy;

The synchronization module 20 is further configured to obtain current time information and generate a random number; constructing a high-precision time stamp according to the current time information and the random number; and adding the data increment message into an ordered set according to the high-precision time stamp as a sequencing basis, so that a consumer corresponding to the ordered set can sequentially broadcast and distribute the data increment message in the ordered set according to the set sequence of the ordered set, and perform data synchronization on each data center.
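The timestamp-ordered distribution can be sketched as follows. How the current time and the random number are combined is not specified here, so the sketch assumes the nanosecond time is multiplied by 1000 and a three-digit random suffix is added; `OrderedMessages` is a hypothetical in-memory stand-in for the ordered set, with an insertion counter as a tie-breaker.

```python
import bisect
import random
import time
from typing import Any, List, Optional, Tuple


def high_precision_timestamp(now_ns: Optional[int] = None, rng: Any = None) -> int:
    """Combine the current time with a random number into one sortable score."""
    now_ns = time.time_ns() if now_ns is None else now_ns
    rng = rng or random
    # Assumed layout: time in the high digits, random suffix in the last three digits.
    return now_ns * 1000 + rng.randrange(1000)


class OrderedMessages:
    """Hypothetical stand-in for the ordered set, sorted by timestamp score."""

    def __init__(self) -> None:
        self._items: List[Tuple[int, int, Any]] = []
        self._sequence = 0  # tie-breaker so equal scores never compare messages

    def add(self, score: int, message: Any) -> None:
        bisect.insort(self._items, (score, self._sequence, message))
        self._sequence += 1

    def drain(self) -> List[Any]:
        """Return all messages in score order and empty the set."""
        messages = [message for _, _, message in self._items]
        self._items.clear()
        return messages
```

The consumer of the ordered set would broadcast the drained messages in the returned order, so changes from both data centers are applied in one agreed sequence.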

Further, the data synchronization policy includes a star synchronization policy;

The synchronization module 20 is further configured to acquire a distributed lock, and detect whether the distributed lock exists in the shared cache; if not, writing the distributed lock into the shared cache; after the writing is successful, broadcasting and distributing the data increment message so as to synchronize the data of each data center; after synchronization is complete, the distributed lock is removed from the shared cache.

Further, the synchronization module 20 is further configured to obtain a data increment type corresponding to the data increment message; determining a lock construction field and a lock construction mode according to the data increment type; acquiring field data corresponding to each lock construction field; and splicing the field data based on the lock construction mode to obtain the distributed lock.

Further, the detection module 10 is further configured to obtain a data center code corresponding to each connected data center, where the data center code is an identification code for identifying uniqueness of the data center; aggregate the data center codes corresponding to the data centers to obtain a center code set; perform data deduplication on the center code set to obtain a deduplication code set; detect the number of codes in the deduplication code set; and, if the number of codes is larger than the preset number, judge that the data synchronization strategy among the data centers is a star-shaped synchronization strategy.

Further, the detection module 10 is further configured to parse the master-slave configuration file in each data center if the number of codes is less than or equal to a preset number, so as to obtain master-slave configuration information corresponding to each data center; determining master-slave directional relation among all data centers according to the master-slave configuration information; if the master-slave directional relation is a unidirectional directional relation, judging that the data synchronization strategy among the data centers is a unidirectional synchronization strategy; and if the master-slave directional relation is a bidirectional directional relation, judging that the data synchronization strategy among the data centers is a bidirectional synchronization strategy.

It should be understood that the foregoing is illustrative only and is not limiting, and that in specific applications, those skilled in the art may set the invention as desired, and the invention is not limited thereto.

It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.

In addition, technical details not described in detail in this embodiment may refer to the data synchronization method provided in any embodiment of the present invention, which is not described herein.

Furthermore, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or alternatively by means of hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., Read Only Memory (ROM)/RAM, magnetic disk, optical disk) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.

The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims

1. A data synchronization method, characterized in that the data synchronization method comprises the steps of:

2. The data synchronization method of claim 1, wherein the data synchronization policy comprises a one-way synchronization policy;

generating a transaction message according to the data increment message;

3. The data synchronization method of claim 1, wherein the data synchronization policy comprises a bi-directional synchronization policy;

Acquiring current time information and generating a random number;

4. The data synchronization method of claim 1, wherein the data synchronization policy comprises a star synchronization policy;

If not, writing the distributed lock into the shared cache;

5. The data synchronization method of claim 4, wherein the step of acquiring a distributed lock comprises:

Acquiring a data increment type corresponding to the data increment message;

acquiring field data corresponding to each lock construction field;

6. The data synchronization method according to any one of claims 1 to 5, wherein the step of acquiring a data synchronization policy between data centers comprises:

detecting the number of codes in the deduplication code set;

7. The data synchronization method of claim 6, wherein the step of detecting the number of codes in the set of deduplication codes comprises:

8. A data synchronization device, characterized in that it comprises the following modules:

9. A data synchronization device, the data synchronization device comprising: a processor, a memory and a data synchronization program stored on the memory and executable on the processor, which when executed by the processor performs the steps of the data synchronization method according to any one of claims 1-7.

10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a data synchronization program, which when executed implements the steps of the data synchronization method according to any of claims 1-7.