CN120371875A - Data processing method, apparatus, electronic device and computer program product - Google Patents
Info
- Publication number
- CN120371875A CN202510457906.4A
- Authority
- CN
- China
- Prior art keywords
- data
- dimension table
- incremental
- real
- dimension
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24552—Database cache management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/275—Synchronous replication
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/542—Event management; Broadcasting; Multicasting; Notifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present disclosure provide a data processing method, a data processing apparatus, an electronic device and a computer program product, and relate to the technical field of big data. The method comprises: obtaining stream data to be processed, and determining a service database table corresponding to the stream data to be processed, the service database table being used for generating an associated dimension table; subscribing to an incremental data broadcast message according to data mapping information between the service database table and a message queue topic; in response to the received incremental data broadcast message, determining service database table incremental data corresponding to the service database table, and synchronizing the service database table incremental data to a local cache of dimension table association logic; and executing a dimension table association task based on the local cache of the dimension table association logic to obtain a real-time associated dimension table. By performing dimension table association through data broadcasting and local caching, the disclosure can achieve millisecond-level data updates while preserving task performance, keeps service data updated in real time, and ensures the stability and consistency of the dimension table data.
Description
Technical Field
Embodiments of the present disclosure relate to the field of big data technology, and more particularly, to a data processing method, a data processing apparatus, an electronic device, and a computer program product.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Streaming tasks, also known as real-time tasks or stream processing tasks, are computing tasks that process unbounded data sets. An unbounded data set is a data set that has a beginning but no end: it never terminates and delivers data as the data is generated. An unbounded stream must therefore be processed continuously, that is, each event must be processed promptly after it is acquired. It is generally not possible to wait for all of the data in an unbounded stream to arrive, because the input is unbounded and will not be complete at any point in time. Processing unbounded data typically requires that events be acquired in a particular order (e.g., the order in which the events occurred) so that the completeness of the results can be inferred.
In the field of stream data processing, tasks often need to perform dimension table association to extend related data with dimension information, and the subsequent processing logic runs after the related data has been widened. Dimension table data is generally stored in a service database, so a big data stream processing task has to read the service database to obtain the related information.
Disclosure of Invention
In a related stream processing task, the associated dimension table can be kept up to date by adding a layer of local cache to the dimension table association logic. However, this scheme suffers from data inconsistency caused by cache update and eviction: during the interval between updates, the data in the local cache is inconsistent with the data in the database, so the scheme is unsuitable for service scenarios that are sensitive to data accuracy.
Therefore, the present disclosure proposes an improved data processing method that can achieve millisecond-level data updates while preserving task performance, so that service data is updated in real time and the stability and consistency of the dimension table data are ensured.
In this context, embodiments of the present disclosure are intended to provide a data processing method, a data processing apparatus, a computer-readable storage medium, an electronic device, and a computer program product.
In a first aspect of the embodiments of the present disclosure, a data processing method is provided, comprising: obtaining stream data to be processed, and determining a service database table corresponding to the stream data to be processed, wherein the service database table is used for generating an associated dimension table; subscribing to an incremental data broadcast message according to data mapping information between the service database table and a message queue topic; in response to the received incremental data broadcast message, determining service database table incremental data corresponding to the service database table, and synchronizing the service database table incremental data to a local cache of dimension table association logic; and executing a dimension table association task based on the local cache of the dimension table association logic to obtain a real-time associated dimension table.
In one embodiment of the disclosure, the service database table comprises a data stream table and a dimension table to be associated, and determining the service database table corresponding to the stream data to be processed comprises: determining a service processing requirement of the stream data to be processed; determining, according to the service processing requirement, a message queue topic corresponding to the stream data to be processed in a stream processing task; and determining the data stream table and the dimension table to be associated from a service database according to the message queue topic.
In one embodiment of the disclosure, subscribing to the incremental data broadcast message according to the data mapping information between the service database table and the message queue topic includes obtaining a pre-constructed metadata database for storing the pre-written data mapping information, obtaining the data mapping information based on the metadata database, and subscribing to the incremental data broadcast message according to the data mapping information.
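As an illustration only, the lookup of data mapping information described above can be sketched as a small table-to-topic mapping. The dictionary stands in for the metadata database, and the table names, topic names and the function `topics_to_subscribe` are hypothetical, not taken from the disclosure.

```python
# Hypothetical stand-in for the metadata database:
# service database table -> message queue topic carrying its incremental data.
METADATA = {
    "ad_db.user_info": "incr.ad_db.user_info",
    "ad_db.schedule_info": "incr.ad_db.schedule_info",
}


def topics_to_subscribe(tables, metadata=METADATA):
    """Return the topics for the given tables, failing loudly on unmapped tables."""
    missing = [t for t in tables if t not in metadata]
    if missing:
        raise KeyError(f"no topic mapping for tables: {missing}")
    return [metadata[t] for t in tables]


topics = topics_to_subscribe(["ad_db.user_info", "ad_db.schedule_info"])
```

Failing on an unmapped table, rather than silently skipping it, is a design choice of this sketch: a dimension table without a subscription would never receive incremental updates.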
In one embodiment of the disclosure, before subscribing to the incremental data broadcast message according to the data mapping information, the method further comprises receiving a broadcast start operation for the service database table, generating a broadcast start configuration statement based on the broadcast start operation, and starting an incremental data broadcast function of the service database table according to the broadcast start configuration statement, wherein the incremental data broadcast function is used for supporting a message subscription operation of the incremental data broadcast message.
In one embodiment of the disclosure, synchronizing the service database table incremental data to the local cache of the dimension table association logic comprises: determining the service database table incremental data, and writing the service database table incremental data to an external data storage component; writing, based on the incremental data broadcast message, the service database table incremental data to the message queue topic corresponding to an incremental message queue; and synchronizing the service database table incremental data to the local cache of the dimension table association logic through the incremental message queue.
In one embodiment of the disclosure, executing the dimension table association task based on the local cache of the dimension table association logic to obtain the real-time associated dimension table comprises: obtaining, from a pre-constructed metadata database, the message queue topic corresponding to the service database table incremental data; starting an incremental data writing thread, and synchronizing the service database table incremental data in the message queue topic to the local cache through the incremental data writing thread; updating the service database table in the local cache to obtain a real-time service database table; and executing the dimension table association task based on the real-time service database table to obtain the real-time associated dimension table.
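The incremental data writing thread described above can be sketched as follows. This is a minimal sketch under stated assumptions: an in-process `queue.Queue` stands in for the message queue topic, the local cache is a plain dictionary guarded by a lock, and the record layout (`op`, `key`, `row`) and the name `start_incremental_writer` are hypothetical.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the writer thread to exit


def start_incremental_writer(topic_queue, local_cache, lock):
    """Start a thread that applies incremental records from topic_queue to local_cache."""
    def run():
        while True:
            record = topic_queue.get()
            if record is _STOP:
                break
            with lock:
                if record["op"] == "DELETE":
                    local_cache.pop(record["key"], None)
                else:  # INSERT and UPDATE both overwrite the cached row
                    local_cache[record["key"]] = record["row"]

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


cache = {}
cache_lock = threading.Lock()
topic = queue.Queue()  # assumed stand-in for the incremental message queue topic
writer = start_incremental_writer(topic, cache, cache_lock)

topic.put({"op": "INSERT", "key": 1, "row": {"name": "a"}})
topic.put({"op": "UPDATE", "key": 1, "row": {"name": "b"}})
topic.put({"op": "INSERT", "key": 2, "row": {"name": "c"}})
topic.put({"op": "DELETE", "key": 2})
topic.put(_STOP)
writer.join()  # after this, the cache reflects every change in order
```

A single writer thread keeps the changes in the order in which they were published, and the lock lets the association logic read the cache safely while updates are being applied.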
In one embodiment of the disclosure, the real-time service database table comprises a real-time data stream table and a real-time dimension table to be associated, and executing the dimension table association task based on the real-time service database table to obtain the real-time associated dimension table comprises: determining a data stream table field corresponding to the real-time data stream table and a to-be-associated dimension table field corresponding to the real-time dimension table to be associated; determining a data table join statement based on a service processing request; and associating the data stream table field with the to-be-associated dimension table field according to the data table join statement to obtain the real-time associated dimension table.
In one embodiment of the disclosure, the method further comprises the steps of obtaining monitoring index data corresponding to the real-time associated dimension table, wherein the monitoring index data comprises one or more of incremental data consumption quantity, data consumption delay time, update data quantity and data health degree, obtaining a preconfigured monitoring alarm condition, wherein the monitoring alarm condition comprises a monitoring index reference threshold value corresponding to each monitoring index data, determining a task processing state according to the monitoring index data and the monitoring alarm condition, and generating alarm prompt information based on the monitoring index data and displaying the alarm prompt information when the task processing state is an unsafe state.
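The threshold check described above can be sketched as a comparison of each monitoring index against its reference threshold. The sketch assumes that every monitored index is "higher is worse" (for example a consumption delay); an index such as data health degree would need the opposite comparison. The metric names and the function `task_state` are hypothetical.

```python
def task_state(metrics, thresholds):
    """Return ("unsafe", alerts) if any metric exceeds its reference threshold,
    otherwise ("safe", []). Metrics without a configured threshold are ignored."""
    alerts = [
        f"{name}={value} exceeds threshold {thresholds[name]}"
        for name, value in metrics.items()
        if name in thresholds and value > thresholds[name]
    ]
    return ("unsafe" if alerts else "safe"), alerts


state, alerts = task_state(
    {"consume_delay_ms": 1200, "update_count": 10},
    {"consume_delay_ms": 500},
)
```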
In a second aspect of the embodiments of the present disclosure, a data processing apparatus is provided, comprising: a service table determining module configured to obtain stream data to be processed and determine a service database table corresponding to the stream data to be processed, wherein the service database table is used for generating an associated dimension table; a message subscription module configured to subscribe to an incremental data broadcast message according to data mapping information between the service database table and a message queue topic; a data synchronization module configured to, in response to the received incremental data broadcast message, determine service database table incremental data corresponding to the service database table and synchronize the service database table incremental data to a local cache of dimension table association logic; and a dimension table association module configured to execute a dimension table association task based on the local cache of the dimension table association logic to obtain a real-time associated dimension table.
In one embodiment of the disclosure, the service database table comprises a data stream table and a dimension table to be associated, and the service table determining module comprises a service table determining unit configured to: determine a service processing requirement of the stream data to be processed; determine, according to the service processing requirement, a message queue topic corresponding to the stream data to be processed in a stream processing task; and determine the data stream table and the dimension table to be associated from a service database according to the message queue topic.
In one embodiment of the disclosure, the message subscription module comprises a message subscription unit configured to: obtain a pre-constructed metadata database, wherein the metadata database is used for storing the pre-written data mapping information; obtain the data mapping information based on the metadata database; and subscribe to the incremental data broadcast message according to the data mapping information.
In one embodiment of the disclosure, the data processing apparatus further comprises a broadcast function start module configured to: receive a broadcast start operation for the service database table; generate a broadcast start configuration statement based on the broadcast start operation; and start an incremental data broadcast function of the service database table according to the broadcast start configuration statement, wherein the incremental data broadcast function is used for supporting a message subscription operation for the incremental data broadcast message.
In one embodiment of the disclosure, the data synchronization module comprises a data synchronization unit configured to: determine the service database table incremental data, and write the service database table incremental data to an external data storage component; write, based on the incremental data broadcast message, the service database table incremental data to the message queue topic corresponding to an incremental message queue; and synchronize the service database table incremental data to the local cache of the dimension table association logic through the incremental message queue.
In one embodiment of the disclosure, the dimension table association module comprises a dimension table association unit configured to: obtain, from a pre-constructed metadata database, the message queue topic corresponding to the service database table incremental data; start an incremental data writing thread, and synchronize the service database table incremental data in the message queue topic to the local cache through the incremental data writing thread; update the service database table in the local cache to obtain a real-time service database table; and execute the dimension table association task based on the real-time service database table to obtain the real-time associated dimension table.
In one embodiment of the disclosure, the real-time service database table comprises a real-time data stream table and a real-time dimension table to be associated, and the dimension table association unit comprises a dimension table association subunit, wherein the dimension table association subunit is used for determining a data stream table field corresponding to the real-time data stream table and a dimension table field to be associated corresponding to the real-time dimension table to be associated, determining a data table connection statement based on a service processing request, and carrying out association processing on the data stream table field and the dimension table field to be associated according to the data table connection statement to obtain the real-time association dimension table.
In one embodiment of the disclosure, the data processing device further comprises a monitoring alarm module, wherein the monitoring alarm module is used for acquiring monitoring index data corresponding to the real-time associated dimension table, the monitoring index data comprise one or more of increment data consumption quantity, data consumption delay time, update data quantity and data health degree, acquiring a preconfigured monitoring alarm condition, the monitoring alarm condition comprises a monitoring index reference threshold value corresponding to each monitoring index data, determining a task processing state according to the monitoring index data and the monitoring alarm condition, and generating alarm prompt information based on the monitoring index data and displaying the alarm prompt information when the task processing state is an unsafe state.
In a third aspect of the disclosed embodiments, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, implements a data processing method as described above.
In a fourth aspect of the embodiments of the present disclosure, there is provided an electronic device comprising a processor, and a memory having stored thereon computer readable instructions that when executed by the processor implement the data processing method described above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the data processing method described above.
According to the technical solutions of the embodiments of the present disclosure, on the one hand, relevant data changes are monitored in real time by subscribing to broadcast messages, and the changed incremental data is synchronized to a local cache, so that millisecond-level data updates can be achieved while preserving task performance and service data is updated in real time. On the other hand, performing dimension table association through data broadcasting and local caching ensures the stability and consistency of the dimension table data.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
FIG. 1 is a flow diagram of dimension table association processing in related schemes;
FIG. 2 schematically illustrates a schematic block diagram of a system architecture of an exemplary application scenario according to some embodiments of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of a data processing method according to some embodiments of the present disclosure;
FIG. 4 schematically illustrates a logical implementation diagram of a broadcast-based real-time dimension table association scheme in accordance with some embodiments of the present disclosure;
FIG. 5 schematically illustrates a processing logic diagram of an incremental subscription function in a dimension table association scheme in accordance with some embodiments of the present disclosure;
FIG. 6 schematically illustrates a processing logic diagram of real-time dimension table association in a dimension table association scheme in accordance with some embodiments of the present disclosure;
FIG. 7 schematically illustrates a schematic block diagram of a data processing apparatus according to some embodiments of the present disclosure;
FIG. 8 schematically illustrates a schematic diagram of a storage medium according to an example embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of an electronic device according to an example embodiment of the invention.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of entirely hardware, entirely software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software.
According to an embodiment of the present disclosure, a data processing method, a data processing apparatus, a computer-readable storage medium, an electronic device, and a computer program product are provided.
In this context, it is to be understood that the terms used herein have the following meanings. An unbounded data set is a data set that has a beginning but no end: it never terminates and delivers data as the data is generated. An unbounded stream must be processed continuously, that is, each event must be processed promptly after it is acquired. It is not possible to wait for all of the data in an unbounded stream to arrive, because the input is unbounded and will not be complete at any point in time. Processing unbounded data typically requires that events be acquired in a particular order (e.g., the order in which the events occurred) so that the completeness of the results can be inferred.
Streaming tasks, also known as real-time tasks or stream processing tasks, are computing tasks that process unbounded data sets. In database master-slave replication, the master database receives all write operations and the slave databases replicate those changes in real time. When the master database executes an INSERT, UPDATE or DELETE statement, it records the change in a binary log (binlog), which acts as a detailed set of instructions telling the slave databases how to update their data. With proper configuration and maintenance, data consistency can be ensured while the slave databases are used to improve system availability and load balancing.
Dimension table association by real-time query means that a Flink operator directly accesses an external database (for example, MySQL) in a synchronous manner to perform the association, which guarantees that the data read is the latest. Flink is an open-source stream processing framework whose core is a distributed streaming dataflow engine written in the programming languages Java and Scala. Flink executes arbitrary dataflow programs in a data-parallel and pipelined manner, and its pipelined runtime system can execute both batch processing and stream processing programs.
Remote Dictionary Server (Redis) is a network-accessible, in-memory, distributed key-value database with optional persistence. HBase is an open-source, non-relational (NoSQL) distributed database that runs on the Hadoop Distributed File System (HDFS) and provides a highly fault-tolerant way of storing sparse data. TiDB is a distributed relational database that supports Hybrid Transactional and Analytical Processing (HTAP) workloads, is compatible with MySQL, and provides horizontal scalability, strong consistency, and high availability.
Furthermore, any number of elements in the figures is for illustration and not limitation, and any naming is used for distinction only and not for any limiting sense.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments thereof.
Summary of the Invention
In the field of stream data processing, tasks often need to perform dimension table association to extend related data with dimension information, and the subsequent processing logic runs after the related data has been widened. Dimension table data is generally stored in a service database, so a big data stream processing task has to read the service database to obtain the related information.
For example, an advertisement user behavior log includes client-side user behavior data such as the user identification (ID), resource position, schedule ID, number of clicks, and number of exposures. When behavior statistics are computed in the real-time processing pipeline, more dimension information about the user and the schedule is needed. Dimension table association is used at this point: the user information table and the schedule information table in the database are associated through the user ID and the schedule ID, and subsequent data processing is then performed.
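The widening step in the advertisement example can be sketched as a lookup of each behavior event against two dimension tables. The in-memory dictionaries, the field names and the function `enrich` are hypothetical and serve only to show what "widening" a record means.

```python
def enrich(event, user_dim, schedule_dim):
    """Widen one behavior event with user and schedule dimension fields.

    Dimension fields are prefixed so they cannot collide with event fields;
    a missing dimension row simply contributes no extra fields.
    """
    user = user_dim.get(event["user_id"], {})
    schedule = schedule_dim.get(event["schedule_id"], {})
    widened = dict(event)
    widened.update({f"user_{k}": v for k, v in user.items()})
    widened.update({f"schedule_{k}": v for k, v in schedule.items()})
    return widened


user_info = {7: {"city": "Hangzhou"}}        # keyed by user ID
schedule_info = {3: {"name": "spring"}}      # keyed by schedule ID
event = {"user_id": 7, "schedule_id": 3, "clicks": 2, "exposures": 40}
widened = enrich(event, user_info, schedule_info)
```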
In a related scheme (the first technical solution), dimension table association is performed by querying the service database directly. That is, the stream processing task queries the service database directly and obtains the other dimension information through primary keys such as the user ID and the schedule ID. The slave database is generally the one queried, because querying the master database may affect the online service. Referring to FIG. 1, FIG. 1 shows a schematic flow diagram of dimension table association processing in related schemes. In this scheme, for each record processed in the stream processing task, the service slave database is queried directly to obtain the corresponding dimension table information.
However, this scheme has the following problems: 1) querying the slave database affects its performance, which can cause slave database exceptions or synchronization delays; 2) the database resource cost is high; and 3) the concurrent query capability of the database is limited and cannot meet the query demand in high-traffic scenarios.
To solve the problems of the first technical solution, the data in the service database can be synchronized to third-party storage, such as HBase or Redis, and the big data task then queries that storage, which transfers the pressure from the service database to the third-party storage. Specifically, as shown in the second technical solution in FIG. 1, the service database data is synchronized to the third-party storage in full and incremental modes, and the third-party storage is queried during dimension table association. In other words, a storage layer such as Redis, HBase or TiDB is placed between the big data task and the service database to decouple the two and improve performance: a third-party synchronization tool synchronizes the data in the service database into storage with stronger concurrent query capability, and the streaming task then queries the third-party storage to obtain the corresponding data.
With the second technical solution, as the number of queries per second rises, two problems arise: 1) the pressure on the external storage engine increases and query latency grows; and 2) as query latency grows, the query performance requirements can no longer be met. To address these problems, a layer of local cache can be added to the dimension table association logic to reduce query pressure and improve query performance: query results are cached in local memory and evicted and updated using a policy such as least recently used (LRU). The core logic is shown as the third technical solution in FIG. 1.
The major problem of the third technical solution is the timing of cache update and eviction and the data inconsistency that results from it. During the interval between updates, the data in the local cache is inconsistent with the data in the database, so the scheme cannot be used in service scenarios that are sensitive to data accuracy.
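The local cache of the third technical solution, and the inconsistency it introduces, can be sketched with a small LRU cache placed in front of a dimension store. This is an illustrative sketch only: the class `LruDimCache`, the `loader` callback and the sample rows are hypothetical, and a plain dictionary stands in for the external storage.

```python
from collections import OrderedDict


class LruDimCache:
    """Read-through cache that evicts the least recently used dimension row."""

    def __init__(self, loader, capacity):
        self._loader = loader      # called on a miss, e.g. a query to external storage
        self._capacity = capacity
        self._rows = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self._rows:
            self._rows.move_to_end(key)      # mark as most recently used
            return self._rows[key]
        self.misses += 1
        row = self._loader(key)
        self._rows[key] = row
        if len(self._rows) > self._capacity:
            self._rows.popitem(last=False)   # evict the least recently used row
        return row


store = {1: "level=silver", 2: "level=gold", 3: "level=bronze"}
cache = LruDimCache(store.get, capacity=2)

first = cache.get(1)        # miss: loaded from the store
store[1] = "level=gold"     # the dimension row changes in the database
stale = cache.get(1)        # hit: the cache still returns the old row
cache.get(2)
cache.get(3)                # key 1 is now least recently used and is evicted
fresh = cache.get(1)        # miss again: only now is the new row loaded
```

Between the database change and the eviction, every lookup of key 1 returns the outdated row, which is the inconsistency window described above.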
Based on the above, the basic idea of the disclosure is as follows: obtain stream data to be processed, and determine a service database table corresponding to the stream data to be processed, wherein the service database table is used for generating an associated dimension table; subscribe to an incremental data broadcast message according to data mapping information between the service database table and a message queue topic; in response to the received incremental data broadcast message, determine service database table incremental data corresponding to the service database table, and synchronize the service database table incremental data to a local cache of dimension table association logic; and execute a dimension table association task based on the local cache of the dimension table association logic to obtain a real-time associated dimension table. By performing dimension table association through data broadcasting and local caching, the disclosure can achieve millisecond-level data updates while preserving task performance, keeps service data updated in real time, and ensures the stability and consistency of the dimension table data.
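The basic idea can be sketched, under simplifying assumptions, as broadcast delivery of row changes to every parallel subtask, each of which holds its own local copy of the dimension table. The classes `Broadcaster` and `JoinSubtask` are hypothetical in-process stand-ins for a message queue topic and for one parallel instance of the dimension table association logic; they do not reproduce any particular framework's API.

```python
class Broadcaster:
    """Hypothetical stand-in for a topic whose messages reach every subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, change):
        for callback in self._subscribers:
            callback(change)


class JoinSubtask:
    """One parallel instance of the association logic with its own local cache."""

    def __init__(self, snapshot):
        self.cache = dict(snapshot)   # initial full copy of the dimension table

    def on_change(self, change):
        if change["op"] == "DELETE":
            self.cache.pop(change["key"], None)
        else:                          # INSERT or UPDATE
            self.cache[change["key"]] = change["row"]

    def join(self, event):
        # Purely local lookup: no query is sent to the database or external storage.
        return {**event, **self.cache.get(event["user_id"], {})}


topic = Broadcaster()
subtasks = [JoinSubtask({7: {"level": "silver"}}) for _ in range(2)]
for subtask in subtasks:
    topic.subscribe(subtask.on_change)

topic.publish({"op": "UPDATE", "key": 7, "row": {"level": "gold"}})
results = [s.join({"user_id": 7, "clicks": 1}) for s in subtasks]
```

Because every subtask receives every change, all local caches converge on the same rows without per-record queries, in contrast to a cache that only refreshes on eviction.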
Having described the basic principles of the present disclosure, various non-limiting embodiments of the present disclosure are specifically described below.
Application scene overview
Referring first to fig. 2, fig. 2 is a schematic block diagram illustrating a system architecture of an exemplary application scenario to which a data processing method and apparatus of an embodiment of the present disclosure may be applied.
As shown in fig. 2, system architecture 200 may include one or more of terminal devices 210, a network, and a server 220. The network is used to provide a medium for communication links between terminal devices 210 and server 220. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The terminal device 210 may be a variety of electronic devices with a display screen including, but not limited to, a desktop computer, a portable computer, a smart phone, a tablet computer, and the like. It should be understood that the number of terminal devices, networks and servers in fig. 2 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 220 may be a server cluster formed by a plurality of servers.
The data processing method provided by the embodiments of the present disclosure is generally performed by the server 220, and accordingly, the data processing apparatus is generally disposed in the server 220. However, it will be readily understood by those skilled in the art that the data processing method provided in the embodiment of the present disclosure may be performed by the terminal device 210, and accordingly, the data processing apparatus may also be provided in the terminal device 210, which is not specifically limited in the present exemplary embodiment. For example, in an exemplary embodiment, the user may perform a service operation through the terminal device 210, upload data generated by the service operation to the server 220, and, because there are a large number of users continuously performing the service operation through the terminal device, the generated service data is transferred as stream data to the stream processing task, and the server performs real-time dimension table association processing on the relevant service database table according to the real-time service requirement through the data processing method provided by the embodiment of the present disclosure, so as to obtain a real-time associated dimension table, and return the real-time associated dimension table to the terminal device 210 so as to enable the terminal device 210 to display relevant data in the real-time associated dimension table to the user.
It should be understood that the application scenario illustrated in fig. 2 is only one example in which embodiments of the present disclosure may be implemented. The application scope of the embodiments of the present disclosure is not limited by any aspect of the application scenario.
Exemplary method
A data processing method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 3 in conjunction with the application scenario of fig. 2. It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
The present disclosure first provides a data processing method, where the method execution body may be a terminal device or a server, and this disclosure is not limited in particular, and the method executed by the server is described as an example in this exemplary embodiment.
Referring to fig. 3, the data processing method may include the following steps S310 to S340:
Step S310, obtaining stream data to be processed, and determining a service database table corresponding to the stream data to be processed, wherein the service database table is used for generating an associated dimension table;
Step S320, subscribing to an incremental data broadcast message according to the data mapping information between the service database table and the message queue topic;
Step S330, in response to the received incremental data broadcast message, determining the incremental data of the service database table, and synchronizing the incremental data of the service database table into the local cache of the dimension table association logic;
Step S340, executing the dimension table association task based on the local cache of the dimension table association logic to obtain the real-time associated dimension table.
According to the data processing method provided by this example embodiment, on the one hand, relevant data changes are monitored in real time by subscribing to broadcast messages, and the changed incremental data is synchronized to a local cache, so that millisecond-level data updates can be achieved while task performance is guaranteed, and service data is updated in real time. On the other hand, performing dimension table association through data broadcasting and local caching ensures the stability and consistency of the dimension table data.
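The flow of steps S310 to S340 can be sketched end to end as follows. This is only an illustrative sketch under simplifying assumptions: the broadcast messages are given as an in-memory list rather than read from a message queue, and the function name, topic names, and the schedule_name column are hypothetical.

```python
def associate_stream_with_dimension(stream_records, dimension_table, table_to_topic,
                                    broadcast_messages, join_key):
    """Walk through S320 to S340 for a dimension table already chosen in S310."""
    # S320: look up the topic that carries incremental data for the dimension table.
    topic = table_to_topic[dimension_table]
    # S330: apply every broadcast message on that topic to the local cache.
    local_cache = {}
    for message in broadcast_messages:
        if message["topic"] == topic:
            row = message["row"]
            local_cache[row[join_key]] = row
    # S340: associate each stream record with the cached dimension row, if any.
    associated_rows = []
    for record in stream_records:
        dimension_row = local_cache.get(record[join_key], {})
        associated_rows.append({**dimension_row, **record})
    return associated_rows


associated = associate_stream_with_dimension(
    stream_records=[{"schedule_id": "s1", "clicks": 3}],
    dimension_table="ad_schedule_detail",
    table_to_topic={"ad_schedule_detail": "incr_ad_schedule_detail"},
    broadcast_messages=[
        {"topic": "incr_ad_schedule_detail",
         "row": {"schedule_id": "s1", "schedule_name": "spring sale"}},
        {"topic": "incr_user_info",
         "row": {"schedule_id": "s1", "schedule_name": "ignored"}},
    ],
    join_key="schedule_id",
)
```

Messages on topics other than the mapped one never reach the cache, so only changes to the relevant service database table affect the association result.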
Next, the above steps of the present exemplary embodiment will be described in more detail.
In step S310, the stream data to be processed is obtained, and a service database table corresponding to the stream data to be processed is determined, where the service database table is used to generate an associated dimension table.
In one embodiment of the present disclosure, the stream data to be processed may be data to be processed in a stream processing task. The service database table may be a data table in the service database relating to stream data to be processed. The associated dimension table can be a data table obtained by carrying out dimension table association processing on table information of relevant dimensions in data tables in different service databases.
Referring to fig. 4, fig. 4 schematically illustrates a logical implementation diagram of a broadcast-based real-time dimension table association scheme in accordance with some embodiments of the present disclosure. The log receiving service in fig. 4 continuously receives the data stream to be processed and sends the data stream to be processed to the stream processing task through the message queue. After receiving the stream data to be processed, the stream processing task can determine a service database table corresponding to the stream data to be processed according to real-time service requirements.
When a data producer transmits stream data, information such as the service module and the table name to which the data belongs may be added to a message header or a specific field. By parsing this metadata and combining it with the real-time service requirement, the service database table corresponding to the stream data to be processed can be obtained directly.
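A minimal sketch of this metadata parsing is given below. The header keys service_module and table_name, and the idea of expressing the real-time service requirement as a set of required modules, are assumptions made for illustration; the disclosure does not fix a header layout.

```python
def resolve_business_table(message_headers, required_modules):
    """Return 'module.table' if the message belongs to a module the task needs, else None."""
    module = message_headers.get("service_module")  # hypothetical header key
    table = message_headers.get("table_name")       # hypothetical header key
    if module is None or table is None:
        return None  # the producer did not attach the metadata
    if module not in required_modules:
        return None  # not relevant to the current service requirement
    return f"{module}.{table}"


headers = {"service_module": "ad", "table_name": "ad_schedule_detail"}
resolved = resolve_business_table(headers, required_modules={"ad"})
unrelated = resolve_business_table(headers, required_modules={"payment"})
```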
In step S320, the incremental data broadcast message is subscribed to according to the data mapping information between the service database table and the message queue topic.
In one embodiment of the present disclosure, the message queue topic (Topic) may be a logical message classification mechanism in stream processing tasks, that is, an identifier in the message queue for distinguishing between different types of messages. The data mapping information may be the mapping relationship between the service database table and the message queue topic. The incremental data broadcast message may be a broadcast message sent when the service database table generates incremental data.
The stream processing task is performed based on a plurality of service databases, and one service database can contain a plurality of data tables. Different business database tables can send incremental data broadcasts to corresponding message queue topics, and stream processing tasks can subscribe to the incremental data broadcast messages through the incremental message queues. Message queue topics are just like a class label, classifying messages of the same nature or origin. For example, in an e-commerce system, there may be different topics such as "order creation", "merchandise inventory change", "user login" and the like, which are respectively used to process messages generated in different business scenarios.
Therefore, so that the stream processing task can subscribe to the incremental data broadcast messages of different service database tables, the data synchronization processing logic synchronizes the data mapping relationship between the service database tables and the message queue topics to the metadata database in advance. The stream processing task can then start a thread that uses the data mapping information in the metadata database to subscribe to the incremental data broadcast messages of the relevant service database tables.
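The lookup of topics through the metadata database might be sketched as follows. The MetadataStore class is an in-memory stand-in, and the table and topic names are hypothetical; a real deployment would query an actual metadata database and a message queue client.

```python
class MetadataStore:
    """In-memory stand-in for the metadata database holding table-to-topic mappings."""

    def __init__(self):
        self._table_to_topic = {}

    def register(self, table, topic):
        # Written in advance by the data synchronization processing logic.
        self._table_to_topic[table] = topic

    def topic_for(self, table):
        return self._table_to_topic.get(table)


def topics_to_subscribe(metadata, tables):
    """Resolve the topics for the given tables, skipping tables with no mapping."""
    topics = []
    for table in tables:
        topic = metadata.topic_for(table)
        if topic is not None and topic not in topics:
            topics.append(topic)
    return topics


metadata = MetadataStore()
metadata.register("ad.ad_schedule_detail", "incr_ad_schedule_detail")
metadata.register("ad.user_info", "incr_user_info")
subscribed = topics_to_subscribe(metadata, ["ad.ad_schedule_detail", "ad.unknown_table"])
```

Keeping the mapping in one shared store means the stream processing task does not need to hard-code topic names for each service database table.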
In step S330, in response to the received incremental data broadcast message, the service database table incremental data corresponding to the service database table is determined, and the service database table incremental data is synchronized into the local cache of the dimension table association logic.
In one embodiment of the present disclosure, the business library table delta data may be delta data generated by a data table in a business database. The dimension table association logic may be processing logic for performing dimension table association operations. The local cache of the dimension table association logic may be a portion of space divided by a physical memory local to the dimension table association logic for caching real-time data.
After receiving the incremental data broadcast message, the stream processing task can determine the incremental data of the corresponding service database table and synchronize it to the local cache of the dimension table association logic under the stream processing task. The service database table held in the local cache is then updated with this incremental data, so that the table data in the local cache of the dimension table association logic stays synchronized with the data in the service database.
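Updating the cached table from incremental data can be sketched as below. The shape of a change record (an op of "upsert" or "delete", a primary key, and a row) is an assumption for illustration; the disclosure does not specify the message format.

```python
def apply_increment(local_cache, change):
    """Apply one change record to the cached copy of a service database table.

    `change` is assumed to look like
    {"op": "upsert" | "delete", "key": <primary key>, "row": <dict or None>}.
    """
    if change["op"] == "delete":
        local_cache.pop(change["key"], None)  # ignore deletes for rows never cached
    elif change["op"] == "upsert":
        local_cache[change["key"]] = change["row"]  # insert or overwrite the row
    else:
        raise ValueError(f"unknown operation: {change['op']}")


schedule_cache = {"s1": {"schedule_id": "s1", "status": "draft"}}
apply_increment(schedule_cache, {"op": "upsert", "key": "s1",
                                 "row": {"schedule_id": "s1", "status": "live"}})
apply_increment(schedule_cache, {"op": "upsert", "key": "s2",
                                 "row": {"schedule_id": "s2", "status": "draft"}})
apply_increment(schedule_cache, {"op": "delete", "key": "s2", "row": None})
```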
If the dimension table association logic needs to enable the local cache, a configuration statement in Structured Query Language (SQL) needs to be added; for example, the SQL statement may be configured to SET ad_schedule_detail.
In step S340, the dimension table association task is performed based on the local cache of the dimension table association logic, and the real-time associated dimension table is obtained.
In one embodiment of the present disclosure, the dimension table association task may be an operation task that performs association processing on data tables in different service databases. The real-time associated dimension table may be a real-time dimension table obtained by performing a dimension table association task.
Because the local cache of the dimension table association logic is synchronized with the data tables in the service database, executing the dimension table association task against this local cache means that the association operation uses real-time data consistent with the service database, so the real-time associated dimension table can be obtained.
Based on business scenario requirements, the present disclosure designs a broadcast-based real-time dimension table association scheme. To synchronize data to the local cache in real time, the scheme pushes data that changes in real time to an incremental message queue by broadcasting while also synchronizing the data to an external storage component. The dimension table association module of the stream processing task subscribes to the messages of the incremental message queue, so that relevant data changes are monitored in real time and the changed data is updated to the local cache. In this way, millisecond-level data updates are achieved and service data is updated in real time.
In one embodiment of the disclosure, for step S310, determining a service database table corresponding to the stream data to be processed includes: determining a service processing requirement of the stream data to be processed; determining, according to the service processing requirement, a message queue topic corresponding to the stream data to be processed in a stream processing task; and determining a data flow table and a dimension table to be associated from the service database according to the message queue topic.
The service processing requirement may be a service requirement determined according to the specific application scenario of the stream processing task. For example, the service database tables that need to be associated in the stream processing task can be identified explicitly according to the service processing requirement.
After receiving the stream data to be processed, the stream processing task can determine the service processing requirement of the stream data to be processed. For example, taking the business processing requirement of advertisement user behavior statistics as an example, the advertisement user behavior log contains user ID, resource bit, schedule ID, click times, exposure times and other end user behavior data. When the related behavior statistics is carried out in the real-time processing link, more dimension information of the user and the schedule is required to be acquired, and the dimension table association technology is required to be used at this time, and the user information table and the schedule information table in the database are associated through the user ID and the schedule ID, and then the subsequent data processing is carried out. Referring to fig. 4, a service database in an advertisement user behavior statistics scenario may include a user information base and a scheduling information base, which may each include one or more data tables.
Because the stream data to be processed that is received by the log receiving service is transmitted to the stream processing task through the message queue, the message queue topic corresponding to the stream data to be processed in the stream processing task can be determined according to the service processing requirement; for example, the message queue topics may include "advertisement scheduling" and the like. The stream processing task can then query the corresponding data flow table and dimension table to be associated from the service database according to the message queue topic of the received stream data to be processed.
The data flow table may be a table structure based on flow data for storing and processing data flowing in real time, and it may be regarded as a dynamic table in which data is updated and changed continuously over time. The data flow table has the characteristics of data real-time performance, sequence performance, unbounded performance and the like. For example, a data flow table in an "advertisement scheduling" scenario may include, but is not limited to, a user behavior data table.
An associated dimension table is a relatively static data table for storing dimension information related to stream data. It typically contains attributes and metadata describing the characteristics of the stream data, and this information can be used to enrich the stream data for more in-depth analysis and processing.
The main purpose of the associated dimension table is to provide additional context information for the stream data. By associating the data in the data flow table with the associated dimension table, more information can be obtained to better understand and process the stream data. For example, when analyzing order stream data, detailed information about the commodities in each order, such as brand and specification, can be obtained by associating a commodity information dimension table, which helps with accurate sales analysis and marketing strategy formulation.
The dimension table to be associated in the present disclosure may be an associated dimension table that is to be associated with a data flow table. For example, the dimension tables to be associated in an "advertisement scheduling" scenario may include, but are not limited to, a schedule information table. After the data flow table and the dimension table to be associated that correspond to the stream data to be processed are determined, the dimension table association operation is performed based on these data tables.
In one embodiment of the present disclosure, for step S320, subscribing to the incremental data broadcast message according to the data mapping information between the service database table and the message queue topic includes: obtaining a pre-built metadata database, where the metadata database is used for storing pre-written data mapping information; obtaining the data mapping information from the metadata database; and subscribing to the incremental data broadcast message according to the data mapping information.
Wherein the metadata base is a database dedicated to storing and managing metadata, for example, the present disclosure may store data mapping information into the metadata base.
To solve the problem of data timeliness in the dimension table association process, the present disclosure proposes synchronizing the data tables in the service database to the dimension table association logic in real time by broadcasting. Referring to fig. 5, fig. 5 schematically illustrates a processing logic diagram of an incremental subscription function in a dimension table association scheme according to some embodiments of the present disclosure. When data synchronization is implemented based on broadcasting, the data mapping relationship between the service database table and the message queue topic can be written into the metadata database in advance and exposed for external query and subscription. For example, the dimension table association logic in the stream processing task can obtain the data mapping information from the metadata database.
And the dimension table association logic starts the thread to subscribe the incremental data broadcast message corresponding to the business database table according to the data mapping information obtained from the metadata database, and obtains the incremental data in the incremental message queue so as to update the incremental data to the local cache in real time. The incremental data broadcast message is subscribed through the data mapping information, so that the dimension table association logic can acquire the incremental data of the service data table in real time, and the data synchronization is realized.
In one embodiment of the present disclosure, before subscribing to an incremental data broadcast message according to data mapping information, a broadcast start operation for a service database table is received, a broadcast start configuration statement is generated based on the broadcast start operation, and an incremental data broadcast function of the service database table is started according to the broadcast start configuration statement, the incremental data broadcast function being used to support a message subscription operation for the incremental data broadcast message.
The broadcast start operation may be an operation of starting a data table in a certain service database to perform a message broadcast function. The broadcast start configuration statement may be a database configuration statement for controlling a broadcast start operation. The incremental data broadcasting function may be a function of broadcasting an incremental data message by a data table in a certain service database. The message subscription operation may be an operation in which the data consumer subscribes to the incremental data broadcast message.
To solve the problem of data timeliness by broadcasting, the present disclosure provides an incremental data broadcasting function for service database tables. For example, for a data table in a service database, a user may perform a broadcast start operation on the service database table, and a broadcast start configuration statement for the service database table is generated based on that operation. After the SQL statement that configures broadcast start is added for the service database table, the incremental data broadcasting function of the service database table is enabled.
In addition, a configuration SQL statement, set_schedule_detail.connector.binder.list=true, can be added to the stream processing task to enable the function there, so that the dimension table association logic in the stream processing task can perform the message subscription operation for the incremental data broadcast message. Once the broadcasting function is also enabled in the incremental data synchronization module in fig. 4, the real-time dimension table association capability with second-level delay becomes available, which solves the data synchronization problem between the service database and the dimension table association logic. Because the user can enable broadcast message subscription through simple parameter configuration, the function is easy to use.
In one embodiment of the present disclosure, for step S330, synchronizing business library table delta data into the local cache of the dimension table association logic includes determining business library table delta data, writing the business library table delta data to an external data storage component, writing the business library table delta data to a message queue subject corresponding to an incremental message queue based on an incremental data broadcast message, and synchronizing the business library table delta data to the local cache of the dimension table association logic through the incremental message queue.
The external data storage component may be a storage component external to the stream processing task and the service database. The incremental message queue may be an asynchronous communication container for storing incremental data; in the present disclosure it may be used to store the incremental data that needs to be transferred between the service database and the dimension table association logic.
When a data table in the service database is updated, the updated data in the data table can be used as the incremental data of the service database table. After this incremental data is determined, it is written to an external data storage component, which may include, for example, but is not limited to, a database such as Redis, HBase, or TiDB.
For the incremental data of the service database table, if the user starts the broadcasting capability of the appointed service database table, the incremental subscription function is responsible for writing the incremental data of the database table into an incremental message queue for timely consumption by the dimension table association logic. With continued reference to fig. 4 and 5, the business library table incremental data is written into a message queue topic corresponding to the incremental message queue, and if the dimension table association logic subscribes to the incremental data broadcast message of the business database table, the incremental message queue may synchronize the business library table incremental data to the local cache of the dimension table association logic. The incremental message queue writes the incremental data of the service library table into the local cache of the dimension table association logic based on the incremental data broadcast message, so that the data synchronization between the dimension table association logic and the service database can be realized.
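The write path described above can be sketched as follows, under the assumption that a dictionary stands in for the external data storage component and a list stands in for the incremental message queue. The function name and the change record layout are hypothetical.

```python
def publish_increment(change, table, external_store, broadcast_enabled,
                      table_to_topic, incremental_queue):
    """Persist a change, then broadcast it if broadcasting is enabled for the table."""
    # Always write to the external data storage component.
    external_store.setdefault(table, {})[change["key"]] = change["row"]
    # Only tables with broadcasting switched on are pushed to the incremental queue.
    if table in broadcast_enabled:
        incremental_queue.append({"topic": table_to_topic[table], **change})
        return True
    return False


external_store = {}
incremental_queue = []
table_to_topic = {"ad_schedule_detail": "incr_ad_schedule_detail"}
change = {"key": "s1", "row": {"schedule_id": "s1", "status": "live"}}

broadcasted = publish_increment(change, "ad_schedule_detail", external_store,
                                {"ad_schedule_detail"}, table_to_topic, incremental_queue)
skipped = publish_increment(change, "user_info", external_store,
                            {"ad_schedule_detail"}, table_to_topic, incremental_queue)
```

Both tables end up in the external store, but only the table with broadcasting enabled produces a message for the dimension table association logic to consume.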
It will be readily appreciated by those skilled in the art that the present disclosure includes message queues that pass stream data to be processed in a log receiving service to stream processing tasks, and also includes incremental message queues in an incremental subscription function, which may be two different message queues, being containers for passing messages or data between different data producers and data consumers for asynchronous communication.
In one embodiment of the disclosure, for step S340, executing the dimension table association task based on the local cache of the dimension table association logic to obtain the real-time associated dimension table includes: obtaining, from a pre-constructed metadata database, the message queue topic corresponding to the incremental data of the service database table; starting an incremental data writing thread, and synchronizing the incremental data of the service database table in the message queue topic to the local cache through the incremental data writing thread; updating the service database table in the local cache to obtain a real-time service database table; and executing the dimension table association task based on the real-time service database table to obtain the real-time associated dimension table.
The incremental data writing thread may be a thread for executing an incremental data writing operation, and is used for writing the incremental data of the business base table of the relevant message queue subject in the incremental message queue into the local cache of the dimension table association logic. The real-time service database table may be a service data table after the real-time data update process.
Referring to fig. 6, fig. 6 schematically illustrates a processing logic diagram of real-time dimension table association in a dimension table association scheme according to some embodiments of the present disclosure. If a user enables the data monitoring function, the dimension table association logic starts its core function. Because the data mapping information between the service database table and the message queue topic is stored in the metadata database in advance, the dimension table association logic can obtain from the pre-constructed metadata database the message queue topic corresponding to the incremental data of the designated service database table. It then starts a background thread, such as an incremental data writing thread, which consumes the incremental data of the service database table in the designated message queue topic in real time, processes it, and writes it into the local cache.
Writing the processed incremental data of the service database table into the local cache updates the data of the designated service database table in the local cache and yields a real-time service database table. The dimension table association task is then executed based on the real-time service database table to obtain the real-time associated dimension table, which ensures the stability and consistency of the dimension table data.
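The background incremental data writing thread might look like the following sketch. Python's standard queue.Queue stands in for the incremental message queue topic, and the STOP sentinel and the lock are illustrative choices rather than part of the disclosure.

```python
import queue
import threading

STOP = object()  # sentinel that tells the writer thread to exit


def incremental_writer(incremental_queue, local_cache, lock):
    """Consume change records and write them into the local cache."""
    while True:
        change = incremental_queue.get()  # blocks until a record arrives
        if change is STOP:
            break
        with lock:  # the association task may read the cache concurrently
            local_cache[change["key"]] = change["row"]


incremental_queue = queue.Queue()
local_cache = {}
cache_lock = threading.Lock()

writer = threading.Thread(
    target=incremental_writer,
    args=(incremental_queue, local_cache, cache_lock),
    daemon=True,
)
writer.start()

incremental_queue.put({"key": "s1", "row": {"schedule_id": "s1", "status": "live"}})
incremental_queue.put(STOP)
writer.join()  # after this, every queued change has been applied
```

Running the consumer in its own thread keeps cache updates off the path that performs the association itself, so lookups are not blocked while waiting for new messages.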
In one embodiment of the disclosure, executing the dimension table association task based on the real-time service database table to obtain the real-time associated dimension table includes: determining a data flow table field corresponding to the real-time data flow table and a to-be-associated dimension table field corresponding to the real-time dimension table to be associated; determining a data table connection statement based on a service processing request; and performing association processing on the data flow table field and the to-be-associated dimension table field according to the data table connection statement to obtain the real-time associated dimension table.
The real-time data flow table may be a flow table after the real-time data update process. The real-time dimension table to be associated can be the dimension table to be associated after the real-time data updating process.
The real-time service database table may include a real-time data flow table and a real-time dimension table to be associated. The data flow table field corresponding to the real-time data flow table and the to-be-associated dimension table field corresponding to the real-time dimension table to be associated are determined respectively.
Taking the advertisement user behavior statistics of "advertisement schedule" as an example, the data flow table may be a user behavior data table, and the definition of the data flow table (such as the user behavior data table) is as follows:
The dimension table to be associated may be a schedule information table, and details of the dimension table to be associated (such as the schedule information table) are defined as follows:
Based on the definition of the data flow table and the dimension table to be associated, a data table connection statement for performing association operation on the two data tables can be determined based on the service processing request, and the method specifically comprises the following steps:
Dimension table association real-time sql:
LEFT JOIN catalog.db.ad_schedule_detail FOR SYSTEM_TIME AS OF t1.proctime AS t2
ON t1.schedule_id=t2.schedule_id;
In the above way, a Kafka stream table ad_user_action and a MySQL dimension table ad_schedule_detail are defined, and the stream table is then associated with the dimension table through the LEFT JOIN syntax. Kafka is an open source stream processing platform that aims to provide a unified, high-throughput, low-latency platform for processing real-time data.
The data table connection statement performs association processing on the data flow table field and the to-be-associated dimension table field, and the real-time associated dimension table is obtained after the two data tables are associated. Because the dimension table association operation is performed on the real-time service database table, incremental data is used in the dimension table association promptly, data accuracy is ensured, and the end-to-end data delay can be kept within seconds.
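The semantics of the LEFT JOIN on schedule_id can be mirrored by a lookup against the local cache, as in the sketch below. The column schedule_name and the sample rows are hypothetical, since the table definitions are not reproduced here; only the join key schedule_id comes from the connection statement.

```python
def left_join_lookup(stream_rows, dimension_cache, key):
    """LEFT JOIN each stream row with the cached dimension row that shares `key`."""
    joined = []
    for row in stream_rows:
        dimension_row = dimension_cache.get(row[key])
        joined.append({
            **row,
            # Unmatched stream rows are kept, with the dimension column left empty.
            "schedule_name": dimension_row["schedule_name"] if dimension_row else None,
        })
    return joined


stream_rows = [
    {"user_id": "u1", "schedule_id": "s1", "clicks": 2},
    {"user_id": "u2", "schedule_id": "s9", "clicks": 1},
]
dimension_cache = {"s1": {"schedule_id": "s1", "schedule_name": "spring sale"}}
joined_rows = left_join_lookup(stream_rows, dimension_cache, "schedule_id")
```

As with a SQL LEFT JOIN, a stream row whose schedule_id has no match in the cache is still emitted, so no user behavior records are dropped from the statistics.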
In one embodiment of the disclosure, the method further includes: acquiring monitoring index data corresponding to the real-time associated dimension table, where the monitoring index data includes one or more of an incremental data consumption quantity, a data consumption delay time, an updated data quantity, and a data health degree; acquiring a preconfigured monitoring alarm condition, where the monitoring alarm condition includes a monitoring index reference threshold corresponding to each item of monitoring index data; determining a task processing state according to the monitoring index data and the monitoring alarm condition; and, when the task processing state is an unsafe state, generating alarm prompt information based on the monitoring index data and displaying the alarm prompt information.
The monitoring index data may be specific data for measuring and reflecting the execution state and performance of the stream processing task. The monitoring alarm condition may be a decision condition that triggers the generation of alarm prompt information. The monitoring index reference threshold may be a reference threshold that is compared with the specific value of the monitoring index data. The task processing state may be a state that reflects the health of the stream processing task. The alarm prompt information may be information that the system automatically generates and displays when an abnormal condition occurs in the stream processing task, indicating that the task needs attention and that corresponding measures should be taken during execution.
In the overall data processing scheme, when the incremental subscription function writes incremental data of the service library table into the incremental message queue, it may also write the related monitoring index data into the monitoring alarm system. The monitoring index data reported by the incremental subscription function may include, but is not limited to, the updated data quantity and the data health degree. The updated data quantity, i.e., the number of incremental data records, may be the specific amount of incremental data. The data health degree may be an indicator that reflects the health status of the relevant data in the stream processing task.
When the dimension table association logic writes the incremental data of the service library table into the local cache, it may likewise report the related monitoring data to the monitoring alarm system. The monitoring index data reported by the dimension table association logic may include, but is not limited to, the incremental data consumption quantity and the data consumption delay time. The incremental data consumption quantity may be the amount of incremental data that has been consumed by a data consumer. The data consumption delay time may be the delay incurred when the data consumer consumes the incremental data.
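By way of example only, the consumption-side indicators described above could be accumulated as in the following sketch. The class name `ConsumptionMetrics`, the use of per-record production and consumption timestamps, and the choice of the maximum delay as the reported value are assumptions; the disclosure does not define how the delay is computed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConsumptionMetrics:
    """Accumulates the indicators reported by the consumer side (hypothetical)."""
    consumed_count: int = 0          # incremental data consumption quantity
    max_delay_seconds: float = 0.0   # data consumption delay time (worst case)

    def record(self, produced_at: float, consumed_at: float) -> None:
        """Register one consumed record, given its timestamps in seconds."""
        self.consumed_count += 1
        # Clamp at zero so that clock skew never yields a negative delay.
        delay = max(0.0, consumed_at - produced_at)
        self.max_delay_seconds = max(self.max_delay_seconds, delay)

    def snapshot(self) -> Dict[str, float]:
        """Return the values that would be reported to the alarm system."""
        return {
            "consumed_count": self.consumed_count,
            "max_delay_seconds": self.max_delay_seconds,
        }
```

A consumer would call `record` once per incremental record written to the local cache and report `snapshot()` periodically.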
The monitoring alarm system performs full-link monitoring and alarming according to the received monitoring index data, so as to ensure that the data and tasks run accurately and stably, and so that relevant personnel are notified promptly to handle any problems that occur. The monitoring alarm condition may be preconfigured through the monitoring alarm system and comprises a monitoring index reference threshold corresponding to each item of monitoring index data, so that whether each part is running normally can be judged against the corresponding threshold.
For example, the monitoring alarm conditions may include, but are not limited to: whether the incremental subscription module and the dimension table association module are running healthily; the delay of the real-time consumption thread of the dimension table association module; whether the amount of incremental data written matches the amount consumed, which ensures data accuracy; and the end-to-end delay of data from production to final use.
If the task processing state is detected to be unsafe, alarm prompt information is generated based on the result of comparing the monitoring index data with the monitoring alarm condition and is sent to the relevant personnel so that they can handle it promptly. A complete monitoring alarm mechanism ensures that incremental data is processed promptly and effectively, and ensures the stability and operability of the whole data link.
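The comparison of monitoring index data against reference thresholds may be sketched as follows. This is a simplified illustration: the function names, the metric names, and the assumption that every threshold is an upper bound are hypothetical (an indicator such as the data health degree would more plausibly use a lower bound).

```python
from typing import Dict, List


def evaluate_alerts(metrics: Dict[str, float],
                    max_thresholds: Dict[str, float]) -> List[str]:
    """Return one alert message per metric that exceeds its reference threshold."""
    alerts: List[str] = []
    for name, limit in max_thresholds.items():
        value = metrics.get(name)
        # A metric that was not reported is skipped rather than treated as a breach.
        if value is not None and value > limit:
            alerts.append(f"{name}={value} exceeds threshold {limit}")
    return alerts


def task_state(alerts: List[str]) -> str:
    """The task is considered unsafe as soon as any alert was raised."""
    return "unsafe" if alerts else "safe"
```

The alert messages returned by `evaluate_alerts` stand in for the alarm prompt information that would be displayed or sent to the relevant personnel.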
In summary, the data processing method of the present disclosure obtains stream data to be processed and determines a service database table corresponding to the stream data to be processed, where the service database table is used for generating an associated dimension table; subscribes to incremental data broadcast messages according to data mapping information between the service database table and a message queue topic; in response to a received incremental data broadcast message, determines service library table incremental data corresponding to the service database table and synchronizes it to a local cache of the dimension table association logic; and executes a dimension table association task based on the local cache of the dimension table association logic to obtain a real-time associated dimension table. On the one hand, relevant data changes are monitored in real time by subscribing to broadcast messages, and the changed incremental data is synchronized to the local cache, so that millisecond-level data updates can be achieved without sacrificing task performance, end-to-end data delay is kept at the second level, and second-level dimension table data update capability is achieved. On the other hand, performing the dimension table association through data broadcasting and local caching ensures the stability and consistency of the dimension table data. In yet another aspect, a user can turn on the broadcast message subscription function through simple parameter configuration, so the function is easy to use.
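The overall flow summarized above (subscribe according to the table-to-topic mapping, synchronize broadcast increments into a local cache, and associate stream records against that cache) can be sketched in a single process as follows. `InMemoryBroker` and `DimensionJoinTask` are hypothetical names, and the in-memory broker merely stands in for a real message queue; none of this is the implementation of the disclosure.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class InMemoryBroker:
    """Toy publish/subscribe broker standing in for the message queue."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Row], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[Row], None]) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, row: Row) -> None:
        # Broadcast: every subscriber of the topic receives the record.
        for handler in self._handlers.get(topic, []):
            handler(row)


class DimensionJoinTask:
    """Subscribes to increments of one dimension table and joins against them."""

    def __init__(self, broker: InMemoryBroker, table_to_topic: Dict[str, str],
                 dim_table: str, key: str) -> None:
        self._key = key
        self._cache: Dict[Any, Row] = {}  # local cache of the association logic
        # Resolve the topic from the mapping information, then subscribe.
        broker.subscribe(table_to_topic[dim_table], self._on_increment)

    def _on_increment(self, row: Row) -> None:
        # Synchronize the incremental record into the local cache (upsert).
        self._cache[row[self._key]] = dict(row)

    def associate(self, stream_row: Row) -> Row:
        # Join the stream record with the freshest cached dimension record.
        dim = self._cache.get(stream_row[self._key], {})
        return {**dim, **stream_row}
```

Because the join reads only from the local cache, a published update is visible to the very next call to `associate` without a lookup against the source database.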
Exemplary apparatus
Having described the method of exemplary embodiments of the present disclosure, next, a data processing apparatus of an exemplary embodiment of the present disclosure will be described with reference to fig. 7.
In fig. 7, the data processing apparatus 700 may include a service table determining module 710, a message subscription module 720, a data synchronization module 730, and a dimension table association module 740.
The service table determining module 710 is configured to obtain the stream data to be processed and determine a service database table corresponding to the stream data to be processed, where the service database table is used for generating an associated dimension table. The message subscription module 720 is configured to subscribe to incremental data broadcast messages according to data mapping information between the service database table and a message queue topic. The data synchronization module 730 is configured to determine, in response to a received incremental data broadcast message, service library table incremental data corresponding to the service database table, and to synchronize the service library table incremental data to a local cache of the dimension table association logic. The dimension table association module 740 is configured to execute a dimension table association task based on the local cache of the dimension table association logic to obtain a real-time associated dimension table.
In one embodiment of the present disclosure, the service database table includes a data stream table and a dimension table to be associated, and the service table determining module 710 includes a service table determining unit configured to determine a service processing requirement of the stream data to be processed, determine, according to the service processing requirement, a message queue topic corresponding to the stream data to be processed in the stream processing task, and determine the data stream table and the dimension table to be associated from the service database according to the message queue topic.
In one embodiment of the present disclosure, the message subscription module 720 includes a message subscription unit for obtaining a pre-built metadata database for storing pre-written data mapping information, obtaining the data mapping information based on the metadata database, and subscribing to incremental data broadcast messages according to the data mapping information.
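As one possible illustration of obtaining the data mapping information from a metadata database, the sketch below stores table-to-topic mappings in an in-memory SQLite database and resolves the topics to subscribe to. The use of SQLite, the table name `table_topic_mapping`, and the function names are assumptions for illustration; the disclosure does not name a particular metadata store or schema.

```python
import sqlite3
from typing import List


def build_metadata_db() -> sqlite3.Connection:
    """Create an in-memory metadata database holding table-to-topic mappings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_topic_mapping ("
        "table_name TEXT PRIMARY KEY, topic TEXT NOT NULL)"
    )
    return conn


def register_mapping(conn: sqlite3.Connection, table_name: str, topic: str) -> None:
    """Write (or overwrite) the mapping for one service database table."""
    conn.execute(
        "INSERT OR REPLACE INTO table_topic_mapping (table_name, topic) VALUES (?, ?)",
        (table_name, topic),
    )


def topics_to_subscribe(conn: sqlite3.Connection, table_names: List[str]) -> List[str]:
    """Resolve the topics for the given tables; tables without a mapping are skipped."""
    topics: List[str] = []
    for name in table_names:
        row = conn.execute(
            "SELECT topic FROM table_topic_mapping WHERE table_name = ?", (name,)
        ).fetchone()
        if row is not None:
            topics.append(row[0])
    return topics
```

The returned topic names would then be passed to whatever subscription call the chosen message queue client provides.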
In one embodiment of the present disclosure, the data processing apparatus 700 further includes a broadcast function enabling module configured to receive a broadcast enabling operation for the service database table, generate a broadcast enabling configuration statement based on the broadcast enabling operation, and enable an incremental data broadcast function of the service database table according to the broadcast enabling configuration statement, the incremental data broadcast function being used to support message subscription to the incremental data broadcast messages.
In one embodiment of the present disclosure, the data synchronization module 730 includes a data synchronization unit configured to determine the service library table incremental data, write the service library table incremental data to an external data storage component, write the service library table incremental data, based on the incremental data broadcast message, to the message queue topic corresponding to an incremental message queue, and synchronize the service library table incremental data to the local cache of the dimension table association logic through the incremental message queue.
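The producer side of this unit (persisting an incremental record and then writing it to the corresponding topic) might look like the following sketch. The function `publish_increment`, the dictionary standing in for the external data storage component, and the per-topic `queue.Queue` objects standing in for the incremental message queue are all assumptions for illustration.

```python
import queue
from typing import Any, Dict, List

Row = Dict[str, Any]


def publish_increment(row: Row,
                      table: str,
                      external_store: Dict[str, List[Row]],
                      table_to_topic: Dict[str, str],
                      topic_queues: Dict[str, "queue.Queue[Row]"]) -> str:
    """Persist one incremental record, then write it to the table's topic."""
    # Step 1: write the increment to the (stand-in) external data storage component.
    external_store.setdefault(table, []).append(row)
    # Step 2: look up the topic mapped to this table and enqueue the increment.
    topic = table_to_topic[table]
    topic_queues.setdefault(topic, queue.Queue()).put(row)
    return topic
```

Writing to durable storage before publishing is a design choice of this sketch, so that a consumer never sees an increment that was not stored; the disclosure lists both writes without prescribing failure handling.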
In one embodiment of the disclosure, the dimension table association module 740 includes a dimension table association unit configured to obtain, from a pre-constructed metadata database, the message queue topic corresponding to the service library table incremental data, start an incremental data writing thread, synchronize the service library table incremental data in the message queue topic to the local cache through the incremental data writing thread, update the service database table in the local cache to obtain a real-time service database table, and execute the dimension table association task based on the real-time service database table to obtain the real-time associated dimension table.
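An incremental data writing thread of the kind described above can be sketched with the Python standard library as follows. The function names, the sentinel-based shutdown, and the use of a plain dictionary guarded by a lock as the local cache are assumptions for illustration, not details taken from the disclosure.

```python
import queue
import threading
from typing import Any, Dict

Row = Dict[str, Any]

# Sentinel object that tells the writer thread to stop (hypothetical convention).
_STOP = object()


def start_incremental_writer(topic_queue: "queue.Queue",
                             cache: Dict[Any, Row],
                             key_field: str,
                             lock: threading.Lock) -> threading.Thread:
    """Start a thread that copies increments from the topic queue into the cache."""

    def run() -> None:
        while True:
            item = topic_queue.get()  # blocks until an increment arrives
            if item is _STOP:
                break
            # The lock keeps cache updates consistent with concurrent join lookups.
            with lock:
                cache[item[key_field]] = item

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def stop_incremental_writer(topic_queue: "queue.Queue",
                            thread: threading.Thread) -> None:
    """Ask the writer to stop after draining earlier items, then wait for it."""
    topic_queue.put(_STOP)
    thread.join()
```

Since the queue is first-in, first-out, a later increment for the same key overwrites an earlier one, so the cache converges to the most recent state of each record.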
In one embodiment of the disclosure, the real-time service database table comprises a real-time data stream table and a real-time dimension table to be associated, and the dimension table association unit comprises a dimension table association subunit configured to determine a data stream table field corresponding to the real-time data stream table and a to-be-associated dimension table field corresponding to the real-time dimension table to be associated, determine a data table join statement based on a service processing request, and associate the data stream table field with the to-be-associated dimension table field according to the data table join statement to obtain the real-time associated dimension table.
In one embodiment of the present disclosure, the data processing apparatus 700 further includes a monitoring alarm module configured to obtain monitoring index data corresponding to the real-time associated dimension table, where the monitoring index data includes one or more of an incremental data consumption amount, a data consumption delay time, an update data amount, and a data health degree, obtain a preconfigured monitoring alarm condition, where the monitoring alarm condition includes a monitoring index reference threshold corresponding to each monitoring index data, determine a task processing state according to the monitoring index data and the monitoring alarm condition, and generate alarm prompt information based on the monitoring index data and display the alarm prompt information when the task processing state is an unsafe state.
Since each functional module of the data processing apparatus according to the exemplary embodiment of the present disclosure corresponds to a step of the foregoing exemplary embodiment of the data processing method, for details not disclosed in the embodiments of the present disclosure, please refer to the foregoing embodiment of the data processing method of the present disclosure, which is not described herein again.
It should be noted that although in the above detailed description several modules or units of the data processing apparatus are mentioned, this division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Exemplary Medium
Having described an apparatus of an exemplary embodiment of the present disclosure, a storage medium of an exemplary embodiment of the present disclosure is next described with reference to fig. 8.
In some embodiments, aspects of the present disclosure may also be implemented as a medium having stored thereon program code for carrying out the steps in a data processing method according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section of the present description when executed by a processor of a device.
For example, when executing the program code, the processor of the device may implement the steps shown in fig. 3: step S310, obtaining the stream data to be processed and determining a service database table corresponding to the stream data to be processed, where the service database table is used to generate an associated dimension table; step S320, subscribing to incremental data broadcast messages according to data mapping information between the service database table and a message queue topic; step S330, in response to a received incremental data broadcast message, determining service library table incremental data corresponding to the service database table and synchronizing it to a local cache of the dimension table association logic; and step S340, executing a dimension table association task based on the local cache of the dimension table association logic to obtain a real-time associated dimension table.
Referring to fig. 8, a program product 800 for implementing the above-described data processing method according to an embodiment of the present disclosure is described. The program product may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of a readable storage medium include an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the context of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary computing device
Having described the data processing method, the data processing apparatus, and the storage medium of the exemplary embodiments of the present disclosure, next, an electronic device of the exemplary embodiments of the present disclosure will be described with reference to fig. 9.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, aspects of the present disclosure may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein generally as a "circuit," "module," or "system."
In some possible embodiments, an electronic device according to the present disclosure may include at least one processing unit and at least one storage unit. The storage unit stores program code which, when executed by the processing unit, causes the processing unit to perform the steps of the data processing method according to the various exemplary embodiments of the present disclosure described in the "exemplary method" section of this specification. For example, the processing unit may execute the steps shown in fig. 3: step S310, obtaining the stream data to be processed and determining a service database table corresponding to the stream data to be processed, where the service database table is used to generate an associated dimension table; step S320, subscribing to incremental data broadcast messages according to data mapping information between the service database table and a message queue topic; step S330, in response to a received incremental data broadcast message, determining service library table incremental data corresponding to the service database table and synchronizing it to a local cache of the dimension table association logic; and step S340, executing a dimension table association task based on the local cache of the dimension table association logic to obtain a real-time associated dimension table.
An electronic device 900 according to an example embodiment of the present disclosure is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is merely an example and should not be construed to limit the functionality and scope of use of the disclosed embodiments.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. The components of the electronic device 900 may include, but are not limited to, the at least one processing unit 901 described above, the at least one storage unit 902 described above, a bus 903 connecting the different system components (including the storage unit 902 and the processing unit 901), and a display unit 907.
Bus 903 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The storage unit 902 may include a readable medium in the form of volatile memory, such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
The storage unit 902 may also include a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The electronic device 900 may also communicate with one or more external devices 904 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 900, and/or any devices (e.g., routers, modems, etc.) that enable the electronic device 900 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 905. Also, the electronic device 900 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through the network adapter 906. As shown, the network adapter 906 communicates with other modules of the electronic device 900 over the bus 903. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 900, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the data processing apparatus are mentioned in the above detailed description, such a division is only exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided so as to be embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that this disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in these aspects cannot be combined to advantage; this division is made for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510457906.4A CN120371875A (en) | 2025-04-11 | 2025-04-11 | Data processing method, apparatus, electronic device and computer program product |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510457906.4A CN120371875A (en) | 2025-04-11 | 2025-04-11 | Data processing method, apparatus, electronic device and computer program product |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120371875A (en) | 2025-07-25 |
Family
ID=96444717
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510457906.4A Pending CN120371875A (en) | 2025-04-11 | 2025-04-11 | Data processing method, apparatus, electronic device and computer program product |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120371875A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN121118857A (en) * | 2025-11-10 | 2025-12-12 | 北京飞书科技有限公司 | Tabular data processing methods, devices, media, electronic equipment and software products |
-
2025
- 2025-04-11 CN CN202510457906.4A patent/CN120371875A/en active Pending
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN121118857A (en) * | 2025-11-10 | 2025-12-12 | 北京飞书科技有限公司 | Tabular data processing methods, devices, media, electronic equipment and software products |
| CN121118857B (en) * | 2025-11-10 | 2026-03-24 | 北京飞书科技有限公司 | Form data processing method, apparatus, medium, electronic device and program product |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11860874B2 (en) | | Multi-partitioning data for combination operations |
| US11586692B2 (en) | | Streaming data processing |
| US20250094469A1 (en) | | Query based dynamic partition allocation |
| US11151137B2 (en) | | Multi-partition operation in combination operations |
| US11163758B2 (en) | | External dataset capability compensation |
| US10762539B2 (en) | | Resource estimation for queries in large-scale distributed database system |
| US10121169B2 (en) | | Table level distributed database system for big data storage and query |
| JP6266630B2 (en) | | Managing continuous queries with archived relations |
| US20110208695A1 (en) | | Data synchronization between a data center environment and a cloud computing environment |
| US11544229B1 (en) | | Enhanced tracking of data flows |
| JP2019503525A (en) | | Event batch processing, output sequencing, and log-based state storage in continuous query processing |
| US11977862B2 (en) | | Automatically cataloging application programming interface (API) |
| US10691653B1 (en) | | Intelligent data backfill and migration operations utilizing event processing architecture |
| CN115168440A (en) | | Data read-write method, distributed storage system, device, equipment and storage medium |
| CN115587118A (en) | | Task data dimension table association processing method and device and electronic equipment |
| CN112860343A (en) | | Configuration changing method, system, device, electronic equipment and storage medium |
| US8539492B1 (en) | | Managing data dependencies among multiple jobs using separate tables that store job results and dependency satisfaction |
| CN111459931A (en) | | Data duplication checking method and data duplication checking device |
| CN120371875A (en) | | Data processing method, apparatus, electronic device and computer program product |
| CN109902067B (en) | | File processing method, device, storage medium and computer equipment |
| CN113886500A (en) | | Data processing method, device, server and storage medium |
| CN118261703A (en) | | Full-link transaction view construction method and device, electronic equipment and storage medium |
| WO2024103898A1 (en) | | Database cluster management method and apparatus |
| CN114020731B (en) | | Data quality verification method, device, storage medium and electronic equipment |
| Pal et al. | | Near real-time big data stream processing platform using cassandra |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |