CN111897877B - High-performance high-reliability data sharing system and method based on distributed ideas - Google Patents
- Publication number
- CN111897877B (application CN202010805266.9A)
- Authority
- CN
- China
- Prior art keywords
- task
- data
- data sharing
- database
- sharing system
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/176—Support for shared access to files; File sharing support
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
- H04L63/0442—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Computer Hardware Design (AREA)
- Data Mining & Analysis (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Storage Device Security (AREA)
Abstract
The invention discloses a high-performance, high-reliability data sharing system and method based on distributed ideas, belonging to the field of information technology. The technical problem to be solved is that large-scale data sharing is inefficient and insecure, and that data sharing is hard to manage, the sharing method is not intuitive, and the shared content is hard to monitor. The technical solution is as follows. The system includes a database connection management unit, a scheduled task management unit, a task log management unit, a task statistics unit, a task monitoring unit, and a failure reporting unit. The method is as follows: S1. Configure and start a data sharing task; on a schedule, the task extracts data from the source database according to defined rules to form a data text file, compresses and encrypts the file, uploads it to the document center, and records the access link. S2. Expose a data sharing interface through which consumers obtain the download link. S3. Call the download link to download the encrypted file from the document center, and decrypt it with a pre-assigned key to obtain the data.
Description
Technical field
The invention relates to the field of information technology, and mainly to efficient and secure data extraction and data sharing between multiple information systems. Specifically, it is a high-performance, high-reliability data sharing system and method based on distributed ideas.
Background
Enterprise informatization has by now reached considerable scale and accumulated large amounts of highly valuable data. Data is an important enterprise asset and the foundation of an enterprise's survival and long-term development. For various reasons, however, these data resources usually end up held exclusively by the individual system developers (hereinafter "developers"), while data analysis within the enterprise requires the developers to open up some of their data to one another. To let developers share data conveniently and quickly, enterprises currently use one of three methods:
First method: the enterprise coordinates the developers to share data through interfaces. This works for small amounts of data but breaks down for large volumes. For example, synchronizing the nationwide tobacco industry's retail-terminal sales order data (roughly ten million orders per day) to a tobacco marketing analysis system every day is beyond what ordinary methods can handle. Moreover, the interfaces are generally developed and held by each developer, so the enterprise usually cannot manage and monitor the shared data in real time.
Second method: the enterprise coordinates the developers to grant read-only access to their databases. This requires publishing the detailed design of the database table structures so that other vendors can extract the data they need. It easily leaks the underlying design of the developer's database, which is both insecure and unfair to the developers who participate in data sharing.
Third method: the enterprise coordinates the developers to use existing data extraction tools such as ETL to extract the data to be shared into a separate shared database. This involves a heavy workload, consumes a great deal of human resources, and is neither secure nor intuitive.
In summary, the prior art has the following problems: data sharing within enterprises has long been difficult, large-scale data is hard to share, and traditional sharing methods are inefficient and insecure; in addition, data sharing is hard to manage, the sharing method is not intuitive, and the shared content is hard to monitor.
Summary of the invention
The technical task of the present invention is to provide a high-performance, high-reliability data sharing system and method based on distributed ideas, in order to solve the problems that large-scale data sharing is inefficient and insecure, that data sharing is hard to manage, that the sharing method is not intuitive, and that the shared content is hard to monitor.
The technical task of the present invention is achieved as follows. A high-performance, high-reliability data sharing system based on distributed ideas includes:
a database connection management unit, used to add source databases to be shared and to modify or delete database connections;
a scheduled task management unit, used to add, modify, start, stop, delete, and manually compensate scheduled tasks;
a task log management unit, used to view the detailed execution process of each historical task and the storage location in the document center of the encrypted data file it generated, and also providing SQL viewing, log deletion, and advanced operations;
a task statistics unit, used to collect statistics on the execution of each task;
a task monitoring unit, used to monitor any task in real time, to view and analyze task running status through visual charts, and to give administrators comprehensive monitoring information on data sharing tasks;
a failure reporting unit, used to generate reports on failed tasks so that administrators can see directly at which stage a failed task went wrong.
Preferably, the database connection management unit includes:
a database connection adding module, used to add source databases to be shared through the database connection management function; it also provides a test-connection button so that database connectivity can be tested at any time, and it manages multiple kinds of database connections;
a database connection modification and deletion module, used to modify or delete existing database connections at any time.
More preferably, the information entered when adding a source database to be shared includes the connection name, connection address, connection port number, database name, database user name, database password, and database type.
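The connection fields listed above can be pictured as a single record. The following is a minimal sketch, not the patent's implementation: the class name, field names, and the TCP-level reachability check standing in for the "test connection" button are all assumptions made for illustration.

```python
import socket
from dataclasses import dataclass, field


@dataclass
class DatabaseConnection:
    """One source-database entry; all names here are illustrative."""
    connection_name: str
    host: str                # "connection address" in the text
    port: int
    database_name: str
    username: str
    password: str = field(repr=False)  # keep the secret out of reprs and logs
    database_type: str = "mysql"       # hypothetical default

    def test_connection(self, timeout_seconds: float = 3.0) -> bool:
        """Return True if a TCP connection to host:port can be opened.

        This only checks network reachability; a real connectivity test
        would also authenticate with the database driver.
        """
        try:
            with socket.create_connection((self.host, self.port),
                                          timeout=timeout_seconds):
                return True
        except OSError:
            return False
```

Hiding the password from `repr` is a small design choice that keeps credentials out of task logs if the record is ever printed.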
Preferably, the scheduled task management unit includes:
a scheduled task adding module, used to add scheduled tasks through the add function of scheduled task management;
a scheduled task modification module, used to modify existing scheduled data sharing tasks;
a scheduled task start, stop, or delete module, used to start, stop, or delete existing scheduled data sharing tasks;
a manual compensation task module, used to compensate historical data manually, quickly and flexibly.
More preferably, the information maintained for a new scheduled task includes the task name, agreed table name, connection name, cron expression, task step, extraction time format, extraction start time, time offset, key type, SQL type, task SQL, and full name of the timestamp field;
where the agreed table name is a measure designed to protect the developer's underlying design, and the developer may fill in this field freely;
the cron expression is the scheduling expression commonly used on Linux operating systems;
the task step is the span of time, in minutes, of data to fetch from the source database in one task run;
the time offset is how far back the window is shifted when selecting the task step; it addresses delayed data arrival when complex business logic is handled in database transactions;
the key type allows a different public/private key pair to be selected when encrypting the file;
the SQL type lets the system administrator tell the system which parameter-filling strategy to use when executing the SQL task, namely automatic parameter splicing or manually defined parameters;
the task SQL is the Structured Query Language statement of the data sharing task;
the full name of the timestamp field handles the fact that developers name their timestamp fields differently, and indicates which field in the task SQL represents the timestamp.
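How the task step, time offset, SQL type, and timestamp field might interact can be sketched as follows. This is an illustration under stated assumptions, not the patented logic: the function names, the `"auto"`/`"manual"` labels, the `:start`/`:end` placeholders, and the choice of a half-open window that ends `offset` minutes before the run time are all hypothetical.

```python
from datetime import datetime, timedelta


def extraction_window(run_time, step_minutes, offset_minutes):
    """Return the (start, end) window one task run should extract.

    Assumption: the window ends `offset_minutes` before the run time, so
    rows committed late by long-running transactions are not missed.
    """
    end = run_time - timedelta(minutes=offset_minutes)
    start = end - timedelta(minutes=step_minutes)
    return start, end


def build_task_sql(task_sql, sql_type, timestamp_field):
    """Apply the parameter-filling strategy selected by `sql_type`.

    "auto":   append a half-open range filter on the timestamp field.
    "manual": the administrator already wrote :start and :end placeholders.
    """
    if sql_type == "manual":
        return task_sql
    if sql_type != "auto":
        raise ValueError(f"unknown SQL type: {sql_type!r}")
    # Naive check for an existing WHERE clause; good enough for a sketch.
    glue = " AND " if " where " in task_sql.lower() else " WHERE "
    return (f"{task_sql}{glue}{timestamp_field} >= :start "
            f"AND {timestamp_field} < :end")
```

For example, a run at 10:00 with a 30-minute step and a 5-minute offset would cover 09:25 up to, but not including, 09:55 under these assumptions.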
Preferably, the task log management unit includes:
a task log clearing module, used to clear the entire history of the selected task, removing history records that are no longer needed;
an SQL viewing module, used to view the exact SQL executed by the selected task and to help analyze problems that arise during data sharing;
a log deletion module, used to delete task logs one by one;
an advanced operation module, used to provide more advanced log management operations.
Preferably, the statistics collected by the task statistics unit include the number of shared data rows, the size of the shared data text file, the size of the shared encrypted data file, and the time taken to connect to the database.
Preferably, the data sharing system provides two working modes:
① Single Master mode: the data sharing system runs on one server and automatically simulates multiple Slave child nodes using multiple threads;
② Master-Slave mode: the data sharing system is manually split into Master and Slave, where the Master node runs on one server and the Slave nodes run on multiple other servers; the Master node and the Slave nodes communicate through MQ message middleware.
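The single Master mode can be illustrated with worker threads that pull tasks from a shared queue. This is a minimal sketch under assumptions: `run_single_master`, `handle`, and `slave_count` are hypothetical names, and the in-process `queue.Queue` merely stands in for the MQ middleware that the Master-Slave mode would use between machines.

```python
import queue
import threading


def run_single_master(tasks, handle, slave_count=4):
    """Dispatch tasks from one master to in-process "slave" threads.

    `handle` is a caller-supplied function that processes one task.
    Returns a list of (task, outcome) pairs in completion order.
    """
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def slave():
        while True:
            task = task_queue.get()
            if task is None:          # sentinel: no more work for this slave
                break
            outcome = handle(task)
            with results_lock:
                results.append((task, outcome))

    workers = [threading.Thread(target=slave) for _ in range(slave_count)]
    for worker in workers:
        worker.start()
    for task in tasks:                # the master only generates and dispatches
        task_queue.put(task)
    for _ in workers:                 # one sentinel per slave
        task_queue.put(None)
    for worker in workers:
        worker.join()
    return results
```

Because the slaves only see a queue, switching to the Master-Slave mode would in principle mean replacing the queue with a message broker while leaving the task handler unchanged.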
A high-performance, high-reliability data sharing method based on distributed ideas proceeds as follows:
S1. The system administrator configures and starts a data sharing task. On a schedule, the task extracts data from the source database according to defined rules to form a data text file, compresses and encrypts the file, uploads it to the document center (the enterprise's internal unstructured database management system), and records the access link;
S2. The data sharing system exposes a data sharing interface, through which consumers obtain the download link;
S3. The consumer calls the download link to download the encrypted file from the document center, and decrypts it with a pre-assigned key to obtain the data.
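Step S1 can be sketched as a small pipeline. The sketch below is hedged: `share_once`, `encrypt`, and `upload` are hypothetical names, gzip and a tab-separated text layout are assumed choices, and the encryption and document-center upload are passed in as callables because neither an RSA implementation nor the document center's API is specified here.

```python
import gzip


def share_once(rows, encrypt, upload):
    """Run one S1 cycle: rows -> text file -> compress -> encrypt -> upload.

    rows:    list of tuples already extracted from the source database
    encrypt: callable(bytes) -> bytes, e.g. an RSA-based scheme (assumed)
    upload:  callable(bytes) -> str, stores the file and returns its link
    """
    # One row per line, tab-separated; the real text format is not specified.
    text = "\n".join("\t".join(str(col) for col in row) for row in rows)
    text_bytes = text.encode("utf-8")
    encrypted = encrypt(gzip.compress(text_bytes))
    link = upload(encrypted)
    # Record the link together with basic per-task statistics.
    return {
        "link": link,
        "rows": len(rows),
        "text_bytes": len(text_bytes),
        "encrypted_bytes": len(encrypted),
    }
```

The returned record carries the access link together with the row count and file sizes, which is the kind of information the task statistics unit is described as collecting.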
Preferably, the data sharing system can automatically split a sharing task with a very large data volume along the time dimension, according to the task step set by the administrator, and execute the pieces separately, reducing the load on the source database;
the data obtained from the source database is written to a text file and encrypted with the RSA asymmetric encryption algorithm to form an encrypted file, which is then sent to the document center for storage; users who need the shared data download it from the document center directly through the link, which takes load off the data sharing system;
users who need the shared data obtain the target data in two steps:
① first, call the interface provided by the data sharing system;
② then, using the result of the first step, call the document center's interface to fetch the encrypted file.
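From the consumer's side, the two-step retrieval described above might look like the sketch below. All names are hypothetical: `get_link` stands for the data sharing interface, `download` for the document center interface, and `decrypt` for decryption with the pre-assigned key; gzip is an assumed compression format, since the text does not name one.

```python
import gzip


def fetch_shared_data(get_link, download, decrypt, table_name):
    """Fetch and decode one shared file in two steps.

    get_link: callable(str) -> str     (data sharing interface, assumed)
    download: callable(str) -> bytes   (document center interface, assumed)
    decrypt:  callable(bytes) -> bytes (uses the consumer's private key)
    """
    link = get_link(table_name)        # step 1: ask for the download link
    encrypted = download(link)         # step 2: fetch the encrypted file
    compressed = decrypt(encrypted)
    return gzip.decompress(compressed).decode("utf-8")
```

Only the small link lookup touches the data sharing system; the bulk transfer goes to the document center, which is the load-shifting effect the text describes.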
The high-performance, high-reliability data sharing system and method based on distributed ideas of the present invention have the following advantages:
(1) The present invention provides a new solution for data sharing within an enterprise and removes the traditional barriers to sharing. The sharing method is intuitive, the sharing process is efficient, secure, and reliable, sharing tasks can be managed conveniently and efficiently, and data sharing tasks can be monitored in real time. This saves considerable human resource cost and gives strong support to data sharing within the enterprise, especially the rapid sharing of very large data sets;
(2) The present invention effectively solves the problem of data sharing within an enterprise, especially large-scale data sharing. Data sharing tasks can be managed conveniently and quickly, and large-scale data can be shared efficiently and securely, giving a secure and efficient solution for data sharing within the enterprise, especially for large volumes of data;
(3) To reduce the access load on the source database during data aggregation and avoid affecting the normal operation of other business systems, the present invention divides one large data sharing task into N small data sharing tasks. The database can be accessed through a database connection pool; an extraction task runs at each configured interval and extracts one step (a span of time) of data per run; the extracted data is compressed, encrypted, and uploaded to the document center for storage;
(4) In addition to the features of ordinary data extraction and sharing tools, the present invention has the following characteristics:
① Security: the data sharing interface can only be called by developers to whom the platform has granted permission, which secures the interface; data files are stored in the document center in RSA-encrypted form, and a developer must apply for the RSA private key in order to decrypt the data files obtained;
② High performance: the system follows distributed ideas and is based on the Master-Slave mode (one master, multiple slaves). The master node only generates and dispatches tasks, and the slave nodes process them. There can be multiple slave nodes, so large-scale data extraction and sharing tasks can be processed concurrently;
③ High availability: tasks dispatched by the master node are stored in message middleware (hereinafter MQ), and there are multiple slave nodes deployed on different machines, which ensures high availability of the system;
④ Convenient management: with the present invention, the system administrator can at any time add data sharing tasks, modify data extraction rules, and start, stop, or delete data sharing tasks;
⑤ Monitoring and statistics: the present invention shows task execution in real time and monitors task success and failure; it also provides task statistics reports and visual charts along various dimensions, so that task status can be tracked in real time;
⑥ Compensation: for failed tasks, the present invention offers both automatic and manual compensation. When a task failure is detected, the compensation mechanism starts immediately and compensates for the failed task; the administrator can also compensate manually, which is quick and convenient;
⑦ Reduced load on the source database: the present invention can split a single data sharing task into n smaller tasks that read from the database at different times and in segments, which avoids putting pressure on the source database and also speeds up data extraction and sharing;
⑧ Reduced load on the data sharing system: data obtained from the source database is processed and sent to the document center for storage, and the system developers who need the shared data download it from the document center directly through the link, which takes load off the data sharing system.
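Items ⑥ and ⑦ above, together with advantage (3), can be illustrated with a short sketch that splits a time range into step-sized slices and retries failed slices before leaving them for manual compensation. The function names, the half-open slices, and the single automatic retry are assumptions made for illustration; the source does not specify the retry policy.

```python
from datetime import datetime, timedelta


def split_by_step(start, end, step_minutes):
    """Yield half-open (slice_start, slice_end) windows covering [start, end)."""
    if step_minutes <= 0:
        raise ValueError("step_minutes must be positive")
    step = timedelta(minutes=step_minutes)
    cursor = start
    while cursor < end:
        slice_end = min(cursor + step, end)   # the last slice may be shorter
        yield cursor, slice_end
        cursor = slice_end


def run_with_compensation(slices, extract, max_retries=1):
    """Run `extract(start, end)` per slice, retrying failures automatically.

    Returns the slices that still failed after `max_retries` retries, i.e.
    the candidates for manual compensation by an administrator.
    """
    still_failed = []
    for window in slices:
        for _attempt in range(max_retries + 1):
            try:
                extract(*window)
                break                          # slice succeeded
            except Exception:
                continue                       # automatic compensation: retry
        else:
            still_failed.append(window)        # every attempt failed
    return still_failed
```

With a 30-minute step, a range of 70 minutes yields three slices, the last one 10 minutes long.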
Description of the drawings
The present invention is further described below with reference to the accompanying drawings.
Figure 1 is a structural block diagram of the high-performance, high-reliability data sharing system based on distributed ideas;
Figure 2 is an architecture diagram of the working principle of the data sharing system in Figure 1;
Figure 3 is a screenshot of the interface for adding a database connection;
Figure 4 is the first screenshot of the interface of the scheduled task adding module;
Figure 5 is the second screenshot of the interface of the scheduled task adding module;
Figure 6 is the first screenshot of the interface of the task log management unit;
Figure 7 is the second screenshot of the interface of the task log management unit;
Figure 8 is a screenshot of the interface of the task statistics unit;
Figure 9 is a screenshot of the line chart of the task monitoring unit;
Figure 10 is a screenshot of the pie chart of the task monitoring unit;
Figure 11 is a screenshot of the failure report interface.
Detailed description
The high-performance, high-reliability data sharing system and method based on distributed ideas of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments.
Example 1:
As shown in Figure 1, the high-performance, high-reliability data sharing system based on distributed ideas of the present invention includes:
a database connection management unit, used to add source databases to be shared and to modify or delete database connections; the database connection management unit includes:
a database connection adding module, used to add source databases to be shared through the database connection management function; it also provides a test-connection button so that database connectivity can be tested at any time, and it manages multiple kinds of database connections, as shown in Figure 3;
a database connection modification and deletion module, used to modify or delete existing database connections at any time.
The information entered when adding a source database to be shared includes the connection name, connection address, connection port number, database name, database user name, database password, and database type.
a scheduled task management unit, used to add, modify, start, stop, delete, and manually compensate scheduled tasks; the scheduled task management unit includes:
a scheduled task adding module, used to add scheduled tasks through the add function of scheduled task management; the information maintained for a new scheduled task includes the task name, agreed table name, connection name, cron expression, task step, extraction time format, extraction start time, time offset, key type, SQL type, task SQL, and full name of the timestamp field, as shown in Figures 4 and 5;
where the agreed table name is a measure designed to protect the developer's underlying design, and the developer may fill in this field freely;
the cron expression is the scheduling expression commonly used on Linux operating systems;
the task step is the span of time, in minutes, of data to fetch from the source database in one task run;
the time offset is how far back the window is shifted when selecting the task step; it addresses delayed data arrival when complex business logic is handled in database transactions;
the key type allows a different public/private key pair to be selected when encrypting the file;
the SQL type lets the system administrator tell the system which parameter-filling strategy to use when executing the SQL task, namely automatic parameter splicing or manually defined parameters;
the task SQL is the Structured Query Language statement of the data sharing task;
the full name of the timestamp field handles the fact that developers name their timestamp fields differently, and indicates which field in the task SQL represents the timestamp.
a scheduled task modification module, used to modify existing scheduled data sharing tasks;
a scheduled task start, stop, or delete module, used to start, stop, or delete existing scheduled data sharing tasks;
a manual compensation task module, used to compensate historical data manually, quickly and flexibly.
任务日志管理单元,用于查看每个历史任务的详细执行过程、每个历史任务的执行过程以及生成的加密数据文件在文档中心的存放位置,同时提供查看SQL、删除日志以及高级操作,如附图6和7所示;任务日志管理单元包括,The task log management unit is used to view the detailed execution process of each historical task, the execution process of each historical task, and the storage location of the generated encrypted data files in the document center. It also provides viewing of SQL, deletion of logs, and advanced operations, as attached. As shown in Figures 6 and 7; the task log management unit includes,
任务日志清空模块,用于将选中的任务的所有历史进行清空,并清空不需要的历史记录;The task log clearing module is used to clear all the history of the selected task and clear unnecessary history records;
SQL查看模块,用于查看选中任务的具体执行SQL,并协助分析数据共享过程中出现的问题;The SQL viewing module is used to view the specific execution SQL of the selected task and assist in analyzing problems that arise during the data sharing process;
日志删除模块,用于逐条删除任务日志;Log deletion module, used to delete task logs one by one;
高级操作模块,用于提供对日志管理更高级的操作。Advanced operation module, used to provide more advanced operations for log management.
Task statistics unit, used to collect statistics on the execution of each task, as shown in Figure 8; the statistics include the number of data rows shared, the size of the shared data text file, the size of the shared data encrypted file, and the database connection duration.
Task monitoring unit, used to monitor any task in real time, view and analyze task running status through visual charts, and provide administrators with comprehensive monitoring information for data sharing tasks, as shown in Figures 9 and 10;
Failure reporting unit, used to generate reports for failed tasks so that administrators can see at a glance the stage at which a failed task went wrong, as shown in Figure 11.
As shown in Figure 2, the data sharing system provides two working modes:
①. Single Master mode: the data sharing system runs on one server and automatically simulates multiple Slave child nodes through multithreading;
②. Master-Slave mode: the data sharing system is manually split into Master and Slave, where the Master node runs on one server and the Slave nodes run on multiple other servers; the Master node and the Slave nodes communicate through MQ message middleware.
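The single Master mode can be pictured with a short sketch. It assumes the Master hands sub-tasks to a thread pool whose workers stand in for Slave nodes; the source says only that Slaves are simulated through multithreading, so the pool, the function names `slave_worker` and `run_single_master`, and the sub-task payload are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def slave_worker(sub_task: dict) -> dict:
    """Stand-in for the work one Slave node would do on one sub-task."""
    # Hypothetical payload: pretend each sub-task shares `rows` rows.
    return {"task_id": sub_task["task_id"], "rows_shared": sub_task["rows"]}


def run_single_master(sub_tasks: list, simulated_slaves: int = 4) -> list:
    """Dispatch sub-tasks to threads that simulate Slave nodes on one server."""
    with ThreadPoolExecutor(max_workers=simulated_slaves) as pool:
        # map() yields results in input order, so they line up with sub_tasks.
        return list(pool.map(slave_worker, sub_tasks))


sub_tasks = [{"task_id": i, "rows": 100 * (i + 1)} for i in range(3)]
results = run_single_master(sub_tasks)
```

In Master-Slave mode the same dispatch would travel over the MQ message middleware to other servers rather than through an in-process pool.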
The data sharing system works as follows:
(1) The system administrator logs in and, through the database management function, adds the database whose data is to be shared;
(2) The administrator configures the scheduled task and the data sharing rules, then starts the task;
(3) Following the administrator's configuration, the data sharing system periodically extracts data from the source database according to the defined rules to form data text files;
(4) Once the data text files are formed, the data sharing system archives and compresses them and encrypts them with the RSA asymmetric algorithm to form an encrypted file; the system then uploads the encrypted file to the enterprise's internal unstructured database management system (the document center for short) and records the access link;
(5) The data sharing system exposes a data sharing interface, through which developers can obtain the download link;
(6) The developer uses the link to download the encrypted file from the document center and decrypts it with the pre-assigned key to obtain the data.
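The producer side of this workflow (extract, archive and compress, encrypt, upload, record the link) can be sketched as follows. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the source database, a plain dictionary stands in for the document center, and the link format is invented. The source names RSA but does not say how it is applied to large files, so encryption is passed in as a function; the `placeholder_encrypt` used here is an XOR stand-in and is not real encryption.

```python
import csv
import io
import sqlite3
import tarfile
from typing import Callable


def extract_to_text(conn: sqlite3.Connection, query: str) -> bytes:
    """Run the sharing rule (here a SQL query) and render the rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in conn.execute(query):
        writer.writerow(row)
    return buf.getvalue().encode("utf-8")


def archive_and_compress(name: str, payload: bytes) -> bytes:
    """Pack the text file into a gzip-compressed tar archive held in memory."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return out.getvalue()


def run_sharing_task(conn: sqlite3.Connection, query: str,
                     encrypt: Callable[[bytes], bytes],
                     document_center: dict, task_id: str) -> str:
    """Extract, compress, encrypt, upload, and return the recorded access link."""
    text = extract_to_text(conn, query)
    archive = archive_and_compress(f"{task_id}.csv", text)
    encrypted = encrypt(archive)
    link = f"doc-center://shared/{task_id}.tar.gz.enc"  # hypothetical link format
    document_center[link] = encrypted  # "upload" to the stand-in document center
    return link


def placeholder_encrypt(data: bytes) -> bytes:
    # NOT real encryption: an XOR stand-in for the RSA step named in the source.
    return bytes(b ^ 0x5A for b in data)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
document_center = {}
link = run_sharing_task(conn, "SELECT id, amount FROM sales ORDER BY id",
                        placeholder_encrypt, document_center, "task-001")
```

Keeping encryption behind a function boundary lets the sketch run with the standard library alone while leaving room for a real RSA-based implementation.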
The invention is applicable to the retail industry.
Embodiment 2:
The high-performance, high-reliability data sharing method of the present invention, based on distributed ideas, is as follows:
S1. The system administrator configures and starts a data sharing task; on a schedule, the task extracts data from the source database according to the defined rules to form a data text file, compresses and encrypts the file, uploads it to the document center (the enterprise's internal unstructured database management system), and records the access link;
S2. The data sharing system exposes a data sharing interface, through which the consumer obtains the download link;
S3. The consumer uses the download link to download the encrypted file from the document center and decrypts it with the pre-assigned key to obtain the data.
Here, the data sharing system can automatically split a sharing task with a very large data volume along the time dimension, according to the task step size set by the administrator, and execute the pieces separately, reducing the load on the source database;
The data sharing system writes the data obtained from the source database to a text file and encrypts it with the RSA asymmetric encryption algorithm to form an encrypted file, then sends the encrypted file to the document center for storage; users who need the shared data download it directly from the document center through the link, which shifts load away from the data sharing system.
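Splitting a task along the time dimension by a task step size can be illustrated with a small helper. This is a sketch, assuming half-open time windows and a step expressed as a `timedelta`; the function name `split_by_step` and the sample dates are hypothetical, and the source does not specify how window boundaries are handled.

```python
from datetime import datetime, timedelta


def split_by_step(start: datetime, end: datetime, step: timedelta) -> list:
    """Split the half-open range [start, end) into consecutive windows of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []
    cursor = start
    while cursor < end:
        # The last window is shortened so it never runs past `end`.
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# Three and a half days of data with a one-day step yields four sub-tasks.
windows = split_by_step(datetime(2020, 8, 1), datetime(2020, 8, 4, 12), timedelta(days=1))
```

Each window can then drive one smaller extraction query, so no single query has to scan the whole time range of the source database.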
Users who need the shared data obtain the target data in two steps:
①. First, call the interface provided by the data sharing system;
②. Then, using the result of the first step, call the document center's interface to retrieve the encrypted file.
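The two-step retrieval can be sketched from the consumer's point of view. The classes below are in-memory stand-ins, not real interfaces: `SharingService`, `DocumentCenter`, their method names, and the link string are all hypothetical, and `placeholder_crypt` is an XOR stand-in for decryption with the pre-assigned key, not RSA.

```python
from typing import Callable


class SharingService:
    """In-memory stand-in for the data sharing system's interface."""

    def __init__(self, links: dict):
        self._links = links  # task id -> download link

    def get_download_link(self, task_id: str) -> str:
        return self._links[task_id]


class DocumentCenter:
    """In-memory stand-in for the document center's interface."""

    def __init__(self, files: dict):
        self._files = files  # download link -> encrypted bytes

    def download(self, link: str) -> bytes:
        return self._files[link]


def fetch_shared_data(task_id: str, sharing: SharingService,
                      doc_center: DocumentCenter,
                      decrypt: Callable[[bytes], bytes]) -> bytes:
    link = sharing.get_download_link(task_id)  # step 1: ask the sharing system
    encrypted = doc_center.download(link)      # step 2: fetch from the document center
    return decrypt(encrypted)                  # decrypt with the pre-assigned key


def placeholder_crypt(data: bytes) -> bytes:
    # NOT real cryptography: XOR is its own inverse, used only to make the demo run.
    return bytes(b ^ 0x5A for b in data)


link = "doc-center://shared/task-001.enc"  # hypothetical link format
sharing = SharingService({"task-001": link})
doc_center = DocumentCenter({link: placeholder_crypt(b"id,amount\n1,9.5\n")})
data = fetch_shared_data("task-001", sharing, doc_center, placeholder_crypt)
```

Because the bulk download goes to the document center, the sharing system itself only serves the lightweight link lookup.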
Finally, it should be noted that the above embodiments only illustrate the technical solution of the present invention and do not limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments can still be modified, and some or all of their technical features can be replaced with equivalents; such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present invention.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010805266.9A CN111897877B (en) | 2020-08-12 | 2020-08-12 | High-performance high-reliability data sharing system and method based on distributed ideas |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN111897877A (en) | 2020-11-06 |
| CN111897877B (en) | 2024-03-26 |
Family
ID=73228879
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010805266.9A Active CN111897877B (en) | 2020-08-12 | 2020-08-12 | High-performance high-reliability data sharing system and method based on distributed ideas |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN111897877B (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112632001A (en) * | 2021-01-13 | 2021-04-09 | 中教云智数字科技有限公司 | System based on database table sharing exchange |
| CN113836210A (en) * | 2021-09-15 | 2021-12-24 | 浙江中烟工业有限责任公司 | Method for regularly processing non-public retail big data in tobacco industry |
| CN114860479A (en) * | 2022-05-11 | 2022-08-05 | 中国邮政储蓄银行股份有限公司 | Data synchronization method, device, computer readable storage medium and processor |
| CN117793177B (en) * | 2023-12-05 | 2024-11-22 | 江苏苏商银行股份有限公司 | A configuration decoupling method and system based on distributed microservice architecture |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102147807A (en) * | 2011-03-10 | 2011-08-10 | 南京信息工程大学 | Mass lightning data space-time analysis method based on GIS |
| CN103391185A (en) * | 2013-08-12 | 2013-11-13 | 北京泰乐德信息技术有限公司 | Cloud security storage and processing method and system for rail transit monitoring data |
| CN108052681A (en) * | 2018-01-12 | 2018-05-18 | 毛彬 | The synchronous method and system of structural data between a kind of relevant database |
| CN109062920A (en) * | 2018-05-31 | 2018-12-21 | 江苏开拓信息与系统有限公司 | A kind of data Fast Collision subsystem memory-based for data digging system |
| CN110134714A (en) * | 2019-05-22 | 2019-08-16 | 东北大学 | A kind of distributed computing framework caching index suitable for big data iterative calculation |
| CN110399425A (en) * | 2019-07-07 | 2019-11-01 | 上海鸿翼软件技术股份有限公司 | A kind of intelligence Dropbox micro services system |
| KR20200056357A (en) * | 2020-03-17 | 2020-05-22 | 주식회사 실크로드소프트 | Technique for implementing change data capture in database management system |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10049145B2 (en) * | 2016-04-25 | 2018-08-14 | Dropbox, Inc. | Storage constrained synchronization engine |
Non-Patent Citations (2)
| Title |
|---|
| "智慧方志"下的地方志数据库建设及安全规范研究;唐远波;;巴蜀史志(第01期);全文 * |
| 基于项目管理的创新型教育资源建设与管理平台设计;张京彬;贺志强;;中国电化教育(第01期);全文 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN111897877A (en) | 2020-11-06 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN111897877B (en) | High-performance high-reliability data sharing system and method based on distributed ideas | |
| CN114925045B (en) | PaaS platform for big data integration and management | |
| Nedelkoski et al. | Multi-source distributed system data for ai-powered analytics | |
| CN102685180B (en) | Cloud computing-oriented network security early warning method | |
| CN102542007B (en) | Method and system for synchronization of relational databases | |
| CN103605698A (en) | Cloud database system used for distributed heterogeneous data resource integration | |
| WO2017162032A1 (en) | Method and device for executing data recovery operation | |
| US20100223446A1 (en) | Contextual tracing | |
| CN111680105B (en) | Management method and system of distributed relational database based on blockchain | |
| CN104036365A (en) | Method for constructing enterprise-level data service platform | |
| CN103118130A (en) | Cluster management method and cluster management system for distributed service | |
| CN101408889A (en) | Method, apparatus and system for monitoring performance | |
| US8959051B2 (en) | Offloading collection of application monitoring data | |
| CN202373025U (en) | Intelligent device for dispatching business integration and data integration | |
| WO2019223178A1 (en) | Cross-platform task scheduling method and system, computer device, and storage medium | |
| US11811847B2 (en) | Server-side workflow improvement based on client-side data mining | |
| CN103810272A (en) | Data processing method and system | |
| CN111680900A (en) | A work order issuing method, device, electronic device and storage medium | |
| CN113849561A (en) | Energized platform based on block chain technology | |
| WO2023016187A1 (en) | Update system for forest resource one-map progress, cloud platform, and method | |
| CN116010494A (en) | Data exchange system supporting heterogeneous data sources | |
| CN106789395B (en) | A Web-based Distributed PDM System Data Transmission Monitoring Method | |
| CN106251078A (en) | Qualitative materiel for electrical network manages system | |
| CN119397073A (en) | A visual data platform full-link data flow tracing method, system, device and medium | |
| CN110826993A (en) | Project management processing method, device, storage medium and processor |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Country or region after: China. Address after: 271000 Langchao science and Technology Park, 527 Dongyue street, Tai'an City, Shandong Province. Applicant after: INSPUR SOFTWARE Co.,Ltd. Address before: No. 1036, Shandong high tech Zone wave road, Ji'nan, Shandong. Applicant before: INSPUR SOFTWARE Co.,Ltd. Country or region before: China. |
| | GR01 | Patent grant | |