CN113923209A

CN113923209A - Processing method for downloading batch data based on levelDB

Info

Publication number: CN113923209A
Application number: CN202111154388.7A
Authority: CN
Inventors: 徐可光; 徐逸文
Original assignee: Beijing Qingzhou Zhihang Technology Co ltd
Current assignee: Beijing Qingzhou Zhihang Intelligent Technology Co ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2022-01-11
Anticipated expiration: 2041-09-29
Also published as: CN113923209B

Abstract

An embodiment of the present invention relates to a processing method for batch data download based on LevelDB. The method includes: obtaining a first batch data download application instruction; querying, from a preset complete LevelDB data file, the data content matching each first download keyword data to generate corresponding first data content data, and forming a first keyword-content data pair from the first download keyword data and the corresponding first data content data; performing data file reconstruction processing on the obtained plurality of first keyword-content data pairs to generate a current LevelDB data file; and using the current LevelDB data file as the download data of this batch data download to perform the corresponding data download processing. The invention avoids the stalling caused by many consecutive search requests during batch downloading, and avoids the excessive redundant data caused by downloading a complete LevelDB data file.

Description

Processing method for downloading batch data based on LevelDB

Technical Field

The invention relates to the technical field of data processing, and in particular to a processing method for downloading batch data based on LevelDB.

Background

The LevelDB data file is a mass data file format commonly used in cloud storage today, and is also commonly used for storing drive test data in the field of automatic driving. In its logical structure, data and indexes are stored in two separate parts: the part that stores the data is called the data area, and the part that stores the indexes is called the index area. To save storage space, the cloud usually stores the data in a LevelDB data file in compressed form. The cloud storage server meets the data query and data acquisition needs of remote devices by providing either a data search interface that takes an individual keyword as the search object, or a download interface for the complete LevelDB data file.

As the stored data keeps growing, we find that both operation modes of the cloud storage server have disadvantages. When a remote device wants to acquire the stored data corresponding to more than one keyword from the stored mass data, it has to use the data search interface repeatedly to send multiple search requests to the cloud storage server, and this produces obvious stalling when network conditions are limited. In principle, the stalling caused by multiple interactions can be avoided by initiating a single file download through the complete LevelDB data file download interface. In practice, however, the LevelDB data file stored in the cloud is very large and far exceeds the amount of data that actually needs to be downloaded this time. On the one hand, when network conditions are limited the download is very likely to fail because of a transmission interruption; on the other hand, even if the download succeeds, it brings a large amount of redundant data and strains local resources.

Disclosure of Invention

The invention aims to provide a processing method for downloading batch data based on LevelDB, an electronic device and a computer-readable storage medium. A batch data download processing mechanism is newly added: after a batch data download application instruction containing a plurality of keywords is received, the data content matching each keyword is obtained from the existing complete LevelDB data file storing mass data, a data file is reconstructed from the obtained keyword-content pairs according to the LevelDB data file format (including reconstruction of the data area and of the index), and the reconstructed data file, whose volume is far smaller than that of the complete LevelDB data file, is returned as one-time download data. The invention avoids the stalling caused by many consecutive search requests during batch downloading, and avoids the excessive redundant data caused by downloading the complete LevelDB data file.

In order to achieve the above object, a first aspect of the embodiments of the present invention provides a processing method for downloading batch data based on LevelDB, the method including:

acquiring a first batch data download application instruction; the first batch data download application instruction comprises a plurality of first download keyword data;

querying, from a preset complete LevelDB data file, the data content matching each first download keyword data to generate corresponding first data content data, and forming a first keyword-content data pair from the first download keyword data and the corresponding first data content data;

performing data file reconstruction processing according to the obtained plurality of first keyword-content data pairs to generate a current LevelDB data file;

and taking the current LevelDB data file as the download data of this batch data download to perform the corresponding data download processing.
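The four steps above can be sketched at a very high level as follows. This is only an illustration under loud assumptions: a plain Python `dict` stands in for the complete LevelDB data file, the function names (`query_pairs`, `rebuild_file`, `handle_batch_download`) are hypothetical, and keywords that are not found are simply skipped, which the source does not specify.

```python
from typing import Dict, List, Tuple


def query_pairs(full_store: Dict[str, bytes], keys: List[str]) -> List[Tuple[str, bytes]]:
    """Step 2: look up each requested keyword and form keyword-content pairs.

    Assumption: keywords missing from the store are skipped.
    """
    return [(key, full_store[key]) for key in keys if key in full_store]


def rebuild_file(pairs: List[Tuple[str, bytes]]) -> Dict[str, bytes]:
    """Step 3 (stand-in): keep only the requested pairs, ordered by keyword."""
    return dict(sorted(pairs))


def handle_batch_download(full_store: Dict[str, bytes], keys: List[str]) -> Dict[str, bytes]:
    """Steps 1 to 4: receive the keyword list, query, rebuild, return the download data."""
    pairs = query_pairs(full_store, keys)
    return rebuild_file(pairs)


# Hypothetical data: a large store and a request for two keywords.
full_store = {"A1": b"alpha", "A2": b"beta", "B1": b"gamma", "C7": b"delta"}
download = handle_batch_download(full_store, ["B1", "A1", "Z9"])
```

The returned object contains only the requested entries, so a single response replaces several search round trips without shipping the whole store.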

Preferably, the complete LevelDB data file includes a plurality of first data storage modules having the same length; each first data storage module comprises first storage data, a first data compression algorithm type and a first data check code.

Preferably, the querying, from a preset complete LevelDB data file, the data content matching each first download keyword data to generate corresponding first data content data, and forming a first keyword-content data pair from the first download keyword data and the corresponding first data content data specifically includes:

performing data file integrity check processing on the complete LevelDB data file according to a preset check code algorithm type;

if the data file integrity check processing is successful, performing data decompression processing on the complete LevelDB data file to generate a corresponding first decompressed data file;

querying, in the first decompressed data file, the data content matching each first download keyword data to generate the corresponding first data content data;

and forming the first keyword-content data pair from the first download keyword data and the corresponding first data content data.

Further, the performing data file integrity check processing on the complete LevelDB data file according to a preset check code algorithm type specifically includes:

polling each first data storage module of the complete LevelDB data file, and taking the currently polled first data storage module as a first current data storage module; extracting the first storage data of the first current data storage module as first current storage data; extracting the first data check code of the first current data storage module as a first current data check code; performing check code calculation on the first current storage data by using the check code algorithm corresponding to the check code algorithm type to generate a comparison data check code; confirming whether the comparison data check code matches the first current data check code, and if so, setting the corresponding first module check result to successful;

and if all the obtained first module check results are successful, the data file integrity check processing is successful.
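The module-by-module integrity check can be sketched as below. It assumes CRC-32 as the preset check code algorithm type and uses Python's standard `zlib.crc32`. The `StorageModule` class, its field names and the function names are hypothetical, since the source does not fix a concrete layout.

```python
import zlib
from dataclasses import dataclass
from typing import Iterable


@dataclass
class StorageModule:
    stored_data: bytes     # first storage data (the compressed payload)
    compression_type: int  # first data compression algorithm type (unused here)
    check_code: int        # first data check code


def module_is_intact(module: StorageModule) -> bool:
    """Recompute the check code over the stored data and compare it."""
    # Assumption: the preset check code algorithm type is CRC-32.
    comparison_code = zlib.crc32(module.stored_data)
    return comparison_code == module.check_code


def file_is_intact(modules: Iterable[StorageModule]) -> bool:
    """The file check succeeds only if every module check succeeds."""
    return all(module_is_intact(m) for m in modules)


payload = b"compressed-bytes"
good = StorageModule(payload, 0, zlib.crc32(payload))
tampered = StorageModule(payload + b"!", 0, zlib.crc32(payload))
```

A single mismatching module is enough to fail the whole file, which matches the "all module results successful" condition above.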

Further, the performing data decompression processing on the complete LevelDB data file to generate a corresponding first decompressed data file specifically includes:

polling each first data storage module of the complete LevelDB data file, and taking the currently polled first data storage module as a second current data storage module; extracting the first storage data of the second current data storage module as second current storage data; extracting the first data compression algorithm type of the second current data storage module as a current compression algorithm type; performing data decompression processing on the second current storage data by using the data decompression algorithm corresponding to the current compression algorithm type to generate corresponding first storage module decompressed data;

and splicing the obtained plurality of first storage module decompressed data in sequence to generate the first decompressed data file.
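A minimal sketch of the per-module decompression and splicing follows. The numeric type codes (`TYPE_NONE`, `TYPE_GZIP`) are assumptions made for illustration, and only the uncompressed and GZIP cases are shown because they are available in the Python standard library; other algorithm types would need third-party packages and would be added to the dispatch table in the same way.

```python
import gzip
from typing import Callable, Dict, List, Tuple

# Hypothetical numeric codes for the compression algorithm type field.
TYPE_NONE = 0
TYPE_GZIP = 1

# Dispatch table: compression algorithm type -> decompression function.
DECOMPRESSORS: Dict[int, Callable[[bytes], bytes]] = {
    TYPE_NONE: lambda data: data,  # uncompressed: use the stored data as is
    TYPE_GZIP: gzip.decompress,
}


def decompress_file(modules: List[Tuple[int, bytes]]) -> bytes:
    """Decompress each (compression_type, stored_data) module and splice in order."""
    parts = []
    for compression_type, stored_data in modules:
        if compression_type not in DECOMPRESSORS:
            raise ValueError(f"unsupported compression type: {compression_type}")
        parts.append(DECOMPRESSORS[compression_type](stored_data))
    return b"".join(parts)


modules = [(TYPE_NONE, b"abc"), (TYPE_GZIP, gzip.compress(b"def"))]
decompressed = decompress_file(modules)
```

Because each module carries its own type field, modules compressed with different algorithms can coexist in one file.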

Further, the first decompressed data file includes at least a first data area and a first index area;

a plurality of first data modules are stored in the first data area; a plurality of first data records are stored in the first data module; the first data record at least stores a first keyword name and first keyword content; the length of the first data record is not fixed; in the first data module, storing the first data records according to the sequence from small to large of the character string sequence of the corresponding first keyword name;

the first index area stores a plurality of first index records; the first index record at least stores a first module index identification, a first module boundary keyword name, a first module starting offset address and a first module length;

the first index records correspond to the first data modules one to one;

the first module index identifier of the first index record is the index identifier of the corresponding first data module in the first data area; the string ordering of the first module boundary keyword name of the first index record is not less than the string ordering of the first keyword name of the last first data record in the corresponding first data module; the first module starting offset address of the first index record is the starting storage address of the corresponding first data module in the first data area; the first module length of the first index record is the storage data length of the corresponding first data module in the first data area;

for any two first index records stored in the first index area whose module index identifiers are adjacent, the string ordering of the first module boundary keyword name of the former first index record is smaller than the string ordering of the first keyword name of the first (earliest) first data record in the first data module corresponding to the latter first index record.
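The relationship between the data area and the index area can be illustrated with a small in-memory sketch. The byte-level record encoding used here (a 2-byte keyword length and a 4-byte content length, followed by the keyword and the content) is purely an assumption, as the source does not define one; `IndexRecord`, `encode_record`, `decode_module` and `read_module` are hypothetical names.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IndexRecord:
    module_id: int      # first module index identification
    boundary_key: str   # first module boundary keyword name
    start_offset: int   # first module starting offset address
    length: int         # first module length


def encode_record(key: str, content: bytes) -> bytes:
    """Assumed variable-length record: lengths header, then keyword, then content."""
    raw_key = key.encode("utf-8")
    return struct.pack(">HI", len(raw_key), len(content)) + raw_key + content


def decode_module(blob: bytes) -> List[Tuple[str, bytes]]:
    """Walk a data module and return its (keyword name, keyword content) records."""
    records, pos = [], 0
    while pos < len(blob):
        key_len, content_len = struct.unpack_from(">HI", blob, pos)
        pos += 6  # size of the ">HI" header
        key = blob[pos:pos + key_len].decode("utf-8")
        pos += key_len
        records.append((key, blob[pos:pos + content_len]))
        pos += content_len
    return records


def read_module(data_area: bytes, index: IndexRecord) -> List[Tuple[str, bytes]]:
    """Use the offset and length from an index record to slice out one module."""
    end = index.start_offset + index.length
    return decode_module(data_area[index.start_offset:end])


module0 = encode_record("A1", b"x") + encode_record("A2", b"yy")
module1 = encode_record("B1", b"zzz")
data_area = module0 + module1
index_area = [
    IndexRecord(0, "A2", 0, len(module0)),
    IndexRecord(1, "B1", len(module0), len(module1)),
]
```

Each index record is enough to locate and decode its module without scanning the rest of the data area.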

Further, the querying, in the first decompressed data file, data content matched with each of the first downloaded keyword data to generate corresponding first data content data specifically includes:

in the first index area of the first decompressed data file, taking the first module boundary keyword name of any one of the first index records as the first end keyword name corresponding to the current first index record; if the previous first index record of the current first index record is not empty, taking the first module boundary keyword name of the previous first index record as the first starting keyword name corresponding to the current first index record; if the previous first index record is empty, taking the first keyword name of the first data record in the first data module corresponding to the current first index record as the first starting keyword name;

polling each first downloading keyword data, and taking the currently polled first downloading keyword data as current downloading keyword data; and taking the first index record in the first index area, in which the character string ordering of the first starting keyword name is less than or equal to the character string ordering of the current downloading keyword data and the character string ordering of the current downloading keyword data is less than the character string ordering of the first ending keyword name, as a current matching index record; taking the first data module corresponding to the current matching index record in the first data area as a current matching data module; taking the first data record in the current matching data module, in which the first keyword name is matched with the current downloading keyword data, as a current matching data record; and extracting the first keyword content in the currently matching data record as the first data content data corresponding to the currently downloaded keyword data.
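The index-guided lookup can be sketched as follows. Note the substitution: instead of testing the start/end keyword interval literally, this sketch selects the first index record whose boundary keyword is not less than the requested keyword, using `bisect_left`. Under the ordering rules of the index area this selects the only module that can contain the keyword, and it also covers the case where the boundary keyword equals the last keyword in its module. The in-memory representation (a list of boundary keywords plus a list of decoded modules) and the function names are assumptions.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Module = List[Tuple[str, bytes]]  # records sorted by keyword name


def find_content(boundaries: List[str], modules: List[Module], key: str) -> Optional[bytes]:
    """Locate the candidate module through the index, then scan it for the keyword."""
    i = bisect_left(boundaries, key)  # first boundary keyword >= key
    if i == len(boundaries):
        return None  # key sorts after every boundary keyword
    for name, content in modules[i]:
        if name == key:
            return content
    return None


def query_all(boundaries: List[str], modules: List[Module], keys: List[str]) -> Dict[str, bytes]:
    """Poll every download keyword and collect the keyword-content pairs found."""
    pairs = {}
    for key in keys:
        content = find_content(boundaries, modules, key)
        if content is not None:  # assumption: unmatched keywords are skipped
            pairs[key] = content
    return pairs


modules = [[("A1", b"a"), ("A2", b"b")], [("B1", b"c"), ("B3", b"d")]]
boundaries = ["A2", "B9"]  # each is >= the last keyword of its module
result = query_all(boundaries, modules, ["B3", "A2", "B2", "C1"])
```

Python compares strings code point by code point, which is consistent with the character-code comparison the description relies on.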

Preferably, the performing data file reconstruction processing according to the obtained plurality of first keyword-content data pairs to generate a current LevelDB data file specifically includes:

extracting the first downloaded keyword data from each first keyword-content data pair as a corresponding second keyword name, extracting the first data content data as corresponding second keyword content, and forming a corresponding second data record by the second keyword name and the second keyword content;

sequencing all the obtained second data records according to the sequence from small to large of the character string sequencing of the corresponding second keyword name to obtain an initial data module; performing module cutting on the initial data module according to a preset data module dividing principle to obtain a plurality of second data modules; the second data module comprises a plurality of second data records, and the second data records are still sorted in the second data module according to the sequence from small to large of the character string sorting of the corresponding second keyword name;

performing data splicing on the obtained plurality of second data modules to generate a second data area;

in the second data area, allocating a unique module index identifier for each second data module as a corresponding second module index identifier; taking the initial position of each second data module in the second data area as a corresponding second module initial offset address; taking the data length of each second data module as the corresponding second module length;

distributing a corresponding second module boundary keyword name to each second data module according to a preset boundary keyword distribution mode; when the boundary keyword distribution mode is a first mode, setting the second module boundary keyword name according to the second keyword name of the last second data record of the current second data module; when the boundary keyword allocation mode is a second mode, allocating a dynamic string to a current second data module as the second module boundary keyword name, wherein the string ordering of the dynamic string is greater than the string ordering of the second keyword name of the last second data record of the current second data module, and the string ordering of the dynamic string is also less than the string ordering of the second keyword name of the first second data record of the next second data module when the next second data module of the current second data module is not empty;

forming a corresponding second index record by the second module index identifier, the second module boundary keyword name, the second module starting offset address and the second module length corresponding to each second data module; splicing all the second index records according to the sequence of the corresponding second module index identifications from small to large to obtain corresponding second index areas;

splicing the second data area and the second index area to generate a first data file to be compressed;

cutting the first data file to be compressed into data modules according to a preset cutting principle for the data file to be compressed, to generate a plurality of first cutting module data; allocating a corresponding second data compression algorithm type to each first cutting module data, the second data compression algorithm type being set according to a preset batch data compression algorithm type; performing data compression processing on each first cutting module data by using the data compression algorithm corresponding to its second data compression algorithm type to generate corresponding second storage data; performing check code calculation on each second storage data by using the check code algorithm corresponding to a preset check code algorithm type to generate a corresponding second data check code; and forming a corresponding second data storage module from the second storage data, the second data compression algorithm type and the second data check code corresponding to each first cutting module data;

and sequentially splicing all the obtained second data storage modules to obtain the current LevelDB data file.
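The reconstruction steps above can be sketched end to end as follows. Everything concrete here is an assumption chosen for illustration: the record encoding, a fixed number of records per data module as the dividing principle, a fixed chunk size as the cutting principle, JSON as the index area serialization, GZIP as the batch compression type, CRC-32 as the check code, and all function and constant names. Only the first boundary keyword allocation mode (the last keyword of each module) is shown.

```python
import gzip
import json
import struct
import zlib
from typing import Dict, List, Tuple

RECORDS_PER_MODULE = 2  # assumed data module dividing principle
CHUNK_SIZE = 64         # assumed cutting principle for the file to be compressed
TYPE_GZIP = 1           # assumed code for the batch data compression algorithm type


def encode_record(key: str, content: bytes) -> bytes:
    """Assumed record encoding: lengths header, then keyword, then content."""
    raw_key = key.encode("utf-8")
    return struct.pack(">HI", len(raw_key), len(content)) + raw_key + content


def build_data_and_index(pairs: Dict[str, bytes]) -> Tuple[bytes, List[dict]]:
    """Sort the records, cut them into modules, and build one index record per module."""
    records = sorted(pairs.items())  # ascending string order of keyword names
    data_area, index = b"", []
    starts = range(0, len(records), RECORDS_PER_MODULE)
    for module_id, start in enumerate(starts):
        module = records[start:start + RECORDS_PER_MODULE]
        blob = b"".join(encode_record(k, v) for k, v in module)
        index.append({
            "id": module_id,
            "boundary": module[-1][0],  # first mode: last keyword of the module
            "offset": len(data_area),
            "length": len(blob),
        })
        data_area += blob
    return data_area, index


def pack_storage_modules(raw: bytes) -> List[Tuple[bytes, int, int]]:
    """Cut, compress and checksum: (stored data, compression type, check code)."""
    packed = []
    for start in range(0, len(raw), CHUNK_SIZE):
        stored = gzip.compress(raw[start:start + CHUNK_SIZE])
        packed.append((stored, TYPE_GZIP, zlib.crc32(stored)))
    return packed


def rebuild(pairs: Dict[str, bytes]) -> List[Tuple[bytes, int, int]]:
    """Data area + index area -> file to be compressed -> storage modules."""
    data_area, index = build_data_and_index(pairs)
    index_area = json.dumps(index).encode("utf-8")
    return pack_storage_modules(data_area + index_area)


pairs = {"B1": b"c", "A2": b"b", "A1": b"a"}
data_area, index = build_data_and_index(pairs)
current_file = rebuild(pairs)
```

The check code is computed over the compressed bytes, so a receiver can verify each module before attempting to decompress it.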

A second aspect of an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a transceiver;

the processor is configured to be coupled to the memory, read and execute instructions in the memory, so as to implement the method steps of the first aspect;

the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.

A third aspect of embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method of the first aspect.

The embodiments of the invention provide a processing method for downloading batch data based on LevelDB, an electronic device and a computer-readable storage medium. A batch data download processing mechanism is newly added: after a batch data download application instruction containing a plurality of keywords is received, the data content matching each keyword is obtained from the existing complete LevelDB data file storing mass data, a data file is reconstructed from the obtained keyword-content pairs according to the LevelDB data file format (including reconstruction of the data area and of the index), and the reconstructed data file, whose volume is far smaller than that of the complete LevelDB data file, is returned as one-time download data. The invention avoids the stalling caused by many consecutive search requests during batch downloading, which improves download quality; it also avoids the excessive redundant data caused by downloading the complete LevelDB data file, which improves download efficiency.

Drawings

Fig. 1 is a schematic diagram of a processing method for downloading batch data based on LevelDB according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The cloud server uses the processing method for downloading batch data based on LevelDB provided by this embodiment of the invention. A batch data download processing mechanism is newly added: after a batch data download application instruction containing a plurality of keywords is received, the data content matching each keyword is obtained from the existing complete LevelDB data file in which mass data is stored; a data file is reconstructed from the obtained keyword-content pairs according to the LevelDB data file format, including reconstruction of the data area and of the index; and the reconstructed data file, whose volume is far smaller than that of the complete LevelDB data file, is returned as one-time download data. This avoids the stalling caused by many consecutive search requests during batch downloading and improves download quality; it also avoids the excessive redundant data caused by downloading the complete LevelDB data file and improves download efficiency. Fig. 1 is a schematic diagram of a processing method for downloading batch data based on LevelDB according to an embodiment of the present invention. As shown in Fig. 1, the method mainly includes the following steps:

Step 1, acquiring a first batch data download application instruction;

the first batch data download application instruction comprises a plurality of first download keyword data.

Here, the cloud server acquires the first batch data download application instruction sent by the remote device; the instruction contains several pieces of keyword information, namely the first download keyword data, and in the extreme case it may contain only one first download keyword data.

Step 2, querying, from a preset complete LevelDB data file, the data content matching each first download keyword data to generate corresponding first data content data, and forming a first keyword-content data pair from the first download keyword data and the corresponding first data content data;

the complete LevelDB data file comprises a plurality of first data storage modules with the same length; each first data storage module comprises first storage data, a first data compression algorithm type and a first data check code;

here, the complete LevelDB data file is the complete LevelDB data file, storing mass data, held in the cloud storage medium; in the embodiment of the invention, the storage structure of a LevelDB data file comprises a plurality of storage sub-modules, namely the first data storage modules, and each storage sub-module comprises at least three parts: a compressed data packet, namely the first storage data; type data identifying the compression algorithm, namely the first data compression algorithm type; and a check code for ensuring that the compressed data packet has not been tampered with, namely the first data check code;

the cloud server obtains, from the existing complete LevelDB data file storing mass data, the data content matching each first download keyword data, namely the first data content data, and forms a first keyword-content data pair from each first download keyword data and its matching first data content data;

this specifically comprises: step 21, performing data file integrity check processing on the complete LevelDB data file according to a preset check code algorithm type;

the check code algorithm types include not only a number of conventional general-purpose check code algorithm types, such as the CRC-8, CRC-16 and CRC-32 families, but also a number of conventional message digest/hash algorithm types; in addition, they include the SM3 cryptographic hash algorithm type;

this specifically comprises: step 211, polling each first data storage module of the complete LevelDB data file, and taking the currently polled first data storage module as a first current data storage module; extracting the first storage data of the first current data storage module as first current storage data; extracting the first data check code of the first current data storage module as a first current data check code; performing check code calculation on the first current storage data by using the check code algorithm corresponding to the check code algorithm type to generate a comparison data check code; confirming whether the comparison data check code matches the first current data check code, and if so, setting the corresponding first module check result to successful;

here, the data file integrity check processing of the complete LevelDB data file checks the integrity of each first data storage module in the complete LevelDB data file one by one; when a module is checked, its first storage data (the first current storage data) is used as the original data for generating a check code, the check code calculation flow corresponding to the preset check code algorithm type is selected, and a check code is calculated over the first current storage data to obtain a check code for comparison, namely the comparison data check code; if the comparison data check code is the same as the first data check code of the current module (the first current data check code), the data integrity of the current module has not been damaged;

step 212, if all the obtained first module check results are successful, the data file integrity check processing is successful;

here, if the data integrity of every module of the complete LevelDB data file is undamaged, the data integrity of the complete LevelDB data file as a whole is undamaged, and the result of the data file integrity check processing is naturally successful;

step 22, if the data file integrity check processing is successful, performing data decompression processing on the complete LevelDB data file to generate a corresponding first decompressed data file;

this specifically comprises: step 221, polling each first data storage module of the complete LevelDB data file, and taking the currently polled first data storage module as a second current data storage module; extracting the first storage data of the second current data storage module as second current storage data; extracting the first data compression algorithm type of the second current data storage module as a current compression algorithm type; performing data decompression processing on the second current storage data by using the data decompression algorithm corresponding to the current compression algorithm type to generate corresponding first storage module decompressed data;

wherein the first data compression algorithm type comprises at least an uncompressed algorithm type, a GZIP (GNU zip) compression algorithm type, an LZO (Lempel-Ziv-Oberhumer) compression algorithm type, a Zippy compression algorithm type and a Snappy compression algorithm type;

further, performing data decompression processing on the second current storage data by using the data decompression algorithm corresponding to the current compression algorithm type to generate the corresponding first storage module decompressed data specifically includes: if the first data compression algorithm type is the uncompressed algorithm type, taking the second current storage data as the corresponding first storage module decompressed data; if the first data compression algorithm type is the GZIP compression algorithm type, decompressing the second current storage data with the GZIP data decompression algorithm corresponding to the GZIP compression algorithm type to generate the corresponding first storage module decompressed data; if the first data compression algorithm type is the LZO compression algorithm type, decompressing the second current storage data with the LZO data decompression algorithm corresponding to the LZO compression algorithm type to generate the corresponding first storage module decompressed data; if the first data compression algorithm type is the Zippy compression algorithm type, decompressing the second current storage data with the Zippy data decompression algorithm corresponding to the Zippy compression algorithm type to generate the corresponding first storage module decompressed data; if the first data compression algorithm type is the Snappy compression algorithm type, decompressing the second current storage data with the Snappy data decompression algorithm corresponding to the Snappy compression algorithm type to generate the corresponding first storage module decompressed data;

step 222, splicing the obtained plurality of first storage module decompressed data in sequence to generate the first decompressed data file;

wherein the first decompressed data file includes at least a first data area and a first index area;

a plurality of first data modules are stored in the first data area; the first data module stores a plurality of first data records; the first data record at least stores a first keyword name and first keyword content; the length of the first data record is not fixed; in a first data module, storing first data records in a sequence from small to large according to the character string sequence of the corresponding first keyword name;

here, the embodiments of the present invention refer in many places to the ordering of character strings; this ordering uses a string comparison mechanism, which specifically includes:

step A1, if the first character string and the second character string used for comparison are completely the same, the string ordering of the first character string is equal to the string ordering of the second character string;

step A2, if the first character string and the second character string used for comparison are not identical, converting the first character string and the second character string character by character according to an agreed character encoding (for example, ASCII or Unicode) to obtain a corresponding first character string code value sequence and second character string code value sequence, each consisting of a plurality of character code values; comparing the character code values at corresponding positions in the two sequences in order, from the first character code value to the last; during the comparison, if the character code value at the current position of the first character string code value sequence is equal to the character code value at the current position of the second character string code value sequence, moving on to the next position and continuing the comparison; if the character code value at the current position of the first character string code value sequence is larger than that of the second character string code value sequence, the string ordering of the first character string is greater than the string ordering of the second character string; if the character code value at the current position of the first character string code value sequence is smaller than that of the second character string code value sequence, the string ordering of the first character string is smaller than the string ordering of the second character string;

for example, if the first character string is "A1" and the second character string is "A1", the string ordering of the first character string is equal to that of the second character string; if the first character string is "A1" and the second character string is "A2", the string ordering of the first character string is smaller than that of the second character string; if the first character string is "A2" and the second character string is "A1", the string ordering of the first character string is greater than that of the second character string;
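Steps A1 and A2 can be sketched as a three-way comparison function. The name `compare_strings` is hypothetical and Unicode code points (via `ord`) are used as the agreed character encoding. One case is not covered by the steps as written, namely when one string is a proper prefix of the other; the sketch assumes the shorter string orders first.

```python
from functools import cmp_to_key


def compare_strings(first: str, second: str) -> int:
    """Return 0 if equal, 1 if first orders after second, -1 if it orders before."""
    # Step A1: identical strings have equal ordering.
    if first == second:
        return 0
    # Step A2: compare character code values position by position.
    for a, b in zip(first, second):
        code_a, code_b = ord(a), ord(b)
        if code_a != code_b:
            return 1 if code_a > code_b else -1
    # Assumption (not stated in the source): a proper prefix orders first.
    return 1 if len(first) > len(second) else -1


# Sorting keyword names in ascending string order with this comparison.
names = ["B1", "A2", "A1", "A"]
ordered = sorted(names, key=cmp_to_key(compare_strings))
```

With the prefix assumption, this comparison agrees with Python's built-in ordering of `str` values, so plain `sorted(names)` gives the same result.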

based on the above string comparison mechanism, before a plurality of first data records are stored in a first data module, the first keyword name of each first data record is converted to character string format, and the resulting character strings are compared pairwise; the records are then stored in ascending order: the first data record whose first keyword name has the smallest string ordering is stored as the 1st first data record, the first data record whose first keyword name has the second smallest string ordering is stored as the 2nd first data record, and so on, until the first data record whose first keyword name has the largest string ordering is stored as the last first data record;

in addition, the first index area stores a plurality of first index records; each first index record stores at least a first module index identifier, a first module boundary keyword name, a first module starting offset address and a first module length; the first index records correspond to the first data modules one to one; the first module index identifier of a first index record is the index identifier of the corresponding first data module in the first data area; the string ordering of the first module boundary keyword name of a first index record is not less than the string ordering of the first keyword name of the last first data record in the corresponding first data module; the first module starting offset address of a first index record is the starting storage address of the corresponding first data module in the first data area; the first module length of a first index record is the stored data length of the corresponding first data module in the first data area;

for any two adjacent first index records stored in the first index area, the string ordering of the first module boundary keyword name of the former first index record is smaller than the string ordering of the first module boundary keyword name of the latter first index record, and is also smaller than the string ordering of the first keyword name of the 1st first data record in the first data module corresponding to the latter first index record;

for example, there are 2 first data modules in the first data area, namely the first data module 1 and the first data module 2, and correspondingly 2 first index records in the first index area, namely a first index record 1 and a first index record 2; moreover, there are 2 first data records in the first data module 1: a first data record 11 (first keyword name "A1", first keyword content "content 1") and a first data record 12 (first keyword name "A2", first keyword content "content 2"); there are 2 first data records in the first data module 2: a first data record 21 (first keyword name "B2", first keyword content "content 3") and a first data record 22 (first keyword name "B3", first keyword content "content 4");

then, based on the two principles stated above, namely that the string ordering of the first module boundary keyword name of a first index record is not less than the string ordering of the first keyword name of the last first data record in the corresponding first data module, and that the string ordering of the first module boundary keyword name of the former first index record is less than the string ordering of the first keyword name of the 1st first data record of the first data module corresponding to the latter first index record, the first module boundary keyword name of the first index record 1 should be a character string whose ordering is greater than or equal to "A2" (the first keyword name of the last first data record 12 of the first data module 1) and less than "B2" (the first keyword name of the first data record 21 of the first data module 2 corresponding to the first index record 2), such as "A2", "A21", "A3" or "B1";
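The two boundary constraints in this example can be checked with a short Python sketch. The function name `boundary_key_is_valid` and the list-of-keys representation of a data module are hypothetical, introduced only for illustration; Python's built-in string comparison is used as the string ordering.

```python
from typing import List, Optional


def boundary_key_is_valid(boundary_key: str,
                          module_keys: List[str],
                          next_module_keys: Optional[List[str]] = None) -> bool:
    """Check a candidate module boundary keyword name against both constraints."""
    # Constraint 1: not less than the keyword name of the module's last record.
    if boundary_key < module_keys[-1]:
        return False
    # Constraint 2: strictly less than the keyword name of the 1st record
    # of the next module, when a next module exists.
    if next_module_keys and boundary_key >= next_module_keys[0]:
        return False
    return True


module_1 = ["A1", "A2"]   # keyword names in first data module 1
module_2 = ["B2", "B3"]   # keyword names in first data module 2
candidates = ["A2", "A21", "A3", "B1", "A1", "B2"]
print([c for c in candidates if boundary_key_is_valid(c, module_1, module_2)])
# prints: ['A2', 'A21', 'A3', 'B1']
```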

step 23, in the first decompressed data file, querying the data content matched with each first downloaded keyword data to generate corresponding first data content data;

here, because the first decompressed data file may be very large, matching the data records of every data module in the first data area one by one would make the query very inefficient and the processing time very long; therefore, in the embodiment of the invention, the first index record corresponding to the first download keyword data is first located in the first index area, and record matching is then performed only in the first data module corresponding to that index record, which saves a large number of redundant polling operations, greatly improves query efficiency and shortens query time;

the method specifically comprises the following steps: step 231, in the first index area of the first decompressed data file, for each first index record, taking its first module boundary keyword name as the first ending keyword name corresponding to the current first index record; if the previous first index record of the current first index record is not empty, taking the first module boundary keyword name of the previous first index record as the first starting keyword name corresponding to the current first index record; if the previous first index record is empty, taking the first keyword name of the 1st first data record in the first data module corresponding to the current first index record as the first starting keyword name;

for example, there are 2 first data modules in the first data area, namely the first data module 1 and the first data module 2, and correspondingly 2 first index records in the first index area, namely a first index record 1 and a first index record 2;

there are 2 first data records in the first data module 1: a first data record 11 (first keyword name "A1", first keyword content "content 1") and a first data record 12 (first keyword name "A2", first keyword content "content 2");

there are 2 first data records in the first data module 2: a first data record 21 (first keyword name "B2", first keyword content "content 3") and a first data record 22 (first keyword name "B3", first keyword content "content 4");

the first index record 1 has a first module index identifier of 1, a first module boundary keyword name of "A21", a first module starting offset address of 0, and a first module length of length 1;

the first index record 2 has a first module index identifier of 2, a first module boundary keyword name of "B4", a first module starting offset address of 0 + length 1, and a first module length of length 2;

then, when the current first index record is the first index record 1, the previous first index record of the first index record 1 is empty, so the first starting keyword name 1 is the first keyword name "A1" of the first data record 11 of the first data module 1, and the first ending keyword name 1 is the first module boundary keyword name "A21" of the first index record 1;

when the current first index record is the first index record 2, the previous first index record of the first index record 2, namely the first index record 1, is not empty, so the first starting keyword name 2 is the first module boundary keyword name "A21" of the first index record 1, and the first ending keyword name 2 is the first module boundary keyword name "B4" of the first index record 2;

step 232, polling each first download keyword data, and taking the currently polled first download keyword data as the current download keyword data; in the first index area, taking a first index record, as a current matching index record, of which the character string ordering of the first starting keyword name is less than or equal to the character string ordering of the current downloading keyword data and the character string ordering of the current downloading keyword data is less than the character string ordering of the first ending keyword name; taking a first data module corresponding to the current matching index record in the first data area as a current matching data module; taking a first data record in the current matching data module, wherein the first keyword name is matched with the current downloading keyword data, as a current matching data record; extracting first keyword content in the current matching data record as first data content data corresponding to the current downloading keyword data;

for example, if there are 2 first download keyword data: first download keyword data 1 "A1" and first download keyword data 2 "B2"; the first starting keyword name 1 corresponding to the first index record 1 is "A1", and the first ending keyword name 1 is "A21"; the first starting keyword name 2 corresponding to the first index record 2 is "A21", and the first ending keyword name 2 is "B4";

then, when the current download keyword data is "A1", because the string ordering of the first starting keyword name 1 "A1" of the first index record 1 ≤ the string ordering of the current download keyword data "A1" < the string ordering of the first ending keyword name 1 "A21", the current matching index record corresponding to the current download keyword data "A1" is the first index record 1, the current matching data module is the first data module 1, the current matching data record is the first data record 11, and the corresponding first data content data 1 is "content 1";

when the current download keyword data is "B2", because the string ordering of the first starting keyword name 2 "A21" of the first index record 2 < the string ordering of the current download keyword data "B2" < the string ordering of the first ending keyword name 2 "B4", the current matching index record corresponding to the current download keyword data "B2" is the first index record 2, the current matching data module is the first data module 2, the current matching data record is the first data record 21, and the corresponding first data content data 2 is "content 3";
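Steps 231 and 232 can be sketched in Python with the example data above. This is only an illustration under stated assumptions: the in-memory dictionary `DATA_AREA`, the list `INDEX_AREA`, the `IndexRecord` class and the function names are all hypothetical, and offsets and lengths are omitted because the sketch does not read from a real file.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class IndexRecord:
    module_id: int       # first module index identifier
    boundary_key: str    # first module boundary keyword name


# Hypothetical in-memory data area: module id -> sorted (keyword name, content).
DATA_AREA: Dict[int, List[Tuple[str, str]]] = {
    1: [("A1", "content 1"), ("A2", "content 2")],
    2: [("B2", "content 3"), ("B3", "content 4")],
}
INDEX_AREA = [IndexRecord(1, "A21"), IndexRecord(2, "B4")]


def key_ranges(index_area, data_area) -> List[Tuple[str, str]]:
    """Step 231: derive (starting, ending) keyword names for each index record."""
    ranges, previous = [], None
    for record in index_area:
        if previous is None:
            # No previous index record: start at the module's 1st keyword name.
            start = data_area[record.module_id][0][0]
        else:
            start = previous.boundary_key
        ranges.append((start, record.boundary_key))
        previous = record
    return ranges


def lookup(key: str, index_area, data_area) -> Optional[str]:
    """Step 232: pick the matching index record, then scan only its module."""
    for record, (start, end) in zip(index_area, key_ranges(index_area, data_area)):
        if start <= key < end:
            for name, content in data_area[record.module_id]:
                if name == key:
                    return content
            return None
    return None


pairs = [(k, lookup(k, INDEX_AREA, DATA_AREA)) for k in ["A1", "B2"]]
print(pairs)  # prints: [('A1', 'content 1'), ('B2', 'content 3')]
```

Each resulting tuple pairs a download keyword with its content, so only one data module is scanned per keyword rather than the whole data area.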

step 24, forming a first keyword-content data pair from the first download keyword data and the corresponding first data content data.

For example, if there are 2 first download keyword data: first download keyword data 1 "A1" and first download keyword data 2 "B2"; the corresponding first data content data 1 is "content 1", and the first data content data 2 is "content 3";

then, the first keyword-content data pair 1 is: first download keyword data 1 "A1" + first data content data 1 "content 1"; the first keyword-content data pair 2 is: first download keyword data 2 "B2" + first data content data 2 "content 3".

Step 3, carrying out data file reconstruction processing according to the obtained plurality of first keyword-content data pairs to generate a current level DB data file;

here, following the LevelDB data file format, the cloud server performs a data file reconstruction operation, covering both the data area and the index area, on the obtained keyword-content data pairs; this is equivalent to trimming the complete LevelDB data file once, retaining the required batch query results and discarding a large amount of redundant data;

the method specifically comprises the following steps: step 31, extracting first download keyword data from each first keyword-content data pair as a corresponding second keyword name, extracting first data content data as a corresponding second keyword content, and forming a corresponding second data record by the second keyword name and the second keyword content;

here, the data format of the second data record is similar to the data format of the first data record in the foregoing step, and further description is omitted;

step 32, sorting all the obtained second data records in ascending order of the string ordering of their second keyword names to obtain an initial data module; cutting the initial data module according to a preset data module division principle to obtain a plurality of second data modules;

each second data module comprises a plurality of second data records, which remain sorted within the module in ascending order of the string ordering of their second keyword names;

here, the data format of the second data module is similar to the data format of the first data module in the foregoing step, and further description is omitted;

the data module division principle can take various forms; one is to divide modules by a specified record count, so that the number of records in each second data module does not exceed a preset record count threshold; another is to divide modules by maximum data capacity, so that the total data length of each second data module does not exceed a preset maximum data capacity threshold, while ensuring that no second data record is truncated, that is, every second data record in a second data module is a complete data record;
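Both division principles can be sketched in Python as follows. The function names, the tuple representation of a record and the size measure in `record_size` (UTF-8 bytes of keyword name plus content) are assumptions made for illustration; the text does not define how record length is measured.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (second keyword name, second keyword content)


def record_size(record: Record) -> int:
    # Hypothetical size measure: UTF-8 bytes of keyword name plus content.
    key, content = record
    return len(key.encode("utf-8")) + len(content.encode("utf-8"))


def split_by_count(records: List[Record], max_records: int) -> List[List[Record]]:
    """Divide sorted records so no module holds more than max_records."""
    ordered = sorted(records, key=lambda r: r[0])
    return [ordered[i:i + max_records] for i in range(0, len(ordered), max_records)]


def split_by_capacity(records: List[Record], max_bytes: int) -> List[List[Record]]:
    """Divide sorted records so no module exceeds max_bytes; records stay whole."""
    modules, current, used = [], [], 0
    for record in sorted(records, key=lambda r: r[0]):
        size = record_size(record)
        if size > max_bytes:
            raise ValueError("a single record exceeds the capacity threshold")
        # Start a new module instead of truncating the record.
        if current and used + size > max_bytes:
            modules.append(current)
            current, used = [], 0
        current.append(record)
        used += size
    if current:
        modules.append(current)
    return modules


records = [("B2", "content 3"), ("A1", "content 1"), ("A2", "content 2")]
print(split_by_count(records, 2))
print(split_by_capacity(records, 25))
```

With this sample data each record measures 11 bytes, so both calls yield one module holding "A1" and "A2" and a second module holding "B2".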

step 33, performing data splicing on the obtained plurality of second data modules to generate a second data area;

step 34, in the second data area, allocating a unique module index identifier to each second data module as the corresponding second module index identifier; taking the starting position of each second data module in the second data area as the corresponding second module starting offset address; taking the data length of each second data module as the corresponding second module length;

step 35, distributing a corresponding second module boundary keyword name to each second data module according to a preset boundary keyword distribution mode; when the boundary keyword distribution mode is the first mode, setting the boundary keyword name of the second module according to the second keyword name of the last second data record of the current second data module; when the boundary keyword allocation mode is a second mode, allocating a dynamic character string to the current second data module as a second module boundary keyword name, wherein the character string ordering of the dynamic character string is greater than the character string ordering of the second keyword name of the last second data record of the current second data module, and the character string ordering of the dynamic character string is also less than the character string ordering of the second keyword name of the first second data record of the next second data module when the next second data module of the current second data module is not empty;
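The two boundary keyword allocation modes can be sketched in Python as follows. This is one possible reading, not the patented implementation: the function name `boundary_key`, the integer mode codes, the choice to reuse the last keyword name unchanged in the first mode, and the way the dynamic string is built in the second mode (appending a character to the last keyword name) are all assumptions.

```python
from typing import Optional


def boundary_key(last_key: str, next_first_key: Optional[str], mode: int) -> str:
    """Pick a module boundary keyword name for one second data module."""
    if mode == 1:
        # First mode (assumed reading): reuse the last record's keyword name.
        return last_key
    # Second mode: build a string strictly greater than last_key and, when a
    # next module exists, strictly less than its 1st keyword name.
    candidate = last_key + "1"  # appending a character always sorts higher
    if next_first_key is None or candidate < next_first_key:
        return candidate
    # Fall back to the smallest string that sorts directly after last_key.
    candidate = last_key + "\x00"
    if candidate < next_first_key:
        return candidate
    raise ValueError("no string fits strictly between the two keyword names")


print(boundary_key("A2", "B2", 1))   # prints: A2
print(boundary_key("A2", "B2", 2))   # prints: A21
print(boundary_key("B3", None, 2))   # prints: B31
```

Appending a character guarantees a larger string under code point ordering, so only the upper bound needs an explicit check.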

step 36, forming a corresponding second index record by the second module index identifier, the second module boundary keyword name, the second module start offset address and the second module length corresponding to each second data module; splicing all the second index records according to the sequence of the corresponding second module index identifications from small to large to obtain corresponding second index areas;

here, the data formats of the second index record and the second data area are similar to the data formats of the first index record and the first data area in the previous steps, and are not further described;
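Steps 33 to 36 can be sketched together in Python. The record encoding in `encode_module` (keyword name, a tab, content, a newline, all in UTF-8) is purely hypothetical, since the text does not specify a byte layout; the class and function names are likewise assumptions, and the boundary keyword name is taken from the last record of each module, as in the first allocation mode.

```python
from dataclasses import dataclass
from typing import List, Tuple

Record = Tuple[str, str]  # (second keyword name, second keyword content)


@dataclass
class IndexRecord:
    module_id: int      # second module index identifier
    boundary_key: str   # second module boundary keyword name
    start_offset: int   # second module starting offset address
    length: int         # second module length


def encode_module(module: List[Record]) -> bytes:
    # Hypothetical encoding: "name<TAB>content<LF>" per record, UTF-8.
    return "".join(f"{k}\t{v}\n" for k, v in module).encode("utf-8")


def build_areas(modules: List[List[Record]]) -> Tuple[bytes, List[IndexRecord]]:
    """Splice modules into a data area and build one index record per module."""
    data_area = b""
    index_area: List[IndexRecord] = []
    for module_id, module in enumerate(modules, start=1):
        blob = encode_module(module)
        # The offset is the current end of the data area; the length is the blob size.
        index_area.append(
            IndexRecord(module_id, module[-1][0], len(data_area), len(blob)))
        data_area += blob
    return data_area, index_area


modules = [[("A1", "content 1"), ("A2", "content 2")], [("B2", "content 3")]]
data_area, index_area = build_areas(modules)
for rec in index_area:
    print(rec)
```

Because identifiers are assigned in increasing order as modules are appended, the index records are already sorted by module index identifier.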

step 37, splicing the second data area and the second index area to generate a first data file to be compressed;

here, the data format of the first to-be-compressed data file is similar to the data format of the first decompressed data file in the previous step, and is not further described;

step 38, performing data module cutting on the first data file to be compressed according to a preset data file to be compressed cutting principle to generate a plurality of first cutting module data; distributing a corresponding second data compression algorithm type for each first cutting module data, and setting the second data compression algorithm type according to a preset batch data compression algorithm type; performing data compression processing on the data of each first cutting module by using a data compression algorithm corresponding to the type of the second data compression algorithm to generate corresponding second storage data; performing check code calculation on each second storage data by using a check code algorithm corresponding to a preset check code algorithm type to generate a corresponding second data check code; the corresponding second data storage module is composed of second storage data corresponding to each first cutting module data, a second data compression algorithm type and a second data check code;

here, the second data compression algorithm type, similar to the first data compression algorithm type in the previous step, also includes an uncompressed algorithm type, a GZIP compression algorithm type, an LZO compression algorithm type, a zip compression algorithm type, a Snappy compression algorithm type;

further, performing data module cutting on the first data file to be compressed according to the preset cutting principle to generate a plurality of first cutting module data specifically includes: if the data length of the first data file to be compressed is an integral multiple of the specified data length threshold, sequentially cutting the first data file to be compressed according to the specified data length threshold to obtain a plurality of first cutting module data; if the data length of the first data file to be compressed is not an integral multiple of the specified data length threshold, padding the first data file to be compressed until its data length is an integral multiple of the specified data length threshold, and then sequentially cutting the padded first data file to be compressed according to the specified data length threshold to obtain a plurality of first cutting module data;
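The cutting rule can be sketched in Python as follows. The function name `cut_into_blocks` is hypothetical, and padding with zero bytes is an assumption, since the text does not say which padding value is used.

```python
from typing import List


def cut_into_blocks(data: bytes, block_len: int, pad: bytes = b"\x00") -> List[bytes]:
    """Pad data to a multiple of block_len, then cut it into equal blocks."""
    if block_len <= 0:
        raise ValueError("block_len must be positive")
    remainder = len(data) % block_len
    if remainder:
        # Assumption: pad with zero bytes up to the next multiple of block_len.
        data += pad * (block_len - remainder)
    return [data[i:i + block_len] for i in range(0, len(data), block_len)]


print(cut_into_blocks(b"abcdefghij", 4))
# prints: [b'abcd', b'efgh', b'ij\x00\x00']
```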

further, performing data compression processing on each first cutting module data by using the data compression algorithm corresponding to the second data compression algorithm type to generate the corresponding second storage data specifically includes: if the second data compression algorithm type is the uncompressed algorithm type, padding each first cutting module data according to a preset data fixed-length threshold so that the length of the obtained second storage data equals the data fixed-length threshold; if the second data compression algorithm type is the GZIP compression algorithm type, performing data compression processing on each first cutting module data by using the GZIP data compression algorithm to generate the corresponding second storage data, and ensuring that the length of the obtained second storage data equals the data fixed-length threshold; if the second data compression algorithm type is the LZO compression algorithm type, performing data compression processing on each first cutting module data by using the LZO data compression algorithm to generate the corresponding second storage data, and ensuring that the length of the obtained second storage data equals the data fixed-length threshold; if the second data compression algorithm type is the zip compression algorithm type, performing data compression processing on each first cutting module data by using the zip data compression algorithm to generate the corresponding second storage data, and ensuring that the length of the obtained second storage data equals the data fixed-length threshold; if the second data compression algorithm type is the Snappy compression algorithm type, performing data compression processing on each first cutting module data by using the Snappy data compression algorithm to generate the corresponding second storage data, and ensuring that the length of the obtained second storage data equals the data fixed-length threshold;
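The construction of one second data storage module can be sketched in Python for the uncompressed and GZIP cases. Several details are assumptions not given in the text: the numeric type codes, zero-byte padding up to the fixed length, CRC-32 as the check code algorithm, and the names `StorageModule`, `make_storage_module` and `read_storage_module`. LZO and Snappy are omitted because they need third-party libraries.

```python
import gzip
import zlib
from dataclasses import dataclass

UNCOMPRESSED, GZIP = 0, 1  # hypothetical compression algorithm type codes


@dataclass
class StorageModule:
    stored_data: bytes      # second storage data, padded to the fixed length
    compression_type: int   # second data compression algorithm type
    check_code: int         # second data check code


def make_storage_module(block: bytes, compression_type: int,
                        fixed_len: int) -> StorageModule:
    if compression_type == UNCOMPRESSED:
        payload = block
    elif compression_type == GZIP:
        payload = gzip.compress(block)
    else:
        raise ValueError("compression type not covered by this sketch")
    if len(payload) > fixed_len:
        raise ValueError("payload longer than the fixed-length threshold")
    # Assumption: pad with zero bytes so every module has the same length.
    stored = payload + b"\x00" * (fixed_len - len(payload))
    # Assumption: CRC-32 stands in for the preset check code algorithm.
    return StorageModule(stored, compression_type, zlib.crc32(stored))


def read_storage_module(module: StorageModule) -> bytes:
    if zlib.crc32(module.stored_data) != module.check_code:
        raise ValueError("check code mismatch")
    if module.compression_type == GZIP:
        # wbits=31 selects the gzip container; decoding stops at the end of
        # the compressed stream, so the trailing padding is not decoded.
        return zlib.decompressobj(wbits=31).decompress(module.stored_data)
    return module.stored_data  # uncompressed data is returned with its padding


block = b"A1\tcontent 1\n" * 20
module = make_storage_module(block, GZIP, 128)
print(len(module.stored_data), read_storage_module(module) == block)
# prints: 128 True
```

Padding after compression keeps every storage module the same length, at the cost of requiring a decoder that stops at the end of the compressed stream.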

step 39, sequentially splicing all the obtained second data storage modules to obtain the current LevelDB data file.

Here, the data format of the current level db data file is similar to the data format of the complete level db data file in the previous step, and is not further described; but the data capacity of the current level db data file is much smaller than that of the complete level db data file in the previous step.

Step 4, taking the current LevelDB data file as the download data for this batch data download and performing the corresponding data download processing.

Here, the cloud server takes the current LevelDB data file as the download data for this batch data download, that is, as the response data to the first batch data download application instruction of step 1, and pushes the download data to the remote device that initiated the application instruction; alternatively, to reduce the workload of the cloud server, the current LevelDB data file can be stored at a cloud storage location as the download data for this batch data download, and the URL of the cloud storage location pushed to the remote device, which can then access that URL separately to download the file.

Fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention. The electronic device may be a terminal device or a server for implementing the method of the embodiment of the present invention, or may be a terminal device or a server connected to the terminal device or the server for implementing the method of the embodiment of the present invention. As shown in fig. 2, the electronic device may include: a processor 301 (e.g., a CPU), a memory 302, a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transceiving operation of the transceiver 303. Various instructions may be stored in memory 302 for performing various processing functions and implementing the processing steps described in the foregoing method embodiments. Preferably, the electronic device according to an embodiment of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to implement communication connections between the elements. The communication port 306 is used for connection communication between the electronic device and other peripherals.

The system bus 305 mentioned in fig. 2 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library and a read-only library). The Memory may include a Random Access Memory (RAM) and may also include a Non-Volatile Memory (Non-Volatile Memory), such as at least one disk Memory.

The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.

It should be noted that the embodiment of the present invention also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the method and the processing procedure provided in the above-mentioned embodiment.

The embodiment of the present invention further provides a chip for executing the instructions, where the chip is configured to execute the processing steps described in the foregoing method embodiment.

Those of skill in the art will further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A processing method for downloading batch data based on a levelDB is characterized by comprising the following steps:

2. The method for processing batch data download based on level DB as claimed in claim 1,

the complete level DB data file comprises a plurality of first data storage modules with the same length; the first data storage module comprises first storage data, a first data compression algorithm type and a first data check code.

3. The method for processing batch data downloading based on a level db according to claim 2, wherein the step of querying data content matched with each first downloading keyword data from a preset complete level db data file to generate corresponding first data content data, and forming a first keyword-content data pair by the first downloading keyword data and the corresponding first data content data specifically comprises:

4. The method for processing batch data downloading based on a level DB as claimed in claim 3, wherein the step of performing data file integrity check processing on the complete level DB data file according to a preset check code algorithm type specifically comprises the steps of:

5. The method for processing batch data downloading based on a level db according to claim 3, wherein the step of performing data decompression processing on the complete level db data file to generate a corresponding first decompressed data file specifically comprises:

6. The method for processing batch data download based on level DB as claimed in claim 3,

the first decompressed data file includes at least a first data area and a first index area;

the first index records correspond to the first data modules one to one;

7. The method for processing batch data downloading based on a level db according to claim 6, wherein the querying, in the first decompressed data file, data content matched with each piece of the first download keyword data to generate corresponding first data content data specifically includes:

8. The method for processing batch data downloading based on a level db according to claim 1, wherein the step of performing data file reconstruction processing according to the obtained multiple first keyword-content data pairs to generate a current level db data file specifically comprises:

9. An electronic device, comprising: a memory, a processor, and a transceiver;

the processor is used for being coupled with the memory, reading and executing the instructions in the memory to realize the method steps of any one of claims 1 to 8;

10. A computer-readable storage medium having stored thereon computer instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-8.