CN105912877A

CN105912877A - Data processing method of medicine product

Info

Publication number: CN105912877A
Application number: CN201610313487.8A
Authority: CN
Inventors: 许驰
Original assignee: CHENGDU DINGZHIHUI SCIENCE AND TECHNOLOGY Co Ltd
Current assignee: CHENGDU DINGZHIHUI SCIENCE AND TECHNOLOGY Co Ltd
Priority date: 2016-05-12
Filing date: 2016-05-12
Publication date: 2016-08-31

Abstract

The invention provides a data processing method for medical products. The method comprises the following steps: storing multi-dimensional data in a cloud storage system using a distributed data index; and managing copies according to the retrieval frequency of data blocks. The method addresses the retrieval of multi-dimensional data and noticeably reduces the delay with which the system responds to user requests, thereby improving the user's retrieval experience.

Description

Medical product data processing method

Technical field

The present invention relates to big data computing, and in particular to a medical product data processing method.

Background art

Cloud storage uses technologies such as cloud computing, distributed file systems, and server clusters to aggregate various storage resources over a network and to jointly provide data storage and service access to the outside. It is widely applied in medical research, production, and automated service fields. However, when a cloud computing provider serves customers in the pharmaceutical field, the availability and utilization of system resources become key indicators that affect both the customer experience and the provider's own revenue, so managing huge resources effectively has become one of the problems a cloud computing provider must consider. Retrieval processing is a core technology of data management on a cloud platform, and retrieval performance directly affects the quality of service that users receive from the platform. Existing data indexing and organization methods are complex to implement and costly to maintain; in particular, when the dimensionality of a retrieval is high, the user's retrieval experience declines sharply. These methods also incur large overhead, which seriously degrades the performance and throughput of the system.

Summary of the invention

To solve the above problems of the prior art, the present invention proposes a medical product data processing method, comprising:

storing data of multiple dimensions in a cloud storage system using a distributed data index, and managing copies based on the retrieval frequency of data blocks.

Preferably, storing the data of multiple dimensions using the distributed data index further comprises:

having each data node manage its own local data index independently, wherein in each data node a B-tree is built over the bit sequence values corresponding to the tuples on that node; each key in a leaf node of the tree corresponds to one bit sequence value and to a linked list associated with that bit sequence value; when a new tuple is inserted, first looking up, according to the primary key of the tuple, the data node into which the tuple should be inserted, and computing the relative position of the new tuple within that data node; then determining whether the attribute value of the tuple appears on that data node for the first time, and updating the node accordingly; if it appears for the first time, inserting the bit sequence value corresponding to the new tuple into the B-tree index; if it does not appear for the first time, only finding the bit sequence value in a leaf node of the B-tree and updating the linked list it points to by adding the tuple to the list; when a tuple is to be deleted, first finding, according to the primary key value of the tuple, the data node on which the tuple should reside and its relative position within that data node, and then searching and updating the local B-tree index of that data node according to the bit sequence value corresponding to the value of the indexed attribute in the tuple; when tuples are to be retrieved, the master node first converts the search condition into condition bit sequences, the condition bit sequences covering all possibilities contained in the search condition; the master node then sends the condition bit sequences to all data nodes in parallel, and the retrieval is carried out on each data node, which determines whether it may hold qualifying tuples; if not, the data node returns an empty set; otherwise it continues to search the B-tree index and, after finding the qualifying tuple bit sequence values in the leaf nodes, traverses the linked lists of those leaf nodes; the master node returns to the user the union of the results computed by all data nodes.

Preferably, managing copies based on the retrieval frequency of data blocks further comprises:

using the name node as a central server, so that all read and write requests from clients must first pass through the name node, and adding a monitor at the name node to record the number of file accesses; the retrieval frequency is defined as the weighted average of the retrieval frequency of the previous cycle and, for the current cycle, the read frequency of the file minus its write frequency; each time a cycle elapses, the monitor in the name node computes the retrieval frequency once for all files in the system;

FH_m denotes the frequency after the m-th cycle, R_k denotes the read operation frequency of the file in the k-th cycle, W_k denotes the write operation frequency of the file in the k-th cycle, and T denotes the cycle; after the m-th cycle, the frequency of the file is expressed as:

FH_m = α·FH_{m-1} + β·(R_m - W_m)

where α and β are both greater than 0 and sum to 1, and represent, respectively, the weight of the previous cycle's retrieval frequency and the weight of the read/write frequency in the current cycle; according to differences in file access frequency, the frequency linked lists are divided into a high access frequency list, a medium access frequency list, and a low access frequency list; the file access queues of the different levels are counted over different cycles;

first, FC denotes the frequency of a copy and count denotes the number of copies in the current system; the frequency of a copy is expressed as:

FC=FH/count

A copy increase threshold and a copy decrease threshold are set. When the value of FC is greater than the copy increase threshold, the monitor notifies the name node to increase the number of the corresponding copies; the name node responds to the request sent by the monitor, starts a copy count increase command, and selects the optimal data node to store the copy according to the copy distribution strategy; after the copy replication on the data node is complete, the name node updates the copy count. When the value of FC is less than the copy decrease threshold, the monitor notifies the name node to reduce the copy count; the name node responds to the request sent by the monitor, starts a copy count decrease command, and deletes the copy on the optimal data node according to the copy distribution strategy; after the copy deletion on the data node is complete, the name node updates the copy count.

Compared with the prior art, the present invention has the following advantages:

The present invention proposes a medical product data processing method that addresses the retrieval of multidimensional data, noticeably reduces the delay with which the system responds to user requests, and improves the user's retrieval experience.

Brief description of the drawings

Fig. 1 is a flowchart of the medical product data processing method according to an embodiment of the present invention.

Detailed description of the invention

A detailed description of one or more embodiments of the present invention is provided below, together with the accompanying drawing that illustrates the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention covers many alternatives, modifications, and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for the purpose of example, and the invention may be practiced according to the claims without some or all of these specific details.

One aspect of the present invention provides a medical product data processing method. Fig. 1 is a flowchart of the medical product data processing method according to an embodiment of the present invention. For range retrieval over multidimensional data, the present invention uses a distributed data index method, together with an adaptive copy adjustment strategy based on the retrieval frequency of data blocks.

The concepts of tuple and attribute in the present invention are the same as in a relational database: each record in a table is a tuple, and each column is an attribute. To achieve highly concurrent retrieval, the present invention has each data node manage its own local data index independently. In each data node, a B-tree is built over the bit sequence values corresponding to the tuples on that node. Each key in a leaf node of the tree corresponds to one bit sequence value and to a linked list associated with that bit sequence value. When a new tuple is inserted, the data node into which the tuple should be inserted is first looked up according to the primary key of the tuple, and the relative position of the new tuple within that data node is computed. It is then determined whether the attribute value of the tuple appears on that data node for the first time, and the node is updated accordingly. If it appears for the first time, the bit sequence value corresponding to the new tuple is inserted into the B-tree index. If it does not appear for the first time, it is only necessary to find the bit sequence value in a leaf node of the B-tree and to update the linked list it points to by adding the tuple to the list. When a tuple is to be deleted, the data node on which the tuple should reside and its relative position within that data node are first found according to the primary key value of the tuple; the local B-tree index of that data node is then searched and updated according to the bit sequence value corresponding to the value of the indexed attribute in the tuple.
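The insert and delete steps on a single data node can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a sorted key list maintained with `bisect` stands in for the leaf level of the B-tree, a Python list stands in for the linked list, and the class name `LocalIndex` and the integer encoding of bit sequence values are hypothetical. How a bit sequence value is derived from attribute values is not specified in the text and is left to the caller.

```python
import bisect


class LocalIndex:
    """Per-data-node index: sorted keys approximate the B-tree leaf level."""

    def __init__(self):
        self._keys = []    # sorted bit sequence values (stand-in for B-tree keys)
        self._chains = {}  # bit sequence value -> list of tuple positions

    def insert(self, bits, position):
        chain = self._chains.get(bits)
        if chain is None:
            # First occurrence on this node: add a new key to the index.
            bisect.insort(self._keys, bits)
            self._chains[bits] = [position]
        else:
            # Key already indexed: only extend the list it points to.
            chain.append(position)

    def delete(self, bits, position):
        chain = self._chains.get(bits)
        if chain is None or position not in chain:
            return False
        chain.remove(position)
        if not chain:
            # Last tuple for this key: drop the key from the index.
            del self._chains[bits]
            self._keys.pop(bisect.bisect_left(self._keys, bits))
        return True

    def lookup(self, bits):
        return list(self._chains.get(bits, []))

    def keys(self):
        return list(self._keys)
```

A real B-tree would replace the sorted list to keep inserts logarithmic on large nodes; the sketch only shows which structure changes in each case.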

When tuples are to be retrieved, the master node first converts the search condition into condition bit sequences. The generated condition bit sequences should cover all possibilities contained in the search condition. The master node then sends the condition bit sequences to all data nodes in parallel, and the retrieval is carried out on each data node. Each data node determines whether it may hold qualifying tuples; if not, it returns an empty set. Otherwise it continues to search the B-tree index and, after finding the qualifying tuple bit sequence values in the leaf nodes, traverses the linked lists of those leaf nodes. The master node returns to the user the union of the results computed by all data nodes.
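The retrieval flow above can be sketched as a fan-out and union. The sketch makes several assumptions that are not in the source: each data node's index is modelled as a plain dictionary from bit sequence value to a list of tuple identifiers, the "may hold qualifying tuples" check is approximated by a key range test, and threads stand in for the network calls to data nodes. The function names `search_node` and `retrieve` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search_node(node_index, condition_bits):
    """Run on one data node; returns the qualifying tuple identifiers."""
    if not node_index:
        return set()
    lo, hi = min(node_index), max(node_index)
    # Cheap pre-check (assumed): skip the index search if no condition
    # bit sequence falls inside this node's key range.
    candidates = [b for b in condition_bits if lo <= b <= hi]
    if not candidates:
        return set()
    hits = set()
    for bits in candidates:
        # Index search, then traversal of the list attached to the key.
        hits.update(node_index.get(bits, ()))
    return hits


def retrieve(nodes, condition_bits):
    """Run on the master node: query all data nodes in parallel, then union."""
    result = set()
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        for partial in pool.map(lambda n: search_node(n, condition_bits), nodes):
            result |= partial
    return result
```

For example, with `nodes = [{0b0101: ["a1"], 0b0111: ["a2"]}, {0b1100: ["b1"]}, {}]`, calling `retrieve(nodes, {0b0101, 0b1100})` collects one tuple from each of the first two nodes and nothing from the empty node.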

The copy adjustment strategy based on file access frequency mainly covers: how to compute the read/write access frequency of a file; the file access frequency linked lists of different levels; determining the upper and lower bounds of the copy count; and setting the copy increase threshold and the copy decrease threshold.

The name node acts as a central server, and all read and write requests from clients must first pass through the name node. A monitor is added at the name node to record the number of file accesses.

The retrieval frequency is defined as the weighted average of the retrieval frequency of the previous cycle and, for the current cycle, the read frequency of the file minus its write frequency. Each time a cycle elapses, the monitor in the name node computes the retrieval frequency once for all files in the system.

In the present invention, FH_m denotes the frequency after the m-th cycle, R_k denotes the read operation frequency of the file in the k-th cycle, W_k denotes the write operation frequency of the file in the k-th cycle, and T denotes the cycle. After the m-th cycle, the frequency of the file is expressed as:

FH_m = α·FH_{m-1} + β·(R_m - W_m)

where α and β are both greater than 0 and sum to 1, and represent, respectively, the weight of the previous cycle's retrieval frequency and the weight of the read/write frequency in the current cycle.
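As a worked illustration of the update rule, the sketch below applies it once per cycle. The function name `update_frequency` and the sample numbers are assumptions chosen for illustration; only the formula itself comes from the text, with β taken as 1 - α because the two weights sum to 1.

```python
def update_frequency(previous_fh, reads, writes, alpha):
    """One-cycle update: FH_m = alpha * FH_{m-1} + beta * (R_m - W_m)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    beta = 1.0 - alpha  # the weights are positive and sum to 1
    return alpha * previous_fh + beta * (reads - writes)


# Previous frequency 10, then a cycle with 30 reads and 10 writes:
# 0.5 * 10 + 0.5 * (30 - 10) = 15.0
fh = update_frequency(10.0, reads=30, writes=10, alpha=0.5)
```

A larger α makes the frequency react more slowly to a burst of reads in the current cycle, while a smaller α lets the current cycle dominate.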

According to differences in file access frequency, the present invention divides the frequency linked lists into three levels: a high access frequency list, a medium access frequency list, and a low access frequency list. The file access queues of the different levels are counted over different cycles: a shorter cycle is used for the high access frequency queue, and a longer cycle is used for the low access frequency queue.
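The three-level split can be sketched as a simple classification step. The source gives no concrete cut-offs or cycle lengths, so the values of `low_cut`, `high_cut`, and `TIER_PERIOD_SECONDS` below are purely hypothetical placeholders, as are the function names.

```python
# Hypothetical statistics cycle per level: shorter for hotter files.
TIER_PERIOD_SECONDS = {"high": 60, "medium": 600, "low": 3600}


def classify(frequency, low_cut=10.0, high_cut=100.0):
    """Map a file's frequency to one of the three levels (cut-offs assumed)."""
    if frequency >= high_cut:
        return "high"
    if frequency >= low_cut:
        return "medium"
    return "low"


def build_frequency_lists(file_frequencies):
    """Group file names into the high, medium, and low frequency lists."""
    lists = {"high": [], "medium": [], "low": []}
    for name, frequency in file_frequencies.items():
        lists[classify(frequency)].append(name)
    return lists


lists = build_frequency_lists({"a.dat": 150.0, "b.dat": 40.0, "c.dat": 2.0})
```

Keeping the cycle length per level in one table makes it easy to recompute hot files often while leaving cold files to a slower schedule.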

First, let FC denote the frequency of a copy and count denote the number of copies in the current system. The frequency of a copy can then be expressed as:

FC=FH/count

Two thresholds are set here: a copy increase threshold and a copy decrease threshold. When the value of FC is greater than the copy increase threshold, the number of copies needs to be increased in order to respond to the large number of user requests. The monitor notifies the name node to increase the number of the corresponding copies. The name node responds to the request sent by the monitor, starts a copy count increase command, and selects the optimal data node to store the copy according to the copy distribution strategy; after the copy replication on the data node is complete, the name node updates the copy count. When the value of FC is less than the copy decrease threshold, the number of redundant copies can be reduced to improve the utilization of the system. In this case, the monitor notifies the name node to reduce the copy count. The name node responds to the request sent by the monitor, starts a copy count decrease command, and deletes the copy on the optimal data node according to the copy distribution strategy; after the copy deletion on the data node is complete, the name node updates the copy count.
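The threshold decision made by the monitor can be sketched as below. Several details are assumptions rather than statements from the source: the count changes by one copy per decision, `min_count` and `max_count` are guard parameters that the bounds on the copy count could supply, and the function name `adjust_copy_count` is hypothetical. Selecting the data node and performing the replication or deletion are outside the sketch.

```python
def adjust_copy_count(fh, count, increase_threshold, decrease_threshold,
                      min_count=1, max_count=None):
    """Return the new copy count for a file with frequency fh."""
    if count < 1:
        raise ValueError("count must be at least 1")
    fc = fh / count  # per-copy frequency, FC = FH / count
    if fc > increase_threshold and (max_count is None or count < max_count):
        return count + 1  # assumed step of one copy per decision
    if fc < decrease_threshold and count > min_count:
        return count - 1
    return count


# FH = 90 over 3 copies gives FC = 30, above the increase threshold of 20.
new_count = adjust_copy_count(90.0, 3, increase_threshold=20.0,
                              decrease_threshold=5.0)
```

Because FC falls as copies are added, repeated decisions settle once FC lies between the two thresholds, provided the increase threshold is kept above the decrease threshold.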

The present invention further determines the lower bound MIN of the copy count according to the reliability requirement of the file, and determines the upper bound MAX of the copy count according to the overhead the system allows. Let R denote the reliability requirement, L the data node failure rate, and EL the environment failure rate. The following must then hold:

(1-L^MIN)*(1-EL)>R
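One way to obtain MIN from this inequality is to search for the smallest copy count that satisfies it. The sketch below is an illustration only; the function name `min_copies`, the search cap `limit`, and the sample rates are assumptions. Since the left-hand side can never reach 1 - EL, no copy count works when 1 - EL is not above R, and the function returns `None` in that case.

```python
def min_copies(node_failure_rate, env_failure_rate, reliability, limit=64):
    """Smallest n with (1 - L**n) * (1 - EL) > R, or None if none is found."""
    if not 0.0 <= node_failure_rate < 1.0:
        raise ValueError("node_failure_rate must be in [0, 1)")
    for n in range(1, limit + 1):
        if (1 - node_failure_rate ** n) * (1 - env_failure_rate) > reliability:
            return n
    return None


# L = 0.1, EL = 0.01, R = 0.98:
# one copy gives 0.9 * 0.99 = 0.891, two copies give 0.99 * 0.99 = 0.9801.
lower_bound = min_copies(0.1, 0.01, 0.98)
```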

Let S denote the overhead the system allows, V the size of a copy, and F the update frequency. The following must then hold:

V*MAX*F<S
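The upper bound can be found the same way, as the largest copy count that keeps the product strictly below the budget. The function name `max_copies` and the sample values are assumptions, and the source does not state units for V, F, or S, so the numbers are unitless placeholders.

```python
def max_copies(copy_size, update_frequency, overhead_budget):
    """Largest n with V * n * F < S (0 if even one copy exceeds the budget)."""
    if copy_size <= 0 or update_frequency <= 0:
        raise ValueError("copy_size and update_frequency must be positive")
    n = 0
    while copy_size * (n + 1) * update_frequency < overhead_budget:
        n += 1
    return n


# V = 2, F = 5, S = 35: three copies cost 30, four copies would cost 40.
upper_bound = max_copies(2, 5, 35)
```

If the resulting MAX is smaller than MIN, the reliability requirement and the overhead budget cannot both be met and one of them has to be relaxed.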

In summary, the present invention proposes a medical product data processing method that addresses the retrieval of multidimensional data, noticeably reduces the delay with which the system responds to user requests, and improves the user's retrieval experience.

Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with a general-purpose computing system. They can be concentrated on a single computing system or distributed over a network formed by multiple computing systems. Optionally, they can be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. The present invention is therefore not limited to any specific combination of hardware and software.

It should be understood that the above specific embodiments of the present invention are used only to illustrate or explain the principles of the invention by way of example, and do not limit the invention. Therefore, any modification, equivalent substitution, improvement, and the like made without departing from the spirit and scope of the present invention shall fall within the scope of protection of the invention. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or within equivalents of such scope and boundaries.

Claims

1. A medical product data processing method, characterized by comprising:

2. The method according to claim 1, characterized in that storing the data of multiple dimensions using the distributed data index further comprises:

3. The method according to claim 2, characterized in that managing copies based on the retrieval frequency of data blocks further comprises:

FH_m = α·FH_{m-1} + β·(R_m - W_m)

FC=FH/count

A copy increase threshold and a copy decrease threshold are set. When the value of FC is greater than the copy increase threshold, the monitor notifies the name node to increase the number of the corresponding copies; the name node responds to the request sent by the monitor, starts a copy count increase command, and selects the optimal data node to store the copy according to the copy distribution strategy; after the copy replication on the data node is complete, the name node updates the copy count. When the value of FC is less than the copy decrease threshold, the monitor notifies the name node to reduce the copy count; the name node responds to the request sent by the monitor, starts a copy count decrease command, and deletes the copy on the optimal data node according to the copy distribution strategy; after the copy deletion on the data node is complete, the name node updates the copy count.