CN105912877A - Data processing method of medicine product - Google Patents
- Publication number
- CN105912877A
- Authority
- CN
- China
- Prior art keywords
- frequency
- copy
- tuple
- back end
- namenode
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/13—File access structures, e.g. distributed indices
- G06F16/134—Distributed indices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a data processing method for medical products. The method comprises: storing multidimensional data in a cloud storage system by means of a distributed data index, and managing replicas according to the retrieval frequency of data blocks. The method addresses the problem of retrieving multidimensional data, noticeably reduces the latency with which the system responds to user requests, and thereby improves the user's retrieval experience.
Description
Technical field
The present invention relates to big data computing, and in particular to a data processing method for medical products.
Background
Cloud storage uses technologies such as cloud computing, distributed file systems, and server clusters to aggregate many kinds of storage resources over a network and jointly provide data storage and access services to the outside. It is widely applied in medical research, production, and service. However, when a cloud computing provider serves customers in the medical field, the availability and utilization of system resources become key indicators that affect both the customers' experience and the provider's own returns, so managing huge resources effectively is a problem the provider must address. Retrieval is the core data-management technology in a cloud platform, and retrieval performance directly determines the quality of service users receive from the platform. Because existing data indexes and data-organization methods are complex to implement and costly to maintain, the user's retrieval experience degrades sharply, especially when the number of query dimensions is high. The resulting system overhead is large and severely limits system performance and throughput.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a data processing method for medical products, comprising:
storing data of multiple dimensions in a cloud storage system using a distributed data index, and managing replicas based on the retrieval frequency of data blocks.
Preferably, storing the data of multiple dimensions using the distributed data index further comprises:
each data node independently managing its own local data index: on each data node, a B-tree is built over the bit-sequence values corresponding to the tuples on that node, and each key in a leaf node of the tree corresponds to one bit-sequence value and a linked list of the tuples having that value; when a new tuple is inserted, the data node into which the tuple should be inserted is first located according to the tuple's primary key, and the relative position of the new tuple within that data node is computed; it is then determined whether the tuple's attribute value occurs on that data node for the first time, and the node is updated; if it occurs for the first time, the bit-sequence value corresponding to the new tuple is inserted into the B-tree index; otherwise, it is only necessary to find that bit-sequence value in a leaf node of the B-tree and update the linked list it points to by adding the tuple; when a tuple is to be deleted, the data node where the tuple resides and its relative position within that data node are first found from the tuple's primary-key value, and the local B-tree index of that data node is then searched and updated according to the bit-sequence value corresponding to the value of the indexed attribute in the tuple; when tuples are to be retrieved, the master node first converts the search condition into condition bit sequences that cover every possibility contained in the search condition, and then sends the condition bit sequences in parallel to all data nodes; each data node then performs the search, determining whether it may contain qualifying tuples; if not, it returns an empty set; otherwise, it continues searching its B-tree index and, after finding the qualifying bit-sequence values in the leaf nodes, traverses the corresponding linked lists; the master node returns the union of the results computed by all data nodes to the user.
Preferably, managing replicas based on the retrieval frequency of data blocks further comprises:
using the namenode as a central server through which every read and write request from clients must first pass, and adding a monitor to the namenode to record the number of accesses to each file; the retrieval frequency is defined as the weighted average of the retrieval frequency in the previous period and the file's read frequency minus its write frequency in the current period, and at the end of each period the monitor in the namenode computes the retrieval frequency of every file in the system;
let $FH_m$ denote the frequency after the $m$-th period, $R_k$ the read-operation frequency of the file in the $k$-th period, $W_k$ the write-operation frequency of the file in the $k$-th period, and $T$ the period; then after the $m$-th period the frequency of the file is:
$$FH_m = \alpha \, FH_{m-1} + \beta \, (R_m - W_m)$$
where $\alpha$ and $\beta$ are both greater than 0 and sum to 1, and are respectively the weight of the previous period's retrieval frequency and the weight of the read/write frequency in the current period; according to file access frequency, the frequency lists are divided into a high-frequency access list, a medium-frequency access list, and a low-frequency access list, and file access queues at different levels are counted over different periods;
first, let $FC$ denote the replica frequency and $\mathrm{count}$ the number of replicas in the current system; the replica frequency is:
$$FC = FH / \mathrm{count}$$
a replica-increase threshold and a replica-decrease threshold are set; when $FC$ exceeds the replica-increase threshold, the monitor notifies the namenode to increase the number of corresponding replicas; the namenode responds to the monitor's request by issuing a replica-increase command and selecting the optimal data node to store the replica according to the replica placement strategy, and once the replica has been copied to the data node, the namenode updates the replica count; when $FC$ falls below the replica-decrease threshold, the monitor notifies the namenode to reduce the number of replicas; the namenode responds to the monitor's request by issuing a replica-decrease command and deleting the replica on the optimal data node according to the replica placement strategy, and once the replica has been deleted from the data node, the namenode updates the replica count.
Compared with the prior art, the present invention has the following advantages:
The proposed data processing method for medical products addresses the problem of retrieving multidimensional data, noticeably reduces the latency of the system's response to user requests, and improves the user's retrieval experience.
Brief Description of the Drawings
Fig. 1 is a flowchart of the data processing method for medical products according to an embodiment of the present invention.
Detailed Description of the Embodiments
A detailed description of one or more embodiments of the invention is provided below, together with the accompanying drawing illustrating the principles of the invention. The invention is described in connection with such embodiments, but it is not limited to any embodiment. The scope of the invention is limited only by the claims, and the invention encompasses many alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of these details.
An aspect of of the present present invention provides a kind of medical product data processing method.Fig. 1 is real according to the present invention
Execute the medical product data processing method flow chart of example.The present invention is directed to the range retrieval employing point of multidimensional data
Cloth data index method, and retrieval frequency based on data block and use self adaptation copy adjustable strategies.
Tuples and attributes in the present invention have the same meaning as in a relational database: each record in a table is a tuple, and each column is an attribute. To achieve highly concurrent retrieval, the invention has each data node manage its own local data index independently. On each data node, a B-tree is built over the bit-sequence values corresponding to the tuples stored on that node. Each key in a leaf node of the tree corresponds to one bit-sequence value and a linked list of the tuples having that value. When a new tuple is inserted, the data node into which it should be inserted is first located by the tuple's primary key, and the tuple's relative position within that data node is computed. The method then determines whether the tuple's attribute value occurs on that data node for the first time, and updates the node. If it occurs for the first time, the bit-sequence value corresponding to the new tuple is inserted into the B-tree index. Otherwise, it is only necessary to find that bit-sequence value in a leaf node of the B-tree and add the tuple to the linked list it points to. When a tuple is to be deleted, its primary-key value is first used to find the data node where the tuple resides and its relative position within that data node; the local B-tree index of that data node is then searched and updated according to the bit-sequence value corresponding to the value of the indexed attribute in the tuple.
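The per-node index operations described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: a sorted Python list plus a dict stands in for the B-tree (both keep keys ordered and attach one posting list to each key), the bit sequence of a tuple is passed in precomputed, and the placement rule `node_for` (integer primary key modulo node count) and all class and method names are hypothetical.

```python
import bisect


class LocalIndex:
    """Local index of one data node: ordered bit-sequence keys, each with a
    list of tuple positions. A sorted list plus a dict stands in for the
    B-tree (assumption)."""

    def __init__(self):
        self.keys = []       # sorted bit-sequence values (the B-tree leaf keys)
        self.postings = {}   # bit sequence -> relative positions of matching tuples

    def insert(self, bitseq, position):
        if bitseq not in self.postings:
            # Value seen on this node for the first time: add a new key.
            bisect.insort(self.keys, bitseq)
            self.postings[bitseq] = [position]
        else:
            # Key already present: only append the tuple to its list.
            self.postings[bitseq].append(position)

    def delete(self, bitseq, position):
        plist = self.postings.get(bitseq)
        if plist is None or position not in plist:
            return False
        plist.remove(position)
        if not plist:
            # Last tuple with this value removed: drop the key as well.
            del self.postings[bitseq]
            self.keys.pop(bisect.bisect_left(self.keys, bitseq))
        return True


class DataNode:
    def __init__(self):
        self.rows = []       # tuples in arrival order; list index = relative position
        self.by_pk = {}      # primary key -> relative position
        self.index = LocalIndex()

    def insert(self, pk, row, bitseq):
        pos = len(self.rows)
        self.rows.append(row)
        self.by_pk[pk] = pos
        self.index.insert(bitseq, pos)

    def delete(self, pk, bitseq):
        pos = self.by_pk.pop(pk)
        self.rows[pos] = None  # tombstone keeps other relative positions stable
        return self.index.delete(bitseq, pos)


def node_for(pk, n_nodes):
    # Hypothetical placement rule: integer primary key modulo the node count.
    return pk % n_nodes
```

Keeping a per-key posting list means a repeated attribute value costs one list append rather than a new tree key, which matches the "first occurrence" branch in the text.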
When tuples are to be retrieved, the master node first converts the search condition into condition bit sequences. The generated condition bit sequences must cover every possibility contained in the search condition. The master node then sends the condition bit sequences in parallel to all data nodes, and each data node performs the search: it determines whether it may contain qualifying tuples and, if not, returns an empty set; otherwise, it continues searching its B-tree index and, after finding the qualifying bit-sequence values in the leaf nodes, traverses their linked lists. The master node returns the union of the results computed by all data nodes to the user.
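The retrieval flow can be illustrated with a small sketch. The method does not specify how attribute values become a bit sequence, so this sketch assumes an equal-width bucketing scheme (`BITS` bits per attribute, bucket ids concatenated into one integer); each data node's index is simplified to a dict from bit sequence to tuples, and a `ThreadPoolExecutor` stands in for sending the condition to all data nodes in parallel. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

BITS = 2  # hypothetical: 2 bits per attribute, i.e. 4 buckets per dimension


def bucket(value, lo, hi, bits=BITS):
    """Map a value in [lo, hi] to an equal-width bucket id (assumed scheme)."""
    n = 1 << bits
    b = int((value - lo) * n / (hi - lo))
    return min(max(b, 0), n - 1)


def encode(values, domains, bits=BITS):
    """Concatenate per-attribute bucket ids into one bit-sequence integer."""
    code = 0
    for v, (lo, hi) in zip(values, domains):
        code = (code << bits) | bucket(v, lo, hi, bits)
    return code


def condition_sequences(ranges, domains, bits=BITS):
    """All bit sequences whose buckets overlap the query ranges, so that no
    qualifying tuple can be missed."""
    per_dim = [range(bucket(a, lo, hi, bits), bucket(b, lo, hi, bits) + 1)
               for (a, b), (lo, hi) in zip(ranges, domains)]
    seqs = set()
    for combo in product(*per_dim):
        code = 0
        for bid in combo:
            code = (code << bits) | bid
        seqs.add(code)
    return seqs


def search_node(local, conds, ranges):
    """local: dict mapping bit sequence -> list of tuples on one data node."""
    hits = conds.intersection(local)  # keys present on this node
    if not hits:
        return set()                  # node cannot hold a match: empty set
    out = set()
    for key in hits:
        for row in local[key]:        # walk the key's list, check exact ranges
            if all(a <= v <= b for v, (a, b) in zip(row, ranges)):
                out.add(row)
    return out


def host_search(nodes, ranges, domains):
    """Master node: build condition sequences, fan out, union the results."""
    conds = condition_sequences(ranges, domains)
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda n: search_node(n, conds, ranges), nodes))
    return set().union(*parts)
```

Because bucket matches are only approximate, each node re-checks the exact ranges while walking a key's list; the condition sequences merely rule out nodes and keys that cannot match.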
The replica-adjustment strategy based on file access frequency mainly covers: how to compute file read/write access frequency; frequency lists for different levels of file access frequency; determining the upper and lower bounds on the number of replicas; and setting the replica-increase and replica-decrease thresholds.
The namenode serves as the central server: every read and write request from a client must first pass through it, and a monitor added to the namenode records the number of accesses to each file.
The retrieval frequency is defined as the weighted average of the retrieval frequency in the previous period and the file's read frequency minus its write frequency in the current period. At the end of each period, the monitor in the namenode computes the retrieval frequency of every file in the system.
Let $FH_m$ denote the frequency after the $m$-th period, $R_k$ the read-operation frequency of the file in the $k$-th period, $W_k$ the write-operation frequency of the file in the $k$-th period, and $T$ the period. After the $m$-th period, the frequency of the file is then:
$$FH_m = \alpha \, FH_{m-1} + \beta \, (R_m - W_m)$$
where $\alpha$ and $\beta$ are both greater than 0 and sum to 1; they are the weight of the previous period's retrieval frequency and the weight of the current period's read/write frequency, respectively.
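The monitor's per-period bookkeeping and the recurrence above can be sketched as follows. This is a minimal sketch assuming simple in-memory counters; the class name `AccessMonitor`, its `record`/`end_period` methods, and the default weights α = β = 0.5 are hypothetical choices, not part of the described method.

```python
from collections import defaultdict


class AccessMonitor:
    """Hypothetical namenode monitor: counts reads and writes per file within
    a period, then applies FH_m = alpha * FH_{m-1} + beta * (R_m - W_m)."""

    def __init__(self, alpha=0.5, beta=0.5):
        # The method requires alpha, beta > 0 and alpha + beta = 1.
        if alpha <= 0 or beta <= 0 or abs(alpha + beta - 1.0) > 1e-9:
            raise ValueError("alpha and beta must be > 0 and sum to 1")
        self.alpha, self.beta = alpha, beta
        self.fh = defaultdict(float)    # file -> FH after the last completed period
        self.reads = defaultdict(int)   # R_m: reads counted in the current period
        self.writes = defaultdict(int)  # W_m: writes counted in the current period

    def record(self, path, is_write=False):
        """Called for every client request passing through the namenode."""
        counter = self.writes if is_write else self.reads
        counter[path] += 1

    def end_period(self):
        """Recompute FH for every known file, then reset the period counters."""
        files = set(self.fh) | set(self.reads) | set(self.writes)
        for f in files:
            self.fh[f] = (self.alpha * self.fh[f]
                          + self.beta * (self.reads[f] - self.writes[f]))
        self.reads.clear()
        self.writes.clear()
        return dict(self.fh)
```

For example, 4 reads and 2 writes in the first period give FH = 0.5 · 0 + 0.5 · (4 − 2) = 1.0; 6 reads in the next period give 0.5 · 1.0 + 0.5 · 6 = 3.5. Files with no accesses keep decaying toward zero, since α < 1.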
According to file access frequency, the invention divides the frequency lists into three levels: a high-frequency access list, a medium-frequency access list, and a low-frequency access list. File access queues at different levels are counted over different periods: a shorter period is used for the high-frequency queue and a longer period for the low-frequency queue.
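The three-level lists with level-specific statistics periods might look like the sketch below. The method fixes neither the boundaries between levels nor the period lengths, so the `TIERS` table (FH boundaries 100 and 10, periods of 60, 600, and 3600 seconds) and all function names are hypothetical; only the ordering, shorter periods for hotter files, follows the text.

```python
# Hypothetical tier table: (name, lower FH bound, statistics period in seconds).
# Only the ordering (hotter files -> shorter period) comes from the method.
TIERS = [
    ("high", 100.0, 60),
    ("medium", 10.0, 600),
    ("low", float("-inf"), 3600),
]


def tier_of(fh):
    """Return (tier name, statistics period) for a file's frequency FH."""
    for name, lower, period in TIERS:
        if fh >= lower:
            return name, period
    raise AssertionError("unreachable: the last tier has no lower bound")


def build_tier_lists(fh_by_file):
    """Split files into the high/medium/low access-frequency lists."""
    lists = {name: [] for name, _, _ in TIERS}
    for path, fh in fh_by_file.items():
        lists[tier_of(fh)[0]].append(path)
    return lists


def due_tiers(now, last_run):
    """Tiers whose statistics period has elapsed since their last run."""
    return [name for name, _, period in TIERS if now - last_run[name] >= period]
```

Recomputing the cold list rarely keeps the monitor's work proportional to where access patterns actually change.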
First, let $FC$ denote the replica frequency and $\mathrm{count}$ the number of replicas currently in the system. The replica frequency can then be expressed as:
$$FC = FH / \mathrm{count}$$
Two thresholds are set here: a replica-increase threshold and a replica-decrease threshold. When $FC$ exceeds the replica-increase threshold, the number of replicas needs to grow to serve a large volume of user requests, so the monitor notifies the namenode to increase the number of corresponding replicas. The namenode responds to the monitor's request by issuing a replica-increase command and selecting the optimal data node to store the new replica according to the replica placement strategy; once the replica has been copied to the data node, the namenode updates the replica count. When $FC$ falls below the replica-decrease threshold, surplus replicas can be removed to improve system utilization, so the monitor notifies the namenode to reduce the number of replicas. The namenode responds to the monitor's request by issuing a replica-decrease command and deleting the replica on the optimal data node according to the replica placement strategy; once the replica has been deleted from the data node, the namenode updates the replica count.
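The threshold decision made by the monitor can be sketched as a pure function. This is a minimal sketch: the names `replica_frequency` and `adjust_replicas` are hypothetical, changing the count by one replica per decision is an assumption (the method does not say by how much), and the replica-count bounds discussed next are passed in as `min_count` and `max_count`.

```python
def replica_frequency(fh, count):
    """FC = FH / count: access frequency carried by each replica."""
    if count <= 0:
        raise ValueError("a file must have at least one replica")
    return fh / count


def adjust_replicas(fh, count, inc_threshold, dec_threshold, min_count, max_count):
    """Return the replica count the namenode should move to.

    Assumption: one replica is added or removed per decision, and the result
    stays within [min_count, max_count].
    """
    fc = replica_frequency(fh, count)
    if fc > inc_threshold and count < max_count:
        return count + 1  # monitor asks the namenode to add a replica
    if fc < dec_threshold and count > min_count:
        return count - 1  # monitor asks the namenode to delete a replica
    return count          # between the thresholds: leave the count unchanged
```

Using two separate thresholds leaves a band in which nothing changes, which avoids replicas being created and deleted in alternating periods when FC hovers near a single cut-off.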
The invention further determines the lower bound MIN on the number of replicas according to the reliability requirement of the file, and the upper bound MAX according to the overhead the system allows. Let $R$ denote the reliability requirement, $L$ the data-node failure rate, and $\mathrm{EL}$ the environmental failure rate. Then the following must hold:
$$(1 - L^{\mathrm{MIN}})(1 - \mathrm{EL}) > R$$
Let $S$ denote the overhead the system allows, $V$ the replica size, and $F$ the update frequency. Then the following must hold:
$$V \cdot \mathrm{MAX} \cdot F < S$$
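Both bounds can be computed directly from the two inequalities. This sketch assumes the lower-bound term reads $L^{\mathrm{MIN}}$ (the probability that all MIN replicas fail, under independent node failures) and that $S$, $V$, and $F$ are in consistent units; the function names are hypothetical.

```python
import math


def min_replicas(R, L, EL):
    """Smallest integer MIN satisfying (1 - L**MIN) * (1 - EL) > R.

    R: required reliability, L: data-node failure rate,
    EL: environmental failure rate (all probabilities).
    """
    if not 0 <= L < 1:
        raise ValueError("L must be in [0, 1)")
    if (1 - EL) <= R:
        # Even unlimited replicas cannot overcome the environmental factor.
        raise ValueError("requirement unreachable: need 1 - EL > R")
    n = 1
    while (1 - L ** n) * (1 - EL) <= R:
        n += 1
    return n


def max_replicas(S, V, F):
    """Largest integer MAX satisfying the strict bound V * MAX * F < S."""
    return math.ceil(S / (V * F)) - 1
```

For example, with $R = 0.98$, $L = 0.1$, and $\mathrm{EL} = 0.01$, one replica gives $0.9 \times 0.99 = 0.891$, too low, while two give $0.99 \times 0.99 = 0.9801 > 0.98$, so MIN = 2. Note that $1 - \mathrm{EL}$ caps the achievable reliability regardless of the replica count.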
In summary, the present invention proposes a data processing method for medical products that addresses the problem of retrieving multidimensional data, noticeably reduces the latency of the system's response to user requests, and improves the user's retrieval experience.
Those skilled in the art should understand that each module or step of the invention described above can be implemented by a general-purpose computing system. The modules or steps can be concentrated in a single computing system or distributed over a network formed by multiple computing systems. Optionally, they can be implemented as program code executable by a computing system, and can thus be stored in a storage system and executed by the computing system. The invention is therefore not restricted to any specific combination of hardware and software.
It should be understood that the above embodiments of the invention are used only to illustrate or explain the principles of the invention and do not limit it. Any modification, equivalent substitution, improvement, or the like made without departing from the spirit and scope of the invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all variations and modifications that fall within the scope and boundaries of the claims, or within equivalents of such scope and boundaries.
Claims (3)
1. A data processing method for medical products, characterized by comprising:
storing data of multiple dimensions in a cloud storage system using a distributed data index, and managing replicas based on the retrieval frequency of data blocks.
2. The method according to claim 1, characterized in that storing the data of multiple dimensions using the distributed data index further comprises:
each data node independently managing its own local data index: on each data node, a B-tree is built over the bit-sequence values corresponding to the tuples on that node, and each key in a leaf node of the tree corresponds to one bit-sequence value and a linked list of the tuples having that value; when a new tuple is inserted, the data node into which the tuple should be inserted is first located according to the tuple's primary key, and the relative position of the new tuple within that data node is computed; it is then determined whether the tuple's attribute value occurs on that data node for the first time, and the node is updated; if it occurs for the first time, the bit-sequence value corresponding to the new tuple is inserted into the B-tree index; otherwise, it is only necessary to find that bit-sequence value in a leaf node of the B-tree and update the linked list it points to by adding the tuple; when a tuple is to be deleted, the data node where the tuple resides and its relative position within that data node are first found from the tuple's primary-key value, and the local B-tree index of that data node is then searched and updated according to the bit-sequence value corresponding to the value of the indexed attribute in the tuple; when tuples are to be retrieved, the master node first converts the search condition into condition bit sequences that cover every possibility contained in the search condition, and then sends the condition bit sequences in parallel to all data nodes; each data node then performs the search, determining whether it may contain qualifying tuples; if not, it returns an empty set; otherwise, it continues searching its B-tree index and, after finding the qualifying bit-sequence values in the leaf nodes, traverses the corresponding linked lists; the master node returns the union of the results computed by all data nodes to the user.
3. The method according to claim 2, characterized in that managing replicas based on the retrieval frequency of data blocks further comprises:
using the namenode as a central server through which every read and write request from clients must first pass, and adding a monitor to the namenode to record the number of accesses to each file; the retrieval frequency is defined as the weighted average of the retrieval frequency in the previous period and the file's read frequency minus its write frequency in the current period, and at the end of each period the monitor in the namenode computes the retrieval frequency of every file in the system;
let $FH_m$ denote the frequency after the $m$-th period, $R_k$ the read-operation frequency of the file in the $k$-th period, $W_k$ the write-operation frequency of the file in the $k$-th period, and $T$ the period; then after the $m$-th period the frequency of the file is:
$$FH_m = \alpha \, FH_{m-1} + \beta \, (R_m - W_m)$$
where $\alpha$ and $\beta$ are both greater than 0 and sum to 1, and are respectively the weight of the previous period's retrieval frequency and the weight of the read/write frequency in the current period; according to file access frequency, the frequency lists are divided into a high-frequency access list, a medium-frequency access list, and a low-frequency access list, and file access queues at different levels are counted over different periods;
first, let $FC$ denote the replica frequency and $\mathrm{count}$ the number of replicas in the current system; the replica frequency is:
$$FC = FH / \mathrm{count}$$
a replica-increase threshold and a replica-decrease threshold are set; when $FC$ exceeds the replica-increase threshold, the monitor notifies the namenode to increase the number of corresponding replicas; the namenode responds to the monitor's request by issuing a replica-increase command and selecting the optimal data node to store the replica according to the replica placement strategy, and once the replica has been copied to the data node, the namenode updates the replica count; when $FC$ falls below the replica-decrease threshold, the monitor notifies the namenode to reduce the number of replicas; the namenode responds to the monitor's request by issuing a replica-decrease command and deleting the replica on the optimal data node according to the replica placement strategy, and once the replica has been deleted from the data node, the namenode updates the replica count.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610313487.8A CN105912877A (en) | 2016-05-12 | 2016-05-12 | Data processing method of medicine product |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN105912877A | 2016-08-31 |
Family
ID=56748142
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610313487.8A Pending CN105912877A (en) | 2016-05-12 | 2016-05-12 | Data processing method of medicine product |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN105912877A (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1216841A (en) * | 1997-10-31 | 1999-05-19 | 国际商业机器公司 | Multidimensional data clustering and dimensionality reduction for indexing and retrieval |
| CN1632792A (en) * | 2004-12-29 | 2005-06-29 | 复旦大学 | An Efficient Path Indexing Method Based on XML Data |
| CN101187931A (en) * | 2007-12-12 | 2008-05-28 | 浙江大学 | Management Method of Multiple File Copy in Distributed File System |
| CN102591970A (en) * | 2011-12-31 | 2012-07-18 | 北京奇虎科技有限公司 | Distributed key-value query method and query engine system |
| CN103198134A (en) * | 2013-04-12 | 2013-07-10 | 同方光盘股份有限公司 | Visual navigation method for academic literature |
| CN105205172A (en) * | 2015-10-14 | 2015-12-30 | 成都鼎智汇科技有限公司 | Database retrieval method |
Non-Patent Citations (2)
| Title |
|---|
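| 刘法明 (Liu Faming) et al.: "An auxiliary index mechanism suitable for range queries on multidimensional data", 《广西大学学报:自然科学版》 (Journal of Guangxi University: Natural Science Edition) * |
| 刘法明 (Liu Faming): "Research on index and data-organization optimization for queries over large datasets", 《中国优秀硕士论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) * |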
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108924215A (en) * | 2018-06-28 | 2018-11-30 | 北京顺丰同城科技有限公司 | A kind of service discovery processing method and processing device based on tree structure |
| CN108924215B (en) * | 2018-06-28 | 2021-03-19 | 北京顺丰同城科技有限公司 | Service discovery processing method and device based on tree structure |
| CN113990436A (en) * | 2021-12-27 | 2022-01-28 | 西藏自治区人民政府驻成都办事处医院 | Method and system for rapidly judging medication rationality based on matrix check |
| CN113990436B (en) * | 2021-12-27 | 2022-03-01 | 西藏自治区人民政府驻成都办事处医院 | Method and system for rapidly judging medication rationality based on matrix check |
| CN118535652A (en) * | 2024-07-25 | 2024-08-23 | 卓世智星(青田)元宇宙科技有限公司 | Big data storage method and system |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11176128B2 (en) | Multiple access path selection by machine learning | |
| US8543596B1 (en) | Assigning blocks of a file of a distributed file system to processing units of a parallel database management system | |
| JP6697392B2 (en) | Transparent discovery of semi-structured data schema | |
| US8200705B2 (en) | Method and apparatus for applying database partitioning in a multi-tenancy scenario | |
| US9916353B2 (en) | Generating multiple query access plans for multiple computing environments | |
| US9672241B2 (en) | Representing an outlier value in a non-nullable column as null in metadata | |
| US10824614B2 (en) | Custom query parameters in a database system | |
| US10970300B2 (en) | Supporting multi-tenancy in a federated data management system | |
| Binani et al. | SQL vs. NoSQL vs. NewSQL-a comparative study | |
| US20180060341A1 (en) | Querying Data Records Stored On A Distributed File System | |
| US20160283538A1 (en) | Fast multi-tier indexing supporting dynamic update | |
| US20160350302A1 (en) | Dynamically splitting a range of a node in a distributed hash table | |
| CN104268295B (en) | A kind of data query method and device | |
| US10255307B2 (en) | Database object management for a shared pool of configurable computing resources | |
| US10108664B2 (en) | Generating multiple query access plans for multiple computing environments | |
| CN106940715B (en) | A kind of method and apparatus of the inquiry based on concordance list | |
| CN108509437A (en) | A kind of ElasticSearch inquiries accelerated method | |
| WO2017156855A1 (en) | Database systems with re-ordered replicas and methods of accessing and backing up databases | |
| WO2021016050A1 (en) | Multi-record index structure for key-value stores | |
| US10133805B2 (en) | System and method for analyzing sequential data access efficiency | |
| CN105912877A (en) | Data processing method of medicine product | |
| US12141032B2 (en) | Data replication with cross replication group references | |
| US11841857B2 (en) | Query efficiency using merged columns | |
| US20230153300A1 (en) | Building cross table index in relational database | |
| Ghandour et al. | User-based Load Balancer in HBase. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20160831 |