CN106933622A

CN106933622A - Model-driven Hadoop deployment method in a cloud environment

Info

Publication number: CN106933622A
Application number: CN201710094086.2A
Authority: CN
Inventors: 武永卫; 陈康; 郑纬民; 陈哲毅
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2017-02-21
Filing date: 2017-02-21
Publication date: 2017-07-07

Abstract

The invention discloses a model-driven Hadoop deployment method in a cloud environment, comprising: providing a Hadoop requirement model and a Hadoop deployment model; implementing model conversion between the Hadoop requirement model and the Hadoop deployment model according to preset conversion rules; and using a synchronization engine to monitor information changes in the Hadoop requirement model and the Hadoop deployment model, and synchronizing the information when information in the Hadoop requirement model and/or the Hadoop deployment model changes. The invention has the following advantage: diversified software and hardware resources in a cloud environment can be managed and deployed.

Description

Model-driven Hadoop deployment method in cloud environment

Technical field

The invention relates to the field of software engineering, and in particular to a model-driven Hadoop deployment method in a cloud environment.

Background art

Large volumes of data are generated every day, and 90% of the world's data was produced in the past two years. Massive data processing technology has been widely applied across all fields of production, which marks the true arrival of the big data era.

Hadoop is an open source software framework for distributed processing of big data that can process massive amounts of data in a reliable, efficient, and scalable manner. With the rapid development of the Hadoop ecosystem and the successive completion of a large number of its sub-projects, Hadoop has been widely used for processing and storing big data in a variety of scenarios, and it has become one of the most important software tools for big data processing. As Hadoop is increasingly deployed in the cloud, administrators need to deploy and configure it in different ways according to specific requirements, which poses two challenges for Hadoop deployment:

(1) Diversity of hardware resources: a Hadoop cluster may be deployed on different types of infrastructure, including physical servers, virtual machines, and Docker containers. This heterogeneity makes cluster nodes more difficult and complex to manage.

(2) Diversity of software resources: the Hadoop ecosystem contains many different types of computing and storage frameworks, such as HDFS, MapReduce, HBase, Yarn, and Spark. Each type of framework has its own deployment and configuration method, and dependency or constraint relationships exist between some frameworks.

Some management tools already help users deploy Hadoop clusters, such as Cloudera Manager and Apache Ambari. In addition, the open source container engine Docker manages the life cycle of application components (packaging, distribution, deployment, and operation) and achieves "package once, run anywhere" at the application component level, which reduces the difficulty of deploying, operating, and maintaining Hadoop. Although these tools and technologies provide solutions for deploying and managing Hadoop clusters, they focus mostly on environment configuration and parameter settings and usually offer a fixed deployment mode. They do not take into account the diverse infrastructure of cloud platforms or scalability, and they cannot meet user-specific Hadoop deployment requirements based on service types, node resources, and scenario characteristics.

Summary of the invention

The present invention aims to solve at least one of the above technical problems.

To this end, the object of the present invention is to propose a model-driven Hadoop deployment method in a cloud environment that can manage and deploy diversified software and hardware resources in the cloud environment.

To achieve the above object, an embodiment of the present invention discloses a model-driven Hadoop deployment method in a cloud environment, comprising the following steps. S1: providing a Hadoop requirement model and a Hadoop deployment model, wherein the Hadoop requirement model is used to generate a corresponding management view according to system requirements, and the Hadoop deployment model is used to describe the node configuration information, running status, and software deployment of the management view. S2: implementing model conversion between the Hadoop requirement model and the Hadoop deployment model according to preset conversion rules, wherein the preset conversion rules comprise a node conversion model and a cluster service conversion model, the node conversion model is used to implement model conversion between the nodes of the Hadoop requirement model and the nodes of the Hadoop deployment model, and the cluster service conversion model is used to implement model conversion between the cluster services of the Hadoop requirement model and the cluster services of the Hadoop deployment model. S3: using a synchronization engine to monitor information changes in the Hadoop requirement model and the Hadoop deployment model, and synchronizing the information when information in the Hadoop requirement model and/or the Hadoop deployment model changes.

Further, the Hadoop requirement model comprises: a cluster node module provided with infrastructure resources, the infrastructure resources comprising the resources and attributes in a node configuration list, a node list, and a container image list; and a cluster service module provided with a service list, the service list comprising multiple services and the attributes of each service.

Further, the Hadoop deployment model comprises: a cluster node unit that stores a virtual machine configuration list, a virtual machine list, and a virtual machine image list; and a cluster service unit used to provide cluster services.

Further, the node conversion model implements model conversion through an element mapping relationship between the nodes of the Hadoop requirement model and the nodes of the Hadoop deployment model. The element mapping relationship comprises helper tags and mapper tags: a helper tag describes the mapping between elements of two classes, and a mapper tag describes the mapping between attributes of two classes.

Further, the cluster service conversion model converts cluster services automatically through a constraint model and a preset conversion algorithm. The constraint model defines the association relationships between multiple model elements, and the preset conversion algorithm generates a service deployment plan from the Hadoop requirement model and the constraint model.

Further, the preset conversion algorithm comprises the following steps: obtaining the set of services to be deployed from the service units under the service list in the Hadoop requirement model; supplementing and sorting the services in the set according to the dependency relationships between service units in the constraint model, to obtain an ordered set of the services that actually need to be deployed; reading each service from the ordered set in reverse order and computing its deployment plan; and deploying the services one by one according to the node sets of the service deployment units.
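The supplementing and sorting steps above can be sketched as follows. This is a minimal illustration in Python, not the algorithm of the invention itself: the `DEPENDS_ON` table, the function names, and the reading of "reverse order" (the ordered set lists dependents first, so reading it backwards visits dependencies first) are assumptions made for the example, and the per-service plan computation and the deployment itself are left out.

```python
from typing import Dict, List

# Hypothetical dependency table in the spirit of the constraint model:
# each service maps to the services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "HDFS": [],
    "MapReduce": ["HDFS"],
    "HBase": ["HDFS"],
}


def ordered_service_set(requested: List[str],
                        depends_on: Dict[str, List[str]]) -> List[str]:
    """Return the requested services plus their dependencies, dependents first.

    Every service appears before the services it depends on, so reading the
    result in reverse visits a dependency before anything that needs it.
    """
    order: List[str] = []
    state: Dict[str, str] = {}

    def visit(service: str) -> None:
        if state.get(service) == "done":
            return
        if state.get(service) == "visiting":
            raise ValueError(f"cyclic dependency involving {service}")
        state[service] = "visiting"
        for dep in depends_on.get(service, []):
            visit(dep)  # supplements the set with services not requested
        state[service] = "done"
        order.append(service)  # post-order: dependencies are appended first

    for service in requested:
        visit(service)
    order.reverse()  # dependents first, dependencies last
    return order


def deployment_sequence(requested: List[str],
                        depends_on: Dict[str, List[str]]) -> List[str]:
    """Read the ordered set in reverse, which yields dependencies first."""
    return list(reversed(ordered_service_set(requested, depends_on)))
```

Requesting only MapReduce pulls HDFS into the set, and the reversed reading places HDFS ahead of MapReduce.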

Further, the Hadoop deployment model is constructed using the SM@RT tool.

The model-driven Hadoop deployment method in a cloud environment according to the embodiments of the present invention introduces a runtime architecture model into the Hadoop deployment process and meets user-specific Hadoop deployment requirements through three steps: proposing the models, converting between them, and synchronizing them.

Additional aspects and advantages of the invention will be set forth in part in the description that follows; in part they will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of the model-driven Hadoop deployment method in a cloud environment according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the Hadoop requirement meta-model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the Hadoop deployment meta-model according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the mapping relationships between model elements according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the meta-model of the constraint model according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of the constraint model according to an embodiment of the present invention;

Fig. 7 is a diagram illustrating parameter changes while a Hadoop cluster service deployment operation is in progress according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of the bidirectional synchronization between the Hadoop deployment model and the running system according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of the Hadoop requirement model in a specific embodiment of the present invention;

Fig. 10 is a schematic diagram of the Hadoop deployment model in a specific embodiment of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and should not be construed as limiting it.

These and other aspects of the embodiments will become clear with reference to the following description and drawings. The description and drawings specifically disclose some particular implementations of the embodiments to show some of the ways in which their principles may be put into practice, but it should be understood that the scope of the embodiments is not limited thereby. On the contrary, the embodiments of the present invention include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.

The present invention is described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of the model-driven Hadoop deployment method in a cloud environment according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:

S1: Provide a Hadoop requirement model and a Hadoop deployment model. The Hadoop requirement model is used to generate a corresponding management view according to system requirements, and the Hadoop deployment model is used to describe the node configuration information, running status, and software deployment of the management view.

In one embodiment of the present invention, the Hadoop requirement model comprises:

a cluster node module provided with infrastructure resources, the infrastructure resources comprising the resources and attributes in a node configuration list, a node list, and a container image list; and a cluster service module provided with a service list, the service list comprising multiple services and the attributes of each service.

Specifically, during deployment of a Hadoop cluster, the Hadoop requirement model gives administrators a unified management view of node resources and cluster services. As shown in Fig. 2, the requirement meta-model consists of two parts: cluster nodes and cluster services.

In the cluster node part, Infrastructure represents the infrastructure resources and contains one NodeTypes element, one Nodes element, and one Images element. The NodeTypes element is the node configuration list, that is, the set of available configurations; each NodeType element represents one node configuration and has attributes such as id, name, ram, disk, and cpus, which denote the identifier, name, memory, disk, and number of CPUs, respectively. The Nodes element is the node list, that is, the set of nodes; each Node element represents one node and has attributes such as id, name, nodeType, imgeId, and Status, which denote the node's identifier, name, node type, container image, and status, respectively. The Images element is the container image list, that is, the set of available container image files; each Image element represents one container image and has attributes such as id, name, size, status, miniDisk, and miniRam, which denote the image's identifier, name, size, status, disk requirement, and memory requirement, respectively.
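The cluster node part of the requirement model can be pictured as a handful of record types. The sketch below is one possible Python rendering, not the representation used by the invention: the attribute names follow the text (including the spelling `imgeId`), while the Python types, the assumption that `nodeType` and `imgeId` hold the ids of a NodeType and an Image, and the `dangling_references` helper are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeType:
    id: str
    name: str
    ram: int    # memory; the unit is not stated in the text
    disk: int   # disk size; the unit is not stated in the text
    cpus: int


@dataclass
class Image:
    id: str
    name: str
    size: int
    status: str
    miniDisk: int
    miniRam: int


@dataclass
class Node:
    id: str
    name: str
    nodeType: str  # assumed to hold the id of a NodeType
    imgeId: str    # spelled as in the text; assumed to hold the id of an Image
    Status: str


@dataclass
class Infrastructure:
    nodeTypes: List[NodeType] = field(default_factory=list)
    nodes: List[Node] = field(default_factory=list)
    images: List[Image] = field(default_factory=list)


def dangling_references(infra: Infrastructure) -> List[str]:
    """List nodes whose node type or image is missing from the model."""
    type_ids = {t.id for t in infra.nodeTypes}
    image_ids = {i.id for i in infra.images}
    problems: List[str] = []
    for node in infra.nodes:
        if node.nodeType not in type_ids:
            problems.append(f"{node.id}: unknown nodeType {node.nodeType}")
        if node.imgeId not in image_ids:
            problems.append(f"{node.id}: unknown image {node.imgeId}")
    return problems
```

A check of this kind is useful before conversion, because a node that points at a missing configuration or image cannot be mapped to a deployment element.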

In the cluster service part, the Services element represents the list of services provided by Hadoop and contains elements such as HDFSService, MapReduceService, HBaseService, YarnService, and SparkService, which represent the HDFS, MapReduce, HBase, Yarn, and Spark services, respectively. Taking HDFSService, MapReduceService, and HBaseService as examples: HDFSService represents the cluster's HDFS service, which is provided by multiple nodes in the cluster, each of which deploys the corresponding HDFS software module, HDFSAgent; HBaseService represents the cluster's HBase service, which is provided by multiple nodes in the cluster, each of which deploys the corresponding HBase software module, HBaseAgent; and MapReduceService represents the cluster's MapReduce service, which is provided by multiple nodes in the cluster, each of which deploys the corresponding MapReduce software module, MapReduceAgent. Every service has id, name, version, and status attributes, which denote its identifier, name, version, and current running status, and its software deployment module (the Agent) has health and nodeId attributes, which denote the module's health and the node on which it is located.
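A service and its agents in the requirement model can be sketched in the same way. The attribute names below come from the text; the Python types, the single generic `Service` class standing in for HDFSService, MapReduceService, and the others, and the two helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Agent:
    health: str   # health of the software module
    nodeId: str   # node on which the module is located


@dataclass
class Service:
    id: str
    name: str
    version: str
    status: str
    agents: List[Agent] = field(default_factory=list)


def nodes_providing(service: Service) -> List[str]:
    """Return the distinct nodes that run an agent of this service."""
    return sorted({agent.nodeId for agent in service.agents})


def agents_by_health(service: Service) -> Dict[str, List[str]]:
    """Group the node ids of a service's agents by reported health."""
    grouped: Dict[str, List[str]] = {}
    for agent in service.agents:
        grouped.setdefault(agent.health, []).append(agent.nodeId)
    return grouped
```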

In one embodiment of the present invention, the Hadoop deployment model comprises: a cluster node unit that stores a virtual machine configuration list, a virtual machine list, and a virtual machine image list; and a cluster service unit used to provide cluster services.

Specifically, during Hadoop cluster deployment, the deployment model gives administrators a management view of the system deployment. It describes the cluster node configuration, the running status, and the software deployment units on the nodes, and it is synchronized bidirectionally with the running system. As shown in Fig. 3, the deployment meta-model also consists of two parts: cluster nodes and cluster services.

In the cluster node part, taking CloudStack as an example, Project represents a project and contains one VMTypes element, one VMs element, and one Images element. The VMTypes element is the virtual machine configuration list, that is, the set of available configurations; each VMType element represents one virtual machine configuration and has attributes such as id, name, description, guestCpus, memoryMb, and imageSpaceGb, which denote the identifier, name, virtual machine description, number of CPUs, memory, and image space size, respectively. The VMs element is the virtual machine list, that is, the set of virtual machines; each VM element represents one virtual machine and has attributes such as id, name, imageId, vmtypeId, cpuutiliz, and memoryutiliz, which denote the identifier, name, virtual machine image, virtual machine type, CPU utilization, and memory utilization, respectively. The Images element is the virtual machine image list, that is, the set of available virtual machine image files; each Image element represents one virtual machine image and has attributes such as id, name, vsize, and description, which denote the identifier, name, image size, and image description, respectively.

In the cluster service part, the Services element represents the list of services provided by Hadoop and contains elements such as HDFSService, MapReduceService, HBaseService, YarnService, and SparkService, which represent the HDFS, MapReduce, HBase, Yarn, and Spark services, respectively. Each service comprises three main kinds of model element: service units, role units, and deployment units. A service unit represents a specific service provided by Hadoop; a role unit represents one of the roles that make up that service; and a deployment unit represents a running software module belonging to a role. HDFSService, MapReduceService, and HBaseService are taken as examples.

The HDFS service unit contains three types of role unit: NameNode, SecondaryNameNode, and DataNode. The NameNode role unit is the management center of HDFS, which manages the file system namespace, the cluster configuration information, and the replication of storage blocks; it has exactly one deployment unit, DU_NN. The SecondaryNameNode role unit is the backup node of the NameNode, which backs up the metadata held by the NameNode; it has exactly one deployment unit, DU_SNN. The DataNode role unit is a worker node of HDFS, which handles the scheduled storage and retrieval of data; it may have multiple deployment units DU_DN.

The MapReduce service unit contains two types of role unit: JobTracker and TaskTracker. The JobTracker role unit is the central service node of MapReduce, which schedules each subtask of a job to run on a TaskTracker; it has exactly one deployment unit, DU_JT. The TaskTracker role unit is a sub-service node, which executes the tasks assigned by the JobTracker; it may have multiple deployment units DU_TT.

The HBase service unit contains two types of role unit: HMaster and HRegionServer. The HMaster role unit is the management and scheduling center of HBase, which allocates and manages HRegionServers; it has exactly one deployment unit, DU_HM. The HRegionServer role unit is the HBase service running on each worker node, which maintains the Regions assigned by the HMaster and handles IO requests; it may have multiple deployment units DU_RS.

The deployment units of all of these roles have vmId and health attributes, which denote the virtual machine on which the software module is located and the running health of the deployment unit, respectively.
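The "exactly one" and "may have multiple" rules for deployment units lend themselves to a simple check. The following Python sketch assumes a generic `ServiceUnit` / `RoleUnit` / `DeploymentUnit` structure; those class names, the `SINGLETON_ROLES` set, and `cardinality_violations` are hypothetical, and only the role names and the `vmId` and `health` attributes come from the text.

```python
from dataclasses import dataclass, field
from typing import List

# Roles that, per the description, have exactly one deployment unit.
SINGLETON_ROLES = {"NameNode", "SecondaryNameNode", "JobTracker", "HMaster"}


@dataclass
class DeploymentUnit:
    vmId: str     # virtual machine on which the software module runs
    health: str   # running health of the deployment unit


@dataclass
class RoleUnit:
    name: str
    units: List[DeploymentUnit] = field(default_factory=list)


@dataclass
class ServiceUnit:
    name: str
    roles: List[RoleUnit] = field(default_factory=list)


def cardinality_violations(service: ServiceUnit) -> List[str]:
    """Report singleton roles that do not have exactly one deployment unit."""
    problems: List[str] = []
    for role in service.roles:
        count = len(role.units)
        if role.name in SINGLETON_ROLES and count != 1:
            problems.append(
                f"{role.name}: expected exactly 1 deployment unit, found {count}"
            )
    return problems
```

Roles such as DataNode, TaskTracker, and HRegionServer are deliberately left unchecked here, since the text only says that they may have multiple deployment units.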

S2: Implement model conversion between the Hadoop requirement model and the Hadoop deployment model according to preset conversion rules. The preset conversion rules comprise a node conversion model and a cluster service conversion model: the node conversion model implements model conversion between the nodes of the Hadoop requirement model and the nodes of the Hadoop deployment model, and the cluster service conversion model implements model conversion between the cluster services of the two models.

In one embodiment of the present invention, the node conversion model implements model conversion through an element mapping relationship between the nodes of the Hadoop requirement model and the nodes of the Hadoop deployment model. The element mapping relationship comprises helper tags and mapper tags: a helper tag describes the mapping between elements of two classes, and a mapper tag describes the mapping between attributes of two classes.

Specifically, the cluster node part of the deployment model differs considerably between application scenarios. In CloudStack, for example, the main management units are VM, Flavor, and Image, which represent virtual machines, configurations, and images, whereas in Docker they are Container, Repository, and Image, which represent containers, repositories, and images. An element mapping relationship between the node parts of the requirement model and the deployment model is therefore needed to implement model conversion.

The embodiments of the present invention define a set of description rules for mapping relationships and a conversion method for model operations, so that the requirement model is converted to the deployment model automatically from the given element mappings between the models. The element mappings between models are described in an XML file, and the keywords of the description rules are defined as follows:

(1) helper describes the mapping between elements of two classes. A helper tag has two attributes, value and key: value names the element in the requirement model that is to be mapped, and key names the corresponding element in the deployment model.

(2) mapper describes the mapping between attributes of two classes. A mapper tag also has key and value attributes: key is the name of the target attribute of the deployment model element, and value is the name of the corresponding attribute of the requirement model element. The elements to which these attributes belong are defined by the enclosing helper tag.

Based on these keywords, the mapping between elements of the requirement model and the deployment model proposed by the invention can be described according to the mapping rules. As shown in Fig. 4, the NodeType element in the requirement model maps to the VMType element in the deployment model. This mapping is described by a helper tag whose key and value attributes are the names of the corresponding elements in the deployment model and the requirement model, namely VMType and NodeType. Within it, the id of NodeType corresponds to the id of VMType, name to name, ram to memoryMb, disk to imageSpaceGb, and vcpus to guestCpus.
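The NodeType to VMType mapping just described might be written and applied as in the Python sketch below. The `helper` and `mapper` tags and their `key` and `value` attributes follow the text; the `<mapping>` root element, the exact file layout, and the `load_rules` and `convert` functions are assumptions, since the actual XML file is not reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Tuple

# Hypothetical mapping file; the <mapping> root element is an assumption.
MAPPING_XML = """
<mapping>
  <helper key="VMType" value="NodeType">
    <mapper key="id" value="id"/>
    <mapper key="name" value="name"/>
    <mapper key="memoryMb" value="ram"/>
    <mapper key="imageSpaceGb" value="disk"/>
    <mapper key="guestCpus" value="vcpus"/>
  </helper>
</mapping>
"""

# requirement element -> (deployment element, {requirement attr: deployment attr})
Rules = Dict[str, Tuple[str, Dict[str, str]]]


def load_rules(xml_text: str) -> Rules:
    """Parse helper/mapper tags into a lookup keyed by requirement element."""
    rules: Rules = {}
    root = ET.fromstring(xml_text.strip())
    for helper in root.iter("helper"):
        attr_map = {m.get("value"): m.get("key") for m in helper.findall("mapper")}
        rules[helper.get("value")] = (helper.get("key"), attr_map)
    return rules


def convert(element: str, attributes: Dict[str, Any],
            rules: Rules) -> Tuple[str, Dict[str, Any]]:
    """Rename a requirement element and its mapped attributes."""
    target, attr_map = rules[element]
    converted = {attr_map[k]: v for k, v in attributes.items() if k in attr_map}
    return target, converted
```

Attributes without a `mapper` entry are dropped in this sketch; whether the method described here drops, copies, or rejects them is not stated.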

The system is managed through model operations, of which there are five basic types: Get, Set, List, Add, and Remove. Any model operation on a requirement model element is converted into a model operation on a deployment model element. As shown in Table 1, the invention defines conversion rules for model operations so that they are converted automatically. For example, if element A in the requirement model maps to element B in the deployment model, then List, Add, and Remove operations on A are mapped to the same operations on the corresponding element B, and Get or Set operations on an attribute of A are mapped to the same operations on the corresponding attribute of B.

Table 1. Mapping rules for model operation conversion
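The conversion of model operations can be illustrated with a short Python sketch. It is based only on the prose rule (element-level List, Add, and Remove keep their kind and change element; attribute-level Get and Set also change attribute); the `ModelOp` type, the `RULES` table, and `convert_op` are hypothetical names, and the sketch does not reproduce the contents of Table 1.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

ELEMENT_OPS = {"List", "Add", "Remove"}   # act on an element
ATTRIBUTE_OPS = {"Get", "Set"}            # act on an attribute of an element


@dataclass(frozen=True)
class ModelOp:
    kind: str
    element: str
    attribute: Optional[str] = None


# requirement element -> (deployment element, {requirement attr: deployment attr})
RULES: Dict[str, Tuple[str, Dict[str, str]]] = {
    "NodeType": ("VMType", {
        "id": "id", "name": "name", "ram": "memoryMb",
        "disk": "imageSpaceGb", "vcpus": "guestCpus",
    }),
}


def convert_op(op: ModelOp,
               rules: Dict[str, Tuple[str, Dict[str, str]]]) -> ModelOp:
    """Convert an operation on the requirement model to the deployment model."""
    target, attr_map = rules[op.element]
    if op.kind in ELEMENT_OPS:
        return ModelOp(op.kind, target)
    if op.kind in ATTRIBUTE_OPS:
        if op.attribute is None:
            raise ValueError(f"{op.kind} needs an attribute")
        return ModelOp(op.kind, target, attr_map[op.attribute])
    raise ValueError(f"unknown operation kind: {op.kind}")
```

The operation kind never changes in this sketch; only the element name and, for Get and Set, the attribute name are rewritten.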

In one embodiment of the present invention, the cluster service conversion model converts cluster services automatically through a constraint model and a preset conversion algorithm. The constraint model defines the association relationships between multiple model elements, and the preset conversion algorithm generates a service deployment plan from the Hadoop requirement model and the constraint model.

Specifically, the Hadoop ecosystem contains many different types of computing and storage frameworks, such as HDFS, MapReduce, HBase, Yarn, and Spark. Each has its own deployment and configuration method, and dependency or constraint relationships may exist between frameworks; for example, deploying the MapReduce service depends on the HDFS service. The model conversion method therefore differs considerably between Hadoop cluster services. The invention describes the deployment rules of cluster services through a constraint model and converts cluster services automatically through a general algorithm.

The constraint model describes the deployment rules of a Hadoop service. As shown in Figure 5, the metamodel of the constraint model contains several main model elements (service units, role units, and deployment units) and describes the relationships among them. Service denotes a service unit; its attributes name and minNodeNum give the service name and the minimum number of deployment nodes. Role denotes a role unit; its attributes name, excluName, resPriority, and deOrder give the role name, the exclusion attribute, the resource priority, and the deployment order. DU denotes a deployment unit; its attributes include health, which gives the health status of the service. Dependency_S denotes a dependency between service units; its name attribute gives the name of the service unit depended on. Similarly, Dependency_DU denotes a dependency between deployment units, and its name attribute gives the name of the deployment unit depended on. A dependency between service units does not necessarily imply a dependency between deployment units; a dependency between deployment units, however, always implies a dependency between the corresponding service units.
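The metamodel can be written down as a few Python dataclasses, shown here as a sketch only. The attribute names (name, minNodeNum, excluName, resPriority, deOrder, health) are taken from the description; their types, the `depends_on` lists standing in for Dependency_S and Dependency_DU, the one-DU-per-role simplification, and the `violations` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DU:
    name: str
    health: str = "unknown"
    depends_on: List[str] = field(default_factory=list)   # Dependency_DU names


@dataclass
class Role:
    name: str
    du: DU                   # simplification: one deployment-unit type per role
    excluName: str = ""      # role that must not share a node with this one
    resPriority: int = 0
    deOrder: int = 0


@dataclass
class Service:
    name: str
    minNodeNum: int
    roles: List[Role]
    depends_on: List[str] = field(default_factory=list)   # Dependency_S names


def violations(services: List[Service]) -> List[str]:
    """List DU-level dependencies that lack a matching service-level dependency."""
    owner = {role.du.name: svc.name for svc in services for role in svc.roles}
    problems = []
    for svc in services:
        for role in svc.roles:
            for dep in role.du.depends_on:
                dep_owner = owner.get(dep)
                if dep_owner is None:
                    problems.append(f"{role.du.name} -> {dep} (unknown deployment unit)")
                elif dep_owner != svc.name and dep_owner not in svc.depends_on:
                    problems.append(f"{role.du.name} -> {dep}")
    return problems
```

The `violations` check encodes the last rule of the paragraph: a deployment unit may depend on a unit of another service only if its own service declares a dependency on that service.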

As shown in Figure 6, the HDFS, MapReduce, and HBase services serve as examples. The HDFS service unit contains three role units: NameNode, SecondaryNameNode, and DataNode. The NameNode role unit has exactly one deployment unit, DU_NN, and the SecondaryNameNode role unit has exactly one deployment unit, DU_SNN; NameNode and SecondaryNameNode cannot be deployed on the same node. The DataNode role unit may have multiple deployment units DU_DN, and a DataNode may share a node with the NameNode or the SecondaryNameNode. The HDFS service unit does not depend on any other service unit, and the deployment units DU_NN, DU_SNN, and DU_DN do not depend on any other deployment unit.

The MapReduce service unit contains two role units: JobTracker and TaskTracker. The JobTracker role unit has exactly one deployment unit, DU_JT. The TaskTracker role unit may have multiple deployment units DU_TT, and JobTracker and TaskTracker cannot be deployed on the same node. The MapReduce service unit depends on the HDFS service unit; deployment unit DU_JT depends on deployment unit DU_NN, and deployment unit DU_TT depends on deployment unit DU_DN.

The HBase service unit contains two role units: HMaster and HRegionServer. The HMaster role unit has exactly one deployment unit, DU_HM. The HRegionServer role unit may have multiple deployment units DU_RS, and HMaster and HRegionServer cannot be deployed on the same node. The HBase service unit depends on the HDFS service unit; deployment unit DU_HM depends on deployment unit DU_NN, and deployment unit DU_RS depends on deployment unit DU_DN.
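The rules for the three services can be transcribed into data and checked mechanically. The following is a hedged sketch: the `CONSTRAINTS` layout and the `check_placement` function are hypothetical, and the checker covers only the cardinality ("exactly one deployment unit") and exclusion ("cannot share a node") rules, not the dependency entries, which are included for completeness.

```python
# Constraints transcribed from the description; the dictionary layout is an assumption.
# Exclusion is symmetric, so each excluded pair is listed on one role only.
CONSTRAINTS = {
    "HDFS": {
        "depends_on": [],
        "roles": {
            "NameNode": {"du": "DU_NN", "single": True, "excludes": ["SecondaryNameNode"], "needs": None},
            "SecondaryNameNode": {"du": "DU_SNN", "single": True, "excludes": [], "needs": None},
            "DataNode": {"du": "DU_DN", "single": False, "excludes": [], "needs": None},
        },
    },
    "MapReduce": {
        "depends_on": ["HDFS"],
        "roles": {
            "JobTracker": {"du": "DU_JT", "single": True, "excludes": ["TaskTracker"], "needs": "DU_NN"},
            "TaskTracker": {"du": "DU_TT", "single": False, "excludes": [], "needs": "DU_DN"},
        },
    },
    "HBase": {
        "depends_on": ["HDFS"],
        "roles": {
            "HMaster": {"du": "DU_HM", "single": True, "excludes": ["HRegionServer"], "needs": "DU_NN"},
            "HRegionServer": {"du": "DU_RS", "single": False, "excludes": [], "needs": "DU_DN"},
        },
    },
}


def check_placement(service, placement):
    """Check a placement (DU name -> set of node ids) against one service's role rules."""
    errors = []
    roles = CONSTRAINTS[service]["roles"]
    for role, spec in roles.items():
        nodes = placement.get(spec["du"], set())
        if spec["single"] and len(nodes) != 1:
            errors.append(f"{role} must have exactly one deployment unit")
        for other in spec["excludes"]:
            shared = nodes & placement.get(roles[other]["du"], set())
            if shared:
                errors.append(f"{role} and {other} share node(s) {sorted(shared)}")
    return errors
```

A placement such as DU_NN on node 1, DU_SNN on node 2, and DU_DN on nodes 1 to 3 passes the check, whereas placing a TaskTracker on the JobTracker's node is reported as a violation.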

In one embodiment of the invention, the preset deployment algorithm comprises the following steps: obtaining the set of services to be deployed from the service units under the service list in the Hadoop requirement model; supplementing and sorting the services in that set according to the dependencies between service units in the constraint model, to obtain an ordered set of the services that actually need to be deployed; reading each service from the ordered set in reverse order and computing its deployment plan; and deploying the services in turn according to the node sets of the service deployment units.

Specifically, the embodiment of the invention proposes a general algorithm that automatically generates a service deployment plan from a given requirement model and constraint model. The basic steps of the algorithm are as follows:

1. From the service units under the service list services in the requirement model, obtain the set of services to be deployed, services{S1,S2,S3,...,Si}.

2. According to the dependencies between service units in the constraint model, supplement and sort the services in the set services to obtain the ordered set of services that actually need to be deployed, servicesOrder{S1,S2,S3,...,Sj}. If a service in the set depends on another service that is not in the set, that service is added. In the ordered set servicesOrder, service S1 depends on no other service, service S2 may depend on S1, service S3 may depend on S1 and S2, and so on.

3. Read each service from the ordered set servicesOrder in reverse order and compute its deployment plan. First, obtain the list of nodes on which the service is to be deployed from the information in the service's software deployment module (the Agent) in the requirement model. Second, obtain from the constraint model the role types that the service contains and their deployment constraints. Next, from the node information and the role deployment constraints, compute the nodes on which the deployment units of each role are placed. Finally, according to the dependencies of each role's deployment units in the constraint model, record the deployment node information for the deployment units depended on. The result is the node set of the deployment units of every service in servicesOrder, namely servicesOrderDeploy{S1{DU1{},DU2{},...},S2{DU1{},DU2{},...},...,Sn{DU1{},DU2{},...}}. For example, for a deployment plan containing the HDFS and MapReduce services, the node set of the deployment units is servicesOrderDeploy{HDFSService{DU_NN{1},DU_SNN{2},DU_DN{1,2,3}},MapReduceService{DU_JT{1},DU_TT{2,3}}}.

4. Deploy the services in turn according to the node set servicesOrderDeploy of the service deployment units.
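Steps 1 and 2 amount to completing the requested service set with its transitive dependencies and ordering it so that every service follows the services it depends on. A minimal sketch using a depth-first traversal is given below; the function name `order_services` and the dictionary representation of the Dependency_S relations are assumptions, and the description does not state which sorting technique is actually used.

```python
def order_services(requested, depends_on):
    """Return the requested services plus their transitive dependencies, dependencies first.

    requested  -- service names from the requirement model's service list
    depends_on -- service name -> list of service names it depends on
                  (a stand-in for the Dependency_S relations of the constraint model)
    """
    ordered = []        # plays the role of servicesOrder
    visiting = set()    # services on the current traversal path, for cycle detection

    def visit(name):
        if name in ordered:
            return
        if name in visiting:
            raise ValueError(f"cyclic dependency involving {name}")
        visiting.add(name)
        for dep in depends_on.get(name, []):
            visit(dep)              # missing dependencies are added here
        visiting.discard(name)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered


if __name__ == "__main__":
    deps = {"HDFSService": [], "MapReduceService": ["HDFSService"], "HBaseService": ["HDFSService"]}
    order = order_services(["MapReduceService", "HBaseService"], deps)
    print(order)
    # Step 3 then walks the result in reverse order:
    for service in reversed(order):
        print("plan", service)
```

Traversing the ordered set in reverse in step 3 means that dependent services are planned first, so that by the time a service such as HDFS is planned, the nodes required by the deployment units that depend on it have already been recorded.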

S3: Use a synchronization engine to monitor information changes in the Hadoop requirement model and the Hadoop deployment model, and synchronize the information whenever the information in the Hadoop requirement model and/or the Hadoop deployment model changes.

In one embodiment of the invention, the Hadoop deployment model is constructed with the SM@RT tool.

Specifically, a runtime model represents the overall architecture of the system as a set of manageable units. It makes explicit the runtime information hidden inside the system, such as structure, state, and configuration, and presents it as a standard, structured view from the administrator's perspective.

The invention constructs the Hadoop deployment model with the SM@RT tool. The tool takes as input the system's metamodel, which describes the information of the managed system, and its access model, which describes how the management interfaces are invoked. From these inputs, SM@RT automatically generates the runtime architecture of the target system and guarantees bidirectional consistency between that runtime model and the system.

As shown in Figure 7, the invention provides a set of operations for deploying Hadoop cluster services. The figure summarizes the operations available on each type of model element and classifies them; for each operation, the invention specifies its name, the parameters it requires, and the change it produces.

SM@RT supports bidirectional synchronization between the Hadoop deployment model and the running system. As shown in Figure 8, when the synchronization engine detects that a deployment unit has been added to a role unit of some service in the Hadoop deployment model, it automatically adds a virtual machine to the running system and deploys the unit on it. Conversely, when an administrator deletes that virtual machine in the running system, the synchronization engine detects the change and calls the management interface to terminate the corresponding deployment unit.
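The two directions of synchronization can be pictured as a comparison of three snapshots: the deployment model, the running system, and the state recorded at the last synchronization. The sketch below is purely illustrative and is not the SM@RT API; the function `reconcile`, the action names, and the assumption that each deployment unit is hosted on one virtual machine identified by the same id are all hypothetical.

```python
def reconcile(model_dus, running_vms, last_synced):
    """Derive synchronization actions from changes made since the last sync.

    model_dus   -- ids of deployment units currently in the deployment model
    running_vms -- ids of deployment units whose virtual machine exists in the running system
    last_synced -- ids that were present on both sides at the last synchronization
    """
    actions = []
    # Model -> system: a unit added to the model needs a virtual machine.
    for du in sorted(model_dus - last_synced):
        if du not in running_vms:
            actions.append(("create_vm", du))
    # System -> model: a virtual machine deleted by the administrator
    # means the corresponding deployment unit has to be terminated.
    for du in sorted(last_synced - running_vms):
        if du in model_dus:
            actions.append(("terminate_du", du))
    return actions
```

Keeping the last synchronized state is what lets the comparison tell "added in the model" apart from "removed from the system"; a plain two-way diff could not distinguish the two cases.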

To help those skilled in the art further understand the invention, it is described in detail through the following example.

To verify the feasibility and effectiveness of the proposed method, an instance of automatic Hadoop deployment and configuration was implemented. It provides a user-customized Hadoop service solution that includes deploying the MapReduce service and the HBase service.

The experiment deploys a Hadoop cluster on CloudStack, using five CloudStack virtual machines to deploy the MapReduce and HBase services. The MapReduce service is deployed on virtual machines Host-1, Host-2, and Host-3, and the HBase service on virtual machines Host-1, Host-4, and Host-5. The virtual machines are allocated different CPU, memory, and storage resources; Table 2 gives the configuration details and the deployment layout.

Table 2. Node deployment

The user only needs to define the requirement model of the Hadoop cluster; the proposed method then converts the requirement model into the deployment model automatically and finally completes the cluster deployment.

As shown in Figure 9, the requirement model consists of a cluster node part and a cluster service part.

In the cluster node part, NodeTypes is the list of node types. It contains three node types: "CPU: 4, Memory: 8, Storage: 400", "CPU: 2, Memory: 8, Storage: 400", and "CPU: 2, Memory: 4, Storage: 800". Nodes is the list of nodes and contains five nodes: the Node with id 1 has the NodeType with id nt1, the Nodes with ids 2 and 3 have the NodeType with id nt2, and the Nodes with ids 4 and 5 have the NodeType with id nt3.

In the cluster service part, the MapReduce and HBase services are each deployed on their three corresponding nodes.

The conversion from the requirement model to the deployment model has two parts: the model conversion of cluster nodes and the model conversion of cluster services.

In the cluster node part, the NodeType and Node elements of the requirement model are mapped to the VMType and VM elements of the deployment model, respectively. The deployment model therefore also contains three virtual machine types (VMType) and five virtual machines (VM), as shown in Figure 10.
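The node conversion for this experiment can be sketched in a few lines of Python. The node type ids, resource figures, and node-to-type assignments are those of the requirement model described above; the dictionary layout, the `vmType` key, and the function `to_deployment` are hypothetical, and the sketch assumes attributes are copied unchanged.

```python
# Requirement-model data from the experiment (units are not stated in the description).
NODE_TYPES = {
    "nt1": {"CPU": 4, "Memory": 8, "Storage": 400},
    "nt2": {"CPU": 2, "Memory": 8, "Storage": 400},
    "nt3": {"CPU": 2, "Memory": 4, "Storage": 800},
}
NODES = {1: "nt1", 2: "nt2", 3: "nt2", 4: "nt3", 5: "nt3"}   # node id -> node type id


def to_deployment(node_types, nodes):
    """Map NodeType -> VMType and Node -> VM, copying attributes unchanged (an assumption)."""
    vm_types = {type_id: dict(spec) for type_id, spec in node_types.items()}
    vms = [{"id": node_id, "vmType": type_id} for node_id, type_id in sorted(nodes.items())]
    return {"VMTypes": vm_types, "VMs": vms}


if __name__ == "__main__":
    model = to_deployment(NODE_TYPES, NODES)
    print(len(model["VMTypes"]), "VM types,", len(model["VMs"]), "VMs")
```

Because the mapping is one-to-one, the counts in the deployment model match those in the requirement model: three types and five machines.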

In the cluster service part, the model conversion proceeds as follows, according to the constraint model and the general algorithm proposed by the invention:

1. From the service units under the service list services in the requirement model, obtain the set of services to be deployed, services{MapReduceService,HBaseService}.

2. According to the constraint model, MapReduceService and HBaseService do not depend on each other, but both depend on HDFSService. HDFSService is not in the service set, so it is added to services and the set is sorted for deployment, giving the ordered set of services that actually need to be deployed, servicesOrder{HDFSService,MapReduceService,HBaseService}.

3. Read HBaseService, MapReduceService, and HDFSService from the ordered set servicesOrder in reverse order and compute the deployment plan of each service. First, from the information in the software deployment module (the Agent) of HBaseService in the requirement model, obtain the list of nodes on which the HBase service is deployed: nodes 1, 4, and 5. Second, the constraint model shows that HBaseService contains two role units, HMaster and HRegionServer; HMaster has exactly one deployment unit DU_HM, HRegionServer may have multiple deployment units DU_RS, and HMaster and HRegionServer cannot be deployed on the same node. In addition, deployment unit DU_HM depends on deployment unit DU_NN of the HDFS service, and deployment unit DU_RS depends on deployment unit DU_DN of the HDFS service. Next, the calculation based on the node information and the role deployment constraints shows that HMaster and NameNode have the highest resource priority and deployment order, so these two role units are placed on node 1, which has the most resources, while the role units HRegionServer and DataNode are placed on nodes 4 and 5. Similarly, when planning MapReduceService, the requirement model and the constraint model yield the role units JobTracker and NameNode on node 1 and the role units TaskTracker and DataNode on nodes 2 and 3. When planning HDFSService, the constraint model shows that HDFSService depends on no other service and that NameNode and SecondaryNameNode cannot be deployed on the same node; combined with the deployment node information of HBaseService and MapReduceService, this gives NameNode on node 1, SecondaryNameNode on node 2, and DataNode on nodes 1 to 5. Finally, according to the dependencies of the deployment units in the constraint model, record the deployment node information of the deployment units depended on, which gives the node set of the deployment units of every service in servicesOrder: servicesOrderDeploy{HDFSService{DU_NN{1},DU_SNN{2},DU_DN{1,2,3,4,5}},MapReduceService{DU_JT{1},DU_TT{2,3}},HBaseService{DU_HM{1},DU_RS{4,5}}}.
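The worked example can be reproduced with a small Python sketch. It is a simplification under stated assumptions, not the patented algorithm itself: the master role of each dependent service is placed on the node with the most CPU and memory, the worker role on the service's remaining nodes, the SecondaryNameNode on the lowest-numbered node other than the NameNode's, and a DataNode is co-located with the NameNode. The function `plan` and the three tables are hypothetical names; the node lists and resource figures come from the experiment.

```python
# (CPU, memory) per node, from the requirement model; larger tuples rank higher.
NODE_RESOURCES = {1: (4, 8), 2: (2, 8), 3: (2, 8), 4: (2, 4), 5: (2, 4)}

# Agent node lists of the two requested services.
AGENT_NODES = {"MapReduceService": [1, 2, 3], "HBaseService": [1, 4, 5]}

# Master and worker deployment units of each dependent service.
# Masters depend on DU_NN, workers depend on DU_DN.
MASTER_WORKER = {
    "MapReduceService": ("DU_JT", "DU_TT"),
    "HBaseService": ("DU_HM", "DU_RS"),
}


def plan(services_order):
    """Compute servicesOrderDeploy by walking servicesOrder in reverse."""
    deploy, nn_nodes, dn_nodes = {}, set(), set()
    for service in reversed(services_order):            # dependents first
        if service in MASTER_WORKER:
            master_du, worker_du = MASTER_WORKER[service]
            nodes = AGENT_NODES[service]
            # Assumption: the master goes to the richest node, ties broken by lowest id.
            master = max(nodes, key=lambda n: (NODE_RESOURCES[n], -n))
            workers = sorted(n for n in nodes if n != master)
            deploy[service] = {master_du: [master], worker_du: workers}
            nn_nodes.add(master)       # record the node required for DU_NN
            dn_nodes.update(workers)   # record the nodes required for DU_DN
        else:                          # HDFSService, planned last
            if len(nn_nodes) != 1:
                raise ValueError("dependents must agree on a single NameNode node")
            name_node = next(iter(nn_nodes))
            # Assumption: lowest-numbered node that does not host the NameNode.
            secondary = min(n for n in NODE_RESOURCES if n != name_node)
            # Assumption: a DataNode is also placed on the NameNode's node.
            data_nodes = sorted(dn_nodes | {name_node})
            deploy[service] = {"DU_NN": [name_node], "DU_SNN": [secondary], "DU_DN": data_nodes}
    return {service: deploy[service] for service in services_order}


if __name__ == "__main__":
    print(plan(["HDFSService", "MapReduceService", "HBaseService"]))
```

Under these assumptions the sketch yields the same placement as the servicesOrderDeploy set given above; with different resource figures or node lists the heuristics would need to be revisited against the actual resPriority and deOrder attributes of the constraint model.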

As shown in Figure 10, the concrete Hadoop deployment model is obtained from the model conversion of the cluster nodes together with the model conversion of the cluster services.

The model-driven Hadoop deployment method in a cloud environment of the invention addresses the difficulties that diversified software and hardware resources and on-demand services in the cloud bring to the deployment of Hadoop clusters. It proposes a model-driven method for deploying Hadoop cluster services: by providing a conversion method from the Hadoop requirement model to the deployment model, and by keeping the deployment model and the running system synchronized in both directions, it offers a fast and scalable way to deploy cluster services for Hadoop. The feasibility and effectiveness of the invention were verified in a practical scenario.

In addition, the other components and functions of the model-driven Hadoop deployment method in a cloud environment according to the embodiments of the invention are known to those skilled in the art and, to avoid redundancy, are not described further.

In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such references do not necessarily denote the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the invention. The scope of the invention is defined by the claims and their equivalents.

Claims

1. A model-driven Hadoop deployment method in a cloud environment, characterized by comprising the following steps:

S1: providing a Hadoop requirement model and a Hadoop deployment model, wherein the Hadoop requirement model is used to generate a corresponding management view according to system requirements, and the Hadoop deployment model is used to describe the node information, running status, and software deployment of the management view;

S2: performing model conversion between the Hadoop requirement model and the Hadoop deployment model according to preset conversion rules, wherein the preset conversion rules comprise a node conversion model and a cluster service conversion model, the node conversion model is used to perform model conversion between the nodes of the Hadoop requirement model and the nodes of the Hadoop deployment model, and the cluster service conversion model is used to perform model conversion between the cluster services of the Hadoop requirement model and the cluster services of the Hadoop deployment model;

S3: using a synchronization engine to monitor information changes in the Hadoop requirement model and the Hadoop deployment model, and synchronizing the information when the information in the Hadoop requirement model and/or the Hadoop deployment model changes.

2. The model-driven Hadoop deployment method in a cloud environment according to claim 1, characterized in that the Hadoop requirement model comprises:

a cluster node module provided with infrastructure resources, the infrastructure resources comprising the resources and attributes corresponding to a node configuration list, a node list, and a container image list;

a cluster service module provided with a service list, the service list comprising multiple services and the attributes of each service.

3. The model-driven Hadoop deployment method in a cloud environment according to claim 2, characterized in that the Hadoop deployment model comprises:

a cluster node unit storing a virtual machine configuration list, a virtual machine list, and a virtual machine image list;

a cluster service unit used to provide cluster services.

4. The model-driven Hadoop deployment method in a cloud environment according to claim 3, characterized in that the node conversion model performs model conversion through element mapping relations between the nodes of the Hadoop requirement model and the nodes of the Hadoop deployment model, the element mapping relations comprising a helper tag and a mapper tag, wherein the helper tag is used to describe the mapping relations of elements between classes, and the mapper tag is used to describe the mapping relations of attributes between classes.

5. The model-driven Hadoop deployment method in a cloud environment according to claim 3, characterized in that the cluster service conversion model converts cluster services automatically by means of a constraint model and a preset conversion algorithm, the constraint model being used to define the relationships among multiple model elements, and the preset conversion algorithm generating a service deployment plan according to the Hadoop requirement model and the constraint model.

6. The model-driven Hadoop deployment method in a cloud environment according to claim 5, characterized in that the preset deployment algorithm comprises the following steps:

obtaining the set of services to be deployed according to the service units under the service list in the Hadoop requirement model;

supplementing and sorting the services in the service set according to the dependencies between service units in the constraint model, to obtain an ordered set of the services that actually need to be deployed;

reading each service in turn from the ordered set of services in reverse order and computing the deployment plan of the service;

deploying the services in turn according to the node sets of the service deployment units.

7. The model-driven Hadoop deployment method in a cloud environment according to claim 1, characterized in that the Hadoop deployment model is constructed with the SM@RT tool.