WO2019184480A1 - Item recommendation

Info

Publication number: WO2019184480A1
Application number: PCT/CN2018/123411
Authority: WO
Inventors: 陈超超; 周俊
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-03-27
Filing date: 2018-12-25
Publication date: 2019-10-03
Anticipated expiration: 2020-09-27
Also published as: CN108647985B; TW201942834A; CN108647985A

Abstract

A method and a device for predicting a rating of an item by a user, and a method and a device for item recommendation. The method for predicting a rating comprises: acquiring a plurality of sample pairs, a sample pair comprising any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers (S21); acquiring a plurality of existing ratings, the plurality of existing ratings corresponding to some of the plurality of sample pairs (S22); acquiring multiple sets of contextual features respectively corresponding to the sample pairs (S23); on the basis of the multiple sets of contextual features, clustering the plurality of sample pairs into a plurality of sub-categories (S24); and, with regard to each sub-category, on the basis of the plurality of first user identifiers and the plurality of first item identifiers, as well as the plurality of existing ratings of the plurality of first items by the plurality of first users, predicting, by means of a collaborative filtering algorithm, the ratings of the first items not rated by the first users (S25).

Description

Item recommendation

Cross-reference to related applications

This patent application claims priority to Chinese Patent Application No. 201810257617.X, filed on March 27, 2018 and entitled "Item recommendation method and apparatus", the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of this specification relate to the field of data processing and, more particularly, to a method and apparatus for predicting a user's rating of an item, and to an item recommendation method and apparatus.

Background

Recommendation is a frequently used function on the Internet. Existing recommendation systems generally make recommendations based on users' existing ratings of items. However, such systems hold a wide variety of information besides the ratings. Taking movie recommendation as an example, in addition to users' ratings of movies there are many potential contextual features, such as the time of the rating (whether it falls on a holiday; morning, noon, or evening), the age of the user (adolescent, middle-aged, or elderly), and the genre of the movie (such as romance, action, or horror). A more effective recommendation scheme is therefore needed, one that uses these contextual features in addition to the explicit rating information.

Summary of the invention

Embodiments of this specification aim to provide a more effective item recommendation scheme that addresses the deficiencies of the prior art.

To achieve the above object, one aspect of this specification provides a method for predicting a user's rating of an item, comprising: acquiring a plurality of sample pairs, each sample pair comprising any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; acquiring a plurality of existing ratings, the plurality of existing ratings corresponding to some of the plurality of sample pairs; acquiring multiple sets of contextual features respectively corresponding to the sample pairs, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features; clustering, based on the multiple sets of contextual features, the plurality of sample pairs into a plurality of sub-categories, wherein each sub-category includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair includes a first user identifier and a first item identifier, the first user identifier is an identifier of a first user, and the first item identifier is an identifier of a first item; and, for each sub-category, predicting, by means of a collaborative filtering algorithm, each first user's rating of the first items that the user has not rated, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users.

In one embodiment of the method for predicting a user's rating of an item, a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features.

In one embodiment of the method for predicting a user's rating of an item, the user features include user attribute features and/or user rating statistical features, and the item features include item attribute features and/or item rating statistical features.

In one embodiment of the method for predicting a user's rating of an item, the clustering algorithm is a k-means algorithm or a GMM algorithm.

In one embodiment of the method for predicting a user's rating of an item, clustering the plurality of sample pairs into a plurality of sub-categories based on the multiple sets of contextual features comprises: randomly selecting a predetermined number of initial centroids from among the plurality of sample pairs; calculating, based on the contextual features, the distance from each non-centroid sample pair to each centroid; assigning, according to the distances, each non-centroid sample pair to its nearest centroid; calculating the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs; determining whether the new centroids satisfy a predetermined condition; and, if the predetermined condition is satisfied, outputting the clustering result for the plurality of sample pairs.

In one embodiment of the method for predicting a user's rating of an item, the collaborative filtering algorithm is a matrix factorization algorithm or a kNN algorithm.

In one embodiment of the method for predicting a user's rating of an item, predicting, by means of a collaborative filtering algorithm, each first user's rating of the first items that the user has not rated comprises: for each sub-category, obtaining a user-item rating matrix based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users; factorizing the user-item rating matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is closest to the user-item rating matrix; and predicting, from the matrix obtained by multiplying the two low-dimensional matrices, each first user's rating of the first items in the user-item rating matrix that the user has not rated.

In one embodiment of the method for predicting a user's rating of an item, the existing ratings are ratings given directly by users or ratings derived from user operations.

Another aspect of this specification provides an item recommendation method, comprising: acquiring a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the user identifier of a user to whom items are to be recommended, and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; determining, among the plurality of sub-categories obtained by the above method for predicting ratings, the sub-category in which each second sample pair is located; obtaining, from the ratings predicted by the above method for predicting ratings, the predicted rating of each second sample pair in the sub-category to which it belongs; sorting, according to the predicted ratings, the second item identifiers included in the second sample pairs; and recommending the second items to the second user according to the sorting.

Another aspect of this specification provides an apparatus for predicting a user's rating of an item, comprising: a sample pair acquisition unit configured to acquire a plurality of sample pairs, each sample pair comprising any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; a rating acquisition unit configured to acquire a plurality of existing ratings, the plurality of existing ratings corresponding to some of the plurality of sample pairs; a contextual feature acquisition unit configured to acquire multiple sets of contextual features respectively corresponding to the sample pairs, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features; a clustering unit configured to cluster, based on the multiple sets of contextual features, the plurality of sample pairs into a plurality of sub-categories, wherein each sub-category includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair includes a first user identifier and a first item identifier, the first user identifier is an identifier of a first user, and the first item identifier is an identifier of a first item; and a rating prediction unit configured to predict, for each sub-category and by means of a collaborative filtering algorithm, each first user's rating of the first items that the user has not rated, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users.

In one embodiment of the apparatus for predicting a user's rating of an item, the clustering unit includes: a selection unit configured to randomly select a predetermined number of initial centroids from among the plurality of sample pairs; a first calculation unit configured to calculate, based on the contextual features, the distance from each non-centroid sample pair to each centroid; a classification unit configured to assign, according to the distances, each non-centroid sample pair to its nearest centroid; a second calculation unit configured to calculate the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs; a judgment unit configured to determine whether the new centroids satisfy a predetermined condition; and an output unit configured to output the clustering result for the plurality of sample pairs if the predetermined condition is satisfied.

In one embodiment of the apparatus for predicting a user's rating of an item, the rating prediction unit includes: an obtaining unit configured to obtain, for each sub-category, a user-item rating matrix based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users; a factorization unit configured to factorize the user-item rating matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is closest to the user-item rating matrix; and a prediction unit configured to predict, from the matrix obtained by multiplying the two low-dimensional matrices, each first user's rating of the first items in the user-item rating matrix that the user has not rated.

Another aspect of this specification provides an item recommendation apparatus, comprising: a sample pair acquisition unit configured to acquire a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the user identifier of a user to whom items are to be recommended, and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; a determination unit configured to determine, among the plurality of sub-categories obtained by the above method for predicting ratings, the sub-category in which each second sample pair is located; a predicted rating acquisition unit configured to obtain, from the ratings predicted by the method for predicting ratings, the predicted rating of each second sample pair in the sub-category to which it belongs; a sorting unit configured to sort, according to the predicted ratings, the second item identifiers included in the second sample pairs; and a recommendation unit configured to recommend the second items to the second user according to the sorting.

In the item recommendation method according to the embodiments of this specification, user-item pairs are clustered using the contextual features of the user-item pairs, so that the ratings within each sub-category are less noisy and more strongly correlated; applying a collaborative filtering method within each sub-category can therefore achieve better recommendation performance.

Brief description of the drawings

The embodiments of this specification can be made clearer by describing them with reference to the accompanying drawings:

FIG. 1 shows a schematic diagram of a system 100 according to an embodiment of this specification;

FIG. 2 schematically shows a flowchart of a method for predicting a user's rating of an item according to an embodiment of this specification;

FIG. 3 schematically shows multiple sets of contextual features corresponding to user-item pairs;

FIG. 4 shows a flowchart of clustering by the K-means algorithm according to an embodiment of this specification;

FIG. 5 shows a flowchart of a method for predicting ratings by a collaborative filtering algorithm according to an embodiment of this specification;

FIG. 6 schematically shows the process of matrix factorization;

FIG. 7 shows a flowchart of an item recommendation method according to an embodiment of this specification;

FIG. 8 shows an apparatus 800 for predicting a user's rating of an item according to an embodiment of this specification; and

FIG. 9 shows an item recommendation apparatus 900 according to an embodiment of this specification.

Detailed description

Embodiments of this specification are described below with reference to the accompanying drawings.

FIG. 1 shows a schematic diagram of a system 100 according to an embodiment of this specification. As shown in FIG. 1, the system 100 includes a clustering module 11, a rating prediction module 12, and a recommendation module 13. First, a plurality of user-item pairs and their corresponding sets of contextual features are input to the clustering module 11. The clustering module 11 clusters the feature vectors formed from each set of contextual features and thereby clusters the user-item pairs; that is, each user-item pair is assigned to a corresponding sub-category. The clustering module 11 then sends the sub-categories obtained by clustering to the rating prediction module 12, together with the existing ratings of items by users included in each sub-category. Within each sub-category, the rating prediction module 12 uses the existing ratings to predict, by means of a collaborative filtering algorithm, the missing ratings of items by the users in that sub-category. When the recommendation module 13 makes recommendations to a user, it uses the user identifier and the identifiers of the items to be recommended to determine the sub-category in which each pair of the user and an item to be recommended is located, obtains from the rating prediction module 12 the predicted rating of that pair in that sub-category, and recommends items to the user according to the ranking of the predicted ratings of the items to be recommended.
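The behavior of the recommendation module 13 can be illustrated with a minimal Python sketch. It assumes that the clustering and rating prediction modules have already produced two lookup tables; the names `recommend`, `pair_to_subcategory`, `predicted`, and `top_n`, and the choice to skip pairs that have no predicted rating, are hypothetical and are not taken from this specification.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (user identifier, item identifier)


def recommend(user_id: str,
              candidate_items: List[str],
              pair_to_subcategory: Dict[Pair, int],
              predicted: Dict[int, Dict[Pair, float]],
              top_n: int = 3) -> List[str]:
    """Rank candidate items for one user by the rating predicted in each pair's sub-category."""
    scored = []
    for item_id in candidate_items:
        pair = (user_id, item_id)
        subcategory = pair_to_subcategory.get(pair)
        if subcategory is None:
            continue  # assumption: pairs that were never clustered are skipped
        score = predicted.get(subcategory, {}).get(pair)
        if score is None:
            continue  # assumption: pairs without a predicted rating are skipped
        scored.append((score, item_id))
    # Highest predicted rating first.
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return [item_id for _, item_id in scored[:top_n]]


# Illustrative data: three candidate items spread over two sub-categories.
pair_to_subcategory = {("u1", "v1"): 0, ("u1", "v2"): 1, ("u1", "v3"): 0}
predicted = {
    0: {("u1", "v1"): 3.2, ("u1", "v3"): 4.6},
    1: {("u1", "v2"): 4.1},
}
ranking = recommend("u1", ["v1", "v2", "v3"], pair_to_subcategory, predicted, top_n=2)
print(ranking)
```

Because each pair belongs to exactly one sub-category, the lookup is a two-step dictionary access, and the ranking compares predicted ratings that may come from different sub-categories.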

FIG. 2 schematically shows a flowchart of a method for predicting a user's rating of an item according to an embodiment of this specification, including: in step S21, acquiring a plurality of sample pairs, each sample pair comprising any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; in step S22, acquiring a plurality of existing ratings, the plurality of existing ratings corresponding to some of the plurality of sample pairs; in step S23, acquiring multiple sets of contextual features respectively corresponding to the sample pairs, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features; in step S24, clustering, based on the multiple sets of contextual features, the plurality of sample pairs into a plurality of sub-categories, wherein each sub-category includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair includes a first user identifier and a first item identifier, the first user identifier is an identifier of a first user, and the first item identifier is an identifier of a first item; and in step S25, for each sub-category, predicting, by means of a collaborative filtering algorithm, each first user's rating of the first items that the user has not rated, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users.

First, in step S21, a plurality of sample pairs are acquired, each sample pair comprising any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers. A sample pair is a user-item pair, which can be written as (user identifier, item identifier). The users may be all users of the recommendation system, for example all users of the Douban Movies app or all users of Taobao. The plurality of users need not be all users of the recommendation system; they may also be, for example, the subset of users handled by one unit of the recommendation system. The items may be all items included in the recommendation system, for example the movies in Douban Movies or the goods on Taobao. Likewise, the plurality of items need not be all items in the system; they may be a subset of items within a certain range. The plurality of user-item pairs is obtained by pairing each of the plurality of users with each of the plurality of items.
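The pairing in step S21 is a Cartesian product of the user identifiers and the item identifiers. A minimal Python sketch follows; the identifier values are illustrative only.

```python
from itertools import product

# Illustrative identifiers; a real system would load these from its own data.
user_ids = ["u1", "u2", "u3"]
item_ids = ["v1", "v2"]

# Every user is paired with every item, giving len(user_ids) * len(item_ids) sample pairs.
sample_pairs = list(product(user_ids, item_ids))
print(len(sample_pairs), sample_pairs[0])
```

The number of sample pairs grows as the product of the two counts, which is one practical reason the text allows restricting the users and items to a subset of the system.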

In step S22, a plurality of existing ratings are acquired, the plurality of existing ratings corresponding to some of the plurality of sample pairs. An existing rating may be a rating given directly by a user; for example, in Douban Movies a user rates a movie on a scale of 1 to 5. In another example, the existing rating is obtained indirectly from the user's operations; for example, on Taobao a user's rating of an item can be calculated from operations such as the user clicking on or purchasing the item. In a recommendation system, usually only some users have rated some of the items. In Douban Movies, for example, some users only browse without rating any movie, and some movies are so obscure that no user has rated them. Therefore, only some of the sample pairs have a corresponding existing rating of the item by the user.
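The following Python sketch illustrates how sparse existing ratings might be stored and how a rating could be derived from user operations. The specification does not say how clicks and purchases are converted into a rating, so the function `implicit_rating`, its weights, and the cap at 5 are purely hypothetical.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (user identifier, item identifier)

# Hypothetical weights: the source does not specify how operations map to a rating.
CLICK_WEIGHT = 0.5
PURCHASE_WEIGHT = 2.0


def implicit_rating(clicks: int, purchases: int) -> float:
    """Map operation counts onto a 1-to-5 scale (illustrative formula only)."""
    raw = 1.0 + CLICK_WEIGHT * clicks + PURCHASE_WEIGHT * purchases
    return min(5.0, raw)


# Only some sample pairs have an existing rating; the rest are simply absent.
existing_ratings: Dict[Pair, float] = {
    ("u1", "v1"): 4.0,                     # rating given directly by the user
    ("u2", "v1"): implicit_rating(2, 1),   # derived from 2 clicks and 1 purchase
    ("u3", "v2"): implicit_rating(10, 3),  # capped at the top of the scale
}


def rating_of(pair: Pair) -> Optional[float]:
    """Return the existing rating for a pair, or None when the pair is unrated."""
    return existing_ratings.get(pair)


print(rating_of(("u2", "v1")), rating_of(("u1", "v2")))
```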

In step S23, multiple sets of contextual features respectively corresponding to the sample pairs are acquired, wherein a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features. Different recommendation scenarios have different feature types. For example, in Douban Movies, the contextual features corresponding to a user-item pair can generally be divided into the following types: static user features, such as the user's age group (adolescent, middle-aged, or elderly) and gender; static item features, such as the movie genre (romance, action, horror, and so on); user rating statistical features, such as the mean and variance of the user's ratings; item rating statistical features, such as the mean and variance of the movie's ratings; and interaction features, such as whether the rating was given on a holiday and whether it was given in the morning, at noon, or in the evening. The contextual features can be obtained from user profiles, item attributes, and user-item interaction information.
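As an illustration of the rating statistical features mentioned above, the following Python sketch computes the mean and variance of the existing ratings per user and per item. The function name `rating_statistics` and the use of population variance are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (user identifier, item identifier)


def rating_statistics(ratings: Dict[Pair, float]):
    """Return per-user and per-item (mean, population variance) of the existing ratings."""
    by_user: Dict[str, List[float]] = defaultdict(list)
    by_item: Dict[str, List[float]] = defaultdict(list)
    for (user_id, item_id), rating in ratings.items():
        by_user[user_id].append(rating)
        by_item[item_id].append(rating)
    user_stats = {u: (mean(rs), pvariance(rs)) for u, rs in by_user.items()}
    item_stats = {v: (mean(rs), pvariance(rs)) for v, rs in by_item.items()}
    return user_stats, item_stats


ratings = {("u1", "v1"): 4.0, ("u1", "v2"): 2.0, ("u2", "v1"): 5.0}
user_stats, item_stats = rating_statistics(ratings)
print(user_stats["u1"], item_stats["v1"])
```

These statistics can then be attached to every sample pair that involves the corresponding user or item, alongside the static and interaction features.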

FIG. 3 schematically shows multiple sets of contextual features corresponding to user-item pairs. In the figure, u₁, u₂, u₃, and u₄ are user identifiers, v₁, v₂, v₃, and v₄ are item identifiers, the cell where uᵢ and vⱼ intersect represents one user-item pair, and the numbers in the cells (3, 4, 5, and so on) are the corresponding users' ratings of the items. Behind the cell of each user-item pair is a column of blocks that schematically represents the set of contextual features corresponding to that user-item pair. The set of contextual features includes at least one feature related to the user, the item, and their interaction in that user-item pair.

In step S24, based on the multiple sets of contextual features, the plurality of sample pairs are clustered into a plurality of sub-categories, wherein each sub-category includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair includes a first user identifier and a first item identifier, the first user identifier is an identifier of a first user, and the first item identifier is an identifier of a first item.

A set of contextual features can be represented as a feature vector. The dimension of the feature vector is the number of features in the set, and each component of the vector is the feature value in the corresponding feature dimension. For example, a set of contextual features may include: age, middle-aged; movie genre, romance. By quantizing the values in the age dimension as 1 (adolescent), 2 (middle-aged), and 3 (elderly), and the values in the movie genre dimension as 1 (romance), 2 (action), and 3 (horror), the feature vector corresponding to this set of contextual features is obtained as (2, 1), where the first component is the age dimension and the second component is the movie genre dimension. The feature vector corresponding to a set of contextual features can thus be located as a vector point in the feature space spanned by the feature dimensions. The feature vectors of different user-item pairs may be equal, that is, they may coincide at a single point in the feature space, in which case that point corresponds to multiple user-item pairs.
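The quantization described above can be sketched in Python as follows. The code tables reproduce the example values from the text; the names `AGE_CODES`, `GENRE_CODES`, `encode`, and `pair_context` are hypothetical, and the sample data are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Quantization tables taken from the example in the text.
AGE_CODES = {"adolescent": 1, "middle-aged": 2, "elderly": 3}
GENRE_CODES = {"romance": 1, "action": 2, "horror": 3}


def encode(age_group: str, genre: str) -> Tuple[int, int]:
    """Turn one set of contextual features into a feature vector (age, genre)."""
    return (AGE_CODES[age_group], GENRE_CODES[genre])


# Illustrative contextual features for three user-item pairs.
pair_context = {
    ("u1", "v1"): ("middle-aged", "romance"),
    ("u2", "v3"): ("middle-aged", "romance"),
    ("u3", "v2"): ("elderly", "horror"),
}

# Several pairs may map to the same vector point in the feature space.
points: Dict[Tuple[int, int], List[Tuple[str, str]]] = defaultdict(list)
for pair, (age_group, genre) in pair_context.items():
    points[encode(age_group, genre)].append(pair)

print(dict(points))
```

Integer codes impose an ordering and spacing on categories such as genre; other encodings, for example one-hot vectors, are possible but are not described in this specification.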

After the sets of contextual features have been represented as vector points in the feature space in the manner described above, these vector points can be clustered by various clustering algorithms, such as the K-means algorithm, the GMM (Gaussian mixture model) algorithm, the BIRCH algorithm, or the OPTICS algorithm.

The clustering process according to an embodiment of this specification is described below using K-means as an example. FIG. 4 shows a flowchart of clustering by the K-means algorithm according to an embodiment of this specification. In step S41, a predetermined number of initial centroids are randomly selected from among the plurality of feature vector points. The predetermined number is the value k that must be fixed in advance in the K-means algorithm. In the embodiments of this specification, k can be determined from the estimated number of sub-categories. For example, for Douban Movies, the estimated sub-categories may include (adolescent, romance), (adolescent, action), (adolescent, horror), (middle-aged, romance), (middle-aged, action), (middle-aged, horror), (elderly, romance), (elderly, action), and (elderly, horror), so k can be set to 9. That is, the value of k is related to the number of features and their combinations. Once k has been determined, the k initial centroids are preferably chosen so that they are dispersed.
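The text states only that dispersed initial centroids are preferred; it does not say how to obtain them. One possible approach, shown here as an assumption rather than as the method of this specification, is a greedy farthest-first selection: start from a random point and repeatedly add the point that is farthest from the centroids chosen so far. The names `sq_dist` and `dispersed_init` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dispersed_init(points: Sequence[Vector], k: int, seed: int = 0) -> List[Vector]:
    """Pick k spread-out initial centroids by greedy farthest-first selection.

    Assumes k does not exceed the number of distinct points.
    """
    rng = random.Random(seed)
    distinct = sorted(set(points))  # coincident vector points count once
    centroids = [rng.choice(distinct)]
    while len(centroids) < k:
        # Choose the point whose nearest already-chosen centroid is farthest away.
        farthest = max(distinct, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


points = [(1, 1), (1, 2), (9, 9), (9, 8), (1, 1)]
initial = dispersed_init(points, k=2)
print(initial)
```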

In step S42, based on the feature vector points, the distance from each non-centroid point to each centroid point is calculated. The distance can take various forms; for example, it may be the Euclidean distance, the Minkowski distance, or the Manhattan distance. In step S43, according to the distances, each non-centroid point is assigned to its nearest centroid, yielding k clusters.
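Steps S42 and S43 can be sketched in Python as follows. The Euclidean and Manhattan distances are written as the special cases p = 2 and p = 1 of the Minkowski distance; the function names are hypothetical.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def minkowski(a: Vector, b: Vector, p: float) -> float:
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a: Vector, b: Vector) -> float:
    return minkowski(a, b, 2)


def manhattan(a: Vector, b: Vector) -> float:
    return minkowski(a, b, 1)


def nearest_centroid(point: Vector,
                     centroids: Sequence[Vector],
                     dist: Callable[[Vector, Vector], float] = euclidean) -> int:
    """Step S43: return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


centroids = [(1, 1), (3, 3)]
print(manhattan((2, 1), (3, 3)), nearest_centroid((2, 1), centroids))
```

When a point is equally distant from two centroids, this sketch assigns it to the one with the lower index; the specification does not state a tie-breaking rule.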

In step S44, the same number of new centroids is calculated from the predetermined number of centroids and their corresponding non-centroid points, such that the sum of the distances from all points to the centers of the clusters to which they belong is minimized; that is, as shown in formula (1), each new centroid is the mean vector of all vector points in its cluster.
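Written out, the mean-vector rule of formula (1) takes the following form. The symbols are notation assumed here for illustration and may differ from those in the original figure: C_j is the set of vector points in cluster j, x is a vector point, and μ_j is the new centroid of cluster j.

```latex
\[
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x \tag{1}
\]
```

For the squared Euclidean distance, this mean is the point that minimizes the sum of squared distances to the points of the cluster.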

In step S45, it is determined whether the new centroids satisfy a predetermined condition; for example, the predetermined condition may be that the new centroids are unchanged relative to the previous centroids.

If the predetermined condition is not satisfied, the flow returns to step S42 and steps S42 to S45 are repeated; if it is satisfied, the flow proceeds to step S46. In step S46, the clustering result is output. The clustering result includes a plurality of clusters and the points in each cluster, where each point corresponds to a feature vector, that is, to a user-item pair. In this way, the plurality of user-item pairs are clustered into a plurality of sub-categories based on the contextual features. Each sub-category includes a plurality of first user-item pairs taken from the plurality of user-item pairs, and each first user-item pair includes a first user identifier and a first item identifier, where the first user identifier is the identifier of a first user and the first item identifier is the identifier of a first item.
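Steps S41 to S46 can be put together in a short sketch. It is an illustration under stated assumptions rather than the implementation of the embodiment: the function names, the `max_iterations` safeguard, the fixed random seed, and the use of the squared Euclidean distance are all choices made here, and every point (including the initial centroids) is assigned to a cluster, as in the usual formulation of K-means.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iterations=100, seed=0):
    """Cluster points (tuples of floats) into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # S41: random initial centroids
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # S42-S43: assign every point to its nearest centroid
        labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                  for p in points]
        # S44: each new centroid is the mean of the points in its cluster
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(mean_vector(members) if members else centroids[j])
        # S45: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels                   # S46: clustering result

# Hypothetical 2-D feature vectors, one per user-item pair.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(points, k=2)
print(labels)
```

Each label is the index of the sub-category to which the corresponding user-item pair is assigned; in practice the feature vectors would be built from the contextual features and would usually be normalized first.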

Referring again to FIG. 2, in step S25, for each sub-category, the rating of each first user for the first items that the user has not rated is predicted by a collaborative filtering algorithm, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users.

Various algorithms can be used as the collaborative filtering algorithm here, such as the kNN algorithm or a matrix factorization algorithm. The rating prediction process according to an embodiment of this specification is described below, taking matrix factorization as an example. FIG. 5 shows a flowchart of a method for predicting ratings by a collaborative filtering algorithm according to an embodiment of this specification.

As shown in FIG. 5, first, in step S51, for each sub-category, a user-item rating matrix is obtained based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users. FIG. 6 schematically shows the matrix factorization process. The matrix on the left of FIG. 6 schematically shows a user-item rating matrix, in which u₁, u₂, u₃ and u₄ are user identifiers and v₁, v₂, v₃, v₄ and v₅ are item identifiers. The number in the cell where u_i and v_j intersect is the rating of v_j by u_i, and a "?" indicates that u_i has not rated v_j.
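Building the user-item rating matrix of step S51 for one sub-category can be sketched as below. The function name and the rating values are hypothetical (they are not the values of FIG. 6), and `None` plays the role of the "?" cells.

```python
def build_rating_matrix(user_ids, item_ids, existing_ratings):
    """Rows are users, columns are items; unrated cells are None ("?")."""
    return [[existing_ratings.get((u, v)) for v in item_ids] for u in user_ids]

# Hypothetical identifiers and existing ratings within one sub-category.
user_ids = ["u1", "u2", "u3", "u4"]
item_ids = ["v1", "v2", "v3", "v4", "v5"]
existing_ratings = {
    ("u1", "v1"): 5, ("u1", "v3"): 1,
    ("u2", "v2"): 5,
    ("u3", "v4"): 5,
    ("u4", "v5"): 1,
}

R = build_rating_matrix(user_ids, item_ids, existing_ratings)
print(R[0])  # [5, None, 1, None, None]
```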

In step S52, the user-item rating matrix is decomposed into two low-dimensional matrices such that the product of the two low-dimensional matrices is as close as possible to the user-item rating matrix. Let the user-item rating matrix be R; it can be decomposed into the transpose U^T of a user matrix and an item matrix V, that is, R = U^T V. Making the product of the two low-dimensional matrices as close as possible to the user-item rating matrix means minimizing the difference between that product and the user-item rating matrix. The objective function can therefore be set as the following formula (2):
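The formula itself is not reproduced in this text. One common form consistent with the description above, given here as an assumption rather than as the original formula (2), sums the squared differences over the rated cells only. The notation is introduced for illustration: Ω is the set of index pairs (i, j) for which a rating R_ij exists, u_i is the i-th column of U, and v_j is the j-th column of V.

```latex
\[
\min_{U,\,V} \; \sum_{(i,j) \in \Omega} \left( R_{ij} - u_i^{T} v_j \right)^{2}
\]
```

In practice a regularization term on U and V is often added to such an objective to limit overfitting; whether the original formula includes one cannot be determined from the text.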

U and V can be computed iteratively by, for example, a gradient descent algorithm, so as to obtain the two low-dimensional matrices U and V that minimize the objective function. For example, the two matrices multiplied in the middle of FIG. 6 are the two low-dimensional matrices U^T and V obtained by such a gradient descent algorithm.

In step S53, the rating of each user in the user-item rating matrix for the items that the user has not rated is predicted from the matrix obtained by multiplying the two low-dimensional matrices. For example, as shown in FIG. 6, multiplying U^T by V yields the prediction matrix shown on the right of FIG. 6. Comparing the rating matrix and the prediction matrix in FIG. 6, it can be seen that the ratings in the gray cells of the prediction matrix are equal to (or as close as possible to) the existing ratings in the rating matrix, while the ratings in the white cells of the prediction matrix are the ratings predicted by the matrix factorization algorithm.
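Steps S52 and S53 can be illustrated with a small stochastic gradient descent sketch. Everything specific in it is an assumption made for illustration: the squared-error objective over rated cells without regularization, the rank of 2, the learning rate, the number of epochs, the function names, and the rating values (which are not those of FIG. 6). Here `U[i]` holds the latent vector of user i and `V[j]` the latent vector of item j, so the predicted rating is their dot product.

```python
import random

def factorize(observed, n_users, n_items, rank=2, learning_rate=0.01,
              epochs=5000, seed=0):
    """observed: dict {(user_index, item_index): rating} of rated cells."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (i, j), rating in observed.items():
            error = rating - sum(U[i][f] * V[j][f] for f in range(rank))
            for f in range(rank):
                u_if, v_jf = U[i][f], V[j][f]
                # Gradient step on the squared error of this rated cell.
                U[i][f] += learning_rate * error * v_jf
                V[j][f] += learning_rate * error * u_if
    return U, V

def predict(U, V, i, j):
    return sum(u * v for u, v in zip(U[i], V[j]))

# Hypothetical 4-user x 5-item ratings; None marks an unrated ("?") cell.
R = [
    [5, None, 1, 1, 5],
    [5, 5, 1, None, 5],
    [1, 1, 5, 5, None],
    [None, 1, 5, 5, 1],
]
observed = {(i, j): r for i, row in enumerate(R)
            for j, r in enumerate(row) if r is not None}
U, V = factorize(observed, n_users=4, n_items=5)

# S53: the full prediction matrix, including the previously unrated cells.
predicted = [[round(predict(U, V, i, j), 2) for j in range(5)] for i in range(4)]
max_error = max(abs(predict(U, V, i, j) - r) for (i, j), r in observed.items())
for row in predicted:
    print(row)
```

The cells that were rated should be reproduced closely (`max_error` is the largest deviation on them), and the remaining cells are the predicted ratings.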

FIG. 7 shows a flowchart of an item recommendation method according to an embodiment of this specification. The method includes: in step S71, obtaining a plurality of sample pairs, each sample pair including a user identifier and an item identifier, where the user identifier is the identifier of the user to receive recommendations and the item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; in step S72, determining, among the plurality of sub-categories obtained by the rating prediction method described above, the sub-category to which each sample pair belongs; in step S73, obtaining, from the ratings predicted by the rating prediction method described above, the predicted rating corresponding to each sample pair in the sub-category to which it belongs; in step S74, sorting the item identifiers included in the sample pairs according to the predicted ratings; and, in step S75, recommending items to the user according to the sorting.

First, in step S71, a plurality of sample pairs are obtained, each sample pair including a user identifier and an item identifier, where the user identifier is the identifier of the user to receive recommendations and the item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended. For example, after user u₁ opens the page for movie v₁ in Douban Movies, or after user u₁ opens the purchase page for product v₁ on Taobao, or in similar scenarios, the system starts the item recommendation process. At this point, the system recalls a candidate set of items to recommend to user u₁ based on the user identifier u₁ and the item identifier v₁ of the item the user interacted with. Recall here means a coarse screening of the items to recommend according to predetermined conditions, for example generating a candidate set according to the user's initial preferences, or according to attributes of the items (for example, geographic location when the items are recommended restaurants). The user identifier u₁ is combined with the item identifier v_i of each item in the candidate set, yielding a plurality of sample pairs.

In step S72, the sub-category to which each sample pair belongs is determined among the plurality of sub-categories obtained by the rating prediction method described above. It follows from that method that a sample pair corresponds to one feature vector, that is, to one point in the vector space. Therefore, a sample pair can be assigned to only one sub-category. Thus, using the user identifier and the item identifier in a sample pair, the sample pair can be looked up in the plurality of sub-categories obtained above, thereby determining the sub-category to which it belongs. The sub-category of each of the sample pairs here can be obtained in the same way.

In step S73, the predicted rating corresponding to each sample pair in the sub-category to which it belongs is obtained from the ratings predicted by the rating prediction method described above. As described above with reference to FIG. 5, in each sub-category, the ratings of the users in the sub-category for the items in the sub-category that they have not rated are predicted by the collaborative filtering algorithm. Thus, after the sub-category of a sample pair is determined, the predicted rating corresponding to that sample pair can be obtained from all the predicted ratings associated with the sub-category.

In step S74, the item identifiers included in the sample pairs are sorted according to the predicted ratings. The higher the predicted rating, the greater the user's estimated preference for the item. Items with high predicted ratings can therefore be ranked nearer the top.

In step S75, items are recommended to the user according to the sorting. Based on the sorting, items can be recommended to the user in various ways. For example, only the top-ranked items may be recommended, the top-ranked items may be recommended preferentially, or items may be recommended in sequence (in temporal or spatial order) according to the sorting, and so on.
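Steps S71 to S75 can be summarized in a short sketch. The data structures are assumptions made for illustration: `pair_to_subcategory` maps each clustered (user, item) pair to the index of its sub-category, and `predicted_scores` holds, per sub-category, the ratings predicted by collaborative filtering. The function name, the `top_n` parameter, and the sample values are likewise hypothetical, and recommending only the top-ranked items is just one of the options mentioned above.

```python
def recommend(user_id, candidate_item_ids, pair_to_subcategory,
              predicted_scores, top_n=3):
    # S71: combine the user identifier with each candidate item identifier.
    pairs = [(user_id, item_id) for item_id in candidate_item_ids]
    scored = []
    for pair in pairs:
        # S72: look up the sub-category to which the pair belongs.
        subcategory = pair_to_subcategory.get(pair)
        if subcategory is None:
            continue  # the pair was not among the clustered sample pairs
        # S73: fetch the predicted rating within that sub-category.
        score = predicted_scores[subcategory].get(pair)
        if score is not None:
            scored.append((pair[1], score))
    # S74: sort item identifiers by predicted rating, highest first.
    scored.sort(key=lambda item_score: item_score[1], reverse=True)
    # S75: recommend the top-ranked items.
    return [item_id for item_id, _ in scored[:top_n]]

# Hypothetical clustering and prediction results.
pair_to_subcategory = {("u1", "v2"): 0, ("u1", "v3"): 1, ("u1", "v4"): 0}
predicted_scores = {
    0: {("u1", "v2"): 3.5, ("u1", "v4"): 4.8},
    1: {("u1", "v3"): 4.1},
}

result = recommend("u1", ["v2", "v3", "v4", "v5"],
                   pair_to_subcategory, predicted_scores, top_n=2)
print(result)  # ['v4', 'v3']
```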

FIG. 8 shows an apparatus 800 for predicting a user's rating of an item according to an embodiment of this specification, including: a sample pair obtaining unit 81, configured to obtain a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers; a rating obtaining unit 82, configured to obtain a plurality of existing ratings, the plurality of existing ratings corresponding to some of the plurality of sample pairs; a contextual feature obtaining unit 83, configured to obtain a plurality of sets of contextual features respectively corresponding to the sample pairs, where a set of contextual features includes at least one of the following types of features: user features, item features, and interaction features; a clustering unit 84, configured to cluster the plurality of sample pairs into a plurality of sub-categories based on the plurality of sets of contextual features, where each sub-category includes a plurality of first sample pairs taken from the plurality of sample pairs, each first sample pair includes a first user identifier and a first item identifier, the first user identifier is the identifier of a first user, and the first item identifier is the identifier of a first item; and a rating prediction unit 85, configured to, for each sub-category, predict by a collaborative filtering algorithm the rating of each first user for the first items that the user has not rated, based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users.

In one embodiment, in the apparatus 800 for predicting a user's rating of an item described above, the clustering unit 84 includes: a selecting unit 841, configured to randomly select a predetermined number of initial centroids from the plurality of sample pairs; a first calculating unit 842, configured to calculate, based on the contextual features, the distance from each non-centroid sample pair to each centroid; a categorizing unit 843, configured to assign each non-centroid sample pair to its nearest centroid according to the distance; a second calculating unit 844, configured to calculate the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs; a determining unit 845, configured to determine whether the new centroids satisfy a predetermined condition; and an output unit 846, configured to output the clustering result for the plurality of sample pairs if the predetermined condition is satisfied.

In one embodiment, in the apparatus for predicting a user's rating of an item described above, the rating prediction unit 85 includes: an obtaining unit 851, configured to obtain, for each sub-category, a user-item rating matrix based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing ratings of the plurality of first items by the plurality of first users; a decomposing unit 852, configured to decompose the user-item rating matrix into two low-dimensional matrices such that the product of the two low-dimensional matrices is as close as possible to the user-item rating matrix; and a predicting unit 853, configured to predict, from the matrix obtained by multiplying the two low-dimensional matrices, the rating of each first user in the user-item rating matrix for the first items that the user has not rated.

FIG. 9 shows an item recommendation apparatus 900 according to an embodiment of this specification, including: a sample pair obtaining unit 91, configured to obtain a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, where the second user identifier is the identifier of the user to receive recommendations and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended; a determining unit 92, configured to determine, among the plurality of sub-categories obtained by the rating prediction method described above, the sub-category to which each second sample pair belongs; a predicted rating obtaining unit 93, configured to obtain, from the ratings predicted by the rating prediction method, the predicted rating corresponding to each second sample pair in the sub-category to which it belongs; a sorting unit 94, configured to sort the second item identifiers included in the second sample pairs according to the predicted ratings; and a recommending unit 95, configured to recommend the second items to the second user according to the sorting.

Those of ordinary skill in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of their functions. Whether these functions are executed in hardware or in software depends on the specific application and the design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

A method of predicting a user's rating of an item, including:

Obtaining a plurality of sample pairs, each sample pair comprising any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers;

Obtaining a plurality of existing scores, the plurality of existing scores corresponding to some of the plurality of sample pairs;

Obtaining a plurality of sets of context features respectively corresponding to the respective sample pairs, wherein a set of context features comprises at least one of the following types of features: user features, item features, and interaction features;

Clustering the plurality of sample pairs into a plurality of sub-categories based on the plurality of sets of context features, wherein each sub-category comprises a plurality of first sample pairs taken from the plurality of sample pairs, each of the first sample pairs includes a first user identifier and a first item identifier, the first user identifier is an identifier of a first user, and the first item identifier is an identifier of a first item; and

For each sub-category, predicting, by a collaborative filtering algorithm, the score of each first user for the first items that the first user has not scored, based on a plurality of the first user identifiers, a plurality of the first item identifiers, and a plurality of existing scores of the plurality of first items by the plurality of first users.

The method of predicting a user's rating of an item of claim 1, wherein the user features comprise user attribute features and/or user rating statistical features, and the item features comprise item attribute features and/or item rating statistical features.

The method of predicting a user's rating of an item according to claim 1, wherein the clustering algorithm is a k-means algorithm or a gmm algorithm.

The method of predicting a user's rating of an item according to claim 1, wherein clustering the plurality of sample pairs into a plurality of sub-categories based on the plurality of sets of context features comprises:

Selecting a predetermined number of initial centroids randomly among the plurality of sample pairs;

Calculating a distance from each non-centroid sample pair to each centroid based on the plurality of sets of context features;

Assigning each non-centroid sample pair to its closest centroid according to the distance;

Calculating, based on the plurality of sets of context features, the same number of new centroids from the predetermined number of centroids and their corresponding non-centroid sample pairs;

Determining whether the new centroids meet a predetermined condition; and

Outputting a clustering result for the plurality of sample pairs in the case where the predetermined condition is satisfied.

The method of predicting a user's rating of an item according to claim 1, wherein the collaborative filtering algorithm is a matrix decomposition algorithm or a knn algorithm.

The method of predicting a user's rating of an item according to claim 1, wherein predicting, by the collaborative filtering algorithm, the score of each first user for the first items that the first user has not scored comprises:

Obtaining, for each sub-category, a user-item scoring matrix based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing scores of the plurality of first items by the plurality of first users;

Decomposing the user-item scoring matrix into two low-dimensional matrices such that a product of the two low-dimensional matrices is closest to the user-item scoring matrix; and

Predicting, based on a matrix obtained by multiplying the two low-dimensional matrices, the score of each first user in the user-item scoring matrix for the first items that the first user has not scored.

The method of predicting a user's rating of an item according to claim 1, wherein each existing score is a score given directly by a user or a score derived from a user operation.

An item recommendation method, comprising:

Obtaining a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the user identifier of a user to receive recommendations, and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended;

Determining, in a plurality of subclasses obtained by the method according to any one of claims 1 to 7, a subclass in which each of the second sample pairs is located;

Acquiring a predicted score corresponding to each of the second sample pairs in a subclass to which it belongs, from a score predicted by the method according to any one of claims 1-7;

Sorting the second item identifiers included in the respective second sample pairs according to the predicted score;

Recommending the second items to the second user based on the sorting.

A device for predicting a user's rating of an item, comprising:

a sample pair obtaining unit configured to acquire a plurality of sample pairs, each sample pair including any user identifier selected from a plurality of user identifiers and any item identifier selected from a plurality of item identifiers;

a score obtaining unit configured to acquire a plurality of existing scores, the plurality of existing scores corresponding to some of the plurality of sample pairs;

a context feature obtaining unit configured to acquire a plurality of sets of context features respectively corresponding to the respective sample pairs, wherein a set of context features includes at least one of the following types of features: user features, item features, and interaction features;

a clustering unit configured to cluster the plurality of sample pairs into a plurality of sub-categories based on the plurality of sets of context features, wherein each sub-category comprises a plurality of first sample pairs taken from the plurality of sample pairs, each of the first sample pairs includes a first user identifier and a first item identifier, the first user identifier is an identifier of a first user, and the first item identifier is an identifier of a first item; and

a score prediction unit configured to, for each sub-category, predict, by a collaborative filtering algorithm, the score of each first user for the first items that the first user has not scored, based on a plurality of the first user identifiers, a plurality of the first item identifiers, and a plurality of existing scores of the plurality of first items by the plurality of first users.

The apparatus for predicting a user's rating of an item of claim 9, wherein the user features comprise user attribute features and/or user rating statistical features, and the item features comprise item attribute features and/or item rating statistical features.

The apparatus for predicting a user's rating of an item according to claim 9, wherein the clustering algorithm is a k-means algorithm or a gmm algorithm.

The apparatus for predicting a user's rating of an item according to claim 9, wherein the clustering unit comprises:

a selecting unit configured to randomly select a predetermined number of initial centroids among the plurality of sample pairs;

a first calculating unit configured to calculate, according to the context feature, a distance of each non-centroid sample pair to each centroid;

a categorizing unit configured to assign each non-centroid sample pair to its closest centroid according to the distance;

a second calculating unit configured to calculate the same number of new centroids according to the predetermined number of centroids and their corresponding non-centroid sample pairs;

a determining unit configured to determine whether the new centroids meet a predetermined condition; and

an output unit configured to output a clustering result for the plurality of sample pairs in a case where the predetermined condition is satisfied.

The apparatus for predicting a user's rating of an item according to claim 9, wherein the collaborative filtering algorithm is a matrix decomposition algorithm or a knn algorithm.

The apparatus for predicting a user's rating of an item according to claim 9, wherein the rating prediction unit comprises:

an obtaining unit configured to obtain, for each sub-category, a user-item scoring matrix based on the plurality of first user identifiers, the plurality of first item identifiers, and the plurality of existing scores of the plurality of first items by the plurality of first users;

a decomposition unit configured to decompose the user-item scoring matrix into two low-dimensional matrices such that a product of the two low-dimensional matrices is closest to the user-item scoring matrix; and

a prediction unit configured to predict, according to the matrix obtained by multiplying the two low-dimensional matrices, the score of each first user in the user-item scoring matrix for the first items that the first user has not scored.

The apparatus for predicting a user's rating of an item according to claim 9, wherein each existing score is a score given directly by a user or a score derived from a user operation.

An item recommendation device comprising:

a sample pair obtaining unit configured to acquire a plurality of second sample pairs, each second sample pair including a second user identifier and a second item identifier, wherein the second user identifier is the user identifier of a user to receive recommendations, and the second item identifier is any one of a plurality of item identifiers corresponding to a plurality of items to be recommended;

a determining unit, configured to determine, in a plurality of subclasses obtained by the method according to any one of claims 1-7, a subclass in which each of the second sample pairs is located;

a prediction score acquisition unit configured to acquire, from a score predicted by the method according to any one of claims 1 to 7, a predicted score corresponding to each of the second sample pairs in a subclass to which it belongs;

a sorting unit configured to sort the second item identifiers included in the respective second sample pairs according to the predicted score;

a recommendation unit configured to recommend the second item to the second user according to the sorting.