CN111177670B - Heterogeneous account number association method, device, equipment and storage medium - Google Patents

Heterogeneous account number association method, device, equipment and storage medium

Info

Publication number
CN111177670B
Authority
CN
China
Prior art keywords
account
target
heterogeneous
data set
pair
Legal status
Active
Application number
CN201911302985.2A
Other languages
Chinese (zh)
Other versions
CN111177670A (en)
Inventor
杨帆
王寰东
孙福宁
Current Assignee
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Cloud Computing Beijing Co Ltd
Application filed by Tencent Cloud Computing Beijing Co Ltd
Priority to CN201911302985.2A
Publication of CN111177670A
Application granted
Publication of CN111177670B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/52 Network services specially adapted for the location of the user terminal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a heterogeneous account association method, device, equipment and storage medium. The method comprises: acquiring a first data set and a second data set that are heterogeneous, i.e., that originate from a first system and a second system respectively; converting the first data set and the second data set into a plurality of heterogeneous ternary data pairs according to the location information; determining the time difference between the reporting times in each heterogeneous ternary data pair, and taking the two accounts of each pair whose time difference is less than or equal to a preset error threshold as a target heterogeneous account pair; determining the spatiotemporal mismatch degree of each target heterogeneous account pair based on its location information and reporting times; determining the sum of the spatiotemporal mismatch degrees of the target heterogeneous account pairs that include a first target account; determining, by a chi-square test, the significance with which this sum for the first target account follows a target chi-square distribution; and determining target associated accounts between the first system and the second system based on the significance. This technical solution improves the accuracy of determining associated accounts.

Description

Heterogeneous account number association method, device, equipment and storage medium
Technical Field
The present application relates to the field of internet communication technologies, and in particular to a method, an apparatus, a device, and a storage medium for associating heterogeneous accounts.
Background
In a big data environment, data held by different systems are often correlated, and establishing associations between the data of different systems (i.e., heterogeneous data) can create greater value. For example, a loan system can combine a user's payment behavior in an everyday payment system to make better risk assessments.
At present, associations between user accounts in different systems are often established by combining the spatiotemporal data (location information and reporting time) that users report in the different systems. However, given the differences between the spatiotemporal data recorded by the two systems, the prior art cannot ensure that the associated accounts it determines reliably correspond to the same user, so the accuracy of the finally determined associated accounts is low. A more reliable and effective solution is therefore needed.
Disclosure of Invention
The application provides a heterogeneous account association method, device, equipment and storage medium that improve the accuracy of determining associated accounts and effectively ensure that the associated accounts determined reliably correspond to the same user.
In one aspect, the present application provides a heterogeneous account association method, including:
acquiring a first data set from a plurality of preset unit time periods of a first system and a second data set from a plurality of preset unit time periods of a second system, wherein the first data set and the second data set each comprise ternary data of a plurality of accounts, the ternary data comprising an account identifier, location information, and the reporting time at which the location information was reported; and the number of accounts corresponding to the first data set is smaller than the number of accounts corresponding to the second data set;
converting the first data set and the second data set of the same preset unit time period into a plurality of heterogeneous ternary data pairs according to their location information;
determining the time difference between the reporting times in each heterogeneous ternary data pair, and taking the two accounts corresponding to each heterogeneous ternary data pair whose time difference is less than or equal to a preset error threshold as a target heterogeneous account pair;
determining the spatiotemporal mismatch degree of each target heterogeneous account pair based on the location information and reporting times corresponding to that pair;
determining the sum of the spatiotemporal mismatch degrees of the target heterogeneous account pairs that include a first target account, wherein the first target account is the account corresponding to any account identifier in the first data set of the plurality of preset unit time periods;
determining, based on a chi-square test, the significance with which the sum of the spatiotemporal mismatch degrees corresponding to the first target account follows a target chi-square distribution, wherein the target chi-square distribution is a chi-square distribution whose degrees of freedom equal twice the number of location reports of the first target account, and the significance characterizes how reliable the association is between the two heterogeneous accounts of each target heterogeneous account pair that includes the first target account;
and determining a target associated account between the first system and the second system based on the significance.
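The chi-square step can be illustrated with a minimal Python sketch. It assumes, as this example's own reading rather than something the claims specify, that the "significance" is the upper-tail probability P(X ≥ sum) for X following a chi-square distribution with 2n degrees of freedom, where n is the number of location reports of the first target account; the factor of two reflects that each report contributes two squared standardized terms (time and space). For even degrees of freedom the tail probability has a closed form, so no statistics library is needed. The function names `chi2_sf_even_dof` and `association_significance` are hypothetical.

```python
import math


def chi2_sf_even_dof(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi-square(dof), dof even.

    For even dof = 2n the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{n-1} (x/2)^i / i!, so no SciPy is required.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0  # (x/2)^0 / 0!
    total = 0.0
    for i in range(dof // 2):
        if i > 0:
            term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def association_significance(mismatch_sum: float, report_count: int) -> float:
    """Significance of an account's mismatch sum under chi-square(2 * report_count).

    Assumption: significance is read as the upper-tail probability, so a
    small mismatch sum (good spatiotemporal agreement) gives a value near 1.
    """
    return chi2_sf_even_dof(mismatch_sum, 2 * report_count)
```

Under this reading, a small mismatch sum yields a value close to 1 (consistent with both accounts belonging to one user), while a sum far above its expected value 2n yields a value close to 0. How the significance is thresholded to pick target associated accounts is not specified in this sketch.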
Another aspect provides a heterogeneous account association apparatus, including:
a heterogeneous data acquisition module, configured to acquire a first data set from a plurality of preset unit time periods of a first system and a second data set from a plurality of preset unit time periods of a second system, wherein the first data set and the second data set each comprise ternary data of a plurality of accounts, the ternary data comprising an account identifier, location information, and the reporting time of the location information; and the number of accounts corresponding to the first data set is smaller than the number of accounts corresponding to the second data set;
a heterogeneous ternary data pair determining module, configured to convert the first data set and the second data set of the same preset unit time period into a plurality of heterogeneous ternary data pairs according to their location information;
a target heterogeneous account pair determining module, configured to determine the time difference between the reporting times in each heterogeneous ternary data pair and to take the two accounts corresponding to each heterogeneous ternary data pair whose time difference is less than or equal to a preset error threshold as a target heterogeneous account pair;
a first spatiotemporal mismatch degree determining module, configured to determine the spatiotemporal mismatch degree of each target heterogeneous account pair based on the location information and reporting times corresponding to that pair;
a spatiotemporal mismatch degree sum determining module, configured to determine the sum of the spatiotemporal mismatch degrees of the target heterogeneous account pairs that include a first target account, wherein the first target account is the account corresponding to any account identifier in the first data set of the plurality of preset unit time periods;
a chi-square test module, configured to determine, based on a chi-square test, the significance with which the sum of the spatiotemporal mismatch degrees corresponding to the first target account follows a target chi-square distribution, wherein the target chi-square distribution is a chi-square distribution whose degrees of freedom equal twice the number of location reports of the first target account, and the significance characterizes how reliable the association is between the two heterogeneous accounts of each target heterogeneous account pair that includes the first target account;
and an associated account determining module, configured to determine a target associated account between the first system and the second system based on the significance.
Another aspect provides a heterogeneous account association device, which includes a processor and a memory, wherein the memory stores at least one instruction or at least one program that is loaded and executed by the processor to implement the heterogeneous account association method described above.
Another aspect provides a computer-readable storage medium storing at least one instruction or at least one program that is loaded and executed by a processor to implement the heterogeneous account association method described above.
The heterogeneous account association method, apparatus, device and storage medium provided by the present application have the following technical effects:
spatiotemporal intersection is performed on the first data set and the second data set, which come from different sources, using location information and reporting time, yielding a plurality of target heterogeneous account pairs; the spatiotemporal mismatch degree of each target heterogeneous account pair is then calculated from the location information and reporting times; finally, a chi-square distribution is used to determine a significance that characterizes how reliably the two heterogeneous accounts of a target pair are associated. This provides a measure of how reliably two heterogeneous accounts correspond to the same user, effectively improves the accuracy of the determined associated accounts, and breaks down the barriers between heterogeneous data.
Drawings
To illustrate the technical solutions and advantages of the embodiments of the present application or of the prior art more clearly, the drawings used in describing them are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application environment according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a heterogeneous account association method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of determining the spatiotemporal mismatch degree of each target heterogeneous account pair based on the location information and reporting times corresponding to that pair, according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of determining a target associated account between the first system and the second system based on the significance, according to an embodiment of the present application;
Fig. 5 is another schematic flowchart of determining a target associated account between the first system and the second system based on the significance, according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a heterogeneous account association apparatus according to an embodiment of the present application;
Fig. 7 is a block diagram of the hardware structure of a server for the heterogeneous account association method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first", "second", and the like in the description, the claims and the drawings of this application are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprise", "include", and "have", and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or server that comprises a list of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such process, method, article, or apparatus.
Referring to Fig. 1, which is a schematic diagram of an application environment according to an embodiment of the present application, the application environment includes at least a network node 01, a network node 02, and a network node 03.
In this embodiment, the network node 01 may include, but is not limited to, a background service device corresponding to the first system; specifically, this background service device may store data of accounts in the first system.
The network node 02 may include, but is not limited to, a background service device corresponding to the second system, where the second system is different from the first system.
This background service device may store data of accounts in the second system. Specifically, a background service device may include, but is not limited to, an independently running server, a distributed server, or a server cluster composed of multiple servers, and may include a network communication unit, a processor, a memory, and the like.
The network node 03 may be configured to perform heterogeneous account association on the two heterogeneous data sets of the network node 01 and the network node 02 (i.e., data originating from the first system and the second system, respectively). Specifically, the network node 03 may include, but is not limited to, physical terminal devices such as a smartphone, a desktop computer, a tablet computer, a laptop computer, a digital assistant, an augmented reality (AR)/virtual reality (VR) device, or a smart wearable device, as well as software running on such devices, such as a virtual machine. It may also be an independently running server, a distributed server, or a server cluster composed of multiple servers.
A heterogeneous account association method is described below. Fig. 2 is a flowchart of a heterogeneous account association method according to an embodiment of the present application. This specification provides the method steps as described in the embodiments or the flowchart, but more or fewer steps may be included based on conventional or non-creative effort. The order of steps listed in the embodiments is only one of many possible execution orders and is not the only one. When executed in an actual system or server product, the steps may be performed sequentially or in parallel (e.g., in a parallel-processor or multithreaded environment) according to the embodiments or the methods shown in the drawings. Specifically, as shown in Fig. 2, the method may include:
S201: Acquire a first data set derived from a plurality of preset unit time periods of a first system and a second data set derived from a plurality of preset unit time periods of a second system.
In the embodiments of this specification, the first data set and the second data set each include ternary data of a plurality of accounts, where the ternary data includes an account identifier, location information, and the reporting time at which the location information was reported.
In the embodiments of this specification, an account may include, but is not limited to, an account of an individual or collective user in the corresponding system, and an account identifier may include, but is not limited to, any information that uniquely identifies an account. The location information in the ternary data of an account may be the location reported by the account at a certain moment, and may include, but is not limited to, longitude and latitude coordinates, or longitude, latitude and altitude coordinates, among other location information.
The preset unit time period may be set according to the practical application, for example an hour, a day, or a week; if the preset unit time period is a day, the plurality of preset unit time periods may correspondingly be multiple days. The number of preset unit time periods from which the first data set and the second data set are taken may be chosen according to the actual application, so that the selected data sets contain as many associated accounts as possible, while avoiding a data volume so large that changes in the associations between accounts can no longer be captured effectively.
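A minimal Python sketch of the ternary data and of splitting it into preset unit time periods follows. It assumes a unit period of one UTC calendar day, planar coordinates in metres instead of the longitude and latitude mentioned above, and Unix timestamps for the reporting time. The `TernaryRecord` type and `bucket_by_day` helper are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TernaryRecord:
    """One location report: (account identifier, location, reporting time)."""
    account_id: str
    x: float            # assumed planar easting in metres
    y: float            # assumed planar northing in metres
    reported_at: float  # Unix timestamp in seconds


def bucket_by_day(records):
    """Group records into preset unit time periods, here one UTC calendar day."""
    buckets = defaultdict(list)
    for rec in records:
        day = datetime.fromtimestamp(rec.reported_at, tz=timezone.utc).date()
        buckets[day].append(rec)
    return dict(buckets)
```

Both data sets would be bucketed the same way, so that the later steps only compare records from the same unit period.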
The number of accounts corresponding to the first data set is smaller than the number of accounts corresponding to the second data set, and the users of the two sets of accounts overlap; that is, the same user reports location information in both the first system and the second system. For example, user A reports location information in the first system and similar location information in the second system, but the reported longitude, latitude, and time usually deviate from each other to some extent. As another example, the first data set may be location information reported by users through an application 1 on a terminal, and the second data set may be location information reported by the terminal's system; normally, each report by application 1 also causes the terminal system to report the location once, but the two reports may differ somewhat in longitude, latitude, and time.
S203: Convert the first data set and the second data set of the same preset unit time period into a plurality of heterogeneous ternary data pairs according to their location information.
In the embodiments of this specification, a spatial intersection may be performed on the first data set and the second data set of each preset unit time period; that is, ternary data from the first data set and ternary data from the second data set whose locations differ by no more than a specified threshold are combined into heterogeneous ternary data pairs.
In a specific embodiment, converting the first data set and the second data set of the same preset unit time period into a plurality of heterogeneous ternary data pairs according to their location information may include:
1) Mapping the first data set and the second data set of the same preset unit time period onto a preset grid according to their location information.
2) Generating a plurality of heterogeneous ternary data pairs from pairs of heterogeneous ternary data that fall in the same sub-grid.
In the embodiments of this specification, the preset grid includes a plurality of sub-grids. Specifically, a spatial unit δ may be specified to construct sub-grids with side length δ, so that every location in the first data set and the second data set falls into some sub-grid and each ternary data item is assigned to a sub-grid of the preset grid; any two ternary data items from different data sets in the same sub-grid can then form a heterogeneous ternary data pair. The longer the side length of the sub-grids, the more heterogeneous ternary data pairs each sub-grid yields; conversely, the shorter the side length, the fewer pairs it yields.
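The grid mapping and same-cell pairing described above can be sketched in Python as follows. This is a sketch under assumptions: records are plain `(account_id, x, y, t)` tuples with planar coordinates in the same unit as the side length δ (called `delta` here), and every first-set record in a cell is paired with every second-set record in the same cell. The helper names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def cell_of(x, y, delta):
    """Index of the sub-grid (square of side delta) containing point (x, y)."""
    return (math.floor(x / delta), math.floor(y / delta))


def map_to_grid(records, delta):
    """Map (account_id, x, y, t) records to their sub-grid cells."""
    grid = defaultdict(list)
    for rec in records:
        _, x, y, _ = rec
        grid[cell_of(x, y, delta)].append(rec)
    return grid


def heterogeneous_pairs(first, second, delta):
    """Pair every first-set record with every second-set record in the same cell."""
    grid1 = map_to_grid(first, delta)
    grid2 = map_to_grid(second, delta)
    pairs = []
    for cell, recs1 in grid1.items():
        recs2 = grid2.get(cell, [])  # .get does not insert into the defaultdict
        pairs.extend(product(recs1, recs2))
    return pairs
```

A larger `delta` puts more records into each cell and therefore yields more candidate pairs, matching the trade-off described above.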
In other embodiments, because the first data set contains less data than the second data set, some ternary data in the second data set have no heterogeneous counterpart. Moreover, the spatiotemporal data of the same user usually deviate between the heterogeneous data sets, so a ternary data item in the first data set may not be mapped to the same sub-grid as its actual counterpart in the second data set. This failure to associate data because of spatiotemporal errors can be addressed by diffusion processing. Accordingly, before generating the plurality of heterogeneous ternary data pairs from pairs of heterogeneous ternary data in the same sub-grid, the method further includes:
performing diffusion processing on the ternary data of the first data set in the preset grid.
Correspondingly, generating the plurality of heterogeneous ternary data pairs from pairs of heterogeneous ternary data in the same sub-grid may include:
generating the plurality of heterogeneous ternary data pairs from pairs of heterogeneous ternary data in the same sub-grid after the diffusion processing.
In a specific embodiment, a number of cells k by which to diffuse may be specified. Each ternary data item in the first data set is then copied into the sub-grids within k cells of the sub-grid where it is located. For example, if k = 0, the ternary data are not diffused; if k = 1, each item is diffused by one cell in every direction: in a two-dimensional grid, a 3 × 3 block of nine sub-grids is formed around the sub-grid containing the item, and the item is added to the eight other sub-grids in that block. Correspondingly, in a three-dimensional grid, a 3 × 3 × 3 block of twenty-seven sub-grids is formed around the sub-grid containing the item, and the item is added to the twenty-six other sub-grids in that block.
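The k-cell diffusion can be sketched in Python under the assumption that "within k cells" means every sub-grid whose integer index differs from the original one by at most k along each axis (a Chebyshev neighbourhood), which reproduces the nine-cell and twenty-seven-cell blocks for k = 1. The function names are hypothetical, and only the first data set's grid is meant to be diffused.

```python
from itertools import product


def diffuse_cells(cell, k):
    """All sub-grid indices within k cells of `cell`, including `cell` itself.

    Works for 2-D or 3-D cell indices: for k = 1 it returns the 3x3 = 9 cells
    of the nine-square block in 2-D, or the 3x3x3 = 27 cells in 3-D.
    """
    steps = range(-k, k + 1)
    return [
        tuple(c + d for c, d in zip(cell, offset))
        for offset in product(steps, repeat=len(cell))
    ]


def diffuse_grid(grid, k):
    """Copy each first-set record into every cell within k of its own cell."""
    diffused = {}
    for cell, recs in grid.items():
        for target in diffuse_cells(cell, k):
            diffused.setdefault(target, []).extend(recs)
    return diffused
```

Since only first-set records are copied and each second-set record still occupies a single cell, a given first-set record and second-set record share at most one cell, so diffusion does not duplicate individual record pairs.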
S205: Determine the time difference between the reporting times in each heterogeneous ternary data pair, and take the two accounts corresponding to each heterogeneous ternary data pair whose time difference is less than or equal to a preset error threshold as a target heterogeneous account pair.
In practice, when a user reports location information at the same place in both the first system and the second system, the reporting times usually differ somewhat. Accordingly, a time window threshold (i.e., the preset error threshold) Tt is set for all ternary data in the same sub-grid; for example, with Tt = 30 s, only heterogeneous ternary data whose reporting times differ by at most 30 s are matched into pairs. Specifically, the time difference is computed from the reporting times in each heterogeneous ternary data pair and compared with the preset error threshold; when it is less than or equal to the threshold, the two accounts corresponding to that pair are taken as a target heterogeneous account pair.
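A minimal Python sketch of the time-window filter, assuming pairs of `(account_id, x, y, t)` tuples with t in seconds and Tt = 30 s as in the example above. Returning the two records alongside the account pair is a design choice of this sketch, so that the later mismatch computation still has the underlying locations and times; `filter_by_time_window` is a hypothetical name.

```python
def filter_by_time_window(pairs, tt_seconds=30.0):
    """Keep heterogeneous pairs whose reporting times differ by at most Tt.

    Each pair is ((acc1, x1, y1, t1), (acc2, x2, y2, t2)); each kept item is
    (acc1, acc2, record1, record2), i.e. a target heterogeneous account pair
    together with the two records that produced it.
    """
    kept = []
    for rec1, rec2 in pairs:
        if abs(rec1[3] - rec2[3]) <= tt_seconds:  # inclusive threshold
            kept.append((rec1[0], rec2[0], rec1, rec2))
    return kept
```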
S207: Determine the spatiotemporal mismatch degree of each target heterogeneous account pair based on the location information and reporting times corresponding to that pair.
In the embodiments of this specification, the spatiotemporal mismatch degree of a target heterogeneous account pair is inversely related to the degree of association between its two heterogeneous accounts: the more strongly the two accounts are associated, the smaller the spatiotemporal mismatch degree of the pair; conversely, the more weakly they are associated, the greater the mismatch degree.
Specifically, as shown in Fig. 3, determining the spatiotemporal mismatch degree of each target heterogeneous account pair based on the location information and reporting times corresponding to that pair may include:
S2071: Determine the spatial error of each target heterogeneous account pair from its corresponding location information.
S2073: Determine the time error of each target heterogeneous account pair from its corresponding reporting times.
S2075: Obtain a preset standard deviation of the spatial error and a preset standard deviation of the time error.
S2077: Determine the spatiotemporal mismatch degree of the target heterogeneous account pair based on the spatial error, the time error, the preset standard deviation of the spatial error, and the preset standard deviation of the time error.
In the embodiments of this specification, the spatial error of a target heterogeneous account pair may be the difference between the locations reported by its two heterogeneous accounts, and the time error may be the difference between their reporting times. The preset standard deviations of the spatial error and of the time error may be determined from the ternary data of associated accounts.
In the embodiments of this specification, the spatial error and the time error of a target heterogeneous account pair may both be assumed to follow Gaussian distributions; once each is converted into a standard Gaussian variable, the sum of their squares follows a chi-square distribution. Specifically, the two standard deviations are introduced when calculating the mismatch degree so as to convert the Gaussian-distributed time and spatial errors into standard Gaussian variables. Accordingly, the spatiotemporal mismatch degree of a target heterogeneous account pair can be calculated by the following formula:
$$S = \frac{d_t^2}{\sigma_T^2} + \frac{d_s^2}{\sigma_S^2}$$
where $S$ is the spatiotemporal mismatch degree of the target heterogeneous account pair; $d_t$ is the time error of the pair; $d_s$ is the spatial error of the pair; $\sigma_T$ is the standard deviation of the preset time error; and $\sigma_S$ is the standard deviation of the preset spatial error.
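The formula above can be sketched in Python as follows. This is a minimal sketch, assuming positions are already projected to planar x/y coordinates in metres and reporting times are in seconds; the `Triple` container and the use of Euclidean distance as the spatial error are illustrative assumptions, not details fixed by this specification.

```python
import math
from dataclasses import dataclass


@dataclass
class Triple:
    """Hypothetical container for one piece of ternary data."""
    account_id: str
    x: float  # assumed projected easting, metres
    y: float  # assumed projected northing, metres
    t: float  # reporting time, seconds


def mismatch_degree(a: Triple, b: Triple, sigma_s: float, sigma_t: float) -> float:
    """S = d_t^2 / sigma_T^2 + d_s^2 / sigma_S^2 for one heterogeneous pair."""
    d_s = math.hypot(a.x - b.x, a.y - b.y)  # spatial error (assumed Euclidean distance)
    d_t = abs(a.t - b.t)                    # time error between the two reports
    return (d_t / sigma_t) ** 2 + (d_s / sigma_s) ** 2
```

For example, two reports 50 m and 10 s apart, with assumed standard deviations of 50 m and 10 s, each contribute one unit, giving a mismatch degree of 2.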
In the embodiments of the present specification, each target heterogeneous account pair corresponds to one mismatch degree. In other embodiments, after diffusion processing is performed on the ternary data of the first data set in the preset grid, one target heterogeneous account pair may correspond to multiple mismatch degrees. Accordingly, when the same target heterogeneous account pair corresponds to multiple spatiotemporal mismatch degrees, the method may further include:
1) Comparing the multiple spatiotemporal mismatch degrees corresponding to the same target heterogeneous account pair;
2) Taking the minimum of them as the spatiotemporal mismatch degree of that target heterogeneous account pair.
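A minimal sketch of the minimum-taking step, assuming each mismatch degree is keyed by the two accounts and the preset unit time period in which they were matched; the key layout and the name `keep_min_mismatch` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical key: (first-system account, second-system account, unit time period).
Key = Tuple[str, str, int]


def keep_min_mismatch(scored: Iterable[Tuple[Key, float]]) -> Dict[Key, float]:
    """Collapse repeated mismatch degrees of the same pair into their minimum."""
    best: Dict[Key, float] = {}
    for key, s in scored:
        # Keep the first value seen, then replace it only with a smaller one.
        if key not in best or s < best[key]:
            best[key] = s
    return best
```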
In the embodiments of the present specification, the two accounts of a target heterogeneous account pair come from different systems, and their time and space errors lie within a certain range.
S209: Determine the sum of the spatiotemporal mismatch degrees of the target heterogeneous account pairs that include the first target account.
In this embodiment of the specification, the first target account may be the account corresponding to any account identifier in the first data sets of the multiple preset unit time periods. Accordingly, the target heterogeneous account pairs that include the first target account can be identified, and their spatiotemporal mismatch degrees are added to obtain the sum of the spatiotemporal mismatch degrees.
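The summation in S209 can be sketched as below. This assumes, as one reading of the text, that the sum is accumulated separately for each candidate partner account across the preset unit time periods, since the later chi-square test scores each target heterogeneous account pair; the function name and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def mismatch_sums(per_period: Dict[Tuple[str, str, int], float],
                  first_account: str) -> Dict[str, float]:
    """Sum, over unit time periods, the mismatch degrees of every target pair
    whose first-system account is first_account, keyed by the partner account."""
    sums: Dict[str, float] = defaultdict(float)
    for (a, b, _period), s in per_period.items():
        if a == first_account:
            sums[b] += s
    return dict(sums)
```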
S211: Based on a chi-square test, determine the significance with which the sum of the spatiotemporal mismatch degrees corresponding to the first target account follows the target chi-square distribution.
In the embodiments of the present specification, each target heterogeneous account pair in each preset unit time period contributes two standard normal variables, one for time and one for space. Accordingly, the target chi-square distribution may be a chi-square distribution whose degrees of freedom equal twice the number of location reports of the first target account, where the number of location reports is the number of times the first target account reported its position information.
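Under the Gaussian assumption above, the degrees of freedom follow from a standard property: the sum of $k$ independent squared standard normal variables follows $\chi^2(k)$. Assuming zero-mean errors that are independent across the two components and across the $n$ location reports of the first target account, the derivation is:

$$\frac{d_t}{\sigma_T} \sim \mathcal{N}(0,1),\quad \frac{d_s}{\sigma_S} \sim \mathcal{N}(0,1) \;\Rightarrow\; S = \frac{d_t^2}{\sigma_T^2} + \frac{d_s^2}{\sigma_S^2} \sim \chi^2(2)$$

$$S_{\text{sum}} = \sum_{i=1}^{n}\left(\frac{d_{t,i}^2}{\sigma_T^2} + \frac{d_{s,i}^2}{\sigma_S^2}\right) \sim \chi^2(2n)$$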
In this specification, the significance characterizes the degree of reliability of the association between the two heterogeneous accounts of a target heterogeneous account pair that includes the first target account. When the chi-square test is used to determine the significance with which the sum of spatiotemporal mismatch degrees corresponding to the first target account follows the target chi-square distribution, an independence test can be used to judge whether the two variables (the two accounts of the target heterogeneous account pair) are related, and how reliable that judgment is (i.e., the significance). Specifically, the value of the χ² test statistic can be calculated from the sum of spatiotemporal mismatch degrees, and the significance of the relationship between the two accounts is then looked up in a χ² distribution probability table using the degrees of freedom.
In this embodiment of the present specification, the lower the significance, the more likely it is that the sum of the spatiotemporal mismatch degrees corresponding to the first target account follows the target chi-square distribution, that is, the smaller the spatiotemporal error of the corresponding target heterogeneous account pair, and the more likely the two accounts of that pair represent the same user (i.e., the more reliable their association). Conversely, the higher the significance, the less likely the sum follows the target chi-square distribution, the larger the spatiotemporal error of the corresponding pair, and the less likely the two accounts represent the same user (i.e., the less reliable their association).
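The following sketch computes the significance, assuming it is read as the chi-square cumulative distribution function (CDF) evaluated at the mismatch sum; this matches the behaviour described above (a small sum gives a low significance) and makes 1 minus the significance the upper-tail probability. Because the target distribution always has an even number of degrees of freedom, 2n, its CDF has a closed form and the standard library suffices. The function names are hypothetical.

```python
import math


def chi2_cdf_even_dof(x: float, n_reports: int) -> float:
    """CDF of the chi-square distribution with k = 2 * n_reports degrees of freedom.

    For even k the CDF is 1 - exp(-x/2) * sum_{i=0}^{k/2 - 1} (x/2)^i / i!.
    """
    if n_reports < 1:
        raise ValueError("need at least one location report")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = 1.0  # (x/2)^0 / 0!
    tail = 0.0
    for i in range(n_reports):
        tail += term               # add (x/2)^i / i!
        term *= half / (i + 1)     # advance to (x/2)^(i+1) / (i+1)!
    return 1.0 - math.exp(-half) * tail


def significance(mismatch_sum: float, n_reports: int) -> float:
    """Assumed reading: significance = chi-square CDF of the mismatch sum with 2n dof."""
    return chi2_cdf_even_dof(mismatch_sum, n_reports)
```

For very large mismatch sums the partial series can overflow; a library routine such as SciPy's `scipy.stats.chi2.cdf` is the more robust choice in practice.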
S213: Determine the target associated accounts between the first system and the second system based on the significance.
In this embodiment of the present specification, as shown in fig. 4, determining the target associated accounts between the first system and the second system based on the significance may include:
S401: Compare the significance of the target heterogeneous account pairs that include the first target account.
S403: Take the target heterogeneous account pair with the minimum significance for the first target account as a first primary-selected associated account.
S405: Compare the significance of the target heterogeneous account pairs that include a second target account, where the second target account is the account corresponding to any account identifier in the second data sets of the multiple preset unit time periods.
S407: Take the target heterogeneous account pair with the minimum significance for the second target account as a second primary-selected associated account.
S409: De-duplicate the first primary-selected associated accounts and the second primary-selected associated accounts.
S411: When the same account appears in more than one of the de-duplicated primary-selected associated accounts, delete every primary-selected associated account that includes that account.
S413: Take the primary-selected associated accounts remaining after the deletion as the target associated accounts.
In the embodiments of the present specification, the goal is to find uniquely matching associated accounts. For each account in the first data set and in the second data set, the target heterogeneous account pairs containing that account are found, and the pair with the minimum significance is taken as a primary-selected associated account. The primary-selected associated accounts are then de-duplicated, and when the same account appears in more than one of the de-duplicated primary-selected associated accounts, every primary-selected associated account containing that account is deleted, yielding the target associated accounts.
Specifically, for example, assume the accounts of the first data set are a, b, c, d and e, the accounts of the second data set are A, B, C, D and E, and the target heterogeneous account pairs including account a are (a, A), (a, B) and (a, C). The significance of (a, A), (a, B) and (a, C) is compared; assuming (a, B) has the lowest significance, (a, B) is selected as a first primary-selected associated account. Further, assume the target heterogeneous account pairs including account B of the second data set are (a, B), (b, B) and (d, B), and that (b, B) has the lowest significance; (b, B) is then selected as a second primary-selected associated account. Since (a, B) and (b, B) are different primary-selected associated accounts, both are retained after de-duplication. However, account B appears in more than one de-duplicated primary-selected associated account, so the accounts of the first system and the second system cannot be matched one-to-one here; accordingly, both (a, B) and (b, B) are deleted.
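The selection steps S401 to S413 can be sketched as follows, assuming the significance of every target heterogeneous account pair has already been computed and is keyed by (first-system account, second-system account). The function name and the significance values are made up for illustration. Accounts are counted separately per system so that identical identifiers in the two systems are not confused.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (first-system account, second-system account)


def select_associated(sig: Dict[Pair, float]) -> Set[Pair]:
    """Sketch of S401-S413 on precomputed significances."""
    best_first: Dict[str, Pair] = {}   # S401/S403: lowest significance per first-system account
    best_second: Dict[str, Pair] = {}  # S405/S407: lowest significance per second-system account
    for pair, s in sig.items():
        a, b = pair
        if a not in best_first or s < sig[best_first[a]]:
            best_first[a] = pair
        if b not in best_second or s < sig[best_second[b]]:
            best_second[b] = pair
    # S409: the set union keeps each primary-selected pair only once.
    candidates = set(best_first.values()) | set(best_second.values())
    # S411: count how often each account occurs, separately per system.
    uses_a: Dict[str, int] = {}
    uses_b: Dict[str, int] = {}
    for a, b in candidates:
        uses_a[a] = uses_a.get(a, 0) + 1
        uses_b[b] = uses_b.get(b, 0) + 1
    # S413: only pairs whose accounts are both uncontested survive.
    return {(a, b) for a, b in candidates if uses_a[a] == 1 and uses_b[b] == 1}


# Significances loosely following the worked example (values are made up).
example = {("a", "A"): 0.40, ("a", "B"): 0.10, ("a", "C"): 0.30,
           ("b", "B"): 0.05, ("d", "B"): 0.20, ("c", "C"): 0.02}
print(select_associated(example))  # {('c', 'C')}
```

As in the worked example, (a, B) and (b, B) are both dropped because account B is claimed twice; only the uncontested pair (c, C) remains.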
In other embodiments, as shown in fig. 5, determining the target associated accounts between the first system and the second system based on the significance may include:
S501: Determine the confidence of each target heterogeneous account pair including the first target account based on the significance corresponding to the first target account.
Specifically, 1 minus the significance may be taken as the corresponding confidence.
S503: Compare the confidence of the target heterogeneous account pairs including the first target account.
S505: Take the target heterogeneous account pair with the highest confidence for the first target account as a third primary-selected associated account.
S507: Compare the confidence of the target heterogeneous account pairs including a second target account, where the second target account is the account corresponding to any account identifier in the second data sets of the multiple preset unit time periods.
S509: Take the target heterogeneous account pair with the highest confidence for the second target account as a fourth primary-selected associated account.
S511: De-duplicate the third primary-selected associated accounts and the fourth primary-selected associated accounts.
S513: When the same account appears in more than one of the de-duplicated primary-selected associated accounts, delete every primary-selected associated account that includes that account.
S515: Take the primary-selected associated accounts remaining after the deletion as the target associated accounts.
In the embodiments of the present disclosure, the confidence is positively correlated with the reliability of the association between the two heterogeneous accounts of the target heterogeneous account pair. Specifically, the higher the confidence, the more likely the two accounts of the pair represent the same user (i.e., the more reliable their association); conversely, the lower the confidence, the less likely they represent the same user (i.e., the less reliable their association).
In the embodiments of the present specification, the goal is to find uniquely matching associated accounts. For each account in the first data set and in the second data set, the target heterogeneous account pairs containing that account are found, and the pair with the maximum confidence is taken as a primary-selected associated account. The primary-selected associated accounts are then de-duplicated, and when the same account appears in more than one of the de-duplicated primary-selected associated accounts, every primary-selected associated account containing that account is deleted, yielding the target associated accounts.
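Because confidence is defined as 1 minus the significance, ranking by highest confidence selects the same pairs as ranking by lowest significance (ties aside). The sketch below shows steps S501 to S509 under that definition; de-duplication and deletion (S511 to S515) would proceed exactly as in the significance-based variant. The function name is hypothetical.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (first-system account, second-system account)


def highest_confidence_pairs(sig: Dict[Pair, float]) -> Tuple[Dict[str, Pair], Dict[str, Pair]]:
    """Return the third and fourth primary-selected associated accounts."""
    conf = {pair: 1.0 - s for pair, s in sig.items()}  # S501: confidence = 1 - significance
    third: Dict[str, Pair] = {}   # S503/S505: best pair per first-system account
    fourth: Dict[str, Pair] = {}  # S507/S509: best pair per second-system account
    for pair, c in conf.items():
        a, b = pair
        if a not in third or c > conf[third[a]]:
            third[a] = pair
        if b not in fourth or c > conf[fourth[b]]:
            fourth[b] = pair
    return third, fourth
```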
As can be seen from the technical solutions provided in the embodiments of the present specification, the heterogeneous first data set and second data set are intersected in space and time using the position information and the reporting time, yielding multiple target heterogeneous account pairs across the heterogeneous data. The spatiotemporal mismatch degree of each target heterogeneous account pair is then calculated from the position information and the reporting time. Next, the significance, which characterizes how reliably the two heterogeneous accounts of a target pair are associated, is determined using the chi-square distribution. This provides a measure of how reliably two heterogeneous accounts correspond to the same user, effectively improves the accuracy of the determined associated accounts, and bridges the gap between heterogeneous data.
An embodiment of the present application further provides a heterogeneous account association apparatus. As shown in fig. 6, the apparatus includes:
the heterogeneous data obtaining module 610 may be configured to obtain a first data set derived from a plurality of preset unit time periods of a first system and a second data set derived from a plurality of preset unit time periods of a second system, where the first data set and the second data set both include ternary data of a plurality of account numbers, and the ternary data includes account number identifiers, location information, and reporting time for reporting the location information; the number of accounts corresponding to the first data set is smaller than that of the accounts corresponding to the second data set;
a heterogeneous ternary data pair determining module 620, configured to convert the first data set and the second data set of the same preset unit time period into multiple pairs of heterogeneous ternary data pairs according to the position information in the first data set and the second data set of the same preset unit time period;
the target heterogeneous account pair determining module 630 may be configured to take, as a target heterogeneous account pair, the two accounts of each heterogeneous ternary data pair whose reporting times differ by no more than a preset error threshold;
the first time-space mismatching degree determining module 640 may be configured to determine, based on the location information and the reporting time corresponding to each target heterogeneous account pair, a time-space mismatching degree of the target heterogeneous account pair;
a time-space mismatching degree sum determination module 650, configured to determine a sum of time-space mismatching degrees of a target heterogeneous account pair including a first target account, where the first target account is an account corresponding to any account identifier in the first data set of the multiple preset unit time periods;
a chi-square distribution verification module 660, configured to determine, based on a chi-square test, the significance with which the sum of spatiotemporal mismatch degrees corresponding to the first target account follows a target chi-square distribution, where the target chi-square distribution includes a chi-square distribution whose degrees of freedom equal twice the number of location reports of the first target account, and the significance characterizes the degree of reliability of the association between the two heterogeneous accounts of a target heterogeneous account pair including the first target account;
an associated account determination module 670, configured to determine target associated accounts between the first system and the second system based on the significance.
In some embodiments, the heterogeneous ternary data pair determining module 620 comprises:
the data mapping unit is used for mapping the first data set and the second data set of the same preset unit time period to a preset grid according to the position information of the first data set and the second data set of the same preset unit time period, and the preset grid comprises a plurality of sub-grids;
and the heterogeneous ternary data pair generation unit is used for generating a plurality of pairs of heterogeneous ternary data pairs based on pairwise heterogeneous ternary data in the same sub grid.
In some embodiments, the apparatus further comprises:
the data diffusion processing module is used for performing diffusion processing on the ternary data in the first data set in the preset grid before generating multiple pairs of heterogeneous ternary data pairs based on pairwise heterogeneous ternary data in the same sub-grid;
correspondingly, the heterogeneous ternary data pair generation unit is specifically configured to generate multiple heterogeneous ternary data pairs based on pairwise heterogeneous ternary data in the same sub-grid after the diffusion processing.
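The grid mapping, diffusion and pairing described for module 620, together with the reporting-time threshold of module 630, can be sketched for a single preset unit time period as follows. The specification does not define the sub-grid shape or the diffusion rule; this sketch assumes square sub-grids of side `size` in planar coordinates and treats diffusion as copying each first-data-set record into its 3x3 block of neighbouring sub-grids, so that matches straddling a sub-grid border are not lost. All names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical ternary record: (account_id, x_metres, y_metres, report_time_s).
Triple = Tuple[str, float, float, float]
Cell = Tuple[int, int]


def cell_of(x: float, y: float, size: float) -> Cell:
    """Index of the square sub-grid containing (x, y)."""
    return (int(x // size), int(y // size))


def heterogeneous_pairs(first: List[Triple], second: List[Triple], size: float,
                        max_dt: float, diffuse: bool = True) -> List[Tuple[Triple, Triple]]:
    """Pair first/second-system records sharing a sub-grid whose reporting
    times differ by at most max_dt (the preset error threshold)."""
    grid_second: Dict[Cell, List[Triple]] = defaultdict(list)
    for rec in second:
        grid_second[cell_of(rec[1], rec[2], size)].append(rec)

    grid_first: Dict[Cell, List[Triple]] = defaultdict(list)
    for rec in first:
        cx, cy = cell_of(rec[1], rec[2], size)
        # Assumed diffusion rule: also place the record in the 8 neighbouring sub-grids.
        offsets = product((-1, 0, 1), repeat=2) if diffuse else [(0, 0)]
        for dx, dy in offsets:
            grid_first[(cx + dx, cy + dy)].append(rec)

    pairs: List[Tuple[Triple, Triple]] = []
    for cell, first_recs in grid_first.items():
        for r1 in first_recs:
            for r2 in grid_second.get(cell, []):
                if abs(r1[3] - r2[3]) <= max_dt:
                    pairs.append((r1, r2))
    return pairs
```

Diffusing only the first data set (the one with fewer accounts, per the specification) keeps the number of copies small, and because each second-data-set record sits in exactly one sub-grid, each pair of records is generated at most once in this sketch.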
In some embodiments, when the same target heterogeneous account number pair corresponds to multiple spatiotemporal mismatches, the apparatus further comprises:
the time-space mismatching degree comparison module is used for comparing the multiple time-space mismatching degrees corresponding to the same target heterogeneous account pair before the sum of the time-space mismatching degrees of the target heterogeneous account pairs including the first target account is determined;
and the second space-time mismatching degree determining module is used for taking the minimum space-time mismatching degree as the space-time mismatching degree of the same target heterogeneous account pair.
In some embodiments, the first time-space mismatch determination module comprises:
the space error determining unit is used for determining the space error of each target heterogeneous account pair according to the corresponding position information of each target heterogeneous account pair;
the time error determining unit is used for determining the time error of each target heterogeneous account pair according to the corresponding reporting time of each target heterogeneous account pair;
a standard deviation obtaining unit, configured to obtain a standard deviation of a preset spatial error and a standard deviation of a preset time error;
and the space-time mismatching degree determining unit is used for determining the space-time mismatching degree of the target heterogeneous account number pair based on the spatial error, the time error, the standard deviation of the preset spatial error and the standard deviation of the preset time error.
In some embodiments, the associated account number determination module 670 includes:
a first significance comparison unit, configured to compare magnitudes of significance of target heterogeneous account pairs including the first target account;
a first primary selection associated account determining unit, configured to use a target heterogeneous account pair corresponding to the minimum degree of significance corresponding to the first target account as a first primary selection associated account;
a second significance comparison unit, configured to compare significance of target heterogeneous account pairs including a second target account, where the second target account is an account corresponding to any account identifier in the second data set of the multiple preset unit time periods;
a second primary selection associated account determining unit, configured to use a target heterogeneous account pair corresponding to the minimum degree of significance corresponding to the second target account as a second primary selection associated account;
the first duplication elimination processing unit is used for carrying out duplication elimination processing on the first primarily selected associated account and the second primarily selected associated account;
the first primary selection associated account deleting unit is used for deleting the primary selection associated accounts including the same account when the same account appears in a plurality of de-duplicated primary selection associated accounts;
and the first target associated account determining unit is used for taking the primarily selected associated account subjected to deletion processing as a target associated account.
In some embodiments, the associated account number determination module 670 includes:
a confidence degree determining unit, configured to determine a confidence degree of a target heterogeneous account pair including the first target account based on the significance degree corresponding to the first target account;
a first confidence degree comparison unit, configured to compare the magnitude of confidence degrees of target heterogeneous account pairs including the first target account;
a third primary selection associated account determining unit, configured to use a target heterogeneous account pair corresponding to the highest confidence corresponding to the first target account as a third primary selection associated account;
a second confidence degree comparing unit, configured to compare the confidence degrees of target heterogeneous account pairs including a second target account, where the second target account is the account corresponding to any account identifier in the second data sets of the multiple preset unit time periods;
a fourth primary selection associated account determining unit, configured to use a target heterogeneous account pair corresponding to a highest confidence degree corresponding to the second target account as a fourth primary selection associated account;
the second duplicate removal processing unit is used for carrying out duplicate removal processing on the third primarily selected associated account and the fourth primarily selected associated account;
the second primary selection associated account deleting unit is used for deleting the primary selection associated accounts including the same account when the same account appears in the plurality of de-duplicated primary selection associated accounts;
and the second target associated account determining unit is used for taking the primarily selected associated account subjected to deletion processing as a target associated account.
The apparatus embodiments and the method embodiments are based on the same inventive concept.
The embodiment of the application provides a heterogeneous account association device, which includes a processor and a memory, where the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the heterogeneous account association method provided by the above method embodiment.
The memory may be used to store software programs and modules, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, the application programs required by the functions, and the like, and the data storage area may store data created through use of the device, and the like. Further, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide the processor with access to the memory.
The method provided by the embodiments of the application can be executed on a mobile terminal, a computer terminal, a server or a similar computing device. Taking execution on a server as an example, fig. 7 is a block diagram of the hardware structure of a server for the heterogeneous account association method provided in the embodiments of the present application. As shown in fig. 7, the server 700 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 710 (the processors 710 may include, but are not limited to, processing devices such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)), a memory 730 for storing data, and one or more storage media 720 (e.g., one or more mass storage devices) for storing applications 723 or data 722. The memory 730 and the storage medium 720 may provide transient or persistent storage. The program stored in the storage medium 720 may include one or more modules, each of which may include a series of instruction operations for the server. Further, the central processing unit 710 may be configured to communicate with the storage medium 720 and execute, on the server 700, the series of instruction operations in the storage medium 720. The server 700 may also include one or more power supplies 760, one or more wired or wireless network interfaces 750, one or more input/output interfaces 740, and/or one or more operating systems 721, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The input/output interface 740 may be used to receive or send data via a network. A specific example of the network may include a wireless network provided by the communication provider of the server 700. In one example, the input/output interface 740 includes a network interface controller (NIC), which can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the input/output interface 740 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
It will be understood by those skilled in the art that the structure shown in fig. 7 is only an illustration and is not intended to limit the structure of the electronic device. For example, server 700 may also include more or fewer components than shown in FIG. 7, or have a different configuration than shown in FIG. 7.
An embodiment of the present application further provides a storage medium, which may be disposed in a device to store at least one instruction or at least one program for implementing the heterogeneous account association method of the method embodiments; the at least one instruction or at least one program is loaded and executed by a processor to implement the heterogeneous account association method provided by the method embodiments.
Optionally, in this embodiment, the storage medium may be located in at least one of multiple network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium capable of storing program code.
According to the above embodiments of the heterogeneous account association method, apparatus, device and storage medium, the heterogeneous first data set and second data set are intersected in space and time using the position information and the reporting time to obtain multiple target heterogeneous account pairs across the heterogeneous data; the spatiotemporal mismatch degree of each target heterogeneous account pair is then calculated from the position information and the reporting time; and the significance, which characterizes how reliably the two heterogeneous accounts of a target pair are associated, is determined using the chi-square distribution. This measures how reliably two heterogeneous accounts correspond to the same user, effectively improves the accuracy of the determined associated accounts, and bridges the gap between heterogeneous data.
It should be noted that the order of the embodiments of the present application is for description only and does not indicate the relative merits of the embodiments. Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
Those skilled in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A heterogeneous account association method is characterized by comprising the following steps:
acquiring a first data set from a plurality of preset unit time periods of a first system and a second data set from a plurality of preset unit time periods of a second system, wherein the first data set and the second data set both comprise ternary data of a plurality of account numbers, and the ternary data comprises account number identification, position information and reporting time for reporting the position information; the number of accounts corresponding to the first data set is smaller than that of the accounts corresponding to the second data set;
converting the first data set and the second data set of the same preset unit time period into a plurality of pairs of heterogeneous ternary data pairs according to the position information of the first data set and the second data set of the same preset unit time period;
calculating the time difference between the reporting times of each heterogeneous ternary data pair, and taking the two accounts of each heterogeneous ternary data pair whose time difference is less than or equal to a preset error threshold as a target heterogeneous account pair;
determining the space error of each target heterogeneous account pair according to the corresponding position information of each target heterogeneous account pair;
determining the time error of each target heterogeneous account pair according to the corresponding reporting time of each target heterogeneous account pair;
acquiring a standard deviation of a preset space error and a standard deviation of a preset time error;
determining the space-time mismatching degree of each target heterogeneous account pair based on the spatial error, the time error, the standard deviation of the preset spatial error, and the standard deviation of the preset time error;
determining the sum of the space-time mismatching degrees of the target heterogeneous account pairs that include a first target account, wherein the first target account is the account corresponding to any account identifier in the first data sets of the plurality of preset unit time periods;
determining, based on a chi-square distribution test, the significance with which the sum of the space-time mismatching degrees corresponding to the first target account obeys a target chi-square distribution, wherein the target chi-square distribution is a chi-square distribution whose degrees of freedom equal twice the number of location reports of the first target account, and the significance characterizes how reliably the two heterogeneous accounts of a target heterogeneous account pair including the first target account are associated with each other;
and determining, based on the significance, a target associated account between the first system and the second system.
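Claim 1 fixes the distribution of the test statistic but gives neither the formula for the space-time mismatching degree nor the exact definition of significance. The Python sketch below fills both gaps with assumptions. First, the mismatching degree of one pair of reports is taken to be the sum of two squared standardized errors, (d_s/σ_s)² + (d_t/σ_t)², which under a Gaussian error model contributes two degrees of freedom per location report and so matches the claim's "twice the number of location reports". Second, the significance is taken to be the chi-square CDF at the observed sum, so a smaller value means a better fit, which is consistent with claim 5 selecting the minimum. Because the degrees of freedom are always even, the CDF has the closed form P(χ²₂ₖ ≤ x) = 1 − e^(−x/2) Σ_{i=0}^{k−1} (x/2)^i / i!, so no statistics library is needed. The record layout `(account_id, x_m, y_m, t_s)` and all function names are hypothetical.

```python
import math
from typing import Tuple

# Hypothetical report layout: (account_id, x in metres, y in metres, time in seconds).
Report = Tuple[str, float, float, float]


def mismatch(a: Report, b: Report, sigma_s: float, sigma_t: float) -> float:
    """Assumed space-time mismatching degree of one heterogeneous report pair."""
    spatial_error = math.hypot(a[1] - b[1], a[2] - b[2])  # planar distance
    time_error = abs(a[3] - b[3])                          # reporting-time gap
    return (spatial_error / sigma_s) ** 2 + (time_error / sigma_t) ** 2


def chi2_cdf_even_dof(x: float, k: int) -> float:
    """P(X <= x) for X ~ chi-square with 2k degrees of freedom.

    Uses 1 - sum_{i<k} e^{-x/2} (x/2)^i / i!, with each term evaluated in
    log space so that large x or k cannot overflow.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if x <= 0:
        return 0.0
    half = x / 2.0
    log_half = math.log(half)
    tail = sum(math.exp(i * log_half - math.lgamma(i + 1) - half) for i in range(k))
    return max(0.0, 1.0 - tail)


def significance(total_mismatch: float, n_reports: int) -> float:
    """Assumed significance: chi-square CDF of the summed mismatching degrees,
    with degrees of freedom = 2 * n_reports (the first target account's
    location-report count, as in claim 1)."""
    return chi2_cdf_even_dof(total_mismatch, n_reports)
```

For example, a summed mismatching degree of 2.0 over one report gives a significance of 1 − e⁻¹ ≈ 0.632, while the same sum over two reports gives 1 − 2e⁻¹ ≈ 0.264, a better fit relative to the larger number of degrees of freedom.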
2. The method according to claim 1, wherein the converting the first data set and the second data set of the same preset unit time period into a plurality of pairs of heterogeneous ternary data pairs according to the position information in the first data set and the second data set of the same preset unit time period comprises:
mapping the first data set and the second data set of the same preset unit time period to a preset grid according to the position information of the first data set and the second data set of the same preset unit time period, wherein the preset grid comprises a plurality of sub-grids;
and generating a plurality of heterogeneous ternary data pairs by pairing heterogeneous ternary data (one from each system) located in the same sub-grid.
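Claim 2 does not specify how the preset grid is laid out. The sketch below assumes square sub-grids of side `cell_size` over projected planar coordinates, treats its two inputs as the first and second data sets of a single preset unit time period, and applies claim 1's reporting-time threshold (`max_dt`) while pairing. Bucketing the second data set by sub-grid means each first-system report is compared only against reports in its own cell rather than against every report. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical report layout: (account_id, x in metres, y in metres, time in seconds).
Report = Tuple[str, float, float, float]
Cell = Tuple[int, int]


def cell_of(r: Report, cell_size: float) -> Cell:
    """Map a report's planar position to a sub-grid index (assumed square cells)."""
    return (int(r[1] // cell_size), int(r[2] // cell_size))


def heterogeneous_pairs(first: List[Report], second: List[Report],
                        cell_size: float, max_dt: float) -> List[Tuple[Report, Report]]:
    """Pair first-system and second-system reports that share a sub-grid,
    keeping only pairs whose reporting times differ by at most max_dt."""
    buckets: Dict[Cell, List[Report]] = defaultdict(list)
    for r in second:
        buckets[cell_of(r, cell_size)].append(r)
    pairs: List[Tuple[Report, Report]] = []
    for a in first:
        for b in buckets.get(cell_of(a, cell_size), []):
            if abs(a[3] - b[3]) <= max_dt:
                pairs.append((a, b))
    return pairs
```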
3. The method of claim 2, wherein, before the plurality of heterogeneous ternary data pairs are generated based on pairwise heterogeneous ternary data in the same sub-grid, the method further comprises:
performing diffusion processing on the ternary data of the first data set in the preset grid;
correspondingly, the generating a plurality of heterogeneous ternary data pairs based on pairwise heterogeneous ternary data in the same sub-grid comprises:
and generating the plurality of heterogeneous ternary data pairs based on pairwise heterogeneous ternary data in the same sub-grid after the diffusion processing.
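The claims do not define "diffusion processing". One plausible reading, assumed here and not confirmed by the source, is that each first-data-set record is also copied into the neighbouring sub-grids, so that two reports lying close together but on opposite sides of a cell boundary can still be paired. Diffusing only the first data set, which claim 1 says covers fewer accounts, keeps the number of extra copies small. The `radius` parameter and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical report layout: (account_id, x in metres, y in metres, time in seconds).
Report = Tuple[str, float, float, float]
Cell = Tuple[int, int]


def diffuse(first: List[Report], cell_size: float, radius: int = 1) -> Dict[Cell, List[Report]]:
    """Hypothetical diffusion: place each first-system report in its own sub-grid
    and in every neighbouring sub-grid within `radius` cells."""
    grid: Dict[Cell, List[Report]] = defaultdict(list)
    for r in first:
        cx, cy = int(r[1] // cell_size), int(r[2] // cell_size)
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                grid[(cx + dx, cy + dy)].append(r)
    return grid


def pairs_after_diffusion(first: List[Report], second: List[Report], cell_size: float,
                          max_dt: float, radius: int = 1) -> List[Tuple[Report, Report]]:
    """Pair each second-system report with the diffused first-system reports in its cell."""
    grid = diffuse(first, cell_size, radius)
    pairs: List[Tuple[Report, Report]] = []
    for b in second:
        cell = (int(b[1] // cell_size), int(b[2] // cell_size))
        for a in grid.get(cell, []):
            if abs(a[3] - b[3]) <= max_dt:
                pairs.append((a, b))
    return pairs
```

Because a first-system record now appears in several cells, the same account pair can pick up more than one space-time mismatching degree, which is the situation claim 4 addresses.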
4. The method of claim 3, wherein, when the same target heterogeneous account pair has a plurality of space-time mismatching degrees, before the sum of the space-time mismatching degrees of the target heterogeneous account pairs including the first target account is determined, the method further comprises:
comparing the plurality of space-time mismatching degrees corresponding to the same target heterogeneous account pair;
and taking the minimum of them as the space-time mismatching degree of that target heterogeneous account pair.
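Claim 4 keeps only the smallest space-time mismatching degree when one target heterogeneous account pair receives several. A minimal sketch follows. The grouping key is an assumption: for example, (first account, second account, index of the first account's report), so that each report still contributes one term to the chi-square sum. The function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def min_mismatch(scored: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Keep the smallest space-time mismatching degree seen for each key."""
    best: Dict[Hashable, float] = {}
    for key, m in scored:
        if key not in best or m < best[key]:
            best[key] = m
    return best
```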
5. The method of claim 1, wherein the determining a target associated account between the first system and the second system based on the significance comprises:
comparing the significances of the target heterogeneous account pairs that include the first target account;
taking the target heterogeneous account pair with the minimum significance among those including the first target account as a first primary selection associated account;
comparing the significances of the target heterogeneous account pairs that include a second target account, wherein the second target account is the account corresponding to any account identifier in the second data sets of the plurality of preset unit time periods;
taking the target heterogeneous account pair with the minimum significance among those including the second target account as a second primary selection associated account;
performing de-duplication on the first primary selection associated accounts and the second primary selection associated accounts;
when the same account appears in a plurality of the de-duplicated primary selection associated accounts, deleting every primary selection associated account that includes that account;
and taking the primary selection associated accounts remaining after the deletion as the target associated accounts.
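Claim 5's selection can be read as a mutual best-match filter: each account on either side nominates its lowest-significance pair, the nominations are merged without duplicates, and any account that still appears in more than one nominated pair is treated as ambiguous, so all of its pairs are dropped. The sketch below assumes the significances have already been computed per `(first_account, second_account)` pair; the names are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (first-system account, second-system account)


def select_associations(sig: Dict[Pair, float]) -> Set[Pair]:
    """Select target associated accounts from per-pair significances (claim 5 reading)."""
    best_first: Dict[str, Pair] = {}
    best_second: Dict[str, Pair] = {}
    for pair, s in sig.items():
        a, b = pair
        # Lowest-significance pair for each first-system account.
        if a not in best_first or s < sig[best_first[a]]:
            best_first[a] = pair
        # Lowest-significance pair for each second-system account.
        if b not in best_second or s < sig[best_second[b]]:
            best_second[b] = pair
    # De-duplicate the two sets of nominated pairs.
    candidates = set(best_first.values()) | set(best_second.values())
    # Count appearances per account, keeping the two systems' namespaces apart.
    first_counts = Counter(a for a, _ in candidates)
    second_counts = Counter(b for _, b in candidates)
    # Drop every pair whose account appears in more than one candidate.
    return {p for p in candidates if first_counts[p[0]] == 1 and second_counts[p[1]] == 1}
```

Claim 6 follows the same flow with a confidence score, where the highest value wins. If the confidence is any strictly decreasing function of the significance (for example 1 − significance, an assumption, since the claims do not give the mapping), claim 6 selects exactly the same pairs as claim 5.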
6. The method of claim 1, wherein the determining a target associated account between the first system and the second system based on the significance comprises:
determining, based on the significance corresponding to the first target account, a confidence of each target heterogeneous account pair including the first target account;
comparing the confidences of the target heterogeneous account pairs that include the first target account;
taking the target heterogeneous account pair with the highest confidence among those including the first target account as a third primary selection associated account;
comparing the confidences of the target heterogeneous account pairs that include a second target account, wherein the second target account is the account corresponding to any account identifier in the second data sets of the plurality of preset unit time periods;
taking the target heterogeneous account pair with the highest confidence among those including the second target account as a fourth primary selection associated account;
performing de-duplication on the third primary selection associated accounts and the fourth primary selection associated accounts;
when the same account appears in a plurality of the de-duplicated primary selection associated accounts, deleting every primary selection associated account that includes that account;
and taking the primary selection associated accounts remaining after the deletion as the target associated accounts.
7. A heterogeneous account number association apparatus, the apparatus comprising:
the heterogeneous data acquisition module is used for acquiring a first data set which is sourced from a plurality of preset unit time periods of a first system and a second data set which is sourced from a plurality of preset unit time periods of a second system, wherein the first data set and the second data set respectively comprise ternary data of a plurality of account numbers, and the ternary data comprise account number identifications, position information and reporting time for reporting the position information; the number of accounts corresponding to the first data set is smaller than that of the accounts corresponding to the second data set;
the heterogeneous ternary data pair determining module is used for converting the first data set and the second data set of the same preset unit time period into a plurality of pairs of heterogeneous ternary data pairs according to the position information in the first data set and the second data set of the same preset unit time period;
a target heterogeneous account pair determining module, configured to take, as a target heterogeneous account pair, the two accounts corresponding to a heterogeneous ternary data pair whose difference between reporting times is less than or equal to a preset error threshold;
the space error determining unit is used for determining the space error of each target heterogeneous account pair according to the corresponding position information of each target heterogeneous account pair;
the time error determining unit is used for determining the time error of each target heterogeneous account pair according to the corresponding report time of each target heterogeneous account pair;
a standard deviation obtaining unit, configured to obtain a standard deviation of a preset spatial error and a standard deviation of a preset time error;
a space-time mismatching degree determination unit, configured to determine a space-time mismatching degree of the target heterogeneous account number pair based on the spatial error, the temporal error, the standard deviation of the preset spatial error, and the standard deviation of the preset temporal error;
a space-time mismatching degree sum determining module, configured to determine the sum of the space-time mismatching degrees of the target heterogeneous account pairs that include a first target account, wherein the first target account is the account corresponding to any account identifier in the first data sets of the plurality of preset unit time periods;
a chi-square distribution checking module, configured to determine, based on a chi-square distribution test, the significance with which the sum of the space-time mismatching degrees corresponding to the first target account obeys a target chi-square distribution, wherein the target chi-square distribution is a chi-square distribution whose degrees of freedom equal twice the number of location reports of the first target account, and the significance characterizes how reliably the two heterogeneous accounts of a target heterogeneous account pair including the first target account are associated with each other;
and the associated account number determining module is used for determining a target associated account number between the first system and the second system based on the significance.
8. The apparatus of claim 7, wherein the heterogeneous ternary data pair determining module comprises:
the data mapping unit is used for mapping the first data set and the second data set of the same preset unit time period to a preset grid according to the position information of the first data set and the second data set of the same preset unit time period, and the preset grid comprises a plurality of sub-grids;
and the heterogeneous ternary data pair generation unit is used for generating a plurality of heterogeneous ternary data pairs based on pairwise heterogeneous ternary data in the same sub-grid.
9. The apparatus of claim 8, further comprising:
the data diffusion processing module is used for performing diffusion processing on the ternary data of the first data set in the preset grid before the plurality of heterogeneous ternary data pairs are generated based on pairwise heterogeneous ternary data in the same sub-grid;
correspondingly, the heterogeneous ternary data pair generation unit is specifically used for generating the plurality of heterogeneous ternary data pairs based on pairwise heterogeneous ternary data in the same sub-grid after the diffusion processing.
10. The apparatus of claim 9, wherein, when the same target heterogeneous account pair has a plurality of space-time mismatching degrees, the apparatus further comprises:
the space-time mismatching degree comparison module is used for comparing the plurality of space-time mismatching degrees corresponding to the same target heterogeneous account pair before the sum of the space-time mismatching degrees of the target heterogeneous account pairs including the first target account is determined;
and the second space-time mismatching degree determining module is used for taking the minimum of them as the space-time mismatching degree of that target heterogeneous account pair.
11. The apparatus of claim 7, wherein the associated account number determination module comprises:
a first significance comparison unit, configured to compare the significances of the target heterogeneous account pairs that include the first target account;
a first primary selection associated account determining unit, configured to take the target heterogeneous account pair with the minimum significance among those including the first target account as a first primary selection associated account;
a second significance comparison unit, configured to compare the significances of the target heterogeneous account pairs that include a second target account, wherein the second target account is the account corresponding to any account identifier in the second data sets of the plurality of preset unit time periods;
a second primary selection associated account determining unit, configured to take the target heterogeneous account pair with the minimum significance among those including the second target account as a second primary selection associated account;
the first duplicate removal processing unit is used for performing de-duplication on the first primary selection associated accounts and the second primary selection associated accounts;
the first primary selection associated account deleting unit is used for deleting every primary selection associated account that includes the same account when that account appears in a plurality of the de-duplicated primary selection associated accounts;
and the first target associated account determining unit is used for taking the primary selection associated accounts remaining after the deletion as the target associated accounts.
12. The apparatus of claim 7, wherein the associated account number determination module comprises:
a confidence determining unit, configured to determine, based on the significance corresponding to the first target account, a confidence of each target heterogeneous account pair including the first target account;
a first confidence comparison unit, configured to compare the confidences of the target heterogeneous account pairs that include the first target account;
a third primary selection associated account determining unit, configured to take the target heterogeneous account pair with the highest confidence among those including the first target account as a third primary selection associated account;
a second confidence comparison unit, configured to compare the confidences of the target heterogeneous account pairs that include a second target account, wherein the second target account is the account corresponding to any account identifier in the second data sets of the plurality of preset unit time periods;
a fourth primary selection associated account determining unit, configured to take the target heterogeneous account pair with the highest confidence among those including the second target account as a fourth primary selection associated account;
the second duplicate removal processing unit is used for performing de-duplication on the third primary selection associated accounts and the fourth primary selection associated accounts;
the second primary selection associated account deleting unit is used for deleting every primary selection associated account that includes the same account when that account appears in a plurality of the de-duplicated primary selection associated accounts;
and the second target associated account determining unit is used for taking the primary selection associated accounts remaining after the deletion as the target associated accounts.
13. A heterogeneous account number association server, comprising a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the heterogeneous account number association method according to any one of claims 1 to 6.
14. A computer-readable storage medium, wherein at least one instruction or at least one program is stored in the storage medium, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the heterogeneous account association method according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911302985.2A CN111177670B (en) 2019-12-17 2019-12-17 Heterogeneous account number association method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911302985.2A CN111177670B (en) 2019-12-17 2019-12-17 Heterogeneous account number association method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111177670A (en) 2020-05-19
CN111177670B (en) 2023-04-07

Family

ID=70647382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911302985.2A Active CN111177670B (en) 2019-12-17 2019-12-17 Heterogeneous account number association method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111177670B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116204795A (en) * 2021-11-30 2023-06-02 北京达佳互联信息技术有限公司 Object recognition method, device, electronic equipment and storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN108322317B (en) * 2017-01-16 2022-07-29 腾讯科技(深圳)有限公司 Account identification association method and server
CN108694216B (en) * 2017-04-12 2020-11-27 杭州海康威视数字技术股份有限公司 Method and device for associating multi-source objects
CN108833453B (en) * 2017-04-26 2021-05-25 腾讯科技(深圳)有限公司 A method and device for determining an application account
CN110019180B (en) * 2017-08-10 2021-04-30 中国电信股份有限公司 Multi-source data account association and device
CN107404408B (en) * 2017-08-30 2020-05-22 北京邮电大学 Virtual identity association identification method and device
CN110232387B (en) * 2019-05-24 2022-08-05 河海大学 Different-source image matching method based on KAZE-HOG algorithm

Also Published As

Publication number Publication date
CN111177670A (en) 2020-05-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant