
WO2026063921A1 - Techniques for instance-wise feature selection for machine learning

Info

Publication number: WO2026063921A1
Application number: PCT/US2024/047173
Authority: WO
Inventors: Warren du Preez; Bowen Huang
Original assignee: Equifax Inc
Current assignee: Equifax Inc
Priority date: 2024-09-18
Filing date: 2024-09-18
Publication date: 2026-03-26
Anticipated expiration: 2027-03-18

Abstract

In some aspects, a computing system can train a risk assessment model, using a training process, for determining a risk indicator. The training process can include: accessing a set of features; determining, using a selector network, a set of selected features and an indicator vector; and training the risk assessment model using the set of selected features and the indicator vector. The computing system can determine the risk indicator for a target entity using the trained risk assessment model. The computing system can transmit, to a remote computing device, a responsive message including at least the risk indicator for use in controlling access of the target entity to one or more interactive computing environments.

Description

Attorney Docket No.: 096923-1398652

TECHNIQUES FOR INSTANCE-WISE FEATURE SELECTION FOR MACHINE LEARNING

Technical Field

[0001] The present disclosure relates generally to risk assessment and interaction control. More specifically, but not by way of limitation, this disclosure relates to using instance-wise feature selection on input vectors that may contain missing values to train a machine learning model to generate a risk prediction.

Background

[0002] In machine learning, input vectors for training a machine learning model can contain missing values. In the context of some applications, systematically replacing the missing values with particular values can lead to practical problems, and in some cases there are no logical replacement values. In addition, especially in applications in which input vectors contain a high frequency of missing values, machine learning models can lack predictive power given replacement values that are ambiguous or inaccurate.

[0003] One approach to handling missing values is imputation, in which missing values are filled dynamically, for example, based on summary statistics across sub-populations. However, this approach can lead to inaccurate machine learning models. For example, in data sets containing a relatively large number of missing values, the basis for determining imputed values is relatively weak, leading to inaccurate machine learning models.
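To make the weakness described in [0003] concrete, the following Python sketch fills missing values with the mean of each sub-population's observed values and reports how many observations each imputed mean rests on. The record layout, the group labels, and the function name are hypothetical and chosen only for illustration; the disclosure does not specify a particular imputation rule.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def group_mean_impute(records: list[tuple[str, Optional[float]]]):
    """Fill missing values (None) with the mean of the observed values in the same group.

    Returns the filled values and, for each group, the number of observed values
    that the group's imputed mean is based on.
    """
    observed: dict[str, list[float]] = defaultdict(list)
    for group, value in records:
        if value is not None:
            observed[group].append(value)

    support = {group: len(values) for group, values in observed.items()}

    filled: list[Optional[float]] = []
    for group, value in records:
        if value is not None:
            filled.append(value)
        elif observed.get(group):
            filled.append(mean(observed[group]))
        else:
            filled.append(None)  # group has no observed values: nothing to impute from
    return filled, support


# Hypothetical data: sub-population "B" has only one observed value.
records = [("A", 10.0), ("A", 12.0), ("A", None),
           ("B", 40.0), ("B", None), ("B", None), ("B", None)]
filled, support = group_mean_impute(records)
# filled  -> [10.0, 12.0, 11.0, 40.0, 40.0, 40.0, 40.0]
# support -> {'A': 2, 'B': 1}
```

In this toy data, three of the four values in sub-population B are filled from a single observation, which is the kind of weak basis for imputation that [0003] describes.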

Summary

[0004] Various aspects of the present disclosure provide systems and methods for using instance-wise feature selection on training data containing missing values to train a risk assessment model. In one example, a method is performed by one or more processing devices. The method can include training a risk assessment model, using a training process, for generating a risk indicator for a target entity from predictor variables associated with the target entity. The risk indicator can indicate a level of risk associated with the target entity. The training process can include accessing a set of features. The set of features can be used to train the risk assessment model to output the risk indicator. The training process can include creating, using a selector network, a set of selected features and an indicator vector. Each element of the indicator vector can indicate whether a corresponding feature in the input set of features is selected for inclusion in the set of selected features. The training process can include training the risk assessment model using the set of selected features and the indicator vector output by the selector network. The method can include determining, for the target entity, the risk indicator using the trained risk assessment model. The method can include transmitting, to a remote computing device, a responsive message including at least the risk indicator for use in controlling access of the target entity to one or more interactive computing environments.

[0005] In another example, a system includes a processing device and a memory device in which instructions executable by the processing device are stored for causing the processing device to perform various operations. The system can train a risk assessment model, using a training process, for determining a risk indicator for a target entity from predictor variables associated with the target entity. The risk indicator can indicate a level of risk associated with the target entity. The training process can include accessing a set of features. The set of features can be used to train the risk assessment model to output the risk indicator. The training process can include determining, using a selector network, a set of selected features and an indicator vector. Each element of the indicator vector can indicate whether a corresponding feature in the input set of features is selected for inclusion in the set of selected features. The training process can include training the risk assessment model using the set of selected features and the indicator vector output by the selector network. The system additionally can determine, for the target entity, the risk indicator using the trained risk assessment model. The system further can transmit, to a remote computing device, a responsive message including at least the risk indicator for use in controlling access of the target entity to one or more interactive computing environments.

[0006] In yet another example, a non-transitory computer-readable storage medium has program code that is executable by a processor to cause a computing device to perform operations. The operations can include training a risk assessment model, using a training process, for determining a risk indicator for a target entity from predictor variables associated with the target entity. The risk indicator can indicate a level of risk associated with the target entity. The training process can include accessing a set of features. The set of features can be used to train the risk assessment model to output the risk indicator. The training process can include determining, using a selector network, a set of selected features and an indicator vector. Each element of the indicator vector can indicate whether a corresponding feature in the input set of features is selected for inclusion in the set of selected features. The training process can include training the risk assessment model using the set of selected features and the indicator vector output by the selector network. The operations additionally can include determining, for the target entity, the risk indicator using the trained risk assessment model. The operations further can include transmitting, to a remote computing device, a responsive message including at least the risk indicator for use in controlling access of the target entity to one or more interactive computing environments.

[0007] This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification, any or all drawings, and each claim.

[0008] The foregoing, together with other features and examples, will become more apparent upon referring to the following specification, claims, and accompanying drawings.

Brief Description of the Drawings

[0009] FIG. 1 is a block diagram depicting an example of an operating environment according to certain aspects of the present disclosure.

[0010] FIG. 2 is a block diagram depicting an example of a process for implementing instance-wise feature selection according to certain aspects of the present disclosure.

[0011] FIG. 3 is a flow chart depicting an example of a process for training a machine learning model according to certain aspects of the present disclosure.

[0012] FIG. 4 is a flow chart depicting an example of a process for determining a risk indicator according to certain aspects of the present disclosure.

[0013] FIG. 5 is a block diagram depicting an example of a computing device suitable for implementing aspects of the techniques and technologies presented herein.

Detailed Description

[0014] Aspects of the present disclosure relate to machine learning using instance-wise feature selection on input vectors that may contain missing values to train a machine learning model to determine a risk prediction. In some machine learning applications, training data containing missing values can be used to train a machine learning model by setting the missing values to zero. However, in some applications, zero, or any other imputed numeric value, is not an appropriate replacement for the missing value and can be misleading. For example, if a credit-seeking customer has no credit card, setting the value of the feature “number of credit cards overdue” to zero is misleading. Using such a replacement value can lead to inaccurate or misleading decisioning outcomes.
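The following minimal Python sketch illustrates the problem in [0014] using the single feature “number of credit cards overdue”, with NaN marking a missing value. It assumes nothing about the disclosed system; it only shows that zero-filling makes a customer with no card indistinguishable from a customer whose cards are all current, whereas keeping a per-feature presence flag preserves the distinction. The presence flag is a simplified illustration chosen for this sketch, not the selector network and indicator vector that the disclosure uses.

```python
import math

# One hypothetical feature: "number of credit cards overdue". NaN marks a missing value.


def zero_fill(row: list[float]) -> list[float]:
    """Replace every missing value with 0."""
    return [0.0 if math.isnan(v) else v for v in row]


def zero_fill_with_presence(row: list[float]) -> list[float]:
    """Zero-fill, then append one 0/1 flag per feature recording whether it was observed."""
    flags = [0.0 if math.isnan(v) else 1.0 for v in row]
    return zero_fill(row) + flags


has_cards_none_overdue = [0.0]  # holds cards, none overdue
has_no_card = [math.nan]        # no card, so "overdue" is undefined

# Zero-filling makes the two customers indistinguishable.
assert zero_fill(has_cards_none_overdue) == zero_fill(has_no_card) == [0.0]
# A presence flag keeps them apart: [0.0, 1.0] versus [0.0, 0.0].
assert zero_fill_with_presence(has_cards_none_overdue) == [0.0, 1.0]
assert zero_fill_with_presence(has_no_card) == [0.0, 0.0]
```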

[0015] Certain aspects described herein provide improvements to machine learning techniques for assessing risks, for example, in access control associated with entities. In particular, systems and methods described herein enable a machine learning model to be trained on a dataset using a selected set of features having the greatest importance in the context of the model. In some examples, systems and methods described herein can be used to select a set of features from an input set of features having missing values by implementing instance-wise feature selection. Instance-wise feature selection involves flexibly selecting a set of input features for each particular prediction instance while avoiding any missing values, where the selected set of input features is chosen to best support subsequent prediction using a machine learning model. This approach is particularly suited to cases where significant features have missing values or where value imputation leads to an unacceptable level of inaccuracy. Examples disclosed herein address the problem of missing values using an instance-wise feature selector neural network architecture.
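As a rough illustration of instance-wise selection, the Python sketch below chooses, separately for each row, up to k features that are not missing, ranked by a fixed importance score. The function name, the fixed scores, and the top-k rule are assumptions for illustration only; in the disclosure, the per-instance choice is made by a learned selector network rather than by a fixed ranking.

```python
import math
from typing import Sequence


def select_instance_features(row: Sequence[float], importance: Sequence[float], k: int) -> list[int]:
    """Return a 0/1 indicator vector that selects up to k features for this row only.

    Missing entries (NaN) are never selected. Among the observed entries, the k with
    the highest (hypothetical, fixed) importance scores are kept.
    """
    available = [j for j, v in enumerate(row) if not math.isnan(v)]
    chosen = sorted(available, key=lambda j: importance[j], reverse=True)[:k]
    indicator = [0] * len(row)
    for j in chosen:
        indicator[j] = 1
    return indicator


importance = [0.9, 0.5, 0.7, 0.1]  # hypothetical per-feature scores
print(select_instance_features([1.0, 2.0, 3.0, 4.0], importance, k=2))       # [1, 0, 1, 0]
print(select_instance_features([math.nan, 2.0, 3.0, 4.0], importance, k=2))  # [0, 1, 1, 0]
```

Because the second row is missing its highest-scoring feature, it receives a different feature subset than the first row, which is the per-instance behavior described in [0015].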

[0016] For example, a risk assessment computing system can determine a risk indicator using a machine learning model trained using instance-wise feature selection. In this way, the risk indicator can be determined flexibly and accurately from a data set containing missing or invalid values. The risk assessment computing system can use a risk assessment model (i.e., a machine learning model) trained using a training process to determine a risk indicator for a target entity from predictor variables associated with the target entity. The risk indicator can indicate a level of risk associated with the target entity, such that the risk indicator can be used, e.g., by a remote computing system, as a factor in determining whether the target entity should be granted access to a secure network or resource.

[0017] The training process to train the risk assessment model can include accessing a set of features, e.g., from a data repository. This set of features can be used to train the machine learning model to output the risk indicator. In some examples, the set of features can include one or more missing values. In other examples, the set of features may not include any missing values. The training process can include determining a set of selected features and an indicator vector using a selector network. The selector network can be, for example, a machine learning model such as a neural network. Each element of the indicator vector can be associated with one of the input features. Thus, the indicator vector can, at least in part, dictate the impact of each feature on the machine learning model by indicating whether or not each member of the input feature set is to be used to train the machine learning model.
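The selector network's forward pass might look like the NumPy sketch below. The two-layer architecture, the input encoding (zero-filled values concatenated with a presence mask), the sigmoid outputs, and the 0.5 threshold are all assumptions; the disclosure says only that the selector can be a neural network that outputs a set of selected features and an indicator vector. The sketch also keeps missing features unselected, consistent with the goal of avoiding missing values. The weights are random and untrained, and how the selector is trained is not shown.

```python
import numpy as np


def init_selector(d: int, hidden: int, seed: int = 0) -> dict:
    """Randomly initialize a hypothetical two-layer selector network (untrained)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(2 * d, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(scale=0.1, size=(hidden, d)),
        "b2": np.zeros(d),
    }


def selector_forward(params: dict, x: np.ndarray, threshold: float = 0.5):
    """Map one feature vector x (NaN = missing) to (selection probabilities, indicator vector)."""
    present = ~np.isnan(x)
    # Assumed input encoding: zero-filled values followed by a 0/1 presence mask.
    inputs = np.concatenate([np.where(present, x, 0.0), present.astype(float)])
    hidden = np.maximum(0.0, inputs @ params["W1"] + params["b1"])  # ReLU layer
    logits = hidden @ params["W2"] + params["b2"]
    probs = 1.0 / (1.0 + np.exp(-logits))                           # one probability per feature
    # A feature is selected only if its probability clears the threshold and it is present.
    indicator = ((probs > threshold) & present).astype(int)
    return probs, indicator


x = np.array([1.0, np.nan, 3.0, 0.5])
probs, indicator = selector_forward(init_selector(d=4, hidden=8), x)
# indicator[1] is always 0 because the second feature is missing.
```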

[0018] The training process can include training the risk assessment model using the set of selected features and the indicator vector output by the selector network. This set of selected features can represent a set of predictive features and can exclude features that are not predictive or that have missing values that would lead to inaccurate outcomes. The system can then use the trained machine learning model to determine the risk indicator for the target entity and can transmit the risk indicator to a remote computing device. In some examples, the set of selected features can have the same dimension as the set of input features. In such examples, the set of selected features can include the feature values of the selected features and a zero or other placeholder value (e.g., -1000) for features that are not selected.
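The following NumPy sketch shows one way to assemble the model input described in [0018]: selected values are kept, unselected or missing positions take a placeholder (0 or, e.g., -1000), and the indicator vector is appended so the model can tell a genuine value from a placeholder. A logistic regression fitted by gradient descent stands in for the risk assessment model; the disclosure does not specify the model type, and the function names and toy data are hypothetical.

```python
import numpy as np


def build_model_input(X: np.ndarray, indicators: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Concatenate the selected-feature block and the indicator vector for each row.

    X: (n, d) feature matrix, NaN where a value is missing.
    indicators: (n, d) 0/1 matrix from the selector (1 = selected, never on a NaN).
    fill: placeholder for unselected positions, e.g. 0.0 or -1000.0.
    """
    selected = np.where(indicators == 1, X, fill)  # same width as the input feature set
    return np.hstack([selected, indicators.astype(float)])


def train_logistic(Z: np.ndarray, y: np.ndarray, lr: float = 0.1, epochs: int = 2000):
    """Fit a logistic-regression stand-in for the risk assessment model by gradient descent."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted probability per row
        residual = p - y                         # gradient of the log loss w.r.t. the logits
        w -= lr * (Z.T @ residual) / len(y)
        b -= lr * residual.mean()
    return w, b


def predict_risk(Z: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))


X = np.array([[1.0, np.nan], [2.0, 3.0]])  # toy data; NaN = missing
indicators = np.array([[1, 0], [1, 1]])     # assumed selector output
y = np.array([0.0, 1.0])                    # toy risk labels
Z = build_model_input(X, indicators)        # [[1, 0, 1, 0], [2, 3, 1, 1]]
w, b = train_logistic(Z, y)
risk = predict_risk(Z, w, b)
```

The example uses fill=0.0 because a linear model's output is sensitive to the scale of a large sentinel such as -1000; the choice of placeholder may depend on the model type.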

[0019] Certain aspects described herein improve machine learning outcomes by providing a framework for selecting, from a set of input features, the features having the greatest impact on the machine learning model. These techniques can be used for handling training data containing missing values. For example, disclosed examples can flexibly select a set of input features on an instance-wise basis, selecting features determined to be impactful to the machine learning model and avoiding missing values. Disclosed examples improve machine learning outcomes by enabling a machine learning model to be trained on data sets that contain missing values and that were previously unusable because they led to inaccurate machine learning outcomes. Further, disclosed examples provide an improved framework over other methods of handling missing values, leading to more accurate outcomes of machine learning models. Rather than imputing misleading or inaccurate values (e.g., values determined based on a sparse data set), disclosed examples avoid missing values by selecting a subset of features that have relatively high predictive power and deselecting unimportant features and those containing missing values.

[0020] Certain aspects described herein, which can include using instance-wise feature selection to train a machine learning model to determine a risk indicator, can improve at least the technical fields of controlling interactions between computing environments, access control for a computing environment, or a combination thereof. For instance, by using instance-wise feature selection to train an accurate machine learning model using data with missing values, the risk assessment computing system can cause access to a computing system to be controlled more accurately. The risk assessment computing system can use the trained machine learning model to determine a risk indicator for use in controlling access to a system or resource. The responsive message can include the risk indicator, which may be used to more efficiently predict a risk associated with the target entity accessing a system based on a machine learning model trained on data containing missing values, and the responsive message can facilitate a practical application of the instance-wise feature selection techniques described herein by enabling control of a real-world process such as controlling access to a system or resource. Additionally or alternatively, by using the techniques described herein, a risk assessment computing system may provide legitimate access to the interactive computing environment more efficiently and using fewer computing resources compared to other risk assessment systems or techniques. For example, the risk assessment computing system can determine a risk indicator or an actionable response message efficiently, thereby reducing (i) memory usage, (ii) processing time, (iii) network bandwidth usage, (iv) response time, and the like for controlling access to the interactive computing environment. Accordingly, the risk assessment computing system improves the access control for the computing environment by reducing memory usage, processing time, network bandwidth consumption, response time, and the like with respect to controlling access to the interactive computing environment using at least the system architecture and techniques described herein.

[0021] These illustrative examples are given to introduce the reader to the general subject matter discussed here and are not intended to limit the scope of the disclosed concepts. The following sections describe various additional features and examples with reference to the drawings in which like numerals indicate like elements, and directional descriptions are used to describe the illustrative examples but, like the illustrative examples, should not be used to limit the present disclosure.

Operating Environment Example for Machine-Learning Operations

[0022] Referring now to the drawings, FIG. 1 is a block diagram depicting an example of an operating environment 100 in which a risk assessment computing system 102 builds and trains a risk assessment model 104 to predict risk indicators based on training data provided at least in part by a selector network 106. FIG. 1 depicts examples of hardware components of a risk assessment computing system 102, according to some aspects. The risk assessment computing system 102 is a specialized computing system that may be used for processing large amounts of data using a large number of computer processing cycles. The risk assessment computing system 102 can include a model training server 108 for building and training a risk assessment model 104 used to predict risk indicators associated with an entity accessing controlled resources. The risk assessment computing system 102 can further include a risk assessment server 110 for performing a risk assessment for given predictor variables 112 using the trained risk assessment model 104.

[0023] The model training server 108 can include one or more processing devices that execute program code, such as a model training application 114. The program code is stored on a non-transitory computer-readable medium. The model training application 114 can execute one or more processes to train and optimize a risk assessment model 104 for predicting risk indicators based on the predictor variables 112.

[0024] In some aspects, the model training application 114 can build and train a risk assessment model 104 using risk assessment training data 116 in a training process. The risk assessment training data 116 can include multiple training vectors including training predictor variables and training risk indicator outputs corresponding to the training vectors. In some cases, the risk assessment training data 116 may include data from differing subsets of the available data sources. The risk assessment training data 116 can be based on data sets having missing or invalid values. The risk assessment training data 116 can be stored in one or more network-attached storage units on which various repositories, databases, or other structures are stored. An example of these data structures is the risk data repository 118.

[0025] The risk assessment training data 116 may be at least in part generated using the selector network 106. Risk assessment data can include data corresponding to one or more interactions of at least one entity with a computing environment. As an example, the interactions may be generated by the entity using a user computing system 120 to interact with an interactive computing environment provided by a client computing system 122. In some implementations, the model training application 114 can execute one or more processes to use the selector network 106 to generate the risk assessment training data 116 from data containing missing values. In additional or alternative aspects, the model training application 114 can build and train the selector network 106 to predict the missing values.

[0026] Network-attached storage units may store a variety of different types of data organized in a variety of different ways and from a variety of different sources. For example, the network-attached storage unit may include storage other than primary storage located within the model training server 108 that is directly accessible by processors located therein. In some aspects, the network-attached storage unit may include secondary, tertiary, or auxiliary storage, such as large hard drives, servers, virtual memory, among other types. Storage devices may include portable or non-portable storage devices, optical storage devices, and various other mediums capable of storing and containing data. A machine-readable storage medium or computer-readable storage medium may include a non-transitory medium in which data can be stored and that does not include carrier waves or transitory electronic signals. Examples of a non-transitory medium may include, for example, a magnetic disk or tape, optical storage media such as a compact disk or digital versatile disk, flash memory, memory, or memory devices.

[0027] The risk assessment server 110 can include one or more processing devices that execute program code, such as a risk assessment application 124. The program code is stored on a non-transitory computer-readable medium. The risk assessment application 124 can execute one or more processes to use the risk assessment model 104 trained by the model training application 114 to predict risk indicators based on input predictor variables 112. The risk indicators can be used to protect or allocate computing resources of the risk assessment computing system 102.

[0028] Furthermore, the risk assessment computing system 102 can communicate with various other computing systems, such as client computing systems 122. For example, client computing systems 122 may send risk assessment queries to the risk assessment server 110 for risk assessment or may send signals to the risk assessment server 110 that control or otherwise influence different aspects of the risk assessment computing system 102. The client computing systems 122 may also interact with user computing systems 120 via one or more public data networks 126 to facilitate interactions between users of the user computing systems 120 and interactive computing environments provided by the client computing systems 122.

[0029] Each client computing system 122 may include one or more third-party devices, such as individual servers or groups of servers operating in a distributed manner. A client computing system 122 can include any computing device or group of computing devices operated by a seller, lender, or other provider of products or services. The client computing system 122 can include one or more server devices. The one or more server devices can include or can otherwise access one or more non-transitory computer-readable media. The client computing system 122 can also execute instructions that provide an interactive computing environment accessible to user computing systems 120. Examples of the interactive computing environment include a mobile application specific to a particular client computing system 122, a web-based application accessible via a mobile device, etc. The executable instructions are stored in one or more non-transitory computer-readable media.

[0030] The client computing system 122 can further include one or more processing devices that are capable of providing the interactive computing environment to perform operations described herein. The interactive computing environment can include executable instructions stored in one or more non-transitory computer-readable media. The instructions providing the interactive computing environment can configure one or more processing devices to perform operations described herein. In some aspects, the executable instructions for the interactive computing environment can include instructions that provide one or more graphical interfaces. The graphical interfaces are used by a user computing system 120 to access various functions of the interactive computing environment. For instance, the interactive computing environment may transmit data to and receive data from a user computing system 120 to shift between different states of the interactive computing environment, where the different states allow one or more electronic transactions between the user computing system 120 and the client computing system 122 to be performed.

[0031] In some examples, a client computing system 122 may have other computing resources associated therewith (not shown in FIG. 1), such as server computers hosting and managing virtual machine instances for providing cloud computing services, server computers hosting and managing online storage resources for users, server computers for providing database services, and others. The interaction between the user computing system 120 and the client computing system 122 may be performed through graphical user interfaces presented by the client computing system 122 to the user computing system 120, or through application programming interface (API) calls or web service calls.

[0032] A user computing system 120 can include any computing device or other communication device operated by an entity, such as a user, an organization, or a company. The user computing system 120 can include one or more computing devices, such as laptops, smartphones, and other personal computing devices. A user computing system 120 can include executable instructions stored in one or more non-transitory computer-readable media. The user computing system 120 can also include one or more processing devices that are capable of executing program code to perform operations described herein. In various examples, the user computing system 120 can allow a user to access certain online services from a client computing system 122 or other computing resources, to engage in mobile commerce with a client computing system 122, to obtain controlled access to electronic content hosted by the client computing system 122, etc.

[0033] For instance, the user can use the user computing system 120 to engage in an electronic transaction with a client computing system 122 via an interactive computing environment. An electronic transaction between the user computing system 120 and the client computing system 122 can include, for example, the user computing system 120 being used to request online storage resources managed by the client computing system 122, acquire cloud computing resources (e.g., virtual machine instances), and so on. An electronic transaction between the user computing system 120 and the client computing system 122 can also include, for example, querying a set of sensitive or other controlled data, accessing online financial services provided via the interactive computing environment, submitting an online credit card application or other digital application to the client computing system 122 via the interactive computing environment, or operating an electronic tool within an interactive computing environment hosted by the client computing system 122 (e.g., a content-modification feature, an application-processing feature, etc.).

[0034] In some aspects, an interactive computing environment implemented through a client computing system 122 can be used to provide access to various online functions. As a simplified example, a website or other interactive computing environment provided by an online resource provider can include electronic functions for requesting computing resources, online storage resources, network resources, database resources, or other types of resources. In another example, a website or other interactive computing environment provided by a financial institution can include electronic functions for obtaining one or more financial services, such as loan application and management tools, credit card application and transaction management workflows, electronic fund transfers, etc. A user computing system 120 can be used to request access to the interactive computing environment provided by the client computing system 122, which can selectively grant or deny access to various electronic functions. Based on the request, the client computing system 122 can collect data associated with the user and communicate with the risk assessment server 110 for risk assessment. Based on the risk indicator predicted by the risk assessment server 110, the client computing system 122 can determine whether to grant the access request of the user computing system 120 to certain features of the interactive computing environment.
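A minimal Python sketch of the exchange in [0034], assuming hypothetical message types: the client computing system sends the entity's predictor variables, and the risk assessment server replies with a responsive message that carries at least the risk indicator. The transport (API call, web service call, and so on), the class and field names, and the toy model are assumptions; the toy model is only a placeholder for the trained risk assessment model 104.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class RiskQuery:
    """Hypothetical query from a client computing system to the risk assessment server."""
    entity_id: str
    predictor_variables: Mapping[str, float]


@dataclass(frozen=True)
class ResponsiveMessage:
    """Hypothetical responsive message; it carries at least the risk indicator."""
    entity_id: str
    risk_indicator: float


RiskModel = Callable[[Mapping[str, float]], float]


def handle_risk_query(query: RiskQuery, model: RiskModel) -> ResponsiveMessage:
    """Server side: score the target entity with the trained model and build the reply."""
    return ResponsiveMessage(query.entity_id, model(query.predictor_variables))


# Placeholder for the trained risk assessment model; "late_payments" is a made-up variable.
def toy_model(variables: Mapping[str, float]) -> float:
    return min(1.0, 0.25 * variables["late_payments"])


reply = handle_risk_query(RiskQuery("entity-1", {"late_payments": 2.0}), toy_model)
# reply.risk_indicator -> 0.5
```

The client computing system would then use `reply.risk_indicator` to decide which features of the interactive computing environment to grant, as described in [0034] and [0037].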

[0035] In a simplified example, the system depicted in FIG. 1 can configure a selector network 106 to be used for accurately determining risk assessment training data 116 used to train a risk assessment model 104 to determine risk indicators, such as credit scores, using predictor variables 112. In additional or alternative examples, the risk assessment model 104 may determine adverse action codes or other explanation codes for the predictor variables 112. A predictor variable 112 can be any variable predictive of risk that is associated with an entity. Any suitable predictor variable that is authorized for use by an appropriate legal or regulatory framework may be used.

[0036] Examples of predictor variables 112 used for predicting the risk associated with an entity accessing online resources include, but are not limited to, variables indicating the demographic characteristics of the entity (e.g., name of the entity, the network or physical address of the company, the identification of the company, the revenue of the company), variables indicative of prior actions or transactions involving the entity (e.g., past requests for online resources submitted by the entity, the amount of online resources currently held by the entity, and so on), variables indicative of one or more behavioral traits of an entity (e.g., the timeliness of the entity releasing the online resources), etc. Similarly, examples of predictor variables 112 used for predicting the risk associated with an entity accessing services provided by a financial institution include, but are not limited to, variables indicative of one or more demographic characteristics of an entity (e.g., age, gender, income, etc.), variables indicative of prior actions or transactions involving the entity (e.g., information that can be obtained from credit files or records, financial records, consumer records, or other data about the activities or characteristics of the entity), variables indicative of one or more behavioral traits of an entity, etc.

[0037] The predicted risk indicator can be used by the service provider to determine the risk associated with the entity accessing a service provided by the service provider and thereby to grant or deny access by the entity to an interactive computing environment implementing the service. For example, if the service provider determines that the predicted risk indicator is lower than a threshold risk indicator value, then the client computing system 122 associated with the service provider can generate or otherwise provide access permission to the user computing system 120 that requested the access. The access permission can include, for example, cryptographic keys used to generate valid access credentials or decryption keys used to decrypt access credentials. The client computing system 122 associated with the service provider can also allocate resources to the user and provide a dedicated web address for the allocated resources to the user computing system 120, for example, by adding it in the access permission. With the obtained access credentials, the dedicated web address, or both, the user computing system 120 can establish a secure network connection to the computing environment hosted by the client computing system 122 and access the resources via invoking API calls, web service calls, HTTP requests, or other suitable mechanisms.
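The access decision in [0037] could be sketched as follows. Access is granted only when the risk indicator is lower than the threshold, as the paragraph states. The credential here is an HMAC-SHA256 tag over the entity identifier and a dedicated resource URL; HMAC is one assumed way of using a cryptographic key to generate valid access credentials, not a mechanism the disclosure names. The function names, the URL format, and the key handling are hypothetical; a real deployment would load the signing key from secure storage.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def _sign(signing_key: bytes, entity_id: str, resource_url: str) -> str:
    """HMAC-SHA256 tag binding an entity to its dedicated resource URL."""
    message = f"{entity_id}|{resource_url}".encode()
    return hmac.new(signing_key, message, hashlib.sha256).hexdigest()


def issue_access_permission(entity_id: str, risk_indicator: float, threshold: float,
                            signing_key: bytes, base_url: str) -> Optional[dict]:
    """Grant access only if the risk indicator is lower than the threshold."""
    if risk_indicator >= threshold:
        return None  # deny
    resource_url = f"{base_url}/{secrets.token_hex(8)}"  # hypothetical dedicated web address
    return {"resource_url": resource_url,
            "credential": _sign(signing_key, entity_id, resource_url)}


def verify_credential(entity_id: str, resource_url: str, credential: str,
                      signing_key: bytes) -> bool:
    """Check a presented credential using a constant-time comparison."""
    return hmac.compare_digest(_sign(signing_key, entity_id, resource_url), credential)


key = secrets.token_bytes(32)  # in practice, loaded from secure key storage
permission = issue_access_permission("entity-1", 0.2, 0.5, key, "https://example.com/resources")
```

Binding the credential to both the entity and the dedicated web address means a credential copied to another entity or another resource fails verification.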

[0038] Each communication within the operating environment 100 may occur over one or more data networks, such as a public data network 126, a network 128 such as a private data network, or some combination thereof. A data network may include one or more of a variety of different types of networks, including a wireless network, a wired network, or a combination of a wired and wireless network. Examples of suitable networks include the Internet, a personal area network, a local area network (“LAN”), a wide area network (“WAN”), or a wireless local area network (“WLAN”). A wireless network may include a wireless interface or a combination of wireless interfaces. A wired network may include a wired interface. The wired or wireless networks may be implemented using routers, access points, bridges, gateways, or the like, to connect devices in the data network.

[0039] The number of devices depicted in FIG. 1 is provided for illustrative purposes. Different numbers of devices may be used. For example, while certain devices or systems are shown as single devices in FIG. 1, multiple devices may instead be used to implement these devices or systems. Similarly, devices or systems that are shown as separate, such as the model training server 108 and the risk assessment server 110, may instead be implemented in a single device or system.

Examples of Operations Involving Instance-Wise Feature Selection

[0040] FIG. 2 is a flow diagram depicting an example of a process 200 for using a selector network 106 to modify the data consumed by a downstream risk assessment model 104. The modification can be interpreted as selecting, on a row-wise (instance-wise) basis, a subset of the complete feature set contained in training data 116, which can be used to train the risk assessment model 104 and may contain missing values. Once trained, the risk assessment model 104 can determine risk indicators for a target entity based on predictor variables 112 associated with the target entity. One or more computing devices (e.g., the risk assessment server 110) implement operations depicted in FIG. 2 by executing suitable program code (e.g., the risk assessment application 124). For illustrative purposes, the process 200 is described with reference to certain examples depicted in the figures. Other implementations, however, are possible.

[0041] Process 200 described below implements a two-step training process. At a soft-selection step, a set of input features undergoes soft selection, in which each input feature is multiplied by a value between 0 and 1 that indicates the relative importance of that feature to the machine learning model. In some examples, the weights of the selector network and of the risk assessment machine learning model are trained simultaneously. At a hard-selection step, soft selection is replaced with hard selection. For example, the values between 0 and 1 used in the soft-selection step are converted to either 0 or 1 based on their comparison to a threshold value. At the hard-selection step, the weights of the selector network are fixed, or frozen (i.e., they are not adjusted by training), and the weights of the risk assessment machine learning model are trained further.

[0042] The selector network 106 can produce an encoding of the input features (e.g., features of the risk training data 116) using a first layer. In some examples, the first layer can be a fully connected layer. In some examples, the risk training data 116 can include categorical and numerical features. The categorical features can be one-hot encoded or can be converted to floats using a target encoding. In the case of one-hot encoding, a one-hot encoded vector $X_{cat} \in \{0,1\}^{d_1}$ and a vector of numeric features $X_{num} \in \mathbb{R}^{d_2}$ may each be passed through separate dense layers, $L_1$ and $L_2$, to produce two separate encodings. $L_1$ and $L_2$ are represented as dense layers 202, where $d_1$ and $d_2$ are the respective dimensions of $X_{cat}$ and $X_{num}$. These encodings can then be concatenated to produce a new vector:

$$X_{enc} = L_1(X_{cat}) \oplus L_2(X_{num}), \qquad X_{enc} \in \mathbb{R}^{d}, \quad d = d_1 + d_2.$$

In examples in which the categorical features are converted to floats, these values can be included in $X_{num} \in \mathbb{R}^{d_2}$, which can be passed to $L_2$ to produce $X_{enc} \in \mathbb{R}^{d}$ with $d = d_2$. In some examples, the activation functions of $L_1$ and $L_2$ can be tuned or selected as hyperparameters.
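The one-hot path of [0042] can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the `tanh` activation, the square weight matrices (chosen so each encoding keeps its input dimension, matching $d = d_1 + d_2$), and the feature names and values are all hypothetical.

```python
import math

def dense(weights, bias, x, activation=math.tanh):
    """One fully connected layer: activation(W x + b), with W given as a list of rows."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def one_hot(category, categories):
    """One-hot encode a single categorical value as a vector in {0,1}^d1."""
    return [1.0 if category == c else 0.0 for c in categories]

def encode(x_cat, x_num, L1, L2):
    """X_enc = L1(X_cat) (+) L2(X_num); L1 and L2 are (weights, bias) pairs."""
    return dense(*L1, x_cat) + dense(*L2, x_num)   # list + list is concatenation

categories = ["rent", "own", "other"]     # hypothetical categorical feature
x_cat = one_hot("own", categories)        # d1 = 3
x_num = [0.4, 1.2]                        # d2 = 2, hypothetical numeric features
L1 = ([[0.1] * 3 for _ in range(3)], [0.0] * 3)   # 3x3 layer keeps dimension d1
L2 = ([[0.2] * 2 for _ in range(2)], [0.0] * 2)   # 2x2 layer keeps dimension d2
x_enc = encode(x_cat, x_num, L1, L2)
print(len(x_enc))  # 5
```

Passing a different `activation` callable is one way to treat the activation as a hyperparameter, as the paragraph allows.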

[0043] Once the encoded vector $X_{enc}$ is generated, it can be passed through a fully connected layer 204 to generate a soft-selection vector $V \in [0,1]^d$ representative of the relative importance of each feature in the set of features. Masking module 206 can be used to assign a value (e.g., zero) to those features having missing values, as part of the process of generating the soft-selection vector $V$. For example, elements of $V$ corresponding, by position, to missing values in the input vector may be replaced with 0 or with another replacement value (e.g., -1000; a large negative value yields an element that is effectively zero after the sigmoid). A sigmoid function 208 can be used to limit the range of the elements of $V$ to $[0,1]$.
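The layer, masking, and sigmoid sequence of [0043] can be sketched as below, using the -1000 replacement option so that masked positions come out as zero. The function names, the boolean `missing` list, and the identity weights in the usage lines are hypothetical; a numerically stable sigmoid is used because `math.exp(1000)` would overflow.

```python
import math

def stable_sigmoid(t):
    """Numerically stable logistic function (avoids overflow for large |t|)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def soft_selection(x_enc, weights, bias, missing, fill=-1000.0):
    """Fully connected layer -> mask missing positions -> sigmoid.

    `missing[j]` is True when feature j of the raw input vector is missing.
    Masked logits are replaced with `fill`, so their sigmoid is 0.0.
    """
    logits = [sum(w * xi for w, xi in zip(row, x_enc)) + b
              for row, b in zip(weights, bias)]
    logits = [fill if m else z for z, m in zip(logits, missing)]
    return [stable_sigmoid(z) for z in logits]

x_enc = [0.5, -0.2, 0.9]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
v = soft_selection(x_enc, W, [0.0, 0.0, 0.0], missing=[False, True, False])
print(v[1])  # 0.0
```

Masking before the sigmoid keeps every element of $V$ inside $[0,1]$ while still forcing missing positions to (effectively) zero.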

[0044] At operation 212, the output vector $V$ can then be multiplied element-wise with the original set of input features $X$, where $X = X_{cat} \oplus X_{num}$. This yields the vector $X_2 = X \odot V$, such that $X_2 \in \mathbb{R}^d$. In other words, soft selection is applied using $V$ to generate $X_2$, which describes the set of selected features from the original feature vector $X$, each weighted according to its relative importance.

[0045] In some examples, a binarizer function 210 can be applied to the output vector $V$. For example, $B: [0,1]^d \to \{0,1\}^d$, where, element-wise, $B(x) = 1$ if $x > b$ and $B(x) = 0$ otherwise; $b$ can be a hyperparameter that is tuned to influence the number of selected features. At operation 214, the vector $X_2$ can be concatenated with the binarized vector to yield $X^{*} = X_2 \oplus B(V)$. The vector $X^{*}$ captures the original features weighted through soft selection and also differentiates features that happen to have a value of zero from those that were deselected. For example, if a given element of $X_2$ and the corresponding element of $B(V)$ are both 0, this indicates deselection (in general, values of 0 in $B(V)$ may indicate deselection). If a given element of $X_2$ is zero but the corresponding element of $B(V)$ is 1, the feature happened to have a value of zero but was still selected. $X^{*}$ can be passed to the risk assessment model 104, and the weights of the selector network 106 and the risk assessment model 104 can be updated using cross-entropy loss with respect to a given classification label.
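The element-wise product of [0044] and the concatenation of [0045] fit in a few lines of Python. The names `binarize` and `soft_selected_input` and the sample values are hypothetical; the fourth soft-selection value of 0.0 stands in for a masked (missing) feature.

```python
def binarize(v, b=0.5):
    """B: [0,1]^d -> {0,1}^d, element-wise 1 if v_j > b else 0."""
    return [1 if vj > b else 0 for vj in v]

def soft_selected_input(x, v, b=0.5):
    """X* = X_2 (+) B(V), where X_2 = X (.) V is the element-wise product."""
    x2 = [xj * vj for xj, vj in zip(x, v)]
    return x2 + binarize(v, b)

x = [0.0, 4.0, 2.0, 5.0]        # feature 0 genuinely equals zero
v = [0.75, 0.25, 0.625, 0.0]    # feature 3 was masked as missing
x_star = soft_selected_input(x, v, b=0.5)
print(x_star)  # [0.0, 1.0, 1.25, 0.0, 1, 0, 1, 0]
```

Positions 0 and 3 both carry 0.0 in $X_2$, but the appended indicators (1 and 0) tell the downstream model that feature 0 was selected and merely zero, while feature 3 was deselected.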

[0046] Subsequently, soft selection is replaced with hard selection. That is, the binarizer function $B$ can be applied to $V$ before multiplying element-wise with the original vector of input features $X$ to generate the vector $X_3 = B(V) \odot X$. The input to the risk assessment model 104 can then be given as $X^{**} = X_3 \oplus B(V)$. The weights of the risk assessment model 104 are then updated using cross-entropy loss with respect to the classification label, while the weights of the selector network 106 are held fixed.
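The two training phases of [0044] through [0046] can be illustrated with a deliberately small, pure-Python sketch; it is not the disclosed implementation. Assumptions: the selector is a single linear layer plus sigmoid (standing in for encoding layers 202 and fully connected layer 204), the risk assessment model 104 is a logistic regression, missing values are `None`, gradients are full-batch, and no gradient flows through the binarizer $B$ (it is treated as a constant). All function names and the synthetic data are hypothetical.

```python
import math
import random

def sigmoid(t):
    # numerically stable logistic function
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def selector(W, c, x):
    """Soft-selection vector V for one row; missing values (None) get V_j = 0."""
    mask = [0.0 if xi is None else 1.0 for xi in x]
    xf = [0.0 if xi is None else xi for xi in x]
    s = [sum(w * xk for w, xk in zip(row, xf)) + cj for row, cj in zip(W, c)]
    return xf, mask, [sigmoid(sj) * mj for sj, mj in zip(s, mask)]

def model_input(xf, v, b, hard):
    bv = [1.0 if vj > b else 0.0 for vj in v]            # B(V)
    gate = bv if hard else v                             # hard: B(V), soft: V
    return [xj * gj for xj, gj in zip(xf, gate)] + bv    # X** (hard) or X* (soft)

def forward(params, x, b, hard):
    W, c, u, a = params
    xf, mask, v = selector(W, c, x)
    z = model_input(xf, v, b, hard)
    return xf, mask, v, z, sigmoid(sum(ui * zi for ui, zi in zip(u, z)) + a)

def mean_loss(params, data, b, hard):
    """Average binary cross-entropy over the data set."""
    total = 0.0
    for x, y in data:
        p = min(max(forward(params, x, b, hard)[-1], 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)

def train_phase(params, data, b, hard, epochs, lr):
    """Soft phase (hard=False) updates selector and model; hard phase freezes the selector."""
    W, c, u, a = params
    d, n = len(c), len(data)
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(d)]
        gc, gu, ga = [0.0] * d, [0.0] * (2 * d), 0.0
        for x, y in data:
            xf, mask, v, z, p = forward((W, c, u, a), x, b, hard)
            g = p - y                                    # dLoss/dlogit for cross-entropy
            gu = [gk + g * zk for gk, zk in zip(gu, z)]
            ga += g
            if not hard:
                for j in range(d):
                    # chain rule through X_2 = X (.) V and the sigmoid; B(V) is constant
                    ds = g * u[j] * xf[j] * v[j] * (1.0 - v[j]) * mask[j]
                    gc[j] += ds
                    for k in range(d):
                        gW[j][k] += ds * xf[k]
        u = [ui - lr * gi / n for ui, gi in zip(u, gu)]
        a -= lr * ga / n
        if not hard:
            W = [[w - lr * gw / n for w, gw in zip(r, gr)] for r, gr in zip(W, gW)]
            c = [cj - lr * gj / n for cj, gj in zip(c, gc)]
    return W, c, u, a

# Hypothetical synthetic data: label depends on feature 0; ~30% of rows have a missing value.
random.seed(0)
d, b = 3, 0.5
data = []
for _ in range(40):
    x = [random.random() for _ in range(d)]
    if random.random() < 0.3:
        x[random.randrange(d)] = None
    data.append((x, 1 if (x[0] or 0.0) > 0.5 else 0))

params = ([[random.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)],
          [0.0] * d, [0.0] * (2 * d), 0.0)
params = train_phase(params, data, b, hard=False, epochs=200, lr=0.5)   # soft selection
soft_W, soft_c = params[0], params[1]
loss_before = mean_loss(params, data, b, hard=True)
params = train_phase(params, data, b, hard=True, epochs=200, lr=0.5)    # hard selection
loss_after = mean_loss(params, data, b, hard=True)
print(loss_after <= loss_before, params[0] == soft_W)  # True True
```

The final line checks two properties that follow from the design: the hard phase leaves the selector weights untouched, and, because the hard-phase inputs $X^{**}$ are then fixed, training reduces to convex logistic regression, so full-batch gradient descent with this small step does not increase the loss. A framework-based implementation would more likely freeze the selector by excluding its parameters from the optimizer.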

[0047] The resulting weights of the risk assessment model 104 can be used when applying the risk assessment model 104 to the predictor variables 112 to determine the risk indicator. The risk assessment model 104 receives $X^{**}$ as input, which consists of the original input feature vector with non-selected features replaced with zero, followed by the appended selection indicator $B(V)$.

Examples of Operations Involving Machine Learning

[0048] FIG. 3 is a flow diagram depicting an example of a process 300 for using a selector network 106 to determine a feature set and an indicator vector for the risk assessment model 104. Once trained, the risk assessment model 104 can determine risk indicators for a target entity based on predictor variables 112 associated with the target entity. One or more computing devices (e.g., the risk assessment server 110) implement operations depicted in FIG. 3 by executing suitable program code (e.g., the risk assessment application 124). For illustrative purposes, the process 300 is described with reference to certain examples depicted in the figures. Other implementations, however, are possible.

[0049] At step 302, the process 300 can involve accessing a set of features. In some examples, the set of features may be received as a training data set including a number of features, where the training data set can include one or more instances of missing feature values. The model training server 108 may access the set of features for training the risk assessment model 104. The process 300 can be implemented to train the risk assessment model 104 based on instance-wise feature selection using a selector network 106.

[0050] At step 304, the process 300 can involve determining, using the selector network 106, a set of selected features and an indicator vector associated with each member of the set of input features. To produce the set of selected features and the indicator vector, the model training application 114 can produce an encoding of categorical features using a layer of the selector network 106, as represented by encoding layers 202. The model training application 114 can determine an encoding of numerical features using a separate layer of the selector network 106 (again, represented by encoding layers 202). A collective encoding vector can be created by concatenating the categorical and numerical encoding vectors and passing the result through a layer of the selector network 106 (e.g., collective encoding layers 204). A ‘soft selection’ vector can then be created by passing this collective encoding vector through a sigmoid function to generate a vector of elements, each taking a value between 0 and 1. An element having a value of zero can correspond to a missing feature value in the set of features. In some examples, a binarizer function can be applied to the output of the sigmoid function such that each element of the binarized output vector takes a value of either 0 or 1, and the resulting binarized vector can then be multiplied element-wise with the original input vector to create a ‘hard selection’ vector.

[0051] At step 304, the process 300 can further include fixing the selector-network weights learned during the soft-selection step and fine-tuning the weights of the risk assessment model 104 using hard selection. That is, the weights of the selector network 106 can be held fixed while the weights of the risk assessment model 104 are tuned further.

[0052] At step 306, the process 300 can include training the risk assessment model 104 using the set of selected features and the indicator vector output by the selector network 106. At this step, the risk assessment model 104 can have feature weights as determined during the hard-selection step of step 304.

[0053] FIG. 4 is a flow diagram depicting an example of a process 400 for using a risk assessment model 104 (e.g., a risk assessment model as generated and trained using process 300) to determine a risk indicator associated with a target entity. Once trained, the risk assessment model 104 can determine risk indicators for a target entity based on predictor variables 112 associated with the target entity. One or more computing devices (e.g., the risk assessment server 110) implement operations depicted in FIG. 4 by executing suitable program code (e.g., the risk assessment application 124). For illustrative purposes, the process 400 is described with reference to certain examples depicted in the figures. Other implementations, however, are possible.

[0054] At block 402, the process 400 involves receiving a risk assessment query. The risk assessment query can be received by the risk assessment server 110 and can be a request for a risk indicator associated with a target entity. The risk indicator can be used by the requesting system, for example, to determine whether to grant or deny access of the target entity to a secure system or resource.

[0055] At block 404, the process 400 involves applying a machine learning model (e.g., the risk assessment model 104) to generate the risk indicator based on input predictor variables (e.g., predictor variables 112) associated with the target entity. In some examples, the risk assessment model 104 can be generated and trained based on process 300, by using the selector network 106 to determine a set of selected features from a set of available features and an indicator vector associated with the set of selected features. The set of selected features can correspond, in some examples, to features that have no missing values and that are deemed most predictive in the context of the risk assessment model 104.

[0056] At block 406, the process 400 involves transmitting, to a remote computing device (e.g., the client computing system 122 or the user computing system 120), a responsive message including at least the risk indicator for use in controlling access of the target entity to one or more interactive computing environments. Accordingly, the risk indicator can be accurately determined using the risk assessment model 104, which has been trained on a set of selected features determined from a set of input features that may include missing values.
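Blocks 402 through 406 can be sketched as a single request handler. This is a minimal illustration under assumptions: the query fields, the logistic `risk_indicator` stand-in for the trained risk assessment model 104, the feature names, and the JSON message layout are hypothetical, and actual transmission (for example over HTTP) is left out.

```python
import json
import math

def risk_indicator(predictor_vars, weights, bias):
    """Stand-in for the trained risk assessment model 104 (hypothetical logistic score)."""
    z = sum(weights[name] * value for name, value in predictor_vars.items()) + bias
    return 1.0 / (1.0 + math.exp(-z))

def handle_query(query, weights, bias):
    """Block 402: receive the query; block 404: score it; block 406: build the response."""
    score = risk_indicator(query["predictor_variables"], weights, bias)
    return json.dumps({"entity_id": query["entity_id"], "risk_indicator": round(score, 4)})

query = {"entity_id": "A-123",
         "predictor_variables": {"utilization": 0.5, "tenure_years": 2.0}}
weights = {"utilization": 2.0, "tenure_years": -0.5}
msg = handle_query(query, weights, bias=0.0)
print(msg)  # {"entity_id": "A-123", "risk_indicator": 0.5}
```

Returning a serialized message keeps the scoring step independent of the transport used to reach the client or user computing system.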

[0057] Disclosed systems and methods improve security by enabling model construction and training from incomplete data sets. Further, disclosed systems and methods enable use of incomplete data sets in situations where missing or invalid values cannot simply be imputed. This enables machine learning models to be developed and trained without data imputation, thereby improving the accuracy and predictive power of machine learning models developed using incomplete data. Accordingly, security of computing systems and environments is improved, as system administrators or authentication systems can make more accurate and well-informed decisions on whether to allow access by a target entity to these systems and environments.

Example of Computing System for Machine-Learning Operations

[0058] Any suitable computing system or group of computing systems can be used to perform the machine-learning operations described herein. For example, FIG. 5 is a block diagram depicting an example of a computing device 500, which can be used to implement the risk assessment server 110 or the model training server 108. The computing device 500 can include various devices for communicating with other devices in the operating environment 100, as described with respect to FIG. 1. The computing device 500 can include various devices for performing one or more transformation operations described above with respect to FIGS. 1-4.

[0059] The computing device 500 can include a processor 502 that is communicatively coupled to a memory 504. The processor 502 executes computer-executable program code stored in the memory 504, accesses information stored in the memory 504, or both. Program code may include machine-executable instructions that may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means, including memory sharing, message passing, token passing, or network transmission, among others.

[0060] Examples of a processor 502 include a microprocessor, an application-specific integrated circuit, a field-programmable gate array, or any other suitable processing device. The processor 502 can include any number of processing devices, including one. The processor 502 can include or communicate with a memory 504. The memory 504 stores program code that, when executed by the processor 502, causes the processor to perform the operations described in this disclosure.

[0061] The memory 504 can include any suitable non-transitory computer-readable storage medium. The computer-readable medium can include any electronic, optical, magnetic, or other storage device capable of providing a processor with computer-readable program code or other program code. Non-limiting examples of a computer-readable medium include a magnetic disk, memory chip, optical storage, flash memory, storage class memory, ROM, RAM, an ASIC, magnetic storage, or any other medium from which a computer processor can read and execute program code. The program code may include processor-specific program code generated by a compiler or an interpreter from code written in any suitable computer-programming language. Examples of suitable programming languages include Hadoop, C, C++, C#, Visual Basic, Java, Python, Perl, JavaScript, ActionScript, etc.

[0062] The computing device 500 may also include a number of external or internal devices such as input or output devices. For example, the computing device 500 is shown with an input/output interface 508 that can receive input from input devices or provide output to output devices. A bus 506 can also be included in the computing device 500. The bus 506 can communicatively couple one or more components of the computing device 500.

[0063] The computing device 500 can execute program code 514 that includes the risk assessment application 124 and/or the model training application 114. The program code 514 for the risk assessment application 124 and/or the model training application 114 may be resident in any suitable computer-readable medium and may be executed on any suitable processing device. For example, as depicted in FIG. 5, the program code 514 for the risk assessment application 124 and/or the model training application 114 can reside in the memory 504 of the computing device 500 along with the program data 516 associated with the program code 514, such as the predictor variables 112 and/or the model training samples. Executing the risk assessment application 124 or the model training application 114 can configure the processor 502 to perform the operations described herein.

[0064] In some aspects, the computing device 500 can include one or more output devices. One example of an output device is the network interface device 510 depicted in FIG. 5. A network interface device 510 can include any device or group of devices suitable for establishing a wired or wireless data connection to one or more data networks described herein. Non-limiting examples of the network interface device 510 include an Ethernet network adapter, a modem, etc.

[0065] Another example of an output device is the presentation device 512 depicted in FIG. 5. A presentation device 512 can include any device or group of devices suitable for providing visual, auditory, or other suitable sensory output. Non-limiting examples of the presentation device 512 include a touchscreen, a monitor, a speaker, a separate mobile computing device, etc. In some aspects, the presentation device 512 can include a remote client-computing device that communicates with the computing device 500 using one or more data networks described herein. In other aspects, the presentation device 512 can be omitted.

[0066] The foregoing description of some examples has been presented only for the purpose of illustration and description and is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Numerous modifications and adaptations thereof will be apparent to those skilled in the art without departing from the spirit and scope of the disclosure.

Claims


1. A method comprising: training a risk assessment model, using a training process, to determine a risk indicator for a target entity from predictor variables associated with the target entity, wherein the risk indicator indicates a level of risk associated with the target entity, and wherein the training process includes operations comprising: accessing a set of features, wherein the set of features is used to train the risk assessment model to output the risk indicator; determining, using a selector network, a set of selected features and an indicator vector, wherein each element of the indicator vector is associated with a member of the set of features; and training the risk assessment model using the set of selected features and the indicator vector output by the selector network; determining, for the target entity, the risk indicator using the trained risk assessment model; and transmitting, to a remote computing device, a responsive message including at least the risk indicator for use in controlling access of the target entity to one or more interactive computing environments.

2. The method of claim 1, wherein the set of features comprises a subset of numerical features and a subset of categorical features, and wherein the training process includes operations further comprising: determining a first encoding vector of the subset of categorical features using a first layer of the selector network; determining a second encoding vector of the subset of numerical features using a second layer of the selector network; determining a collective encoding vector by concatenating the first and second encoding vectors and passing the concatenated vector through a third layer of the selector network; and determining a soft selection vector by passing the collective encoding vector through a sigmoid function and setting elements corresponding, by position, to missing elements of an input vector comprising the set of features to zero, wherein the soft selection vector can be used, in part, to individuate the set of selected features.

3. The method of claim 2, wherein determining a second indicator vector comprises: assigning a 1 to elements of the indicator vector that correspond, by position, to elements of the input vector that are not missing, and a 0 to elements of the indicator vector corresponding, by position, to elements of the input vector that are missing.

4. The method of claim 3, wherein the second indicator vector has the same dimensions as the input vector.

5. The method of claim 2, wherein the training process includes operations further comprising: concatenating the soft selection vector and the indicator vector to create a selector network output vector; and passing the selector network output vector to the risk assessment model to update a set of weights associated with the risk assessment model.

6. The method of claim 5, wherein updating the set of weights comprises using cross entropy loss with respect to a classification label and outputs of the risk assessment model.

7. The method of claim 2, wherein the training process includes operations further comprising: replacing the soft selection vector with a hard selection vector, wherein the hard selection vector is produced by: passing the soft selection vector through a binarizer function to create a binary selection vector, such that each element is set to either one or zero, and wherein the binarizer function comprises an adjustable threshold such that input values greater than the threshold will be set to one and input values less than or equal to the threshold will be set to zero, and multiplying the input vector elementwise with the binary selection vector; concatenating the hard selection vector and the indicator vector to generate a final selector network output vector; and passing the final selector network output vector to the risk assessment model as part of a process for fine tuning weights of the risk assessment model.

8. A system comprising: a processing device; and a memory device in which instructions executable by the processing device are stored for causing the processing device to: training a risk assessment model, using a training process, to determine a risk indicator for a target entity from predictor variables associated with the target entity, wherein the risk indicator indicates a level of risk associated with the target entity, and wherein the training process includes operations comprising: accessing a set of features, wherein the set of features is used to train the risk assessment model to output the risk indicator; determining, using a selector network, a set of selected features and an indicator vector, wherein each element of the indicator is associated with a member of the set of selected features; and training the risk assessment model using the set of selected features and the indicator vector output by the selector network; determining, for the target entity, the risk indicator using the trained risk assessment model; and transmitting, to a remote computing device, a responsive message including at least the risk indicator for use in controlling access of the target entity to one or more interactive computing environments.

9. The system of claim 8, wherein the set of features comprises a subset of numerical features and a subset of categorical features and wherein the training process includes operations further comprising: determining a first encoding vector of the subset of categorical features using a first layer of the selector network; determining a second encoding vector of the subset of numerical features using a second layer of the selector network; determining a collective encoding vector by concatenating the first and second encoding vectors and passing the concatenated vector through a third layer of the selector network; and determining a soft selection vector by passing the collective encoding vector through a sigmoid function and setting elements corresponding, by position, to missing elements of an input vector comprising the set of features to zero, wherein the soft selection vector can be used, in part, to individuate the set of selected features.

10. The system of claim 9, wherein determining a second indicator vector comprises: assigning a 1 to elements of the indicator vector that correspond, by position, to elements of the input vector that are not missing, and a 0 to elements of the indicator vector corresponding, by position, to elements of the input vector that are missing.

11. The system of claim 10, wherein the second indicator vector has the same dimensions as the input vector.

12. The system of claim 9, wherein the training process includes operations further comprising: concatenating the soft selection vector and the indicator vector to create a selector network output vector; and passing the selector network output vector to the risk assessment model to update a set of weights associated with the risk assessment model.

13. The system of claim 12, wherein updating the set of weights comprises using cross entropy loss with respect to a classification label and outputs of the risk assessment model.

14. The system of claim 9, wherein the training process includes operations further comprising: replacing the soft selection vector with a hard selection vector, wherein the hard selection vector is produced by: passing the soft selection vector through a binarizer function to create a binary selection vector, such that each element is set to either one or zero, and wherein the binarizer function comprises an adjustable threshold such that input values greater than the threshold will be set to one and input values less than or equal to the threshold will be set to zero, and multiplying the input vector elementwise with the binary selection vector; concatenating the hard selection vector and the indicator vector to generate a final selector network output vector; and passing the final selector network output vector to the risk assessment model as part of a process for fine tuning weights of the risk assessment model.

15. A non-transitory computer-readable storage medium having program code that is executable by a processor to cause a computing device to perform operations, the operations comprising: training a risk assessment model, using a training process, to determine a risk indicator for a target entity from predictor variables associated with the target entity, wherein the risk indicator indicates a level of risk associated with the target entity, wherein the training process includes operations comprising: accessing a set of features, and wherein the set of features is used to train the risk assessment model to output the risk indicator; determining, using a selector network, a set of selected features and an indicator vector, wherein each element of the indicator vector is associated with a member of the set of selected features; and training the risk assessment model using the set of selected features and the indicator vector output by the selector network; determining, for the target entity, the risk indicator using the trained risk assessment model; and transmitting, to a remote computing device, a responsive message including at least the risk indicator for use in controlling access of the target entity to one or more interactive computing environments.

16. The non-transitory computer-readable storage medium of claim 15, wherein the set of features comprises a subset of numerical features and a subset of categorical features and wherein the training process includes operations further comprising: determining a first encoding vector of the subset of categorical features using a first layer of the selector network; determining a second encoding vector of the subset of numerical features using a second layer of the selector network; determining a collective encoding vector by concatenating the first and second encoding vectors and passing the concatenated vector through a third layer of the selector network; and determining a soft selection vector by passing the collective encoding vector through a sigmoid function and setting elements corresponding, by position, to missing elements of an input vector comprising the set of features to zero, wherein the soft selection vector can be used, in part, to individuate the set of selected features.

17. The non-transitory computer-readable storage medium of claim 16, wherein determining the soft selection vector comprises: applying a sigmoid function to a weight vector such that each element of the soft selection vector is between zero and one.

18. The non-transitory computer-readable storage medium of claim 17, wherein determining a second indicator vector comprises: assigning a 1 to elements of the indicator vector that correspond, by position, to elements of the input vector that are not missing, and a 0 to elements of the indicator vector corresponding, by position, to elements of the input vector that are missing.

19. The non-transitory computer-readable storage medium of claim 18, wherein the second indicator vector has the same dimensions as the input vector.

20. The non-transitory computer-readable storage medium of claim 19, wherein the training process includes operations further comprising: concatenating the soft selection vector and the indicator vector to create a selector network output vector; and passing the selector network output vector to the risk assessment model to update a set of weights associated with the risk assessment model.
