WO2025021417A1

WO2025021417A1 - Mask 3d (m3d) modeling for lithography simulation

Info

Publication number: WO2025021417A1
Application number: PCT/EP2024/068037
Authority: WO
Inventors: Qiugu Wang; Wen LYU; Xiaobo Xie; Jen-Shiang Wang; Chih-Shiang Chou; Mu FENG; Mingchun TIEN; Rafael C. Howell
Original assignee: ASML Netherlands BV
Current assignee: ASML Netherlands BV
Priority date: 2023-07-25
Filing date: 2024-06-26
Publication date: 2025-01-30
Anticipated expiration: 2026-01-25
Also published as: CN121844252A; KR20260040242A; TW202526498A

Abstract

A method of lithography simulation comprising: obtaining a lower order component of a mask 3D (M3D) mask image that corresponds to a mask pattern; generating a higher order component of the M3D mask image by using a machine learning (ML) model provided with input corresponding to the mask pattern; and combining the lower order component and the higher order component of the M3D mask image to generate a resultant M3D mask image corresponding to the mask pattern.

Description

MASK 3D (M3D) MODELING FOR LITHOGRAPHY SIMULATION

CROSS REFERENCE TO RELATED APPLICATION

[0001] This application claims priority to U.S. Application No. 63/528,865, filed July 25, 2023, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

[0002] The description herein relates to a method and system for simulating a lithographic process and system.

BACKGROUND

[0003] A lithographic projection apparatus can be used, for example, in the manufacture of integrated circuits (ICs). A patterning device (e.g., a mask) may contain or provide a pattern corresponding to an individual layer of the IC (“design layout”), and this pattern can be transferred onto a target portion (e.g. comprising one or more dies) on a substrate (e.g., silicon wafer) that has been coated with a layer of radiation-sensitive material (“resist”), by methods such as irradiating the target portion through the pattern on the patterning device. In general, a single substrate includes a plurality of adjacent target portions to which the pattern is transferred successively by the lithographic projection apparatus, one target portion at a time. In one type of lithographic projection apparatus, the pattern on the entire patterning device is transferred onto one target portion in one operation. Such an apparatus is commonly referred to as a stepper. In an alternative apparatus, commonly referred to as a step-and-scan apparatus, a projection beam scans over the patterning device in a given reference direction (the “scanning” direction) while synchronously moving the substrate parallel or anti-parallel to this reference direction. Different portions of the pattern on the patterning device are transferred to one target portion progressively. Since, in general, the lithographic projection apparatus will have a reduction ratio M (e.g., 4), and the reduction ratio can be different in the x and y directions, the speed F at which the substrate is moved will be 1/M times that at which the projection beam scans the patterning device. More information with regard to lithographic devices as described herein can be gleaned, for example, from U.S. Patent 6,046,792, incorporated herein by reference.

[0004] Prior to transferring the pattern from the patterning device to the substrate, the substrate may undergo various procedures, such as priming, resist coating and a soft bake. After exposure, the substrate may be subjected to other procedures (“post-exposure procedures”), such as a post-exposure bake (PEB), development, a hard bake and measurement/inspection of the transferred pattern. This array of procedures is used as a basis to make an individual layer of a device, e.g., an IC. The substrate may then undergo various processes such as etching, ion-implantation (doping), metallization, oxidation, chemo-mechanical polishing, etc., all intended to finish the individual layer of the device. If several layers are required in the device, then the whole procedure, or a variant thereof, is repeated for each layer. Eventually, a device will be present in each target portion on the substrate. These devices are then separated from one another by a technique such as dicing or sawing, whence the individual devices can be mounted on a carrier, connected to pins, etc.

[0005] Thus, manufacturing devices, such as semiconductor devices, typically involves processing a substrate (e.g., a semiconductor wafer) using a number of fabrication processes to form various features and multiple layers of the devices. Such layers and features are typically manufactured and processed using, e.g., deposition, lithography, etch, chemical-mechanical polishing, and ion implantation. Multiple devices may be fabricated on a plurality of dies on a substrate and then separated into individual devices. This device manufacturing process may be considered a patterning process. A patterning process involves a patterning step, such as optical and/or nanoimprint lithography using a patterning device in a lithographic apparatus, to transfer a pattern on the patterning device to a substrate and typically, but optionally, involves one or more related pattern processing steps, such as resist development by a development apparatus, baking of the substrate using a bake tool, etching using the pattern using an etch apparatus, etc.

[0006] As noted, lithography is a central step in the manufacturing of devices such as ICs, where patterns formed on substrates define functional elements of the devices, such as microprocessors, memory chips, etc. Similar lithographic techniques are also used in the formation of flat panel displays, micro-electromechanical systems (MEMS) and other devices.

[0007] As semiconductor manufacturing processes continue to advance, the dimensions of functional elements have continually been reduced while the number of functional elements, such as transistors, per device has been steadily increasing over decades, following a trend commonly referred to as “Moore’s law”. At the current state of technology, layers of devices are manufactured using lithographic projection apparatuses that project a design layout onto a substrate using illumination from a deep-ultraviolet illumination source, creating individual functional elements having dimensions well below 100 nm, i.e., less than half the wavelength of the radiation from the illumination source (e.g., a 193 nm illumination source).

[0008] This process, in which features with dimensions smaller than the classical resolution limit of a lithographic projection apparatus are printed, is commonly known as low-k1 lithography, according to the resolution formula CD = k1×λ/NA, where λ is the wavelength of radiation employed (currently in most cases 248 nm or 193 nm), NA is the numerical aperture of projection optics in the lithographic projection apparatus, CD is the “critical dimension” (generally the smallest feature size printed), and k1 is an empirical resolution factor. In general, the smaller k1, the more difficult it becomes to reproduce a pattern on the substrate that resembles the shape and dimensions planned by a designer in order to achieve particular electrical functionality and performance. To overcome these difficulties, sophisticated fine-tuning steps are applied to the lithographic projection apparatus, the design layout, or the patterning device. These include, for example, but not limited to, optimization of NA and optical coherence settings, customized illumination schemes, use of phase shifting patterning devices, optical proximity correction (OPC, sometimes also referred to as “optical and process correction”) in the design layout, or other methods generally defined as “resolution enhancement techniques” (RET).
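As a worked illustration of the resolution formula CD = k1×λ/NA, the following minimal Python sketch solves the formula for k1 and evaluates it for one set of numbers. The function names and the numerical values (a 193 nm source, an NA of 1.35 and a 40 nm CD) are illustrative assumptions and are not taken from this disclosure.

```python
def k1_factor(cd_nm: float, wavelength_nm: float, na: float) -> float:
    """Solve CD = k1 * wavelength / NA for the empirical factor k1."""
    return cd_nm * na / wavelength_nm


def critical_dimension(k1: float, wavelength_nm: float, na: float) -> float:
    """Evaluate CD = k1 * wavelength / NA; the result is in nm."""
    return k1 * wavelength_nm / na


# Assumed example values, chosen only to illustrate the arithmetic.
example_k1 = k1_factor(cd_nm=40.0, wavelength_nm=193.0, na=1.35)
print(f"k1 = {example_k1:.3f}")
```

With these assumed values k1 falls well below 0.5, which is the regime the text describes as low-k1 lithography.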

SUMMARY

[0009] According to an embodiment, there is provided a method of lithography simulation comprising: obtaining a lower order component of a mask 3D (M3D) mask image that corresponds to a mask pattern; generating a higher order component of the M3D mask image by using a machine learning (ML) model provided with input corresponding to the mask pattern; and combining the lower order component and the higher order component of the M3D mask image to generate a resultant M3D mask image corresponding to the mask pattern.

[0010] In an embodiment, the method further comprises training the ML model to generate output higher order components by using a first set of training data, wherein the first set of training data comprises a set of training mask patterns and training higher order components of training M3D mask images corresponding to the set of training mask patterns.

[0011] In an embodiment, the training higher order components of the M3D mask images comprise determined differences between lower order components of the training M3D mask images and the training M3D mask images for the set of training mask patterns.

[0012] In an embodiment, the method further comprises generating the training higher order component of the training M3D mask image by: obtaining a first training M3D mask image for a first training mask pattern, wherein the first training M3D mask image comprises a linear component and a training high order component; obtaining a linear component of the first training M3D mask image; and determining a training higher order component of the first training M3D mask image based on a difference between the first training M3D mask image and the linear components of the first training M3D mask image for the first training mask pattern.
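The difference described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the images are taken to be complex-valued arrays on a shared pixel grid, the function name `higher_order_target` is hypothetical, and the toy arrays stand in for a rigorously simulated M3D mask image and its linear component.

```python
import numpy as np


def higher_order_target(full_m3d: np.ndarray, linear_component: np.ndarray) -> np.ndarray:
    """Return the training target: full M3D mask image minus its linear component."""
    if full_m3d.shape != linear_component.shape:
        raise ValueError("images must share the same pixel grid")
    return full_m3d - linear_component


# Toy complex-valued fields standing in for real data (assumed values).
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
linear = 0.9 * full  # hypothetical linear approximation of the full image
residual = higher_order_target(full, linear)
```

By construction, adding the residual back to the linear component recovers the full training image, which is the property a model trained on such residuals relies on.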

[0013] In an embodiment, the first set of training mask patterns comprises patterns which are closely spaced relative to a wavelength of illumination of a lithography process.

[0014] In an embodiment, the first set of training mask patterns comprises patterns which experience at least one of a M3D inter-edge effect and a M3D inter-corner effect.

[0015] In an embodiment, obtaining the lower order component of the M3D mask image comprises generating the lower order component of the M3D mask image by using a second model provided with input corresponding to the mask pattern.

[0016] In an embodiment, the method further comprises training the second model to generate output lower order components by using a second set of training data, wherein the second set of training data comprises a second set of training mask patterns and training lower order components of second training M3D mask images corresponding to the second set of training mask patterns.

[0017] In an embodiment, the second training M3D mask images comprise M3D images generated using rigorous simulation on the second set of training mask patterns.

[0018] In an embodiment, the second set of training mask patterns comprises patterns which are widely spaced relative to a wavelength of illumination of a lithography process.

[0019] In an embodiment, the second set of training mask patterns comprises patterns which experience at least one of a single edge effect and an area effect.

[0020] In an embodiment, the second model comprises a ML model.

[0021] In an embodiment, the second model comprises a non-machine learning model.

[0022] In an embodiment, the second model is a trained physical effect model.

[0023] In an embodiment, the resultant M3D mask image, comprised of the lower order component and the higher order component of the M3D mask image, approximates a full representation of the M3D image.

[0024] In an embodiment, the rigorous simulation comprises a finite-difference time-domain (FDTD) algorithm applied to the mask pattern.

[0025] In an embodiment, the rigorous simulation comprises a rigorous coupled-wave analysis (RCWA) algorithm applied to the mask patterns.

[0026] In an embodiment, the training M3D mask images comprise effective near field images for corresponding mask patterns.

[0027] In an embodiment, the training M3D mask images comprise M3D mask images generated by a model by using metrology mask contours corresponding to the mask pattern.

[0028] In an embodiment, the metrology mask contours are extracted from images of fabricated masks corresponding to the mask patterns.

[0029] In an embodiment, images of fabricated masks comprise scanning electron microscopy (SEM) images of fabricated masks corresponding to the mask patterns.

[0030] In an embodiment, the ML model is a neural network.

[0031] In an embodiment, the ML model is a convolutional neural network (CNN).

[0032] In an embodiment, the ML model is a deep convolutional neural network (DCNN).

[0033] In an embodiment the input corresponding to the mask pattern comprises a thin mask image.

[0034] In an embodiment, the input corresponding to the mask pattern comprises a representation of a mask image.

[0035] In an embodiment, the M3D mask image comprises a transmission function.

[0036] In an embodiment, the M3D mask image comprises a near field image.

[0037] In an embodiment, the M3D mask image comprises an effective representation of a mask image.

[0038] In an embodiment, training the ML model comprises accessing an untrained ML model; and training model formulation and model parameters of the accessed ML model.

[0039] In an embodiment, the method further comprises simulating at least part of a lithography process based on the resultant M3D mask image.

[0040] In an embodiment, input corresponding to the mask pattern comprises input corresponding to multiple portions of the mask pattern.

[0041] According to an embodiment, there is provided a method of lithography simulation comprising: obtaining a metrology image of a mask fabricated based on a mask pattern; extracting one or more metrology mask contours from the metrology image of the mask; and simulating at least a portion of a lithography process with an optical model for the mask pattern using the one or more extracted metrology mask contours.

[0042] In an embodiment, simulating the lithography process comprises generating a mask image for the mask pattern using the one or more extracted metrology mask contours.

[0043] In an embodiment, the method further comprises training a model with a set of training data to generate output corresponding to one or more metrology mask contours based on input corresponding to a mask pattern, wherein the set of training data comprises a set of mask patterns and one or more training metrology mask contours extracted from metrology images corresponding to the set of mask patterns.

[0044] In an embodiment, the method further comprises generating, with the trained model, output corresponding to one or more metrology mask contours based on input corresponding to a mask pattern.

[0045] In an embodiment, the method further comprises training a model with a set of training data to generate output corresponding to one or more metrology mask contours based on input corresponding to a mask pattern and a density map for the mask pattern, wherein the set of training data comprises a set of mask patterns, density maps corresponding to the set of mask patterns, and one or more training metrology mask contours extracted from metrology images corresponding to the set of mask patterns.

[0046] In an embodiment, the method further comprises generating, with the trained model, output corresponding to one or more metrology mask contours based on input corresponding to a mask pattern and a density map for the mask pattern.

[0047] In an embodiment, the density map corresponds to a pattern density distribution for the mask pattern.

[0048] In an embodiment, the model is an ensemble model.

[0049] According to an embodiment, there is provided a method to generate a machine learning (ML) model for lithography simulation comprising: obtaining a set of training data comprising a set of mask patterns and higher order components of M3D mask image corresponding to the mask patterns; and training a ML model with the set of training data to generate a higher order residual of an M3D mask image based on an input mask pattern.

[0050] In an embodiment, the method further comprises obtaining a second set of training data comprising a set of mask patterns and M3D mask images corresponding to the mask patterns; and training a second model with the second set of training data to generate a linear approximation of a M3D mask image based on the input mask pattern.

[0051] In an embodiment, the second set of training data comprises substantially the same mask patterns as the set of training data.

[0052] In an embodiment, the first set of training data comprises a set of mask patterns which experience inter-edge effect and wherein the second set of training data comprises a set of mask patterns which experience substantially less inter-edge effect than the first set of training data.

[0053] According to another embodiment, there is provided one or more non-transitory, machine-readable medium having instructions thereon, the instructions when executed by a processor being configured to perform the method of any one of another embodiment.

[0054] According to another embodiment, there is provided a system comprising: a processor; and one or more non-transitory, machine-readable medium having instructions thereon, the instructions when executed by the processor being configured to perform the method of any one of another embodiment.

BRIEF DESCRIPTION OF THE DRAWINGS

[0055] The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,

[0056] Figure 1 illustrates a block diagram of various subsystems of a lithographic projection apparatus, according to an embodiment.

[0057] Figure 2 illustrates an exemplary flow chart for simulating lithography in a lithographic projection apparatus, according to an embodiment.

[0058] Figure 3 depicts an exemplary method for simulating a lithography process using a M3D mask image model, according to an embodiment.

[0059] Figure 4 is a schematic overview of M3D mask image modeling with a machine learning (ML) model component to determine M3D mask image residuals, according to an embodiment.

[0060] Figure 5 is a schematic overview of training a machine learning (ML) model to determine M3D mask image residuals for M3D mask image modeling, according to an embodiment.

[0061] Figure 6 illustrates an exemplary method for determining a M3D mask image using a machine learning (ML) model, according to an embodiment.

[0062] Figure 7 illustrates an exemplary method for training a machine learning (ML) model to generate M3D mask image residuals, according to an embodiment.

[0063] Figure 8 illustrates an exemplary method 800 for training a model based on scanning electron microscopy (SEM) images of fabricated masks, according to an embodiment.

[0064] Figure 9 is a block diagram of an example computer system, according to an embodiment.

DETAILED DESCRIPTION

[0065] Although specific reference may be made in this text to the manufacture of ICs, it should be explicitly understood that the description herein has many other possible applications. For example, it may be employed in the manufacture of integrated optical systems, guidance and detection patterns for magnetic domain memories, liquid crystal display panels, thin film magnetic heads, etc. The skilled artisan will appreciate that, in the context of such alternative applications, any use of the terms “reticle”, “wafer” or “die” in this text should be considered as interchangeable with the more general terms “mask”, “substrate” and “target portion”, respectively.

[0066] In the present document, the terms “radiation” and “beam” are used to encompass all types of electromagnetic radiation, including ultraviolet radiation (e.g., with a wavelength of 365, 248, 193, 157 or 126 nm) and EUV (extreme ultra-violet radiation, e.g., having a wavelength in the range of about 5-100 nm). In the present document, the term “radiation source” or “source” is used to encompass all types of sources of radiation, including laser sources, incandescent sources, etc. which may include treatment of the radiation between the radiation source and the target or other parts of the optics, including filtering, collimating, focusing, etc. A source may include multiple sources which generate contributions to the radiation used for lithography.

[0067] A patterning device can comprise, or can form, one or more design layouts. The design layout may be generated utilizing CAD (computer-aided design) programs, including general CAD programs such as AutoCAD, Solidworks, etc., or which may be layout specific CAD programs such as LayoutEditor, KLayout, etc. This process is often referred to as EDA (electronic design automation). Most CAD programs follow a set of predetermined design rules in order to create functional design layouts/patterning devices. These rules are set based on processing and design limitations. For example, design rules define the space tolerance between devices (such as gates, capacitors, etc.) or interconnect lines, to ensure that the devices or lines do not interact with one another in an undesirable way. One or more of the design rule limitations may be referred to as a “critical dimension” (CD). A critical dimension of a device can be defined as the smallest width of a line or hole, or the smallest space between two lines or two holes. Thus, the CD regulates the overall size and density of the designed device. One of the goals in device fabrication is to faithfully reproduce the original design intent on the substrate (via the patterning device).

[0068] The term “mask” or “patterning device” as employed in this text may be broadly interpreted as referring to a generic patterning device that can be used to endow an incoming radiation beam with a patterned cross-section, corresponding to a pattern that is to be created in a target portion of the substrate. The term “light valve” can also be used in this context. Besides the classic mask (transmissive or reflective; binary, phase-shifting, hybrid, etc.), examples of other such patterning devices include a programmable mirror array. An example of such a device is a matrix-addressable surface having a viscoelastic control layer and a reflective surface. The basic principle behind such an apparatus is that (for example) addressed areas of the reflective surface reflect incident radiation as diffracted radiation, whereas unaddressed areas reflect incident radiation as undiffracted radiation. Using an appropriate filter, the said undiffracted radiation can be filtered out of the reflected beam, leaving only the diffracted radiation behind; in this manner, the beam becomes patterned according to the addressing pattern of the matrix-addressable surface. The required matrix addressing can be performed using suitable electronic means. Examples of other such patterning devices also include a programmable LCD array. An example of such a construction is given in U.S. Patent No. 5,229,872, which is incorporated herein by reference in its entirety.

[0069] The term “projection optics” as used herein should be broadly interpreted as encompassing various types of optical systems, including refractive optics, reflective optics, apertures and catadioptric optics, for example. The term “projection optics” may also include components operating according to any of these design types for directing, shaping or controlling the projection beam of radiation, collectively or singularly. The term “projection optics” may include any optical component in the lithographic projection apparatus, no matter where the optical component is located on an optical path of the lithographic projection apparatus. Projection optics may include optical components for shaping, adjusting and/or projecting radiation from the source before the radiation passes the patterning device, and/or optical components for shaping, adjusting and/or projecting the radiation after the radiation passes the patterning device.

[0070] Figure 1 illustrates a block diagram of various subsystems of a lithographic projection apparatus 10A, according to an embodiment. Major components are a radiation source 12A, which may be a deep-ultraviolet excimer laser source or other type of source including an extreme ultra-violet (EUV) source (the lithographic projection apparatus itself need not have the radiation source), illumination optics which, e.g., define the partial coherence (denoted as sigma) and which may include optics 14A, 16Aa and 16Ab that shape radiation from the source 12A; a patterning device (or mask) 18A; and transmission optics 16Ac that project an image of the patterning device pattern onto a substrate plane 22A.

[0071] A lithographic process performed with the lithographic apparatus 10A, with the patterning device 18A, may be modeled. The interaction of illumination with the patterning device 18A may be modeled and output as a M3D mask image, which may represent the electric field of the illumination transmitted through the patterning device 18A. The mask model may output a M3D mask image which may include three-dimensional effects caused by the non-negligible thickness of printed, absorptive features of the mask (e.g., mask edge-to-edge (e2e) effects, mask edge-to-edge-to-edge (e2e2e) effects, edge-to-corner effects, corner-to-corner effects, etc.). A M3D mask image may include effects which may not be included in a thin mask image. A thin mask image may approximate the patterning device 18A using the Kirchhoff boundary condition, in which the thickness of the absorptive regions is approximated as very small as compared to the wavelength of illumination, while the size of the structures on the patterning device 18A is approximated as very large when compared to the wavelength of illumination. A thin mask image may have significant inaccuracies when the wavelength of illumination approaches the size of the features of the patterning device 18A and where the thickness of the absorptive regions of the patterning device 18A is not negligible with respect to the wavelength of illumination, and these inaccuracies may be ameliorated by using a M3D mask image which may account for these and other interactions of illumination with mask features.

[0072] According to an embodiment of the present disclosure, the illumination transmitted by the patterning device 18A may be modeled using a ML model, which may include a non-linear ML model. The output of the patterning device 18A may be modeled (e.g., as a M3D mask image) by an output of a first model, which may generate lower order components of the mask image, and by an output of a second ML model, which may generate higher order components of the mask image. Hereinafter, lower order may correspond to zeroth order, first order, second order, and, in some cases, third or fourth order components of the mask image. Lower order may correspond to linear components. Higher order may correspond to second order, third order, and above components of the mask image. The order of the components may correspond to the number of surfaces the corresponding radiation is diffracted from. For example, zeroth order components may be transmitted components, including components which would be present in a thin mask image. First order components may correspond to linear effects, such as may be modeled using a Kirchhoff function. Second order components may correspond to cross terms (e.g., cross terms between horizontal, vertical, and corner Fourier decomposition of transmission functions) which arise from polarization, incident angle, and other effects in a non-thin mask. Second order (and higher order) components may allow the M3D mask image to capture feature to feature interaction due to three-dimensional scattering and other effects. Lower order and higher order may be relative, where in some embodiments a division between lower order and higher order may be varied. In some embodiments, some effects of a given order may be included in the lower order components while other effects of the same order may be included in the higher order components.
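The two-model structure described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes that the outputs of the two models are complex-valued images on the same pixel grid and that they are combined by pixel-wise summation. The function `resultant_m3d_image` and both toy models are hypothetical stand-ins; in particular, `toy_higher_order_model` merely imitates the role of a trained ML model.

```python
from typing import Callable

import numpy as np

# A "model" here is any callable that maps a mask pattern to a complex image.
MaskModel = Callable[[np.ndarray], np.ndarray]


def resultant_m3d_image(mask_pattern: np.ndarray,
                        lower_order_model: MaskModel,
                        higher_order_model: MaskModel) -> np.ndarray:
    """Combine lower and higher order components (assumed here to be a sum)."""
    lower = lower_order_model(mask_pattern)
    higher = higher_order_model(mask_pattern)
    if lower.shape != higher.shape:
        raise ValueError("both components must share the same pixel grid")
    return lower + higher


def toy_lower_order_model(mask_pattern: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: thin-mask-like field (1 = clear, 0 = absorber)."""
    return (1.0 - mask_pattern).astype(complex)


def toy_higher_order_model(mask_pattern: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained ML model: small correction at edges."""
    edges = np.abs(np.diff(mask_pattern, axis=1, append=mask_pattern[:, -1:]))
    return 0.05j * edges


pattern = np.zeros((8, 8))
pattern[:, 3:5] = 1.0  # one vertical absorber line
m3d_image = resultant_m3d_image(pattern, toy_lower_order_model, toy_higher_order_model)
```

Keeping the two models behind the same callable interface reflects the statement that the division between lower order and higher order may be varied: either stand-in can be swapped without changing the combining step.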

[0073] A pupil 20A can be included with transmission optics 16Ac. In some embodiments, there can be one or more pupils before and/or after mask 18A. As described in further detail herein, pupil 20A can provide patterning of the light that ultimately reaches substrate plane 22A. An adjustable filter or aperture at the pupil plane of the projection optics may restrict the range of beam angles that impinge on the substrate plane 22A, where the largest possible angle defines the numerical aperture of the projection optics NA = n sin(Θmax), wherein n is the refractive index of the media between the substrate and the last element of the projection optics, and Θmax is the largest angle of the beam exiting from the projection optics that can still impinge on the substrate plane 22A.
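The numerical aperture relation can be evaluated directly, as in the short Python sketch below. The function name and the example values (a refractive index of 1.44 and a maximum angle of 70 degrees) are assumptions chosen for illustration only and do not come from this disclosure.

```python
import math


def numerical_aperture(refractive_index: float, theta_max_deg: float) -> float:
    """Evaluate NA = n * sin(theta_max), with theta_max given in degrees."""
    return refractive_index * math.sin(math.radians(theta_max_deg))


# Assumed example: an immersion medium with n = 1.44 and a 70 degree maximum angle.
example_na = numerical_aperture(refractive_index=1.44, theta_max_deg=70.0)
print(f"NA = {example_na:.2f}")
```

Because sin(Θmax) cannot exceed 1, the NA is bounded by the refractive index n of the medium between the last optical element and the substrate.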

[0074] In a lithographic projection apparatus, a source provides illumination (i.e., radiation) to a patterning device and projection optics direct and shape the illumination, via the patterning device, onto a substrate. In some instances, the source may provide patterning, directing, or shaping to the radiation. In some instances, patterning, directing, or shaping of radiation may occur between the source and the projection optics. The projection optics may include at least some of the components 14A, 16Aa, 16Ab and 16Ac. An aerial image (AI) is the radiation intensity distribution at substrate level. A resist model can be used to calculate the resist image from the aerial image, an example of which can be found in U.S. Patent Application Publication No. US 2009-0157630, the disclosure of which is hereby incorporated by reference in its entirety. The resist model is related to properties of the resist layer (e.g., effects of chemical processes which occur during exposure, post-exposure bake (PEB) and development). Optical properties of the lithographic projection apparatus (e.g., properties of the illumination, the patterning device and the projection optics) dictate the aerial image and can be defined in an optical model. Since the patterning device used in the lithographic projection apparatus can be changed, it is desirable to separate the optical properties of the patterning device from the optical properties of the rest of the lithographic projection apparatus including at least the source and the projection optics. Details of techniques and models used to transform a design layout into various lithographic images (e.g., an aerial image, a resist image, etc.), apply OPC using those techniques and models and evaluate performance (e.g., in terms of process window) are described in U.S. Patent Application Publication Nos. US 2008-0301620, 2007-0050749, 2007-0031745, 2008-0309897, 2010-0162197, and 2010-0180251, the disclosure of each of which is hereby incorporated by reference in its entirety.

[0075] One aspect of simulating a lithographic process is interaction of the radiation and the patterning device. The electromagnetic field of the radiation after the radiation passes the patterning device may be determined from the electromagnetic field of the radiation before the radiation reaches the patterning device and a function that characterizes the interaction. This function may be referred to as the mask transmission function (which can be used to describe the interaction by a transmissive patterning device and/or a reflective patterning device). A mask image herein represents the electromagnetic near field resulting from the radiation interaction with the mask transmission function.

[0076] The thin-mask approximation, also called the Kirchhoff boundary condition, is widely used to simplify the determination of the interaction of the radiation and the patterning device. The thin-mask approximation assumes that the thickness of the structures on the patterning device is very small compared with the wavelength and that the widths of the structures on the mask are very large compared with the wavelength. Therefore, the thin-mask approximation assumes the electromagnetic field after the patterning device is the multiplication of the incident electromagnetic field with the mask transmission function. However, as lithographic processes use radiation of shorter and shorter wavelengths, and the structures on the patterning device become smaller and smaller, the assumption of the thin-mask approximation can break down. For example, interaction of the radiation with the structures (e.g., edges between the top surface and a sidewall) because of their finite thicknesses (“mask 3D effect” or “M3D”) may become significant. Encompassing this scattering in the mask transmission function may enable the mask transmission function to better capture the interaction of the radiation with the patterning device. A mask transmission function under the thin-mask approximation may be referred to as a thin-mask transmission function. A mask transmission function encompassing M3D may be referred to as a M3D mask transmission function.
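The thin-mask approximation described above amounts to a pixel-wise product, which the following Python sketch illustrates. It assumes a one-dimensional mask, a unit-amplitude incident field at normal incidence, and an idealized absorber described by a single complex transmission value; the function names and parameters are hypothetical.

```python
import numpy as np


def thin_mask_transmission(absorber: np.ndarray,
                           absorber_amplitude: float = 0.0,
                           absorber_phase_rad: float = 0.0) -> np.ndarray:
    """Complex thin-mask transmission: 1 in clear areas, a*exp(i*phi) under absorber."""
    absorber_value = absorber_amplitude * np.exp(1j * absorber_phase_rad)
    return np.where(absorber.astype(bool), absorber_value, 1.0 + 0.0j)


def thin_mask_near_field(incident_field: np.ndarray, transmission: np.ndarray) -> np.ndarray:
    """Kirchhoff boundary condition: field after the mask = incident field * transmission."""
    return incident_field * transmission


# Assumed example: opaque absorber on pixels 2..4 of an 8-pixel mask.
absorber_map = np.array([0, 0, 1, 1, 1, 0, 0, 0])
incident = np.ones(absorber_map.size, dtype=complex)  # unit plane wave, normal incidence
near_field = thin_mask_near_field(incident, thin_mask_transmission(absorber_map))
```

The product has no dependence on absorber thickness or on neighboring edges, which is why scattering from finite-thickness structures has to be captured separately by a M3D mask transmission function.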

[0077] Figure 2 illustrates an exemplary flow chart for simulating lithography in a lithographic projection apparatus, according to an embodiment. Source model 31 represents optical characteristics (including radiation intensity distribution and/or phase distribution) of the source. Projection optics model 32 represents optical characteristics (including changes to the radiation intensity distribution and/or the phase distribution caused by the projection optics) of the projection optics. Design layout model 35 represents optical characteristics of a design layout (including changes to the radiation intensity distribution and/or the phase distribution caused by design layout 33), which is the representation of an arrangement of features on or formed by a patterning device. Design layout model 35 can be a near field model, a M3D model, etc. or can otherwise include a model of the illumination (including phase and intensity) which may pass through a patterning device (e.g., mask). Aerial image 36 can be simulated from source model 31, projection optics model 32, and design layout model 35. Aerial image 36 can represent an image projected by the projection optics, such as onto a photoresist, using the source and a patterning device (e.g., mask). Aerial image 36 can incorporate a mask 3D (M3D) effect. That is, aerial image 36 can include contributions due to lensing, diffraction, and other light-involved operations within the mask stack which cause deformations of light in the near field. Resist image 38 can be simulated from aerial image 36 using resist model 37. Simulation of lithography can, for example, predict contours and CDs in the resist image.

[0078] More specifically, source model 31 can represent the optical characteristics of the source that include, but are not limited to, numerical aperture settings, illumination sigma (σ) settings as well as any particular illumination shape (e.g., off-axis radiation sources such as annular, quadrupole, dipole, etc.). Projection optics model 32 can represent the optical characteristics of the projection optics, including aberration, distortion, one or more refractive indexes, one or more physical sizes, one or more physical dimensions, etc. Design layout model 35 can represent one or more physical properties of a physical patterning device, as described, for example, in U.S. Patent No. 7,587,704, which is incorporated by reference in its entirety. Design layout model 35 can output an intended pattern for a patterning device (e.g., mask), in addition to the aerial image 36. Design layout model 35 may be in communication with a mask model, such as a mask model which may output a mask design based on a design layout. The mask model may output a mask design for the design layout 33 which may include assist features, such as sub-resolution assist features (SRAFs), which may be mask patterns which are not printed on the device but which assist in printing of the design layout 33 on the device. The objective of the simulation is to accurately predict, for example, edge placement, aerial image intensity slope and/or CD, which can then be compared against an intended design (e.g., design layout 33). The intended design is generally defined as a pre-OPC design layout which can be provided in a standardized digital file format such as GDSII or OASIS or another file format.

[0079] From this design layout, one or more portions may be identified, which are referred to as “clips”. In an embodiment, a set of clips is extracted, which represents complicated patterns in the design layout (typically about 50 to 1000 clips, although any number of clips may be used). These patterns or clips represent small portions (i.e., circuits, cells or patterns) of the design and more specifically, the clips typically represent small portions for which particular attention and/or verification is needed. In other words, clips may be the portions of the design layout, or may be similar to or behave similarly to portions of the design layout, where one or more critical features are identified either by experience (including clips provided by a customer), by trial and error, or by running a full-chip simulation. Clips may contain one or more test patterns or gauge patterns.

[0080] An initial larger set of clips may be provided a priori by a customer based on one or more known critical feature areas in a design layout which requires particular image optimization. Alternatively, in another embodiment, an initial larger set of clips may be extracted from the entire design layout by using some kind of automated (such as machine vision) or manual algorithm that identifies the one or more critical feature areas.

[0081] In a lithographic projection apparatus, as an example, a cost function may be expressed as

$$CF(z_1, z_2, \ldots, z_N) = \sum_{p=1}^{P} w_p f_p^2(z_1, z_2, \ldots, z_N) \qquad \text{(Eq. 1)}$$

where $(z_1, z_2, \ldots, z_N)$ are $N$ design variables or values thereof. $f_p(z_1, z_2, \ldots, z_N)$ can be a function of the design variables $(z_1, z_2, \ldots, z_N)$, such as a difference between an actual value and an intended value of a characteristic for a set of values of the design variables $(z_1, z_2, \ldots, z_N)$. $w_p$ is a weight constant associated with $f_p(z_1, z_2, \ldots, z_N)$. For example, the characteristic may be a position of an edge of a pattern, measured at a given point on the edge. Different $f_p(z_1, z_2, \ldots, z_N)$ may have different weights $w_p$. For example, if a particular edge has a narrow range of permitted positions, the weight $w_p$ for the $f_p(z_1, z_2, \ldots, z_N)$ representing the difference between the actual position and the intended position of the edge may be given a higher value. $f_p(z_1, z_2, \ldots, z_N)$ can also be a function of an interlayer characteristic, which is in turn a function of the design variables $(z_1, z_2, \ldots, z_N)$. Of course, $CF(z_1, z_2, \ldots, z_N)$ is not limited to the form in Eq. 1. $CF(z_1, z_2, \ldots, z_N)$ can be in any other suitable form.
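The cost function can be illustrated with a short Python sketch. It assumes the weighted sum-of-squares form commonly used for such cost functions, with each term $f_p$ supplied as a callable. The two edge-placement terms, their linear responses to dose and mask bias, and the weight values are made up for illustration only.

```python
def cost_function(z, terms, weights):
    """Weighted sum of squared terms: sum_p w_p * f_p(z)**2.

    z       : sequence of design-variable values (z_1, ..., z_N)
    terms   : sequence of callables f_p(z) -> float
    weights : sequence of weight constants w_p
    """
    if len(terms) != len(weights):
        raise ValueError("need one weight per term")
    return sum(w * f(z) ** 2 for f, w in zip(terms, weights))

# Hypothetical terms: edge placement errors that depend on dose z[0]
# and a global mask bias z[1]; the linear responses are invented.
def epe_edge_a(z):
    return 2.0 * (z[0] - 1.0) + z[1]        # actual minus intended position

def epe_edge_b(z):
    return -1.0 * (z[0] - 1.0) + 0.5 * z[1]

# Edge A is assumed to have the narrower tolerance, hence the larger weight.
weights = [4.0, 1.0]
cf = cost_function([1.1, 0.2], [epe_edge_a, epe_edge_b], weights)
cf_nominal = cost_function([1.0, 0.0], [epe_edge_a, epe_edge_b], weights)
```

An optimizer would vary `z` within its allowed set to reduce `cf`. Raising a weight makes the optimizer trade off other terms in favor of the corresponding edge.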

[0082] The cost function may represent any one or more suitable characteristics of the lithographic projection apparatus, lithographic process or the substrate, for instance, focus, CD, image shift, image distortion, image rotation, stochastic variation, throughput, local CD variation, process window, an interlayer characteristic, or a combination thereof. In one embodiment, the design variables $(z_1, z_2, \ldots, z_N)$ comprise one or more selected from dose, global bias of the patterning device, and/or shape of illumination. Since it is the resist image that often dictates the pattern on a substrate, the cost function may include a function that represents one or more characteristics of the resist image. For example, $f_p(z_1, z_2, \ldots, z_N)$ can be simply a distance between a point in the resist image and an intended position of that point (i.e., edge placement error $EPE_p(z_1, z_2, \ldots, z_N)$). The design variables can include any adjustable parameter such as an adjustable parameter of the source, the patterning device, the projection optics, dose, focus, etc. The cost function may be used to determine, including iteratively, a patterning device (e.g., mask) and process conditions which bring the aerial image 36 into agreement with the design layout 33, including to within a tolerance threshold.

[0083] The lithographic apparatus may include components collectively called a “wavefront manipulator” that can be used to adjust the shape of a wavefront and intensity distribution and/or phase shift of a radiation beam. In an embodiment, the lithographic apparatus can adjust a wavefront and intensity distribution at any location along an optical path of the lithographic projection apparatus, such as before the patterning device, near a pupil plane, near an image plane, and/or near a focal plane. The wavefront manipulator can be used to correct or compensate for certain distortions of the wavefront and intensity distribution and/or phase shift caused by, for example, the source, the patterning device, temperature variation in the lithographic projection apparatus, thermal expansion of components of the lithographic projection apparatus, etc. Adjusting the wavefront and intensity distribution and/or phase shift can change values of the characteristics represented by the cost function. Such changes can be simulated from a model or actually measured. The design variables can include parameters of the wavefront manipulator.

[0084] The design variables may have constraints, which can be expressed as $(z_1, z_2, \ldots, z_N) \in Z$, where $Z$ is a set of possible values of the design variables. One possible constraint on the design variables may be imposed by a desired throughput of the lithographic projection apparatus. Without such a constraint imposed by the desired throughput, the optimization may yield a set of values of the design variables that are unrealistic. For example, if the dose is a design variable, without such a constraint, the optimization may yield a dose value that makes the throughput economically impossible. However, the usefulness of constraints should not be interpreted as a necessity. For example, the throughput may be affected by the pupil fill ratio. For some illumination designs, a low pupil fill ratio may discard radiation, leading to lower throughput. Throughput may also be affected by the resist chemistry. Slower resist (e.g., a resist that requires a higher amount of radiation to be properly exposed) leads to lower throughput.

[0085] As used herein, the term “process model” means a model that includes one or more models that simulate a patterning process. For example, a process model can include any combination of: an optical model (e.g., that models a lens system/projection system used to deliver light in a lithography process and may include modelling the final optical image of light that goes onto a photoresist), a resist model (e.g., that models physical effects of the resist, such as chemical effects due to the light), an optical proximity correction (OPC) model (e.g., that can be used to make masks or reticles and may include sub-resolution assist features (SRAFs), etc.).

[0086] As used herein, the term “concurrently” means that two or more things are occurring at approximately, but not necessarily exactly, the same time. For example, varying a pupil design concurrently with a mask pattern can mean making a small modification to a pupil design, then making a small adjustment to a mask pattern, and then another modification to the pupil design, and so on. However, the present disclosure contemplates that in some parallel processing applications, concurrency can refer to operations occurring at the same time, or having some overlap in time.

[0087] The present disclosure provides apparatuses, methods and computer program products which, among other things, relate to modifying or optimizing features of a lithography apparatus in order to increase performance and manufacturing efficiency. The features that can be modified can include an optical spectrum of light used in the lithography process, a mask, a pupil, etc. Any combination of these features (and possibly others) can be implemented in order to improve, for example, a depth of focus, a process window, a contrast, or the like, of a lithography apparatus. In some embodiments, modification of one feature affects the other features. In this way, to achieve the desired improvements, multiple features can be concurrently modified/varied, as described below.

[0088] Figure 3 depicts an exemplary method 300 for simulating a lithography process using a M3D mask image model, the M3D mask image generated according to an embodiment of the present disclosure. The operations of method 300 presented below are intended to be illustrative. In some embodiments, method 300 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of method 300 are illustrated in Figure 3 and described below is not intended to be limiting. In some embodiments, one or more portions of method 300 may be implemented (e.g., by simulation, modeling, etc.) in one or more processing devices (e.g., one or more processors). The one or more processing devices may include one or more devices executing some or all of the operations of method 300 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 300, for example.

[0089] At an operation 302, a design layout is obtained. The design layout may be the design layout 33 of Figure 2. The design layout may be in the form of a polygon-based hierarchical data file in the GDS or OASIS format. The design layout may be a mask pattern, that is, a pattern which is to be printed on a patterning device for projection on a substrate prepared for patterning (e.g., a resist). The design layout may include information about multiple layers. The design layout may include information about three-dimensional shapes of features contained in the design layout (e.g., both areal and depth information). The design layout may correspond to a continuous transmission mask (CTM) optimization, a polygon optimization, a Manhattanized optimization, etc. The design layout may include one or more assist features.

[0090] At an operation 304, a thin mask image for the mask pattern of the design layout may be created. The thin mask image may be created by any appropriate means, such as by use of a transmission function, including an effective transmission function or any appropriate approximation thereof. Other representations of the design layout (e.g., in addition to or instead of the thin mask image) may also be created. In some embodiments, one or more representations corresponding to various features of the design layout may be created. The representations may be created based on the design layout (e.g., from GDSII information) or based on the thin mask image (e.g., based on selecting patterns present in the thin mask image generated for the design layout). For example, an areal representation of the mask pattern may be generated, where the areal representation may be a two-dimensional map of the areas of the mask (or areas of the thin mask image) which are absorptive and transmissive. The areal representation may have binary values, such as 0 for transmissive regions and 1 for absorptive regions (e.g., regions of mask features). In another example, a representation of the horizontal lines of the mask pattern may be created. The representation of the horizontal lines may be a two-dimensional map of the horizontal (e.g., substantially along a major axis of the design layout) edges of the absorptive features of the design layout (or of the horizontal lines present in the thin mask image). The representation of the horizontal lines may have binary values, such as 0 for the absence of a horizontal edge and 1 for the presence of a horizontal edge. Likewise, a representation of the vertical lines of the mask pattern (from either the design layout or based on the thin mask image) may be created. The horizontal and vertical axes of the design layout may be defined relative to the design layout, the primary axes, the lithography apparatus, etc. The horizontal and vertical axes may be orthogonal. The horizontal and vertical axes may be defined in the plane of the patterning device (e.g., perpendicular to the angle of incident illumination).
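The areal, horizontal-edge, and vertical-edge representations of operation 304 can be sketched from a rasterized mask pattern as follows. This is one possible construction, assuming a pixel grid in which rows run along the horizontal axis and an edge is marked wherever the absorber value changes between neighbouring pixels. The function name `mask_representations` and the example clip are hypothetical.

```python
import numpy as np

def mask_representations(absorber):
    """Return (areal, horizontal_edges, vertical_edges) binary maps.

    absorber : 2D array, 1 for absorptive pixels and 0 for transmissive ones.
    An edge pixel is marked where the absorber value changes between
    neighbouring pixels; rows are taken as the horizontal direction.
    """
    areal = (np.asarray(absorber) > 0).astype(np.uint8)
    h_edges = np.zeros_like(areal)
    v_edges = np.zeros_like(areal)
    # A horizontal edge separates vertically adjacent rows.
    h_edges[:-1, :] = areal[:-1, :] != areal[1:, :]
    # A vertical edge separates horizontally adjacent columns.
    v_edges[:, :-1] = areal[:, :-1] != areal[:, 1:]
    return areal, h_edges, v_edges

# Hypothetical 6x6 clip containing one 2x3 absorber rectangle.
clip = np.zeros((6, 6), dtype=np.uint8)
clip[2:4, 1:4] = 1
areal, h_edges, v_edges = mask_representations(clip)
```

All three maps share the grid of the thin mask image, so they can be stacked as input channels for a downstream model.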

[0091] At an operation 306, a M3D mask image of the mask pattern is generated based on a corresponding thin mask image. A mask image may be represented in the form of a two-dimensional image (e.g., having pixels along two planar dimensions) in which pixel values may correspond to an intensity of an electric field. The mask image may be converted to an EM field (e.g., an electric field having vector values along two planar dimensions) and vice versa. The M3D mask image may be generated based on the corresponding thin mask image and one or more representations of the design layout, such as the areal representation, the representation of the horizontal lines, and the representation of the vertical lines, as described above. According to an embodiment of the present disclosure, the M3D mask image may be generated (e.g., approximated) by use of a machine learning model, such as a neural network, convolutional neural network, deep convolutional neural network, etc. The M3D mask image may be generated by an ensemble of machine learning models, including any appropriate combinations thereof. The M3D mask image is generated based on a first sub-model, which produces lower order components of the M3D mask image, and based on a second sub-model, which produces higher order components of the M3D mask image, the outputs of the first sub-model and the second sub-model being combined to create a resultant M3D mask image. The first sub-model and the second sub-model are described below in conjunction with Figure 4.

[0092] A M3D mask image may be a mask image which accounts for, at least partially, three-dimensional effects in the mask. Three-dimensional effects may include effects which are caused by a non-negligible (with respect to a wavelength of transmitted light) thickness, which may include intensity and phase effects. As used herein “thin mask” may describe an object (e.g., approximation, transmission-function, mask image, etc.) for which a mask is approximated as a perfectly transmissive (e.g., for void areas) and perfectly absorptive (e.g., for filled areas) shaper of illumination (such as coherent illumination emitted substantially perpendicular to the face of the mask). For a thin mask, the transmission function may be approximately a Kirchhoff scalar approximation of the mask pattern. As used herein “M3D” may describe an object (e.g., approximation, transmission-function, mask image, etc.) for which at least some of the interaction of illumination with edges of the mask pattern (e.g., edges between void and fill features) and diffractions (e.g., at interfaces within the mask stack) are accounted for. M3D objects may account for distortions caused by physical materials of the mask, which may not be accounted for in thin mask objects. M3D objects may be particularly valuable for modeling when mask features are of comparable (or smaller) dimensions as the wavelength of illumination used for patterning and for mask absorber features of thickness comparable to (or smaller than) the wavelength of illumination used for patterning. M3D objects may account for both intensity and phase effects of patterning devices (e.g., masks). The M3D mask image may be a transmission function, a near field approximation, etc.

[0093] According to an embodiment of the present disclosure, the M3D mask image may be generated based on one or more machine learning models trained based on M3D mask images generated in any other appropriate manner. For example, ground truth data may be generated and used to train the first sub-model or the second sub-model, as will be described below in reference to Figure 5. In one or more embodiments, a M3D mask image may be generated by methods including rigorous simulation, including by using a finite-difference time-domain (FDTD) algorithm, a rigorous coupled-wave analysis (RCWA), or any other appropriate method. A M3D mask image, such as for use in generating training data, may be generated by any appropriate method, such as described in U.S. Patent No. 7,703,069, U.S. Patent No. 8,352,885, U.S. Patent No. 8,589,829, U.S. Patent No. 8,938,694, U.S. Patent No. 9,372,957, U.S. Patent No. 10,198,549, U.S. Patent No. 10,839,131, and U.S. Patent No. 11,461,532, which are incorporated herein by reference in their entirety.

[0094] According to another embodiment of the present disclosure, the M3D mask image may also be generated based on a density map, which may be a map of feature density (such as areal feature density) for the design layout. The density map may be determined based on the design layout, the mask image, etc. The density may be determined based on the design layout, or obtained (e.g., from storage, from a model, etc.) with or based on the design layout.

[0095] In an operation 308, an optical model of a lithography tool (e.g., an exposure tool) and a resist model (or another process model) are obtained. The lithography process may be any type of lithography process, including UV, EUV, etc. The lithography process may be characterized by the design layout and a set of exposure settings. The set of exposure settings may be instrumentation settings, material properties, customer settings, etc. and may include ranges, optimal values, stochastic effects, etc.

[0096] At an operation 310, an aerial image may be simulated using the optical model and the M3D mask image according to the embodiment. The aerial image may be the aerial image 36 of Figure 2.

[0097] At an operation 312, a resist pattern (or another process step) on the substrate prepared for patterning may be modeled, based on the aerial image. The resist pattern may be the resist image 38 of Figure 2.
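Operations 310 and 312 can be illustrated with a deliberately simplified sketch. It assumes fully coherent, scalar imaging through an ideal circular pupil and a constant-threshold resist model. These are stand-ins chosen for brevity, not the optical or resist models of the disclosure, and practical simulators typically use partially coherent imaging. The function names, the cutoff frequency, and the threshold value are hypothetical.

```python
import numpy as np

def simulate_aerial_image(mask_image, cutoff):
    """Coherent scalar imaging: low-pass filter the mask near field with an
    ideal circular pupil, then take the intensity |E|^2."""
    ny, nx = mask_image.shape
    fy = np.fft.fftfreq(ny)[:, None]   # spatial frequency, cycles per pixel
    fx = np.fft.fftfreq(nx)[None, :]
    pupil = (np.hypot(fx, fy) <= cutoff).astype(float)
    field = np.fft.ifft2(np.fft.fft2(mask_image) * pupil)
    return np.abs(field) ** 2

def simulate_resist_image(aerial_image, threshold):
    """Constant-threshold resist model: 1 where the intensity reaches the
    threshold, 0 elsewhere."""
    return (aerial_image >= threshold).astype(np.uint8)

# Hypothetical mask near field: a bright 8-pixel-wide line on a 32x32 grid.
mask_image = np.zeros((32, 32), dtype=complex)
mask_image[:, 12:20] = 1.0
aerial = simulate_aerial_image(mask_image, cutoff=0.15)
resist = simulate_resist_image(aerial, threshold=0.3)
```

In this arrangement the mask image passed to `simulate_aerial_image` is where a M3D mask image would replace a thin mask image, with the rest of the pipeline left unchanged.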

[0098] As described above, method 300 (and/or the other methods and systems described herein) is configured to simulate a lithographic process using a M3D mask image generated by a machine learning (ML) model, according to one or more embodiments of the present disclosure.

[0099] Figure 4 is an exemplary method 400 of M3D mask image modeling with a machine learning (ML) model component to determine the higher order component of the M3D mask image. In the illustrated embodiment of the present disclosure, the model 430 is operable to generate a M3D mask image from input of a thin mask image. The model 430 includes two sub-models, one configured to generate an approximation (e.g., linear component or lower order components) of a M3D mask image, and the other configured to generate the residual (e.g., non-linear components or higher order component) of the M3D mask image. As referred to herein, “residual” may be a difference between two images. The residual may be added to a first image to determine a second image, or determined based on a difference between two images. For example, for a ground truth M3D mask image A and an approximated (e.g., linear approximation) M3D mask image B, the residual may be computed as a two-dimensional version of A-B=C. In order to reconstruct a ground truth M3D mask image (e.g., A), a residual M3D mask image (e.g., C) may be added to an approximated M3D mask image (e.g., B), where A=B+C. The residual may be determined by a different means than the approximation — for example, in embodiments of the present disclosure, the approximation may be determined by a first sub-model and the residual may be determined by a second sub-model — which may allow the residual to account for effects of different orders, smaller sizes, etc. which may compensate for deficits in the approximation. The residual may represent smaller effects than accounted for in the approximation (e.g., of a higher order, of a smaller magnitude, etc.). By adjusting the division between the approximation and the residual (e.g., the ratio of B to C), such as in the training data, in some embodiments the model may be tuned to account for various effects in either sub-model.
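The relationship between the ground truth image A, the approximation B, and the residual C can be written out in a few lines of NumPy. The arrays below are random stand-ins rather than simulated mask images, and the function names are illustrative.

```python
import numpy as np

def compute_residual(ground_truth, approximation):
    """C = A - B, evaluated pixel by pixel."""
    return np.asarray(ground_truth) - np.asarray(approximation)

def reconstruct(approximation, residual):
    """A = B + C."""
    return np.asarray(approximation) + np.asarray(residual)

# Hypothetical complex-valued 4x4 mask images (random stand-ins).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))  # "ground truth"
B = 0.9 * A.real + 0.0j                                      # crude approximation
C = compute_residual(A, B)
A_rebuilt = reconstruct(B, C)
```

When the approximation comes from a fixed first sub-model, residuals computed this way are one natural choice of training target for the second sub-model.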

[00100] The model 430 may be an ensemble model. The model 430 may be a set (for example, a pair) of models. The model 430 may operate based on an input 402. The input 402 may be any type of representation of a mask pattern. The input 402 may include one or more representations 404 of a mask pattern. The representation 404 of a mask pattern may be a subset of the mask pattern, determined based on the mask pattern (e.g., extracted from the mask pattern), etc. The representation 404 of the mask pattern may be multiple different representations of the mask pattern, including different types of representations. The representation 404 of the mask pattern may include a thin mask image 406. For example, the representation 404 of the mask pattern may include an areal representation 408a (e.g., a representation of the area of the absorptive regions of the mask), a horizontal edge representation 408b (e.g., a representation of the horizontal edges of the absorptive regions of the mask), and a vertical edge representation 408c (e.g., a representation of the vertical edges of the absorptive regions of the mask). The input 402 may be fed to the model 430, including concurrently, sequentially, etc.

[00101] The model 430 may include a first sub-model 420 and a second sub-model 422, wherein “first” and “second” are arbitrary descriptors applied for ease of description and place no ordinal, size, or importance restrictions on said sub-models. In other areas of the present disclosure, the models may be referred to by other terms, including other ordinal terms (e.g., the first sub-model 420 may be referred to as a second model while the second sub-model 422 may be referred to as a first model). The first sub-model 420 and the second sub-model 422 may operate sequentially, in parallel, synchronously, asynchronously, etc. In some embodiments, the operations performed by the first sub-model 420 may be performed by an operation, such as producing lower order components from memory, in addition to or instead of a model. The first sub-model 420 and the second sub-model 422 may be independent models, that is, models which function without reference to one another, such as on their own. The output results from the two models can be combined to form a resultant M3D mask image.

[00102] In some embodiments, given a mask pattern, the first sub-model is operable to generate the lower order component of the M3D mask image and the second sub-model is operable to generate the residual (e.g., the higher order component) of the M3D mask image. In some embodiments, the first sub-model can generate the linear approximation of the M3D mask image only. In some other embodiments, the lower order output accounts for second or third order approximation as well, which may depend on the training data as described below. In some embodiments, the lower order component may account for the M3D effect of a single edge or isolated edges and area transmission, while the higher order component may account for the inter-edge M3D effect.

[00103] In some embodiments, the training data (first training data) for the first sub-model can be generated through rigorous simulation or a calibrated empirical model. For example, the first training data can be ground truth M3D mask images generated from rigorous simulation on first mask patterns with large CDs that cause minimal inter-edge M3D effect. The first mask patterns, used to generate the lower order component training data, may be spatially distant from each other.

[00104] In some embodiments, the training data (second training data) for the second sub-model can be generated through rigorous simulation or a calibrated empirical model. For example, the second training data can be ground truth M3D mask images generated from rigorous simulation on second mask patterns with significant inter-edge M3D effect. For example, the second mask patterns, used to generate the higher order component training data, may have smaller CDs and higher pattern density than the first mask patterns.

[00105] The first sub-model 420 may be a M3D mask image approximator. For example, the first sub-model 420 may be a linear model. The first sub-model 420 may be a physical model. The first sub-model 420 may determine an approximation 452 of the M3D mask image based on the input 402. The approximation 452 may be a linear approximation, such as described in U.S. Patent No. 7,703,069, U.S. Patent No. 8,352,885, U.S. Patent No. 8,589,829, U.S. Patent No. 8,938,694, U.S. Patent No. 9,372,957, U.S. Patent No. 10,198,549, U.S. Patent No. 10,839,131, and U.S. Patent No. 11,461,532, which are incorporated herein by reference in their entirety. The approximation 452 may be a first order approximation. The approximation 452 may be or include a second-order or third-order approximation. The approximation 452 may include various order components, where the order of the components included may depend on the training data selected to train the first sub-model 420. The approximation 452 may include various order components, where the order of components included may depend on the type of model selected for the first sub-model 420. For example, if the first sub-model is a mathematically linear model, it may not account for second order or higher components even if such components are present in its training data. The first sub-model 420 may be a machine learning model, a physical model, a machine learning model with physics-inspired terms, etc. The first sub-model 420 may be a trained model, including a model in which physical parameters (e.g., corresponding to optical effects) are trained.
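One hypothetical form a mathematically linear first sub-model could take is a sum of convolutions, in which each input map (areal, horizontal-edge, vertical-edge) is filtered with its own kernel. The sketch below is an assumption made for illustration and is not the method of the incorporated patents. The kernels are random placeholders that would in practice be fitted to rigorous simulation data, and circular boundary conditions are used for simplicity.

```python
import numpy as np

def circular_conv2d(image, kernel):
    """Circular 2D convolution via FFT; the kernel is zero-padded to the
    image size."""
    padded = np.zeros(image.shape, dtype=complex)
    ky, kx = kernel.shape
    padded[:ky, :kx] = kernel
    return np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded))

def linear_m3d_approximation(representations, kernels):
    """Sum of each input map convolved with its own kernel. The result is
    linear in the inputs, so it cannot express inter-edge coupling terms."""
    return sum(circular_conv2d(r, k) for r, k in zip(representations, kernels))

rng = np.random.default_rng(1)
# Placeholder kernels (assumption): one complex 3x3 kernel per input map.
kernels = [rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
           for _ in range(3)]
# Hypothetical 16x16 maps for a single rectangular absorber feature.
areal = np.zeros((16, 16)); areal[4:12, 6:10] = 1.0
h_edges = np.zeros((16, 16)); h_edges[[3, 11], 6:10] = 1.0
v_edges = np.zeros((16, 16)); v_edges[4:12, [5, 9]] = 1.0
approximation = linear_m3d_approximation([areal, h_edges, v_edges], kernels)
```

Scaling every input by a constant scales the output by the same constant, which is the sense in which such a model cannot capture second order or higher components regardless of its training data.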

[00106] The second sub-model 422 may be a M3D mask image residual model. The second sub-model 422 may be a machine learning (ML) model. The second sub-model 422 may generate residual 454 of the M3D mask image based on the input 402. The residual 454 may be or include higher order components of the M3D mask image. In some embodiments, the residual 454 may include components (for example, some second order components) where other components of the same order (for example, e2e effects) may be included in the approximation 452.

[00107] The residual 454 may include non-linear diffraction effects associated with M3D. The second sub-model 422 may be a neural network. The second sub-model 422 may be a convolutional neural network. The second sub-model 422 may be a deep convolutional neural network. The second sub-model 422 may have one or more encoding steps. The second sub-model 422 may have one or more decoding steps. The second sub-model 422 may have a bottleneck step. The second sub-model 422 may have one or more nonlinear activation layers. The second sub-model 422 may contain batch normalization. The second sub-model 422 may include downsampling. The second sub-model 422 may be a fully convolutional neural network. The second sub-model 422 may be any appropriate ML model, including an ensemble of machine learning models.

[00108] The model 430 may generate an output 440 which is a modeled M3D mask image 460. The output 440 may be a combination (e.g., a sum) of the approximation 452 and the residual 454 of the M3D mask image. The approximation 452 of the M3D mask image may not fully approximate the M3D mask image. That is, the approximation 452 of the M3D mask image may not account for higher order effects present in a “ground truth” 470 M3D mask image. The residual 454 of the M3D mask image may account for higher order effects in the ground truth 470 M3D mask image, but may not account for (or model efficiently) linear effects in the ground truth 470 M3D mask image. By using both the first sub-model 420 and the second sub-model 422, the output 440 of the modeled M3D mask image 460 may be closer to the ground truth 470 M3D mask image than an output obtained by either sub-model alone. The output 440 of the model 430 may be obtained more quickly (e.g., based on use of the first sub-model 420 and the second sub-model 422) than the ground truth 470 data may be obtained. The ground truth 470 data for the M3D mask image may be generated using any algorithm, method, process, etc. that is well known in the art, such as using a rigorous 3D electromagnetic field (EMF) solver, FDTD, RCWA, and any methods described in U.S. Patent No. 7,703,069, U.S. Patent No. 8,352,885, U.S. Patent No. 8,589,829, U.S. Patent No. 8,938,694, U.S. Patent No. 9,372,957, U.S. Patent No. 10,198,549, U.S. Patent No. 10,839,131, and U.S. Patent No. 11,461,532, which are incorporated herein by reference in their entirety. The ground truth 470 data for the M3D mask image may be obtained through rigorous simulation (e.g., by the above-described methods), which may be a relatively slower or more computationally intensive process than the use of the model 430.

[00109] In some embodiments, using the model 430 to generate the approximation 452 and the residual 454 may be faster than using rigorous simulation to obtain the approximation 452, the residual 454, or a combination thereof of the M3D mask image. In some embodiments, using the second sub-model 422 to obtain the residual 454 may be more accurate than using a single model to obtain an approximation of the M3D mask image. In some embodiments, using the second sub-model 422 to obtain the residual 454 and the first sub-model 420 to obtain the approximation 452 may be more accurate than generating a M3D mask image approximation based on a single model. The combination of the first sub-model 420 and the second sub-model 422 may be more predictable, more accurate, experience less erosion, be more accurate for non-periodic mask patterns, etc.

[00110] In some embodiments, the model 430 may generate metrology mask contours 472, such as instead of or in addition to a M3D mask image. In some embodiments, the model 430 may generate M3D mask images (such as the approximation 452 and the residual 454) which account for fabrication effects of masks in the M3D mask image. For example, shot noise, quantum effects, etc. may cause fabricated masks to vary from mask patterns. The model 430 may be trained (such as by training based on measured SEM images or metrology mask contours therefrom) to account for variations in masks-as-fabricated based on mask patterns. In some embodiments, the model 430 (or another model) may generate metrology mask contours 472 such as would be found in SEM images of masks fabricated based on a given mask pattern. The model 430 may account for the effects of fabrication on masks when determining an M3D mask image. In some embodiments, the model 430 may operate upon metrology mask contours 472 for a given mask pattern, such as instead of or in addition to input 402.

[00111] Figure 5 is a schematic overview 500 of training a machine learning (ML) model to determine M3D mask image residuals for M3D mask image modeling. The first sub-model 420 may be trained based on a set of training data containing various mask patterns and corresponding ground truth 570 M3D mask images. The mask patterns of the set of training data may be representations of mask patterns, such as input 402, representation 404, thin mask image 406, areal representation 408a, horizontal edge representation 408b, vertical edge representation 408c (as previously described in reference to Figure 4), or any other appropriate mask pattern or mask pattern representation. The set of training data may be input to a M3D mask image approximation training module 520. The training module 520 may train the first sub-model 420 in any appropriate manner, such as through supervised training, through adjustment of linear relationships or parameters, etc. In some embodiments, the first sub-model 420 may have a predefined model form and the model parameters are adjusted or trained through the training process. In some other embodiments, the first sub-model 420 may have both the model form and model parameters trained. The first sub-model 420 may be trained until a training criterion is reached, such as a number of iterations, a variance on a validation set, an accuracy level, training loss threshold, etc. Once the first sub-model 420 is trained, the first sub-model 420 may be used to generate an approximation 452 of one or more M3D mask images. The ground truth 570 M3D mask images which are part of the set of training data used to train the first sub-model 420 may be generated by any appropriate method. The ground truth 570 M3D mask images may be generated by rigorous simulation. The ground truth 570 M3D mask images may be effective near fields. Effective near fields (e.g., electric fields generated by transmitted illumination) may be obtained by applying an effective transmission function, that is, an approximation of the transmission function or near field, such as described in U.S. Patent 10,839,131, which is herein incorporated by reference in its entirety. The ground truth 570 M3D mask images may be obtained through steps which include approximation. That is, the ground truth 570 M3D mask images may be effective near fields, or approximated in any other appropriate manner. The set of training data used to train the first sub-model 420 may contain patterns which are spatially distant from each other, such that higher order effects are minimized in a M3D mask image for the mask patterns.

[00112] The second sub-model 422 may be trained based on a set of training data containing various mask patterns and corresponding ground truth 580 M3D mask image residuals. The ground truth M3D mask images may be used to generate the ground truth 580 M3D mask image residuals, but are different therefrom. Herein, for training purposes, “ground truth” refers to a best (as is reasonable in time, expense, etc.) estimate of a quantity which is obtained and used to train a model to reproduce said quantity. The mask patterns of the set of training data may be the same or different mask patterns than included in the set of training data used to train the first sub-model 420. The mask patterns of the set of training data may be representations of mask patterns, such as input 402, representation 404, thin mask image 406, areal representation 408a, horizontal edge representation 408b, vertical edge representation 408c (as previously described in reference to Figure 4), or any other appropriate mask pattern or mask pattern representation. The set of training data used to train the second sub-model 422 may contain patterns which are closely spaced, such that higher order effects are expected to be present (for example, are maximized) in a M3D mask image for the mask patterns. For example, the mask patterns for the second sub-model 422 may have smaller CDs and higher pattern density than the mask patterns used to train the first sub-model 420.

[00113] The ground truth 580 M3D mask image residuals may be generated based on the first sub-model 420. For example, the ground truth 580 M3D mask image residuals may be generated based on a difference between ground truth 570 M3D mask images and approximations 452 of the M3D mask image, such as generated by the first sub-model 420. A residual determiner 550 may operate based on a set of ground truth 570 M3D mask images and corresponding approximations 452 of the M3D mask images to generate ground truth 580 M3D mask image residuals. The set of training data may be input to a M3D mask image residual training operation 522. The training operation 522 may train the second sub-model 422 in any appropriate manner, such as through use of a loss function, use of an optimization function, use of a cost function, through gradient descent, backpropagation, etc. The training operation 522 may train parameters of the second sub-model 422. The training operation 522 may train a configuration of the second sub-model 422. For example, the training operation 522 may determine a number of layers, a size of a convolution, etc. of the second sub-model 422. The training operation 522 may determine a dimensionality of an output of the second sub-model 422. For example, the training operation 522 may determine a size of the output of the second sub-model 422 to be the size of the input to the second sub-model 422 (which may be the size of the ground truth 570 M3D mask image). The training operation 522 may determine a number of residual blocks to be included in the second sub-model 422.
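By way of a non-limiting illustration, the difference operation attributed to the residual determiner 550 may be sketched in Python as follows. The function name `residual_determiner`, the nested-list image layout, and the complex-valued sample pixels are assumptions made for this sketch only and are not part of the disclosure.

```python
def residual_determiner(ground_truth, approximation):
    """Return the element-wise difference ground_truth - approximation.

    Both images are assumed to be equally sized nested lists of
    (possibly complex) pixel values; this layout is hypothetical.
    """
    if len(ground_truth) != len(approximation):
        raise ValueError("images must have the same number of rows")
    residual = []
    for gt_row, ap_row in zip(ground_truth, approximation):
        if len(gt_row) != len(ap_row):
            raise ValueError("rows must have the same length")
        residual.append([g - a for g, a in zip(gt_row, ap_row)])
    return residual


# Hypothetical 2x2 complex-valued images (values chosen arbitrarily).
example_ground_truth = [[1.0 + 0.5j, 0.8 + 0.0j], [0.2 - 0.1j, 0.0j]]
example_approximation = [[0.9 + 0.4j, 0.8 + 0.0j], [0.25 + 0.0j, 0.0j]]
example_residual = residual_determiner(example_ground_truth, example_approximation)
```

In this sketch, adding the residual back to the approximation reproduces the ground truth image to within floating-point precision, which is the property that lets the residuals serve as training targets for the second sub-model.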

[00114] The training operation 522 may train the second sub-model 422 in any appropriate manner, such as through supervised training. The second sub-model 422 may additionally (or instead) be based on a pre-trained model, which the training operation 522 may adjust. The second sub-model 422 may additionally (or instead) be based on a pre-selected architecture, which may or may not be pre-trained, and which the training operation 522 may adjust. The second sub-model 422 may be trained until a training criterion is reached, such as a number of iterations, a variance on a validation set, an accuracy level, training loss threshold, etc. Once the second sub-model 422 is trained, the second sub-model 422 may be used to generate a residual 454 of one or more M3D mask images. The second sub-model 422 may be trained in concert with the training of the first sub-model 420. The output of the second sub-model 422 may be evaluated (e.g., for detection of a termination criterion) against the ground truth 580 M3D mask image residuals. The output of the second sub-model 422 may be added to the output of the first sub-model 420 and the sum (e.g., the resultant M3D mask image) may be evaluated (e.g., for detection of a termination criterion) against the ground truth 570 M3D mask images.
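The interplay of a training loss threshold and an iteration limit, as mentioned above, may be sketched with a deliberately small stand-in model. The one-parameter model, the function name `fit_scale`, the learning rate, and the sample data are all assumptions for illustration; the second sub-model 422 itself may be a far larger network trained by other means.

```python
def fit_scale(inputs, targets, lr=0.1, max_iters=1000, loss_threshold=1e-10):
    """Fit a single weight w so that w * x approximates each target.

    Training stops when the mean squared error falls to or below
    loss_threshold, or when max_iters iterations have been run.
    Returns (w, final_loss, iterations_run).
    """
    w = 0.0
    n = len(inputs)
    loss = float("inf")
    iteration = 0
    for iteration in range(1, max_iters + 1):
        preds = [w * x for x in inputs]
        loss = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n
        if loss <= loss_threshold:
            break  # training criterion reached: loss threshold
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (p - t) * x for p, t, x in zip(preds, targets, inputs)) / n
        w -= lr * grad
    return w, loss, iteration


# Hypothetical data in which the target is exactly twice the input.
w, final_loss, iterations = fit_scale([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Here the loop ends on the loss threshold well before the iteration limit; with a tighter threshold or a smaller learning rate the iteration limit would become the operative criterion instead.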

[00115] The ground truth 580 M3D mask image residuals which are part of the set of training data used to train the second sub-model 422 may be generated by any appropriate method. The ground truth 580 M3D mask image residuals may be generated by rigorous simulation, including rigorous simulation of the ground truth 570 M3D mask images. The ground truth 580 M3D mask image residuals may be effective near fields or based on effective near fields.

[00116] Figure 6 illustrates an exemplary method 600 for determining a M3D mask image using a machine learning (ML) model, according to an embodiment of the present disclosure. The operations of method 600 presented below are intended to be illustrative. In some embodiments, method 600 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of method 600 are illustrated in Figure 6 and described below is not intended to be limiting. In some embodiments, one or more portions of method 600 may be implemented (e.g., by simulation, modeling, etc.) in one or more processing devices (e.g., one or more processors). The one or more processing devices may include one or more devices executing some or all of the operations of method 600 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 600, for example.

[00117] At an operation 602, a mask pattern is obtained. The mask pattern may be any appropriate pattern, such as a GDSII pattern, a design layout, a target design, a continuous transmission mask (CTM) pattern, a Manhattanized pattern, a set of metrology mask contours, a measured mask pattern determined based on a fabricated mask, etc. The mask pattern may be a representation of the mask pattern. The representation of the mask pattern may be a thin mask image. The representation of the mask pattern may include a representation of the areal shape of the mask pattern. The representation of the mask pattern may include a representation of the horizontal edges of the mask pattern. The representation of the mask pattern may include a representation of the vertical edges of the mask pattern. The representation of the mask pattern may include any appropriate representation, including a representation identifying features (e.g., features at the critical dimension) for feature engineering. The mask pattern may include a density map. The density map may be a measure of density of features on the mask pattern.
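As a hedged illustration of the representations named above, the following Python sketch derives horizontal edge, vertical edge, and density maps from a binary areal raster. The disclosure does not specify how these representations are computed; the neighbor-difference edge rule, the tile-based density measure, and the names `edge_maps` and `density_map` are assumptions of this sketch.

```python
def edge_maps(mask):
    """Return (horizontal_edges, vertical_edges) for a binary raster.

    A pixel is marked in the horizontal map when the mask value changes
    between it and the pixel below (an edge running along x), and in the
    vertical map when the value changes between it and the pixel to its
    right (an edge running along y).
    """
    h, w = len(mask), len(mask[0])
    horiz = [[0] * w for _ in range(h)]
    vert = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if y + 1 < h and mask[y][x] != mask[y + 1][x]:
                horiz[y][x] = 1
            if x + 1 < w and mask[y][x] != mask[y][x + 1]:
                vert[y][x] = 1
    return horiz, vert


def density_map(mask, tile):
    """Return the fraction of covered pixels in each tile x tile block."""
    h, w = len(mask), len(mask[0])
    if h % tile or w % tile:
        raise ValueError("mask dimensions must be multiples of the tile size")
    out = []
    for ty in range(0, h, tile):
        row = []
        for tx in range(0, w, tile):
            covered = sum(mask[y][x]
                          for y in range(ty, ty + tile)
                          for x in range(tx, tx + tile))
            row.append(covered / (tile * tile))
        out.append(row)
    return out


# Hypothetical areal representation: a 2x2 feature centered in a 4x4 raster.
areal = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
horizontal_edges, vertical_edges = edge_maps(areal)
density = density_map(areal, 2)
```

The areal raster and the derived maps could then be stacked as separate input channels for a model, although the channel arrangement is likewise an assumption here.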

[00118] At an operation 604, an approximation of the M3D mask image for the mask pattern is obtained. The approximation may be a linear approximation. The approximation may be a lower resolution M3D mask image, which may be of a resolution between a thin mask image and a ground truth M3D mask image. The approximation may be limited in resolution by computational cost (e.g., processing power, time constraints, etc.). The approximation may be obtained from a physical model. The approximation may be obtained from a trained model, such as described in reference to Figures 4 and 5. The approximation may be obtained from a model operating on the mask pattern or representations of the mask pattern. The approximation may be obtained from a measured lithography result, a measured mask transmission function, etc.

[00119] At an operation 606, a residual of the M3D mask image for the mask pattern is generated by using an exemplary high order M3D mask model. The residual may be a higher order approximation of the M3D mask image. The residual may represent a difference between the approximation and a ground truth M3D mask image. The residual may be of a smaller size (e.g., in intensity, in contributing pixels, etc.) than the approximation. For example, the approximation may represent 90% of a ground truth M3D mask image, while the residual may contribute to modeling the remaining 10% of a ground truth M3D mask image. The residual may have positive and negative contributions (e.g., in intensity, in phase, etc.) to the approximation of the M3D mask image.

[00120] The residual of the M3D mask image may be generated in any appropriate manner, such as those described in reference to the second sub-model 422 of Figures 4 and 5. The residual of the M3D mask image may be generated by a ML model. The residual of the M3D mask image may be generated by the ML model operating on the mask pattern. The residual of the M3D mask image may be generated by the ML model operating on representation(s) of the mask pattern. The residual of the M3D mask image may be generated by a neural network, a convolutional neural network, a deep convolutional neural network, etc.

[00121] At an operation 608, the approximation of the M3D mask image and the residual of the M3D mask image may be combined. The combination may be additive. The combination may be multiplicative. The combination may be any appropriate mathematical operation. The combination may, in some embodiments, be convolutional. The approximation of the M3D mask image and the residual of the M3D mask image may have the same dimensionality. The approximation of the M3D mask image and the residual of the M3D mask image may have different dimensionality. The approximation of the M3D mask image and the residual of the M3D mask image may be combined using a ratio, including a ratio which may be trained to normalize or otherwise adjust relative contributions of the approximation and the residual. The combination of the approximation of the M3D mask image and the residual of the M3D mask image may be a resultant M3D mask image.
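The combination of operation 608 may be sketched, purely for illustration, as an element-wise operation on two equally sized images. The function name `combine`, the `mode` and `ratio` parameters, and the particular reading of "multiplicative" (the residual acting as a per-pixel correction factor) are assumptions of this sketch; the disclosure leaves the exact form of each combination open.

```python
def combine(approximation, residual, mode="additive", ratio=1.0):
    """Combine an approximation and a residual image pixel by pixel.

    mode="additive":       result = approximation + ratio * residual
    mode="multiplicative": result = approximation * residual
                           (residual treated as a correction factor)
    ratio is a hypothetical weight on the residual's contribution.
    """
    if mode not in ("additive", "multiplicative"):
        raise ValueError("unsupported combination mode: " + mode)
    result = []
    for ap_row, rs_row in zip(approximation, residual):
        if mode == "additive":
            result.append([a + ratio * r for a, r in zip(ap_row, rs_row)])
        else:
            result.append([a * r for a, r in zip(ap_row, rs_row)])
    return result


# Hypothetical one-row images.
resultant = combine([[1.0, 2.0]], [[0.5, -0.5]])              # [[1.5, 1.5]]
weighted = combine([[1.0, 2.0]], [[0.5, -0.5]], ratio=0.5)    # [[1.25, 1.75]]
```

The negative residual pixel in the example reflects that a residual may contribute with either sign; the same function accepts complex-valued pixels without modification.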

[00122] At an operation 610, the resultant M3D mask image may be output. The resultant M3D mask image may be output to storage. The resultant M3D mask image may be output to a process model, for example, a lithography process model, an optical model, a resist model, etc. The resultant M3D mask image may be output in any appropriate manner. The resultant M3D mask image may be compared against a ground truth M3D mask image, such as during training, to determine if further training is needed, etc. The resultant M3D mask image may be compared to results of a process, such as lithography, development, etch, etc., including using a process model.

[00123] As described above, method 600 (and/or the other methods and systems described herein) is configured to determine a M3D mask image using a machine learning (ML) model.

[00124] Figure 7 illustrates an exemplary method 700 for training a machine learning (ML) model to generate M3D mask image residuals. The operations of method 700 presented below are intended to be illustrative. In some embodiments, method 700 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of method 700 are illustrated in Figure 7 and described below is not intended to be limiting. In some embodiments, one or more portions of method 700 may be implemented (e.g., by simulation, modeling, etc.) in one or more processing devices (e.g., one or more processors). The one or more processing devices may include one or more devices executing some or all of the operations of method 700 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 700, for example.

[00125] At an operation 702, a training data set is obtained. The training data set may contain multiple mask patterns. The mask patterns may be any appropriate pattern, such as previously described in reference to the operation 602 of Figure 6. The mask patterns may be representations of mask patterns, such as previously described in reference to Figures 4-6. The training data set may contain M3D mask images corresponding to the mask patterns. The M3D mask images may be ground truth M3D mask images. As used herein, “ground truth” may refer to a best, to within reason, measurement, approximation, or model of a quantity. For example, ground truth quantities may be limited by instrument resolution (e.g., for measured quantities), computational power or time (e.g., for modeled or derived quantities), uncertainties in various parameters (e.g., for modeled quantities, such as generated by an optical model), etc. A ground truth quantity may represent a best guess, approximation, etc. for that quantity. The accuracy of the ground truth may be limited, especially for quantities which cannot be directly measured. A ground truth quantity may be a “gold standard” quantity or other equivalent terms may be used. Using “ground truth” does not require that the term so described is 100% accurate, only that the quantity is a best or reasonable approximation, such as may be determined by a reasonable expenditure of resources, which is not to say that a ground truth obtained by a large expenditure of resources is disclaimed.

[00126] At an operation 704, residuals of the M3D mask images for the training data set are obtained. The residuals of the M3D mask images may be obtained based on a trained approximation model. The residuals of the M3D mask images may be obtained based on a physical model. The residuals of the M3D mask images may represent a difference between an approximation of the M3D mask images and the ground truth M3D mask images. The residuals of the M3D mask images may be obtained using any appropriate approximation of the M3D mask images. The approximation of the M3D mask images may be a linear approximation. The residuals of the M3D mask images may be ground truth residuals of the M3D mask images.

[00127] The residuals of the M3D mask images may be desired to be as small as possible. That is, it may be desired that the approximation of the M3D mask image is as close to the ground truth M3D mask image as reasonably (e.g., based on model complexity, computational time, etc.) possible. In such an embodiment, the process which generates the approximation may be highly trained, computationally rigorous, etc. A model (e.g., the first sub-model) may be trained based on a relatively small threshold for a training loss in order to provide an accurate-as-possible approximation of the M3D mask image. In some embodiments, the residuals of the M3D mask images may not be desired to be as small as possible. That is, it may be desired that the residual of the M3D mask image accounts for a larger portion of the M3D mask image than is strictly required. This may be the case for a ML model (e.g., the second sub-model) which is capable of accounting for at least some linear effects in the M3D mask image model. In such an embodiment, the process which generates the approximation may be purposefully trained to a less exacting standard than in the above-described embodiment. In some embodiments, the division between what is accounted for in the approximation and what is accounted for in the residual of the M3D mask image may be adjustable.

[00128] At an operation 706, a model is trained to obtain the approximation of a M3D mask image based on an input mask pattern. The input may be one or more representations of a mask pattern. The model is trained based on the set of training data obtained in the operation 702. The model may be trained in any appropriate manner, such as those described in reference to the training module 520 of Figure 5. Alternatively, or additionally, a pre-trained model, including a physical model, may be obtained.

[00129] At an operation 708, residuals of the M3D mask image are determined. The residuals of the M3D mask image may be determined based on the output of the trained model trained in the operation 706 or based on the output of any appropriate model. The residuals of the M3D mask image may be ground truth M3D mask image residuals. The residuals of the M3D mask image may be the difference between an appropriate approximation of the M3D mask image and any appropriate ground truth M3D mask image. The obtained residuals, which may be ground truth M3D mask image residuals, may be included in a training data set for training the ML model.

[00130] At an operation 710, a ML model is trained to generate residuals of a M3D mask image for an input mask pattern. The input may be one or more representations of a mask pattern. The ML model is trained based on the set of training data obtained in the operation 708. The model may be trained in any appropriate manner, such as those described in reference to the training operation 522 of Figure 5. Alternatively, or additionally, a pre-trained model may be obtained, including a pre-trained model which is further trained. The trained ML model may be output, including to storage. The architecture of the ML model and parameters thereof may be stored. The trained ML model may be used to determine one or more residual M3D mask images based on input mask patterns (or representations of mask patterns), such as during a lithography simulation.
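The sequence of operations 706 through 710 may be illustrated with a toy, scalar stand-in in Python. Every element of this sketch is an assumption made for illustration: the one-dimensional data, the closed-form least-squares fits, the quadratic "higher order" term, and the names `fit_through_origin`, `train_two_stage`, and `predict`. It is not the training procedure of the disclosure, in which the second model may be a neural network.

```python
def fit_through_origin(features, targets):
    """Least-squares coefficient c minimizing sum((c * f - t) ** 2)."""
    return (sum(f * t for f, t in zip(features, targets))
            / sum(f * f for f in features))


def train_two_stage(sparse_data, dense_data):
    """Toy two-stage training on (input, ground_truth) pairs.

    Stage 1 (cf. operation 706): fit a linear approximation a * x on
    sparse data, where higher order effects are assumed negligible.
    Stage 2 (cf. operations 708 and 710): form residuals on dense data
    and fit a second model b * x**2 to those residuals.
    """
    a = fit_through_origin([x for x, _ in sparse_data],
                           [y for _, y in sparse_data])
    residuals = [y - a * x for x, y in dense_data]   # ground truth residuals
    b = fit_through_origin([x * x for x, _ in dense_data], residuals)
    return a, b


def predict(x, a, b):
    """Resultant value: approximation plus modeled residual."""
    return a * x + b * x * x


# Hypothetical data: sparse samples are purely linear, dense samples
# carry an additional quadratic contribution.
sparse = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
dense = [(x, 2.0 * x + 0.5 * x * x) for x in (1.0, 2.0, 3.0, 4.0)]
a, b = train_two_stage(sparse, dense)
```

Because the first stage is fitted only where the higher order term is absent, the residuals on the dense data isolate that term, which mirrors the rationale for training the two sub-models on differently spaced patterns.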

[00131] As described above, method 700 (and/or the other methods and systems described herein) is configured to train a machine learning (ML) model to generate M3D mask image residuals.

[00132] Figure 8 illustrates an exemplary method 800 for training or calibrating a model based on scanning electron microscopy (SEM) images of fabricated masks. The model can be in any form, including a machine learning model, rigorous physical model, semi-physical model, empirical or semi empirical model, etc. The operations of method 800 presented below are intended to be illustrative. In some embodiments, method 800 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of method 800 are illustrated in Figure 8 and described below is not intended to be limiting. In some embodiments, one or more portions of method 800 may be implemented (e.g., by simulation, modeling, etc.) in one or more processing devices (e.g., one or more processors). The one or more processing devices may include one or more devices executing some or all of the operations of method 800 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 800, for example.

[00133] At an operation 804, an SEM image of a mask fabricated based on the mask pattern is obtained. The SEM image may be obtained in any appropriate manner. The mask may be fabricated in any appropriate manner. The SEM image may be subject to one or more post-processing operations, such as intensity balancing, focusing, geometric rotation, template matching, etc. One or more mask metrology contours may be extracted from the SEM image. The mask metrology contours may be compared to the mask pattern, where the difference between the mask metrology contours and the mask pattern may be due to errors or other non-idealities in the mask fabrication process (for example, defocus, dose effects, etc.) and to quantum effects (e.g., variable photon shot energy effects).

[00134] At an operation 806, mask metrology contours extracted from the SEM image of the mask are used in a model. The mask metrology contours may be used to replace the mask pattern. The mask metrology contours may be used in addition to the mask pattern. The mask metrology contours may be used as a representation of the mask pattern.
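One simple way to picture contour extraction from an SEM image is intensity thresholding followed by boundary detection, sketched below in Python. The disclosure does not specify an extraction technique; the thresholding approach, the 4-neighbor boundary rule, the function name `extract_contour_pixels`, and the sample image are assumptions of this sketch, and practical contour extraction is typically considerably more involved.

```python
def extract_contour_pixels(sem_image, threshold):
    """Return (x, y) coordinates of boundary pixels of bright regions.

    A pixel belongs to the contour when its intensity is at or above the
    threshold and at least one of its four neighbors is below the
    threshold or lies outside the image.
    """
    h, w = len(sem_image), len(sem_image[0])
    inside = [[value >= threshold for value in row] for row in sem_image]
    contour = []
    for y in range(h):
        for x in range(w):
            if not inside[y][x]:
                continue
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                outside_image = not (0 <= ny < h and 0 <= nx < w)
                if outside_image or not inside[ny][nx]:
                    contour.append((x, y))
                    break
    return contour


# Hypothetical 5x5 grayscale image with a bright 3x3 feature in the center.
sem = [[10, 10, 10, 10, 10],
       [10, 200, 200, 200, 10],
       [10, 200, 200, 200, 10],
       [10, 200, 200, 200, 10],
       [10, 10, 10, 10, 10]]
contour = extract_contour_pixels(sem, threshold=128)
```

For this sample image the contour consists of the eight outer pixels of the bright feature, while the fully surrounded center pixel is excluded.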

[00135] In some embodiments, a model may be trained to generate mask metrology contours based on an input mask pattern. The model may be a ML model. The model may be trained, including in a supervised manner, on a training data set containing a set of mask patterns and corresponding SEM images or mask metrology contours extracted therefrom for the mask patterns. In some embodiments, the metrology mask contours may be used (e.g., as measured) directly in other models, such as process models, lithography models, etc. Mask metrology contours are not limited to use in ML models or in ML model trainings, which is not to imply that any other use described herein is limiting. Mask metrology contours may be used in the calibration of physical, semi-physical, etc. models. Mask metrology contours may be fed into optical models, such as to generate aerial images. Mask metrology contours may be fed into mask models, such as to generate mask images (for example, thin mask images, thick mask images, M3D mask images, etc.). Mask metrology contours may be used to calibrate mask fabrication processes and models.

[00136] At an operation 808, a model may be trained to operate based on mask metrology contours. The model of the operation 808 may be different from the model of the operation 806. For example, the model 430 of Figure 4 may be trained to operate based on mask metrology contours instead of or in addition to mask patterns, representations of mask patterns, etc. The training data of the M3D mask image ML model, as described above, may be generated by using mask metrology contours, including measured mask metrology contours, mask metrology contours generated by a model (such as a model trained as described in the operation 806), etc. The M3D mask image ML model may be trained using mask metrology contours instead of or in addition to OPC generated contours. A process model, lithography model, etc. may be trained, including re-trained, additionally trained, etc., to operate based on one or more mask metrology contours. A model which generates mask metrology contours may be used. A model which generates mask metrology contours may be incorporated into the other model, such as to generate mask metrology contours based on an input mask pattern, which the other model then operates upon as input. Wherever mask patterns are conventionally used, the metrology mask contours (e.g., as measured, as modeled, etc.) may be used to replace OPC-resultant mask patterns (or other mask patterns or models which do not account for mask fabrication effects).

[00137] As described above, method 800 (and/or the other methods and systems described herein) is configured to train a model based on scanning electron microscopy (SEM) images of fabricated masks.

[00138] Figure 9 is a block diagram of an example computer system CS, according to an embodiment. Computer system CS may assist in implementing the methods, flows, or the apparatus disclosed herein. Computer system CS includes a bus BS or other communication mechanism for communicating information, and a processor PRO (or multiple processors) coupled with bus BS for processing information. Computer system CS also includes a main memory MM, such as a random-access memory (RAM) or other dynamic storage device, coupled to bus BS for storing information and instructions to be executed by processor PRO. Main memory MM also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor PRO, for example. Computer system CS includes a read only memory ROM or other static storage device coupled to bus BS for storing static information and instructions for processor PRO. A storage device SD, such as a magnetic disk or optical disk, is provided and coupled to bus BS for storing information and instructions.

[00139] Computer system CS may be coupled via bus BS to a display DS, such as a cathode ray tube (CRT) or flat panel or touch panel display for displaying information to a computer user. An input device ID, including alphanumeric and other keys, is coupled to bus BS for communicating information and command selections to processor PRO. Another type of user input device is cursor control CC, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor PRO and for controlling cursor movement on display DS. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane. A touch panel (screen) display may also be used as an input device.

[00140] In some embodiments, portions of one or more methods described herein may be performed by computer system CS in response to processor PRO executing one or more sequences of one or more instructions contained in main memory MM. Such instructions may be read into main memory MM from another computer-readable medium, such as storage device SD. Execution of the sequences of instructions contained in main memory MM causes processor PRO to perform the process steps described herein. One or more processors in a multi-processing arrangement may also be employed to execute the sequences of instructions contained in main memory MM. In some embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, the description herein is not limited to any specific combination of hardware circuitry and software.

[00141] The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor PRO for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as storage device SD. Volatile media include dynamic memory, such as main memory MM. Transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise bus BS. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Computer-readable media can be non-transitory, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge. Non-transitory computer-readable media can have instructions recorded thereon. The instructions, when executed by a computer, can implement any of the features described herein. Transitory computer-readable media can include a carrier wave or other propagating electromagnetic signal.

[00142] Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor PRO for execution. For example, the instructions may initially be borne on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system CS can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to bus BS can receive the data carried in the infrared signal and place the data on bus BS. Bus BS carries the data to main memory MM, from which processor PRO retrieves and executes the instructions. The instructions received by main memory MM may optionally be stored on storage device SD either before or after execution by processor PRO.

[00143] Computer system CS may also include a communication interface CI coupled to bus BS. Communication interface CI provides a two-way data communication coupling to a network link NDL that is connected to a local network LAN. For example, communication interface CI may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface CI may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface CI sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

[00144] Network link NDL typically provides data communication through one or more networks to other data devices. For example, network link NDL may provide a connection through local network LAN to a host computer HC. This can include data communication services provided through the worldwide packet data communication network, now commonly referred to as the “Internet” INT. Local network LAN and Internet INT both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network data link NDL and through communication interface CI, which carry the digital data to and from computer system CS, are exemplary forms of carrier waves transporting the information.

[00145] Computer system CS can send messages and receive data, including program code, through the network(s), network data link NDL, and communication interface CI. In the Internet example, host computer HC might transmit a requested code for an application program through Internet INT, network data link NDL, local network LAN and communication interface CI. One such downloaded application may provide all or part of a method described herein, for example. The received code may be executed by processor PRO as it is received, and/or stored in storage device SD, or other nonvolatile storage for later execution. In this manner, computer system CS may obtain application code in the form of a carrier wave.

[00146] The concepts disclosed herein may simulate or mathematically model any generic imaging system for imaging subwavelength features, and may be especially useful with emerging imaging technologies capable of producing increasingly shorter wavelengths. Emerging technologies already in use include EUV (extreme ultraviolet), DUV lithography that is capable of producing a 193nm wavelength with the use of an ArF laser, and even a 157nm wavelength with the use of a Fluorine laser. Moreover, EUV lithography is capable of producing wavelengths within a range of 20-50nm by using a synchrotron or by hitting a material (either solid or a plasma) with high energy electrons in order to produce photons within this range.

[00147] Embodiments of the present disclosure can be further described by the following clauses.

1. A method of lithography simulation comprising: obtaining a lower order component of a mask 3D (M3D) mask image that corresponds to a mask pattern; generating a higher order component of the M3D mask image by using a machine learning (ML) model provided with input corresponding to the mask pattern; and combining the lower order component and the higher order component of the M3D mask image to generate a resultant M3D mask image corresponding to the mask pattern.

2. The method of clause 1, further comprising: training the ML model to generate output higher order components by using a first set of training data, wherein the first set of training data comprises a set of training mask patterns and training higher order components of training M3D mask images corresponding to the set of training mask patterns.

3. The method of clause 2, wherein the training higher order components of the M3D mask images comprise determined differences between lower order components of the training M3D mask images and the training M3D mask images for the set of training mask patterns.

4. The method of clause 2, wherein the training M3D mask images comprise M3D mask images generated using rigorous simulation on the set of training mask patterns.

5. The method of clause 2, further comprising generating the training higher order component of the training M3D mask image by: obtaining a first training M3D mask image for a first training mask pattern, wherein the first training M3D mask image comprises a linear component and a training higher order component; obtaining the linear component of the first training M3D mask image; and determining the training higher order component of the first training M3D mask image based on a difference between the first training M3D mask image and the linear component of the first training M3D mask image for the first training mask pattern.

6. The method of clause 2, wherein the first set of training mask patterns comprises patterns which are closely spaced relative to a wavelength of illumination of a lithography process.

7. The method of clause 2, wherein the first set of training mask patterns comprises patterns which experience at least one of a M3D inter-edge effect and a M3D inter-corner effect.

8. The method of clause 1, wherein obtaining the lower order component of the M3D mask image comprises generating the lower order component of the M3D mask image by using a second model provided with input corresponding to the mask pattern.

9. The method of clause 8, further comprising: training the second model to generate output lower order components by using a second set of training data, wherein the second set of training data comprises a second set of training mask patterns and training lower order components of second training M3D mask images corresponding to the second set of training mask patterns.

10. The method of clause 9, wherein the second training M3D mask images comprise M3D images generated using rigorous simulation on the second set of training mask patterns.

11. The method of clause 9, wherein the second set of training mask patterns comprises patterns which are widely spaced relative to a wavelength of illumination of a lithography process.

12. The method of clause 9, wherein the second set of training mask patterns comprises patterns which experience at least one of a single edge effect and an area effect.

13. The method of clause 8, wherein the second model comprises a ML model.

14. The method of clause 8, wherein the second model comprises a non-machine learning model.

15. The method of clause 8, wherein the second model is a trained physical effect model.

16. The method of clause 1, wherein the resultant M3D mask image, comprised of the lower order component and the higher order component of the M3D mask image, approximates a full representation of the M3D image.

17. The method of clauses 4 or 10, wherein the rigorous simulation comprises a finite-difference time-domain (FDTD) algorithm applied to the mask pattern.

18. The method of clauses 4 or 10, wherein the rigorous simulation comprises a rigorous coupled-wave analysis (RCWA) algorithm applied to the mask patterns.

19. The method of clause 3, wherein the training M3D mask images comprise effective near field images for corresponding mask patterns.

20. The method of clause 3, wherein the training M3D mask images comprise M3D mask images generated by a model by using input metrology mask contours corresponding to the mask pattern.

21. The method of clause 20, wherein the metrology mask contours are extracted from images of fabricated masks corresponding to the mask patterns.

22. The method of clause 21, wherein images of fabricated masks comprise scanning electron microscopy (SEM) images of fabricated masks corresponding to the mask patterns.

23. The method of clause 1, wherein the ML model is a neural network.

24. The method of clause 1, wherein the ML model is a convolutional neural network (CNN).

25. The method of clause 1, wherein the ML model is a deep convolutional neural network (DCNN).

26. The method of clause 1, wherein the input corresponding to the mask pattern comprises a thin mask image.

27. The method of clause 1, wherein the input corresponding to the mask pattern comprises a representation of a mask image.

28. The method of clause 1, wherein the M3D mask image comprises a transmission function.

29. The method of clause 1, wherein the M3D mask image comprises a near field image.

30. The method of clause 1, wherein the M3D mask image comprises an effective representation of a mask image.

31. The method of clause 2, wherein training the ML model comprises: accessing an untrained ML model; and training model formulation and model parameters of the accessed ML model.

32. The method of clause 1, further comprising simulating at least part of a lithography process based on the resultant M3D mask image.

33. The method of clause 1, wherein input corresponding to the mask pattern comprises input corresponding to multiple portions of the mask pattern.

34. A method of lithography simulation comprising: obtaining a metrology image of a mask fabricated based on a mask pattern; extracting one or more metrology mask contours from the metrology image of the mask; and simulating at least a portion of a lithography process with an optical model for the mask pattern using the one or more extracted metrology mask contours.

35. The method of clause 34, wherein simulating the lithography process comprises generating a mask image for the mask pattern using the one or more extracted metrology mask contours.

36. The method of clause 34, further comprising: training a model with a set of training data to generate output corresponding to one or more metrology mask contours based on input corresponding to a mask pattern, wherein the set of training data comprises a set of mask patterns and one or more training metrology mask contours extracted from metrology images corresponding to the set of mask patterns.

37. The method of clause 36, further comprising: generating, with the trained model, output corresponding to one or more metrology mask contours based on input corresponding to a mask pattern.

38. The method of clause 34, further comprising: training a model with a set of training data to generate output corresponding to one or more metrology mask contours based on input corresponding to a mask pattern and a density map for the mask pattern, wherein the set of training data comprises a set of mask patterns, density maps corresponding to the set of mask patterns, and one or more training metrology mask contours extracted from metrology images corresponding to the set of mask patterns.

39. The method of clause 38, further comprising: generating, with the trained model, output corresponding to one or more metrology mask contours based on input corresponding to a mask pattern and a density map for the mask pattern.

40. The method of clauses 38 or 39, wherein the density map corresponds to a pattern density distribution for the mask pattern.

41. The method of any one of clauses 34 to 40, wherein the model is an ensemble model.

42. A method to generate a machine learning (ML) model for lithography simulation comprising: obtaining a set of training data comprising a set of mask patterns and higher order components of M3D mask images corresponding to the mask patterns; and training a ML model with the set of training data to generate a higher order residual of an M3D mask image based on an input mask pattern.

43. The method of clause 42, further comprising: obtaining a second set of training data comprising a set of mask patterns and M3D mask images corresponding to the mask patterns; and training a second model with the second set of training data to generate a lower order component of a M3D mask image based on the input mask pattern.

44. The method of clause 43, wherein the second set of training data comprises substantially the same mask patterns as the set of training data.

45. The method of clause 43, wherein the set of training data comprises a set of mask patterns which experience inter-edge effect and wherein the second set of training data comprises a set of mask patterns which experience substantially less inter-edge effect than the set of training data.

46. One or more non-transitory, machine-readable media having instructions thereon, the instructions when executed by a processor being configured to perform the method of any one of clauses 1 to 45.

[00148] While the concepts disclosed herein may be used for imaging on a substrate such as a silicon wafer, it shall be understood that the disclosed concepts may be used with any type of lithographic imaging systems, e.g., those used for imaging on substrates other than silicon wafers. In addition, the combination and sub-combinations of disclosed elements may comprise separate embodiments. For example, adding single or multiple assist features as described herein may comprise their own separate embodiments, or they may be included with one or more other embodiments described herein.

[00149] The descriptions above are intended to be illustrative, not limiting. Thus, it will be apparent to one skilled in the art that modifications may be made as described without departing from the scope of the claims set out below.
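
As a non-limiting illustration of clauses 1, 3 and 5, the decomposition of an M3D mask image into a lower order component and a higher order component can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the function names `higher_order_target` and `combine_components` are hypothetical, the images are assumed to be complex-valued arrays on a common grid, and "combining" is assumed to be element-wise addition (the inverse of the difference recited in clauses 3 and 5). The arrays `full` and `lower` are synthetic stand-ins for a rigorously simulated M3D mask image and the output of a second model, respectively.

```python
import numpy as np


def higher_order_target(full_m3d, lower_order):
    """Training target (clauses 3 and 5): full M3D image minus its lower order component."""
    full_m3d = np.asarray(full_m3d, dtype=np.complex128)
    lower_order = np.asarray(lower_order, dtype=np.complex128)
    if full_m3d.shape != lower_order.shape:
        raise ValueError("images must be sampled on the same grid")
    return full_m3d - lower_order


def combine_components(lower_order, higher_order):
    """Resultant M3D image (clause 1), assuming the combination is a plain sum."""
    lower_order = np.asarray(lower_order, dtype=np.complex128)
    higher_order = np.asarray(higher_order, dtype=np.complex128)
    if lower_order.shape != higher_order.shape:
        raise ValueError("components must be sampled on the same grid")
    return lower_order + higher_order


# Synthetic stand-ins (assumptions, not real simulation data).
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))  # "rigorous" M3D image
lower = 0.9 * full                                             # "second model" output

# The residual is what the ML model of clause 2 would be trained to predict.
residual = higher_order_target(full, lower)

# With an ideal residual predictor, the combination recovers the full image.
resultant = combine_components(lower, residual)
print(np.allclose(resultant, full))
```

In this sketch the exact residual is fed back into the combination, so the resultant image matches the full image to within floating-point tolerance; with a trained ML model the predicted residual would only approximate it, consistent with clause 16.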

Claims

CLAIMS:

2. The method of claim 1, further comprising: training the ML model to generate output higher order components by using a first set of training data, wherein the first set of training data comprises a set of training mask patterns and training higher order components of training M3D mask images corresponding to the set of training mask patterns, wherein the ML model comprises a neural network.

3. The method of claim 2, wherein the training higher order components of the M3D mask images comprise determined differences between lower order components of the training M3D mask images and the training M3D mask images for the set of training mask patterns.

4. The method of claim 2, wherein the training M3D mask images comprise M3D mask images generated using rigorous simulation on the set of training mask patterns.

5. The method of claim 2, further comprising generating the training higher order component of the training M3D mask image by: obtaining a first training M3D mask image for a first training mask pattern, wherein the first training M3D mask image comprises a linear component and a training higher order component; obtaining the linear component of the first training M3D mask image; and determining the training higher order component of the first training M3D mask image based on a difference between the first training M3D mask image and the linear component of the first training M3D mask image for the first training mask pattern.

6. The method of claim 2, wherein the first set of training mask patterns comprises patterns with at least one of a M3D inter-edge effect and a M3D inter-corner effect.

7. The method of claim 1, wherein obtaining the lower order component of the M3D mask image comprises generating the lower order component of the M3D mask image by using a second model provided with input corresponding to the mask pattern, wherein the second model comprises one of a machine learning model, non-machine learning model, or a trained physical effect model.

8. The method of claim 7, further comprising: training the second model to generate output lower order components by using a second set of training data, wherein the second set of training data comprises a second set of training mask patterns and training lower order components of second training M3D mask images corresponding to the second set of training mask patterns.

9. The method of claim 8, wherein the second training M3D mask images comprise M3D images generated using rigorous simulation on the second set of training mask patterns.

10. The method of claim 8, wherein the first set of training mask patterns comprises patterns closely spaced relative to a wavelength of illumination of a lithography process, and wherein the second set of training mask patterns comprises patterns which are widely spaced relative to a wavelength of illumination of a lithography process.

11. The method of claim 9, wherein the second set of training mask patterns comprises patterns with at least one of a single edge effect and an area effect.

12. The method of claim 1, wherein the resultant M3D mask image, comprised of the lower order component and the higher order component of the M3D mask image, approximates a full representation of the M3D image.

13. The method of claim 4, wherein the rigorous simulation comprises a finite-difference time-domain (FDTD) algorithm or a rigorous coupled-wave analysis (RCWA) algorithm applied to the mask pattern.

14. The method of claim 3, wherein the training M3D mask images comprise effective near field images for corresponding mask patterns.

15. The method of claim 1, wherein the input corresponding to the mask pattern comprises a representation of a thin mask image, and wherein the M3D mask image is represented as a transmission function, a near field image, or an effective representation of a mask image.