CN119948451A - Device, method and graphical user interface for interacting with a three-dimensional environment

Info

Publication number: CN119948451A
Application number: CN202380068456.9A
Authority: CN
Inventors: A·E·德多纳托; I·帕斯特拉纳文森特; N·吉特; C·D·麦肯齐; S·O·勒梅; Z·C·泰勒; V·克拉玛尔; B·海拉科; S·S·戴夫; D·艾耶; L·A·海斯丁; M·阿胡贾; N·A·福恩谢尔; C·J·罗姆尼; J·G·洛博费雷拉达席尔瓦; S·M·思班; A·C·戴伊
Original assignee: Apple Inc
Current assignee: Apple Inc
Priority date: 2022-09-24
Filing date: 2023-09-20
Publication date: 2025-05-06
Also published as: AU2023347428A1; EP4577900A1; JP2025534284A; CN120723115A; KR20250049408A

Abstract

While an application user interface is displayed, the device detects a first input to an input device of the one or more input devices, the input device being disposed on a housing of the device that includes the one or more display generating components. In response to detecting the first input, the device replaces display of at least a portion of the application user interface by displaying a main menu user interface via the one or more display generating components. While the main menu user interface is displayed, the device detects a second input to the input device disposed on the housing of the device and, in response to detecting the second input, dismisses the main menu user interface.

Description

Apparatus, method and graphical user interface for interacting with a three-dimensional environment

Related patent application

The present application is a continuation of U.S. patent application Ser. No. 18/369,628, filed on September 18, 2023, and also claims priority from U.S. patent application Ser. No. 18/369,502, filed on September 18, 2023, U.S. patent application Ser. No. 18/369,459, filed on September 18, 2023, U.S. patent application Ser. No. 18/369,462, filed on June 4, 2023, and U.S. provisional application Ser. No. 63/470,921, filed on September 24, 2022.

Technical Field

The present disclosure relates generally to computer systems in communication with a display generation component and one or more input devices that provide a computer-generated experience, including, but not limited to, electronic devices that provide virtual reality and mixed reality experiences via a display.

Background

In recent years, the development of computer systems for augmented reality has increased significantly. An example augmented reality environment includes at least some virtual elements that replace or augment the physical world. Input devices (such as cameras, controllers, joysticks, touch-sensitive surfaces, and touch screen displays) for computer systems and other electronic computing devices are used to interact with the virtual/augmented reality environment. Example virtual elements include virtual objects such as digital images, videos, text, icons, and control elements (such as buttons and other graphics).

Disclosure of Invention

Some methods and interfaces for interacting with environments that include at least some virtual elements (e.g., applications, extended reality environments including augmented reality environments, mixed reality environments, and virtual reality environments) are cumbersome, inefficient, and limited. For example, systems that provide insufficient channels or mechanisms for performing actions associated with navigating within an augmented reality environment, systems that require a series of inputs to achieve a desired result in an augmented reality environment, and systems in which manipulating virtual objects is complex, cumbersome, and error-prone place a significant cognitive burden on the user and detract from the experience of the virtual/augmented reality environment. In addition, these methods take longer than necessary, thereby wasting the computer system's energy. This latter consideration is particularly important in battery-powered devices.

Accordingly, there is a need for a computer system with improved methods and interfaces to provide a user with a computer-generated experience, thereby making user interactions with the computer system more efficient and intuitive for the user. Such methods and interfaces optionally complement or replace conventional methods for providing an augmented reality experience to a user. Such methods and interfaces reduce the number, extent, and/or nature of inputs from a user by helping the user understand the association between the inputs provided and the response of the device to those inputs, thereby forming a more efficient human-machine interface.

The above-described drawbacks and other problems associated with user interfaces of computer systems are reduced or eliminated by the disclosed systems. In some embodiments, the computer system is a desktop computer with an associated display. In some embodiments, the computer system is a portable device (e.g., a notebook computer, tablet computer, or handheld device). In some embodiments, the computer system is a personal electronic device (e.g., a wearable electronic device such as a watch or a head-mounted device). In some embodiments, the computer system has a touch pad. In some embodiments, the computer system has one or more cameras. In some embodiments, the computer system has a touch-sensitive display (also referred to as a "touch screen" or "touch screen display"). In some embodiments, the computer system has one or more eye tracking components. In some embodiments, the computer system has one or more hand tracking components. In some embodiments, the computer system has, in addition to the display generating component, one or more output devices including one or more haptic output generators and/or one or more audio output devices. In some embodiments, the computer system has a graphical user interface (GUI), one or more processors, memory, and one or more modules, programs, or sets of instructions stored in the memory for performing a plurality of functions. In some embodiments, the user interacts with the GUI through contacts and gestures of a stylus and/or finger on the touch-sensitive surface, movement of the user's eyes and hands in space relative to the GUI (and/or the computer system) or the user's body (as captured by cameras and other motion sensors), and/or voice input (as captured by one or more audio input devices). In some embodiments, the functions performed through these interactions optionally include image editing, drawing, presentation, word processing, spreadsheet making, game playing, phone calls, video conferencing, email sending and receiving, instant messaging, test support, digital photography, digital video recording, web browsing, digital music playing, note taking, and/or digital video playing. Executable instructions for performing these functions are optionally included in a transitory and/or non-transitory computer readable storage medium or other computer program product configured for execution by one or more processors.

There is a need for an electronic device with improved methods and interfaces to interact with a three-dimensional environment. Such methods and interfaces may supplement or replace conventional methods for interacting with a three-dimensional environment. Such methods and interfaces reduce the amount, degree, and/or nature of input from a user and result in a more efficient human-machine interface. For battery-powered computing devices, such methods and interfaces conserve power and increase the time interval between battery charges.

According to some embodiments, a method is performed at a device that includes or communicates with one or more display generating components and one or more input devices. The method includes detecting, while an application user interface is displayed via the one or more display generating components, a first input to an input device of the one or more input devices, the input device being disposed on a housing of the device that includes the one or more display generating components, and, in response to detecting the first input to the input device disposed on the housing of the device, replacing display of at least a portion of the application user interface by displaying a main menu user interface via the one or more display generating components. The method includes detecting, while the main menu user interface is displayed via the one or more display generating components, a second input to the input device disposed on the housing of the device, and dismissing the main menu user interface in response to detecting the second input to the input device disposed on the housing of the device.
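
As a purely illustrative sketch, and not part of the disclosed embodiments, the two-input behavior described above can be modeled as a small state toggle. The class name `HousingButtonModel`, the method name `press`, and the returned strings are hypothetical. The sketch also assumes that the application user interface stays hidden after the main menu is dismissed, which the summary above does not specify.

```python
from dataclasses import dataclass


@dataclass
class HousingButtonModel:
    """Hypothetical model of a device with one input device on its housing."""

    app_ui_visible: bool = True      # an application user interface is displayed
    main_menu_visible: bool = False  # the main menu user interface is not displayed

    def press(self) -> str:
        """Handle one input to the housing-mounted input device."""
        if self.main_menu_visible:
            # Second input: cancel (dismiss) the main menu user interface.
            self.main_menu_visible = False
            return "main menu dismissed"
        if self.app_ui_visible:
            # First input: the main menu replaces at least part of the app UI.
            self.app_ui_visible = False
            self.main_menu_visible = True
            return "main menu displayed"
        # Assumption for this sketch: with nothing displayed, show the main menu.
        self.main_menu_visible = True
        return "main menu displayed"
```

In this sketch the same physical input produces opposite results depending only on whether the main menu is currently displayed, so no on-screen control is needed for either transition.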

According to some embodiments, a method is performed at a computer system that includes or is in communication with a display generation component and one or more input devices. The method includes detecting a first input to an input device of the one or more input devices while an application user interface is displayed via the display generating component. The method includes, in response to detecting the first input to the input device: in accordance with a determination that the application user interface is in a first display mode, displaying the application user interface via the display generating component in a second display mode, wherein the first display mode includes an immersive mode in which only content of the application user interface is displayed, and the second display mode includes a non-immersive mode in which respective content of the application user interface and other content are concurrently displayed; and, in accordance with a determination that the application user interface is in the second display mode, replacing display of at least a portion of the application user interface by displaying a main menu user interface via the display generating component.
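
The mode-dependent branching described above can be sketched as a single function, offered only as an illustration. The names `DisplayMode` and `handle_input` are hypothetical, and the sketch assumes that the display mode remains non-immersive when the main menu is displayed.

```python
from enum import Enum, auto


class DisplayMode(Enum):
    IMMERSIVE = auto()      # only content of the application UI is displayed
    NON_IMMERSIVE = auto()  # application content is displayed with other content


def handle_input(mode: DisplayMode) -> tuple[DisplayMode, bool]:
    """Return (new display mode, whether the main menu is now displayed)."""
    if mode is DisplayMode.IMMERSIVE:
        # First display mode: switch to the non-immersive mode, no main menu yet.
        return DisplayMode.NON_IMMERSIVE, False
    # Second display mode: the main menu replaces part of the application UI.
    # Assumption: the mode itself stays non-immersive.
    return DisplayMode.NON_IMMERSIVE, True
```

Under these assumptions, two consecutive inputs starting from the immersive mode first leave immersion and then bring up the main menu.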

According to some embodiments, a method is performed at a computer system that includes or is in communication with a display generation component and one or more input devices. The method includes detecting a first input to an input device of the one or more input devices when an application user interface of an application is displayed via the display generating component, and in response to detecting the first input to the input device, displaying a main menu user interface via the display generating component, and in accordance with a determination that the application is currently being shared in a content sharing session, wherein content of the application is concurrently visible to a plurality of participants in the content sharing session, maintaining display of at least a portion of the application user interface while the main menu user interface is displayed, and in accordance with a determination that the application is not being shared in the content sharing session, stopping display of the application user interface.
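
A minimal sketch of the sharing-dependent behavior described above follows. It is illustrative only: the names `OpenApp` and `apps_displayed_with_main_menu` are hypothetical, and generalizing from one application to a list of open applications is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenApp:
    """Hypothetical record for an application whose UI is currently displayed."""

    name: str
    is_shared: bool  # True if the app is being shared in a content sharing session


def apps_displayed_with_main_menu(open_apps: list[OpenApp]) -> list[str]:
    """Return the names of apps whose UI stays displayed alongside the main menu.

    Shared apps remain displayed; apps that are not shared stop being displayed.
    """
    return [app.name for app in open_apps if app.is_shared]
```

For example, with a shared whiteboard and a private notes application open, only the whiteboard would remain displayed once the main menu appears.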

According to some embodiments, a method is performed at a computer system that includes or is in communication with a display generation component and one or more input devices. The method includes detecting a first input of a first type of input via an input device of the one or more input devices while the computer system is in operation, wherein the first type of input is determined based on a location and/or movement of a first biometric feature, and in response to detecting the first input via the input device, performing a first operation in accordance with the first input. The first operation is determined at least in part by first input enrollment information from a previous input enrollment process for the first type of input. The method includes, after performing the first operation in accordance with the first input, detecting a second input of a second type of input via an input device of the one or more input devices, and in response to detecting the second input, initiating an input enrollment process for the first type of input.
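
The relationship between the two input types described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the class `GazeInputModel`, its method names, and the idea that the stored information is a simple two-dimensional offset are all hypothetical.

```python
class GazeInputModel:
    """Hypothetical model: a first input type interpreted using stored calibration."""

    def __init__(self, offset_x: float, offset_y: float) -> None:
        # Stored information from a previous enrollment (assumed to be an offset).
        self.offset = (offset_x, offset_y)
        self.enrolling = False

    def handle_first_type(self, raw_x: float, raw_y: float) -> tuple[float, float]:
        """Interpret a first-type input using the stored offset."""
        return (raw_x + self.offset[0], raw_y + self.offset[1])

    def handle_second_type(self) -> None:
        """A second-type input starts a new enrollment for the first input type."""
        self.enrolling = True

    def finish_enrollment(self, offset_x: float, offset_y: float) -> None:
        """Store the new information and leave the enrollment process."""
        self.offset = (offset_x, offset_y)
        self.enrolling = False
```

The point of the sketch is that `handle_second_type` does not depend on the stored offset, so enrollment can be restarted even when the first input type is badly calibrated.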

According to some embodiments, a method is performed at a computer system that includes or is in communication with a display generation component and one or more input devices. The method includes detecting a first input on a rotatable input mechanism of an input device of the one or more input devices. The method includes, in response to detecting the first input on the rotatable input mechanism, in accordance with a determination that the first input is a first type of input, changing an immersion level associated with display of an extended reality (XR) environment generated by the display generating component to a first immersion level in which display of the XR environment includes both virtual content from an application and a passthrough portion of a physical environment of the computer system. The method includes, in accordance with a determination that the first input is a second type of input, performing an operation that is different from changing the immersion level associated with display of the XR environment.
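
As a hedged illustration of one input mechanism handling two input types, the sketch below treats a rotation as the first type of input and a press as the second type. That mapping, the class name `RotatableInputModel`, and the 0.0 to 1.0 immersion scale are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RotatableInputModel:
    """Hypothetical model of a rotatable input mechanism."""

    immersion: float = 0.0  # 0.0 = physical environment mostly visible, 1.0 = fully virtual
    last_action: str = ""

    def handle(self, kind: str, amount: float = 0.0) -> None:
        """Dispatch on the input type; `amount` is a signed rotation delta."""
        if kind == "rotate":
            # First type of input: change the immersion level, clamped to [0, 1].
            self.immersion = min(1.0, max(0.0, self.immersion + amount))
            self.last_action = "immersion changed"
        elif kind == "press":
            # Second type of input: a different operation; immersion is unchanged.
            self.last_action = "other operation"
        else:
            raise ValueError(f"unknown input type: {kind}")
```

Because the rotation delta is signed, the same mechanism raises or lowers immersion, and intermediate values correspond to levels at which both virtual content and part of the physical environment are visible.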

According to some embodiments, a method is performed at a wearable device that includes or communicates with a display generation component and one or more input devices. The method includes detecting, while a respective session of a respective application is active and while the wearable device is being worn, a first signal indicating that the wearable device has been removed, and causing the respective session of the respective application to become inactive in response to detecting the first signal. The method includes detecting, while the respective application is inactive, a second signal indicating that the wearable device is being worn, and in response to detecting the second signal: in accordance with a determination that respective criteria are met, restoring the respective session of the respective application; and in accordance with a determination that the respective criteria are not met, forgoing restoring the respective session of the respective application, wherein the respective criteria include a criterion that is met when a current user of the wearable device is determined to be an authorized user of the wearable device.
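
The remove-and-restore behavior described above can be sketched as a small state holder. This is an illustration only: the class `WearableSession` and its method names are hypothetical, and the respective criteria are reduced to a single authorization flag for simplicity.

```python
from dataclasses import dataclass


@dataclass
class WearableSession:
    """Hypothetical model of one application session on a wearable device."""

    active: bool = True

    def on_removed(self) -> None:
        """First signal: the device was taken off, so the session becomes inactive."""
        self.active = False

    def on_worn(self, user_is_authorized: bool) -> bool:
        """Second signal: the device is worn again. Return True if the session is restored."""
        if user_is_authorized:
            # Criteria met: restore the session.
            self.active = True
            return True
        # Criteria not met: the session is not restored.
        return False
```

In this sketch an unauthorized wearer leaves the session inactive, and a later authorized wearer can still restore it.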

According to some embodiments, a method is performed at a computer system that includes or is in communication with one or more display generating components and one or more input devices. The method includes detecting a first input directed to a first input device of the one or more input devices while a configuration of the computer system is being performed, wherein the computer system includes one or more sensors that detect input, including one or more of an air gesture and a gaze input. The method also includes, in response to detecting the first input to the first input device, displaying a menu including a plurality of selectable options for configuring one or more interaction models.

It is noted that the various embodiments described above may be combined with any of the other embodiments described herein. The features and advantages described in this specification are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter.

Drawings

For a better understanding of the various described embodiments, reference should be made to the following detailed description taken in conjunction with the following drawings, in which like reference numerals designate corresponding parts throughout the several views.

FIG. 1A is a block diagram illustrating an operating environment of a computer system for providing an extended reality (XR) experience, according to some embodiments.

FIGS. 1B-1P are examples of computer systems for providing an XR experience in the operating environment of FIG. 1A.

FIG. 2 is a block diagram illustrating a controller of a computer system configured to manage and coordinate a user's XR experience, according to some embodiments.

FIG. 3 is a block diagram illustrating a display generation component of a computer system configured to provide visual components of an XR experience to a user, in accordance with some embodiments.

FIG. 4 is a block diagram illustrating a hand tracking unit of a computer system configured to capture gesture inputs of a user, according to some embodiments.

FIG. 5 is a block diagram illustrating an eye tracking unit of a computer system configured to capture gaze input of a user, in accordance with some embodiments.

FIG. 6 is a flow diagram illustrating a glint-assisted gaze tracking pipeline in accordance with some embodiments.

FIGS. 7A-7O illustrate example techniques for displaying a main menu user interface within a three-dimensional environment, according to some embodiments.

FIGS. 8A-8G illustrate example techniques for performing different operations based on input to an input device depending on a current display mode, according to some embodiments.

FIGS. 9A-9D illustrate example techniques for performing one or more different operations based on input to an input device depending on characteristics of a displayed application user interface, according to some embodiments.

FIGS. 10A-10D illustrate example techniques for resetting an input enrollment process, according to some embodiments.

FIGS. 11A-11F illustrate example techniques for adjusting immersion levels of an extended reality (XR) experience of a user in a three-dimensional environment, according to some embodiments.

FIGS. 12A-12G illustrate example techniques for controlling a computer system based on physical positioning of the computer system relative to a user and changes in physical positioning and a state of the computer system, according to some embodiments.

FIG. 13 is a flowchart of a method of displaying a main menu user interface within a three-dimensional environment, according to various embodiments.

FIG. 14 is a flowchart of a method of performing different operations based on input to an input device depending on a current display mode, according to various embodiments.

FIG. 15 is a flowchart of a method of performing one or more different operations based on input to an input device depending on characteristics of a displayed application user interface, according to various embodiments.

FIG. 16 is a flow diagram of a method of resetting a biometric input enrollment process, according to various embodiments.

FIG. 17 is a flow diagram of a method of adjusting an immersion level of an extended reality (XR) experience of a user in a three-dimensional environment, in accordance with various embodiments.

FIG. 18 is a flowchart of a method of controlling a computer system based on the physical location of the computer system relative to a user and changes in physical location and the state of the computer system, according to various embodiments.

FIGS. 19A-19P illustrate example techniques for navigating an accessibility menu during system configuration, according to some embodiments.

FIG. 20 is a flow diagram of a method of navigating an accessibility menu during system configuration, according to some embodiments.

Detailed Description

According to some embodiments, the present disclosure relates to a user interface for providing an extended reality (XR) experience to a user.

The systems, methods, and GUIs described herein improve user interface interactions with virtual/augmented reality environments in a variety of ways.

In some embodiments, the device allows a user to gain access to different sets of representations using a single input to an input device (e.g., disposed on a housing of one or more display generating components through which portions of the physical environment and virtual environment are rendered visible) without displaying additional controls. Using a single input to the input device reduces the amount of time required to navigate within or transition out of the virtual environment. The physical location of the input device provides an intuitive and reliable mechanism (e.g., a haptic touch/mechanical actuation mechanism) for receiving user input, which improves the reliability and operational efficiency of the device (e.g., a computer system).

In some embodiments, a single input to the input device transitions the computer system from a high level of immersion (e.g., a fully immersive mode in which only the content of the respective application is displayed) to a less immersive mode or a non-immersive mode, or from a non-immersive mode to a mode in which the main menu user interface is also displayed. The single input also provides intuitive top-level access to the different sets of representations while the user is in a non-immersive experience without displaying additional controls (e.g., without requiring the user to look at user interface elements), thereby improving the operational efficiency of the human-machine interaction. Using a single input to the input device reduces the amount of time required to navigate within or transition out of the virtual environment.

In some embodiments, a single input to the input device maintains display of the application user interfaces of one or more shared applications while ceasing to display the application user interfaces of one or more private applications, which helps reduce the amount of distraction that a user may experience while in a group interaction session. Dismissing one or more private applications while continuing to display the shared applications in response to a single input enables the user to focus on the shared applications without having to display additional controls. Furthermore, the number of inputs required to dismiss the private applications and maintain display of the shared applications is reduced: instead of having to minimize or dismiss each private application individually, a single input is sufficient to maintain display of the one or more shared applications while ceasing to display the one or more private applications.

In some embodiments, the second type of input initiates a biometric input enrollment reset for the first type of input, allowing more accurate and precise input enrollment information to be used to calibrate and/or perform operations based on the first type of input. Instead of requiring the user to use the first type of input to navigate through user interface elements (e.g., menus or other control elements) in order to reset the input enrollment for the first type of input (e.g., the first type of input may need to be reset because of inaccurate calibration, which makes it difficult to navigate interface control elements using the inaccurately calibrated first type of input), using the second type of input to initiate the input enrollment reset improves operational efficiency, reduces user frustration, and reduces the number of inputs required to initiate the input enrollment reset process. Resetting the input enrollment using the second type of input also helps reduce the amount of time required to begin the input enrollment reset process. For example, using the second type of input enables an input enrollment reset to be initiated without displaying additional controls (e.g., without using the first type of input to navigate user interface elements).

In some embodiments, a single input device accepts two or more different types of inputs, which reduces the number of different input devices that must be provided to request and/or indicate different functionalities. The use of a rotatable input mechanism allows a user to provide a continuous range of inputs, and the bidirectionality of the rotatable input mechanism allows the input to be easily and intuitively changed in either direction without having to display additional controls to the user. The same rotatable input mechanism is capable of receiving a second type of input that implements a discrete function. Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. The use of a rotatable input mechanism provides direct access to changes in immersion levels and to execution of different operations, thereby reducing the amount of time required to achieve a particular result and thus improving the operating efficiency of the computer system. Increasing the immersion level helps remove constraints of the physical environment of the computer system (e.g., by blocking sensory input from the physical environment, such as visual input from a confined room and/or audio echoes from a small physical space) to realistically simulate a more spacious virtual environment that is more conducive to the user's interaction with an application.

In some embodiments, using respective criteria to determine whether to automatically resume a respective session of a respective application enables the respective session to be resumed without any active user input and without displaying additional controls. Using the respective criteria causes the device to automatically resume the respective session when the respective criteria are met, which provides a more efficient human-machine interface for the wearable device and a more efficient way for the user to control the wearable device, while minimizing disruption and without requiring the user to navigate additional control elements before the respective session can be resumed. Determining whether the current user of the wearable device is an authorized user of the wearable device provides improved security and/or privacy by ensuring that the respective session of the respective application is restored only when the authorized user is detected.

In some embodiments, while a configuration of a computer system is being performed, the computer system detects a first input directed to a first input device of one or more input devices, wherein the computer system includes one or more sensors to detect inputs including one or more of an air gesture and a gaze input, and in response to detecting the first input to the first input device, displays a menu including a plurality of selectable options for configuring one or more interaction models. Providing (e.g., by displaying and/or reading aloud) a menu of options for different models of interaction with the computer system during configuration of the computer system (e.g., during initial setup of the computer system) enables users to select in advance the manner in which they prefer to interact with the computer system, including a manner that is more intuitive to them, which later reduces the amount and/or extent of input and/or the amount of time required to interact with the computer system. In particular, users who rely on an interaction model other than the default, and who would otherwise require assistance to use the computer system, need assistance only once (e.g., at the start of initializing the computer system) to set up the computer system with an interaction model appropriate to them, and can thereafter use the computer system independently.

FIGS. 1A-6 provide a description of an example computer system for providing an XR experience to a user. FIGS. 7A-7O illustrate example techniques for displaying a main menu user interface within a three-dimensional environment, according to some embodiments. FIG. 13 is a flow diagram (also referred to as a flowchart) of a method of displaying a main menu user interface within a three-dimensional environment, according to various embodiments. The user interfaces in FIGS. 7A-7O are used to illustrate the process in FIG. 13. FIGS. 8A-8G illustrate example techniques for performing different operations based on input to an input device depending on a current display mode, according to some embodiments. FIG. 14 is a flow diagram of a method of performing different operations based on input to an input device depending on a current display mode, according to various embodiments. The user interfaces in FIGS. 8A-8G are used to illustrate the process in FIG. 14. FIGS. 9A-9D illustrate example techniques for performing one or more different operations based on input to an input device depending on characteristics of a displayed application user interface, according to some embodiments. FIG. 15 is a flow diagram of a method of performing one or more different operations based on input to an input device depending on characteristics of a displayed application user interface, according to various embodiments. The user interfaces in FIGS. 9A-9D are used to illustrate the process in FIG. 15. FIGS. 10A-10D illustrate example techniques for resetting an input enrollment process, according to some embodiments. FIG. 16 is a flow diagram of a method of resetting an input enrollment process, according to various embodiments. The user interfaces in FIGS. 10A-10D are used to illustrate the process in FIG. 16. FIGS. 11A-11F illustrate example techniques for adjusting immersion levels of an extended reality (XR) experience of a user in a three-dimensional environment, according to some embodiments. FIG. 17 is a flow diagram of a method of adjusting an immersion level of an extended reality (XR) experience of a user in a three-dimensional environment, in accordance with various embodiments. The user interfaces in FIGS. 11A-11F are used to illustrate the process in FIG. 17. FIGS. 12A-12G illustrate example techniques for controlling a computer system based on physical positioning of the computer system relative to a user and changes in physical positioning and a state of the computer system, according to some embodiments. FIG. 18 is a flow diagram of a method of controlling a computer system based on the physical location of the computer system relative to a user and changes in physical location and the state of the computer system, according to various embodiments. The user interfaces in FIGS. 12A-12G are used to illustrate the process in FIG. 18. FIGS. 19A-19P illustrate example techniques for navigating an accessibility menu during system configuration, according to some embodiments. FIG. 20 is a flow diagram of a method of navigating an accessibility menu during system configuration, according to some embodiments. The user interfaces in FIGS. 19A-19P are used to illustrate the process in FIG. 20.

The processes described below enhance operability of a device and make a user-device interface more efficient (e.g., by helping a user provide appropriate input and reducing user errors in operating/interacting with the device) through various techniques, including by providing improved visual feedback to the user, reducing the number of inputs required to perform an operation, providing additional control options without cluttering the user interface with additional displayed controls, performing an operation when a set of conditions has been met without further user input, improving privacy and/or security, providing a richer, more detailed, and/or more realistic user experience while conserving storage space, and/or additional techniques. These techniques also reduce power usage and extend battery life of the device by enabling a user to use the device faster and more efficiently. Saving battery power, and thus weight, improves the ergonomics of the device. These techniques also enable real-time communication, allow fewer and/or less accurate sensors to be used, resulting in a more compact, lighter, and cheaper device, and enable the device to be used under a variety of lighting conditions. These techniques reduce energy usage, and thus the heat emitted by the device, which is particularly important for wearable devices: a device that is operating entirely within the operating parameters of its components can still become uncomfortable to wear if it generates too much heat.

Furthermore, in methods described herein in which one or more steps are contingent on one or more conditions having been met, it should be understood that the described method may be repeated in multiple iterations so that, over the course of the iterations, all of the conditions on which steps in the method are contingent have been met in different iterations of the method. For example, if a method requires performing a first step if a condition is met and a second step if the condition is not met, one of ordinary skill in the art will appreciate that the stated steps are repeated until the condition has been both met and not met, in no particular order. Thus, a method described as having one or more steps that are contingent on one or more conditions having been met may be rewritten as a method that is repeated until each of the conditions described in the method has been met. This, however, is not required of system or computer-readable medium claims in which the system or computer-readable medium contains instructions for performing the contingent operations based on the satisfaction of the corresponding condition or conditions, and is thus capable of determining whether the contingency has been satisfied without explicitly repeating the steps of the method until all of the conditions on which steps in the method are contingent have been met. One of ordinary skill in the art will also appreciate that, similar to a method with contingent steps, a system or computer-readable storage medium may repeat the steps of a method as many times as necessary to ensure that all of the contingent steps have been performed.

In some embodiments, as shown in FIG. 1A, an XR experience is provided to a user via an operating environment 100 including a computer system 101. Computer system 101 includes a controller 110 (e.g., a processor or remote server of a portable electronic device), a display generation component 120 (e.g., a Head Mounted Device (HMD), a display, a projector, or a touch screen), one or more input devices 125 (e.g., eye tracking device 130, hand tracking device 140, other input devices 150), one or more output devices 155 (e.g., speakers 160, haptic output generator 170, and other output devices 180), one or more sensors 190 (e.g., image sensor, light sensor, depth sensor, haptic sensor, orientation sensor, proximity sensor, temperature sensor, position sensor, motion sensor, or speed sensor), and optionally one or more peripheral devices 195 (e.g., a household appliance or wearable device). In some implementations, one or more of the input device 125, the output device 155, the sensor 190, and the peripheral device 195 are integrated with the display generation component 120 (e.g., in a head-mounted device or a handheld device).

In describing an XR experience, various terms are used to refer differently to several related but different environments that a user may sense and/or interact with (e.g., interact with inputs detected by computer system 101 that generated the XR experience, such inputs causing the computer system that generated the XR experience to generate audio, visual, and/or tactile feedback corresponding to the various inputs provided to computer system 101). The following are a subset of these terms:

Physical environment: A physical environment refers to a physical world that people can sense and/or interact with without the aid of electronic systems. Physical environments, such as a physical park, include physical objects, such as physical trees, physical buildings, and physical people. People can directly sense and/or interact with the physical environment, such as through sight, touch, hearing, taste, and smell.

Extended reality: In contrast, an extended reality (XR) environment refers to a wholly or partially simulated environment that people sense and/or interact with via an electronic system. In XR, a subset of a person's physical motions, or representations thereof, is tracked, and in response, one or more characteristics of one or more virtual objects simulated in the XR environment are adjusted in a manner that complies with at least one law of physics. For example, an XR system may detect a person's head turning and, in response, adjust the graphical content and the sound field presented to the person in a manner similar to how such views and sounds would change in a physical environment. In some situations (e.g., for accessibility reasons), adjustments to the characteristics of virtual objects in an XR environment may be made in response to representations of physical motions (e.g., voice commands). A person may sense and/or interact with an XR object using any one of their senses, including sight, hearing, touch, taste, and smell. For example, a person may sense and/or interact with audio objects that create a 3D or spatial audio environment that provides the perception of point audio sources in 3D space. As another example, audio objects may enable audio transparency, which selectively incorporates ambient sounds from the physical environment with or without computer-generated audio. In some XR environments, a person may sense and/or interact only with audio objects.

Examples of XR include virtual reality and mixed reality.

Virtual reality: A virtual reality (VR) environment refers to a simulated environment that is designed to be based entirely on computer-generated sensory inputs for one or more senses. A VR environment includes a plurality of virtual objects that a person can sense and/or interact with. For example, computer-generated images of trees, buildings, and avatars representing people are examples of virtual objects. A person may sense and/or interact with virtual objects in a VR environment through a simulation of the person's presence within the computer-generated environment and/or through a simulation of a subset of the person's physical movements within the computer-generated environment.

Mixed reality: In contrast to a VR environment, which is designed to be based entirely on computer-generated sensory inputs, a mixed reality (MR) environment refers to a simulated environment that is designed to incorporate sensory inputs from the physical environment, or representations thereof, in addition to computer-generated sensory inputs (e.g., virtual objects). On a virtuality continuum, a mixed reality environment lies anywhere between, but does not include, a wholly physical environment at one end and a virtual reality environment at the other end. In some MR environments, the computer-generated sensory inputs may respond to changes in sensory inputs from the physical environment. In addition, some electronic systems for presenting an MR environment may track position and/or orientation relative to the physical environment to enable virtual objects to interact with real objects (i.e., physical objects from the physical environment or representations thereof). For example, the system may account for movements so that a virtual tree appears stationary relative to the physical ground.

Examples of mixed reality include augmented reality and augmented virtuality.

Augmented reality: An augmented reality (AR) environment refers to a simulated environment in which one or more virtual objects are superimposed over a physical environment or a representation of a physical environment. For example, an electronic system for presenting an AR environment may have a transparent or translucent display through which a person may directly view the physical environment. The system may be configured to present virtual objects on the transparent or translucent display so that a person, using the system, perceives the virtual objects superimposed over the physical environment. Alternatively, the system may have an opaque display and one or more imaging sensors that capture images or video of the physical environment, which are representations of the physical environment. The system composites the images or video with virtual objects and presents the composition on the opaque display. A person, using the system, indirectly views the physical environment by way of the images or video of the physical environment and perceives the virtual objects superimposed over the physical environment. As used herein, video of the physical environment shown on an opaque display is called "passthrough video," meaning that the system uses one or more image sensors to capture images of the physical environment and uses those images in presenting the AR environment on the opaque display. Further alternatively, the system may have a projection system that projects virtual objects into the physical environment, for example as a hologram or onto a physical surface, so that a person, using the system, perceives the virtual objects superimposed over the physical environment. An augmented reality environment also refers to a simulated environment in which a representation of a physical environment is transformed by computer-generated sensory information. 
For example, in providing passthrough video, the system may transform one or more sensor images to impose a selected perspective (e.g., viewpoint) that is different from the perspective captured by the imaging sensors. As another example, the representation of the physical environment may be transformed by graphically modifying (e.g., enlarging) portions thereof, such that the modified portions may be representative, but not photorealistic, versions of the originally captured images. As a further example, the representation of the physical environment may be transformed by graphically eliminating or blurring portions thereof.

Augmented virtuality: An augmented virtuality (AV) environment refers to a simulated environment in which a virtual or computer-generated environment incorporates one or more sensory inputs from the physical environment. The sensory inputs may be representations of one or more characteristics of the physical environment. For example, an AV park may have virtual trees and virtual buildings, but people with faces realistically reproduced from images taken of physical people. As another example, a virtual object may adopt the shape or color of a physical object imaged by one or more imaging sensors. As a further example, a virtual object may adopt shadows consistent with the position of the sun in the physical environment.

In an augmented reality, mixed reality, or virtual reality environment, a view of the three-dimensional environment is visible to the user. A view of a three-dimensional environment is typically viewable to a user via one or more display generating components (e.g., a display or a pair of display modules that provide stereoscopic content to different eyes of the same user) through a virtual viewport having a viewport boundary that defines a range of the three-dimensional environment viewable to the user via the one or more display generating components. In some embodiments, the area defined by the viewport boundary is less than the user's field of vision in one or more dimensions (e.g., based on the user's field of vision, the size of one or more display generating components, optical properties or other physical characteristics, and/or the position and/or orientation of one or more display generating components relative to the user's eyes). In some embodiments, the area defined by the viewport boundary is greater in one or more dimensions than the user's field of vision (e.g., based on the user's field of vision, the size of one or more display generating components, optical properties or other physical characteristics, and/or the position and/or orientation of one or more display generating components relative to the user's eyes). The viewport and viewport boundaries typically move with movement of one or more display generating components (e.g., with movement of the user's head for a head-mounted device, or with movement of the user's hand for a handheld device such as a tablet or smart phone). The user's viewpoint determines what is visible in the viewport, the viewpoint typically specifies a position and direction relative to the three-dimensional environment, and as the viewpoint moves, the view of the three-dimensional environment will also shift in the viewport. 
For a head-mounted device, the viewpoint is typically based on the position and orientation of the user's head, face, and/or eyes to provide a view of the three-dimensional environment that is perceptually accurate and provides an immersive experience while the user is using the head-mounted device. For a handheld or stationary device, the point of view moves (e.g., the user moves toward, away from, up, down, right, and/or left) as the handheld or stationary device moves and/or as the user's positioning relative to the handheld or stationary device changes. For devices that include a display generation component having virtual passthrough, a portion of the physical environment that is visible (e.g., displayed and/or projected) via the one or more display generation components is based on the field of view of one or more cameras in communication with the display generation component, which one or more cameras generally move with movement of the display generation component (e.g., with movement of the user's head for a head-mounted device or with movement of the user's hand for a hand-held device such as a tablet or smart phone), because the user's point of view moves with movement of the field of view of the one or more cameras (and updates the appearance of one or more virtual objects displayed via the one or more display generation components based on the user's point of view (e.g., updates the displayed position and pose of the virtual object based on movement of the user's point of view)). 
For display generation components having optical passthrough, the portions of the physical environment that are visible via the one or more display generation components (e.g., optically visible through one or more partially or fully transparent portions of the display generation component) are based on the user's field of view through the partially or fully transparent portions of the display generation component (e.g., the field of view moves with movement of the user's head for a head-mounted device, or with movement of the user's hand for a handheld device such as a tablet or smart phone), because the user's viewpoint moves as the user's field of view through the partially or fully transparent portions of the display generation component moves (and the appearance of the one or more virtual objects is updated based on the user's viewpoint).

In some implementations, the representation of the physical environment (e.g., displayed via virtual passthrough or visible via optical passthrough) may be partially or completely obscured by a virtual environment. In some implementations, the amount of the virtual environment that is displayed (e.g., the amount of the physical environment that is not displayed) is based on the immersion level of the virtual environment (e.g., relative to the representation of the physical environment). For example, increasing the immersion level optionally causes more of the virtual environment to be displayed, replacing and/or obscuring more of the physical environment, and decreasing the immersion level optionally causes less of the virtual environment to be displayed, revealing portions of the physical environment that were previously not displayed and/or obscured. In some embodiments, at a particular immersion level, one or more first background objects (e.g., in the representation of the physical environment) are visually de-emphasized (e.g., dimmed or displayed with increased transparency) more than one or more second background objects, and one or more third background objects cease to be displayed. 
In some embodiments, the level of immersion includes an associated degree to which the virtual content displayed by the computer system (e.g., the virtual environment and/or other virtual content) obscures background content (e.g., content other than the virtual environment and/or the virtual content) around or behind the virtual content, optionally including the number of items of background content that are displayed and/or the visual characteristics (e.g., color, contrast, and/or opacity) with which the background content is displayed, the angular range of the virtual content displayed via the display generation component (e.g., 60 degrees of content displayed at low immersion, 120 degrees at medium immersion, or 180 degrees at high immersion), and/or the proportion of the field of view displayed via the display generation component that is occupied by the virtual content (e.g., 33% at low immersion, 66% at medium immersion, or 100% at high immersion). In some implementations, the background content is included in a background over which the virtual content is displayed (e.g., background content in the representation of the physical environment). 
In some embodiments, the background content includes a user interface (e.g., a user interface generated by a computer system that corresponds to an application), virtual objects that are not associated with or included in the virtual environment and/or virtual content (e.g., a representation of a file or other user generated by the computer system, etc.), and/or real objects (e.g., passthrough objects that represent real objects in a physical environment surrounding the user, visible such that they are displayed via a display generating component and/or visible via a transparent or translucent component of the display generating component because the computer system does not obscure/obstruct their visibility through the display generating component). In some embodiments, at low immersion levels (e.g., a first immersion level), the background, virtual, and/or real objects are displayed in a non-occluded manner. For example, a virtual environment with a low level of immersion is optionally displayed simultaneously with background content, which is optionally displayed at full brightness, color, and/or translucency. In some implementations, at a higher immersion level (e.g., a second immersion level that is higher than the first immersion level), the background, virtual, and/or real objects are displayed in an occluded manner (e.g., dimmed, obscured, or removed from the display). For example, the corresponding virtual environment with a high level of immersion is displayed without simultaneously displaying the background content (e.g., in full screen or full immersion mode). As another example, a virtual environment displayed at a medium level of immersion is displayed simultaneously with background content that is darkened, obscured, or otherwise de-emphasized. In some embodiments, the visual characteristics of the background objects differ between the background objects. 
For example, at a particular immersion level, one or more first background objects are visually de-emphasized (e.g., dimmed, obscured, and/or displayed with increased transparency) more than one or more second background objects, and one or more third background objects cease to be displayed. In some embodiments, a null or zero level of immersion corresponds to the virtual environment ceasing to be displayed and a representation of the physical environment being displayed instead (optionally with one or more virtual objects, such as applications, windows, or virtual three-dimensional objects), without the representation of the physical environment being obscured by the virtual environment. Adjusting the immersion level using a physical input element provides a quick and efficient method of adjusting immersion, which enhances the operability of the computer system and makes the user-device interface more efficient.
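
The relationship between immersion level and what is displayed can be sketched as a small lookup table. The following is a minimal illustration only, not the claimed implementation: the angular ranges and field-of-view proportions are the example values given above, while the names `ImmersionPreset`, `PRESETS`, and `step_immersion`, and the `background_opacity` values, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmersionPreset:
    angular_range_deg: float   # angular range of displayed virtual content
    fov_fraction: float        # proportion of the field of view occupied
    background_opacity: float  # hypothetical: 1.0 = fully visible, 0.0 = hidden


# Angular range and field-of-view values are the examples from the text.
# The background_opacity values are illustrative assumptions only.
PRESETS = {
    "none": ImmersionPreset(0.0, 0.0, 1.0),      # virtual environment not displayed
    "low": ImmersionPreset(60.0, 0.33, 1.0),     # background shown unobscured
    "medium": ImmersionPreset(120.0, 0.66, 0.5), # background dimmed
    "high": ImmersionPreset(180.0, 1.0, 0.0),    # background not displayed
}
ORDER = ["none", "low", "medium", "high"]


def step_immersion(current: str, delta: int) -> str:
    """Move up or down the ordered levels, clamping at both ends."""
    index = ORDER.index(current) + delta
    index = max(0, min(len(ORDER) - 1, index))
    return ORDER[index]


def describe(level: str) -> str:
    """Summarize the preset for a level as a human-readable string."""
    p = PRESETS[level]
    return (f"{level}: {p.angular_range_deg:.0f} degrees, "
            f"{p.fov_fraction:.0%} of the field of view, "
            f"background at {p.background_opacity:.0%}")
```

In this sketch, an input such as one detent of a rotatable physical input element would map to `step_immersion(level, +1)` or `step_immersion(level, -1)`; a continuous control would interpolate between presets instead of stepping.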

Viewpoint-locked virtual object: A virtual object is viewpoint-locked when the computer system displays the virtual object at the same location and/or position in the user's viewpoint, even as the user's viewpoint shifts (e.g., changes). In embodiments in which the computer system is a head-mounted device, the user's viewpoint is locked to the forward-facing direction of the user's head (e.g., the user's viewpoint is at least a portion of the user's field of view when the user is looking straight ahead); thus, as long as the user's head does not move, the user's viewpoint remains fixed even as the user's gaze shifts. In embodiments in which the computer system has a display generating component (e.g., a display screen) that can be repositioned relative to the user's head, the user's viewpoint is the augmented reality view that is being presented to the user on the display generating component of the computer system. For example, a viewpoint-locked virtual object that is displayed in the upper left corner of the user's viewpoint when the viewpoint is in a first orientation (e.g., the user's head faces north) continues to be displayed in the upper left corner of the user's viewpoint even when the viewpoint changes to a second orientation (e.g., the user's head faces west). In other words, the location and/or position at which the viewpoint-locked virtual object is displayed in the user's viewpoint is independent of the user's position and/or orientation in the physical environment. In embodiments in which the computer system is a head-mounted device, the user's viewpoint is locked to the orientation of the user's head, such that the virtual object is also referred to as a "head-locked virtual object."

Environment-locked virtual object: A virtual object is environment-locked (alternatively, "world-locked") when the computer system displays the virtual object at a location and/or position in the user's viewpoint that is based on (e.g., selected with reference to and/or anchored to) a location and/or object in the three-dimensional environment (e.g., a physical environment or a virtual environment). As the user's viewpoint shifts, the location and/or object in the environment changes relative to the user's viewpoint, which results in the environment-locked virtual object being displayed at a different location and/or position in the user's viewpoint. For example, an environment-locked virtual object that is locked onto a tree immediately in front of the user is displayed at the center of the user's viewpoint. When the user's viewpoint shifts to the right (e.g., the user's head turns to the right) so that the tree is now left of center in the user's viewpoint (e.g., the tree's position in the user's viewpoint shifts), the environment-locked virtual object that is locked onto the tree is displayed left of center in the user's viewpoint. In other words, the location and/or position at which the environment-locked virtual object is displayed in the user's viewpoint depends on the position and/or orientation of the location and/or object in the environment to which the virtual object is locked. In some embodiments, the computer system uses a stationary frame of reference (e.g., a coordinate system that is anchored to a fixed location and/or object in the physical environment) to determine the position at which to display the environment-locked virtual object in the user's viewpoint. 
The environment-locked virtual object may be locked to a stationary portion of the environment (e.g., a floor, wall, table, or other stationary object), or may be locked to a movable portion of the environment (e.g., a vehicle, animal, person, or even a representation of a portion of a user's body such as a user's hand, wrist, arm, or foot that moves independent of the user's point of view) such that the virtual object moves as the point of view or the portion of the environment moves to maintain a fixed relationship between the virtual object and the portion of the environment.
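
The difference between the two locking modes can be illustrated with a one-dimensional sketch that considers only horizontal head rotation (yaw). This is a simplified illustration under stated assumptions, not the disclosed implementation: the function names `wrap_degrees` and `displayed_bearing`, the string mode labels, and the sign convention (positive angles to the right) are all hypothetical.

```python
def wrap_degrees(angle: float) -> float:
    """Wrap an angle into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def displayed_bearing(lock_mode: str, anchor_bearing: float, head_yaw: float) -> float:
    """Return where an object appears in the user's viewpoint, in degrees.

    0 is the center of the viewpoint; negative is left, positive is right.
    anchor_bearing: for "viewpoint", the fixed offset within the viewpoint;
                    for "environment", the world bearing of the anchor.
    head_yaw: world bearing the user's head faces (positive = turned right).
    """
    if lock_mode == "viewpoint":
        # Independent of the user's orientation in the environment.
        return anchor_bearing
    if lock_mode == "environment":
        # Depends on where the anchor lies relative to the current viewpoint.
        return wrap_degrees(anchor_bearing - head_yaw)
    raise ValueError(f"unknown lock mode: {lock_mode}")
```

With a tree straight ahead (world bearing 0), turning the head 30 degrees to the right gives `displayed_bearing("environment", 0.0, 30.0) == -30.0`, that is, left of center, matching the tree example above, whereas a viewpoint-locked object keeps its offset for every value of `head_yaw`.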

In some implementations, an environment-locked or viewpoint-locked virtual object exhibits lazy follow behavior, which reduces or delays movement of the virtual object relative to movement of a reference point that the virtual object is following. In some embodiments, when exhibiting lazy follow behavior, the computer system intentionally delays movement of the virtual object when it detects movement of the reference point (e.g., a portion of the environment, the viewpoint, or a point that is fixed relative to the viewpoint, such as a point between 5 cm and 300 cm from the viewpoint) that the virtual object is following. For example, when the reference point (e.g., the portion of the environment or the viewpoint) moves at a first speed, the virtual object is moved by the device to remain locked to the reference point, but moves at a second speed that is slower than the first speed (e.g., until the reference point stops moving or slows down, at which point the virtual object starts to catch up to the reference point). In some embodiments, when a virtual object exhibits lazy follow behavior, the device ignores small movements of the reference point (e.g., movements below a threshold amount, such as 0 to 5 degrees or 0 to 50 cm). 
For example, when the reference point (e.g., the portion of the environment or the viewpoint to which the virtual object is locked) moves by a first amount, the distance between the reference point and the virtual object increases (e.g., because the virtual object is being displayed so as to maintain a fixed or substantially fixed position relative to a viewpoint or portion of the environment that is different from the reference point to which the virtual object is locked). When the reference point moves by a second amount that is greater than the first amount, the distance between the reference point and the virtual object initially increases (for the same reason) and then decreases as the amount of movement of the reference point increases above a threshold (e.g., a "lazy follow" threshold), because the virtual object is moved by the computer system to maintain a fixed or substantially fixed position relative to the reference point. In some embodiments, the virtual object maintaining a substantially fixed position relative to the reference point includes the virtual object being displayed within a threshold distance (e.g., 1 cm, 2 cm, 3 cm, 5 cm, 15 cm, 20 cm, or 50 cm) of the reference point in one or more dimensions (e.g., up/down, left/right, and/or forward/backward relative to the position of the reference point).
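
The delayed following described above (small movements of the reference point are ignored, larger movements are followed at a reduced speed, and the object catches up once the reference point stops) can be sketched in one dimension. This is a minimal sketch under assumptions, not the disclosed algorithm: the function names, the per-frame update model, and the parameters `dead_zone` and `max_step` (here in centimeters) are hypothetical.

```python
def lazy_follow_step(obj_pos: float, ref_pos: float,
                     dead_zone: float, max_step: float) -> float:
    """Advance the object one frame toward the reference point.

    Offsets no larger than dead_zone are ignored. Larger offsets are closed
    by at most max_step per frame, so the object trails a fast reference
    point and catches up once the reference point slows or stops.
    """
    offset = ref_pos - obj_pos
    if abs(offset) <= dead_zone:
        return obj_pos  # small movement of the reference point: ignore it
    step = min(abs(offset) - dead_zone, max_step)
    return obj_pos + step if offset > 0 else obj_pos - step


def simulate(ref_path, dead_zone=2.0, max_step=4.0):
    """Return the object-to-reference distance after each frame."""
    obj = ref_path[0]
    gaps = []
    for ref in ref_path[1:]:
        obj = lazy_follow_step(obj, ref, dead_zone, max_step)
        gaps.append(abs(ref - obj))
    return gaps
```

For a reference point that moves 10 cm per frame for three frames and then stops, `simulate([0, 10, 20, 30, 30, 30, 30, 30, 30])` yields distances that first grow and then shrink until the object settles within the dead zone, which mirrors the increase-then-decrease behavior described in the text.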

Hardware: There are many different types of electronic systems that enable a person to sense and/or interact with various XR environments. Examples include head-mounted systems, projection-based systems, head-up displays (HUDs), vehicle windshields with integrated display capability, windows with integrated display capability, displays formed as lenses designed to be placed on a person's eyes (e.g., similar to contact lenses), headphones/earphones, speaker arrays, input systems (e.g., wearable or handheld controllers with or without haptic feedback), smart phones, tablet devices, and desktop/laptop computers. A head-mounted system may have one or more speakers and an integrated opaque display. Alternatively, a head-mounted system may be configured to accept an external opaque display (e.g., a smart phone). The head-mounted system may incorporate one or more imaging sensors to capture images or video of the physical environment and/or one or more microphones to capture audio of the physical environment. Rather than an opaque display, a head-mounted system may have a transparent or translucent display. The transparent or translucent display may have a medium through which light representing images is directed to a person's eyes. The display may utilize digital light projection, OLED, LED, uLED, liquid crystal on silicon, laser scanning light sources, or any combination of these technologies. The medium may be an optical waveguide, a holographic medium, an optical combiner, an optical reflector, or any combination thereof. In one embodiment, the transparent or translucent display may be configured to become opaque selectively. Projection-based systems may employ retinal projection technology that projects graphical images onto a person's retina. Projection systems may also be configured to project virtual objects into the physical environment, for example as a hologram or onto a physical surface. 
In some embodiments, the controller 110 is configured to manage and coordinate the XR experience of the user. In some embodiments, the controller 110 includes a suitable combination of software, firmware, and/or hardware. The controller 110 is described in more detail below with respect to FIG. 2. In some implementations, the controller 110 is a computing device that is local or remote relative to the scene 105 (e.g., a physical environment). For example, the controller 110 is a local server located within the scene 105. As another example, the controller 110 is a remote server (e.g., a cloud server or a central server) located outside of the scene 105. In some implementations, the controller 110 is communicatively coupled with the display generation component 120 (e.g., an HMD, a display, a projector, or a touch screen) via one or more wired or wireless communication channels 144 (e.g., Bluetooth, IEEE 802.11x, IEEE 802.16x, or IEEE 802.3x). In another example, the controller 110 is included within a housing (e.g., a physical enclosure) of the display generation component 120 (e.g., an HMD or a portable electronic device that includes a display and one or more processors), one or more of the input devices 125, one or more of the output devices 155, one or more of the sensors 190, and/or one or more of the peripheral devices 195, or shares the same physical housing or support structure with one or more of the above.

In some embodiments, display generation component 120 is configured to provide an XR experience (e.g., at least a visual component of the XR experience) to a user. In some embodiments, display generation component 120 includes a suitable combination of software, firmware, and/or hardware. The display generation component 120 is described in more detail below with respect to FIG. 3. In some embodiments, the functionality of the controller 110 is provided by and/or combined with the display generation component 120.

According to some embodiments, display generation component 120 provides an XR experience to a user when the user is virtually and/or physically present within scene 105.

In some embodiments, the display generation component is worn on a part of the user's body (e.g., on the user's head or hand). As such, display generation component 120 includes one or more XR displays provided to display XR content. For example, in various embodiments, the display generation component 120 encloses the field of view of the user. In some embodiments, display generation component 120 is a handheld device (such as a smart phone or tablet device) configured to present XR content, and the user holds the device with a display directed toward the user's field of view and a camera directed toward the scene 105. In some embodiments, the handheld device is optionally placed within an enclosure that is worn on the head of the user. In some embodiments, the handheld device is optionally placed on a support (e.g., a tripod) in front of the user. In some embodiments, display generation component 120 is an XR chamber, enclosure, or room configured to present XR content, in which the user does not wear or hold display generation component 120. Many of the user interfaces described with reference to one type of hardware for displaying XR content (e.g., a handheld device or a device on a tripod) may be implemented on another type of hardware for displaying XR content (e.g., an HMD or other wearable computing device). For example, a user interface showing interactions with XR content that are triggered based on interactions occurring in a space in front of a handheld or tripod-mounted device may similarly be implemented with an HMD, where the interactions occur in a space in front of the HMD and the responses of the XR content are displayed via the HMD. 
Similarly, a user interface showing interaction with XR content triggered based on movement of a handheld device or tripod-mounted device relative to a physical environment (e.g., a scene 105 or a portion of a user's body (e.g., a user's eye, head, or hand)) may similarly be implemented with an HMD, where the movement is caused by movement of the HMD relative to the physical environment (e.g., the scene 105 or a portion of the user's body (e.g., a user's eye, head, or hand)).

While relevant features of the operating environment 100 are shown in fig. 1A, those of ordinary skill in the art will recognize from this disclosure that various other features are not illustrated for the sake of brevity and so as not to obscure more relevant aspects of the example embodiments disclosed herein.

Fig. 1A-1P illustrate various examples of computer systems for performing the methods and providing audio, visual, and/or tactile feedback as part of the user interfaces described herein. In some embodiments, the computer system includes one or more display generation components (e.g., first display assembly 1-120a and second display assembly 1-120b and/or first optical module 11.1.1-104a and second optical module 11.1.1-104b) for displaying, to a user of the computer system, virtual elements and/or a representation of the physical environment, optionally generated based on detected events and/or user inputs detected by the computer system. The user interface generated by the computer system is optionally corrected by one or more corrective lenses 11.3.2-216, which are optionally removably attached to one or more of the optical modules, to make the user interface easier to view for a user who would otherwise use glasses or contact lenses to correct their vision. While many of the user interfaces illustrated herein show a single view of the user interface, user interfaces in an HMD are optionally displayed using two optical modules (e.g., first display assembly 1-120a and second display assembly 1-120b and/or first optical module 11.1.1-104a and second optical module 11.1.1-104b), one for the user's right eye and a different one for the user's left eye, with slightly different images presented to the two eyes to generate the illusion of stereoscopic depth. The single view of the user interface is typically either a right-eye view or a left-eye view, and the depth effect is explained in the text or using other schematics or views. 
In some embodiments, the computer system includes one or more external displays (e.g., display assembly 1-108) for displaying status information of the computer system to a user of the computer system (when the computer system is not being worn) and/or to others in the vicinity of the computer system, the status information optionally being generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more audio output components (e.g., electronic components 1-112) for generating audio feedback, the audio feedback optionally being generated based on detected events and/or user inputs detected by the computer system. In some embodiments, the computer system includes one or more input devices for detecting input, such as one or more sensors (e.g., sensor assembly 1-356 and/or one or more sensors in fig. 1I) for detecting information about a physical environment of the device, which can be used (optionally in conjunction with one or more illuminators, such as the illuminators described in fig. 1I) to generate a digital passthrough image, capture visual media (e.g., photographs and/or videos) corresponding to the physical environment, or determine a pose (e.g., position and/or orientation) of physical objects and/or surfaces in the physical environment, such that virtual objects can be placed based on the detected pose of the physical objects and/or surfaces. In some embodiments, the computer system includes one or more input devices for detecting input, such as one or more sensors (e.g., sensor assembly 1-356 and/or one or more sensors in fig. 1I) for detecting hand positioning and/or movement, which can be used (optionally in combination with one or more illuminators, such as illuminators 6-124 described in fig. 1I) to determine when one or more air gestures have been performed. 
In some embodiments, the computer system includes one or more input devices for detecting input, such as one or more sensors for detecting eye movement (e.g., eye tracking and gaze tracking sensors in fig. 1I), which can be used (optionally in combination with one or more lights, such as lights 11.3.2-110 in fig. 1O) to determine attention or gaze location and/or gaze movement, which can optionally be used to detect gaze-only input based on gaze movement and/or dwell. Combinations of the various sensors described above may be used to determine a user's facial expression and/or hand movement for generating an avatar or representation of the user, such as an anthropomorphic avatar or representation for a real-time communication session, wherein the avatar has facial expressions, hand movements, and/or body movements based on or similar to the detected facial expressions, hand movements, and/or body movements of the user of the device. Gaze and/or attention information is optionally combined with hand tracking information to determine interactions between a user and one or more user interfaces based on direct and/or indirect inputs, such as air gestures, or inputs using one or more hardware input devices, such as one or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and/or dial or button 1-328), knobs (e.g., first button 1-128, button 11.1.1-114, and/or dial or button 1-328), digital crowns (e.g., first button 1-128, which is depressible and twistable or rotatable, button 11.1.1-114, and/or dial or button 1-328), touch pads, touch screens, keyboards, mice, and/or other input devices. 
One or more buttons (e.g., first button 1-128, button 11.1.1-114, second button 1-132, and/or dial or button 1-328) are optionally used to perform system operations, such as re-centering content in a three-dimensional environment visible to a user of the device, displaying a main user interface for launching an application, starting a real-time communication session, or initiating display of a virtual three-dimensional background. The knob or digital crown (e.g., first button 1-128, button 11.1.1-114, and/or dial or button 1-328, each of which is depressible and twistable or rotatable) is optionally rotatable to adjust parameters of the visual content, such as an immersion level of the virtual three-dimensional environment (e.g., a degree to which the virtual content occupies a user's viewport in the three-dimensional environment) or other parameters associated with the three-dimensional environment and the virtual content displayed via the optical modules (e.g., first display assembly 1-120a and second display assembly 1-120b and/or first optical module 11.1.1-104a and second optical module 11.1.1-104b).
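
As a non-limiting illustration of the rotatable input described above, the following sketch maps a rotation of the knob or digital crown to an immersion level. The disclosure does not specify this mapping; the function name, the normalized range of 0.0 to 1.0, and the assumption that one full turn spans the whole range are hypothetical and chosen only for illustration.

```python
def adjust_immersion(level: float, rotation_degrees: float,
                     degrees_per_full_range: float = 360.0) -> float:
    """Return the immersion level after a crown rotation.

    level: current immersion level, normalized to [0.0, 1.0] (assumed scale).
    rotation_degrees: signed rotation; positive increases immersion (assumed).
    degrees_per_full_range: rotation needed to sweep from 0.0 to 1.0 (assumed).
    """
    if degrees_per_full_range <= 0:
        raise ValueError("degrees_per_full_range must be positive")
    new_level = level + rotation_degrees / degrees_per_full_range
    # Clamp so that over-rotation saturates at the ends of the range.
    return max(0.0, min(1.0, new_level))


if __name__ == "__main__":
    level = 0.5
    for turn in (90.0, 180.0, -720.0):
        level = adjust_immersion(level, turn)
        print(f"rotated {turn:+.0f} degrees, immersion level is now {level:.2f}")
```

Clamping rather than wrapping is a design choice in this sketch: continued rotation past either end of the range leaves the immersion level at its minimum or maximum.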

Fig. 1B illustrates a front, top, perspective view of an example of a head-mounted display (HMD) device 1-100 configured to be worn by a user and to provide a virtual and altered/mixed reality (VR/AR) experience. The HMD 1-100 may include a display unit 1-102 or assembly, an electronic strap assembly 1-104 connected to and extending from the display unit 1-102, and a strap assembly 1-106 secured at either end to the electronic strap assembly 1-104. The electronic strap assembly 1-104 and the strap assembly 1-106 may be part of a retaining assembly configured to wrap around the head of a user to retain the display unit 1-102 against the face of the user.

In at least one example, the strap assembly 1-106 may include a first strap 1-116 configured to be wrapped around a back side of a user's head and a second strap 1-117 configured to extend over a top of the user's head. As shown, the second strap 1-117 may extend between the first electronic strip 1-105a and the second electronic strip 1-105b of the electronic strap assembly 1-104. The electronic strap assembly 1-104 and the strap assembly 1-106 may be part of a securing mechanism that extends rearward from the display unit 1-102 and is configured to hold the display unit 1-102 against the face of the user.

In at least one example, the securing mechanism includes a first electronic strip 1-105a that includes a first proximal end 1-134 coupled to the display unit 1-102 (e.g., the housing 1-150 of the display unit 1-102) and a first distal end 1-136 opposite the first proximal end 1-134. The securing mechanism may also include a second electronic strip 1-105b including a second proximal end 1-138 coupled to the housing 1-150 of the display unit 1-102 and a second distal end 1-140 opposite the second proximal end 1-138. The securing mechanism may also include a first strap 1-116 and a second strap 1-117, the first strap including a first end 1-142 coupled to the first distal end 1-136 and a second end 1-144 coupled to the second distal end 1-140, and the second strap extending between the first electronic strip 1-105a and the second electronic strip 1-105b. The electronic strips 1-105a-b and the first strap 1-116 may be coupled via a connection mechanism or assembly 1-114. In at least one example, the second strap 1-117 includes a first end 1-146 coupled to the first electronic strip 1-105a between the first proximal end 1-134 and the first distal end 1-136 and a second end 1-148 coupled to the second electronic strip 1-105b between the second proximal end 1-138 and the second distal end 1-140.

In at least one example, the first and second electronic strips 1-105a-b comprise plastic, metal, or other structural material that forms the shape of the substantially rigid strips 1-105a-b. In at least one example, the first strap 1-116 and the second strap 1-117 are formed of a resiliently flexible material including woven textile, rubber, or the like. The first strap 1-116 and the second strap 1-117 may be flexible to conform to the shape of the user's head when the HMD 1-100 is worn.

In at least one example, one or more of the first and second electronic strips 1-105a-b may define an interior strip volume and include one or more electronic components disposed in the interior strip volume. In one example, as shown in FIG. 1B, the first electronic strip 1-105a may include electronic components 1-112. In one example, the electronic components 1-112 may include speakers. In one example, the electronic components 1-112 may include a computing component, such as a processor.

In at least one example, the housing 1-150 defines a first, front opening 1-152. The front opening 1-152 is marked with a dashed line in fig. 1B because the display assembly 1-108 is arranged to obstruct the first opening 1-152 from view when the HMD 1-100 is assembled. The housing 1-150 may also define a second, rear opening 1-154. The housing 1-150 further defines an interior volume between the first opening 1-152 and the second opening 1-154. In at least one example, the HMD 1-100 includes a display assembly 1-108, which may include a front cover and a display screen (shown in other figures) disposed in or across the front opening 1-152 to obscure the front opening 1-152. In at least one example, the display screen of the display assembly 1-108 and, in general, the display assembly 1-108 have a curvature configured to follow the curvature of the user's face. The display screen of the display assembly 1-108 may be curved as shown to complement the facial features of the user and the overall curvature from one side of the face to the other, e.g., from left to right and/or from top to bottom, where the display unit 1-102 is pressed.

In at least one example, the housing 1-150 may define a first aperture 1-126 between the first and second openings 1-152, 1-154 and a second aperture 1-130 between the first and second openings 1-152, 1-154. The HMD 1-100 may also include a first button 1-128 disposed in the first aperture 1-126, and a second button 1-132 disposed in the second aperture 1-130. The first button 1-128 and the second button 1-132 can be pressed through the respective apertures 1-126, 1-130. In at least one example, the first button 1-128 and/or the second button 1-132 may be a twistable dial as well as a depressible button. In at least one example, the first button 1-128 is a depressible and twistable dial button and the second button 1-132 is a depressible button.

Fig. 1C illustrates a rear perspective view of the HMD 1-100. The HMD 1-100 may include a light seal 1-110 extending rearward from a housing 1-150 of the display assembly 1-108 around a perimeter of the housing 1-150, as shown. The light seal 1-110 may be configured to extend from the housing 1-150 to the face of the user, around the eyes of the user, to block external light from being visible. In one example, the HMD 1-100 may include a first display assembly 1-120a and a second display assembly 1-120b disposed at or in a rearward facing second opening 1-154 defined by the housing 1-150 and/or disposed in an interior volume of the housing 1-150 and configured to project light through the second opening 1-154. In at least one example, each display assembly 1-120a-b may include a respective display screen 1-122a, 1-122b configured to project light in a rearward direction through the second opening 1-154 toward the eyes of the user.

In at least one example, referring to both fig. 1B and 1C, the display assembly 1-108 may be a front-facing display assembly including a display screen configured to project light in a first, forward direction, and the rear display screens 1-122a-b may be configured to project light in a second, rearward direction opposite the first direction. As described above, the light seal 1-110 may be configured to block light external to the HMD 1-100 from reaching the user's eyes, including light projected by the forward display screen of the display assembly 1-108 shown in the front perspective view of fig. 1B. In at least one example, the HMD 1-100 may further include a curtain 1-124 that obscures the second opening 1-154 between the housing 1-150 and the rear display assemblies 1-120a-b. In at least one example, the curtain 1-124 may be elastic or at least partially elastic.

Any of the features, components, and/or parts shown in fig. 1B and 1C (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts shown in fig. 1D-1F and described herein, alone or in any combination. Likewise, any of the features, components, and/or parts shown or described with reference to fig. 1D-1F (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1B and 1C, alone or in any combination.

Fig. 1D illustrates an exploded view of an example of an HMD 1-200 that includes various portions or parts that are separated according to the modular and selective coupling of these parts. For example, HMD 1-200 may include a strap 1-216 that may be selectively coupled to a first electronic strip 1-205a and a second electronic strip 1-205b. The first electronic strip 1-205a may include a first electronic component 1-212a and the second electronic strip 1-205b may include a second electronic component 1-212b. In at least one example, the first and second electronic strips 1-205a-b can be removably coupled to the display unit 1-202.

Furthermore, the HMD 1-200 may include a light seal 1-210 configured to be removably coupled to the display unit 1-202. The HMD 1-200 may also include a lens 1-218, which may be removably coupled to the display unit 1-202, for example, over a first display assembly and a second display assembly that include display screens. The lens 1-218 may include a custom prescription lens configured to correct vision. As noted, each part shown in the exploded view of fig. 1D and described above can be removably coupled, attached, reattached, and replaced to update the part or to swap out the part for a different user. For example, straps such as strap 1-216, light seals such as light seal 1-210, lenses such as lens 1-218, and electronic strips such as electronic strips 1-205a-b may be swapped out depending on the user, such that these parts are customized to fit and correspond to an individual user of the HMD 1-200.

Any of the features, components, and/or parts shown in fig. 1D (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts shown in fig. 1B, 1C, and 1E-1F, and described herein, alone or in any combination. Also, any of the features, components, and/or parts shown or described with reference to fig. 1B, 1C, and 1E-1F (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1D, alone or in any combination.

Fig. 1E illustrates an exploded view of an example of a display unit 1-306 of an HMD. The display unit 1-306 may include a front display assembly 1-308, a frame/housing assembly 1-350, and a curtain assembly 1-324. The display unit 1-306 may also include a sensor assembly 1-356, a logic board assembly 1-358, and a cooling assembly 1-360 disposed between the frame assembly 1-350 and the front display assembly 1-308. In at least one example, the display unit 1-306 may also include a rear display assembly 1-320 including a first rear display screen 1-322a and a second rear display screen 1-322b disposed between the frame 1-350 and the curtain assembly 1-324.

In at least one example, the display unit 1-306 may further include a motor assembly 1-362 configured as an adjustment mechanism for adjusting the positioning of the display screen 1-322a-b of the display assembly 1-320 relative to the frame 1-350. In at least one example, the display assembly 1-320 is mechanically coupled to the motor assembly 1-362, with each display screen 1-322a-b having at least one motor such that the motor is capable of translating the display screen 1-322a-b to match the inter-pupillary distance of the user's eyes.
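
As a non-limiting illustration of the interpupillary adjustment described above, the following sketch computes how far each per-screen motor would translate its display screen so that the screen centers match a target interpupillary distance. The symmetric placement about the device centerline, the travel limits of 54 mm and 74 mm, and all names are hypothetical assumptions, not values taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class ScreenOffsets:
    """Horizontal positions of the screen centers relative to the device
    centerline, in millimeters (negative is toward the user's left)."""
    left_mm: float
    right_mm: float


def ipd_screen_targets(ipd_mm: float,
                       min_ipd_mm: float = 54.0,
                       max_ipd_mm: float = 74.0) -> ScreenOffsets:
    """Target screen positions for a given interpupillary distance.

    The limits are illustrative mechanical travel limits (assumed).
    """
    clamped = max(min_ipd_mm, min(max_ipd_mm, ipd_mm))
    half = clamped / 2.0
    return ScreenOffsets(left_mm=-half, right_mm=half)


def motor_moves(current: ScreenOffsets, target: ScreenOffsets) -> tuple:
    """Signed translation, in millimeters, for the left and right motors."""
    return (target.left_mm - current.left_mm,
            target.right_mm - current.right_mm)


if __name__ == "__main__":
    current = ScreenOffsets(left_mm=-30.0, right_mm=30.0)
    target = ipd_screen_targets(64.0)
    print(motor_moves(current, target))
```

In this sketch the two screens move in opposite directions by equal amounts; a system that measures each eye's position separately could instead compute independent targets for the two motors.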

In at least one example, the display unit 1-306 may include a dial or button 1-328 that is depressible relative to the frame 1-350 and accessible by a user external to the frame 1-350. The dial or button 1-328 may be electrically connected to the motor assembly 1-362 via a controller such that the dial or button 1-328 may be manipulated by a user to cause the motors of the motor assembly 1-362 to adjust the positioning of the display screens 1-322a-b.

Any of the features, components, and/or parts shown in fig. 1E (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts shown in fig. 1B-1D and 1F and described herein, alone or in any combination. Also, any of the features, components, and/or parts shown and described with reference to fig. 1B-1D and 1F (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1E, alone or in any combination.

Fig. 1F illustrates an exploded view of another example of a display unit 1-406 of an HMD device similar to other HMD devices described herein. The display unit 1-406 may include a front display assembly 1-402, a sensor assembly 1-456, a logic board assembly 1-458, a cooling assembly 1-460, a frame assembly 1-450, a rear display assembly 1-421, and a curtain assembly 1-424. The display unit 1-406 may further include a motor assembly 1-462 for adjusting the positioning of the first display subassembly 1-420a and the second display subassembly 1-420b of the rear display assembly 1-421, including their respective first and second display screens, for interpupillary adjustment, as described above.

The various parts, systems, and assemblies shown in the exploded view of fig. 1F are described in more detail herein with reference to fig. 1B-1E and subsequent figures referenced in this disclosure. The display unit 1-406 shown in fig. 1F may be assembled and integrated with the securing mechanism shown in fig. 1B-1E, including electronic straps, bands, and other components including light seals, connector assemblies, and the like.

Any of the features, components, and/or parts shown in fig. 1F (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts shown in fig. 1B-1E and described herein, alone or in any combination. Likewise, any of the features, components, and/or parts shown and described with reference to fig. 1B-1E (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1F, alone or in any combination.

Fig. 1G illustrates a perspective exploded view of a front cover assembly 3-100 of an HMD device described herein, such as the front cover assembly 3-1 of the HMD 3-100 shown in fig. 1G or any other HMD device shown and described herein. The front cover assembly 3-100 shown in FIG. 1G may include a transparent or translucent cover 3-102, a shield 3-104 (or "canopy"), an adhesive layer 3-106, a display assembly 3-108 including a lenticular lens panel or array 3-110, and a structural trim 3-112. The adhesive layer 3-106 may secure the shield 3-104 and/or transparent cover 3-102 to the display assembly 3-108 and/or trim 3-112. The trim 3-112 may secure the various components of the front cover assembly 3-100 to a frame or chassis of the HMD device.

In at least one example, as shown in FIG. 1G, the transparent cover 3-102, the shield 3-104, and the display assembly 3-108, including the lenticular lens array 3-110, may be curved to accommodate the curvature of the user's face. The transparent cover 3-102 and the shield 3-104 may be curved in two or three dimensions, for example, curved vertically in the Z direction in and out of the Z-X plane and curved horizontally in the X direction in and out of the Z-X plane. In at least one example, the display assembly 3-108 may include a lenticular lens array 3-110 and a display panel having pixels configured to project light through the shield 3-104 and the transparent cover 3-102. The display assembly 3-108 may be curved in at least one direction (e.g., a horizontal direction) to accommodate the curvature of the user's face from one side (e.g., left side) of the face to the other side (e.g., right side). In at least one example, each layer or component of the display assembly 3-108 (which will be shown in subsequent figures and described in more detail, but which may include the lenticular lens array 3-110 and the display layer) may be similarly or concentrically curved in a horizontal direction to accommodate the curvature of the user's face.

In at least one example, the shield 3-104 may comprise a transparent or translucent material through which the display assembly 3-108 projects light. In one example, the shield 3-104 may include one or more opaque portions, such as opaque ink printed portions or other opaque film portions on the rear surface of the shield 3-104. The rear surface may be the surface of the shield 3-104 facing the eyes of the user when the HMD device is worn. In at least one example, the opaque portion may be on a front surface of the shield 3-104 opposite the rear surface. In at least one example, the one or more opaque portions of the shield 3-104 may include a peripheral portion that visually conceals any component around the outer periphery of the display screen of the display assembly 3-108. In this manner, the opaque portion of the shield conceals any other components of the HMD device that would otherwise be visible through the transparent or translucent cover 3-102 and/or shield 3-104, including electronic components, structural components, and the like.

In at least one example, the shield 3-104 can define one or more apertures or transparent portions 3-120 through which sensors can transmit and receive signals. In one example, the portions 3-120 are apertures through which the sensors may extend or through which signals are transmitted and received. In one example, the portions 3-120 are transparent portions, or portions that are more transparent than the surrounding translucent or opaque portions of the shield, through which the sensors can transmit and receive signals through the shield and through the transparent cover 3-102. In one example, the sensors may include a camera, an IR sensor, a LUX sensor, or any other visual or non-visual environmental sensor of the HMD device.

Any of the features, components, and/or parts shown in fig. 1G (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts described herein, alone or in any combination. Likewise, any of the features, components, and/or parts shown and described herein (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1G, alone or in any combination.

Fig. 1H illustrates an exploded view of an example of an HMD device 6-100. The HMD device 6-100 may include a sensor array or system 6-102 that includes one or more sensors, cameras, projectors, etc. mounted to one or more components of the HMD 6-100. In at least one example, the sensor system 6-102 may include a bracket 1-338 to which one or more sensors of the sensor system 6-102 may be secured/fastened.

FIG. 1I illustrates a portion of an HMD device 6-100 that includes a front transparent cover 6-104 and a sensor system 6-102. The sensor system 6-102 may include a number of different sensors, transmitters, and receivers, including cameras, IR sensors, projectors, etc. The transparent cover 6-104 is illustrated in front of the sensor system 6-102 to illustrate the relative positioning of the various sensors and emitters and the orientation of each sensor/emitter of the system 6-102. As referred to herein, "lateral," "side," "transverse," "horizontal," and other like terms refer to an orientation or direction as indicated by the X-axis shown in fig. 1J. Terms such as "vertical," "upward," "downward," and similar terms refer to an orientation or direction as indicated by the Z-axis shown in fig. 1J. Terms such as "forward", "rearward", and the like refer to an orientation or direction as indicated by the Y-axis shown in fig. 1J.

In at least one example, the transparent cover 6-104 may define a front exterior surface of the HMD device 6-100, and the sensor system 6-102 including the various sensors and their components may be disposed behind the cover 6-104 in the Y-axis/direction. The cover 6-104 may be transparent or translucent to allow light to pass through the cover 6-104, including both the light detected by the sensor system 6-102 and the light emitted thereby.

As described elsewhere herein, the HMD device 6-100 may include one or more controllers including a processor for electrically coupling the various sensors and transmitters of the sensor system 6-102 with one or more motherboards, processing units, and other electronic devices, such as a display screen, and the like. Furthermore, as will be shown in more detail below with reference to other figures, the various sensors, emitters, and other components of the sensor system 6-102 may be coupled to various structural frame members, brackets, etc. of the HMD device 6-100, which are not shown in fig. 1I. For clarity, FIG. 1I shows components of the sensor system 6-102 unattached and not electrically coupled to other components.

In at least one example, the device may include one or more controllers having a processor configured to execute instructions stored on a memory component electrically coupled to the processor. The instructions may include, or cause the processor to execute, one or more algorithms for self-correcting the angle and positioning of the various cameras described herein over time as the initial positioning, angle, or orientation of the cameras is bumped or deformed due to an unintended drop event or other event.

In at least one example, the sensor system 6-102 may include one or more scene cameras 6-106. The system 6-102 may include two scene cameras 6-106, one disposed on each side of the nose bridge or arch of the HMD device 6-100, such that each of the two cameras 6-106 generally corresponds to the positioning of the left and right eyes of the user behind the cover 6-104. In at least one example, the scene cameras 6-106 are oriented generally forward in the Y-direction to capture images in front of the user during use of the HMD 6-100. In at least one example, the scene cameras are color cameras and provide images and content for MR video passthrough to the display screens facing the user's eyes when using the HMD device 6-100. The scene cameras 6-106 may also be used for environment and object reconstruction.

In at least one example, the sensor system 6-102 may include a first depth sensor 6-108 that is directed forward in the Y-direction. In at least one example, the first depth sensor 6-108 may be used for environmental and object reconstruction as well as hand and body tracking of the user. In at least one example, the sensor system 6-102 may include a second depth sensor 6-110 centrally disposed along a width (e.g., along an X-axis) of the HMD device 6-100. For example, the second depth sensor 6-110 may be disposed over the central nose bridge or on a fitting structure over the nose when the user wears the HMD 6-100. In at least one example, the second depth sensor 6-110 may be used for environmental and object reconstruction and hand and body tracking. In at least one example, the second depth sensor may comprise a LIDAR sensor.

In at least one example, the sensor system 6-102 may include a depth projector 6-112 that is generally forward facing to project electromagnetic waves (e.g., in the form of a predetermined pattern of light spots) into or within a field of view of the user and/or scene cameras 6-106, or into or within a field of view that includes and exceeds the field of view of the user and/or scene cameras 6-106. In at least one example, the depth projector can project electromagnetic waves of light in the form of a pattern of light spots that reflect off of objects and back into the depth sensors described above, including the depth sensors 6-108, 6-110. In at least one example, the depth projector 6-112 may be used for environment and object reconstruction and hand and body tracking.
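
The disclosure does not specify how depth is computed from the reflected pattern of light spots. As a non-limiting sketch, assuming a conventional triangulation model in which the projector and the depth sensor are separated by a known baseline and the observed shift of a spot is inversely proportional to distance, depth can be estimated as shown below. The function name, parameter names, and example values are hypothetical.

```python
from typing import Optional


def depth_from_spot_shift(focal_length_px: float,
                          baseline_m: float,
                          shift_px: float) -> Optional[float]:
    """Estimate distance to a surface from the shift of a projected spot.

    Uses the standard triangulation relation depth = focal * baseline / shift.
    focal_length_px: sensor focal length in pixels (assumed known from calibration).
    baseline_m: projector-to-sensor separation in meters (assumed known).
    shift_px: observed displacement of the spot in pixels relative to its
        position for a surface at infinity.
    Returns None when the shift is not positive, since depth is then undefined.
    """
    if shift_px <= 0:
        return None
    return focal_length_px * baseline_m / shift_px


if __name__ == "__main__":
    for shift in (60.0, 30.0, 15.0):
        depth = depth_from_spot_shift(600.0, 0.05, shift)
        print(f"shift of {shift} px corresponds to roughly {depth:.2f} m")
```

Because depth varies inversely with the shift, nearby objects such as a user's hands produce large, easily measured shifts, while resolution degrades with distance. A time-of-flight sensor such as the LIDAR sensor mentioned above would measure distance differently.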

In at least one example, the sensor system 6-102 may include a downward facing camera 6-114 with a field of view generally pointing downward in the Z-axis relative to the HMD device 6-100. In at least one example, the downward cameras 6-114 may be disposed on the left and right sides of the HMD device 6-100 as shown and used for hand and body tracking, headset tracking, and facial avatar detection and creation for displaying a user avatar on a forward display screen of the HMD device 6-100 as described elsewhere herein. For example, the downward cameras 6-114 may be used to capture facial expressions and movements of the user's face, including cheeks, mouth, and chin, under the HMD device 6-100.

In at least one example, the sensor system 6-102 can include a mandibular camera 6-116. In at least one example, the mandibular cameras 6-116 may be disposed on the left and right sides of the HMD device 6-100 as shown and used for hand and body tracking, headset tracking, and facial avatar detection and creation for displaying a user avatar on a forward display screen of the HMD device 6-100 as described elsewhere herein. For example, the mandibular cameras 6-116 may be used to capture facial expressions and movements of the user's face (including the user's mandible, cheek, mouth, and chin) under the HMD device 6-100.

In at least one example, the sensor system 6-102 may include a side camera 6-118. The side cameras 6-118 may be oriented to capture left and right side views in the X-axis or direction relative to the HMD device 6-100. In at least one example, the side cameras 6-118 may be used for hand and body tracking, headphone tracking, and face avatar detection and re-creation.

In at least one example, the sensor system 6-102 may include a plurality of eye tracking and gaze tracking sensors for determining identity, status, and gaze direction of the user's eyes during and/or prior to use. In at least one example, the eye/gaze tracking sensor may include a nose-eye camera 6-120 disposed on either side of the user's nose and adjacent to the user's nose when the HMD device 6-100 is worn. The eye/gaze sensor may also include bottom eye cameras 6-122 disposed below the respective user's eyes for capturing images of the eyes for facial avatar detection and creation, gaze tracking, and iris identification functions.

In at least one example, the sensor system 6-102 may include an infrared illuminator 6-124 directed outwardly from the HMD device 6-100 to illuminate the external environment and any objects therein with IR light for IR detection with one or more IR sensors of the sensor system 6-102. In at least one example, the sensor system 6-102 may include a flicker sensor 6-126 and an ambient light sensor 6-128. In at least one example, the flicker sensor 6-126 may detect the refresh rate of overhead lighting to avoid display flicker. In one example, the infrared illuminator 6-124 may comprise a light emitting diode, and may be particularly useful in low light environments for illuminating a user's hand and other objects for detection by the infrared sensors of the sensor system 6-102.

In at least one example, multiple sensors (including scene cameras 6-106, downward cameras 6-114, mandibular cameras 6-116, side cameras 6-118, depth projectors 6-112, and depth sensors 6-108, 6-110) may be used in combination with electrically coupled controllers to combine depth data with camera data for hand tracking and for size determination, providing better hand tracking and object recognition and tracking functions of the HMD device 6-100. In at least one example, the downward camera 6-114, the mandibular camera 6-116, and the side camera 6-118 described above and shown in fig. 1I may be wide angle cameras capable of operating in the visible and infrared spectrums. In at least one example, these cameras 6-114, 6-116, 6-118 may operate only in black and white light detection to simplify image processing and gain sensitivity.

Any of the features, components, and/or parts shown in fig. 1I (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts shown in fig. 1J-1L and described herein, alone or in any combination. Likewise, any of the features, components, and/or parts shown and described with reference to fig. 1J-1L (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1I, alone or in any combination.

Fig. 1J illustrates a lower perspective view of an example of an HMD 6-200 including a cover or shroud 6-204 secured to a frame 6-230. In at least one example, the sensors 6-203 of the sensor system 6-202 may be disposed about the perimeter of the HMD 6-200 such that the sensors 6-203 are disposed outwardly around the perimeter of the display region or area 6-232 so as not to obstruct the view of the displayed light. In at least one example, the sensors may be disposed behind the shroud 6-204 and aligned with transparent portions of the shroud, allowing the sensors and projectors to pass light back and forth through the shroud 6-204. In at least one example, opaque ink or other opaque material or film/layer may be disposed on the shroud 6-204 around the display area 6-232 to hide components of the HMD 6-200 outside the display area 6-232, other than the transparent portions defined by the opaque portions, through which the sensors and projectors transmit and receive light and electromagnetic signals during operation. In at least one example, the shroud 6-204 allows light from the display to pass through (e.g., within the display area 6-232), but does not allow light to pass radially outward from the display area around the perimeter of the display and shroud 6-204.

In some examples, the shroud 6-204 includes a transparent portion 6-205 and an opaque portion 6-207, as described above and elsewhere herein. In at least one example, the opaque portion 6-207 of the shroud 6-204 may define one or more transparent regions 6-209 through which the sensors 6-203 of the sensor system 6-202 may transmit and receive signals. In the illustrated example, the sensors 6-203 of the sensor system 6-202 that transmit and receive signals through the shroud 6-204, or more specifically, through the transparent regions 6-209 of (or defined by) the opaque portion 6-207 of the shroud 6-204, may include the same or similar sensors as those shown in the example of FIG. 1I, such as the depth sensors 6-108 and 6-110, the depth projector 6-112, the first and second scene cameras 6-106, the first and second downward cameras 6-114, the first and second side cameras 6-118, and the first and second infrared illuminators 6-124. These sensors are also shown in the examples of fig. 1K and 1L. Other sensors, sensor types, numbers of sensors, and their relative positioning may be included in one or more other examples of the HMD.

Any of the features, components, and/or parts shown in fig. 1J (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts shown in fig. 1I and 1K-1L and described herein, alone or in any combination. Also, any of the features, components, and/or parts shown or described with reference to fig. 1I and 1K-1L (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1J, alone or in any combination.

Fig. 1K illustrates a front view of a portion of an example of an HMD device 6-300, including a display 6-334, brackets 6-336, 6-338, and a frame or housing 6-330. The example shown in fig. 1K does not include a front cover or shroud to illustrate the brackets 6-336, 6-338. For example, the shroud 6-204 shown in FIG. 1J includes an opaque portion 6-207 that will visually overlay/block viewing of anything outside (e.g., radially/peripherally outside) the display/display area 6-334, including the sensor 6-303 and the bracket 6-338.

In at least one example, various sensors of the sensor system 6-302 are coupled to the brackets 6-336, 6-338. In at least one example, the scene cameras 6-306 are held to tight angular tolerances relative to each other. For example, the tolerance of the mounting angle between the two scene cameras 6-306 may be 0.5 degrees or less, such as 0.3 degrees or less. To achieve and maintain such tight tolerances, in one example, the scene cameras 6-306 may be mounted to the bracket 6-338 instead of the shroud. The bracket may include cantilevered arms on which the scene cameras 6-306 and other sensors of the sensor system 6-302 may be mounted so that their position and orientation remain unchanged in the event of a drop by the user that deforms the other bracket 6-226, the housing 6-330, and/or the shroud.
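The mounting-angle tolerance described above can be illustrated with a short check. The following is a minimal sketch only, not a method taken from this disclosure: it assumes each scene camera's optical axis is available as a 3D direction vector, and the function names `angle_between_deg` and `within_mount_tolerance` and the `nominal_angle_deg` parameter are hypothetical names introduced for illustration.

```python
import math


def angle_between_deg(a, b):
    """Return the angle in degrees between two 3D direction vectors.

    Uses atan2(|a x b|, a . b), which stays accurate for the very small
    angles that matter when checking sub-degree tolerances.
    """
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    cross_norm = math.sqrt(cx * cx + cy * cy + cz * cz)
    dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    return math.degrees(math.atan2(cross_norm, dot))


def within_mount_tolerance(axis_a, axis_b, nominal_angle_deg, tolerance_deg=0.5):
    """Check whether the measured inter-camera angle deviates from the
    nominal (design) angle by no more than the tolerance.

    nominal_angle_deg is a hypothetical design value; the 0.5 degree
    default mirrors the example tolerance given in the text.
    """
    deviation = abs(angle_between_deg(axis_a, axis_b) - nominal_angle_deg)
    return deviation <= tolerance_deg


# Example: two cameras nominally parallel, one tilted by 0.4 degrees.
tilt = math.radians(0.4)
camera_a = (0.0, 0.0, 1.0)
camera_b = (math.sin(tilt), 0.0, math.cos(tilt))
ok_at_half_degree = within_mount_tolerance(camera_a, camera_b, 0.0, 0.5)
ok_at_point_three = within_mount_tolerance(camera_a, camera_b, 0.0, 0.3)
```

In this example the 0.4 degree tilt passes the 0.5 degree tolerance but fails the tighter 0.3 degree tolerance.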

Any of the features, components, and/or parts shown in fig. 1K (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts shown in fig. 1I-1J and 1L and described herein, alone or in any combination. Likewise, any of the features, components, and/or parts shown or described with reference to fig. 1I-1J and 1L (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1K, alone or in any combination.

Fig. 1L illustrates a bottom view of an example of an HMD 6-400 including a front display/cover assembly 6-404 and a sensor system 6-402. The sensor system 6-402 may be similar to other sensor systems described above and elsewhere herein, including as described with reference to fig. 1I-1K. In at least one example, the mandibular camera 6-416 may face downward to capture an image of the user's lower facial features. In one example, the mandibular camera 6-416 may be directly coupled to the frame or housing 6-430 or to one or more internal brackets that are directly coupled to the frame or housing 6-430 as shown. The frame or housing 6-430 may include one or more holes/openings 6-415 through which the mandibular camera 6-416 may transmit and receive signals.

Any of the features, components, and/or parts shown in fig. 1L (including arrangements and configurations thereof) may be included in any other examples of the devices, features, components, and parts shown in fig. 1I-1K and described herein, alone or in any combination. Likewise, any of the features, components, and/or parts shown and described with reference to fig. 1I-1K (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1L, alone or in any combination.

Fig. 1M illustrates a rear perspective view of an inter-pupillary distance (IPD) adjustment system 11.1.1-102 that includes first and second optical modules 11.1.1-104a-b slidably engaged/coupled to respective guide rods 11.1.1-108a-b and motors 11.1.1-110a-b of left and right adjustment subsystems 11.1.1-106a-b. The IPD adjustment system 11.1.1-102 may be coupled to the carriage 11.1.1-112 and include buttons 11.1.1-114 in electrical communication with the motors 11.1.1-110a-b. In at least one example, the buttons 11.1.1-114 can be in electrical communication with the first and second motors 11.1.1-110a-b via a processor or other circuit component to cause the first and second motors 11.1.1-110a-b to activate and cause the first and second optical modules 11.1.1-104a-b, respectively, to change positioning relative to one another.

In at least one example, the first and second optical modules 11.1.1-104a-b may include respective display screens configured to project light toward the eyes of the user when the HMD 11.1.1-100 is worn. In at least one example, a user can manipulate (e.g., press and/or rotate) buttons 11.1.1-114 to activate positional adjustment of optical modules 11.1.1-104a-b to match the inter-pupillary distance of the user's eyes. The optical modules 11.1.1-104a-b may also include one or more cameras or other sensor/sensor systems for imaging and measuring the user's IPD, so that the optical modules 11.1.1-104a-b may be adjusted to match the IPD.

In one example, a user may manipulate buttons 11.1.1-114 to cause automatic positioning adjustments of the first and second optical modules 11.1.1-104a-b. In one example, the user may manipulate buttons 11.1.1-114 to cause manual adjustment so that the optical modules 11.1.1-104a-b move farther apart or closer together (e.g., when the user rotates buttons 11.1.1-114 in one direction or the other) until the user visually matches her/his own IPD. In one example, the manual adjustment is communicated electronically via one or more circuits, and power for moving the optical modules 11.1.1-104a-b via the motors 11.1.1-110a-b is provided by a power supply. In one example, the adjustment and movement of the optical modules 11.1.1-104a-b via manipulation of the buttons 11.1.1-114 are mechanically actuated by the movement of the buttons 11.1.1-114.
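As an illustration of the automatic adjustment described above, the following minimal sketch computes how far each motor would need to move its optical module to match a measured IPD. It is an assumption-laden example rather than the disclosed implementation: it assumes the two modules are placed symmetrically about the device centerline, and the travel limits (54 mm to 74 mm) and the names `ModulePositions`, `target_positions`, and `motor_displacements` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModulePositions:
    """Lateral offsets of the optical-module centers from the device
    centerline, in millimeters (negative is toward the user's left)."""
    left_mm: float
    right_mm: float


def target_positions(ipd_mm, min_ipd_mm=54.0, max_ipd_mm=74.0):
    """Place the modules symmetrically so their spacing equals the IPD.

    The min/max limits are hypothetical travel limits of the guide rods;
    a requested IPD outside that range is clamped to the nearest limit.
    """
    clamped = max(min_ipd_mm, min(max_ipd_mm, ipd_mm))
    half = clamped / 2.0
    return ModulePositions(left_mm=-half, right_mm=half)


def motor_displacements(current, ipd_mm):
    """Return the (left, right) displacement each motor must apply."""
    target = target_positions(ipd_mm)
    return (target.left_mm - current.left_mm,
            target.right_mm - current.right_mm)


# Example: modules currently 60 mm apart, measured IPD of 64 mm.
current = ModulePositions(left_mm=-30.0, right_mm=30.0)
left_move, right_move = motor_displacements(current, 64.0)
```

Here each module moves 2 mm outward (left by -2.0 mm, right by +2.0 mm). The manual mode described above could reuse the same displacement step, driven by button rotation instead of a measured IPD.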

Any of the features, components, and/or parts shown in fig. 1M (including arrangements and configurations thereof) may be included singly or in any combination in any other example of the devices, features, components, and parts shown in any other figures and described herein. Likewise, any of the features, components, and/or parts shown or described with reference to any other figure (including arrangements and configurations thereof) may be included in the examples of apparatus, features, components, and parts shown in fig. 1M, alone or in any combination.

FIG. 1N illustrates a front perspective view of a portion of the HMD 11.1.2-100, including the outer structural frames 11.1.2-102 and the inner or intermediate structural frames 11.1.2-104 defining the first apertures 11.1.2-106a and the second apertures 11.1.2-106b. The apertures 11.1.2-106a-b are shown in phantom in fig. 1N because a view of the apertures 11.1.2-106a-b may be blocked by one or more other components of the HMD 11.1.2-100 coupled to the inner frames 11.1.2-104 and/or outer frames 11.1.2-102, as shown. In at least one example, the HMDs 11.1.2-100 can include first mounting brackets 11.1.2-108 coupled to the inner frames 11.1.2-104. In at least one example, the mounting brackets 11.1.2-108 are coupled to the inner frames 11.1.2-104 between the first and second apertures 11.1.2-106a-b.

The mounting brackets 11.1.2-108 may include intermediate or central portions 11.1.2-109 coupled to the inner frames 11.1.2-104. In some examples, the intermediate or central portion 11.1.2-109 may not be the geometric middle or center of the brackets 11.1.2-108. Rather, intermediate/central portions 11.1.2-109 can be disposed between first and second cantilevered extension arms that extend away from intermediate portions 11.1.2-109. In at least one example, the mounting bracket 11.1.2-108 includes first and second cantilevers 11.1.2-112, 11.1.2-114 that extend away from the intermediate portions 11.1.2-109 of the mounting brackets 11.1.2-108 that are coupled to the inner frames 11.1.2-104.

As shown in fig. 1N, the outer frames 11.1.2-102 may define a curved geometry on their underside to accommodate the user's nose when the user wears the HMD 11.1.2-100. The curved geometry may be referred to as the nose bridge 11.1.2-111 and is centered on the underside of the HMD 11.1.2-100 as shown. In at least one example, the mounting brackets 11.1.2-108 can be connected to the inner frames 11.1.2-104 between the apertures 11.1.2-106a-b such that the cantilever arms 11.1.2-112, 11.1.2-114 extend downwardly and laterally outwardly away from the intermediate portions 11.1.2-109 to complement the nose bridge 11.1.2-111 geometry of the outer frames 11.1.2-102. In this manner, the mounting brackets 11.1.2-108 are configured to accommodate the nose of the user, as described above. The nose bridge 11.1.2-111 accommodates the nose in that its curvature conforms to the shape of the user's nose above, over, and around the nose, for comfort and fit.

The first cantilever arms 11.1.2-112 may extend away from the intermediate portions 11.1.2-109 of the mounting brackets 11.1.2-108 in a first direction and the second cantilever arms 11.1.2-114 may extend away from the intermediate portions 11.1.2-109 of the mounting brackets 11.1.2-108 in a second direction opposite the first direction. The first and second cantilevers 11.1.2-112, 11.1.2-114 are referred to as "cantilevered" or "cantilever" arms because each arm 11.1.2-112, 11.1.2-114 includes free distal ends 11.1.2-116, 11.1.2-118, respectively, that are not attached to the inner and outer frames 11.1.2-104, 11.1.2-102. In this manner, the arms 11.1.2-112, 11.1.2-114 are cantilevered from the intermediate portion 11.1.2-109, which may be connected to the inner frame 11.1.2-104, while the distal ends 11.1.2-116, 11.1.2-118 are unattached.

In at least one example, the HMDs 11.1.2-100 can include one or more components coupled to the mounting brackets 11.1.2-108. In one example, the components include a plurality of sensors 11.1.2-110a-f. Each of the plurality of sensors 11.1.2-110a-f may include various types of sensors, including cameras, IR sensors, and the like. In some examples, one or more of the sensors 11.1.2-110a-f may be used for object recognition in three-dimensional space, such that it is important to maintain accurate relative positioning of two or more of the plurality of sensors 11.1.2-110a-f. The cantilevered nature of the mounting brackets 11.1.2-108 may protect the sensors 11.1.2-110a-f from damage and repositioning in the event of accidental dropping by a user. Because the sensors 11.1.2-110a-f are cantilevered on the arms 11.1.2-112, 11.1.2-114 of the mounting brackets 11.1.2-108, stresses and deformations of the inner and/or outer frames 11.1.2-104, 11.1.2-102 are not transferred to the cantilevered arms 11.1.2-112, 11.1.2-114 and, therefore, do not affect the relative positioning of the sensors 11.1.2-110a-f coupled/mounted to the mounting brackets 11.1.2-108.

Any of the features, components, and/or parts shown in fig. 1N (including arrangements and configurations thereof) may be included in any other example of a device, feature, component, or part described herein, alone or in any combination. Likewise, any of the features, components, and/or parts shown and described herein (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1N, alone or in any combination.

Fig. 1O illustrates an example of an optical module 11.3.2-100 for use in an electronic device, such as an HMD, including an HMD device as described herein. As shown in one or more other examples described herein, the optical module 11.3.2-100 may be one of two optical modules within the HMD, where each optical module is aligned to project light toward the user's eye. In this way, a first optical module may project light to a first eye of a user via a display screen, and a second optical module of the same device may project light to a second eye of the user via another display screen.

In at least one example, optical modules 11.3.2-100 can include an optical frame or housing 11.3.2-102, which can also be referred to as a cartridge or optical module cartridge. The optical modules 11.3.2-100 may also include displays 11.3.2-104 coupled to the housings 11.3.2-102, including one or more display screens. The displays 11.3.2-104 may be coupled to the housings 11.3.2-102 such that the displays 11.3.2-104 are configured to project light toward the eyes of a user when the HMD to which the optical modules 11.3.2-100 belong is worn during use. In at least one example, the housings 11.3.2-102 can surround the displays 11.3.2-104 and provide connection features for coupling other components of the optical modules described herein.

In one example, the optical modules 11.3.2-100 may include one or more cameras 11.3.2-106 coupled to the housings 11.3.2-102. The cameras 11.3.2-106 may be positioned relative to the displays 11.3.2-104 and the housings 11.3.2-102 such that the cameras 11.3.2-106 are configured to capture one or more images of a user's eyes during use. In at least one example, the optical modules 11.3.2-100 can also include light strips 11.3.2-108 that surround the displays 11.3.2-104. In one example, the light strips 11.3.2-108 are disposed between the displays 11.3.2-104 and the cameras 11.3.2-106. The light strips 11.3.2-108 may include a plurality of lights 11.3.2-110. The plurality of lights may include one or more light emitting diodes (LEDs) or other lights configured to project light toward the eyes of the user when the HMD is worn. The individual lights 11.3.2-110 may be spaced, evenly or unevenly, at various locations along the light strips 11.3.2-108 and thus around the displays 11.3.2-104.

In at least one example, the housings 11.3.2-102 define viewing openings 11.3.2-101 through which a user may view the displays 11.3.2-104 when the HMD device is worn. In at least one example, the LEDs are configured and arranged to emit light through the viewing openings 11.3.2-101 onto the eyes of a user. In one example, cameras 11.3.2-106 are configured to capture one or more images of a user's eyes through viewing openings 11.3.2-101.

As described above, each of the components and features of the optical modules 11.3.2-100 shown in fig. 1O may be replicated in another (e.g., second) optical module provided with the HMD to interact with the other eye of the user (e.g., project light and capture images).

Any of the features, components, and/or parts shown in fig. 1O (including arrangements and configurations thereof) may be included in any other example of the devices, features, components, and parts shown in fig. 1P or otherwise described herein, alone or in any combination. Also, any of the features, components, and/or parts (including arrangements and configurations thereof) shown or described with reference to fig. 1P or otherwise herein may be included in the examples of devices, features, components, and parts shown in fig. 1O, alone or in any combination.

FIG. 1P illustrates a cross-sectional view of an example of an optical module 11.3.2-200, including housings 11.3.2-202, display assemblies 11.3.2-204 coupled to housings 11.3.2-202, and lenses 11.3.2-216 coupled to housings 11.3.2-202. In at least one example, the housing 11.3.2-202 defines a first aperture or channel 11.3.2-212 and a second aperture or channel 11.3.2-214. The channels 11.3.2-212, 11.3.2-214 may be configured to slidably engage corresponding rails or guide rods of the HMD device to allow the optical modules 11.3.2-200 to be adjustably positioned relative to the user's eyes to match the user's inter-pupillary distance (IPD). The housings 11.3.2-202 can slidably engage the guide rods to secure the optical modules 11.3.2-200 in place within the HMD.

In at least one example, the optical modules 11.3.2-200 may also include lenses 11.3.2-216 coupled to the housings 11.3.2-202 and disposed between the display assemblies 11.3.2-204 and the eyes of the user when the HMD is worn. Lenses 11.3.2-216 may be configured to direct light from display assemblies 11.3.2-204 to the eyes of a user. In at least one example, lenses 11.3.2-216 can be part of a lens assembly, including corrective lenses that are removably attached to optical modules 11.3.2-200. In at least one example, lenses 11.3.2-216 are disposed over the light strips 11.3.2-208 and the one or more eye-tracking cameras 11.3.2-206 such that the cameras 11.3.2-206 are configured to capture images of the user's eyes through the lenses 11.3.2-216 and the light strips 11.3.2-208 include lights configured to project light through the lenses 11.3.2-216 to the user's eyes during use.

Any of the features, components, and/or parts shown in fig. 1P (including arrangements and configurations thereof) may be included in any other examples of devices, features, components, and parts described herein, alone or in any combination. Likewise, any of the features, components, and/or parts shown and described herein (including arrangements and configurations thereof) may be included in the examples of devices, features, components, and parts shown in fig. 1P, alone or in any combination.

Fig. 2 is a block diagram of an example of a controller 110 according to some embodiments. While certain specific features are illustrated, those of ordinary skill in the art will appreciate from the present disclosure that various other features are not illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the embodiments disclosed herein. To this end, as a non-limiting example, in some embodiments, the controller 110 includes one or more processing units 202 (e.g., microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), graphics processing units (GPUs), central processing units (CPUs), processing cores, etc.), one or more input/output (I/O) devices 206, one or more communication interfaces 208 (e.g., universal serial bus (USB), FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, global system for mobile communications (GSM), code division multiple access (CDMA), time division multiple access (TDMA), global positioning system (GPS), infrared (IR), BLUETOOTH, ZIGBEE, and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 210, memory 220, and one or more communication buses 204 for interconnecting these components and various other components.

In some embodiments, one or more of the communication buses 204 include circuitry that interconnects and controls communications between system components. In some embodiments, the one or more I/O devices 206 include at least one of a keyboard, a mouse, a touchpad, a joystick, one or more microphones, one or more speakers, one or more image sensors, one or more displays, and the like.

Memory 220 includes high-speed random-access memory such as dynamic random-access memory (DRAM), static random-access memory (SRAM), double-data-rate random-access memory (DDR RAM), or other random-access solid-state memory devices. In some embodiments, memory 220 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. Memory 220 optionally includes one or more storage devices located remotely from the one or more processing units 202. Memory 220 includes a non-transitory computer-readable storage medium. In some embodiments, memory 220 or the non-transitory computer-readable storage medium of memory 220 stores the following programs, modules, and data structures, or a subset thereof, including optional operating system 230 and XR experience module 240.

Operating system 230 includes instructions for handling various basic system services and for performing hardware-related tasks. In some embodiments, XR experience module 240 is configured to manage and coordinate single or multiple XR experiences of one or more users (e.g., single XR experiences of one or more users, or multiple XR experiences of a respective group of one or more users). To this end, in various embodiments, XR experience module 240 includes a data acquisition unit 242, a tracking unit 244, a coordination unit 246, and a data transmission unit 248.

In some implementations, the data acquisition unit 242 is configured to acquire data (e.g., presentation data, interaction data, sensor data, or location data) from at least the display generation component 120 of fig. 1A, and optionally from one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, the data acquisition unit 242 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some implementations, tracking unit 244 is configured to map scene 105 and track at least the location/position of display generation component 120 relative to scene 105 of fig. 1A, and optionally relative to one or more of input device 125, output device 155, sensor 190, and/or peripheral device 195. For this purpose, in various embodiments, tracking unit 244 includes instructions and/or logic therefor, and heuristics and metadata therefor. In some embodiments, tracking unit 244 includes a hand tracking unit 245 and/or an eye tracking unit 243. In some embodiments, the hand tracking unit 245 is configured to track the location/position of one or more portions of the user's hand, and/or the motion of one or more portions of the user's hand relative to the scene 105 of fig. 1A, relative to the display generation component 120, and/or relative to a coordinate system defined relative to the user's hand. The hand tracking unit 245 is described in more detail below with respect to fig. 4. In some implementations, the eye tracking unit 243 is configured to track the positioning or movement of the user's gaze (or more generally, the user's eyes, face, or head) relative to the scene 105 (e.g., relative to the physical environment and/or relative to the user (e.g., the user's hand)) or relative to XR content displayed via the display generation component 120. The eye tracking unit 243 is described in more detail below with respect to fig. 5.

In some embodiments, coordination unit 246 is configured to manage and coordinate XR experiences presented to a user by display generation component 120, and optionally by one or more of output device 155 and/or peripheral device 195. For this purpose, in various embodiments, coordination unit 246 includes instructions and/or logic therefor, and heuristics and metadata therefor.

In some embodiments, the data transmission unit 248 is configured to transmit data (e.g., presentation data or location data) to at least the display generation component 120, and optionally to one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, the data transmission unit 248 includes instructions and/or logic therefor, and heuristics and metadata therefor.

While the data acquisition unit 242, tracking unit 244 (e.g., including the eye tracking unit 243 and hand tracking unit 245), coordination unit 246, and data transmission unit 248 are shown as residing on a single device (e.g., controller 110), it should be understood that in other embodiments, any combination of the data acquisition unit 242, tracking unit 244 (e.g., including the eye tracking unit 243 and hand tracking unit 245), coordination unit 246, and data transmission unit 248 may reside in separate computing devices.
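The division of the XR experience module 240 into units can be pictured as a small composition of interchangeable parts. The sketch below is illustrative only: the disclosure does not specify the order in which the units exchange data, so the acquire, track, coordinate, transmit sequence is an assumption, and the `XRExperienceModule` class, its `step` method, and the dictionary-based `Frame` type are hypothetical names. Because each unit is passed in as a callable, any unit could equally be backed by code running on a separate device.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical container for whatever data flows between the units.
Frame = Dict[str, Any]


@dataclass
class XRExperienceModule:
    """Hypothetical composition of the four units described above."""
    acquire: Callable[[], Frame]          # stands in for data acquisition unit 242
    track: Callable[[Frame], Frame]       # stands in for tracking unit 244
    coordinate: Callable[[Frame], Frame]  # stands in for coordination unit 246
    transmit: Callable[[Frame], None]     # stands in for data transmission unit 248

    def step(self) -> Frame:
        """Run one pass through the units in the assumed order."""
        sensor_data = self.acquire()
        tracked = self.track(sensor_data)
        presentation = self.coordinate(tracked)
        self.transmit(presentation)
        return presentation


# Example with stub units; a real unit could forward to another device.
sent = []
module = XRExperienceModule(
    acquire=lambda: {"sensor": [0.0, 0.0, 1.0]},
    track=lambda data: {**data, "display_pose": "identity"},
    coordinate=lambda data: {**data, "content": "placeholder"},
    transmit=sent.append,
)
result = module.step()
```

Keeping each unit behind a plain callable is one way to reflect the point that the same functional blocks may be hosted together on the controller 110 or distributed across devices without changing how they are combined.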

Furthermore, FIG. 2 is intended as a functional description of various features that may be present in a particular implementation, as opposed to a structural schematic of the embodiments described herein. As will be appreciated by one of ordinary skill in the art, items shown separately may be combined and some items may be separated. For example, some of the functional blocks shown separately in fig. 2 may be implemented in a single block, and the various functions of a single functional block may be implemented by one or more functional blocks in various embodiments. The actual number of modules, the division of particular functions, and how features are allocated among them will vary depending upon the particular implementation and, in some embodiments, depend in part on the particular combination of hardware, software, and/or firmware selected for a particular implementation.

Fig. 3 is a block diagram of an example of a display generation component 120 according to some embodiments. While certain specific features are illustrated, those of ordinary skill in the art will appreciate from the present disclosure that various other features are not illustrated for the sake of brevity and so as not to obscure more pertinent aspects of the embodiments disclosed herein. For this purpose, as a non-limiting example, in some embodiments, the display generation component 120 (e.g., HMD) includes one or more processing units 302 (e.g., microprocessors, ASICs, FPGAs, GPUs, CPUs, processing cores, etc.), one or more input/output (I/O) devices and sensors 306, one or more communication interfaces 308 (e.g., USB, FIREWIRE, THUNDERBOLT, IEEE 802.3x, IEEE 802.11x, IEEE 802.16x, GSM, CDMA, TDMA, GPS, IR, BLUETOOTH, ZIGBEE, and/or similar types of interfaces), one or more programming (e.g., I/O) interfaces 310, one or more XR displays 312, one or more optional inwardly and/or outwardly facing image sensors 314, memory 320, and one or more communication buses 304 for interconnecting these components and various other components.

In some embodiments, one or more of the communication buses 304 include circuitry for interconnecting and controlling communications between the various system components. In some implementations, the one or more I/O devices and sensors 306 include at least one of an Inertial Measurement Unit (IMU), an accelerometer, a gyroscope, a thermometer, one or more physiological sensors (e.g., a blood pressure monitor, a heart rate monitor, a blood oxygen sensor, or a blood glucose sensor), one or more microphones, one or more speakers, a haptic engine, one or more depth sensors (e.g., structured light, time of flight, etc.), and the like.

In some embodiments, one or more XR displays 312 are configured to provide an XR experience to a user. In some embodiments, one or more XR displays 312 correspond to holographic, digital light processing (DLP), liquid crystal display (LCD), liquid crystal on silicon (LCoS), organic light-emitting field-effect transistor (OLET), organic light-emitting diode (OLED), surface-conduction electron-emitter display (SED), field-emission display (FED), quantum dot light-emitting diode (QD-LED), microelectromechanical system (MEMS), and/or similar display types. In some embodiments, one or more XR displays 312 correspond to a diffractive waveguide display, a reflective waveguide display, a polarized waveguide display, or a holographic waveguide display. For example, the display generation component 120 (e.g., HMD) includes a single XR display. As another example, display generation component 120 includes an XR display for each eye of the user. In some embodiments, one or more XR displays 312 are capable of presenting MR and VR content. In some implementations, one or more XR displays 312 can present MR or VR content.

In some embodiments, the one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of the user's face including the user's eyes (and may be referred to as an eye tracking camera). In some embodiments, the one or more image sensors 314 are configured to acquire image data corresponding to at least a portion of a user's hand and optionally a user's arm (and may be referred to as a hand tracking camera). In some implementations, the one or more image sensors 314 are configured to face forward in order to acquire image data corresponding to a scene that a user would see in the absence of the display generating component 120 (e.g., HMD) (and may be referred to as a scene camera). The one or more optional image sensors 314 may include one or more RGB cameras (e.g., with Complementary Metal Oxide Semiconductor (CMOS) image sensors or Charge Coupled Device (CCD) image sensors), one or more Infrared (IR) cameras, and/or one or more event-based cameras, etc.

Memory 320 includes high-speed random access memory such as DRAM, SRAM, DDR RAM or other random access solid state memory devices. In some embodiments, memory 320 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid state storage devices. Memory 320 optionally includes one or more storage devices located remotely from the one or more processing units 302. Memory 320 includes a non-transitory computer-readable storage medium. In some embodiments, memory 320 or a non-transitory computer readable storage medium of memory 320 stores the following programs, modules, and data structures, or a subset thereof, including optional operating system 330 and XR presentation module 340.

Operating system 330 includes processes for handling various basic system services and for performing hardware-related tasks. In some embodiments, XR presentation module 340 is configured to present XR content to a user via one or more XR displays 312. For this purpose, in various embodiments, the XR presentation module 340 includes a data acquisition unit 342, an XR presentation unit 344, an XR map generation unit 346, and a data transmission unit 348.

In some embodiments, the data acquisition unit 342 is configured to acquire data (e.g., presentation data, interaction data, sensor data, or location data) from at least the controller 110 of fig. 1. For this purpose, in various embodiments, the data acquisition unit 342 includes instructions and/or logic for the instructions, as well as heuristics and metadata for the heuristics.

In some embodiments, XR presentation unit 344 is configured to present XR content via one or more XR displays 312. For this purpose, in various embodiments, XR presentation unit 344 includes instructions and/or logic for the instructions, as well as heuristics and metadata for the heuristics.

In some embodiments, XR map generation unit 346 is configured to generate an XR map based on the media content data (e.g., a 3D map of a mixed reality scene or a map of a physical environment in which computer-generated objects may be placed to generate an augmented reality). For this purpose, in various embodiments, XR map generation unit 346 includes instructions and/or logic for the instructions, as well as heuristics and metadata for the heuristics.

In some embodiments, the data transmission unit 348 is configured to transmit data (e.g., presentation data or location data) to at least the controller 110, and optionally to one or more of the input device 125, the output device 155, the sensor 190, and/or the peripheral device 195. For this purpose, in various embodiments, data transmission unit 348 includes instructions and/or logic for the instructions, as well as heuristics and metadata for the heuristics.

While the data acquisition unit 342, the XR presentation unit 344, the XR map generation unit 346, and the data transmission unit 348 are shown as residing on a single device (e.g., the display generation component 120 of fig. 1), it should be understood that in other embodiments, any combination of the data acquisition unit 342, the XR presentation unit 344, the XR map generation unit 346, and the data transmission unit 348 may reside in separate computing devices.

Furthermore, fig. 3 is used more as a functional description of various features that may be present in a particular implementation, as opposed to a schematic of the embodiments described herein. As will be appreciated by one of ordinary skill in the art, the individually displayed items may be combined and some items may be separated. For example, some of the functional blocks shown separately in fig. 3 may be implemented in a single block, and the various functions of a single functional block may be implemented by one or more functional blocks in various embodiments. The actual number of modules and the division of particular functions, and how features are allocated among them, will vary depending upon the particular implementation, and in some embodiments, depend in part on the particular combination of hardware, software, and/or firmware selected for a particular implementation.

Fig. 4 is a schematic illustration of an example embodiment of a hand tracking device 140. In some embodiments, the hand tracking device 140 (fig. 1) is controlled by the hand tracking unit 245 (fig. 2) to track the position/location of one or more portions of the user's hand, and/or the movement of one or more portions of the user's hand relative to the scene 105 of fig. 1A (e.g., relative to a portion of the physical environment surrounding the user, relative to the display generation component 120, or relative to a portion of the user (e.g., the user's face, eyes, or head), and/or relative to a coordinate system defined relative to the user's hand). In some implementations, the hand tracking device 140 is part of the display generation component 120 (e.g., embedded in or attached to a head-mounted device). In some embodiments, the hand tracking device 140 is separate from the display generation component 120 (e.g., in a separate housing or attached to a separate physical support structure).

In some implementations, the hand tracking device 140 includes an image sensor 404 (e.g., one or more IR cameras, 3D cameras, depth cameras, and/or color cameras) that captures three-dimensional scene information including at least a human user's hand 406. The image sensor 404 captures the hand image with sufficient resolution to enable the fingers and their respective locations to be distinguished. The image sensor 404 typically also captures images of other parts of the user's body, or possibly the entire body, and may have zoom capability or a dedicated sensor with increased magnification to capture images of the hand at the desired resolution. In some implementations, the image sensor 404 also captures 2D color video images of the hand 406 and other elements of the scene. In some implementations, the image sensor 404 is used in conjunction with other image sensors to capture the physical environment of the scene 105, or itself serves as the image sensor that captures the physical environment of the scene 105. In some embodiments, the image sensor 404, or a portion thereof, is positioned relative to the user or the user's environment such that the field of view of the image sensor defines an interaction space in which hand movements captured by the image sensor are treated as input to the controller 110.

In some embodiments, the image sensor 404 outputs a sequence of frames containing 3D map data (and possibly color image data as well) to the controller 110, which extracts high-level information from the map data. This high-level information is typically provided via an application program interface (API) to an application running on the controller, which drives the display generation component 120 accordingly. For example, a user may interact with software running on the controller 110 by moving their hand 406 and/or changing their hand pose.

In some implementations, the image sensor 404 projects a speckle pattern onto a scene containing the hand 406 and captures an image of the projected pattern. In some implementations, the controller 110 calculates 3D coordinates of points in the scene (including points on the surface of the user's hand) by triangulation, based on lateral offsets of the speckles in the pattern. This approach is advantageous because it does not require the user to hold or wear any kind of beacon, sensor, or other marker. The method gives the depth coordinates of points in the scene relative to a predetermined reference plane at a specific distance from the image sensor 404. In this disclosure, it is assumed that the image sensor 404 defines an orthogonal set of x, y, and z axes, such that the depth coordinates of points in the scene correspond to the z-component measured by the image sensor. Alternatively, the image sensor 404 (e.g., a hand tracking device) may use other 3D mapping methods, such as stereoscopic imaging or time-of-flight measurements, based on single or multiple cameras or other types of sensors.
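As a non-limiting illustration of the triangulation step, the following sketch converts the lateral offset of one speckle into a z-distance relative to a reference plane. The pinhole-style relation, the function name `depth_from_shift`, and all parameters (focal length in pixels, projector-to-camera baseline, reference-plane distance) are assumptions made for illustration and are not specified in this disclosure.

```python
def depth_from_shift(shift_px, focal_px, baseline_m, ref_depth_m):
    """Return the z-distance (metres) of a scene point from its speckle shift.

    Assumed model (not from the disclosure):
        shift_px = focal_px * baseline_m * (1/z - 1/ref_depth_m)
    A zero shift means the point lies on the reference plane; a positive
    shift means the point is closer to the sensor than the reference plane.
    """
    inv_z = shift_px / (focal_px * baseline_m) + 1.0 / ref_depth_m
    if inv_z <= 0:
        raise ValueError("shift is inconsistent with a point in front of the sensor")
    return 1.0 / inv_z


# Hypothetical sensor: 600 px focal length, 7.5 cm baseline, reference plane at 1 m.
z = depth_from_shift(shift_px=45.0, focal_px=600.0, baseline_m=0.075, ref_depth_m=1.0)
```

Applying such a function to every speckle would yield a depth map of the kind described for fig. 4, under the stated assumptions.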

In some implementations, the hand tracking device 140 captures and processes a temporal sequence of depth maps containing the user's hand as the user moves their hand (e.g., the entire hand or one or more fingers). Software running on a processor in the image sensor 404 and/or the controller 110 processes the 3D map data to extract image block descriptors of the hand in these depth maps. The software matches these descriptors with image block descriptors stored in database 408, based on a previous learning process, in order to estimate the pose of the hand in each frame. The pose typically includes the 3D locations of the user's hand joints and fingertips.
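One way to picture the descriptor-matching step is a nearest-neighbor lookup, sketched below. The disclosure does not state how descriptors are compared; the Euclidean distance, the function name `nearest_pose`, and the layout of the database as a list of (descriptor, pose) pairs are all assumptions for illustration.

```python
import math


def nearest_pose(descriptor, database):
    """Return the pose stored with the database descriptor closest to `descriptor`.

    `database` is a hypothetical list of (stored_descriptor, pose) pairs,
    standing in for entries learned in a previous training process.
    """
    best_descriptor, best_pose = min(
        database, key=lambda entry: math.dist(descriptor, entry[0])
    )
    return best_pose


# Toy database with 3-element descriptors and symbolic pose labels.
toy_database = [((0.0, 0.0, 0.0), "open hand"), ((1.0, 1.0, 1.0), "pinch")]
estimated = nearest_pose((0.9, 1.0, 0.8), toy_database)
```

In practice a pose would be a set of 3D joint and fingertip locations rather than a label; the label is used here only to keep the sketch short.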

The software may also analyze the trajectory of the hand and/or fingers over multiple frames in the sequence to identify gestures. The pose estimation functions described herein may be interleaved with motion tracking functions, such that image block-based pose estimation is performed only once every two (or more) frames, while tracking is used to find changes in the pose that occur over the remaining frames. Pose, motion, and gesture information is provided via the API described above to an application program running on the controller 110. This program may, for example, move and modify images presented on the display generation component 120 in response to the pose and/or gesture information, or perform other functions.
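The interleaving of full pose estimation with cheaper tracking can be sketched as a simple loop. The callables `estimate_pose` and `track_pose` and the `period` parameter are hypothetical placeholders; the disclosure does not define their interfaces.

```python
def process_sequence(frames, estimate_pose, track_pose, period=2):
    """Run full pose estimation every `period` frames and tracking otherwise.

    estimate_pose(frame) -> pose            (assumed: descriptor-based estimate)
    track_pose(previous_pose, frame) -> pose (assumed: incremental update)
    """
    poses = []
    pose = None
    for index, frame in enumerate(frames):
        if pose is None or index % period == 0:
            pose = estimate_pose(frame)      # full estimate on key frames
        else:
            pose = track_pose(pose, frame)   # update from the previous pose
        poses.append(pose)
    return poses
```

With `period=2`, frames 0, 2, 4, and so on receive a full estimate, and the frames in between are handled by tracking.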

In some implementations, the gesture includes an air gesture. An air gesture is a gesture that is detected without the user touching an input element that is part of a device (e.g., computer system 101, one or more input devices 125, and/or hand tracking device 140) (or independently of such an input element) and that is based on detected motion of a portion of the user's body through the air, including motion of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), motion relative to another portion of the user's body (e.g., motion of the user's hand relative to the user's shoulder, motion of one hand of the user relative to the other hand, and/or motion of a finger of the user relative to another finger or portion of the user's hand), and/or absolute motion of a portion of the user's body (e.g., a tap gesture that includes movement of a hand in a predetermined pose by a predetermined amount and/or speed, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).

In some embodiments, the input gestures used in the various examples and embodiments described herein include air gestures performed by movement of a user's finger relative to other fingers or portions of the user's hand for interacting with an XR environment (e.g., a virtual or mixed reality environment). In some embodiments, the air gesture is a gesture that is detected without the user touching an input element that is part of the device (or independently of such an input element) and that is based on detected motion of a portion of the user's body through the air, including motion of the user's body relative to an absolute reference (e.g., an angle of the user's arm relative to the ground or a distance of the user's hand relative to the ground), motion relative to another portion of the user's body (e.g., motion of the user's hand relative to the user's shoulder, motion of one hand of the user relative to the other hand, and/or motion of a finger of the user relative to another finger or portion of the user's hand), and/or absolute motion of a portion of the user's body (e.g., a tap gesture that includes movement of a hand in a predetermined pose by a predetermined amount and/or speed, or a shake gesture that includes a predetermined speed or amount of rotation of a portion of the user's body).

In some embodiments where the input gesture is an air gesture (e.g., in the absence of physical contact with an input device that provides the computer system with information about which user interface element is the target of the user input, such as contact with a user interface element displayed on a touch screen, or contact with a mouse or touchpad to move a cursor to the user interface element), the gesture takes into account the user's attention (e.g., gaze) to determine the target of the user input (e.g., for direct input, as described below). Thus, in implementations involving air gestures, the input gesture is, for example, detected attention (e.g., gaze) toward a user interface element in combination (e.g., simultaneously) with movement of the user's finger and/or hand to perform pinch and/or tap inputs, as described below.

In some implementations, an input gesture directed to a user interface object is performed directly or indirectly with reference to the user interface object. For example, user input is performed directly on a user interface object in accordance with performing the input gesture with the user's hand at a location that corresponds to the location of the user interface object in the three-dimensional environment (e.g., as determined based on the user's current viewpoint). In some implementations, the input gesture is performed indirectly on the user interface object in accordance with the user performing the input gesture while the user's attention (e.g., gaze) is detected on the user interface object and while the user's hand is not at the location that corresponds to the location of the user interface object in the three-dimensional environment. For example, for a direct input gesture, the user can direct the input to the user interface object by initiating the gesture at or near a location corresponding to the displayed location of the user interface object (e.g., within 0.5cm, 1cm, 5cm, or within a distance between 0cm and 5cm measured from the outer edge of the option or the center portion of the option). For an indirect input gesture, the user can direct the input to the user interface object by paying attention to the user interface object (e.g., by looking at it) and, while paying attention to the option, initiating the input gesture (e.g., at any location that is detectable by the computer system, such as a location that does not correspond to the displayed location of the user interface object).
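The distinction between direct and indirect input can be sketched as a small targeting routine. This is an illustrative simplification: the function name `resolve_target`, the representation of objects as named center points, and the single 5 cm radius (one of the example distances above) are assumptions, and distance is measured here from the object's center only.

```python
import math


def resolve_target(hand_pos, gaze_target, objects, direct_radius_m=0.05):
    """Decide which object an air gesture targets, and whether directly or indirectly.

    hand_pos:    (x, y, z) of the hand when the gesture is initiated
    gaze_target: name of the object the user's attention is on, or None
    objects:     dict mapping object name -> (x, y, z) center (hypothetical layout)
    """
    nearest = min(objects, key=lambda name: math.dist(hand_pos, objects[name]),
                  default=None)
    if nearest is not None and math.dist(hand_pos, objects[nearest]) <= direct_radius_m:
        return nearest, "direct"        # hand is at or near the object
    if gaze_target in objects:
        return gaze_target, "indirect"  # hand is elsewhere; gaze selects the object
    return None, None


ui = {"button": (0.0, 0.0, 1.0), "slider": (0.5, 0.0, 1.0)}
target, mode = resolve_target((0.0, 0.02, 1.0), "slider", ui)
```

In this sketch a gesture initiated near an object is treated as direct even when the gaze is elsewhere; a real system might weigh the two signals differently.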

In some embodiments, the input gestures (e.g., air gestures) used in the various examples and embodiments described herein include pinch inputs and tap inputs for interacting with a virtual or mixed reality environment. For example, the pinch and tap inputs described below are performed as air gestures.

In some implementations, the pinch input is part of an air gesture that includes one or more of a pinch gesture, a long pinch gesture, a pinch-and-drag gesture, or a double pinch gesture. For example, a pinch gesture that is an air gesture includes movement of two or more fingers of a hand to make contact with one another, optionally followed immediately (e.g., within 0 to 1 seconds) by a break in contact with each other. A long pinch gesture that is an air gesture includes movement of two or more fingers of a hand to make contact with one another for at least a threshold amount of time (e.g., at least 1 second) before a break in contact with each other is detected. For example, a long pinch gesture includes the user holding a pinch gesture (e.g., with two or more fingers in contact), and the long pinch gesture continues until a break in contact between the two or more fingers is detected. In some implementations, a double pinch gesture that is an air gesture includes two (e.g., or more) pinch inputs (e.g., performed by the same hand) detected in immediate succession (e.g., within a predefined period of time of each other). For example, the user performs a first pinch input (e.g., a pinch input or a long pinch input), releases the first pinch input (e.g., breaks contact between the two or more fingers), and performs a second pinch input within a predefined period of time (e.g., within 1 second or within 2 seconds) after releasing the first pinch input.
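The timing rules for pinch, long pinch, and double pinch can be sketched as a classifier over finger-contact intervals. The function name `classify_pinches`, the interval representation, and the default thresholds (1 second for both, taken from the example values above) are illustrative assumptions.

```python
def classify_pinches(contacts, long_threshold_s=1.0, double_window_s=1.0):
    """Label a time-ordered list of (contact_start_s, contact_end_s) intervals.

    A contact held for at least `long_threshold_s` is a long pinch. A contact
    that begins within `double_window_s` of the previous release is merged
    with the previous pinch into a double pinch.
    """
    events = []
    previous_release = None
    for start, end in contacts:
        if previous_release is not None and start - previous_release <= double_window_s:
            events[-1] = "double pinch"   # second pinch arrived in time
            previous_release = None       # a third pinch starts a new gesture
            continue
        events.append("long pinch" if end - start >= long_threshold_s else "pinch")
        previous_release = end
    return events
```

For example, two short contacts released 0.4 seconds apart are reported as one double pinch, while the same contacts 2.8 seconds apart are reported as two separate pinches.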

In some implementations, a pinch-and-drag gesture that is an air gesture (e.g., an air drag gesture or an air swipe gesture) includes a pinch gesture (e.g., a pinch gesture or a long pinch gesture) performed in conjunction with (e.g., followed by) a drag input that changes the position of the user's hand from a first position (e.g., a start position of the drag) to a second position (e.g., an end position of the drag). In some implementations, the user holds the pinch gesture while performing the drag input, and releases the pinch gesture (e.g., opens their two or more fingers) to end the drag gesture (e.g., at the second position). In some implementations, the pinch input and the drag input are performed by the same hand (e.g., the user pinches two or more fingers into contact with each other and moves the same hand to the second position in the air with the drag gesture). In some embodiments, the pinch input is performed by a first hand of the user and the drag input is performed by a second hand of the user (e.g., the user's second hand moves in the air from the first position to the second position while the user continues the pinch input with the first hand). As another example, a first pinch input (e.g., a pinch input, a long pinch input, or a pinch-and-drag input) is performed using a first hand of the user and, in conjunction with the pinch input performed using the first hand, a second pinch input is performed using the other hand (e.g., the second of the user's two hands).
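A same-hand pinch-and-drag can be sketched as follows: the drag starts where the pinch begins and ends where the pinch is released. The function name `pinch_drag` and the per-frame sample format are hypothetical.

```python
def pinch_drag(samples):
    """Return (start_position, end_position) of the first pinch-and-drag.

    `samples` is a time-ordered list of (is_pinching, (x, y, z)) hand samples.
    Returns None if the hand never pinches.
    """
    start = end = None
    for is_pinching, position in samples:
        if is_pinching:
            if start is None:
                start = position   # first position: pinch begins
            end = position         # keep updating while the pinch is held
        elif start is not None:
            break                  # pinch released: the drag ends here
    return (start, end) if start is not None else None
```

A two-handed variant, in which one hand holds the pinch while the other moves, would read the pinch flag and the position from different hands.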

In some implementations, a tap input (e.g., directed to a user interface element) performed as an air gesture includes movement of a user's finger toward the user interface element, movement of the user's hand toward the user interface element (optionally, with the user's finger extended toward the user interface element), a downward motion of the user's finger (e.g., mimicking a mouse click motion or a tap on a touch screen), or other predefined movement of the user's hand. In some embodiments, a tap input performed as an air gesture is detected based on movement characteristics of the finger or hand performing the tap gesture: movement of the finger or hand away from the user's viewpoint and/or toward an object that is the target of the tap input, followed by an end of the movement. In some embodiments, the end of the movement is detected based on a change in movement characteristics of the finger or hand performing the tap gesture (e.g., an end of movement away from the user's viewpoint and/or toward the object that is the target of the tap input, a reversal of the direction of movement of the finger or hand, and/or a reversal of the direction of acceleration of movement of the finger or hand).
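Detection of the end of a tap movement can be sketched using one of the listed cues, namely a stop or reversal of movement away from the user's viewpoint. The function name `detect_tap_end`, the per-frame distance input, and the 1 cm minimum travel are assumptions for illustration; the acceleration-based cue is not modeled.

```python
def detect_tap_end(distances, min_travel_m=0.01):
    """Return the frame index at which outward finger movement ends, or None.

    `distances` holds, per frame, the distance (metres) from the user's
    viewpoint to the fingertip. The movement ends at the last frame before
    the distance stops increasing, provided the finger has travelled at
    least `min_travel_m` away from the viewpoint since its last retreat.
    """
    if not distances:
        return None
    baseline = distances[0]
    for i in range(1, len(distances)):
        previous, current = distances[i - 1], distances[i]
        if current <= previous:                 # stopped or reversed direction
            if previous - baseline >= min_travel_m:
                return i - 1
            baseline = current                  # too little travel: restart here
    return None
```

Small jitter below the minimum travel is ignored, so only a deliberate outward motion followed by a stop or reversal is reported.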

In some embodiments, the determination that the user's attention is directed to a portion of the three-dimensional environment is based on detection of gaze directed to that portion (optionally, without other conditions). In some embodiments, the determination is based on detecting gaze directed to the portion of the three-dimensional environment together with one or more additional conditions, such as requiring that the gaze be directed to the portion for at least a threshold duration (e.g., a dwell duration) and/or requiring that the gaze be directed to the portion while the user's viewpoint is within a distance threshold of the portion. If one of the additional conditions is not met, the device determines that attention is not directed to the portion of the three-dimensional environment toward which the gaze is directed (e.g., until the one or more additional conditions are met).
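The gaze-plus-conditions determination can be sketched as a dwell timer with an optional distance check. The function name `attention_directed`, the sample format, and the default 0.3 second dwell are hypothetical; the disclosure gives no specific threshold values.

```python
def attention_directed(gaze_samples, region, dwell_s=0.3, max_distance_m=None):
    """Return True once attention is determined to be directed to `region`.

    `gaze_samples` is a time-ordered list of
    (timestamp_s, gazed_region, viewpoint_distance_m) tuples.
    Setting dwell_s=0 and max_distance_m=None corresponds to gaze alone,
    without additional conditions.
    """
    dwell_start = None
    for timestamp, gazed, distance in gaze_samples:
        conditions_met = gazed == region and (
            max_distance_m is None or distance <= max_distance_m
        )
        if not conditions_met:
            dwell_start = None            # a failed condition resets the dwell
            continue
        if dwell_start is None:
            dwell_start = timestamp
        if timestamp - dwell_start >= dwell_s:
            return True
    return False
```

Resetting the timer whenever a condition fails reflects the statement that attention is not considered directed to the portion until the additional conditions are met.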

In some embodiments, a ready state configuration of the user or of a portion of the user is detected by the computer system. Detection of a ready state configuration of a hand is used by the computer system as an indication that the user is likely preparing to interact with the computer system using one or more air gesture inputs (e.g., pinch, tap, pinch and drag, double pinch, long pinch, or other air gestures described herein) performed by the hand. For example, the ready state of the hand is determined based on whether the hand has a predetermined hand shape (e.g., a pre-pinch shape in which the thumb and one or more fingers are extended and spaced apart in preparation for making a pinch or grasp gesture, or a pre-tap shape in which one or more fingers are extended and the palm faces away from the user), based on whether the hand is in a predetermined position relative to the user's viewpoint (e.g., below the user's head and above the user's waist and extended at least 15cm, 20cm, 25cm, 30cm, or 50cm from the body), and/or based on whether the hand has moved in a particular manner (e.g., toward an area above the user's waist and in front of the user's head, or away from the user's body or legs). In some implementations, the ready state is used to determine whether an interactive element of the user interface responds to an attention (e.g., gaze) input.
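A ready-state check combining hand shape and hand position can be sketched as below. The `HandObservation` type, its field names, the shape labels, and the choice to require both criteria (the text allows them to be used separately or together) are assumptions for illustration; the movement-based criterion is omitted.

```python
from dataclasses import dataclass


@dataclass
class HandObservation:
    shape: str          # hypothetical classifier label, e.g. "pre-pinch", "pre-tap", "fist"
    height_m: float     # height of the hand above the floor
    extension_m: float  # distance the hand is extended from the body


def in_ready_state(hand, head_height_m, waist_height_m, min_extension_m=0.15):
    """Return True if the hand has a ready shape and is in the ready position."""
    shape_ok = hand.shape in {"pre-pinch", "pre-tap"}
    position_ok = (
        waist_height_m < hand.height_m < head_height_m   # between waist and head
        and hand.extension_m >= min_extension_m          # e.g., at least 15 cm out
    )
    return shape_ok and position_ok


ready = in_ready_state(HandObservation("pre-pinch", 1.2, 0.3),
                       head_height_m=1.7, waist_height_m=1.0)
```

The result could then gate whether a user interface element responds to gaze, as described above.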

In a scenario where input is described with reference to an air gesture, it should be appreciated that similar gestures may be detected using a hardware input device attached to or held by one or more hands of a user, where the positioning of the hardware input device in space may be tracked using optical tracking, one or more accelerometers, one or more gyroscopes, one or more magnetometers, and/or one or more inertial measurement units, and the positioning and/or movement of the hardware input device is used instead of the positioning and/or movement of the one or more hands in the corresponding air gesture. User input may be detected using controls contained in the hardware input device, such as one or more touch-sensitive input elements, one or more pressure-sensitive input elements, one or more buttons, one or more knobs, one or more dials, one or more joysticks, one or more hand or finger covers that detect a change in positioning or location of portions of a hand and/or finger relative to each other, relative to a user's body, and/or relative to a user's physical environment, and/or other hardware input device controls, wherein user input using the controls contained in the hardware input device is used instead of a hand and/or finger gesture, such as an air tap or air pinch, in the corresponding air gesture. For example, a selection input described as being performed with an air tap or air pinch input may alternatively be detected with a button press, a tap on a touch-sensitive surface, a press on a pressure-sensitive surface, or other hardware input.
As another example, a movement input described as being performed with an air pinch and drag (e.g., an air drag gesture or an air swipe gesture) may alternatively be detected based on an interaction with a hardware input control, such as a button press and hold, a touch on a touch-sensitive surface, a press on a pressure-sensitive surface, or other hardware input, followed by movement of the hardware input device (e.g., along with the hand associated with the hardware input device) through space. Similarly, a two-handed input that includes movement of the hands relative to each other may be performed using one air gesture and one hardware input device in the hand that is not performing the air gesture, two hardware input devices held in different hands, or two air gestures performed by different hands, using various combinations of air gestures and/or inputs detected by the one or more hardware input devices.

In some embodiments, the software may be downloaded to the controller 110 in electronic form, over a network, for example, or may alternatively be provided on tangible non-transitory media, such as optical, magnetic, or electronic memory media. In some embodiments, database 408 is also stored in a memory associated with controller 110. Alternatively or additionally, some or all of the described functions of the computer may be implemented in dedicated hardware, such as a custom or semi-custom integrated circuit or a programmable Digital Signal Processor (DSP). Although the controller 110 is shown in fig. 4, for example, as a separate unit from the image sensor 404, some or all of the processing functions of the controller may be performed by a suitable microprocessor and software or by dedicated circuitry within the housing of the image sensor 404 (e.g., a hand tracking device) or other devices associated with the image sensor 404. In some embodiments, at least some of these processing functions may be performed by a suitable processor integrated with display generation component 120 (e.g., in a television receiver, handheld device, or head mounted device) or with any other suitable computerized device (such as a game console or media player). The sensing functionality of the image sensor 404 may likewise be integrated into a computer or other computerized device to be controlled by the sensor output.

Fig. 4 also includes a schematic diagram of a depth map 410 captured by the image sensor 404, according to some embodiments. As described above, the depth map comprises a matrix of pixels having corresponding depth values. The pixels 412 corresponding to the hand 406 have been segmented from the background and wrist in the figure. The brightness of each pixel within the depth map 410 is inversely proportional to its depth value (i.e., the measured z-distance from the image sensor 404), where the gray shade becomes darker with increasing depth. The controller 110 processes these depth values to identify and segment components of the image (i.e., a set of adjacent pixels) that have human hand characteristics. These characteristics may include, for example, overall size, shape, and frame-to-frame motion from a sequence of depth maps.
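The depth-map rendering and segmentation described above can be sketched in two small functions. Both are illustrative assumptions: the grey-level scaling, the treatment of non-positive depths as invalid, and the use of a single depth cutoff with 4-connectivity are not specified in this disclosure, which also mentions size, shape, and frame-to-frame motion as segmentation cues.

```python
from collections import deque


def brightness(depth_map):
    """Map depth values to grey levels 0..255, inversely proportional to depth."""
    valid = [z for row in depth_map for z in row if z > 0]
    z_min = min(valid) if valid else 1.0
    return [[round(255 * z_min / z) if z > 0 else 0 for z in row] for row in depth_map]


def components(depth_map, max_depth):
    """Return sets of 4-connected (row, col) pixels with depth in (0, max_depth].

    Assumes a non-empty rectangular depth map.
    """
    rows, cols = len(depth_map), len(depth_map[0])
    seen, groups = set(), []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or not 0 < depth_map[r][c] <= max_depth:
                continue
            group, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:                       # breadth-first flood fill
                y, x = queue.popleft()
                group.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and 0 < depth_map[ny][nx] <= max_depth):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            groups.append(group)
    return groups
```

A later stage could keep only the component whose size, shape, and motion are consistent with a human hand.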

Fig. 4 also schematically illustrates the hand skeleton 414 that the controller 110 ultimately extracts from the depth map 410 of the hand 406, according to some embodiments. In fig. 4, the hand skeleton 414 is superimposed over the hand background 416 that has been segmented from the original depth map. In some embodiments, key feature points of the hand, and optionally of the wrist or arm connected to the hand (e.g., points corresponding to the knuckles, fingertips, palm center, or end of the hand connected to the wrist), are identified and located on the hand skeleton 414. In some embodiments, the controller 110 uses the positions and movements of these key feature points over multiple image frames to determine a hand gesture performed by the hand or a current state of the hand.
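As one hypothetical use of the key feature points, finger contact for a pinch could be inferred from the distance between two fingertip points in each frame. The function name `pinch_contact_frames`, the key-point names `thumb_tip` and `index_tip`, and the 1 cm contact threshold are assumptions for illustration only.

```python
import math


def pinch_contact_frames(frames, contact_threshold_m=0.01):
    """Return, per frame, whether the thumb and index fingertips are in contact.

    `frames` is a list of dicts mapping a key-point name to its (x, y, z)
    location on the hand skeleton (hypothetical naming).
    """
    return [
        math.dist(points["thumb_tip"], points["index_tip"]) <= contact_threshold_m
        for points in frames
    ]


skeleton_frames = [
    {"thumb_tip": (0.00, 0.00, 0.40), "index_tip": (0.05, 0.00, 0.40)},
    {"thumb_tip": (0.00, 0.00, 0.40), "index_tip": (0.005, 0.00, 0.40)},
]
contact = pinch_contact_frames(skeleton_frames)
```

The resulting per-frame flags are the kind of signal from which contact intervals, and therefore pinch, long pinch, and double pinch gestures, could be derived.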

Fig. 5 illustrates an example embodiment of the eye tracking device 130 (fig. 1). In some embodiments, eye tracking device 130 is controlled by eye tracking unit 243 (fig. 2) to track the positioning and movement of the user gaze relative to scene 105 or relative to XR content displayed via display generation component 120. In some embodiments, the eye tracking device 130 is integrated with the display generation component 120. For example, in some embodiments, when display generation component 120 is a head-mounted device (such as a headset, helmet, goggles, or glasses) or a handheld device placed in a wearable frame, the head-mounted device includes both components that generate XR content for viewing by a user and components for tracking the user's gaze with respect to the XR content. In some embodiments, the eye tracking device 130 is separate from the display generation component 120. For example, when the display generating component is a handheld device or an XR chamber, the eye tracking device 130 is optionally a device separate from the handheld device or XR chamber. In some embodiments, the eye tracking device 130 is a head mounted device or a portion of a head mounted device. In some embodiments, the head-mounted eye tracking device 130 is optionally used in conjunction with a display generating component that is also head-mounted or a display generating component that is not head-mounted. In some embodiments, the eye tracking device 130 is not a head mounted device and is optionally used in conjunction with a head mounted display generating component. In some embodiments, the eye tracking device 130 is not a head mounted device and, optionally, is part of a non-head mounted display generating component.

In some embodiments, the display generation component 120 uses a display mechanism (e.g., a left near-eye display panel and a right near-eye display panel) to display frames including left and right images in front of the user's eyes, thereby providing a 3D virtual view to the user. For example, the head-mounted display generation component may include left and right optical lenses (referred to herein as eye lenses) located between the display and the user's eyes. In some embodiments, the display generation component may include or be coupled to one or more external cameras that capture video of the user's environment for display. In some embodiments, the head-mounted display generation component may have a transparent or translucent display through which the user may directly view the physical environment, with virtual objects displayed on the transparent or translucent display. In some implementations, the display generation component projects the virtual object into the physical environment. The virtual object may be projected, for example, on a physical surface or as a hologram, so that an individual using the system observes the virtual object superimposed over the physical environment. In this case, separate display panels and image frames for the left and right eyes may not be required.

As shown in fig. 5, in some embodiments, the eye tracking device 130 (e.g., a gaze tracking device) includes at least one eye tracking camera (e.g., an Infrared (IR) or Near Infrared (NIR) camera) and an illumination source (e.g., an IR or NIR light source, such as an array or ring of LEDs) that emits light (e.g., IR or NIR light) toward the user's eye. The eye-tracking camera may be directed toward the user's eye to receive IR or NIR light reflected directly from the eye by the light source, or alternatively may be directed toward "hot" mirrors located between the user's eye and the display panel that reflect IR or NIR light from the eye to the eye-tracking camera while allowing visible light to pass through. The eye tracking device 130 optionally captures images of the user's eyes (e.g., as a video stream captured at 60-120 frames per second (fps)), analyzes the images to generate gaze tracking information, and communicates the gaze tracking information to the controller 110. In some embodiments, both eyes of the user are tracked separately by the respective eye tracking camera and illumination source. In some embodiments, only one eye of the user is tracked by the respective eye tracking camera and illumination source.

In some embodiments, the eye tracking device 130 is calibrated using a device-specific calibration process to determine parameters of the eye tracking device for the particular operating environment 100, such as 3D geometry and parameters of LEDs, cameras, hot mirrors (if present), eye lenses, and display screens. The device-specific calibration process may be performed at the factory or another facility prior to delivering the AR/VR equipment to the end user. The device-specific calibration process may be an automatic calibration process or a manual calibration process. A user-specific calibration process may include an estimation of eye parameters of a particular user, such as pupil position, foveal position, optical axis, visual axis, or eye distance. According to some embodiments, once the device-specific parameters and the user-specific parameters are determined for the eye tracking device 130, the images captured by the eye tracking camera may be processed using a glint-assisted method to determine the current visual axis and gaze point of the user relative to the display.

As shown in fig. 5, the eye tracking device 130 (e.g., 130A or 130B) includes an eye lens 520 and a gaze tracking system including at least one eye tracking camera 540 (e.g., an Infrared (IR) or Near Infrared (NIR) camera) positioned on a side of the user's face on which eye tracking is performed, and an illumination source 530 (e.g., an IR or NIR light source such as an array or ring of NIR Light Emitting Diodes (LEDs)) that emits light (e.g., IR or NIR light) toward the user's eyes 592. The eye-tracking camera 540 may be directed at mirrors 550 (which reflect IR or NIR light from the eye 592 while allowing visible light to pass) between the user's eye 592 and the display 510 (e.g., a left or right display panel of a head mounted display, or a display of a handheld device, or projector), or alternatively may be directed at the user's eye 592 to receive reflected IR or NIR light from the eye 592 (e.g., as shown in the top portion of fig. 5).

In some implementations, the controller 110 renders AR or VR frames 562 (e.g., left and right frames for left and right display panels) and provides the frames 562 to the display 510. The controller 110 uses the gaze tracking input 542 from the eye tracking camera 540 for various purposes, such as for processing the frames 562 for display. The controller 110 optionally estimates the gaze point of the user on the display 510 based on gaze tracking input 542 acquired from the eye tracking camera 540 using a glint-assisted method or other suitable method. The gaze point estimated from the gaze tracking input 542 is optionally used to determine the direction in which the user is currently looking.

Several possible use cases for the user's current gaze direction are described below and are not intended to be limiting. As an example use case, the controller 110 may render virtual content differently based on the determined direction of the user's gaze. For example, the controller 110 may generate virtual content in a foveal region determined according to a current gaze direction of the user at a higher resolution than in a peripheral region. As another example, the controller may position or move virtual content in the view based at least in part on the user's current gaze direction. As another example, the controller may display particular virtual content in the view based at least in part on the user's current gaze direction. As another example use case in an AR application, the controller 110 may direct an external camera used to capture the physical environment of the XR experience to focus in the determined direction. The autofocus mechanism of the external camera may then focus on an object or surface in the environment that the user is currently looking at on display 510. As another example use case, the eye lens 520 may be a focusable lens, and the controller uses the gaze tracking information to adjust the focus of the eye lens 520 such that the virtual object that the user is currently looking at has the appropriate vergence to match the convergence of the user's eyes 592. The controller 110 may utilize the gaze tracking information to direct the eye lens 520 to adjust the focus such that nearby objects the user is looking at appear at the correct distance.
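The foveal-region use case can be illustrated with a minimal sketch. This is an assumption-laden illustration only, not the disclosed implementation: the function names, the 10 degree foveal radius, and the 0.5 peripheral scale are hypothetical values chosen for the example.

```python
import math

def angle_between_deg(a, b):
    """Return the angle in degrees between two 3D direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))

def resolution_scale(gaze_dir, content_dir,
                     foveal_radius_deg=10.0, peripheral_scale=0.5):
    """Pick a render-resolution scale for content seen along content_dir.

    Content within foveal_radius_deg of the gaze direction is rendered at
    full resolution; everything else uses the reduced peripheral scale.
    Both thresholds are hypothetical, not values from the description.
    """
    if angle_between_deg(gaze_dir, content_dir) <= foveal_radius_deg:
        return 1.0
    return peripheral_scale
```

A real renderer would more likely blend between the two regions rather than switch abruptly; the hard threshold here only keeps the sketch short.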

In some embodiments, the eye tracking device is part of a head mounted device that includes a display (e.g., display 510), two eye lenses (e.g., eye lens 520), an eye tracking camera (e.g., eye tracking camera 540), and a light source (e.g., light source 530 (e.g., IR or NIR LED)) mounted in a wearable housing. The light source emits light (e.g., IR or NIR light) toward the user's eye 592. In some embodiments, the light sources may be arranged in a ring or circle around each of the lenses, as shown in fig. 5. In some embodiments, for example, eight light sources 530 (e.g., LEDs) are arranged around each lens 520. However, more or fewer light sources 530 may be used, and other arrangements and locations of light sources 530 may be used.

In some implementations, the display 510 emits light in the visible range and does not emit light in the IR or NIR range, and thus does not introduce noise in the gaze tracking system. Note that the position and angle of the eye tracking camera 540 is given by way of example and is not intended to be limiting. In some implementations, a single eye tracking camera 540 is located on each side of the user's face. In some implementations, two or more NIR cameras 540 may be used on each side of the user's face. In some implementations, a camera 540 with a wider field of view (FOV) and a camera 540 with a narrower FOV may be used on each side of the user's face. In some implementations, a camera 540 operating at one wavelength (e.g., 850 nm) and a camera 540 operating at a different wavelength (e.g., 940 nm) may be used on each side of the user's face.

The embodiment of the gaze tracking system as illustrated in fig. 5 may be used, for example, in computer-generated reality, virtual reality, and/or mixed reality applications to provide a computer-generated reality, virtual reality, augmented reality, and/or augmented virtual experience to a user.

Fig. 6 illustrates a glint-assisted gaze tracking pipeline in accordance with some embodiments. In some implementations, the gaze tracking pipeline is implemented by a glint-assisted gaze tracking system (e.g., eye tracking device 130 as illustrated in figs. 1 and 5). The glint-assisted gaze tracking system may maintain a tracking state. Initially, the tracking state is off or "no". When in the tracking state, the glint-assisted gaze tracking system uses previous information from a previous frame when analyzing the current frame to track pupil contours and glints in the current frame. When not in the tracking state, the glint-assisted gaze tracking system attempts to detect pupils and glints in the current frame and, if successful, initializes the tracking state to "yes" and continues with the next frame in the tracking state.

As shown in fig. 6, the gaze tracking camera may capture left and right images of the left and right eyes of the user. The captured images are then input to the gaze tracking pipeline for processing beginning at 610. As indicated by the arrow returning to element 600, the gaze tracking system may continue to capture images of the user's eyes, for example, at a rate of 60 to 120 frames per second. In some embodiments, each set of captured images may be input to the pipeline for processing. However, in some embodiments or under some conditions, not all captured frames are processed by the pipeline.

At 610, for the currently captured image, if the tracking state is yes, the method proceeds to element 640. At 610, if the tracking state is no, the image is analyzed to detect a user's pupil and glints in the image, as indicated at 620. At 630, if the pupil and glints are successfully detected, the method proceeds to element 640. Otherwise, the method returns to element 610 to process the next image of the user's eye.

At 640, if proceeding from element 610, the current frame is analyzed to track the pupil and glints based in part on previous information from the previous frame. At 640, if proceeding from element 630, the tracking state is initialized based on the pupil and glints detected in the current frame. The results of the processing at element 640 are checked to verify that the results of the tracking or detection can be trusted. For example, the results may be checked to determine whether the pupil and a sufficient number of glints for performing gaze estimation are successfully tracked or detected in the current frame. At 650, if the results cannot be trusted, the tracking state is set to no at element 660 and the method returns to element 610 to process the next image of the user's eye. At 650, if the results are trusted, the method proceeds to element 670. At 670, the tracking state is set to yes (if not already yes), and the pupil and glint information is passed to element 680 to estimate the gaze point of the user.
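The control flow of elements 610 through 680 can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the class name and the four callables (detect, track, trusted, estimate) are hypothetical placeholders for the image analysis steps, which the description does not specify in detail.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class GazeTrackingPipeline:
    # Hypothetical callables standing in for the image analysis steps.
    detect: Callable    # frame -> features, or None on failure (620/630)
    track: Callable     # (frame, previous features) -> features (640)
    trusted: Callable   # features -> bool (650)
    estimate: Callable  # features -> gaze point (680)
    tracking: bool = False   # tracking state, initially "no"
    previous: Any = None     # features carried over from the previous frame

    def process(self, frame):
        """Process one captured frame; return a gaze point or None."""
        if self.tracking:
            # 610 -> 640: track using information from the previous frame.
            features = self.track(frame, self.previous)
        else:
            # 610 -> 620: try to detect the pupil and glints from scratch.
            features = self.detect(frame)
            if features is None:
                return None  # 630: detection failed, wait for next frame
        # 650: verify that the tracking or detection result can be trusted.
        if features is None or not self.trusted(features):
            self.tracking = False  # 660
            self.previous = None
            return None
        self.tracking = True       # 670
        self.previous = features
        return self.estimate(features)  # 680
```

Calling process once per captured frame reproduces the loop back to element 610: a failed detection or an untrusted result simply leaves the tracking state at "no" for the next frame.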

Fig. 6 is intended to serve as one example of an eye tracking technique that may be used in a particular implementation. As will be appreciated by one of ordinary skill in the art, other eye tracking techniques, currently existing or developed in the future, may be used in place of or in combination with the glint-assisted eye tracking techniques described herein in computer system 101 for providing an XR experience to a user, according to various embodiments.

In some implementations, the captured portion of the real-world environment 602 is used to provide an XR experience to the user, such as a mixed reality environment with one or more virtual objects superimposed over a representation of the real-world environment 602.

Thus, the description herein describes some embodiments of a three-dimensional environment (e.g., an XR environment) that includes a representation of a real-world object and a representation of a virtual object. For example, the three-dimensional environment optionally includes a representation of a table present in the physical environment that is captured and displayed in the three-dimensional environment (e.g., actively displayed via a camera and display of the computer system or passively displayed via a transparent or translucent display of the computer system). As previously described, the three-dimensional environment is optionally a mixed reality system, wherein the three-dimensional environment is based on a physical environment captured by one or more sensors of the computer system and displayed via a display generation component. As a mixed reality system, the computer system is optionally capable of selectively displaying portions and/or objects of the physical environment such that the respective portions and/or objects of the physical environment appear as if they were present in the three-dimensional environment displayed by the computer system. Similarly, the computer system is optionally capable of displaying the virtual object in the three-dimensional environment by placing the virtual object at a respective location in the three-dimensional environment having a corresponding location in the real world to appear as if the virtual object is present in the real world (e.g., physical environment). For example, the computer system optionally displays a vase so that the vase appears as if the real vase were placed on top of a desk in a physical environment. In some implementations, respective locations in the three-dimensional environment have corresponding locations in the physical environment. 
Thus, when the computer system is described as displaying a virtual object at a corresponding location relative to a physical object (e.g., such as a location at or near a user's hand or a location at or near a physical table), the computer system displays the virtual object at a particular location in the three-dimensional environment such that it appears as if the virtual object were at or near a physical object in the physical environment (e.g., the virtual object is displayed in the three-dimensional environment at a location corresponding to the location in the physical environment where the virtual object would be displayed if the virtual object were a real object at the particular location).

In some implementations, real world objects present in a physical environment that are displayed in a three-dimensional environment (e.g., and/or visible via a display generation component) can interact with virtual objects that are present only in the three-dimensional environment. For example, a three-dimensional environment may include a table and a vase placed on top of the table, where the table is a view (or representation) of a physical table in a physical environment, and the vase is a virtual object.

In a three-dimensional environment (e.g., a real environment, a virtual environment, or an environment that includes a mixture of real and virtual objects), the objects are sometimes referred to as having a depth or simulated depth, or the objects are referred to as being visible, displayed, or placed at different depths. In this context, depth refers to a dimension other than height or width. In some implementations, the depth is defined relative to a fixed set of coordinates (e.g., where the room or object has a height, depth, and width defined relative to the fixed set of coordinates). In some embodiments, the depth is defined relative to the user's location or viewpoint, in which case the depth dimension varies based on the location of the user and/or the location and angle of the user's viewpoint. In some embodiments in which depth is defined relative to the user's location with respect to a surface of the environment (e.g., a floor of the environment or a surface of the ground), objects that are farther from the user along a line extending parallel to the surface are considered to have a greater depth in the environment, and/or the depth of objects is measured along an axis extending outward from the user's location and parallel to the surface of the environment (e.g., depth is defined in a cylindrical or substantially cylindrical coordinate system in which the user's location is at the center of a cylinder extending from the user's head toward the user's feet).
In some embodiments in which depth is defined relative to a user's point of view (e.g., relative to a direction of a point in space that determines which portion of the environment is visible via a head-mounted device or other display), objects that are farther from the user's point of view along a line extending parallel to the direction of the user's point of view are considered to have greater depth in the environment, and/or the depth of the objects is measured along an axis that extends from the user's point of view and outward along a line extending parallel to the direction of the user's point of view (e.g., depth is defined in a spherical or substantially spherical coordinate system in which the origin of the point of view is at the center of a sphere extending outward from the user's head). In some implementations, the depth is defined relative to a user interface container (e.g., a window or application in which application and/or system content is displayed), where the user interface container has a height and/or width, and the depth is a dimension orthogonal to the height and/or width of the user interface container. In some embodiments, where the depth is defined relative to the user interface container, the height and/or width of the container is generally orthogonal or substantially orthogonal to a line extending from a user-based location (e.g., a user's point of view or a user's location) to the user interface container (e.g., a center of the user interface container or another characteristic point of the user interface container) when the container is placed in a three-dimensional environment or initially displayed (e.g., such that the depth dimension of the container extends outwardly away from the user or the user's point of view).
In some embodiments, where depth is defined relative to a user interface container, the depth of an object relative to the user interface container refers to the positioning of the object along the depth dimension of the user interface container. In some implementations, the plurality of different containers may have different depth dimensions (e.g., different depth dimensions extending away from the user or the viewpoint of the user in different directions and/or from different origins). In some embodiments, when depth is defined relative to a user interface container, the direction of the depth dimension remains constant for the user interface container as the position of the user interface container, the user, and/or the point of view of the user changes (e.g., or when multiple different viewers are viewing the same container in a three-dimensional environment, such as during an in-person collaboration session and/or when multiple participants are in a real-time communication session with shared virtual content including the container). In some embodiments, for curved containers (e.g., including containers having curved surfaces or curved content areas), the depth dimension optionally extends into the surface of the curved container. In some cases, z-spacing (e.g., spacing of two objects in the depth dimension), z-height (e.g., distance of one object from another object in the depth dimension), z-positioning (e.g., positioning of one object in the depth dimension), z-depth (e.g., positioning of one object in the depth dimension), or simulated z-dimension (e.g., depth serving as a dimension of an object, dimension of an environment, direction in space, and/or direction in simulated space) are used to refer to the concept of depth as described above.
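The three depth conventions described above can be compared with a short sketch. All function names are hypothetical, and the sketch assumes a right-handed coordinate system with y as the vertical axis and a horizontal floor. For the viewpoint convention it takes the "measured along an axis parallel to the viewing direction" reading; radial distance from the viewpoint would be another possible reading of the spherical example.

```python
import math

def _project(origin, axis, point):
    """Signed length of (point - origin) along the given axis direction."""
    offset = [p - o for p, o in zip(point, origin)]
    axis_length = math.sqrt(sum(a * a for a in axis))
    return sum(o * a for o, a in zip(offset, axis)) / axis_length

def depth_along_floor(user_position, object_position):
    """User-location convention: horizontal distance from the user's
    vertical axis, ignoring height (y is assumed to point up)."""
    dx = object_position[0] - user_position[0]
    dz = object_position[2] - user_position[2]
    return math.hypot(dx, dz)

def depth_along_view(viewpoint, view_direction, object_position):
    """Viewpoint convention: distance measured along the view direction."""
    return _project(viewpoint, view_direction, object_position)

def depth_in_container(container_origin, container_normal, object_position):
    """Container convention: position along the axis orthogonal to the
    container's height and width, independent of where the user is."""
    return _project(container_origin, container_normal, object_position)
```

Because depth_in_container depends only on the container's own origin and normal, its result stays the same when the user or the viewpoint moves, which matches the statement that the direction of a container's depth dimension remains constant.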

In some embodiments, the user is optionally able to interact with the virtual object in the three-dimensional environment using one or both hands as if the virtual object were a real object in the physical environment. For example, as described above, the one or more sensors of the computer system optionally capture one or more hands of the user and display a representation of the user's hands in a three-dimensional environment (e.g., in a manner similar to displaying real world objects in the three-dimensional environment described above), or in some embodiments, the user's hands may be visible via the display generating component via the ability to see the physical environment through the user interface due to the transparency/translucency of a portion of the user interface being displayed by the display generating component, or due to the projection of the user interface onto a transparent/translucent surface or the projection of the user interface onto the user's eye or into the field of view of the user's eye. Thus, in some embodiments, the user's hands are displayed at respective locations in the three-dimensional environment and are considered as if they were objects in the three-dimensional environment, which are capable of interacting with virtual objects in the three-dimensional environment as if they were physical objects in the physical environment. In some embodiments, the computer system is capable of updating a display of a representation of a user's hand in a three-dimensional environment in conjunction with movement of the user's hand in the physical environment.

In some of the embodiments described below, the computer system is optionally capable of determining an "effective" distance between a physical object in the physical world and a virtual object in the three-dimensional environment, e.g., for determining whether the physical object is directly interacting with the virtual object (e.g., whether a hand is touching, grabbing, holding, etc., the virtual object or is within a threshold distance of the virtual object). For example, a hand directly interacting with a virtual object optionally includes one or more of: a finger of the hand pressing a virtual button, a hand of the user grabbing a virtual vase, two fingers of the user's hand coming together and pinching/holding a user interface of an application, and any other type of interaction described herein. For example, the computer system optionally determines a distance between the user's hand and the virtual object when determining whether the user is interacting with the virtual object and/or how the user is interacting with the virtual object. In some embodiments, the computer system determines the distance between the user's hand and the virtual object by determining a distance between the position of the hand in the three-dimensional environment and the position of the virtual object of interest in the three-dimensional environment. For example, the one or more hands of the user are located at a particular location in the physical world, and the computer system optionally captures the one or more hands and displays the one or more hands at a particular corresponding location in the three-dimensional environment (e.g., a location where the hand would be displayed in the three-dimensional environment if the hand were a virtual hand instead of a physical hand).
The positioning of the hand in the three-dimensional environment is optionally compared with the positioning of the virtual object of interest in the three-dimensional environment to determine the distance between the one or more hands of the user and the virtual object. In some embodiments, the computer system optionally determines the distance between the physical object and the virtual object by comparing locations in the physical world (e.g., rather than comparing locations in a three-dimensional environment). For example, when determining a distance between one or more hands of a user and a virtual object, the computer system optionally determines a corresponding location of the virtual object in the physical world (e.g., a location in the physical world where the virtual object would be if the virtual object were a physical object instead of a virtual object), and then determines a distance between the corresponding physical location and the one or more hands of the user. In some implementations, the same technique is optionally used to determine the distance between any physical object and any virtual object. Thus, as described herein, when determining whether a physical object is in contact with a virtual object or whether the physical object is within a threshold distance of the virtual object, the computer system optionally performs any of the techniques described above to map the location of the physical object to a three-dimensional environment and/or map the location of the virtual object to a physical environment.
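The distance determination described above can be sketched as follows. This is an illustration under stated assumptions only: the mapping between physical and environment coordinates is modeled as a hypothetical translation plus uniform scale, and the 0.05 threshold is an arbitrary example value, neither of which comes from the description.

```python
import math

def to_environment(physical_point, origin=(0.0, 0.0, 0.0), scale=1.0):
    """Map a physical-world point to its corresponding environment location
    (hypothetical mapping: translate by origin, then scale uniformly)."""
    return tuple((p - o) * scale for p, o in zip(physical_point, origin))

def to_physical(environment_point, origin=(0.0, 0.0, 0.0), scale=1.0):
    """Inverse mapping: where a virtual object would be in the physical world."""
    return tuple(e / scale + o for e, o in zip(environment_point, origin))

def effective_distance(hand_physical, object_environment,
                       origin=(0.0, 0.0, 0.0), scale=1.0):
    """Distance between a hand and a virtual object, compared in the
    three-dimensional environment's coordinates."""
    hand_environment = to_environment(hand_physical, origin, scale)
    return math.dist(hand_environment, object_environment)

def is_direct_interaction(hand_physical, object_environment, threshold=0.05,
                          origin=(0.0, 0.0, 0.0), scale=1.0):
    """True if the hand touches or is within the threshold of the object."""
    distance = effective_distance(hand_physical, object_environment,
                                  origin, scale)
    return distance <= threshold
```

The alternative technique, comparing locations in the physical world, would apply to_physical to the virtual object and measure against the hand's physical position instead; with a uniform scale of 1.0 both approaches give the same distance.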

In some implementations, the same or similar techniques are used to determine where and at what the user's gaze is directed, and/or where and at what a physical stylus held by the user is pointed. For example, if the user's gaze is directed to a particular location in the physical environment, the computer system optionally determines a corresponding location in the three-dimensional environment (e.g., a virtual location of the gaze), and if the virtual object is located at the corresponding virtual location, the computer system optionally determines that the user's gaze is directed to the virtual object. Similarly, the computer system is optionally capable of determining a direction in which the physical stylus is pointing in the physical environment based on the orientation of the physical stylus. In some embodiments, based on the determination, the computer system determines a corresponding virtual location in the three-dimensional environment corresponding to a location in the physical environment at which the stylus is pointing, and optionally determines that the stylus is pointing at the corresponding virtual location in the three-dimensional environment.
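One way to decide whether a gaze or stylus direction lands on a virtual object is a ray test against a bounding volume. The sketch below is an assumption, not the disclosed technique: it approximates the virtual object by a bounding sphere, and the function name and parameters are hypothetical.

```python
import math

def ray_hits_object(ray_origin, ray_direction, object_center, object_radius):
    """Return True if a ray (gaze or stylus direction) intersects a virtual
    object approximated by a bounding sphere in environment coordinates."""
    length = math.sqrt(sum(d * d for d in ray_direction))
    unit = [d / length for d in ray_direction]
    to_center = [c - o for c, o in zip(object_center, ray_origin)]
    center_dist_sq = sum(v * v for v in to_center)
    # Distance along the ray to the point of closest approach to the center.
    along = sum(v * u for v, u in zip(to_center, unit))
    if along < 0:
        # The object is behind the ray origin; it is only hit if the origin
        # itself lies inside the bounding sphere.
        return center_dist_sq <= object_radius ** 2
    closest_sq = center_dist_sq - along * along
    return closest_sq <= object_radius ** 2
```

The same function serves both cases in the paragraph above: for gaze, the ray starts at the viewpoint and follows the gaze direction; for a stylus, it starts at the stylus tip and follows the stylus orientation, after both are mapped into the three-dimensional environment.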

Similarly, embodiments described herein may refer to a location of a user (e.g., a user of a computer system) in a three-dimensional environment and/or a location of a computer system in a three-dimensional environment. In some embodiments, a user of a computer system is holding, wearing, or otherwise located at or near the computer system. Thus, in some embodiments, the location of the computer system serves as a proxy for the location of the user. In some embodiments, the location of the computer system and/or user in the physical environment corresponds to a corresponding location in the three-dimensional environment. For example, the location of the computer system will be a location in the physical environment (and its corresponding location in the three-dimensional environment) from which the user would see the objects in the physical environment at the same location, orientation, and/or size (e.g., in absolute terms and/or relative to each other) as the objects displayed by or visible in the three-dimensional environment via the display generating component of the computer system if the user were standing at the location facing the respective portion of the physical environment visible via the display generating component. 
Similarly, if the virtual objects displayed in the three-dimensional environment are physical objects in the physical environment (e.g., physical objects placed in the physical environment at the same locations in the three-dimensional environment as those virtual objects, and physical objects in the physical environment having the same size and orientation as in the three-dimensional environment), then the location of the computer system and/or user is the location from which the user will see the virtual objects in the physical environment that are in the same location, orientation, and/or size (e.g., absolute sense and/or relative to each other and real world objects) as the virtual objects displayed in the three-dimensional environment by the display generation component of the computer system.

In this disclosure, various input methods are described with respect to interactions with a computer system. When one input device or input method is used to provide an example and another input device or input method is used to provide another example, it should be understood that each example may be compatible with and optionally utilize the input device or input method described with respect to the other example. Similarly, various output methods are described with respect to interactions with a computer system. When one output device or output method is used to provide an example and another output device or output method is used to provide another example, it should be understood that each example may be compatible with and optionally utilize the output device or output method described with respect to the other example. Similarly, the various methods are described with respect to interactions with a virtual environment or mixed reality environment through a computer system. When examples are provided using interactions with a virtual environment, and another example is provided using a mixed reality environment, it should be understood that each example may be compatible with and optionally utilize the methods described with respect to the other example. Thus, the present disclosure discloses embodiments that are combinations of features of multiple examples, without the need to list all features of the embodiments in detail in the description of each example embodiment.

User interface and associated process

Attention is now directed to embodiments of user interfaces ("UIs") and associated processes that may be implemented on a computer system, such as a portable multifunction device or a head-mounted device, in communication with a display generation component, one or more input devices, and optionally one or more cameras.

Fig. 7A-7O, 8A-8G, 9A-9D, 10A-10D, 11A-11F, 12A-12G, and 19A-19P illustrate a three-dimensional environment visible via a display generation component (e.g., display generation component 7100-t, or display generation component 120) of a computer system (e.g., computer system 101), as well as interactions occurring in the three-dimensional environment caused by user input directed to the three-dimensional environment and/or input received from other computer systems and/or sensors. In some implementations, the input is directed to the virtual object within the three-dimensional environment by a user gaze detected in an area occupied by the virtual object or by a hand gesture performed at a location in the physical environment corresponding to the area of the virtual object. In some implementations, the input is directed to the virtual object within the three-dimensional environment by a hand gesture performed (e.g., optionally, at a location in the physical environment that is independent of the area of the virtual object in the three-dimensional environment) when the virtual object has an input focus (e.g., when the virtual object has been selected by a gaze input that is detected concurrently and/or previously, by a pointer input that is detected concurrently or previously, and/or by a gesture input that is detected concurrently and/or previously). In some implementations, the input is directed to a virtual object within the three-dimensional environment by an input device that has positioned a focus selector object (e.g., a pointer object or a selector object) at the location of the virtual object. In some implementations, the input is directed to a virtual object within the three-dimensional environment via other components (e.g., voice and/or control buttons). 
In some embodiments, the input is directed to the physical object or a representation of a virtual object corresponding to the physical object by user hand movement (e.g., whole hand movement in a respective gesture, movement of one portion of the user's hand relative to another portion of the hand, and/or relative movement between the hands) and/or manipulation relative to the physical object (e.g., touch, swipe, tap, open, move-toward, and/or relative movement). In some embodiments, the computer system displays some changes to the three-dimensional environment (e.g., displays additional virtual content, stops displaying existing virtual content, and/or transitions between displaying different immersion levels of visual content) based on inputs from sensors (e.g., image sensors, temperature sensors, biometric sensors, motion sensors, and/or proximity sensors) and contextual conditions (e.g., location, time, and/or presence of other people in the environment). In some embodiments, the computer system displays some changes in the three-dimensional environment (e.g., displays additional virtual content, stops displaying existing virtual content, and/or transitions between different immersion levels of displaying visual content) based on input from other computers used by other users sharing the computer-generated environment with users of the computer system (e.g., in a shared computer-generated experience, in a shared virtual environment, and/or in a shared virtual or augmented reality environment of a communication session). 
In some embodiments, the computer system displays some changes in the three-dimensional environment (e.g., displays movements, deformations, and/or changes in visual characteristics of the user interface, virtual surface, user interface object, and/or virtual landscape) based on input from sensors that detect movements of other people and objects and movements of the user that may not meet the criteria of the identified gesture input as triggering the associated operation of the computer system.

In some implementations, the three-dimensional environment visible via the display generation component described herein is a virtual three-dimensional environment that includes virtual objects and content at different virtual locations in the three-dimensional environment without a representation of the physical environment. In some embodiments, the three-dimensional environment is a mixed reality environment that displays virtual objects at different virtual locations in the three-dimensional environment that are constrained by one or more physical aspects of the physical environment (e.g., the location and orientation of walls, floors, surfaces, the direction of gravity, time of day, and/or spatial relationships between physical objects). In some embodiments, the three-dimensional environment is an augmented reality environment that includes a representation of a physical environment. In some embodiments, the representations of the physical environment include respective representations of the physical objects and surfaces at different locations in the three-dimensional environment such that spatial relationships between the different physical objects and surfaces in the physical environment are reflected by spatial relationships between the representations of the physical objects and surfaces in the three-dimensional environment. In some embodiments, when a virtual object is placed relative to the positioning of a representation of a physical object and a surface in a three-dimensional environment, the virtual object appears to have a corresponding spatial relationship to the physical object and the surface in the physical environment. 
In some embodiments, the computer system transitions between displaying different types of environments based on user input and/or contextual conditions (e.g., transitions between rendering computer-generated environments or experiences with different levels of immersion, adjusting the relative salience of audio/visual sensory input from the virtual content and from the representation of the physical environment).

In some embodiments, the display generation component includes a passthrough portion in which a representation of the physical environment is displayed. In some implementations, the passthrough portion of the display generating component is a transparent or translucent (e.g., see-through) portion of the display generating component that reveals at least a portion of the physical environment around the user or within the user's field of view. For example, the passthrough portion is a portion of the head-mounted display or head-up display that is made translucent (e.g., less than 50%, 40%, 30%, 20%, 15%, 10%, or 5% opacity) or transparent so that the user can view the real world around the user through it without removing the head-mounted display or moving away from the head-up display. In some embodiments, the passthrough portion gradually transitions from translucent or transparent to completely opaque when displaying a virtual or mixed reality environment. In some implementations, the passthrough portion of the display generation component displays a live feed of images or video of at least a portion of the physical environment captured by one or more cameras (e.g., a rear-facing camera of a mobile device or associated with a head-mounted display, or other camera that feeds image data to a computer system). In some embodiments, the one or more cameras are directed at a portion of the physical environment that is directly in front of the user's eyes (e.g., behind the display generation component relative to the user of the display generation component). In some embodiments, the one or more cameras are directed at a portion of the physical environment that is not directly in front of the user's eyes (e.g., in a different physical environment, or at the side or rear of the user).

In some implementations, when virtual objects are displayed at locations corresponding to the locations of one or more physical objects in a physical environment (e.g., at locations in a virtual reality environment, a mixed reality environment, or an augmented reality environment), at least some of the virtual objects are displayed in place of (e.g., replacing the display of) a portion of a real-time view of a camera (e.g., a portion of the physical environment captured in the real-time view). In some implementations, at least some of the virtual objects and content are projected onto a physical surface or empty space in the physical environment and are visible through the passthrough portion of the display generating component (e.g., visible as part of a camera view of the physical environment or visible through a transparent or translucent portion of the display generating component). In some implementations, at least some of the virtual objects and virtual content are displayed to overlay portions of the display and to obscure at least a portion of the view of the physical environment that is visible through the transparent or translucent portion of the display generating component.

In some embodiments, the display generation component displays different views of the three-dimensional environment according to user input or movement that changes the viewpoint of the currently displayed view of the three-dimensional environment relative to the virtual positioning of the three-dimensional environment. In some implementations, when the three-dimensional environment is a virtual environment, the point of view moves according to a navigation or motion request (e.g., an air hand gesture and/or a gesture performed by movement of one portion of the hand relative to another portion of the hand) without requiring movement of the user's head, torso, and/or display generating components in the physical environment. In some embodiments, movement of the user's head and/or torso, and/or movement of the display generating component or other position sensing element of the computer system (e.g., due to the user holding the display generating component or wearing the HMD) relative to the physical environment results in a corresponding movement (e.g., with a corresponding movement direction, movement distance, movement speed, and/or orientation change) of the viewpoint relative to the three-dimensional environment, resulting in a corresponding change in the current display view of the three-dimensional environment. In some embodiments, when the virtual object has a preset spatial relationship with respect to the viewpoint (e.g., is anchored or fixed to the viewpoint), movement of the viewpoint with respect to the three-dimensional environment will cause movement of the virtual object with respect to the three-dimensional environment while maintaining the positioning of the virtual object in the field of view (e.g., the virtual object is said to be head-locked). 
In some embodiments, the virtual object is body-locked to the user and moves relative to the three-dimensional environment when the user as a whole moves in the physical environment (e.g., while carrying or wearing the display generating component and/or other position sensing components of the computer system), but does not move in the three-dimensional environment in response to head movement of the user alone (e.g., when the display generating component and/or other position sensing components of the computer system rotate about a fixed position of the user in the physical environment). In some embodiments, the virtual object is optionally locked to another portion of the user, such as the user's hand or the user's wrist, and moves in the three-dimensional environment according to movement of the portion of the user in the physical environment to maintain a preset spatial relationship between the location of the virtual object and the virtual location of the portion of the user in the three-dimensional environment. In some embodiments, the virtual object is locked to a preset portion of the field of view provided by the display generation component and moves in the three-dimensional environment according to movement of the field of view, independent of movement of the user that does not cause a change in the field of view.
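
As a non-limiting illustration of the anchoring behaviors described above, the following Python sketch computes where a virtual object would be placed when it is locked to the viewpoint (head-locked), locked to the user's body, or fixed in the three-dimensional environment. The names `Pose`, `offset_from`, and `anchored_position`, the two-dimensional ground-plane simplification, and the yaw convention are assumptions made for illustration only and do not appear in the figures.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Position on the ground plane plus a heading (hypothetical simplification)."""
    x: float
    z: float
    yaw: float  # radians; 0 means facing +z (assumed convention)


def offset_from(pose, right, forward):
    """Return the environment position of a point offset right/forward of a pose."""
    sin_y, cos_y = math.sin(pose.yaw), math.cos(pose.yaw)
    # Forward vector is (sin, cos); right vector is (cos, -sin).
    return (pose.x + right * cos_y + forward * sin_y,
            pose.z - right * sin_y + forward * cos_y)


def anchored_position(mode, head, body, offset, world_position=None):
    """Compute an object's position for a given anchoring mode.

    mode "head":  the object keeps a preset relationship to the viewpoint,
                  so it follows head rotation and translation.
    mode "body":  the object follows the user's body pose and ignores
                  head rotation alone.
    mode "world": the object stays at a fixed environment position.
    """
    if mode == "head":
        return offset_from(head, *offset)
    if mode == "body":
        return offset_from(body, *offset)
    if mode == "world":
        return world_position
    raise ValueError(f"unknown anchoring mode: {mode}")
```

In this sketch, turning the head changes only the result for the "head" mode, while walking (a change in both the head pose and the body pose) changes the results for both the "head" and "body" modes.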

In some embodiments, as shown in fig. 7A-7O, 8A-8G, 9A-9D, 10A-10D, 11A-11F, 12A-12G, and 19A-19P, representations of a user's hands, arms, and/or wrists are included in the view of the three-dimensional environment. In some embodiments, the representation of the user's hand, arm, and/or wrist is included in a view of the three-dimensional environment as part of the representation of the physical environment provided via the display generation component. In some embodiments, these representations are not part of the representation of the physical environment and are captured separately (e.g., pointed at the user's hand, arm, and wrist by one or more cameras) and displayed in the three-dimensional environment independent of the currently displayed view of the three-dimensional environment. In some embodiments, these representations include camera images captured by one or more cameras of the computer system or stylized versions of the arm, wrist, and/or hand based on information captured by the various sensors. In some embodiments, these representations replace a display of, overlay on, or block a view of a portion of the representation of the physical environment. In some embodiments, when the display generation component does not provide a view of the physical environment and provides a full virtual environment (e.g., no camera view and no transparent passthrough portion), a real-time visual representation (e.g., a stylized representation or a segmented camera image) of one or both arms, wrists, and/or hands of the user is optionally still displayed in the virtual environment. 
In some embodiments, if no representation of the user's hand is provided in the view of the three-dimensional environment, a location corresponding to the user's hand is optionally indicated in the three-dimensional environment, e.g., by changing the appearance of the virtual content (e.g., by translucency and/or simulating changes in reflectivity) at a location in the three-dimensional environment corresponding to the location of the user's hand in the physical environment. In some embodiments, the representation of the user's hand or wrist is outside of the currently displayed view of the three-dimensional environment, while the virtual location in the three-dimensional environment corresponding to the position of the user's hand or wrist is outside of the current field of view provided via the display generating component, and the representation of the user's hand or wrist is made visible in the view of the three-dimensional environment in response to the virtual location corresponding to the position of the user's hand or wrist moving within the current field of view due to the display generating component, the user's hand or wrist, the user's head, and/or the user's movement as a whole.

Fig. 7A to 7O illustrate examples of displaying a main menu user interface within a three-dimensional environment. FIG. 13 is a flow diagram of an example method 1300 for displaying a main menu user interface within a three-dimensional environment. The user interfaces in fig. 7A to 7O are used to illustrate the processes described below, including the process in fig. 13.

FIG. 7A illustrates an example physical environment 7000 including a user 7002 interacting with the computer system 101. As shown in the examples in fig. 7A to 7O, the display generating component 7100 of the computer system 101 is a touch screen operated by the user 7002. The physical environment 7000 includes physical walls 7004, 7006 and a floor 7008. The physical environment 7000 also includes a physical object 7014, e.g., a table. The user 7002 is holding the display generating component 7100 with the hand 7020 or the hand 7022, or both. In some embodiments, the display generating component of computer system 101 is a head-mounted display worn on the head of user 7002 (e.g., the content shown in fig. 7A-7O as visible via display generating component 7100 of computer system 101 corresponds to the field of view of user 7002 when wearing the head-mounted display). In some embodiments, the display generation component is a stand-alone display, a projector, or another type of display. In some embodiments, the computer system communicates with one or more input devices including cameras or other sensors and input devices that detect movement of a user's hand, movement of the user's entire body, and/or movement of the user's head in a physical environment. In some embodiments, the one or more input devices detect movement and current pose, orientation, and positioning of a user's hands, face, and/or whole body. For example, in some implementations, when the user's hand 7020 is within the field of view of one or more sensors of the HMD 7100a (e.g., within the field of view of the user), the representation of the user's hand 7020' is displayed in a user interface displayed on a display of the HMD 7100a (e.g., as a passthrough representation and/or as a virtual representation of the user's hand 7020). 
In some implementations, when the user's hand 7022 is within the field of view of the one or more sensors of the HMD 7100a (e.g., within the field of view of the user), the representation of the user's hand 7022' is displayed in a user interface displayed on a display of the HMD 7100a (e.g., as a passthrough representation and/or as a virtual representation of the user's hand 7022). In some implementations, the user's hand 7020 and/or the user's hand 7022 are used to perform one or more gestures (e.g., one or more air gestures), optionally in combination with gaze input. In some implementations, the one or more gestures performed with the user's hand 7020 and/or 7022 include direct air gesture input that is based on a positioning of a representation of the user's hand 7020' and/or 7022' displayed within a user interface on a display of the HMD 7100a. For example, the direct air gesture input is determined to be directed to a user interface object displayed at a location intersecting a displayed location of the representation of the user's hand 7020' and/or 7022' in the user interface. In some implementations, the one or more gestures performed with the user's hand 7020 and/or 7022 include an indirect air gesture input that is based on a virtual object displayed at a location corresponding to the location at which the user's attention is currently detected (e.g., and/or optionally not based on the location of the representation of the user's hand 7020' and/or 7022' displayed within the user interface). For example, when a user's attention to a user interface object is detected (e.g., based on gaze or other indication of the user's attention), indirect air gestures, such as gaze and pinch (e.g., or other gestures performed with the user's hand), are performed with respect to the user interface object.
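
As a non-limiting illustration of the distinction between direct and indirect air gesture input described above, the following Python sketch determines which user interface object a gesture is directed to. The names `UIObject` and `resolve_target`, the rectangular hit regions, and the rule that a direct match is checked before an indirect match are assumptions made for illustration and are not stated in the description.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class UIObject:
    """A user interface object with a rectangular displayed region (hypothetical)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, point: Point) -> bool:
        px, py = point
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def resolve_target(objects: Sequence[UIObject],
                   hand_point: Optional[Point],
                   attention_point: Optional[Point]):
    """Return (object, kind), where kind is "direct", "indirect", or None.

    Direct: the displayed location of the hand representation intersects
    the displayed location of the object.
    Indirect: the object is at the location where the user's attention
    (e.g., gaze) is currently detected, regardless of hand location.
    """
    if hand_point is not None:
        for obj in objects:
            if obj.contains(hand_point):
                return obj, "direct"
    if attention_point is not None:
        for obj in objects:
            if obj.contains(attention_point):
                return obj, "indirect"
    return None, None
```

A pinch performed while the hand representation is away from every object but the attention point lies on an object would, in this sketch, be resolved as an indirect input to that object.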

In some implementations, the user input is detected via a touch-sensitive surface or touch screen. In some implementations, the one or more input devices include an eye tracking component that detects the location and movement of the user's gaze. In some embodiments, the display generation component and optionally the one or more input devices and the computer system are part of a head mounted device that moves and rotates with the user's head in a physical environment and changes the user's point of view in a three-dimensional environment provided via the display generation component. In some embodiments, the display generating component is a heads-up display that does not move or rotate with the user's head or the entire body of the user, but optionally changes the user's point of view in a three-dimensional environment according to the movement of the user's head or body relative to the display generating component. In some embodiments, the display generating component (e.g., touch screen) is optionally moved and rotated by the user's hand relative to the physical environment or relative to the user's head, and the viewpoint of the user is changed in the three-dimensional environment according to the movement of the display generating component relative to the user's head or face or relative to the physical environment.

In some implementations, the display generation component 7100 includes a head-mounted display (HMD) 7100a and/or HMD 12011 (e.g., fig. 12). For example, as shown in fig. 7C2 (e.g., and fig. 7C3, 8C1-8C2, 9B2-9B3, 10B2-10B3, 11B2-11B3, 12B2-12G2, and 19C1-19C2), the head-mounted display 7100a (e.g., and/or HMD 12011) includes one or more displays that display a representation of a portion of the three-dimensional environment 7000' corresponding to the user's perspective. Although HMDs typically include multiple displays (including a display for the right eye and a separate display for the left eye that display slightly different images to generate a user interface with stereoscopic depth), the figures show a single image corresponding to the image for a single eye, and depth information is indicated with other annotations or descriptions of the figures. In some embodiments, the HMD 7100a includes one or more sensors (e.g., one or more inwardly-facing and/or outwardly-facing image sensors 314), such as sensors 7101a, 7101b, and/or 7101c, for detecting a state of a user, including facial and/or eye tracking of the user (e.g., using one or more inwardly-facing sensors 7101a and/or 7101b) and/or tracking a hand, torso, or other movement of the user (e.g., using one or more outwardly-facing sensors 7101c). In some implementations, the HMD 7100a includes one or more input devices, such as one or more buttons, a touch pad, a touch screen, a scroll wheel, a rotatable and depressible digital crown, or other input devices, optionally located on a housing of the HMD 7100a. In some embodiments, the input element is a mechanical input element; in other embodiments, it is a solid-state input element that responds to a press input based on a detected pressure or intensity. For example, in fig. 7C2 (e.g., and fig. 8C1, 9B2, 10B2, 11B2, 12B2-12G2, and 19C1), HMD 7100a includes one or more of buttons 701, 702, and digital crown 703 (e.g., and/or other hardware input elements 7108) for providing input to HMD 7100a. It should be appreciated that additional and/or alternative input devices may be included in the HMD 7100a.

Fig. 7C3 (e.g., and fig. 8C2, 9B3, 10B3, 11B3, and 19C2) illustrates a top-down view of a user 7002 in a physical environment 7000. For example, the user 7002 is wearing the HMD 7100a such that the user's hands 7020 and/or 7022 (e.g., which are optionally used to provide air gestures or other user inputs) are physically present within the physical environment 7000 behind the display of the HMD 7100a.

Alternative display generating components of the computer system are illustrated in fig. 7C2 (e.g., and fig. 7C3, 8C1-8C2, 9B2-9B3, 10B2-10B3, 11B2-11B3, 12B2-12G2, and 19C1-19C2) as compared to the displays illustrated in fig. 7A-7C1, 7D-8B, 8C3-9B1, 9C-10B1, 10C-11B1, 11C-11F, 12B1-12G1, 19A-19B, and 19C3-19P. It should be understood that the processes, features, and functions described herein with reference to the display generating component 7100 described in fig. 7A-7C1, fig. 7D-8B, fig. 8C3-9B1, fig. 9C-10B1, fig. 10C-11B1, fig. 11C-11F, fig. 12B1-12G1, fig. 19A-19B, and fig. 19C3-19P are also applicable to the HMD 7100a shown in fig. 7C2-7C3, fig. 8C1-8C2, fig. 9B2-9B3, fig. 10B2-10B3, fig. 11B2-11B3, fig. 12B2-12G2, and fig. 19C1-19C2.

Fig. 7B illustrates an application user interface 7018 displayed in a virtual three-dimensional environment having a top portion 7102, a middle portion 7104, and a bottom portion 7106. Further, the virtual three-dimensional environment includes one or more computer-generated objects, also referred to as virtual objects, such as boxes 7016 (e.g., which are not representations of physical boxes in physical environment 7000). In some implementations, the application user interface 7018 corresponds to a user interface of a software application (e.g., an email application, a web browser, a messaging application, a map application, a video player or an audio player, or other software application) executing on the computer system 101. In some implementations, the application user interface 7018 is displayed in a middle portion 7104 of the virtual three-dimensional environment within a central portion of a field of view of a user of the device (e.g., along a gaze direction of the user, the user 7002 is provided with a front view of the application user interface 7018 such that the application user interface 7018 appears substantially at a gaze height of the user 7002).

In some embodiments, the display generation component 7100 is disposed within the housing 7024 of the computer system 101. A hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is provided on the housing 7024 that encloses or surrounds the display generation component 7100. The hardware input element 7108 is configured to detect two or more types of input. The first type of input to the hardware input element 7108 is a press input, as indicated by the downward arrow shown in fig. 7B. The second type of input is a rotational input. For example, the hardware input element 7108 may rotate in a counterclockwise manner about the rotational axis, as shown by the dashed lines and curved arrows in fig. 7B. In some implementations, the hardware input element 7108 is configured to receive clockwise rotational input. In some implementations, the hardware input element 7108 is configured to receive both counterclockwise rotational input and clockwise rotational input. In some embodiments, computer system 101 is capable of detecting an amount of rotation (e.g., the number of degrees by which the hardware input element 7108 is turned) and a direction of rotation (e.g., counterclockwise or clockwise), and performing a function based on the amount of rotation and the direction of rotation. In some implementations, the hardware input element 7108 is a rotatable input element (e.g., a crown).
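
As a non-limiting illustration of the two input types described above, the following Python sketch classifies an event from a hardware input element as a press input or a rotational input and, for a rotational input, reports the amount and direction of rotation. The names `Press`, `Rotation`, and `describe_input`, and the sign convention that positive degrees denote clockwise rotation, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Press:
    """A press input to the hardware input element (hypothetical event type)."""


@dataclass(frozen=True)
class Rotation:
    """A rotational input; positive degrees are clockwise (assumed convention)."""
    degrees: float


def describe_input(event) -> Tuple[str, Optional[str], float]:
    """Return (input type, direction of rotation, amount of rotation in degrees)."""
    if isinstance(event, Press):
        return ("press", None, 0.0)
    if isinstance(event, Rotation):
        if event.degrees == 0:
            # No net rotation, so no direction can be reported.
            return ("rotation", None, 0.0)
        direction = "clockwise" if event.degrees > 0 else "counterclockwise"
        return ("rotation", direction, abs(event.degrees))
    raise TypeError(f"unsupported event: {event!r}")
```

A function performed by the computer system could then be selected from the returned type, direction, and amount; which function is performed is outside the scope of this sketch.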

In response to detecting user input on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), a main menu user interface 7110 is presented in a virtual three-dimensional environment as shown in fig. 7C (e.g., fig. 7C1, 7C2, and 7C3, with the user interface shown in fig. 7C1 displayed on HMD 7100a in fig. 7C 2). In some implementations, the user input is a single press input to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In some embodiments, the application user interface 7018 is canceled by a single press input (e.g., prior to displaying the main menu user interface 7110, or simultaneously with displaying the main menu user interface 7110), as shown in fig. 7C. In some embodiments, the main menu user interface 7110 is displayed in a central portion of the user's field of view, e.g., in the middle portion 7104 of the virtual three-dimensional environment, and thus is not displayed below the display of the application user interface 7018.

The main menu user interface 7110 includes a collection of various representations, such as application icons, gadgets, communication options, and/or affordances for displaying VR and/or AR contexts. In some embodiments, the main menu user interface 7110 includes (e.g., at least) three sets of representations. Fig. 7C illustrates a first set of representations including representations 7112, 7114, 7116, 7118, 7120, 7122, 7124, and 7126 arranged in a virtual three-dimensional environment. The representations 7112-7126 may occupy locations anywhere within the virtual three-dimensional environment. Generally, these representations are presented in a middle portion 7104 of the virtual three-dimensional environment (e.g., the main menu user interface 7110 is presented substantially in a central portion of the field of view of the user 7002, with the representations 7112-7126 being displayed substantially at the line-of-sight height of the user 7002). Presenting the main menu user interface 7110 substantially in the central portion of the field of view of the user 7002 avoids further input (e.g., lowering or raising the gaze of the user 7002, or visually searching the main menu user interface 7110 and/or tilting/rotating the head of the user 7002 to focus on the main menu user interface 7110) and reduces the amount of time required to begin navigating within the main menu user interface 7110, thereby increasing the efficiency of operation of the computer system 101.

In some embodiments, representations 7112-7126 are arranged in a regular pattern (e.g., in a grid pattern, along a line, radially, circumferentially). In some implementations, the representations 7112-7126 correspond to various software applications (e.g., email applications, web browsers, messaging applications, map applications, video or audio players, or other software applications) that can execute on the computer system 101.

The main menu user interface 7110 includes a tab 7132 for displaying representations of software applications, a tab 7134 for displaying representations of one or more other people, each representation for initiating or maintaining (e.g., continuing) communication with a corresponding person (e.g., representations of one or more other users interacting with or capable of interacting with the user 7002), and a tab 7136 for displaying one or more virtual environments that may be displayed as (or in) a virtual three-dimensional environment. In some embodiments, the virtual environment includes virtual content that is computer-generated content distinct from the passthrough portion of the physical environment. In some embodiments, additional tabs for displaying other representations are provided in the main menu user interface 7110. In some implementations, one or more of tab 7132, tab 7134, or tab 7136 are not presented in the main menu user interface 7110. Fig. 7C shows tab 7132, tab 7134, and tab 7136 arranged substantially linearly on the left portion of the main menu user interface 7110. In some implementations, tab 7132, tab 7134, and tab 7136 are displayed in other portions (e.g., top, right, bottom) of the main menu user interface 7110. In some embodiments, tab 7132, tab 7134, and tab 7136 are not arranged in any particular spatial order relative to each other.

In response to detecting user input directed to tab 7134 (e.g., corresponding to or on the tab), main menu user interface 7110 is updated to display representations of one or more other people, each representation for initiating or maintaining communication with a corresponding person (e.g., representations of one or more other users interacting with or capable of interacting with user 7002), as shown in fig. 7D. For example, fig. 7D shows a representation 7138 of a first user, a representation 7140 of a second user, and a representation 7142 of a third user. In some implementations, the representation 7138, the representation 7140, and the representation 7142 are displayed in a central portion of the field of view of the user 7002 in a middle portion 7104 of the virtual three-dimensional environment (e.g., the representation 7138 of the first user, the representation 7140 of the second user, and the representation 7142 are presented substantially at the line-of-sight height of the user 7002).

In some implementations, representations of one or more users currently in a coexistence session with user 7002 are displayed on main menu user interface 7110 (e.g., one or more of the first user, the second user, or the third user is in a coexistence session with user 7002). In some implementations, representations of users are arranged within a shared three-dimensional environment relative to one another in a coexistence session (or a spatial communication session) (e.g., such that respective users view the positioning of other users relative to the respective users' viewpoints). For example, the viewpoint of user 7002 includes a representation of the first user to the left (or right) of the representation of the second user. The coexistence session and the spatial communication session are further described with reference to fig. 9D.

In some implementations, representations of one or more users that are not yet in the coexistence session but are able to enter the coexistence session with the user 7002 (e.g., one or more of a first user, a second user, or a third user that are not yet in the coexistence session with the user 7002 but are able to join the coexistence session with the user 7002) are additionally displayed on the main menu user interface 7110.

In some implementations, representations of one or more users in the contact list of user 7002 (e.g., one or more of the first user, the second user, or the third user in the contact list of user 7002) are additionally displayed on the main menu user interface 7110. By providing user input in the main menu user interface 7110 that points to (e.g., corresponds to or is on) one or more representations of one or more other users, the user 7002 is able to initiate or maintain communication with and/or interact with one or more other users. For example, in response to user input directed to representation 7138 (e.g., corresponding to or on the representation), computer system 101 facilitates user 7002 to communicate and/or interact with a first user in the virtual three-dimensional environment. In some embodiments, instead of a fully virtual three-dimensional environment, the user 7002 communicates and/or interacts with the first user in a mixed reality environment that includes sensory input from the physical environment 7000 or a representation thereof (e.g., box 7016) in addition to the computer-generated sensory input.

In some implementations, the user input directed to the representation in the main menu user interface or other user interface includes a pinch input, a tap input, or a gaze input.

In response to detecting user input directed to tab 7136 (e.g., corresponding to or on the tab), the main menu user interface 7110 is updated to display representations of virtual environments (sometimes referred to as options) that may be displayed as (or in) the virtual three-dimensional environment, as shown in fig. 7E. The representation 7144 corresponds to a virtual environment that provides a beach setting. The representation 7146 corresponds to a virtual environment that provides an office setting. Displaying the main menu user interface 7110 that provides quick access to a set of selectable virtual environments provides a way to change the user's virtual experience without displaying additional controls, which minimizes the number of inputs required to select a desired virtual environment and thereby improves the performance and operational efficiency of the computer system 101.

In some implementations, the representations (e.g., options) 7144 and the representations (e.g., options) 7146 are displayed in a central portion of the field of view of the user 7002 in a middle portion 7104 of the virtual three-dimensional environment (e.g., the options 7144 and 7146 are presented substantially at the line of sight height of the user 7002).

In response to detecting a user selection of a virtual environment providing an office setting (e.g., computer system 101 detects user input corresponding to or on option 7146), the virtual three-dimensional environment is updated to include a desk 7148 and a display panel 7150, as shown in fig. 7F. In some embodiments, virtual objects such as box 7016 that exist prior to displaying a particular virtual environment continue to exist after the virtual environment is selected. For example, the display panel 7150 is shown resting on and supported by the box 7016. In some implementations, the virtual environment includes virtual objects that allow user interaction (e.g., user 7002 may reposition a conference chair around desk 7148; user 7002 may reposition display panel 7150; user 7002 may reposition desk 7148). In some implementations, the virtual environment includes virtual objects that do not allow user interaction (e.g., the user 7002 cannot relocate any item in the virtual environment). In some embodiments, a virtual object, such as box 7016, that exists prior to displaying a particular virtual environment ceases to be displayed after the virtual environment is selected. For example, in such embodiments, when the display panel 7150 and desk 7148 are displayed, the box 7016 is no longer displayed.

In response to detecting a user input corresponding to or on tab 7132, the main menu user interface 7110 is updated to return to displaying a representation of the software application in the virtual three-dimensional environment, as shown in fig. 7C.

From the main menu user interface 7110, the user 7002 can access the various sets of representations by selecting the respective tabs (e.g., the set of representations of software applications can be viewed by selecting tab 7132; the set of representations of one or more other users that interact or are capable of interacting with the user 7002 can be viewed by selecting tab 7134; the set of representations of one or more selectable virtual environments can be viewed by selecting tab 7136). A single input (e.g., a single press input) to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) provides a main menu user interface 7110 to the user 7002 from which the user 7002 can navigate to other software applications, interact with other users, or experience a different virtual environment. Allowing a single input to trigger the display of the main menu user interface 7110 allows the user 7002 to quickly access and navigate the collection of applications in the main menu user interface 7110, and/or to change the user's virtual environment, and/or to interact with additional users, regardless of what process is in progress (e.g., when the first application is running) and without displaying additional controls. This minimizes the number of inputs required to select a desired operation and improves the performance and efficiency of the computer system 101. Furthermore, providing, in response to the first input, the main menu user interface 7110 with sections navigable by a user effectively provides the user with a greater range of applications, people, virtual environments, or other operations than is possible with a static main menu user interface.

In fig. 7B and 7C, when the main menu user interface 7110 is displayed, the application user interface 7018 is hidden (e.g., the application user interface 7018 is hidden before the main menu user interface 7110 is displayed, or the application user interface 7018 is hidden simultaneously with the main menu user interface 7110 being displayed). In some embodiments, the application associated with the application user interface 7018 continues to run in the background even though the application user interface 7018 is hidden. In contrast, fig. 7G-7I illustrate embodiments in which a different user interface for the same software application is provided to the user 7002 as the user 7002 navigates the main menu user interface 7110.

Fig. 7G illustrates an application user interface 7152 displayed in a virtual three-dimensional environment including a box 7016, which is a computer-generated virtual object. Application user interface 7152 is the user interface of an audio player software application executing on computer system 101. In some implementations, the application user interface 7152 is displayed in a middle portion 7104 of the virtual three-dimensional environment, substantially in a central portion of the field of view of the user 7002 (e.g., the application user interface 7152 appears substantially at the line-of-sight height of the user 7002).

In response to detecting a user input directed to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), a main menu user interface 7110 is presented in a virtual three-dimensional environment, as shown in fig. 7H. In some implementations, the user input is a single press input to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In some embodiments, the application user interface 7152 is canceled by a single press input (e.g., before the main menu user interface 7110 is displayed or simultaneously with the main menu user interface 7110 being displayed) and replaced by a mini-player user interface 7154, as shown in fig. 7H. The mini-player user interface 7154 occupies a smaller area of the virtual three-dimensional environment than the application user interface 7152. In some embodiments, the mini-player user interface 7154 is shifted to a more peripheral portion of the virtual three-dimensional environment than the application user interface 7152, which is displayed in a central portion of the field of view of the user 7002. In some embodiments, the mini-player user interface 7154 is displayed at substantially the same location as the application user interface 7152 (e.g., the center position of the application user interface 7152 substantially coincides with the center position of the mini-player user interface 7154).

Presenting the mini-player user interface 7154 provides a means for the user 7002 to multitask and continue the media experience (at least in some capacity) while navigating through the main menu user interface 7110, which increases the performance and efficiency of the computer system 101. Displaying the mini-player user interface 7154 (e.g., an audio mini-player) allows the user to control the media experience (e.g., by providing playback controls in the mini-player) and/or indicates to the user the current "location" of the user's media experience (e.g., by displaying a time index or, for video content, a representation of the current video frame) while the user navigates the main menu user interface, without displaying additional controls. Although not shown in fig. 7H-7J, in some embodiments, the mini-player user interface includes a video picture-in-picture (PiP) player that optionally provides a representation of the current video frame.

The user 7002 can scroll through the representations of the software applications displayed in the main menu user interface 7110. For example, the first set of representations of software applications includes representation 7112, representation 7114, representation 7116, representation 7118, representation 7120, representation 7122, representation 7124, and representation 7126. In some implementations, the first set of representations of software applications includes static representations (e.g., static application icons, static content snapshots, or other static information) of software applications arranged in a first region of the virtual three-dimensional environment. In some implementations, the first set of representations of software applications includes dynamic representations (e.g., animated representations or periodically animated representations). In response to detecting a user input (e.g., a user gesture) to navigate to a different set of representations of software applications, the main menu user interface 7110 presents a second set of representations of software applications including representation 7156, representation 7158, representation 7160, representation 7162, representation 7164, representation 7166, representation 7168, representation 7170, representation 7172, and representation 7174, as shown in fig. 7I. In some embodiments, the user input is a drag gesture that scrolls the representations of software applications (e.g., the drag gesture is interpreted by computer system 101 as an instruction to scroll the representations of software applications), as indicated by the left-pointing arrow in fig. 7H.

Providing a second set of representations of software applications in substantially the same region as the first set of representations of software applications (e.g., the first set of representations being replaced by the second set of representations) allows the user 7002 to browse sequentially through a large number of representations of software applications without being overwhelmed by the simultaneous display of all of those representations in the virtual three-dimensional environment, thereby facilitating timely selection of desired operations without displaying additional controls. Furthermore, the scrollable main menu user interface efficiently provides a user with a greater range of applications, people, virtual environments, or other operations than is possible with a static, non-scrollable main menu user interface.
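The page-replacement behavior described above can be illustrated with a minimal sketch. It is not taken from the described embodiments: the class name PagedMenu, the string values "left" and "right", and the assumption that a rightward drag returns to the previous set are all hypothetical. Only the reference numerals and the leftward drag that reveals the second set come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PagedMenu:
    """Hypothetical model of one main-menu section split into sets (pages)."""
    pages: List[List[int]]  # each page holds representation reference numerals
    index: int = 0          # page currently shown in the shared region

    def visible(self) -> List[int]:
        # Only one set occupies the region at a time.
        return self.pages[self.index]

    def drag(self, direction: str) -> List[int]:
        # Assumption: a leftward drag advances and a rightward drag goes back;
        # the index is clamped so dragging past either end has no effect.
        step = {"left": 1, "right": -1}[direction]
        self.index = max(0, min(len(self.pages) - 1, self.index + step))
        return self.visible()


menu = PagedMenu(pages=[
    [7112, 7114, 7116, 7118, 7120, 7122, 7124, 7126],              # first set
    [7156, 7158, 7160, 7162, 7164, 7166, 7168, 7170, 7172, 7174],  # second set
])
```

In this sketch, calling menu.drag("left") once replaces the first set with the second set in the same region, and a further leftward drag leaves the second set in place.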

In some embodiments, different sets of representations of software applications are arranged on respective pages of the main menu user interface 7110. The user 7002 may access a corresponding page, for example, a page of the main menu user interface 7110 that includes a set of representations of software applications. In some implementations, pages are ordered with a particular directionality, making it easier for a user to navigate to a particular (e.g., previously accessed) page. Navigation by the user 7002 in the main menu user interface 7110 may result in an operation that causes the display of the main menu user interface 7110 to be canceled (e.g., when an immersive experience is initiated from a representation of a software application). When the user 7002 returns to the main menu user interface 7110 within a preset time threshold (e.g., less than 1 hour, less than 10 minutes, less than 1 minute), the last accessed section of the main menu user interface (e.g., the specific page of applications, the section displaying the list of contacts with whom the user 7002 can initiate communication, the section displaying various selectable virtual environments) is maintained and displayed to the user 7002. Conversely, if the user 7002 returns to the main menu user interface 7110 after the preset time threshold has elapsed (e.g., the next day, in the next session, after more than one hour), the display of the main menu user interface 7110 resets to a predetermined section (e.g., the first page of the representations of applications). In some implementations, the preset time threshold depends on the section of the main menu user interface (e.g., the applications section resets after a shorter time threshold than the people/contacts section).
Retaining information about the last accessed section of the main menu user interface 7110 reduces disruption, allowing the user 7002 to quickly return to the previously accessed section of the main menu user interface 7110 without displaying additional controls when the user 7002 accesses the main menu user interface within a preset time threshold after leaving the main menu user interface 7110. This feature saves the user time by avoiding the need to browse again through various sections of the main menu user interface 7110 to return to the previously accessed section when the user briefly leaves the main menu user interface to perform a different operation, such as an operation in a particular application.
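The section-retention behavior can be sketched as follows. This is an illustration under stated assumptions rather than the described implementation: the section names, the per-section threshold values, and the class MainMenuState are hypothetical, and the description gives only example thresholds (e.g., 1 minute, 10 minutes, 1 hour).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical section identifiers and reset thresholds (in seconds).
DEFAULT_SECTION = "applications:page-1"
RESET_THRESHOLD_S = {"applications": 60.0, "people": 600.0, "environments": 600.0}


@dataclass
class MainMenuState:
    last_section: str = DEFAULT_SECTION
    dismissed_at: Optional[float] = None  # time at which the menu was canceled

    def dismiss(self, now: float) -> None:
        self.dismissed_at = now

    def section_on_return(self, now: float) -> str:
        """Return the section to show when the menu is redisplayed at `now`."""
        if self.dismissed_at is None:
            return self.last_section
        kind = self.last_section.split(":")[0]
        threshold = RESET_THRESHOLD_S.get(kind, 60.0)
        if now - self.dismissed_at < threshold:
            return self.last_section        # returned in time: keep section
        self.last_section = DEFAULT_SECTION  # threshold elapsed: reset
        return self.last_section


people = MainMenuState(last_section="people")
people.dismiss(now=0.0)
apps = MainMenuState(last_section="applications:page-3")
apps.dismiss(now=0.0)
```

With these assumed values, a return after 120 seconds keeps the people section but resets the applications section to its first page, matching the case in which the threshold depends on the section.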

When the main menu user interface 7110 is displayed, the main menu user interface 7110 is canceled in response to detecting a second user input to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), as shown in fig. 7J. In some implementations, the second user input is a second press input to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In some implementations, canceling the main menu user interface 7110 does not affect the display of virtual objects (e.g., box 7016) in the virtual three-dimensional environment. Having the mini-player persist after the main menu user interface 7110 is canceled provides the user 7002 with an uninterrupted media experience even after navigation in the virtual environment via the main menu user interface 7110 has ended, thereby improving the operating efficiency of the computer system 101. For example, the user does not need to restart the media application after navigating and then canceling the main menu user interface 7110.

In response to detecting a third user input to hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), a view of three-dimensional environment 7128 is visible to user 7002 via display generation component 7100 of computer system 101, as shown in fig. 7K. The three-dimensional environment 7128 of fig. 7K optionally includes a representation of objects in a physical environment (e.g., as captured by one or more cameras of the computer system 101), such as physical environment 7000. For example, in fig. 7K, three-dimensional environment 7128 includes representation 7014 'of physical object 7014, representations 7004' and 7006 'including physical walls 7004 and 7006, respectively, and representation 7008' of physical floor 7008. In some implementations, detecting a third user input to the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) causes the computer system 101 to provide a mixed reality experience to the user 7002. For example, both a computer-generated virtual object (e.g., box 7016) and a representation of an object in physical environment 7000 are displayed to user 7002. For example, a first portion of the virtual three-dimensional environment includes computer-generated virtual objects that are not present in the physical environment 7000, while a second portion of the virtual three-dimensional environment includes representations of objects in the physical environment 7000 that are displayed as the three-dimensional environment 7128.

In some embodiments, instead of using three sequential inputs to hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) to display the user interface shown in fig. 7J (e.g., after two sequential inputs to hardware input element 7108) and then the user interface shown in fig. 7K (e.g., after three sequential inputs to hardware input element 7108), the user interface shown in fig. 7K is displayed after two sequential inputs (e.g., the second input is a long press on hardware input element 7108), and the user interface shown in fig. 7J is skipped.
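The sequence of views reached by successive inputs to hardware input element 7108 can be modeled as a small state machine. The sketch below is only one possible reading of fig. 7G-7L: the enum View, the event names "press" and "long_press", and the transition table are hypothetical labels introduced for illustration.

```python
from enum import Enum, auto


class View(Enum):
    APPLICATION_UI = auto()         # e.g., fig. 7G
    MAIN_MENU = auto()              # e.g., fig. 7H
    MENU_DISMISSED = auto()         # e.g., fig. 7J
    PASSTHROUGH = auto()            # e.g., fig. 7K
    MENU_OVER_PASSTHROUGH = auto()  # e.g., fig. 7L


# Hypothetical transition table keyed by (current view, input event).
TRANSITIONS = {
    (View.APPLICATION_UI, "press"): View.MAIN_MENU,
    (View.MAIN_MENU, "press"): View.MENU_DISMISSED,
    (View.MENU_DISMISSED, "press"): View.PASSTHROUGH,
    # Alternative embodiment: a long press as the second input skips fig. 7J.
    (View.MAIN_MENU, "long_press"): View.PASSTHROUGH,
    (View.PASSTHROUGH, "press"): View.MENU_OVER_PASSTHROUGH,
}


def next_view(view: View, event: str) -> View:
    # Unlisted (view, event) pairs leave the view unchanged.
    return TRANSITIONS.get((view, event), view)


def run(events, start: View = View.APPLICATION_UI) -> View:
    view = start
    for event in events:
        view = next_view(view, event)
    return view
```

Under this model, three presses and the two-input sequence of a press followed by a long press both end in the passthrough view; only the first sequence passes through the state of fig. 7J.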

Canceling the main menu user interface 7110 by replacing the display of the main menu user interface with a presentation, via a display generation component, of a passthrough portion of the physical environment of the computer system 101 (e.g., a head-mounted device) improves the safety of the user 7002 by allowing the user 7002 to be aware of the physical environment of the computer system 101 (via the passthrough portion of the physical environment of the computer system 101). For example, after the user has completed navigating the main menu user interface 7110, the user 7002 may need to respond to an emergency or other situation that requires the attention of the user 7002 or that requires the user 7002 to interact with the physical environment. Using the second or third input to activate the display of the passthrough portion allows the user 7002 to exit from the virtual environment and view at least a portion of the physical environment without displaying additional controls. In some embodiments, in addition to presenting the passthrough portion, the display of the virtual environment in which the main menu user interface 7110 is displayed is stopped. Ceasing to display the virtual environment when canceling the main menu user interface 7110 causes the second input to function similarly to an input to an escape button, allowing the user to exit from the virtual environment (e.g., canceling the display of the virtual environment) and view at least a portion of the physical environment without displaying additional controls.

In some embodiments, the display generation component includes a passthrough portion in which a representation of the physical environment is displayed or visible. In some implementations, the passthrough portion of the display generation component is a transparent or translucent (e.g., see-through) portion of the display generation component that reveals at least a portion of the physical environment around the user or within the user's field of view. For example, the passthrough portion is a portion of the head-mounted display or head-up display that is made translucent (e.g., less than 50%, 40%, 30%, 20%, 15%, 10%, or 5% opacity) or transparent so that a user can view the real world around the user through it without removing the head-mounted display or moving away from the head-up display (sometimes referred to as "optical passthrough"). In some embodiments, the passthrough portion gradually transitions from translucent or transparent to completely opaque when displaying a virtual or mixed reality environment. In some implementations, the passthrough portion of the display generation component displays a live feed of images or video of at least a portion of a physical environment captured by one or more cameras (e.g., a rear camera of a mobile device, a camera associated with a head-mounted display, or other cameras that feed image data to a computer system) (sometimes referred to as "virtual passthrough"). In some embodiments, the one or more cameras are directed at a portion of the physical environment that is directly in front of the user's eyes (e.g., behind the display generation component relative to the user of the display generation component). In some embodiments, the one or more cameras are directed at a portion of the physical environment that is not directly in front of the user's eyes (e.g., in a different physical environment, or at the side or rear of the user).

When in mixed reality/passthrough mode (e.g., when displaying a three-dimensional environment 7128), in response to detecting user input to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), a main menu user interface 7110 is overlaid on the three-dimensional environment 7128, as shown in fig. 7L. In some implementations, the user input is a press input to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In some implementations, the main menu user interface 7110 is presented in a middle portion of the three-dimensional environment 7128. In response to detecting a user input (e.g., a user gesture) directed to representation 7124, computer system 101 causes a software application associated with representation 7124 to be displayed. In some embodiments, the representation 7124 corresponds to an audio player application, and user input selecting the representation 7124 causes the mini-player user interface 7154 to be presented simultaneously with the main menu user interface 7110, as shown in fig. 7M.

In response to detecting a user input (e.g., a user gesture) directed to representation 7126, computer system 101 causes a software application associated with representation 7126 to be displayed. In some embodiments, the representation 7126 corresponds to a web browsing application, and a user gesture that selects the representation 7126 causes a web browsing application user interface 7178 to be displayed, as shown in fig. 7O.

In some embodiments, the characteristics of the software application determine whether to maintain the display of the main menu user interface 7110. For example, the display of the main menu user interface 7110 is maintained when the representation of the audio player application (or video player application) is selected, and the display of the main menu user interface 7110 is stopped when the representation of the web browsing application (or document editing application, calendar application, or email application) is selected. In some embodiments, the display of the main menu user interface 7110 is maintained until a predetermined number of applications have been selected (e.g., the display of the main menu user interface 7110 is maintained until after the representation of the second software application has been selected, the display of the main menu user interface 7110 is maintained until after the representation of the third software application has been selected, or the display of the main menu user interface 7110 is maintained until after the representation of the fourth software application has been selected).
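The two policies described above for maintaining the display of the main menu user interface 7110 can be expressed as simple predicates. This is a hedged sketch: the application-kind strings, the set KEEPS_MENU, and both function names are hypothetical, and an actual system could derive the relevant characteristics in other ways.

```python
# Hypothetical application kinds whose selection leaves the main menu displayed.
KEEPS_MENU = {"audio_player", "video_player"}


def menu_stays_for_kind(app_kind: str) -> bool:
    """Policy 1: the characteristics of the selected application decide."""
    return app_kind in KEEPS_MENU


def menu_stays_for_count(selected_count: int, limit: int) -> bool:
    """Policy 2: the menu remains until `limit` representations are selected."""
    return selected_count < limit
```

For example, with a limit of 2, the main menu remains displayed after the first representation is selected and ceases to be displayed once the second representation is selected.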

Even though not shown in fig. 7L and 7M, in some embodiments, an application (e.g., an audio player application) is already running on computer system 101 before main menu user interface 7110 is displayed in response to the first user input. In response to detecting a user input (e.g., a user gesture, pinch and drag gesture) on the application, a first user interface object (e.g., an instance of the application or an object extracted or dragged from the application, sometimes referred to herein as an "overview object") is extracted from the application and displayed. In some embodiments, the overview object is an object that is pulled from the application before a portion of the application (e.g., all of the application) is replaced by the display of the main menu user interface 7110, and continues to be displayed after that portion of the application is replaced by the display of the main menu user interface 7110.

For example, the first user interface object may be a music track of a music album being played on an audio player application. Alternatively, the first user interface object may be a text portion extracted or dragged from a document editing application running on computer system 101. Alternatively, the first user interface object may be a web page that is fetched or dragged from a web browsing application running on computer system 101. Alternatively, the first user interface object may be an image file or a video file extracted or dragged from a media display application (e.g., a web browsing application, a video player, a photo display application) running on the computer system 101.

Providing the first user interface object allows the user 7002 to maintain use of the application (e.g., use an instance of the application) or to maintain display of data associated with the application even after the application's main user interface is canceled (e.g., the overview object is an instance copied from the application). Maintaining the display of such user interface objects allows the user 7002 to continue controlling the application while multitasking (e.g., navigating the main menu user interface 7110), without displaying additional controls. The multitasking functionality is not affected by the presence of the main menu user interface 7110 triggered by the first input, thereby improving the performance and efficiency of the computer system 101.

In some implementations, in response to detecting a user input (e.g., a user gesture) directed to a representation of a second application displayed in the main menu user interface 7110, execution of the second application is initiated (e.g., the second application begins running) while the overview object is displayed. Launching the second application from the main menu user interface 7110 while the first user interface object is displayed (e.g., continues to be displayed) eliminates the need to display additional controls. Maintaining the display of the first user interface object provides a visual reminder to the user 7002 that can facilitate selection of an appropriate second application. In some cases, the displayed first user interface object provides information that may be used in the second application without requiring the user to restart the first application after starting the second application, allowing multiple tasks to be completed simultaneously and thereby improving the performance and operating efficiency of computer system 101.

In some implementations, the user 7002 can direct the first user interface object to the second application (e.g., drag the overview object into the second application) to perform an operation in the second application based on the first user interface object. For example, the overview object may be an image from a media display application, and the second application may be a text messaging application or a document editing application. Directing the image to the document editing application allows the image to be added directly to an open document in the document editing application.

In some implementations, the first user interface object is canceled when the main menu user interface 7110 is canceled (e.g., by input to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)). Using a single input (e.g., a second button press) to cancel both the first user interface object and the main menu user interface 7110 eliminates the need to display additional controls. The user does not have to spend time separately closing the first user interface object and/or navigating to a particular user interface control element to manually close the first user interface object, thereby improving performance and operational efficiency of the computer system 101.

Subsequent user input to the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) causes the main menu user interface 7110 to be displayed in the three-dimensional environment 7128 after the main menu user interface 7110 is canceled (e.g., as shown in fig. 7N). The additional input enables the main menu user interface 7110 to be redisplayed after it has been cancelled without displaying additional controls. Allowing additional inputs to redisplay the main menu user interface 7110 provides a simple way for the user 7002 to return to the main menu user interface 7110 based on a single input, regardless of which process the user 7002 may have used on the computer system 101 after canceling the main menu user interface 7110. This input serves as a generic mechanism that enables the user 7002 to navigate directly to the top-level main menu user interface 7110 and then browse through different sets of representations (e.g., representations of applications, people, and/or virtual environments) in the main menu user interface 7110 without displaying additional controls.

Hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is configured to receive various user inputs. For example, in response to detecting two press inputs in quick succession (e.g., two press inputs within 2 seconds of each other, two press inputs within 1 second of each other, two press inputs within 0.5 seconds of each other), an application management user interface (e.g., system interface 7180) is presented in the virtual three-dimensional environment, as shown in fig. 7O. In some implementations, the system interface 7180 is overlaid on the applications running in the foreground (which may include two or more applications) (e.g., an audio player/music application and a web browser application, as shown in fig. 7N and 7O) and on the three-dimensional environment 7128 (e.g., presented at a location in the field of view of the user 7002 that is closer to the user 7002 than the two applications running in the foreground). Using different types of inputs on a single input device to trigger multiple system operations (e.g., display a forced exit menu) (e.g., trigger an application-specific operation) reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may implement M operations, where N < M). Reducing the number of input devices that need to be provided in order to give the user direct access to various system functions reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device; instead, the processor may be programmed to interpret different inputs from a smaller number of input devices.
Using the same user input device, the user 7002 can quickly reach the application management user interface without having to present additional/intermediate controls.
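One way to interpret different input types from a single input device is to group press timestamps by the interval between them. The sketch below is illustrative only: the function classify_presses, the 0.5 second default (one of the example values given above), and the operation names in OPERATIONS are assumptions rather than part of the described system.

```python
from typing import Dict, List


def classify_presses(timestamps: List[float], max_gap_s: float = 0.5) -> List[str]:
    """Group press times (in seconds, ascending) into single or double presses."""
    events = []
    i = 0
    while i < len(timestamps):
        has_next = i + 1 < len(timestamps)
        if has_next and timestamps[i + 1] - timestamps[i] <= max_gap_s:
            events.append("double_press")  # two presses in quick succession
            i += 2
        else:
            events.append("single_press")
            i += 1
    return events


# Hypothetical mapping: one input device (N = 1) triggers two operations (M = 2).
OPERATIONS: Dict[str, str] = {
    "single_press": "show_or_cancel_main_menu",
    "double_press": "show_application_management_ui",
}
```

Here classify_presses([0.0, 0.3, 2.0]) yields a double press followed by a single press, so the same hardware element reaches both the application management user interface and the main menu user interface.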

In the example shown in fig. 7O, system interface 7180 provides a forced exit menu that shows all applications currently running on computer system 101. These applications include both applications running in the foreground and applications running in the background (e.g., email applications, document editing applications, and calendar applications). The user 7002 can close a particular application by providing a user gesture directed to the exit button 7182 associated with that application. In some embodiments, the forced exit menu includes a button that causes all applications running on computer system 101 to be closed. In some embodiments, the application management user interface is a system interface that allows for multitasking on computer system 101.

In some implementations, a system user interface (e.g., an application-agnostic user interface, or a user interface for applying system-wide settings of computer system 101) responds to user input on an input device (e.g., hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)) in the same manner as an application user interface (e.g., a press input on hardware input element 7108, detected while the system user interface is displayed, causes at least a portion of the system user interface to be replaced by the main menu user interface 7110). Streamlining (e.g., by standardizing) the display of the main menu user interface 7110 in response to detecting a corresponding input of the same type as the first input, regardless of the user interface currently being displayed (e.g., system user interface or application user interface), reduces the number of different control elements required by the device and allows the user 7002 to browse different sets of representations (e.g., representations of applications, people, and virtual environments) without displaying additional controls.

Fig. 8A to 8G illustrate examples of performing different operations based on input to the input device depending on the current display mode. FIG. 14 is a flowchart of an example method 1400 for performing different operations based on input to an input device depending on a current display mode. The user interfaces in fig. 8A to 8G are used to illustrate the processes described below, including the process shown in fig. 14.

Fig. 8A illustrates an application user interface 8000 displayed in a virtual three-dimensional environment. The application user interface 8000 fully occupies the entire field of view of the user 7002 in the virtual three-dimensional environment. For example, the application user interface 8000 is displayed in the top portion 7102, middle portion 7104, and bottom portion 7106 of the virtual three-dimensional environment. Various portions of the virtual three-dimensional environment are described with reference to fig. 7A-7B. In some implementations, the application user interface 8000 corresponds to a user interface of a software application (e.g., a video player, web browser, map application, video conferencing application, messaging application, email application, audio player, or other software application) executing on the computer system 101. In some embodiments, the virtual three-dimensional environment includes virtual content 8002 displayed by (or in) an application user interface 8000 of an application executed by computer system 101. In some embodiments, the computer-generated virtual content (e.g., box 7016) displayed in the virtual three-dimensional environment has no correspondence in the physical environment 7000 and/or is not part of the application corresponding to the application user interface 8000. Optionally, one or more elements of the computer-generated virtual content are overlaid on top of the immersive application user interface 8000 (e.g., presented closer to the viewpoint of the user 7002 in the field of view of the user 7002 than the application user interface 8000).

When an application is presented in an immersion mode (e.g., in a full immersion mode, or providing a full immersion experience to the user 7002), an application user interface associated with the application completely fills the user's field of view (e.g., a view extending 180° from the respective orientation of the user's head (e.g., from left shoulder to right shoulder)). In some embodiments, the full immersion mode provides a field of view of 180° around the head of the user 7002. In some embodiments, the user is provided with a full 360° view in all directions as the user rotates her head and/or body. In some embodiments, the immersion mode is also described as a "full screen" display mode that fully occupies the entire display provided by the display generating component of (or coupled to) computer system 101. In some embodiments, the first display mode includes an immersion mode in which only content of the application user interface (e.g., application user interface 8000) is displayed (e.g., content of the application user interface is displayed within the field of view of user 7002, and no content other than the content of the application user interface is displayed, and/or the content of the application user interface occupies substantially the entire field of view of user 7002).

In some embodiments, in addition to the application user interface completely filling the user's field of view, audio input from the physical environment is cancelled, substantially reduced (e.g., by more than 60%, 70%, or 80%), or prevented from reaching the user when the application is presented in the immersion mode. Similarly, in some embodiments, when the application user interface 8000 is presented to the user in the immersion mode, no audio input from any other application running on the computer system 101 is provided to the user. In some implementations, when the user 7002 is in the immersion mode, the computer-generated virtual content (e.g., box 7016) provides a notification (e.g., an incoming communication request, or an update from another application running in the background of the computer system 101) to the user 7002.

When the display generation component presents content to the user 7002 in the immersion mode, in response to detecting a user input (e.g., a single press input) on a hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element), the application user interface 8000 is canceled by the single press input and replaced by the resized application user interface 8004, as shown in fig. 8B. In some implementations, content similar to the content displayed in the application user interface 8000 is displayed in the resized application user interface 8004. For example, virtual content 8002 provided by application user interface 8000 continues to be displayed in resized application user interface 8004, albeit at a reduced scale. In some implementations, virtual content (such as box 7016) previously displayed in the immersion application user interface 8000 continues to be displayed (e.g., at the same location and/or with the same visual characteristics).

As shown in fig. 8B, in some embodiments, the resized application user interface 8004 exposes an underlying virtual environment (e.g., an office virtual environment including conference chairs surrounding a desk 7148) that was previously obscured by the immersive application user interface 8000. In some implementations, the resized application user interface 8004 is displayed in the middle portion 7104 of the virtual three-dimensional environment, near a central portion of the field of view of the user 7002. In some implementations, the resized application user interface 8004 is in a "non-full screen" display mode because the content from the resized application user interface 8004 does not fully occupy the entire display provided by the display generating component of the computer system 101. Because the display generation component also presents the office virtual environment, not all portions of the virtual environment display content from the resized application user interface 8004. In other words, the second display mode includes a non-immersive mode in which respective content of the application user interface (e.g., resized application user interface 8004) and other content are displayed simultaneously (e.g., both the content of resized application user interface 8004 and content other than the content of the resized application user interface are displayed within the field of view of user 7002), with the content of resized application user interface 8004 occupying only a portion of the field of view of user 7002.

When interacting with an application user interface in a non-immersive mode, a virtual environment (e.g., an office virtual environment) forms part of the user experience. Displaying the application user interface (e.g., the resized application user interface 8004) with a non-immersive experience while maintaining the display of the virtual environment after the first input is detected minimizes interference to the user.

As shown in fig. 8B, when the display generation component presents both the virtual environment and the resized application user interface 8004 to the user 7002, the resized application user interface 8004 is canceled by the second single press input in response to detecting the second user input (e.g., the second single press input) on the hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element). For example, the resized application user interface 8004 is canceled before the main menu user interface 7110 is displayed or simultaneously with the main menu user interface 7110 being displayed. The resized application user interface 8004 is replaced by a main menu user interface 7110 presented in a virtual three-dimensional environment while maintaining the display of the virtual environment (e.g., an office virtual environment), as shown in fig. 8C (e.g., fig. 8C1, 8C2, and 8C3, with a user interface similar to that shown in fig. 8C3 shown on HMD 7100a in fig. 8C1). In some implementations, the main menu user interface 7110 is displayed in a central portion of the user's field of view, e.g., in the middle portion 7104 of the virtual three-dimensional environment, and thus is not displayed below the display of the previously displayed resized application user interface 8004.

Continuing to display the virtual environment (e.g., an office virtual environment) while displaying the main menu user interface minimizes interference to the user while navigating the main menu user interface 7110 without displaying additional controls. By maintaining the display of the virtual environment, the user does not need to re-initialize the virtual environment after navigating in the main menu user interface 7110, thereby improving the performance and efficiency of the computer system.

As previously described with reference to fig. 7B-7E, the main menu user interface 7110 provides access to different sets of user navigable items, including applications, people (e.g., representations of particular people) or contact lists, and virtual environments. In some embodiments, the main menu user interface 7110 includes application icons, widgets, communication options, and/or affordances for displaying XR context. In some implementations, the main menu user interface 7110 is superimposed over an application user interface (e.g., the resized application user interface 8004). In some implementations, objects in the main menu user interface 7110 (e.g., application icons, virtual user interface icons, and other objects) are opaque or partially transparent, blocking or obscuring a corresponding portion of the application user interface (e.g., resized application user interface 8004). For example, those portions of the application user interface that are positioned behind the main menu user interface 7110 are blocked or obscured. In some embodiments, the main menu user interface 7110 includes an album having a plurality of objects thereon, and the album is opaque or partially transparent, thereby blocking or obscuring those portions of the application user interface that are positioned behind the main menu user interface 7110.

When the main menu user interface 7110 is displayed, in response to detecting a user input directed to a respective representation of the software application (e.g., a tap input, a long press input, or a pinch and drag input), an application user interface of the software application is displayed (e.g., in the foreground of the three-dimensional environment such that the software application corresponding to the representation is run in the foreground as a focused application).

Allowing a single input to trigger the display of the main menu user interface lets the user quickly access and navigate the set of representations in the main menu user interface, or interact with others, regardless of what is in progress (e.g., while the first application is running), without displaying additional controls. This minimizes the number of inputs required to select the desired operation and improves the performance and operating efficiency of the device (e.g., computer system).

In some embodiments, the main menu user interface 7110 is world-locked. For example, after the main menu user interface 7110 is presented (e.g., in response to a press input to a hardware input element 7108 (e.g., a button, crown, or a depressible input element)), as shown in fig. 8C, when the user 7002 rotates her head (e.g., toward the desk 7148 on the left side of fig. 8C), the main menu user interface 7110 remains at substantially the same position in the virtual three-dimensional environment, such that when the representation 7118 moves out of the field of view of the user 7002 due to the rotation of the head of the user 7002, the representation 7118 is no longer displayed to the user 7002. In some embodiments, the main menu user interface 7110 is head-locked such that, after the main menu user interface 7110 is presented, the main menu user interface 7110 is redisplayed in the same portion of the field of view of the user 7002, regardless of how the user 7002 moves her head.
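
As a non-limiting illustration, the difference between a world-locked and a head-locked user interface can be sketched by tracking only the horizontal angle (azimuth) of an object. The function names, the `"world"` and `"head"` labels, and the 90° field of view are assumptions made for this sketch and are not taken from the described embodiments.

```python
def view_angle(anchor, object_azimuth_deg, head_yaw_deg):
    """Return the object's azimuth relative to the center of the field of view.

    anchor: "world" if the object keeps a fixed position in the environment,
            "head" if the object keeps a fixed offset from the user's head.
    """
    if anchor == "head":
        # A head-locked object moves with the head, so its offset is constant.
        return object_azimuth_deg
    # A world-locked object stays put; the head rotation shifts it in the view.
    # The result is normalized to the range [-180, 180).
    return (object_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def is_displayed(anchor, object_azimuth_deg, head_yaw_deg, fov_deg=90.0):
    """Return True if the object falls inside the (assumed) field of view."""
    return abs(view_angle(anchor, object_azimuth_deg, head_yaw_deg)) <= fov_deg / 2


# The user turns her head by 60 degrees after the menu is presented straight ahead.
world_locked_visible = is_displayed("world", 0.0, head_yaw_deg=60.0)
head_locked_visible = is_displayed("head", 0.0, head_yaw_deg=60.0)
```

Under these assumptions, the world-locked item leaves the field of view after the head rotation, whereas the head-locked item remains in the same portion of the field of view.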

In some implementations, the resized application user interface 8004 shown in fig. 8B is similar to the application user interface 7018 shown in fig. 7B in that it is responsive to user input provided to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In some embodiments, the box 7016 is displayed by the display generating component in both fig. 7B and 8B. In some implementations, the presence of an office virtual environment including conference chairs surrounding the desk 7148 does not affect display operations (e.g., of the main menu user interface 7110) triggered by press inputs to hardware input elements 7108 (e.g., buttons, crowns, or rotatable and depressible input elements). In some implementations, virtual content such as box 7016 continues to be displayed while the main menu user interface 7110 is presented to the user 7002 in response to the press input.

As shown in fig. 8C, when the main menu user interface 7110 is presented, a set of representations of one or more selectable virtual environments is presented to the user 7002 in response to user input (e.g., direct air gesture, indirect air gesture, tap input, long press input, and/or pinch and drag input) directed to tab 7136, as shown in fig. 8D. For example, in fig. 8C1, the user input is shown as a direct air gesture, where the location of the representation of the user's hand 7020' corresponds to tab 7136. In some implementations, the user input is an indirect air gesture, in which one or more gestures are performed with the user's hands 7020 and/or 7022 while tab 7136 is displayed at a location corresponding to the location where the user's attention is currently detected. For example, in response to user input directed to tab 7136, representations of option 7144 corresponding to a beach landscape virtual environment and option 7146 corresponding to a virtual office environment are presented to user 7002 as selectable virtual environments. In some implementations, virtual content such as box 7016 continues to be displayed while the main menu user interface 7110 is presented to the user 7002 in response to the press input. In some implementations, when the selectable virtual environments are presented to the user 7002, the previously presented virtual environment is canceled. For example, as shown in fig. 8D, the office virtual environment is no longer displayed. In some implementations, when the selectable virtual environments are presented to the user 7002, the previously presented virtual environment is maintained. In some embodiments, the user 7002 is presented with representations of more than two selectable virtual environments. In some embodiments, representations of more than two selectable virtual environments may all be displayed to user 7002 in a single snapshot.

In some implementations, in response to a user input (e.g., pinch and drag input, tap input, long press input) directed to an edge of the field of view of the user 7002, representations of additional selectable virtual environments are scrolled (e.g., by the computer system 101) into the field of view of the user 7002. For example, a pinch and drag input directed to the right edge of the virtual environment in the field of view of the user 7002 causes additional selectable virtual environments to scroll into the field of view of the user 7002 from the right side.

As shown in fig. 8D, when the representations of the selectable virtual environments are presented to the user 7002, the office virtual environment is replaced with a beach landscape virtual environment including a coconut tree 8006, a sun 8008, and a coastline 8010 in response to user input directed to an option 7144 corresponding to the representation of the beach landscape, as shown in fig. 8E. In some embodiments, virtual content such as box 7016 continues to be displayed as the virtual environment is updated in response to user input. In some embodiments, the display of the representations of the selectable virtual environments is not immediately canceled when the user 7002 selects the option 7144 corresponding to the representation of the beach landscape. For example, the representations of the selectable virtual environments persist for a first amount of time (e.g., about 3 seconds or about 5 seconds) in case the user 7002 wishes to make a different selection after the selected virtual environment is displayed. In some implementations, if no further user input directed to the selectable representations is detected, the display of the representations of the selectable virtual environments ceases after the first amount of time.
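
As a non-limiting illustration, the persistence of the selectable representations for a first amount of time can be sketched with an explicit clock value passed in by the caller. The class name `EnvironmentPicker`, its methods, and the choice to restart the timer on each further selection are assumptions made for this sketch; the 3-second default mirrors one of the example values above.

```python
class EnvironmentPicker:
    """Hypothetical model of the selectable virtual environment representations."""

    def __init__(self, linger_seconds=3.0):
        self.linger_seconds = linger_seconds  # the "first amount of time"
        self.visible = True
        self.selected = None
        self._last_selection_time = None

    def select(self, environment, now):
        """Apply a selection; the representations stay visible for a while."""
        if not self.visible:
            return
        self.selected = environment
        self._last_selection_time = now  # assumption: each selection restarts the timer

    def tick(self, now):
        """Hide the representations once the linger time has elapsed."""
        if (self.visible and self._last_selection_time is not None
                and now - self._last_selection_time >= self.linger_seconds):
            self.visible = False
        return self.visible


picker = EnvironmentPicker(linger_seconds=3.0)
picker.select("beach landscape", now=0.0)
still_visible = picker.tick(now=1.0)        # within the linger time
picker.select("office", now=2.0)            # the user changes her mind
visible_after_change = picker.tick(now=4.0)
visible_at_end = picker.tick(now=5.0)       # 3 seconds after the last selection
```

Passing the time in explicitly keeps the sketch deterministic; an actual system would presumably use its own timer facilities.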

Displaying a main menu user interface that provides quick access to a set of selectable virtual environments provides a way to change the user's virtual experience without displaying additional controls, thereby minimizing the number of inputs required to select a desired virtual environment and improving the performance and efficiency of the computer system.

In some embodiments, as shown in fig. 8F, the immersive application user interface 8000 is displayed to the user 7002 without any computer-generated virtual content that is not provided by the application associated with the application user interface 8000 (e.g., the immersive application user interface 8000 is displayed without the box 7016 being displayed, which is not provided by the application associated with the application user interface 8000).

As shown in fig. 8F, when the immersive application user interface 8000 is displayed to the user 7002, in response to detecting a user input (e.g., a single press input) on a hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element), the application user interface 8000 is canceled by the single press input and replaced by an updated application user interface 8040, as shown in fig. 8G. In some implementations, content similar to that displayed in the application user interface 8000 is displayed in the updated application user interface 8040. For example, virtual content 8002 provided by application user interface 8000 continues to be displayed in updated application user interface 8040, albeit at a reduced scale. In some implementations, the updated application user interface 8040 is presented without any virtual environment and/or without any additional virtual content. In some embodiments, as shown in fig. 8G, when the virtual environment is not displayed to the user 7002, the updated application user interface 8040 is presented with a presentation of a passthrough portion of the physical environment of the computer system 101 (e.g., a head mounted device) via a display generation component. In some implementations, the updated application user interface 8040 corresponds to the resized application user interface 8004. In some embodiments, the updated application user interface 8040 corresponds to a mini-player application interface (e.g., mini-player user interface 7154 as shown in fig. 7G-7I or mini-player user interface 11012 as shown in fig. 11D).

In some embodiments, more than one user input results in a transition from fig. 8F to fig. 8G. For example, when in the immersion mode, a first input (e.g., a press input) to a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) causes application content in the immersion mode (e.g., in full screen mode) to be displayed in a non-immersion mode (e.g., in the form of a resized application user interface, in non-full screen mode). When the non-immersion mode is displayed, in response to detecting a second input (e.g., a second press input), a main menu user interface 7110 is displayed, as shown in fig. 8C. In some implementations, the updated application user interface 8040 persists, for example, as a mini-player application user interface. For example, the application user interface 8000 corresponds to a media player in full-screen mode (e.g., a video player application that presents a movie in full-screen mode), and the updated application user interface 8040 corresponds to a mini-player application user interface. When the main menu user interface is displayed simultaneously with the mini-player application user interface, in response to detecting a third input (e.g., a third press input), the main menu user interface 7110 is canceled and a passthrough portion is presented, as shown in fig. 8G. In some embodiments, the mini-player application interface is maintained when the passthrough portion is presented, as shown in fig. 8G. In some embodiments, the display of the mini-player application interface is stopped when the passthrough portion is presented.

Canceling the main menu user interface using the third input when the device is operating in the non-immersion mode (e.g., providing the non-immersion experience to the user) provides an efficient way to terminate navigation activity on the main menu user interface without interfering with the application user interface in the non-immersion experience (e.g., as shown in fig. 8G). No additional controls need to be provided to the user, and the user does not need to navigate any additional user interface control elements to exit the main menu user interface, thereby improving the operating efficiency of the computer system.
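
As a non-limiting illustration, the sequence of press inputs described with reference to fig. 8A-8G can be sketched as a small state transition function. The names `DisplayState` and `on_press` are hypothetical, and the sketch deliberately ignores the virtual environment, the passthrough portion, and the mini-player; it models only the immersion mode and the visibility of the main menu user interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayState:
    immersive: bool                  # True: application fills the field of view
    main_menu_visible: bool = False  # True: main menu user interface is displayed


def on_press(state):
    """Return the state that follows a single press on the hardware input element."""
    if state.immersive:
        # First press: leave the immersion mode and show the resized interface.
        return DisplayState(immersive=False, main_menu_visible=False)
    # Later presses: show the main menu if hidden, cancel it if displayed.
    return DisplayState(immersive=False,
                        main_menu_visible=not state.main_menu_visible)


state = DisplayState(immersive=True)
history = []
for _ in range(3):  # first, second, and third press inputs
    state = on_press(state)
    history.append(state)
```

In this sketch, the first press yields the non-immersion mode, the second displays the main menu user interface, and the third cancels it, so the same input produces different operations depending on the current display mode.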

Canceling the main menu user interface 7110 by replacing the display of the main menu user interface with a presentation of a passthrough portion of the physical environment of the computer system 101 (e.g., a head mounted device) via a display generation component improves the safety of the user 7002 by allowing the user 7002 to be aware of the physical environment of the computer system 101 (via the passthrough portion of the physical environment of the computer system 101). For example, after the user has completed navigating the main menu user interface 7110, the user 7002 may need to respond to an emergency or other situation that requires the attention of the user 7002 or that requires the user 7002 to interact with the physical environment. Using the second or third input (e.g., on a physical button) to activate the display of the passthrough portion allows the user 7002 to exit from the virtual environment and view at least a portion of the physical environment without displaying additional controls. In some embodiments, the display of the virtual environment in which the main menu user interface 7110 is displayed is stopped in addition to the rendering of the passthrough portion. Stopping the display of the virtual environment while canceling the main menu user interface 7110 causes the second input to function similarly to an input to an escape button, allowing the user to exit from the virtual environment and view at least a portion of the physical environment (e.g., by canceling the display of the virtual environment) without displaying additional controls.

The user may use a single input to the input device to transition the device from a high immersion level (e.g., a full immersion mode in which only the content of the respective application is displayed) to a lower immersion mode or a non-immersion mode, or from a non-immersion mode to a mode in which the main menu user interface is also displayed. This provides intuitive top-level access to the different sets of representations while the user is in a non-immersion experience without displaying additional controls (e.g., without requiring the user to view user interface elements), thereby improving the operational efficiency of the user-machine interaction based on the single input. Using a single input to the input device reduces the amount of time required to navigate within or transition out of the virtual environment.

In some embodiments, the input device that receives the aforementioned single input and other inputs described herein with reference to fig. 8A-8G is a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In some implementations, the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is a hardware button. In some implementations, the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is a solid state button. Using input to hardware buttons or solid state buttons to control the level of immersion at which application content is provided (e.g., from a fully immersed mode to a non-immersed mode) or to display a main menu user interface provides intuitive top-level access to basic operational functions of a computer system without displaying additional controls (e.g., without requiring a user to view user interface elements), thereby improving the operational efficiency of the computer system. The solid state buttons reduce the number of moving parts, which increases reliability and allows the system to be reconfigurable (e.g., through firmware updates that allow the solid state buttons to provide different feedback, provide other functionality, or receive additional types of input), thereby increasing the performance and efficiency of the computer system.

Fig. 9A-9D illustrate examples of ways in which one or more different operations may be triggered by input to an input device, depending on the characteristics of the displayed application user interface. FIG. 15 is a flow diagram of an example method 1500 for performing one or more different operations based on (e.g., triggered by) input to an input device depending on characteristics of a displayed application user interface. The user interfaces in fig. 9A to 9D are used to illustrate the processes described below, including the process in fig. 15.

Fig. 9A shows an application user interface 9002, an application user interface 9004, an application user interface 9006, and an application user interface 9008 displayed in a virtual three-dimensional environment 9000. In some embodiments, the application user interface 9002 corresponds to a user interface of a media player application (e.g., a video player application), the application user interface 9004 corresponds to a user interface of a messaging application, the application user interface 9006 corresponds to a user interface of a calendar application, and the application user interface 9008 corresponds to a user interface of a web browsing application.

In some embodiments, a media player application with an application user interface 9002 is used in a content sharing session. For example, the user 7002 shares a movie with participants Abe, Mary, Isaac, and Edwin of the content sharing session while playing the movie in the video player application of the application user interface 9002. In some implementations, the representations of the participants in the content sharing session are displayed as avatars on a portion of the application user interface 9002 (e.g., the representations of the participants are arranged on the left portion, the representations of the participants are arranged on the right portion, the representations of the participants are arranged on the top portion, and the representations of the participants are arranged on the bottom portion). In some embodiments, a content sharing session that includes two or more participants is also referred to as a group interaction session. For example, participants in a content sharing session may interact with each other (e.g., via chat messaging, audio calls, or video calls) while co-viewing shared content in a group interaction session.

Particular applications may be used in content sharing sessions (e.g., media player applications with application user interface 9002) or non-content sharing sessions (e.g., media player applications with application user interface 11002, as described with reference to fig. 11A and 11B), where content is presented to user 7002 by a display generating component of computer system 101 and not to additional users or participants. In some embodiments, an application running on the computer system 101 of the user 7002 in a content sharing session shares only a portion of the displayed information with other participants. For example, a document editing application in a content sharing session with two different teams of participants shares only a portion of the application user interface with participants from a first team having permission to view shared content in a first portion of the application user interface, and a different portion of the application user interface with participants from a second team having different permission levels or settings.
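
As a non-limiting illustration, sharing only a portion of an application user interface according to permissions can be sketched as a lookup from portions to the teams permitted to view them. The function name `visible_portions`, the portion labels, and the team labels are hypothetical and are used only for this sketch.

```python
def visible_portions(portion_permissions, team):
    """Return the portions of the application user interface shared with a team.

    portion_permissions maps each portion to the set of teams allowed to view it.
    """
    return [portion for portion, teams in portion_permissions.items()
            if team in teams]


# A document editing application shared with two teams of participants.
permissions = {
    "first portion": {"first team"},
    "second portion": {"second team"},
}
first_team_view = visible_portions(permissions, "first team")
second_team_view = visible_portions(permissions, "second team")
```

Under these assumptions, each team receives only the portion that its permission level or settings allow, and a participant with no matching permission receives nothing.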

An application having a corresponding session (e.g., a content sharing session) that includes content shared with (e.g., displayed to) more than one user on computer system 101 on which the application is running is also referred to as a "shared application." An application that does not have a corresponding session (e.g., a content sharing session) that includes content shared with or displayed to more than one user on computer system 101 on which the application is running is referred to as a "private application." Thus, even when an application is capable of sharing content with multiple participants, the same application is a "private application" in the absence of any active content sharing session for that application, and is considered a "shared application" when there is an active content sharing session for that application.
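
As a non-limiting illustration, the distinction drawn above can be sketched by classifying an application according to whether it currently has an active session with more than one user. The class name `Application`, the field `session_users`, and the convention that the set includes the local user are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    name: str
    # Users currently receiving this application's content, including the
    # local user. An empty set models "no active content sharing session".
    session_users: set = field(default_factory=set)

    def classification(self):
        """Classify by the active session, not by the application's capability."""
        return "shared" if len(self.session_users) > 1 else "private"


player = Application("media player")
before_session = player.classification()
player.session_users = {"user 7002", "Abe", "Mary"}  # session becomes active
during_session = player.classification()
player.session_users = set()                          # session ends
after_session = player.classification()
```

The same application object is classified differently over time, which reflects that the classification depends on the presence of an active content sharing session rather than on the type of application.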

FIG. 9A illustrates application user interfaces of multiple applications being displayed simultaneously. The plurality of applications include private applications (e.g., messaging applications, calendar applications, and web browsing applications) and/or applications used in content sharing sessions (e.g., media player applications). Displaying application user interfaces for two or more applications simultaneously allows the user 7002 to multitask, provides more information to the user 7002 without additional user input, and improves the operating efficiency of the computer system.

In addition to sharing media content for joint consumption by multiple participants (e.g., via a video player application), a content sharing session may also include sharing video conferencing content between multiple participants in a video conference and/or sharing or streaming gaming content to multiple participants in a gaming application. For example, an ongoing game (e.g., a single user's game) is broadcast to multiple participants in a content sharing session of a gaming application. In some implementations, the content sharing session can include screen mirroring. In screen mirroring, display output provided by a display generation component of computer system 101 is additionally provided to one or more other display devices that are different from computer system 101. In some embodiments, screen mirroring is used when the application user interface is in an immersion mode (e.g., no other applications are running in the foreground of the computer system 101 of the user 7002), as described with respect to fig. 8A and 8F. In some implementations, participants in a content sharing session of an application user interface that is screen mirrored in the immersion mode may also experience shared content in the immersion mode (e.g., screen mirroring provides visual output to the respective wearable devices of the participants).

In some implementations, the gaming application is a multiplayer gaming application (e.g., a multiplayer online battle arena (MOBA) video game) in which users in the content sharing session do not view the same output display (e.g., each player is presented with a point of view from the perspective of the respective game character) and the gaming application runs on the respective computer system of each player. In some implementations, the multiplayer gaming application includes a content sharing session in which team members of the user 7002 (e.g., in the gaming application) receive, during the MOBA gaming session, audio feeds from the user 7002 and video feeds that are the same as (e.g., via screen mirroring) or similar to the display presented to the user 7002 by the display generation component of the computer system 101.

In contrast, in some embodiments, the application user interfaces 9004, 9006, and 9008 as shown in FIG. 9A are all application user interfaces for applications used in non-content sharing sessions. In some implementations, one or more of the application user interfaces 9004, 9006, and 9008 are used in a content sharing session. For example, the user 7002 can place the web browsing application in a content sharing session such that the participant can view web page content (e.g., web page content including media clips) from the web browsing application in real-time in the content sharing session.

In some embodiments, one or more virtual objects (e.g., boxes 7016) are presented in a virtual three-dimensional environment 9000 that includes application user interfaces for private applications and shared applications.

When the application user interfaces of the private application and the shared application are displayed, in response to a pressing input to a hardware input element 7108 (e.g., a button, a crown, or a pressable input element), all the private applications are canceled, and a main menu user interface 7110 is overlaid on the shared application (e.g., the main menu user interface is presented in front of the application user interface 9002, closer to the user 7002 in the z-direction than the application user interface 9002), as shown in fig. 9B (e.g., fig. 9B1, 9B2, and 9B3, with a user interface similar to that shown in fig. 9B1 being shown on HMD 7100a in fig. 9B 2).

In some embodiments, the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) includes a rotatable input element or mechanism, such as a digital crown. Hereinafter, the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is also referred to as a rotatable input element 7108 or rotatable input mechanism 7108. In some implementations, the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is a hardware button. In some implementations, the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is a solid state button. Providing a dedicated hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) for receiving the first input allows a user (e.g., without having to interact with the user interface of any software application) to more quickly and responsively distinguish the shared application from the private application.

Displaying the main menu user interface 7110 in front of the application user interface 9002 allows a user to navigate a collection of applications in the main menu user interface, to change the user's virtual environment, and/or to interact with additional users while a content sharing session is ongoing, thereby improving operational efficiency by eliminating the need to interfere with (e.g., by having to close) the content sharing session of the sharing application (e.g., the application user interface 9002) in order for a particular user to navigate the main menu user interface 7110. Reducing the amount of input required to cancel the private application and bring the shared application into focus enhances the operability of computer system 101 and makes the user-device interface more efficient, which additionally reduces power usage and extends the battery life of computer system 101 by enabling the user to use the device more quickly and efficiently.

In some embodiments, when two application user interfaces are displayed, one application user interface is a shared application and the other application user interface is a private application, press input to hardware input element 7108 (e.g., button, crown, or rotatable and pressable input element) stops the display of the private application while maintaining the display of the shared application used in the content sharing session. In some embodiments, multiple application user interfaces of a private application (e.g., without an ongoing content sharing session) and multiple application user interfaces of a sharing application are displayed. Pressing input to hardware input element 7108 (e.g., button, crown, or rotatable and pressable input element) stops display of multiple application user interfaces of the private application while maintaining display of multiple application user interfaces of the sharing application used in the content sharing session. In some embodiments, multiple application user interfaces of the shared application are displayed, and an application user interface of the private application is displayed. Pressing input to hardware input element 7108 (e.g., button, crown, or rotatable and pressable input element) stops the display of the private application while maintaining the display of multiple application user interfaces of the sharing application used in the content sharing session.

In some implementations, virtual content such as box 7016 continues to be displayed while the main menu user interface 7110 is presented to the user 7002 in response to the press input. In some embodiments, the home menu user interface 7110 is also referred to as a home screen user interface, and the home screen user interface does not necessarily block or replace all other displayed content. For example, a "home screen" refers to a virtual user interface that is optionally displayed in an XR environment, rather than a default login user interface that is displayed to a user in response to a particular predefined user interaction with computer system 101. In other words, the home screen user interface is different from a default login user interface that automatically displays various representations to a user without specific user input.

Canceling the private applications of the user using the first input to the input device, while not affecting any shared application, minimizes interference to both the user and other users during the shared experience, and prioritizes multi-user interactions over private application use. The ability to use the first input to distinguish between shared applications and private (e.g., non-shared) applications allows for separate control of both types of applications (e.g., prioritizing shared applications over private applications) without having to display additional controls. Using the first input to quickly cancel private applications and bring a shared application into focus more easily reduces the amount of interference that a user may experience while in a group interaction session. Furthermore, the amount of input required to cancel the private applications and maintain the display of the shared application is reduced: instead of having to individually minimize or cancel the application user interfaces 9004, 9006, and 9008 of the private applications, the first input is sufficient to maintain the display of the shared application while stopping the display of the application user interfaces of the private applications.

When both the main menu user interface 7110 and at least a portion of the application user interface 9002 of the application currently being shared in the content sharing session are displayed (this state is reached after the first press input, as shown in fig. 9B), in response to a second press input to the hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element), the main menu user interface 7110 is canceled and a pass-through portion is presented (e.g., a representation 7014' of the table 7014 in the physical environment 7000), as shown in fig. 9C. Thus, the application user interface 9002 of the sharing application is displayed simultaneously with the pass-through portion of the physical environment of the computer system 101. In other words, the second press input to the hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element) is a repeated "main input" to the hardware input element 7108, which stops the display of the main menu user interface 7110 while maintaining the display of the application user interface 9002 of the shared application, so that the application user interface 9002 of the shared application is displayed without the main menu user interface 7110.

As shown in fig. 9C, cancelling the display of the main menu user interface 7110 by replacing the main menu user interface with a presentation of the passthrough portion of the physical environment of the computer system 101 (e.g., a head-mounted device) via the display generating component improves the safety of the user 7002, allowing the user 7002 (via the passthrough portion of the physical environment of the computer system 101) to be aware of the physical environment of the computer system 101 while not interfering with ongoing content sharing sessions involving more than one user. For example, after the user 7002 has completed navigating the main menu user interface 7110, the user 7002 may need to respond to an emergency or other situation that requires the attention of the user 7002 or that requires the user 7002 to interact with the physical environment 7000. Using the second input to activate the display of the passthrough portion of the physical environment of computer system 101 allows user 7002 to exit from the virtual environment and view at least a portion of the physical environment without displaying additional controls. In some embodiments, in addition to presenting the passthrough portion in response to the second input, the display of the virtual environment in which the main menu user interface 7110 is displayed also ceases. Stopping displaying the virtual environment while canceling the main menu user interface 7110 allows the user to exit from the virtual environment and view at least a portion of the physical environment (e.g., by canceling the display of the virtual environment), because the second input functions similarly to an input to an escape button, without displaying additional controls. In some embodiments, when the main menu user interface 7110 is cancelled in response to the second input, virtual content such as box 7016 continues to be displayed as shown in fig. 9C, but in other embodiments, display of virtual content such as box 7016 is also cancelled in response to the second input.

Using a second input, such as a press input, to cancel the main menu user interface provides an efficient way to terminate navigation activity on the main menu user interface 7110 without interfering with the content sharing session of the sharing application 9002. No additional controls need to be provided to the user and the user does not need to browse any additional user interface control elements to exit the main menu user interface 7110, thereby improving the operating efficiency of the device.
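The two-press behavior described above can be sketched as a small state machine. The following Python sketch is illustrative only and is not taken from the embodiments: the class names `AppWindow` and `Environment` and the method `press_hardware_button` are hypothetical, and the sketch assumes that a private application user interface is simply hidden on the first press and that the main menu is toggled off on the second press.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppWindow:
    """Hypothetical stand-in for an application user interface (e.g., 9002)."""
    name: str
    shared: bool          # True if the application is in a content sharing session
    visible: bool = True


@dataclass
class Environment:
    """Hypothetical model of what the display generation component shows."""
    windows: List[AppWindow]
    home_menu_visible: bool = False

    def press_hardware_button(self) -> None:
        if not self.home_menu_visible:
            # First press: cancel private applications, keep shared ones,
            # and overlay the main menu user interface.
            for window in self.windows:
                if not window.shared:
                    window.visible = False
            self.home_menu_visible = True
        else:
            # Second press: cancel the main menu; shared applications stay.
            self.home_menu_visible = False

    def visible_names(self) -> List[str]:
        return [w.name for w in self.windows if w.visible]


env = Environment([
    AppWindow("9002", shared=True),
    AppWindow("9004", shared=False),
    AppWindow("9006", shared=False),
    AppWindow("9008", shared=False),
])
env.press_hardware_button()   # main menu shown, only the shared window remains
env.press_hardware_button()   # main menu cancelled, shared window still shown
```

In this sketch a single press handles any mix of shared and private windows, which mirrors the point above that no per-window control is needed.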

Fig. 9D illustrates an application user interface 9002 in a content sharing session with four participants and an application user interface 9008 of an application (e.g., a web browsing application) that is not in any content sharing session, according to some embodiments. In some embodiments, as shown in fig. 9D, representations of participants are arranged within the shared three-dimensional environment 9200 'relative to each other (e.g., such that a respective user (or participant) views the positioning of other users (participants) relative to the respective user's point of view), referred to herein as a coexistence communication session (or spatial communication session). Both the application user interface 9002 and the application user interface 9008 are displayed in the shared three-dimensional environment 9200' for the spatial communication session.

In some embodiments, each participant in the content sharing session of the application user interface 9002 has a corresponding virtual location in the spatial communication session. For example, a representation of Abe (e.g., avatar, video stream of Abe, image of Abe) is located at position 9402 in the spatial communication session, a representation of Mary (e.g., avatar, video stream of Mary, image of Mary) is located at position 9404 in the spatial communication session, a representation of Isaac (e.g., avatar, video stream of Isaac, image of Isaac) is located at position 9406 in the spatial communication session, and a representation of Edwin (e.g., avatar, video stream of Edwin, image of Edwin) is located at position 9408 in the spatial communication session. The viewpoint of user 7002 includes the representation of Abe at location 9402 to the left of the representation of Mary at location 9404. In some implementations, the display generation component 7100 also presents a displayed representation of the user 7002 (e.g., where the user 7002's own representation is displayed in a dedicated area (e.g., upper right corner) of the display generation component). In some embodiments, representations of one or more active participants in the communication session are not displayed relative to each other within the shared three-dimensional environment 9200' (e.g., active participants are displayed in a list or gallery view, as shown in fig. 9A-9C). In some embodiments, the communication session includes a combination of users participating in the coexistence communication session (e.g., users who view the representations of other users as being arranged in a three-dimensional environment relative to each other) and additional users who do not view the other users as being arranged in a three-dimensional space relative to each other (e.g., users viewing other participants in a list or gallery view).

In some embodiments, the shared three-dimensional environment 9200' is updated in real-time as users communicate with each other in a coexisting communication session (e.g., using audio, physical movement, and/or a sharing application). In some implementations, users in the coexisting communication session are not co-located with each other in the physical environment (e.g., not within a predefined physical proximity of each other), but rather share a three-dimensional environment 9200'. For example, the user views the shared three-dimensional environment 9200' from different physical environments (e.g., the shared three-dimensional environment may include one or more attributes of the physical environment of one or more of the users).

In some embodiments, as described above, the shared three-dimensional environment 9200' includes a representation for each user participating in the coexistence communication session. In some embodiments, the control user interface object includes one or more affordances for displaying additional content related to the communication session, such as affordances for changing a virtual environment (e.g., virtual landscape) for the communication session. For example, a user is enabled to add a virtual object to the coexistence communication session (e.g., virtual object 7016), to control placement of a virtual object within the shared three-dimensional environment 9200', and/or to adjust virtual properties of the shared three-dimensional environment 9200' (e.g., by selecting a control user interface object). For example, the shared three-dimensional environment can be displayed with one or more themes, referred to herein as immersion experiences (e.g., an immersion experience includes an immersion animation or environment), applied to the three-dimensional environment 9200'. For example, the user is provided with options (e.g., using control user interface objects) for adding, removing, and/or changing virtual scenery, virtual lighting, and/or virtual wallpaper in the three-dimensional environment. In some implementations, in response to a user selection to change the current immersion experience, the changed immersion experience is provided to all users participating in the coexistence communication session (e.g., a respective display generation component for each participating user displays virtual content for the immersion experience).

In some embodiments, the content sharing session runs within a spatial communication session in the shared three-dimensional environment 9200', as shown in fig. 9D. For example, an application user interface 9002 (shown in fig. 9A-9C) that is currently in a content sharing session with multiple participants operates in a real-time communication session. For example, multiple participants of a real-time communication session may communicate with each other using audio (e.g., via microphones and/or speakers in communication with the user's respective computer system) and video and/or 3D representations such as avatars that represent changes in the positioning and/or expression of the participants in the real-time communication session over time. In some embodiments, audio received from a respective user is simulated as being received from a location corresponding to the current location of the respective user in the three-dimensional environment 9200'. In some embodiments, the location in the three-dimensional environment 9200' where the content sharing session occurs is different from the location of the representations of the plurality of participants.

Providing a content sharing session within a real-time communication session (e.g., a "coexistence session") expands the scope of media experience in the real-time communication session. Instead of exchanging only participant-derived data (e.g., participant or participant-generated video conference audio and video data), separate data sources (e.g., media content, game content, web page content) may be shared in a real-time communication session with multiple participants.

In some implementations, multiple participants can control playback of media (e.g., music, video, or animated content) shared in a content sharing session, for example, by performing media playback control actions such as scrubbing (e.g., positioning a control element of a scrub bar), fast forward, rewind, and/or play/pause. As shown in fig. 9D, when the user 7002 provides a press input to the hardware input element 7108 (e.g., a button, crown, or a depressible input element) while the real-time spatial communication session is displayed, the press input causes the main menu user interface 7110 to be displayed to the user 7002 while the application user interface 9008 is canceled because the application user interface 9008 is not in any content sharing session. When the main menu user interface 7110 is displayed, another participant in the communication session may move the application user interface 9002 of the content sharing session. From the perspective of the user 7002, movement of the application user interface 9002 by another participant results in corresponding movement of the application user interface 9002, which maintains consistent spatial relationships between the user 7002, the application user interface 9002, and the representations of the other participants in the shared three-dimensional environment 9200'.

For example, Abe, one of the participants of the real-time communication session, whose representation is located at location 9402 (see fig. 9D) in the shared three-dimensional environment 9200', moves the application user interface 9002 from an original location closer to location 9010 to a new location at location 9012. In some implementations, Abe's movement of the application user interface 9002 occurs while the main menu user interface 7110 is presented to the user 7002 by the display generation component (e.g., after the user 7002 provides a press input to the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)). In response to Abe's movement of the application user interface 9002, the spatial location of the application user interface 9002 is also updated for the user 7002, from the viewpoint of the user 7002, while the user 7002 is navigating the main menu user interface 7110.

Allowing other participants to move the user interface of the application while the first participant is navigating the main menu user interface on her separate computer system helps to minimize interference with the multi-user experience (e.g., the content sharing session of the application). For example, other participants may continue to interact with the user interface of the application in the content sharing session without regard to or being constrained by the fact that the main menu user interface is displayed for the first participant. Furthermore, allowing for simultaneous changes in spatial relationships between user interface objects representing respective content to different participants in a self-consistent manner increases the realism of a multi-user experience and better simulates a content sharing experience in a physical environment. The simultaneous change in the positioning of the user interfaces of the applications of two or more participants also eliminates the need to apply the same change, either sequentially or manually, to the application user interfaces as seen by the multiple participants (e.g., as displayed by the respective computer systems of the multiple participants), thereby improving the communication efficiency of the multiple participants.

In some embodiments, the application user interface 9002 of the sharing application, or an element or respective portion of the application user interface 9002, has a shared spatial relationship with respect to a plurality of participants in the real-time communication session, such that one or more user interface objects visible to the plurality of participants in the content sharing session in the real-time communication session have a consistent spatial relationship from the different viewpoints of the plurality of participants in the content sharing session. In some embodiments, when an application user interface of a shared application is moved by a first participant of a real-time communication session, the application user interface of the shared application is moved accordingly for the other participants, from their respective points of view. In other words, when the user interface object of the sharing application is moved by one participant, the user interface object moves for all participants of the real-time communication session in a manner that maintains a consistent spatial relationship between the user interface object and all participants.

For example, when the application user interface 9002 moves from an original location proximate to location 9010 to a new location at location 9012, Edwin, whose representation is located at location 9408 in the shared three-dimensional environment 9200', will have an updated spatial relationship with the relocated application user interface 9002 that reflects the closer spatial proximity between the application user interface 9002 at location 9012 and the location 9408 of the representation of Edwin. Similarly, from the point of view of the representation of Edwin at location 9408, Mary (whose representation is at location 9404) will appear behind the application user interface 9002 (at new location 9012) rather than in front of the application user interface 9002, as was the case when the application user interface 9002 was at previous location 9010.

Maintaining a consistent spatial relationship between the user interface object and the participants includes, for example, that the spatial relationship between a first user interface object representing the respective content to a first participant (e.g., if Mary is the first participant, the application user interface 9002 is the first user interface object representing the media content to Mary) and the viewpoint of the first participant from the perspective of the first participant (e.g., from the perspective of Mary at location 9404, the viewpoint of Mary may include a direct line of sight to Edwin before the application user interface 9002 moves from location 9010 to location 9012, but after the application user interface 9002 moves to location 9012, the application user interface 9002 blocks Mary's direct view of Edwin) is consistent with the spatial relationship between a second user interface object representing the respective content to a second participant (e.g., if Edwin is the second participant, the second user interface object presents the application user interface 9002 to Edwin at his computer system, relative to his virtual spatial location 9408) and a representation of the first participant from the perspective of the second participant (e.g., from the point of view of Edwin, Mary appears behind the application user interface 9002, which is in front of box 7016).

Further, the spatial relationship between the second user interface object representing the respective content to the second participant and the viewpoint of the second participant from the perspective of the second participant (e.g., the viewpoint of Edwin from the perspective of Edwin at location 9408 may include a direct line of sight to Abe at location 9402 before the application user interface 9002 moves from location 9010 to location 9012, but after the application user interface 9002 moves to location 9012, the application user interface 9002 blocks Edwin's direct view of Abe at location 9402) is consistent with the spatial relationship between the first user interface object representing the respective content to the first participant and the representation of the second participant from the perspective of the first participant (e.g., from the perspective of Mary, the representation of Edwin at location 9408 will appear behind the application user interface 9002 at new location 9012).
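One way to picture the consistency described above is to keep a single set of shared coordinates and derive every participant's view from it. The following Python sketch is a simplified illustration under stated assumptions, not the method of the embodiments: it uses two-dimensional coordinates, the positions are invented for illustration (they are not taken from fig. 9D), and `SharedScene`, `relative_position`, and `blocks_view` are hypothetical names. The user interface is modeled as a disc for the line-of-sight test.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


class SharedScene:
    """Hypothetical shared scene: one coordinate set used by all participants."""

    def __init__(self, positions: Dict[str, Point]) -> None:
        self.positions = dict(positions)

    def move(self, name: str, new_position: Point) -> None:
        # A move by any participant updates the single shared position.
        self.positions[name] = new_position

    def relative_position(self, viewer: str, obj: str) -> Point:
        # Where the object sits relative to the viewer's own location.
        vx, vy = self.positions[viewer]
        ox, oy = self.positions[obj]
        return (ox - vx, oy - vy)

    def blocks_view(self, viewer: str, target: str, obstacle: str,
                    radius: float = 0.5) -> bool:
        # True if the obstacle (a disc) lies on the segment viewer -> target.
        ax, ay = self.positions[viewer]
        bx, by = self.positions[target]
        px, py = self.positions[obstacle]
        dx, dy = bx - ax, by - ay
        length_sq = dx * dx + dy * dy
        if length_sq == 0:
            return False
        t = ((px - ax) * dx + (py - ay) * dy) / length_sq
        if t <= 0 or t >= 1:
            return False  # closest point is not strictly between the two
        closest = (ax + t * dx, ay + t * dy)
        return math.hypot(px - closest[0], py - closest[1]) <= radius


scene = SharedScene({"Mary": (0.0, 0.0), "Edwin": (4.0, 0.0),
                     "ui_9002": (2.0, 3.0)})
scene.move("ui_9002", (2.0, 0.0))  # e.g., a third participant moves the window
```

Because both views are computed from the same shared position, the occlusion test is symmetric in this sketch: after the move, the window blocks Mary's view of Edwin exactly when it blocks Edwin's view of Mary.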

Allowing one participant to move the user interface of an application for another participant eliminates the need to apply the same changes to multiple participants sequentially or manually, thereby improving the communication efficiency of the multiple participants. Allowing simultaneous changes in spatial relationships between user interface objects representing respective content to different participants in a self-consistent manner increases the realism of a multi-user experience and better simulates the content sharing experience in a physical environment. Each participant may independently position itself at a location in the shared three-dimensional environment 9200' relative to a user interface object that represents the corresponding content selected/tailored for a particular user. The spatial relationship selected for a particular user (the spatial relationship between the user interface object and the representation of the particular user) will not affect the spatial relationship desired by another user. Allowing different spatial relationships between an application or an element or portion of an application and different users enhances the ability of different users to control their individual interactions (e.g., viewing interactions) with the application or the element or portion of the application.

Fig. 10A to 10D illustrate an example of resetting an input enrollment process. Fig. 16 is a flow diagram of an example method 1600 for resetting an input enrollment process. The user interfaces in fig. 10A to 10D are used to illustrate the processes described below, including the process in fig. 16.

Fig. 10A illustrates an application user interface 10000 displayed in a virtual three-dimensional environment. The virtual three-dimensional environment includes one or more computer-generated objects, also referred to as virtual objects, such as boxes 7016 (e.g., which are not representations of physical boxes in physical environment 7000). In some embodiments, application user interface 10000 corresponds to a user interface of a software application (e.g., a web browser, messaging application, map application, video player or audio player, email application, or other software application) executing on computer system 101. In some implementations, the application user interface 10000 is displayed in a middle portion of the virtual three-dimensional environment within a central portion of the field of view of the user of the device (e.g., along the gaze direction of the user, a front view of the application user interface 10000 is provided to the user 7002 such that the application user interface 7018 appears substantially at the gaze height of the user 7002).

Although not shown in fig. 10A, the user 7002 interacts with the application user interface 10000 using user input such as hand gestures or gaze input. For example, the user 7002 uses a hand gesture directed to a hyperlink in the application user interface 10000 of the web browsing application (e.g., a tap input directed to the hyperlink) to view a web page associated with the hyperlink using the web browsing application. Alternatively, the user 7002 uses a pinch and drag gesture directed to the web browsing application to scroll to a different web page (e.g., a previously accessed web page) in the web browsing application. In addition to the hand gestures, in some embodiments, the user 7002 directs her gaze to a portion of the application user interface 10000 (e.g., for a period of time longer than a preset threshold) to trigger different operations in the application user interface 10000 (e.g., selecting a hyperlink in a web browsing application, zooming in on a display of a portion of the application user interface 10000, playing or pausing a media item (e.g., video clip, audio clip) in the application user interface 10000).
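The gaze-duration trigger mentioned above (gaze held on a portion of the user interface for longer than a preset threshold) can be sketched as a simple dwell detector. This Python sketch is an assumption-laden illustration: `DwellDetector`, its `update` method, and the 0.5 second threshold are hypothetical, and the sketch assumes the system already resolves each gaze sample to a named target (or to `None` when no target is gazed at).

```python
from typing import Optional


class DwellDetector:
    """Hypothetical dwell detector: fires once when gaze stays on one target
    for longer than a preset threshold."""

    def __init__(self, threshold_s: float) -> None:
        self.threshold_s = threshold_s
        self._target: Optional[str] = None
        self._start_s = 0.0
        self._fired = False

    def update(self, target: Optional[str], timestamp_s: float) -> Optional[str]:
        """Feed one gaze sample; return the target name when the dwell
        threshold is first exceeded, otherwise None."""
        if target != self._target:
            # Gaze moved to a new target (or away): restart the timer.
            self._target = target
            self._start_s = timestamp_s
            self._fired = False
            return None
        if (target is not None and not self._fired
                and timestamp_s - self._start_s > self.threshold_s):
            self._fired = True   # do not re-trigger while gaze stays put
            return target
        return None


detector = DwellDetector(threshold_s=0.5)   # threshold value is illustrative
detector.update("hyperlink", 0.0)
detector.update("hyperlink", 0.3)
triggered = detector.update("hyperlink", 0.6)   # dwell exceeded the threshold
```

Firing only once per dwell is a design choice of this sketch; it avoids repeating the operation on every later sample while the gaze stays on the same target.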

The prior biometric input enrollment process provides first input enrollment information to computer system 101 to resolve the hand gestures and gaze inputs of user 7002 such that computer system 101 (1) maps inputs from user 7002 to point to corresponding locations in the virtual three-dimensional environment and (2) interprets the hand gestures or gaze inputs of user 7002 as corresponding to particular operations (e.g., tap inputs, pinch and drag inputs, gaze inputs of a particular duration) that interact with control elements in application user interface 10000.

In some embodiments, the prior biometric input enrollment process is initiated when computer system 101 is first used, or when computer system 101 is reinitialized after a software update of computer system 101.

When the first input enrollment information collected from the previous biometric input enrollment process introduces an error (e.g., a calibration error) such that, when the user 7002 interacts with the application user interface 10000, the user's gaze input or hand gesture is not properly interpreted by the computer system 101 (e.g., an unwanted offset (e.g., lateral offset, vertical offset, or intermediate offset) in a tap input causes a user interface control element (e.g., a hyperlink) to the left or right of (or above or below) the intended target to be triggered, a pinch and drag input is not registered as a pinch or a drag, or a gaze input detected with an unwanted offset causes an operation to be performed on a portion of the application user interface 10000 that is different from the intended portion), further user-machine interaction via hand gestures and/or gaze input becomes difficult and frustrating.
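The effect of such a calibration offset can be illustrated with a small hit-testing sketch. This Python example is hypothetical throughout: the `Control` type, the `hit_test` function, the control layout, and the 15-unit lateral error are invented for illustration and are not taken from the embodiments.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Control:
    """Hypothetical axis-aligned control element (e.g., a hyperlink)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def hit_test(controls: List[Control], intended: Tuple[float, float],
             calibration_error: Tuple[float, float] = (0.0, 0.0)) -> Optional[str]:
    """Return the control the system resolves for an input, given the point
    the user intended and the (assumed) offset introduced by bad enrollment."""
    px = intended[0] + calibration_error[0]
    py = intended[1] + calibration_error[1]
    for control in controls:
        if control.contains(px, py):
            return control.name
    return None


links = [Control("link_a", 0, 0, 100, 20), Control("link_b", 100, 0, 100, 20)]
accurate = hit_test(links, (90, 10))                              # intended target
offset = hit_test(links, (90, 10), calibration_error=(15, 0))     # neighbor target
```

With accurate enrollment the input resolves to the intended control, while the lateral error shifts the resolved point onto the neighboring control, which is the failure mode described above.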

In some embodiments, the first input registration information includes information about a first type of input (e.g., a user's gaze input or hand gesture) that is determined based on a location and/or movement of a first biometric feature of the user 7002 (e.g., a location and/or movement of an eye, pupil, face, head, body, arm, hand, finger, leg, foot, toe, or other biometric feature of the user 7002).

In some embodiments, the error from the biometric input enrollment process is due to an impairment that alters the appearance or other characteristics of the user's finger, wrist, arm, or eye (e.g., due to infection or a change in contact lens type/color) or that alters the user's voice (e.g., due to illness).

Instead of using inaccurately calibrated user inputs (e.g., gaze or hand inputs) to trigger a reset process for collecting new biometric input enrollment information for those same inputs, the user 7002 can initiate a new biometric input enrollment process by providing a user input to an input device (e.g., a button, dial, rotatable input element, switch, movable component, solid state component, or touch-sensitive surface (e.g., a device that detects localized sensor inputs such as intensity or force sensor inputs, which the computer system uses to trigger corresponding operations and optionally provide tactile feedback corresponding to the detected inputs)) that has a different modality than the inputs the user 7002 wants to reset. For example, allowing a biometric input enrollment reset for eye/gaze or hand gestures using a different type of input (e.g., pressure/touch) on a different input device (e.g., a hardware button such as button 7508) allows an input of a first modality (e.g., tactile touch/mechanical actuation) to reset the calibration of inputs of a different modality (e.g., gaze or hand gestures). A more reliable input mode (e.g., tactile touch/mechanical actuation on hardware/solid state buttons) that does not require calibration may be used to initiate calibration correction in another modality (e.g., gaze/eye tracking), which improves the reliability and operational efficiency of computer system 101.

According to some embodiments, table 1 below describes the behavior of computer system 101 in response to different operations on buttons 7508 (e.g., hardware buttons, solid state buttons).

For computer system 101 as a wearable device (e.g., a head-mounted device, a strapped-on device, a watch), when the wearable device is turned on and worn on the body of user 7002, in response to detecting four consecutive presses of button 7508 (e.g., and/or button 701, button 702, and/or digital crown 703 and/or 7108 of HMD 7100a) (e.g., hardware buttons, solid state buttons, or other hardware input elements) within a preset period of time (e.g., less than 7 seconds, less than 5 seconds, less than 3 seconds), a biometric input enrollment process is triggered and user interface 10004 is displayed as shown in fig. 10B (e.g., fig. 10B1, 10B2, and 10B3, where a user interface similar to that shown in fig. 10B1 is shown on HMD 7100a in fig. 10B2). In some embodiments, the wearable device is placed in a "store mode" (e.g., for display to shoppers in a store), and four consecutive presses within the preset period of time cause the device to reset the store presentation mode (e.g., a mode in which the computer system runs through different functionalities of the wearable device to present product features to customers). In some embodiments, the wearable device is placed in a "shipping mode" (e.g., for shipping to a customer for use outside of a store), and four consecutive presses within the preset period of time cause the device to reset the biometric input enrollment.
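The multi-press condition described above can be illustrated with a minimal sketch. The class name `MultiPressDetector`, its parameters, and the use of timestamps in seconds are assumptions made for illustration; the description does not specify how presses are counted, and the default window is only one of the example values given above.

```python
from collections import deque


class MultiPressDetector:
    """Hypothetical helper: reports when `count` presses fall within `window_s` seconds."""

    def __init__(self, count=4, window_s=3.0):
        self.count = count          # number of consecutive presses required
        self.window_s = window_s    # preset period (example value, an assumption)
        self._times = deque()

    def press(self, t):
        """Record a press at time t (seconds). Return True if it completes the sequence."""
        self._times.append(t)
        # Discard presses that are too old relative to the newest press.
        while self._times and t - self._times[0] >= self.window_s:
            self._times.popleft()
        if len(self._times) >= self.count:
            self._times.clear()     # start fresh so a further press does not retrigger
            return True
        return False


detector = MultiPressDetector(count=4, window_s=3.0)
results = [detector.press(t) for t in (0.0, 0.5, 1.0, 1.5)]
print(results)
```

In this sketch, a `True` result would correspond to the point at which the enrollment reset prompt (user interface 10004) is displayed.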

In addition to triggering the biometric input enrollment reset, inputs to button 7508 cause the wearable device to perform various other operations. When the wearable device is turned on and worn on the body of the user 7002, a capture toggle of the wearable device is triggered in response to detecting a single press input to the button 7508. For example, in some embodiments, a single press causes the wearable device to begin video recording. A second press after the wearable device has started video recording then causes the wearable device to stop recording. In some embodiments, the video recording is a recording of a three-dimensional virtual environment displayed by a display generation component of the wearable device. In some embodiments, the video recording is a recording of a three-dimensional extended reality (XR) environment that includes both computer-generated content and a passthrough portion of the physical environment of the wearable device, both of which are visible via the display generation component of the wearable device. In some embodiments, the video recording is a recording of only the physical environment of the wearable device.

In some implementations, a single press switches the video recording mode off or on. For example, a first single press switches the video recording mode on. When a press and hold input is detected on button 7508, video is captured while button 7508 is held. The second single press switches the video recording mode off and the detected press and hold input on button 7508 does not cause the wearable device to record any video.
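The toggle-then-hold behavior in this implementation can be sketched as a small state machine. The class and method names below are hypothetical and only illustrate the described behavior: a single press flips the recording mode, and a press-and-hold captures video only while the mode is on.

```python
class CaptureControl:
    """Hypothetical model of the recording-mode toggle on button 7508."""

    def __init__(self):
        self.mode_on = False     # video recording mode, off by default (assumption)
        self.recording = False   # whether video is currently being captured

    def single_press(self):
        """A single press switches the video recording mode on or off."""
        self.mode_on = not self.mode_on
        if not self.mode_on:
            self.recording = False

    def hold_start(self):
        """A press-and-hold records only while the recording mode is on."""
        self.recording = self.mode_on
        return self.recording

    def hold_end(self):
        """Releasing the button stops any capture in progress."""
        self.recording = False


control = CaptureControl()
control.single_press()           # first single press: mode on
print(control.hold_start())      # hold now captures video
control.hold_end()
control.single_press()           # second single press: mode off
print(control.hold_start())      # hold no longer captures video
```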

In some implementations, a pressing input to button 7508 causes the wearable device to capture still media and/or video media (e.g., capture media rendered visible via the display generation component).

In some embodiments, when the wearable device is turned on and worn on the body of the user 7002, a screenshot of the display generated by the display generation component is captured in response to a press and hold input on button 7508. For example, a screenshot is captured when button 7508 receives a press and hold input.

When the wearable device is turned on and worn on the body of the user 7002, in response to detecting two press inputs to the button 7508 within a preset period of time (e.g., less than 3 seconds, less than 2 seconds, less than 1 second), and in accordance with a determination that a transaction (e.g., a purchase transaction, a funds transfer transaction, or a payment transaction) is active (e.g., ongoing or in a current session) on the wearable device, a payment confirmation for the transaction is activated (e.g., a visual indication confirming a payment process for the transaction is displayed).

In some embodiments, button 7508 is also used to turn the wearable device on or off. When the wearable device is turned off, a press input on button 7508 that is held (e.g., persists) for a preset time duration (e.g., about 2 seconds or about 5 seconds) causes the wearable device to turn on.

In some embodiments, when the wearable device is turned on, but not worn on the body of the user 7002, the wearable device transitions from the sleep state to the standby state in response to detecting a press input to the button 7508, as explained in more detail in fig. 12A-12G. Thus, button 7508 is also used to wake the wearable device from a sleep mode (or state) to a standby mode.
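The single-button behaviors described above depend on the device state and on the kind of press. The following sketch collects them into one dispatch function. The function name, the gesture labels, and the returned action strings are all hypothetical, and cases the description does not address return `None`. Because the description presents several of these behaviors as alternative embodiments, this is only one possible combination.

```python
def button_7508_action(powered_on, worn, gesture, transaction_active=False):
    """Hypothetical dispatch for button 7508.

    gesture is one of "press", "double_press", "quadruple_press", "long_hold"
    (labels invented for this sketch).
    """
    if not powered_on:
        # A press held for a preset duration turns the device on.
        return "power_on" if gesture == "long_hold" else None
    if not worn:
        # Turned on but not worn: a press wakes the device from sleep to standby.
        return "sleep_to_standby" if gesture == "press" else None
    if gesture == "quadruple_press":
        return "biometric_enrollment_reset_prompt"
    if gesture == "double_press":
        # Payment confirmation applies only while a transaction is active.
        return "payment_confirmation" if transaction_active else None
    if gesture == "press":
        return "toggle_capture"
    return None  # behavior not specified in the description


print(button_7508_action(True, True, "double_press", transaction_active=True))
print(button_7508_action(True, False, "press"))
```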

Providing a dedicated button 7508 (e.g., a solid state button or hardware button) for resetting other types of user inputs (e.g., hand tracking or gaze tracking) allows the user 7002 to trigger an input registration reset more quickly and responsively (e.g., while using any software application). Rather than spending time closing an application and/or navigating to a particular user interface control element using inaccurately calibrated biometric inputs, the user 7002 can use physical button 7508 (e.g., a hardware or solid state button) to quickly trigger a user input enrollment reset for a first type of input, which increases the operational efficiency of computer system 101.

Fig. 10B illustrates a user interface 10004 for a biometric input enrollment process. The user interface 10004 provides a confirmation prompt asking the user 7002 whether to reset biometric input enrollment. In some embodiments, user interface 10004 is displayed in a central portion of the field of view of user 7002, e.g., so as to overlay other displayed content (e.g., computer-generated content such as box 7016, which is not a representation of a physical box in physical environment 7000; or a passthrough portion of physical environment 7000 of the wearable device). In some implementations, the user interface 10004 overlays other displayed content by being displayed closer to the user 7002, from the perspective of the user 7002 and within the field of view of the user 7002, than the other displayed content.

In response to the user 7002 confirming on the user interface 10004 that the biometric input enrollment should be reset (e.g., by directing user input (tap input, gaze input) to a "yes" user interface control element), the one or more display generation components display a biometric input enrollment experience to the user 7002 as shown in fig. 10C.

In response to the user 7002 indicating that biometric input enrollment should not be reset (e.g., by directing user input (tap input, gaze input) to a "no" user interface control element in the user interface 10004), the user interface 10004 is canceled and the user 7002 is able to continue navigating in the virtual 3D environment using biometric data collected from previous biometric input enrollment processes (e.g., the user 7002 will not be provided with a biometric input enrollment experience). For example, the user 7002 may have changed her mind after providing a pressing input to the button 7508, or the user 7002 may have made an unintended pressing input that results in an unintended display of the user interface 10004.

Fig. 10C illustrates an example of a biometric input enrollment process according to some embodiments. During the biometric input enrollment process, the user interface element 10006 presents instructions to the user 7002 for completing the biometric input enrollment process for a first type of input (e.g., gaze of the user 7002 or hand tracking of the user 7002). For example, the instructions include a request for the user 7002 to slowly rotate her head. Images captured while the user 7002 slowly rotates her head may be used to calibrate various characteristics of the eyes of the user 7002, e.g., displacement of the eyes of the user 7002 as a function of the rotation of the head of the user 7002. The instructions may include requiring the user 7002 to rotate her head in a particular direction, to look at a displayed virtual object (e.g., while the virtual object is being moved or is displayed at a fixed location), to move her hand to various locations, and/or to perform different hand gestures. Based on the user actions performed according to the presented instructions, second input registration information (new input registration information) for the first type of input is collected.

For example, when the gaze registration of the user 7002 is to be reset, second input enrollment information is derived from a first biometric feature extracted from data collected by one or more input devices. In some embodiments, the one or more input devices for obtaining the second input registration information include a camera 10010 integrated on the wearable device. In some embodiments, the one or more input devices may be devices that are physically separate from the wearable device (e.g., beacons or scanners located in the physical environment of the wearable device).

In some implementations, while the second input registration information is being collected, the user interface element 10008 displays a progress indicator for the input registration process (e.g., a status bar, or a circular arrangement of tick marks in which the highlighted tick marks indicate the progress of the input registration process). In some implementations, a visual indication of the second input registration information being collected is provided to the user (e.g., an image of the face of the user 7002, an image of the hand of the user 7002 as the head of the user 7002 rotates).

Examples of first biometric features include a location and/or movement of a user's eye, a determination and/or calibration of an inter-pupillary distance of the user 7002, a size of an iris of the user 7002, and/or a range of angular movement of the user's 7002 eye. In some embodiments, the one or more input devices include a camera (optical/visible spectrum RGB camera, infrared camera, or thermal imaging camera) that captures a two-dimensional image of the biometric feature of the user 7002.

For example, when enrollment of the hand movements (e.g., hand gestures and/or other tracking of the hand movements) of the user 7002 is to be reset, second input enrollment information is derived from the first biometric feature, which is extracted from data collected by one or more input devices. Examples of first biometric features include positioning and/or movement of one or more portions of the user's 7002 hand (e.g., determining and/or calibrating the size of the user's 7002 hand, the range of motion of the user's 7002 hand or wrist, the length of different joints in the user's 7002 hand, and/or the range of motion of different joints in the hand). In some embodiments, the one or more input devices include a camera (optical/visible spectrum RGB camera, infrared camera, or thermal imaging camera) that captures a two-dimensional image of the biometric feature of the user 7002.

In some embodiments, statistics and second input registration information from a previous input registration process are extracted and a weighted average of all collected input registration information is used to calibrate the first type of input.
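A weighted average of this kind can be written out as a short sketch. The function name, the representation of each enrollment result as a single number, and the example weights are assumptions; the description does not state how the weights are chosen or what quantities are averaged.

```python
def blend_calibration(samples):
    """Hypothetical weighted average over (value, weight) pairs.

    Each pair holds one calibration quantity from an enrollment pass
    (e.g., a gaze offset in degrees) and the weight given to that pass.
    """
    total_weight = sum(weight for _, weight in samples)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(value * weight for value, weight in samples) / total_weight


# Example: an earlier pass measured an offset of 2.0 (weight 1.0) and the
# new pass measured 0.5 (weight 3.0), so the new pass dominates the result.
print(blend_calibration([(2.0, 1.0), (0.5, 3.0)]))  # 0.875
```

Giving the most recent pass a larger weight is one plausible design choice here, since the reset is triggered precisely because the earlier calibration is suspected to be inaccurate.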

When the second input registration information has been collected, the wearable device provides an indication to the user 7002 that the input registration has been successfully reset (e.g., a visual indication 10012 provided by the display generation component, an audio indication provided by the audio system of the wearable device, and/or a tactile indication provided by the wearable device).

Performing the new operation based at least in part on the second input registration information of the first type of input allows the new operation to be performed using the first type of input that is better calibrated, updated, and/or improved, thereby improving the operational efficiency of user-machine interactions based on the first type of input.

Allowing input registration resets for hand tracking using different types of inputs (e.g., pressure/touch) on different input devices (e.g., hardware/solid state buttons) allows inputs of a first modality (e.g., tactile touch/mechanical actuation) to reset calibration for different modalities (e.g., hand tracking, visual hand tracking, infrared hand tracking). A more reliable mode (e.g., tactile touch/mechanical actuation on hardware/solid state buttons) that does not require calibration may be used to initialize calibration correction in one modality (hand tracking), which improves reliability and operational efficiency of computer system 101. Instead of having the user 7002 use a first type of input (e.g., gaze input, hand gestures) to navigate through user interface elements (e.g., menus or other control elements) in order to reset input registration for the first type of input, using a second type of input (e.g., press input to button 7508) to initialize input registration improves operational efficiency, reduces user frustration, and reduces the number of inputs required to initialize the input registration reset process. Resetting the input registration using the second type of input also helps to reduce the amount of time required to begin the input registration reset process. For example, using the second type of input enables an input registration reset to be initialized without displaying additional controls (e.g., using the first type of input to browse user interface elements).

According to some embodiments, table 2 below describes the behavior of computer system 101 in response to different inputs to button 7508 (e.g., hardware button, solid state button) and the second input device (e.g., a second button). In some embodiments, the input to button 7508 is concurrent with or overlaps with the input to the second input device, and the input to button 7508 is detected in conjunction with the input to the second input device. In some embodiments, which system operation(s) are performed depends on the duration and pattern of the inputs to button 7508 and the second input device. The concurrently detected inputs are sometimes referred to as chorded inputs.

Using button 7508 in conjunction with one or more other input devices as a chorded input to perform other system operations (e.g., operations not specific to a particular application) allows for various system operations to be performed without displaying additional controls. Furthermore, the combined use of more than one input device to achieve system operation (e.g., application-specific operation) reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may achieve M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device; instead, the processor may be programmed to interpret chorded inputs from a smaller number of input devices.

According to some embodiments, table 2 below describes the behavior of computer system 101 in response to different joint operations on a first input device (e.g., button 7508) and a second input device (e.g., hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element)).

For computer system 101 as a wearable device (e.g., a head-mounted device, a strapped device, a watch), the wearable device captures a screenshot of a display provided by the display generation component in response to detecting a press input on both the button 7508 and the second input device (e.g., hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element)) and a concurrent release of the press input on both the button 7508 and the second input device when the wearable device is turned on and worn on the body of the user 7002.

When the wearable device is turned on and worn on the body of the user 7002, in response to detecting a press input on both the button 7508 and the second input device (e.g., the second button) and maintaining a first input for both the button 7508 and the second input device, the wearable device captures a screenshot of the display provided by the display generation component. In some embodiments, the screen shot is a snapshot of the three-dimensional virtual environment displayed by the display generation component of the wearable device. In some embodiments, the screen shots are snapshots of a three-dimensional mixed reality environment that includes both computer-generated content and a passthrough portion of the physical environment of the wearable device, which are visible via a display generating component of the wearable device. In some embodiments, the screen shots are merely snapshots of the physical environment of the wearable device.

When the wearable device is turned on and worn on the body of the user 7002, the combined input powers down the wearable device in response to detecting a press input on both the button 7508 and the second input device (e.g., the second button) and detecting that the two press inputs are held for a first time threshold (e.g., longer than 2 seconds, longer than 5 seconds, longer than 10 seconds).

When the wearable device is turned on and worn on the body of the user 7002, in response to detecting a press input on both the button 7508 and the second input device (e.g., hardware input element 7108 (e.g., button, crown, or rotatable and pressable input element)) and detecting that the two press inputs are held for a second time threshold (e.g., longer than 3 seconds, longer than 7 seconds, longer than 14 seconds) that is longer than the first time threshold, the combined input forces the wearable device to restart (e.g., the wearable device shuts down and reinitializes the startup procedure, all applications previously running are closed before the press input is detected).

In response to detecting a press input on both the button 7508 and the second input device (e.g., hardware input element 7108 (e.g., button, crown, or depressible input element)) and detecting a release of the press input from the button 7508, while the press input is still applied to the second input device, the combined input activates a Device Firmware Update (DFU) mode. In some implementations, the DFU mode allows the wearable device to partially or fully update firmware on the device. In some implementations, the DFU is an alternative boot mode for the system, which is similar to the recovery mode. In at least some such embodiments, the same combined input activates the DFU mode when the wearable device is turned on and worn by the user 7002, and when the wearable device is turned off and connected to the power cable.
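The combined-input behaviors described above can be summarized in a small classifier. This is a sketch under stated assumptions: the function name, the argument names, and the default thresholds (taken from among the example values above) are hypothetical, and the classification is made from the total hold duration rather than at the moment a threshold is crossed, which a real device might do instead.

```python
def classify_chord(hold_s, button_7508_released_first,
                   power_off_after_s=5.0, restart_after_s=7.0):
    """Hypothetical mapping from a two-button press to a system operation.

    hold_s: how long both press inputs were held, in seconds.
    button_7508_released_first: True if button 7508 was released while the
        second input device was still pressed.
    """
    if restart_after_s <= power_off_after_s:
        raise ValueError("second time threshold must exceed the first")
    if button_7508_released_first:
        return "dfu_mode"        # release of 7508 with the second input still held
    if hold_s >= restart_after_s:
        return "force_restart"   # held past the longer, second threshold
    if hold_s >= power_off_after_s:
        return "power_off"       # held past the first threshold
    return "screenshot"          # short press with concurrent release


for args in [(0.2, False), (5.5, False), (8.0, False), (1.0, True)]:
    print(args, classify_chord(*args))
```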

Fig. 11A-11F illustrate examples of adjusting immersion levels of an extended reality (XR) experience of a user in a three-dimensional environment. Fig. 17 is a flow diagram of an example method 1700 for adjusting an immersion level of an XR experience of a user. The user interfaces in fig. 11A to 11F are used to illustrate the processes described below, including the process in fig. 17.

Fig. 11A shows an application user interface 11002 displayed in a three-dimensional environment 11000 that includes a representation 7014' of a physical object 7014 (e.g., a physical table), representations 7004' and 7006' of physical walls 7004 and 7006, respectively, and a representation 7008' of a physical floor 7008. In some implementations, both a computer-generated virtual object (e.g., box 7016) that does not correspond to any particular object in physical environment 7000 and representations of objects in physical environment 7000 are displayed to user 7002.

In some implementations, as shown in fig. 11A, the application user interface 11002 provides media content to the user 7002. For example, the application user interface 11002 is a video player including a playback control display. In some implementations, the physical environment 7000 is a spatially constrained space (e.g., an indoor space) surrounded by walls, physical objects, or other obstructions. Features of physical environment 7000 are visible in three-dimensional environment 11000 (e.g., an XR environment) displayed to user 7002 by the presence of representations 7004 'and 7006' of physical walls 7004 and 7006, representation 7008 'of physical floor 7008, and representation 7014' of physical object 7014 (e.g., a table), as shown in fig. 11A. In some embodiments, computer system 101 also displays to user 7002 computer-generated virtual content, such as box 7016, that has no correspondence in physical environment 7000.

As shown in fig. 11A, a first portion of the three-dimensional environment 11000 includes computer-generated virtual objects that are not present in the physical environment 7000, while a second portion of the three-dimensional environment 11000 includes representations of objects in the physical environment 7000 that are displayed as part of the three-dimensional environment 11000. Fig. 11A illustrates a first immersion level generated by the display generation component of computer system 101, in which the display of the XR environment simultaneously includes a passthrough portion of the physical environment 7000 of computer system 101, virtual content from application user interface 11002, and computer-generated virtual content, such as box 7016, that is different from the content provided by application user interface 11002.

In some embodiments, as the immersion level increases, the amount of input signal from the physical environment 7000 received by the user 7002 decreases. For example, reducing input from the physical environment 7000 by not displaying representations 7004', 7006', and 7008' to the user 7002 and/or displaying computer-generated virtual content providing visual input to the user 7002 to simulate a more spacious environment (e.g., open field, outdoor space, or external space) than provided by the physical environment 7000 increases the level of immersion provided to the user 7002 in the three-dimensional environment 11000. In some embodiments, increasing the level of immersion provided to the user 7002 reduces the impact of the physical environment 7000 on the user 7002.

The user 7002 may wish to be more fully immersed in the media content provided through the application user interface 11002. Without exiting the application user interface 11002, the user 7002 provides a rotational input to the rotatable input element 7108 (e.g., by rotating the rotatable input element 7108). In some implementations, the user 7002 provides a first rotational input to the rotatable input element 7108 by turning it in a first rotational direction (e.g., clockwise or counterclockwise). Rotation in the first rotational direction changes the level of immersion presented by the display generation component (e.g., increases the level of immersion or decreases the level of immersion). For example, in response to rotation in a first rotational direction that increases the immersion level, the passthrough portion of the physical environment 7000 is displayed with lower fidelity and/or less of the passthrough portion of the physical environment 7000 is displayed (e.g., some of the passthrough portion ceases to be displayed) than before the first rotational input. For example, the representation 7014' ceases to be displayed, as shown in fig. 11B (e.g., fig. 11B1, 11B2, and 11B3, where a user interface similar to that shown in fig. 11B1 is shown on HMD 7100a in fig. 11B2). Instead, computer-generated content, shown by dashed lines, is presented to the user 7002 as virtual content 11004.
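The relationship between rotation and immersion level can be sketched as follows. The normalized 0.0 to 1.0 immersion scale, the function name, and the `full_range_deg` parameter are assumptions introduced for illustration; the description only states that rotation in one direction increases the immersion level and rotation in the other decreases it.

```python
def apply_rotation(level, degrees, full_range_deg=360.0, increase_is_clockwise=True):
    """Hypothetical mapping from a rotational input to a new immersion level.

    level: current immersion level on an assumed scale of 0.0 (lowest) to 1.0 (highest).
    degrees: signed rotation, positive for clockwise.
    full_range_deg: rotation needed to sweep the whole scale (an assumption).
    """
    delta = degrees / full_range_deg
    if not increase_is_clockwise:
        delta = -delta
    # Clamp so that further rotation at either end of the scale has no effect.
    return min(1.0, max(0.0, level + delta))


print(apply_rotation(0.25, 90.0))    # quarter turn clockwise raises the level
print(apply_rotation(0.25, -90.0))   # quarter turn the other way lowers it
```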

In addition to increasing the immersion level during the media consumption experience (e.g., watching video or listening to music), the immersion level may also be increased to help the user 7002 focus on work. For example, increasing the immersion level reduces noise from the surrounding physical environment and/or presents content from a smaller number of applications (e.g., a single application, word processing application) to the user 7002 (e.g., notifications from other applications are blocked when the user 7002 is using a particular application at a high immersion level).

Increasing the proportion of virtual (e.g., computer-generated) content that is not a representation of the physical environment 7000 increases the level of immersion presented to the user 7002. Virtual content generally refers to content that is different from a representation of the physical world (e.g., physical environment 7000). For example, presenting a greater amount of computer-generated virtual content (such as box 7016) that has no correspondence in physical environment 7000 increases the level of immersion presented to user 7002. In some implementations, computer-generated content that was previously presented at a lower level of immersion continues to be displayed as the level of immersion increases (e.g., display of computer-generated content is maintained as the level of immersion increases).

In some implementations, the immersion level presented to the user 7002 is related to an associated size (e.g., magnitude) of a spatial range (e.g., angular range) of the field of view in which the computer-generated content is displayed. At lower immersion levels, the computer-generated virtual content is displayed within a smaller field of view. At higher immersion levels, the computer-generated virtual content is displayed within and covers a larger field of view.

For example, in fig. 11A, although box 7016 is computer-generated virtual content, it appears in a peripheral portion of the field of view of user 7002. In contrast, as shown in fig. 11B, in response to detecting a rotational input to the rotatable input element 7108 in a first direction, the passthrough portion showing the representation 7014' ceases to be displayed, and a first portion (e.g., a central portion) of the field of view of the user 7002 is replaced with the computer-generated virtual content 11004. In some implementations, the computer-generated virtual content occupies an expanding portion of the field of view of the user 7002, beginning in a central portion of the field of view of the user 7002.

In some embodiments, the central portion of the field of view of the user 7002 coincides with the middle region of the application user interface 11002. In some embodiments, the immersion level has an associated viewing angle, which is the angular size of the viewing cone within the field of view in which the computer-generated virtual content is displayed. A higher immersion level has a larger associated viewing angle, while a lower immersion level has a smaller associated viewing angle.

In some embodiments, the viewing angle is about ±10° for the immersion level shown in fig. 11B. For example, for a viewing angle of about ±10°, from the viewpoint of the user 7002, the computer-generated virtual content is displayed from about 10° to the left to about 10° to the right of an axis of the user 7002 (e.g., an axis located in the sagittal plane of the user 7002, e.g., an axis that projects perpendicularly forward from the frontal plane of the body of the user 7002), and from about 10° above to about 10° below the axis. In some implementations, the computer-generated virtual content occupies (e.g., fully occupies) the field of view of the user 7002 within the viewing angle. In some implementations, the computer-generated virtual content only partially occupies the field of view of the user 7002 within the viewing angle, but no input from the physical environment 7000 of the computer system 101 is provided in the field of view of the user 7002 within the viewing angle.
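The ±10° example can be expressed as a simple angular test. The function name and the use of azimuth and elevation angles measured from the user's forward axis are assumptions for this sketch, which treats the region as symmetric about that axis in both directions.

```python
def in_virtual_region(azimuth_deg, elevation_deg, half_angle_deg=10.0):
    """Hypothetical test of whether a viewing direction falls within the
    region where computer-generated virtual content is displayed.

    azimuth_deg: angle left (negative) or right (positive) of the forward axis.
    elevation_deg: angle below (negative) or above (positive) the forward axis.
    half_angle_deg: the viewing angle associated with the immersion level.
    """
    return abs(azimuth_deg) <= half_angle_deg and abs(elevation_deg) <= half_angle_deg


print(in_virtual_region(8.0, -9.0))         # inside the region at the ±10° level
print(in_virtual_region(12.0, 0.0))         # outside it, so passthrough may be shown
print(in_virtual_region(12.0, 0.0, 30.0))   # inside it at a higher immersion level
```

A higher immersion level corresponds to a larger `half_angle_deg`, so directions that show passthrough content at a lower level become covered by virtual content as the level increases.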

In some embodiments, reducing the immersion level involves changing the immersion level from an initial virtual reality (VR) environment, in which no passthrough portion of the physical environment of the computer system is displayed, to a first immersion level that includes a display of the XR environment. In some embodiments, the highest immersion level of the three-dimensional environment is a virtual reality environment in which no passthrough portion of the physical environment is provided.

The immersion level affects the perceived experience of the user by changing the properties of the mixed reality three-dimensional environment. Changing the immersion level changes the prominence of the virtual content relative to content (visual and/or audio) from the physical world. For the audio component, for example, increasing the immersion level includes increasing noise cancellation, increasing the spatiality of spatial audio associated with the XR environment (e.g., by moving the audio sources to more points around the user or increasing the number and/or volume of point sources of audio), and/or increasing the volume of audio associated with the virtual environment. In some embodiments, increasing the immersion level changes the degree to which the mixed reality environment reduces (or eliminates) signals from the physical world (e.g., audio and/or visual passthrough of a portion of the physical environment of the computer system) presented to the user. For example, increasing the immersion level includes increasing the proportion of the field of view in which the virtual content is displayed, or decreasing the prominence of the representation of the real world (e.g., physical environment 7000) by dimming, fading, or reducing the amount of the representation of the real world displayed to the user.

Changing the immersion level may also include changing a visual presentation of the mixed reality environment, including the extent of the field of view and the extent to which the visibility of the external physical environment is reduced. Changing the immersion level may include changing the number or degree of sensory modalities that the user may use to interact with the mixed reality three-dimensional environment (e.g., through the user's voice, gaze, and body movements). Changing the immersion level may also include changing the degree to which the mixed reality environment simulates the fidelity and resolution of the desired environment. Changing the immersion level may also include modifying the point of view of the mixed reality environment to a degree that matches the point of view or perspective of the user, for example, by capturing movement of the user and adjusting the portion of the three-dimensional environment that is within the field of view in time.

At the immersion level shown in fig. 11B, some visual input from the physical environment 7000 is still provided to the user 7002. In some embodiments, representations 7004', 7006', and 7008' are provided with lower fidelity (e.g., less sharply focused, with lower contrast, or more monochromatic) at the immersion level shown in fig. 11B than in fig. 11A. Thus, while input from the physical environment 7000 is still provided to the user 7002, the visibility of the external physical environment is reduced at higher immersion levels compared to lower immersion levels.

In addition to changing the level of immersion presented to the user 7002, the rotatable input mechanism 7108 can also receive one or more press inputs that cause a computer system (e.g., a wearable device) to perform various operations, as described in table 3 below.

Using a single input device (e.g., rotatable input mechanism 7108) that accepts two (or more) different types of inputs (e.g., rotational input as a first type of input and/or press input as a second type of input) reduces the number of different input devices that must be provided to request or indicate that different functionalities are performed. Reducing the number of input devices that must be provided reduces the manufacturing cost of the computer system and reduces the number of components in the computer system that may fail. Reducing the number of components reduces the cost and complexity of manufacturing the computer system and improves the reliability of the computer system. Reducing the number of input devices also reduces physical clutter on computer system 101, freeing up more physical space on computer system 101 and helping to prevent accidental input from inadvertent contact.

According to some embodiments, table 3 below describes the behavior of computer system 101 in response to different operations on hardware input elements 7108 (e.g., buttons, crowns, or depressible input elements) (e.g., hardware buttons, solid state buttons).

Input	Device on (worn)	Device on (not worn)	Device off
1 press	Show the main menu UI, provide passthrough, or exit full screen	Standby (temporary tracking state)	N/A
2 presses	Forced exit menu	N/A	N/A
3 presses	Accessibility mode	N/A	N/A
Rotation	Change immersion level	N/A	N/A
Press and hold	Re-center (fade out and fade in)	N/A	N/A

For computer system 101 as a wearable device (e.g., a head-mounted device, a strapping device, or a watch), the level of immersion presented by the wearable device changes in the manner described above with reference to fig. 11A and 11B in response to detecting a rotational input to rotatable input element 7108 (e.g., a bi-directional rotatable input element) when the wearable device is turned on and worn on the body of user 7002. For example, when a rotational input in a first rotational direction is provided to the rotatable input element 7108, the immersion level increases. As another example, when the wearable device is turned on, but not worn on the body of the user (e.g., user 7002), no operation is triggered (e.g., the wearable device remains in a sleep state) in response to detecting a rotational input to the rotatable input element 7108.

For computer system 101 as a wearable device (e.g., a head-mounted device, a strapping device, or a watch), when the wearable device is turned on and worn on the body of user 7002, in response to detecting a single press input to rotatable input element 7108, a main menu user interface is presented to user 7002, as described with reference to fig. 7A-7O, or an application user interface in an immersive or full screen display mode exits full screen or immersive display mode when a single press input is detected, as described with reference to fig. 8A-8G. Accordingly, the wearable device is configured to perform different operations depending on the type of user input (e.g., a press input or a rotation input) provided to the rotatable input element 7108.

For the computer system 101 as a wearable device (e.g., a head-mounted device, a strapping device, or a watch), when the wearable device is turned on but not worn on the body of the user 7002, the wearable device transitions from a sleep state to a standby state in response to detecting a single press input to the rotatable input element 7108, as described with reference to fig. 12A to 12G.

For computer system 101 as a wearable device (e.g., a head-mounted device, a strapped device, or a watch), when the wearable device is turned on and worn on the body of user 7002, an operating system menu (e.g., a forced exit menu) is presented to user 7002 in response to detecting two press inputs to rotatable input element 7108 within a preset time interval (e.g., less than 3 seconds, less than 2 seconds, less than 1 second), as described with reference to fig. 7O.

For computer system 101 as a wearable device (e.g., a head-mounted device, a strapped device, or a watch), when the wearable device is turned on and worn on the body of user 7002, the user 7002 is presented with the option of entering the accessibility mode in response to detecting three press inputs to rotatable input element 7108 within a preset time interval (e.g., less than 5 seconds, less than 3 seconds, less than 2 seconds).

In some embodiments, the rotatable input mechanism 7108 also serves to re-center the field of view of the user 7002. For example, instead of keeping the central portion of the field of view of the user 7002 aligned with the middle portion of the application user interface 11002, the user 7002 re-centers the central portion of her field of view to a different location in the three-dimensional environment 11000. For example, the new center of the field of view of user 7002 corresponds to a point along the intersection of representations 7004' and 7006'. In some implementations, re-centering the field of view includes fading out computer-generated virtual content at the previous center position of the field of view of the user 7002 and then fading in computer-generated virtual content in the region at the newly defined center position of the field of view of the user 7002. Optionally, the virtual content is presented with higher fidelity or displayed with higher contrast than before the user's field of view was re-centered. When the wearable device is turned on and worn on the body of the user 7002, a press input that is maintained (e.g., persists) on the rotatable input mechanism 7108 for a preset duration (e.g., about 2 seconds or about 5 seconds) causes the wearable device to begin the above-described re-centering operation. While maintaining the press input, the user 7002 may rotate or move her head so that the central portion of her field of view is repositioned to a new location. Once a new location is selected, release of the press input re-centers the central portion of the field of view of the user 7002 to the new location.

As shown in table 3, different numbers of inputs (e.g., separate or sequential inputs) of a type different from the rotational input (e.g., press inputs) cause computer system 101 to perform different operations. For example, in some implementations, for a single press input, (1) a main menu user interface is displayed, (2) a passthrough portion of the physical environment is provided, or (3) the application exits full screen mode. For two press inputs provided in close succession (e.g., within 3 seconds, within 2 seconds, within 1 second), a forced exit menu is displayed. For three press inputs provided in close succession (e.g., within 5 seconds, within 3 seconds, within 2 seconds), the accessibility mode is activated, or an option is provided to activate the accessibility mode.
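
The mapping in table 3 can be sketched as a simple lookup keyed on the kind of input and the state of the device. This is only an illustrative sketch: the names `DeviceState`, `CrownInput`, `TABLE_3`, and `operation_for` are hypothetical and do not appear in the description, and the operation strings merely paraphrase the table entries.

```python
from enum import Enum
from typing import Optional


class DeviceState(Enum):
    ON_WORN = "on, worn"
    ON_NOT_WORN = "on, not worn"
    OFF = "off"


class CrownInput(Enum):
    SINGLE_PRESS = "1 press"
    DOUBLE_PRESS = "2 presses"
    TRIPLE_PRESS = "3 presses"
    ROTATION = "rotation"
    PRESS_AND_HOLD = "press and hold"


# Entries of table 3 that name an operation. Every combination that the
# table marks N/A is simply absent from the dictionary.
TABLE_3 = {
    (CrownInput.SINGLE_PRESS, DeviceState.ON_WORN):
        "show main menu UI, provide passthrough, or exit full screen",
    (CrownInput.SINGLE_PRESS, DeviceState.ON_NOT_WORN):
        "enter standby (temporary tracking state)",
    (CrownInput.DOUBLE_PRESS, DeviceState.ON_WORN): "show forced exit menu",
    (CrownInput.TRIPLE_PRESS, DeviceState.ON_WORN): "accessibility mode",
    (CrownInput.ROTATION, DeviceState.ON_WORN): "change immersion level",
    (CrownInput.PRESS_AND_HOLD, DeviceState.ON_WORN): "re-center field of view",
}


def operation_for(user_input: CrownInput, state: DeviceState) -> Optional[str]:
    """Return the operation named in table 3, or None where the table says N/A."""
    return TABLE_3.get((user_input, state))
```

A lookup keyed on both the input and the device state keeps the N/A cells implicit, so a new row or column can be added without touching the dispatch logic.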

Using the number of press inputs to determine which operation(s) to perform reduces the number of different input devices that must be provided to accomplish different tasks. Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device; instead, the processor may be programmed to interpret more types of input from a particular input device (e.g., based on the number of press inputs).

The use of a rotary input mechanism allows a user to easily provide an input range, which may be a continuous range or a range covering a series of discrete steps or values, and the bi-directionality of the rotary input mechanism allows the input to be easily and intuitively changed in either direction without having to display additional controls to the user. The same rotational input mechanism 7108 can receive a second type of input (e.g., a press input) requesting or indicating to perform a discrete function (e.g., cancel or display a user interface object). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. The use of a rotational input mechanism provides direct access to changes in immersion levels and execution of different operations, reducing the amount of time required to achieve a particular result (e.g., the user does not have to navigate a menu or visually displayed control elements to make selections for executing the operations and/or changing the immersion levels), thereby improving the operating efficiency of the computer system.

Increasing the immersion level may help alleviate constraints imposed by the physical environment of the computer system. For example, a more spacious virtual or XR environment may be realistically simulated by blocking sensory inputs from the physical environment (e.g., blocking visual input of a small or confined room, or removing audio echoes from a small physical space) to provide a virtual environment that is more conducive to the user interacting with applications in the three-dimensional environment 11000.

The user 7002 further increases the immersion level from the immersion level shown in fig. 11B by applying a further rotational input in the same direction (e.g., first rotational direction) as that applied to increase the immersion level from the immersion level shown in fig. 11A to the immersion level shown in fig. 11B. In response to the second rotational input, the immersion level associated with the display of the XR environment generated by the display generation component shown in fig. 11B is increased to a second immersion level by displaying additional virtual content 11006 to user 7002, as shown in fig. 11C.

For the immersion level shown in fig. 11C, virtual content 11006, which is distinct from the passthrough portion of the physical environment 7000, is displayed in the field of view of the user 7002 with a viewing angle of about ±40°. For example, for a viewing angle of about ±40° from the viewpoint of the user 7002, computer-generated virtual content is displayed from about 40° to the left to about 40° to the right of an axis of the user 7002 (e.g., the sagittal axis of the user 7002), and from about 40° above to about 40° below the axis. In some implementations, the computer-generated virtual content occupies (e.g., fully occupies) the field of view within the viewing angle. As shown in fig. 11C, the three-dimensional environment 11000 does not include any representation of the physical environment 7000. In some embodiments, the computer-generated virtual content partially occupies the field of view within the viewing angle, and no input from the physical environment 7000 of the computer system 101 is provided in the field of view within the viewing angle.

In addition to virtual content that substantially and/or continuously fills or occupies the field of view of user 7002 starting from a central portion of the field of view of user 7002, additional discrete virtual content may be provided at a more peripheral region of the field of view of user 7002. For example, a second box 11008 with ribbon 11010 is placed on top of box 7016 at a second immersion level illustrated in fig. 11C. In this example, none of the box 7016, the second box 11008, or the ribbon 11010 have any correspondence in the physical environment 7000.

In some implementations, the virtual content is displayed at different levels of fidelity at different immersion levels. For example, the same virtual content is displayed at a higher fidelity (e.g., at a sharper contrast, at a higher resolution, and/or in a more realistic representation) at the second immersion level shown in fig. 11C than at the immersion level shown in fig. 11B. Likewise, at immersion levels lower than that shown in fig. 11B, the same virtual content is displayed with lower fidelity (e.g., with lower contrast, with lower resolution, and/or with a less realistic representation, or more blended into the display of the XR environment). In some embodiments, the computer-generated virtual content ceases to be displayed at an immersion level lower than that shown in fig. 11B (not shown).

In some embodiments, at the highest level of immersion, the viewing angle of the computer-generated virtual content displayed to the user 7002 may be ±90°, covering a full angular range of 180° (e.g., spanning from a position at the left shoulder of the user 7002 to a position at the right shoulder of the user 7002). As the user 7002 rotates her head, a newly oriented angular range of 180° is provided at the new position of her head, effectively providing a 360° range of viewing angles (e.g., providing a visual experience similar to that of a planetarium). Similarly, audio sources are positioned at suitable locations (e.g., simulated locations in the XR environment) to simulate sound sources that match the visual viewing angle of the experience at the corresponding immersion level.

The use of a rotary input mechanism allows a user to provide a continuous or discrete input range as described above and observe direct visual changes in the XR environment in response to rotary input using the rotary input mechanism without having to display additional controls to the user. The use of a rotational input mechanism provides direct access to changes in immersion levels and execution of different operations, reducing the amount of time required to achieve a particular result (e.g., the user does not have to navigate a menu or visually displayed control elements to make a selection for changing immersion levels), thereby improving the operating efficiency of the computer system.

When the user 7002 provides a third rotational input to the rotatable input element 7108 in a rotational direction opposite to the rotational direction provided in the second rotational input, the immersion level of the three-dimensional environment 11000 decreases from the second immersion level shown in fig. 11C to the lower immersion level shown in fig. 11B. When computer system 101 is displaying three-dimensional environment 11000 with the level of immersion shown in fig. 11B to user 7002, providing a fourth rotational input to rotatable input element 7108 in the same direction as the third rotational input further reduces the level of immersion to the level of immersion shown in fig. 11A, where the passthrough portion is again provided to user 7002.

A rotatable input element configured to receive bi-directional input allows a user to be presented with a change in immersion level based on a direction of rotation of the rotational input without having to display additional controls to the user. For example, in accordance with a determination that the first input is a rotational input in a first direction, the level of immersion presented to the user 7002 is increased. Conversely, in accordance with a determination that the first input is a rotational input in a second direction that is different from (e.g., opposite to) the first direction, the level of immersion presented to the user 7002 is reduced. For example, the first direction is clockwise and the second direction is counter-clockwise (or vice versa), such that clockwise rotation increases the immersion level and counter-clockwise rotation decreases the immersion level (or vice versa).

In some embodiments, when computer system 101 is displaying the immersion level shown in fig. 11A to user 7002, providing a rotational input substantially equal in magnitude to the sum of rotational angles through which the first rotational input and the second rotational input rotate in the first rotational direction changes the immersion level from the immersion level shown in fig. 11A to the immersion level shown in fig. 11C while (e.g., briefly) transitioning through the immersion level shown in fig. 11B. Thus, the amount of change in immersion level is optionally based on the magnitude of the rotation and/or has a direction based on the direction of rotation.
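
The relationship between rotation and immersion described above (direction selects increase or decrease, magnitude selects the amount) can be illustrated with a minimal sketch. Everything in it is assumed for illustration only: the class name `ImmersionController`, the 0.0 to 1.0 immersion scale, and the linear gain `level_per_degree` are hypothetical and are not specified in the description.

```python
from dataclasses import dataclass


@dataclass
class ImmersionController:
    level: float = 0.0                    # 0.0 = lowest immersion (assumed scale)
    max_level: float = 1.0                # 1.0 = highest immersion (assumed scale)
    level_per_degree: float = 1.0 / 180   # assumed linear gain per degree of rotation

    def rotate(self, degrees: float) -> float:
        """Apply a rotational input and return the new immersion level.

        Positive degrees stand for the first rotational direction (increase);
        negative degrees stand for the opposite direction (decrease).
        """
        target = self.level + degrees * self.level_per_degree
        # Clamp so that further rotation past either end has no effect.
        self.level = min(self.max_level, max(0.0, target))
        return self.level
```

Under these assumptions, two successive rotations of 45 degrees reach the same level as one rotation of 90 degrees, mirroring the statement that a single rotation equal to the sum of the first and second rotational inputs moves directly from the level of fig. 11A to the level of fig. 11C.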

The use of a rotary input mechanism allows a user to provide a continuous or semi-continuous input range (e.g., five (or eight or ten) or more different input values or levels), and the bi-directionality of the rotary input mechanism allows easy and intuitive change of input in either direction without having to display additional controls to the user. In some embodiments, the number of presses matches the immersion level (e.g., three presses correspond to a higher immersion level than two presses).

When consuming media (e.g., watching video) via the application user interface 11002 at the second immersion level, as shown in fig. 11C, the user 7002 provides user input to the rotatable input element 7108. In response to detecting that the user input to the rotatable input element 7108 is a press input (e.g., the computer system 101 determines that a press input is provided to the rotatable input element 7108), and without changing the immersion level (e.g., the immersion level remains at the second immersion level), the display generating component of the computer system 101 presents a main menu user interface 7110 to the user 7002 in the three-dimensional environment 11000 at the second immersion level, as shown in fig. 11D. In some embodiments, the main menu user interface 7110 is displayed in the foreground closer to the user 7002 than other objects or features of the XR environment. In some embodiments, the main menu user interface 7110 is presented concurrently with other content outside of the main menu user interface 7110, such as passthrough content or virtual content in a 3D environment.

The ability to navigate the main menu user interface 7110 while continuing to display the XR environment (e.g., by accessing a set of applications or contacts that are capable of interacting with the user) reduces the amount of time required to complete the user's desired operation, independent of the current display mode (e.g., VR or AR). Navigation of the main menu user interface is not limited to a particular display mode and does not require additional controls to be displayed to the user to access the main menu user interface.

In some implementations, based on the type of application user interface presented to the user prior to detection of the press input to the rotatable input element 7108, the application user interface 11002 is canceled by a single press input (e.g., before the main menu user interface 7110 is displayed or concurrently with the main menu user interface 7110 being displayed) and replaced by the mini-player user interface 11012, as shown in fig. 11D.

The mini-player user interface 11012 shown in fig. 11D occupies a smaller area of the three-dimensional environment 11000 than the application user interface 11002 shown in fig. 11C. In some embodiments, the mini-player user interface 11012 is shifted to a more peripheral portion of the three-dimensional environment 11000 than the application user interface 11002, which is displayed in a central portion of the field of view of the user 7002. In some embodiments, the mini-player user interface 11012 is displayed at substantially the same location as the application user interface 11002 (e.g., the center position of the application user interface 11002 substantially coincides with the center position of the mini-player user interface 11012).

Presenting the mini-player user interface 11012 provides a means for the user 7002 to multitask and continue the media experience (at least in some capacity) while navigating the main menu user interface 7110, which increases the performance and efficiency of the computer system 101. Displaying the mini-player user interface 11012 (e.g., a video picture-in-picture (PiP) player optionally including a representation of the current video frame) allows the user to control the media experience (e.g., by providing playback controls in the mini-player) and/or indicates to the user the current "location" in the user's media experience (e.g., by displaying a time index, or for video content, displaying a representation of the current video frame) as the user navigates the main menu user interface 7110 without displaying additional controls. In some embodiments, the display of computer-generated virtual content (e.g., box 7016, second box 11008, and/or ribbon 11010) is maintained while the main menu user interface 7110 is displayed.

Fig. 11E illustrates another application user interface 11014 displayed at the same immersion level (e.g., the second immersion level) as that illustrated in fig. 11C. When browsing a web page via the application user interface 11014 at the second immersion level, as shown in fig. 11E, the user 7002 provides user input to the rotatable input element 7108. In response to detecting that the user input to the rotatable input element 7108 is a press input (e.g., the computer system 101 determines that a press input is provided to the rotatable input element 7108), and without changing the immersion level (e.g., the immersion level remains at the second immersion level), the display generating component of the computer system 101 presents a main menu user interface 7110 to the user 7002 in the three-dimensional environment 11000 at the second immersion level, as shown in fig. 11F. In some implementations, the main menu user interface 7110 is displayed in a central portion of the user's field of view, e.g., in the middle portion 7104 of the three-dimensional environment 11000, and thus is not displayed below the previous display of the application user interface 11014.

In some embodiments, the application user interface 11014 is canceled by a single press input (e.g., prior to displaying the main menu user interface 7110 or concurrently with displaying the main menu user interface 7110), as shown in fig. 11F. In some embodiments, canceling an active application (e.g., application user interface 11014) using a single press input includes running the application (e.g., associated with application user interface 11014) in the background of computer system 101 without terminating the application.
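
The single-press behavior described for fig. 11C-11F can be summarized in a short sketch: the main menu is shown, the immersion level is left unchanged, a media application user interface is replaced by a mini-player, and any other application keeps running rather than being terminated. The sketch is an assumption-laden illustration; `App`, `Scene`, `on_single_press`, and the `plays_media` flag (standing in for the "type of application user interface") are hypothetical names that do not appear in the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class App:
    name: str
    plays_media: bool  # hypothetical stand-in for the type of application UI


@dataclass
class Scene:
    immersion_level: int
    foreground_app: Optional[App] = None
    background_apps: List[App] = field(default_factory=list)
    mini_player_app: Optional[App] = None
    main_menu_visible: bool = False


def on_single_press(scene: Scene) -> None:
    """Show the main menu and cancel the foreground UI without terminating its app."""
    app = scene.foreground_app
    if app is not None:
        if app.plays_media:
            scene.mini_player_app = app        # media continues in a mini-player
        else:
            scene.background_apps.append(app)  # app keeps running with its UI hidden
        scene.foreground_app = None
    scene.main_menu_visible = True
    # scene.immersion_level is deliberately left unchanged.
```

For example, pressing once while a video application is in the foreground at immersion level 2 leaves the scene at level 2 with the main menu visible and the video application assigned to the mini-player.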

In some embodiments, although not shown in fig. 11F, virtual objects, such as box 7016, second box 11008, and/or ribbon 11010 are canceled in response to computer system 101 detecting a press input to rotatable input element 7108. In some embodiments, canceling the virtual object further includes displaying a corresponding passthrough portion of the physical environment 7000 of the computer system 101. In some implementations, in response to a pressing input to the rotatable input element 7108, all virtual objects are canceled and the computer system is transitioned to (or maintained at) the minimum level of immersion (e.g., no immersion). In some implementations, in response to a pressing input to the rotatable input element 7108, the level of immersion is reduced (e.g., more aspects of the physical environment are presented to the user 7002), but virtual content from applications running in the foreground is still displayed.

As described in table 3 and explained with respect to fig. 7O, in response to detecting two press inputs to the rotatable input element 7108 in close succession (e.g., two press inputs within 2 seconds of each other, two press inputs within 1 second of each other, two press inputs within 0.5 seconds of each other), an application management user interface (e.g., system interface 7180) is presented in a virtual three-dimensional environment. In some implementations, the system interface 7180 is overlaid on the applications running in the foreground in the three-dimensional environment 7128 (which may include one or more, or two or more, applications, e.g., the audio player/music application and the web browser application shown in fig. 7N and 7O), for example by being presented at a location in the field of view of the user 7002 that is closer to the user 7002 than the applications running in the foreground. In some embodiments, the application management user interface is a system interface that allows for multitasking on computer system 101.

As described with reference to table 3, in response to detecting three press inputs to the rotatable input element 7108 in close succession (e.g., three press inputs within 4 seconds, three press inputs within 3 seconds, three press inputs within 1 second), the accessibility mode is activated, or the user 7002 is presented with an option to enter the accessibility mode. In some embodiments, the option to enter the accessibility mode is overlaid on applications running in the foreground (which may include one or more, or two or more, applications) (e.g., an audio player/music application and a web browser application). In some implementations, the three press inputs toggle the accessibility mode between active and inactive, or cause display of an option for enabling or disabling the accessibility mode.

As described in table 3, for computer system 101 as a wearable device (e.g., a head-mounted device, a strapped device, or a watch), when the wearable device is turned on and worn on the body of user 7002, the field of view of user 7002 is re-centered in response to detecting a single press input to rotatable input element 7108 that remains (e.g., persists) for a first period of time (e.g., more than 2 seconds, more than 4 seconds, more than 6 seconds). In some implementations, re-centering the field of view includes the display of computer-generated virtual content fading out of the field of view of the user 7002 at a previous center position of the field of view of the user 7002 and fading in computer-generated virtual content at a newly defined center position of the field of view of the user 7002. Optionally, the virtual content is presented with higher fidelity or higher contrast than before re-centering the user's field of view. In some implementations, re-centering the field of view of the user 7002 includes redisplaying a plurality of previously displayed user interface elements (e.g., main menu or home screen user interface elements) in the XR environment. In some implementations, when the user 7002 faces or focuses on a new center of the field of view, the new center of the field of view of the user 7002 is selected by terminating the pressing input (e.g., when the user 7002 stops applying the pressing input to the rotatable input element 7108).

The same rotary input mechanism can receive a second type of input (e.g., a press input) that requests or indicates that a discrete/binary type (e.g., open or close) function be performed (e.g., cancel an active application, enter an accessibility mode, or cancel a virtual object), as described with reference to table 3. Using different numbers of press inputs to determine which operation(s) of two or more different operations to perform reduces the number of different input devices that must be provided to accomplish different tasks.

In some embodiments, computer system 101 is further configured to perform different operations for press inputs of different durations. For example, as explained with reference to table 3, pressing and holding the rotatable input element 7108 causes the display to be re-centered (e.g., fade out and fade in), while a tap or short single press input causes (1) the main menu user interface to be displayed, (2) the passthrough portion of the physical environment to be provided, or (3) the application to exit full screen mode.
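
One way to picture how press count and press duration could be told apart is the sketch below. It is not taken from the description: the function `classify_presses`, the result strings, and the specific thresholds are assumptions chosen from among the example values the text lists (a hold of about 2 seconds, and multiple presses within 1 second).

```python
from typing import List, Tuple

HOLD_THRESHOLD_S = 2.0       # assumed; "about 2 seconds" is one example in the text
MULTI_PRESS_WINDOW_S = 1.0   # assumed; "within 1 second" is one example in the text


def classify_presses(presses: List[Tuple[float, float]]) -> str:
    """Classify one gesture from chronological (press_time, release_time) pairs."""
    if not presses:
        return "none"
    first_down, first_up = presses[0]
    # A single press that is maintained long enough is a press-and-hold.
    if len(presses) == 1 and first_up - first_down >= HOLD_THRESHOLD_S:
        return "press_and_hold"
    # Count the presses that begin within the window measured from the first press.
    count = sum(1 for down, _ in presses if down - first_down <= MULTI_PRESS_WINDOW_S)
    names = {1: "single_press", 2: "double_press", 3: "triple_press"}
    return names.get(count, "unrecognized")
```

The result of such a classifier could then be used to select one of the operations listed in table 3. A real implementation would also need to decide how long to wait for a possible further press before committing to a result, which this sketch leaves out.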

Using input durations to determine which system operations (e.g., operations not specific to a particular application) to perform reduces the number of different input devices that must be provided to accomplish different tasks. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device; instead, the processor may be programmed to interpret more types of input (e.g., short presses, long presses, and holds) from a particular input device.

As explained with reference to table 2, a second type of input (e.g., a press input) to the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) may be used in a chorded (combined) input, in which inputs to multiple input devices (e.g., the hardware input element 7108 and button 7508) are detected concurrently, to request or indicate performance of a third operation that depends on the duration and/or pattern of the inputs. In some implementations, input to the hardware input element 7108 is concurrent or overlapping with input to a second input device (e.g., button 7508 or a camera), and the input to the hardware input element 7108 is detected in conjunction with the input to the second input device (e.g., button 7508). The inputs in a chorded input may be concurrent or overlapping. The use of chorded inputs enables system operations (e.g., operations not specific to a particular application) such as capturing screenshots, powering off, restarting the computer system, and resetting the computer system to be performed without displaying additional controls.

The combined use of more than one input device to request or indicate a corresponding system operation (e.g., an operation that is not specific to a particular application) reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may be used to request or indicate M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device; instead, the processor may be programmed to interpret chorded (combined) inputs from a smaller number of input devices.

Fig. 12A to 12G illustrate examples of controlling a computer system based on the physical location of the computer system relative to a user and changes in the physical location and the state of the computer system. FIG. 18 is a flow chart of an example method 1800 for controlling a computer system based on the physical location of the computer system relative to a user and changes in the physical location and state of the computer system. The user interfaces in fig. 12A to 12G are used to illustrate the processes described below, including the process in fig. 18.

Fig. 12A shows a state diagram 12000 associated with computer system 101. As shown in the examples depicted in fig. 12B-12G (e.g., fig. 12B1-12G1 (fig. 12B1, 12C1, 12D1, 12E1, 12F1, and 12G1) and fig. 12B2-12G2 (fig. 12B2, 12C2, 12D2, 12E2, 12F2, and 12G2)), in some embodiments, the display generating component 7100 of the computer system 101 is a watch 12010 (e.g., fig. 12B1-12G1) or an HMD 12011 (e.g., fig. 12B2-12G2) (e.g., and/or HMD 7100a) worn by the user 7002. In some embodiments, the display generating component of computer system 101 is a head mounted display worn on the head of user 7002 (e.g., the content shown in fig. 12B-12G as visible via display generating component 7100 of computer system 101 corresponds to the field of view of user 7002 when the head mounted display is worn, as shown in fig. 12B2-12G2). In some embodiments, computer system 101 is a wearable device, such as a watch 12010, a pair of headphones, a headset, or a strapping device.

In some embodiments, the display generation component is a stand-alone display, a projector, or another type of display. In some embodiments, the computer system communicates with one or more input devices including biometric sensors, cameras, or other sensors and input devices that detect movement of a user's hand, movement of the user's entire body, and/or movement of the user's head in a physical environment. The one or more input devices also optionally include a pulse sensor (e.g., for measuring a pulse rate of the user), a thermal sensor (e.g., for measuring a temperature of the user), and an inertial measurement sensor (e.g., for detecting or measuring movement of the user). In some embodiments, the one or more input devices detect movement and current pose, orientation, and positioning of a user's hands, face, and/or whole body. In some embodiments, the one or more input devices include buttons, dials, crowns, switches, movable components, or solid state components. For example, the one or more input devices may include a device that detects localized sensor input, such as intensity or force sensor input, and in some embodiments, the computer system 101 uses the input to trigger a corresponding operation and optionally provides haptic feedback corresponding to the detected input.

The state diagram 12000 shown in fig. 12A includes four states of the computer system 101. In some embodiments, computer system 101 is a wristwatch. When the computer system is off-body (e.g., the user 7002 is not wearing the computer system (e.g., a watch) on the wrist or another portion of the user 7002's body), the computer system is in the first state 12002 or sleep state. The computer system transitions from the first state 12002 (sleep) to the second state 12004 or standby state when the computer system is lifted, when a side button of the computer system is pressed, or when the computer system receives an incoming invitation (e.g., a telephone call). The computer system is off-body in both the first state 12002 and the second state 12004.

When a computer system (e.g., a watch) is worn on the body of the user 7002 and the computer system detects biometric feedback from the user 7002, the computer system transitions to a third state 12006. The biometric feedback may include sensor data detected by the computer system that indicates the pulse of the user 7002, the skin temperature of the user 7002, the gaze location, the iris pattern, the facial expression, the eye color and/or shape, or other biometric or physiological measurements of the user 7002. In this third state 12006, the computer system is on-body but not yet authenticated. Upon biometric authentication, password entry, or activation of a sharing mode on the computer system, the computer system enters an authenticated state or fourth state 12008.

When the computer system is in the fourth state 12008, if the computer system no longer detects any biometric feedback from the user 7002, the computer system determines that it is no longer being worn on the body of the user 7002 and transitions directly from the fourth state 12008 to the second state 12004.

Similarly, when the computer system is in the third state 12006 and it no longer detects any biometric feedback from the user 7002, and the computer system determines that it is no longer being worn on the body of the user 7002, the computer system transitions to the second state 12004.

After a timeout period (e.g., 1 minute, 5 minutes, 10 minutes), the computer system transitions from the second state 12004 to the first (e.g., sleep or low power) state 12002.
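The transitions described above for the watch can be sketched as a small state machine. This is an illustrative sketch only: the event names are hypothetical labels for the triggers named in the text, the sketch assumes the third state is entered from the second state (after the device is lifted), and it assumes that any event not listed for a state leaves the state unchanged.

```python
from enum import Enum


class State(Enum):
    SLEEP = 12002           # first state: off-body, sleep or low power
    STANDBY = 12004         # second state: off-body, standby
    ON_BODY_UNAUTH = 12006  # third state: worn, not yet authenticated
    ON_BODY_AUTH = 12008    # fourth state: worn and authenticated


# Event names are hypothetical; they label the triggers described in the text.
TRANSITIONS = {
    (State.SLEEP, "lifted"): State.STANDBY,
    (State.SLEEP, "side_button_pressed"): State.STANDBY,
    (State.SLEEP, "incoming_invitation"): State.STANDBY,
    (State.STANDBY, "worn_with_biometric_feedback"): State.ON_BODY_UNAUTH,
    # "authenticated" covers biometric authentication, password entry,
    # or activation of a sharing mode.
    (State.ON_BODY_UNAUTH, "authenticated"): State.ON_BODY_AUTH,
    (State.ON_BODY_UNAUTH, "biometric_feedback_lost"): State.STANDBY,
    (State.ON_BODY_AUTH, "biometric_feedback_lost"): State.STANDBY,
    (State.STANDBY, "timeout"): State.SLEEP,
}


def next_state(state, event):
    """Return the state after `event`; unlisted events keep the current state."""
    return TRANSITIONS.get((state, event), state)


state = State.SLEEP
for event in ["lifted", "worn_with_biometric_feedback", "authenticated",
              "biometric_feedback_lost", "timeout"]:
    state = next_state(state, event)
    print(f"{event} -> {state.name}")
```

Keeping the transitions in a single table makes it easy to check that removal always lands in the standby state, never directly in the sleep state.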

In some embodiments, computer system 101 is a head-mounted device. When the head-mounted device is off the head (e.g., not positioned on the head of the user 7002 so as to cover the eyes of the user 7002), the head-mounted device is in either the first state 12002 or the second state 12004. When the user 7002 wears the head-mounted device (e.g., the head-mounted device is positioned on the head of the user 7002 so as to cover the eyes of the user 7002 such that the user interface generated by the one or more display generating components of the head-mounted device is visible to the user 7002), the head-mounted device transitions to a third state 12006. After biometric verification of the user 7002 (e.g., eye tracking of the user 7002 or facial recognition of the user 7002), the head-mounted device transitions from the third state 12006 to the fourth state 12008. When the user 7002 removes the head-mounted device (e.g., removes the head-mounted device from the head of the user 7002) and the device detects a loss of biometric feedback from the user 7002, the head-mounted device transitions from the fourth state 12008 to the second state 12004. After a timeout period in which the head-mounted device fails to transition to the third state 12006 or the fourth state 12008, the head-mounted device transitions to the first state 12002 or the sleep state.

Table 4 below describes the behavior of computer system 101 in each of the four states depicted in fig. 12A.

The first (e.g., sleep or low power) state 12002 differs from the second (e.g., low power, standby or sleep) state 12004 in that, for example, the first state 12002 uses less power than the second state 12004 but also takes longer to wake from. The computer system 101 reduces the frequency of sensor measurements and/or sensor measurement processes (e.g., sensors for gaze tracking and sensors for world tracking) to a greater extent during the first state 12002 than during the second state 12004.

The fourth (on-body, authenticated) state 12008 differs from the second (e.g., low power, standby, or sleep) state 12004 in that the fourth state 12008 consumes more power than the second state 12004 (e.g., in the second state 12004, display generating components (e.g., screen) on the computer system 101 are turned off and earmuff audio is turned off). The fourth state 12008 also differs from the second state 12004 in that the fourth state 12008 allows significantly more user interaction than the second state 12004 (e.g., an application runs in the background in the second state 12004 but runs in the foreground in the fourth state 12008).

The fourth (on-body, authenticated) state 12008 differs from the third (e.g., on-body, unauthenticated) state 12006 in that applications are inactive in the third state 12006 until the user is authenticated. Various applications (e.g., telephone call, video call/conference, media delivery, screen mirroring) are also suspended in the third state 12006, but are resumed or continue to run in the fourth state 12008.

The fourth (on-body, authenticated) state 12008 differs from the first (e.g., sleep or low power) state 12002 in that the power consumption in the first state 12002 is much lower. For example, in the first state 12002, the screen is off, the earmuff audio is off, applications are inactive (backgrounded or terminated), and gaze tracking, hand tracking, and world tracking are off, whereas in the fourth state 12008, the screen is on, the earmuff audio is on, applications are active and running in the foreground, and gaze tracking, hand tracking, and world tracking are on.

The third (on-body, unauthenticated) state 12006 differs from the second (e.g., low power, standby, or sleep) state 12004 in that the third state 12006 consumes more power than the second state 12004 (e.g., in the second state 12004, display generating components (e.g., screens) on the computer system 101 are turned off, earmuff audio is turned off, and hand tracking is turned off, but these are all turned on in the third state 12006).

The third (on-body, unauthenticated) state 12006 differs from the first (e.g., sleep or low power) state 12002 in that the third state 12006 consumes more power than the first state 12002 (e.g., in the first state 12002, display generating components (e.g., screens) on the computer system 101 are turned off, earmuff audio is turned off, and hand tracking, gaze tracking, and world tracking are all turned off, but these are all turned on in the third state 12006). Various applications (e.g., phone call, video call/conference, media delivery, screen mirroring) are terminated in the first state 12002, but suspended in the third state 12006.

When computer system 101 corresponds to a watch, display generating component 7100 of the watch (e.g., the screen of the watch) is turned off in both a first (e.g., sleep or low power) state 12002 and a second (e.g., low power, standby or sleep) state 12004. In contrast, in both the third state 12006 and the fourth state 12008, the display generating component 7100 of the wristwatch (e.g., the screen of the wristwatch) is turned on.

When computer system 101 corresponds to a headset, in both a first (e.g., sleep, or low power) state 12002 and a second (e.g., low power, standby, or sleep) state 12004, the earmuff audio of the headset is turned off and speaker audio is available. In both the third state 12006 and the fourth state 12008, the earmuff audio of the head-mounted device is on and speaker audio is not available. In some embodiments, speaker audio refers to an audio output component on a head-mounted device that broadcasts sound waves over a larger spatial area (e.g., such as a speaker device) when compared to earmuff audio, which delivers sound waves closer to the user's ear. For example, when the head-mounted device is away from the user's body in a first (e.g., sleep or low power) state 12002 or a second (e.g., low power, standby or sleep) state 12004, earmuff audio is not available because sound waves are not delivered close to the user's ear. Instead, the head-mounted device may be used as a speaker device that broadcasts audio (sound waves) over a larger spatial area.

In both the first (e.g., sleep or low power) state 12002 and the second (e.g., low power, standby or sleep) state 12004, no software applications are active on the computer system 101. In the third state 12006, the software application is inactive until the user 7002 is authenticated. In other words, when the screen is locked, no application is running (e.g., no screen locking application is executed, and more generally, a software application on the computer system does not generate a user interface or provide information to the user). In a fourth state 12008, one or more software applications are active on computer system 101.

In both the first (e.g., sleep or low power) state 12002 and the second (e.g., low power, standby or sleep) state 12004, the hand tracking functionality of the computer system 101 is turned off. Conversely, in both the third state 12006 and the fourth state 12008, the hand tracking functionality of the computer system 101 is turned on.

The absence of gaze tracking and world tracking in the first state 12002 is a distinction between the behavior of the computer system 101 in the first (e.g., sleep or low power) state 12002 and the second (e.g., low power, standby or dormant) state 12004. In some implementations, gaze tracking includes detecting the presence of eyes (e.g., capturing images using a camera and determining whether one or more eyes of user 7002 have been captured in any of the images using image processing techniques). In some embodiments, world tracking includes using optical tracking to determine the position and orientation of computer system 101. In some embodiments, world tracking includes using inertial tracking from accelerometers and gyroscopes and/or other positioning sensors to determine the positioning of computer system 101 in a physical three-dimensional environment in which computer system 101 is located. In some embodiments, computer system 101 in the second state 12004 has already warmed up and sensed its environment, and therefore responds more quickly when user 7002 puts on computer system 101 (e.g., identifies user 7002 more quickly and/or provides visual and audio output to user 7002 more quickly).
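The per-state behavior described in the preceding paragraphs can be collected into a lookup table. The sketch below is one possible encoding, not a reproduction of Table 4; the field names are hypothetical, and the values are taken from the statements above about the display, earmuff and speaker audio, applications, hand tracking, gaze tracking, and world tracking.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Capabilities:
    display_on: bool
    earmuff_audio_on: bool
    speaker_audio_available: bool
    apps_active: bool
    hand_tracking_on: bool
    gaze_tracking_on: bool
    world_tracking_on: bool


# Keys are the reference numerals of the four states.
BEHAVIOR = {
    12002: Capabilities(False, False, True, False, False, False, False),
    12004: Capabilities(False, False, True, False, False, True, True),
    12006: Capabilities(True, True, False, False, True, True, True),
    12008: Capabilities(True, True, False, True, True, True, True),
}


def differences(state_a, state_b):
    """Return the names of the capabilities whose values differ between two states."""
    a, b = BEHAVIOR[state_a], BEHAVIOR[state_b]
    return {
        f.name for f in fields(Capabilities)
        if getattr(a, f.name) != getattr(b, f.name)
    }


print(sorted(differences(12002, 12004)))
print(sorted(differences(12006, 12008)))
```

Under this encoding, the first and second states differ only in gaze tracking and world tracking, and the third and fourth states differ only in whether applications are active.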

In the first, second, and third states 12002, 12004, 12006, any software application running on the computer system 101 continues to run in the background (is "backgrounded"), and when the computer system 101 transitions to any of the first, second, and third states 12002, 12004, 12006, any recording session that was ongoing in the fourth state 12008 is terminated.

In the first state 12002, any telephone call, video call or conference session, media session (e.g., music, video, or podcast), and screen mirroring of computer system 101 are terminated. In the second state 12004 and the third state 12006, telephone calls and video calls or conference sessions are muted and are terminated after a timeout period (e.g., about 1 minute, about 5 minutes, less than 10 minutes). Any media session (e.g., music, video, or podcast) is paused. Any screen mirroring of computer system 101 is paused and is terminated after a timeout period. In some implementations, the paused media session is terminated after a timeout period.
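The session handling described above can be summarized as a single decision function. This is a hedged sketch: the function name, the session kind labels, and the returned action strings are hypothetical, and the `media_times_out` flag models the statement that a paused media session is terminated after a timeout only in some implementations.

```python
SLEEP, STANDBY, ON_BODY_UNAUTH, ON_BODY_AUTH = 12002, 12004, 12006, 12008
SESSION_KINDS = {"recording", "call", "media", "mirroring"}


def session_action(kind, state, timed_out=False, media_times_out=True):
    """Return what happens to a session of `kind` while the device is in `state`."""
    if kind not in SESSION_KINDS:
        raise ValueError(f"unknown session kind: {kind}")
    if state == ON_BODY_AUTH:
        return "active"
    # Recording sessions end on any transition out of the fourth state,
    # and every session kind ends in the first (sleep) state.
    if kind == "recording" or state == SLEEP:
        return "terminated"
    # Second and third states: mute or pause first, terminate after the timeout.
    if timed_out and (kind != "media" or media_times_out):
        return "terminated"
    return "muted" if kind == "call" else "paused"


for kind in sorted(SESSION_KINDS):
    print(kind, session_action(kind, STANDBY), session_action(kind, STANDBY, timed_out=True))
```

Here "call" stands for telephone calls and video calls or conference sessions, which the text treats identically.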

In the fourth state 12008, the display generation component 7100 of the computer system 101 presents applications in focus in the foreground (sometimes referred to herein as "focused" or "foreground" applications). When the computer system 101 transitions from any of the first state 12002, the second state 12004, or the third state 12006 to the fourth state 12008, the telephone call, video call or conference session, media session, and screen mirroring are all resumed. In screen mirroring, the visual output generated by display generation component 7100 is replicated on a different display unit (e.g., another screen, or rendered by a projector) external to (e.g., not part of) computer system 101. In some embodiments, the display generation component also displays an indication (e.g., a predefined icon or object) that the mirroring of the output from the display generation component is paused. Screen mirroring has been described above with reference to fig. 9A. In some embodiments, the other user with whom the user 7002 performs screen mirroring is a participant in a communication session with the user 7002. A real-time communication session is described with reference to fig. 9D. Providing an indication that the mirroring of the output from the display generation component is paused automatically conveys the interruption to other participants without requiring active input from an authorized user. This indication helps to minimize confusion and reduces the chance that other participants misinterpret a pause in screen mirroring as a fault that requires troubleshooting.

Fig. 12B (e.g., fig. 12B1 and 12B2) illustrates a computer system (e.g., watch 12010 or HMD 12011) on the wrist of user 7002 and/or on the head of user 7002. In fig. 12B-12G2, HMD 12011 optionally shows a view of content displayed in the right optical module of the HMD, which is typically paired with a left optical module that shows a slight variation of the content shown in the right optical module in order to create the appearance of stereoscopic depth for the displayed content. The computer system (e.g., watch 12010 or HMD 12011) includes a crown 12014 configured to receive both rotational input (as indicated by the curved arrow) and press input. The computer system (e.g., watch 12010 or HMD 12011) also includes a button 12016. In some embodiments, crown 12014 and/or button 12016 correspond to hardware input elements 7108 described above. In this example, the displayed application interface 12018 includes an audio player with playback controls. Fig. 12B shows the computer system (e.g., watch 12010 or HMD 12011) in the fourth state 12008 because the audio player is active and the audio player application with application interface 12018 is presented in the foreground.

Fig. 12C (e.g., fig. 12C1 and 12C2) shows the computer system (e.g., watch 12010 or HMD 12011) after the computer system (e.g., watch 12010 or HMD 12011) is removed from the wrist of user 7002 such that the computer system is off-body (e.g., the computer system (e.g., watch 12010 or HMD 12011) is not in contact with any portion of the body of user 7002). The application interface 12018 shows that the audio player is now paused. Fig. 12C shows the computer system (e.g., watch 12010 or HMD 12011) in the second state 12004, in which the media session is paused but not yet terminated. After a timeout period (e.g., less than 10 minutes, less than 5 minutes), if the computer system (e.g., watch 12010 or HMD 12011) does not transition to the third state 12006 or fourth state 12008, the computer system (e.g., watch 12010 or HMD 12011) transitions to the first (e.g., sleep or low power) state 12002 and the media application (e.g., audio player application) terminates.

Generally, when a session (e.g., a media consumption session, a recording session, a content sharing session) in an application (e.g., a media application, a conferencing application, a telephony application, a gaming application, a web content browsing application, or other local application or third party application) is active (e.g., in the foreground of a user interface) and the wearable device is being worn, the wearable device detects a first signal (e.g., a signal from a biometric sensor) indicating that the wearable device has been removed. For example, the biometric sensor may include a camera and an image processing component, and when the image processing component is unable to locate the user's eyes, or the presence of any eyes, in an image captured by the camera, the biometric sensor outputs a control signal indicating that the wearable device is not placed in front of the user's eyes. As another example, the biometric sensor may be a pulse sensor (e.g., for detecting a pulse of a user) that outputs a signal when the wearable device has been removed to indicate that a pulse is not detected. As another example, the first signal is a control signal provided by an inertial measurement device (e.g., an accelerometer, a gyroscope, and/or an inertial measurement unit), and the inertial measurement device outputs the first signal when the inertial measurement device (or a computer system using information from the inertial measurement device) determines that the inertial measurement device is oriented in a manner that is incompatible with the wearable device being worn (e.g., the wearable device is positioned upside down, the wearable device is lying on its side, or a camera in the wearable device is pointed to the sky or the ground due to the orientation of the wearable device).
As another example, the first signal is a control signal provided by a thermal sensor (e.g., a thermal sensor that detects when it has been removed from the body heat of the wearer). In some embodiments, signals from multiple biometric sensors are analyzed together to determine whether the wearable device is being worn. For example, when a user places the wearable device on her forehead, the camera will not detect the presence of any eyes, but the thermal sensor will still detect body temperature and the inertial measurement device will detect the "upright" positioning of the wearable device; in that case, the wearable device transitions to a different state (e.g., the first (e.g., sleep or low power) state 12002 or the second (e.g., low power, standby or sleep) state 12004).
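The joint analysis of several sensor signals can be illustrated with a short sketch. The field names are hypothetical, and the rule that every sensor must agree before the device counts as worn is an assumption made for illustration; the text only states that the signals are analyzed together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorReadings:
    eyes_detected: bool        # camera plus image processing located eyes
    body_heat_detected: bool   # thermal sensor senses body temperature
    worn_orientation: bool     # inertial data is compatible with being worn


def is_worn(readings):
    """Assumed fusion rule: the device counts as worn only if all sensors agree."""
    return (readings.eyes_detected
            and readings.body_heat_detected
            and readings.worn_orientation)


def first_signal(previous, current):
    """True when the readings indicate that a worn device has just been removed."""
    return is_worn(previous) and not is_worn(current)


worn = SensorReadings(eyes_detected=True, body_heat_detected=True, worn_orientation=True)
# Forehead example from the text: heat and upright orientation, but no eyes.
on_forehead = SensorReadings(eyes_detected=False, body_heat_detected=True, worn_orientation=True)

print(is_worn(on_forehead))
print(first_signal(worn, on_forehead))
```

In the forehead example, the thermal and inertial readings alone would suggest the device is worn, so the camera reading is what triggers the transition out of the worn state.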

Fig. 12D (e.g., fig. 12D1 and 12D2) shows a computer system (e.g., wristwatch 12010 or HMD 12011) in a first state 12002, the computer system being off-body and resting on its side. The computer system (e.g., watch 12010 or HMD 12011) is in the first (e.g., sleep or low power) state 12002, and the screen on the computer system (e.g., watch 12010 or HMD 12011) is off. The screen of the computer system (e.g., watch 12010 or HMD 12011) is turned off even if speaker audio is available. When the computer system (e.g., watch 12010 or HMD 12011) is in the first (sleep) state 12002, applications are inactive on the computer system. The computer system (e.g., watch 12010 or HMD 12011) does not track any biometric input from the user 7002 (e.g., no hand tracking, no gaze tracking). Further, in the first (e.g., sleep or low power) state 12002, the computer system (e.g., watch 12010 or HMD 12011) also does not track its external environment. When the computer system (e.g., watch 12010 or HMD 12011) is in the first state 12002, the recording session, phone call, video call or conference session, media session, and screen mirroring are all closed (e.g., not performed). In the first state 12002, any open applications on the computer system (e.g., watch 12010 or HMD 12011) are running in the background. In some embodiments, an application running in the background is terminated after a timeout period (e.g., after one hour, after 30 minutes, after 15 minutes).

In some embodiments, computer system 101 is a head-mounted device. In the first state 12002 and the second state 12004, the display generation component 7100 of the head-mounted device is turned off, and no display is provided to the user 7002. In some implementations, the head-mounted device includes an audio outlet that directs sound to the ear of the user 7002. Such an audio outlet provides earmuff audio to the user 7002. In some implementations, the head-mounted device includes an audio outlet that broadcasts sound from the head-mounted device to a wider area. Such an audio outlet provides speaker audio. In the first state 12002 and the second state 12004, the headset is away from the head of the user 7002 (e.g., the user 7002 is not wearing the headset), and the earmuff audio is turned off while speaker audio is available. In some embodiments or situations, such as when computer system 101 (e.g., a head mounted device) is receiving audio from a source providing spatial audio (e.g., when a user is watching a movie or engaged in a shared audio or video session that includes spatial audio), the audio provided by computer system 101 to the user is spatial audio. Spatial audio provides audio to a user at a simulated location in the three-dimensional environment in which the computer system (e.g., a head mounted device) is located.

Applications are inactive on the head-mounted device when the head-mounted device is in a first (e.g., sleep or low power) state 12002 and a second (e.g., low power, standby or sleep) state 12004. Any application that was active before the headset transitioned to either the first (e.g., sleep or low power) state 12002 or the second (e.g., low power, standby or dormant) state 12004 switches to running in the background. When the headset is in the first state 12002, recording sessions, phone calls, video calls or conference sessions, media sessions, and screen mirroring running on the headset are all terminated or closed.

Hand tracking is not activated when the head mounted device is in a first (e.g., sleep or low power) state 12002 and a second (e.g., low power, standby or sleep) state 12004. In some implementations, hand tracking is performed by an optical or infrared camera provided in an outward facing manner on the head mounted device to image the hand of the user 7002.

In a first (e.g., sleep or low power) state 12002, the headset also does not detect a device context. Detecting the device context may include performing gaze tracking or world tracking. In some implementations, gaze tracking is performed by an optical or infrared camera provided in an inward facing manner on the head mounted device to sense the eyes of the user 7002. Once the eyes of the user 7002 are detected, the movement of the eyes is followed to track the gaze of the user 7002. In some embodiments, world tracking (or world detection) is performed by one or more inertial measurement devices (e.g., one or more accelerometers, gyroscopes, and/or inertial measurement units) disposed within the head mounted device. In some embodiments, world tracking is performed by an optical or infrared camera provided in an outward facing manner on the head-mounted device to image the external environment in which the user 7002 is located.

In response to detecting that the computer system (e.g., the watch 12010 or the HMD 12011) is lifted, the computer system (e.g., watch 12010 or HMD 12011) transitions from the first state 12002 to the second state 12004, as shown in fig. 12E (e.g., fig. 12E1 and 12E2). Optionally, when the computer system (e.g., watch 12010 or HMD 12011) detects a press input to button 12016 or a press input to crown 12014, the computer system (e.g., watch 12010 or HMD 12011) also transitions to the second (e.g., low power, standby, or sleep) state 12004. Optionally, when the computer system (e.g., watch 12010 or HMD 12011) receives an incoming invitation (e.g., a voice call invitation or a video call invitation), the computer system (e.g., watch 12010 or HMD 12011) also transitions from the first (e.g., sleep or low power) state 12002 to the second (e.g., low power, standby or sleep) state 12004. Providing one or more intermediate (e.g., standby) states in which the wearable device senses its physical environment allows the wearable device to warm up more quickly and be ready to deliver an experience (e.g., media experience, communication session) once the wearable device is positioned on the body of an authorized user. Because the wearable device already senses its surroundings, it is better prepared to transition to an active on-body state (e.g., performs a faster transition to an active on-body state) when an authorized user interacts with it, making the transition operationally more efficient for the wearable device and more time-efficient for the authorized user.

In some embodiments, computer system 101 is a head-mounted device. In the second (e.g., low power, standby, or sleep) state 12004, after transitioning from the first (e.g., sleep or low power) state 12002, the headset begins to perform gaze tracking and world tracking, which are functions that the headset does not perform while in the first (e.g., sleep or low power) state 12002. When the headset is in the second (e.g., low power, standby, or dormant) state 12004, telephone calls and video calls or conference sessions are muted. After a timeout period (e.g., the headset does not leave the second state 12004 within 1 minute, within 5 minutes, or within 10 minutes), the telephone calls and video calls or conference sessions are terminated. When computer system 101 is in the second (e.g., low power, standby, or hibernation) state 12004, any media session is paused, and any screen mirroring running on the head-mounted device is also paused. After a timeout period, both the screen mirroring and the media session are terminated.

In general, when the session of the application is inactive in the first (e.g., sleep or low power) state 12002 or the second (e.g., low power, standby or dormant) state 12004 (e.g., running in the background, paused, not receiving any user input, not providing any output to the user), the wearable device detects a second signal indicating that the wearable device is being worn. For example, the second signal is a signal provided by a biometric sensor. The biometric sensor may include a camera and an image processing component, and when the image processing component is able to locate the user's eyes, or alternatively any person's eyes, in an image captured by the camera, the biometric sensor outputs a signal indicating that the wearable device has now been placed in front of the user's eyes. As another example, the second signal is a signal provided by an inertial measurement device (e.g., accelerometer, gyroscope) when the inertial measurement device determines that it is oriented in a manner compatible with the wearable device being worn (e.g., the wearable device is not positioned upside down and is not lying on its side). In some embodiments, the second signal is provided by a thermal sensor. Optionally, the signals from multiple biometric sensors are analyzed together to determine whether the wearable device is being worn. For example, when the user places the wearable device on her forehead, the camera will not detect the presence of any eyes, but the thermal sensor will still detect body temperature and the inertial measurement device will detect the "upright" positioning of the wearable device, and the wearable device determines whether it is being worn based on those determinations. In response to detecting the second signal, when a first criterion is met (e.g., the user of the wearable device is determined to be an authorized or authenticated user based on automatic biometric verification, based on password entry, or based on the sharing mode being active), the wearable device resumes the session of the application. On the other hand, when the first criterion is not satisfied, the session of the application is not resumed.
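The resume decision described above can be sketched as a single predicate. The function and parameter names are hypothetical; the sketch assumes the first criterion is satisfied by any one of the three conditions named in the text (biometric verification, password entry, or an active sharing mode).

```python
def should_resume_session(session_inactive, second_signal_detected,
                          biometric_verified=False, password_entered=False,
                          sharing_mode_active=False):
    """Return True when an inactive session should be resumed after the device is put on."""
    # First criterion: the wearer is an authorized or authenticated user.
    first_criterion_met = (biometric_verified
                           or password_entered
                           or sharing_mode_active)
    return session_inactive and second_signal_detected and first_criterion_met


# Device put on by a user who passes biometric verification: resume.
print(should_resume_session(True, True, biometric_verified=True))
# Device put on, but the wearer is not authenticated: do not resume.
print(should_resume_session(True, True))
```

Separating the second signal from the first criterion mirrors the distinction between the third (unauthenticated) and fourth (authenticated) states: wearing the device alone is not sufficient to resume a session.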

Fig. 12F (e.g., fig. 12F1 and 12F2) illustrates a computer system (e.g., a watch 12010 or an HMD 12011) in a third state 12006 according to some embodiments of the present disclosure. For example, when the user 7002 puts on the computer system (e.g., watch 12010 or HMD 12011) after the computer system is lifted as shown in fig. 12E, the computer system (e.g., watch 12010 or HMD 12011) transitions from the second state 12004 to the third state 12006. The computer system (e.g., watch 12010 or HMD 12011) is placed on the wrist of user 7002, but the computer system (e.g., watch 12010 or HMD 12011) has not authenticated the user 7002. Thus, as displayed on user interface 12024, the telephone call from caller Abe is muted. The user 7002 may be authenticated when a biometric feature 12026 associated with the user 7002 is detected by the computer system (e.g., a watch 12010 or HMD 12011). In some embodiments, as shown in fig. 12F, the biometric feature is a 2-dimensional feature (e.g., tattoo, mark, tag, birthmark, fingerprint) on a portion of the body of the user 7002. A sensor (e.g., camera, scanner) of the computer system (e.g., watch 12010 or HMD 12011) detects the biometric feature 12026, and the computer system (e.g., watch 12010 or HMD 12011) or sensor determines whether the biometric feature 12026 matches a feature associated with the user 7002. In response to determining that the biometric feature 12026 matches a feature associated with the user 7002, the computer system (e.g., the watch 12010 or the HMD 12011) authenticates the user 7002 and transitions from the third state 12006 to the fourth state 12008, as shown in fig. 12G (e.g., fig. 12G1 and 12G2). In response to determining that biometric feature 12026 does not match the feature associated with user 7002, the computer system (e.g., watch 12010 or HMD 12011) does not authenticate user 7002 and remains in the third state 12006.
After a timeout period (e.g., the computer system (e.g., the watch 12010 or the HMD 12011) does not leave the third state 12006 within 1 minute, within 5 minutes, or within 10 minutes), the telephone call with Abe is terminated.

In some embodiments, computer system 101 is a head-mounted device. Instead of a biometric feature such as biometric feature 12026, the headset relies on gaze location, iris pattern, facial expression, eye color and/or shape to determine whether the user wearing the headset is an authorized user. In the third (unauthenticated) state 12006, the headset continues to perform gaze tracking and world tracking, which are functions that the headset does not perform while in the first (e.g., sleep or low power) state 12002. When the headset is in the third (unauthenticated) state 12006, telephone calls and video calls or conference sessions are muted. After a timeout period (e.g., the headset does not leave the third (unauthenticated) state 12006 within 1 minute, within 5 minutes, or within 10 minutes), the telephone calls and video calls or conference sessions are terminated. Any media sessions are paused, and any screen mirroring running on the head-mounted device is also paused. After a timeout period, both the screen mirroring and the media session are terminated.

Unlike a computer system (e.g., watch 12010 or HMD 12011) having a display generation component (e.g., a watch screen) and/or one or more sensors disposed on an outer surface of the computer system (e.g., watch 12010 or HMD 12011), the head-mounted device may include a display generation component that presents a visual display to the user 7002 in an interior portion of the head-mounted device. For example, the head-mounted device is worn on the head of the user 7002 so as to cover the eyes of the user 7002. Similarly, the biometric sensor of the head-mounted device may also be directed toward an interior portion of the head-mounted device to track the gaze of the user 7002 or perform facial recognition operations.

The user 7002 repositions the computer system (e.g., watch 12010 or HMD 12011) by sliding the computer system (e.g., watch 12010 or HMD 12011) up and down along the forearm, and a sensor of the computer system (e.g., watch 12010 or HMD 12011) detects the presence of the biometric feature 12026, as shown in fig. 12G. In response to detecting the presence of the biometric feature 12026 and upon determining that the biometric feature 12026 corresponds to a feature associated with the user 7002, the computer system (e.g., the watch 12010 or the HMD 12011) authenticates the user 7002 and transitions from the third state 12006 to the fourth state 12008 based on the authentication of the user 7002. For a wearable device that is a head-mounted device, the relevant biometric features include one or more of gaze location, iris pattern, facial expression, eye color, and/or shape of the authorized user to authenticate whether the user wearing the head-mounted device matches the authorized user.

Alternatively, the computer system (e.g., the watch 12010 or the HMD 12011) also transitions from the third (unauthenticated) state 12006 to the fourth (authenticated) state 12008 upon entry of a password to the computer system (e.g., the watch 12010 or the HMD 12011) or upon activation of a sharing mode on the computer system (e.g., the watch 12010 or the HMD 12011).

In a fourth (authenticated) state 12008, the telephone call (e.g., with Abe) is resumed (e.g., unmuted), as shown by user interface 12024 in fig. 12G. Generally, in the fourth (authenticated) state 12008, the focused application is displayed in the foreground, and the phone call, video call or conference session, media session, and screen image resume. In some embodiments, even after the user is determined to be an authorized user, some sessions of an application are not resumed without additional user input. Using the characteristics of the respective sessions of the respective applications to determine whether to resume the respective sessions provides increased security/privacy by ensuring that certain types of sessions (e.g., recording sessions) with more security/privacy protection do not automatically restart after the wearable device has been removed from the user's body, even when an authorized user is detected.
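The selective resumption of sessions described above can be sketched as follows. This is a minimal illustration only: the Session class, the set REQUIRES_USER_INPUT, and the session-kind labels are hypothetical names, and the choice of recording sessions as the protected kind follows the example given in the text.

```python
from dataclasses import dataclass


@dataclass
class Session:
    kind: str            # hypothetical label, e.g., "phone_call" or "recording"
    paused: bool = True  # sessions are paused or muted while unauthenticated


# Hypothetical set of session kinds that need additional user input to resume.
REQUIRES_USER_INPUT = {"recording"}


def resume_after_authentication(sessions, user_confirmed=frozenset()):
    """Resume sessions once the user is authenticated.

    Protected kinds stay paused unless the user has explicitly confirmed them.
    """
    for session in sessions:
        if session.kind not in REQUIRES_USER_INPUT or session.kind in user_confirmed:
            session.paused = False
    return sessions
```

Under these assumptions a telephone call resumes automatically on authentication, while a recording session remains paused until its kind is passed in user_confirmed.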

In some embodiments, computer system 101 is a wearable device, such as a head-mounted device. In a fourth (authenticated) state 12008, the headset continues to perform gaze tracking and world tracking. When the computer system 101 is removed from the user's body, the computer system 101 transitions from a fourth (authenticated) state 12008 to a second (e.g., low power, standby, or sleep) state 12004.
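The state transitions described above can be summarized as a small transition table. The sketch below is illustrative only: the event names are hypothetical labels for the conditions described in the text, and only the transitions described in this passage are included.

```python
from enum import Enum


class State(Enum):
    FIRST = 12002   # e.g., sleep or low power
    SECOND = 12004  # e.g., low power, standby, or sleep
    THIRD = 12006   # worn, but user not authenticated
    FOURTH = 12008  # worn, and user authenticated


# (current state, event) -> next state; the event names are hypothetical.
TRANSITIONS = {
    (State.SECOND, "put_on"): State.THIRD,
    (State.THIRD, "biometric_match"): State.FOURTH,
    (State.THIRD, "password_entered"): State.FOURTH,
    (State.THIRD, "sharing_mode_activated"): State.FOURTH,
    (State.FOURTH, "removed_from_body"): State.SECOND,
}


def next_state(state, event):
    """Return the next state, or the unchanged state if no transition applies."""
    return TRANSITIONS.get((state, event), state)
```

An event with no entry in the table, such as a biometric mismatch in the third state 12006, leaves the state unchanged, which matches the behavior described for a failed authentication.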

Fig. 8A to 8G illustrate examples of how different operations are triggered by input to the input device depending on the current display mode. Fig. 14 is a flowchart of an example method 14000 for triggering different operations by input to an input device depending on a current display mode. The user interfaces in fig. 8A through 8G are used to illustrate the processes described below, including the process in fig. 14.

As shown in the examples in fig. 7B to 11F, content visible via the display generation component 7100 of the computer system 101 is displayed on a touch screen held by the user 7002. In some embodiments, the display generation component 7100 of computer system 101 is a head-mounted display that is worn on the head of user 7002 (e.g., the content shown in fig. 7B-11F as being visible via the display generation component 7100 of computer system 101 corresponds to the field of view of user 7002 when the head-mounted display is worn).

Additional description regarding fig. 7A-7O, 8A-8G, 9A-9D, 10A-10D, 11A-11F, and 12A-12G is provided below with reference to methods 13000, 14000, 15000, 16000, 17000, 18000, and 20000 described with respect to fig. 13-18 and 20 below.

Fig. 13 is a flow diagram (also referred to as a flowchart) of an example method 13000 of displaying a main menu user interface within a three-dimensional environment, according to some embodiments.

In some embodiments, the method 13000 is performed at a computer system (e.g., computer system 101 in fig. 1) that includes a display generation component (e.g., display generation component 120 in fig. 1A, 3, and 4) (e.g., heads-up display, touch screen, or projector) and one or more cameras (e.g., cameras directed downward toward the user's hand (e.g., color sensors, infrared sensors, and other depth sensing cameras) or cameras directed forward from the user's head). In some embodiments, method 13000 is governed by instructions stored in a non-transitory (or transitory) computer-readable storage medium and executed by one or more processors of a computer system, such as one or more processors 202 of computer system 101 (e.g., control 110 in fig. 1A). Some operations in method 13000 are optionally combined and/or the order of some operations is optionally changed.

In some embodiments, the method 13000 is performed at a computer system (e.g., computer system 101 in fig. 1) in communication with a first display generating component (e.g., display generating component 120, display generating component 7100 in fig. 1A, 3, and 4) (e.g., heads-up display, HMD, display, touch screen, projector), one or more audio output devices (e.g., headphones, speakers located in a physical environment, speakers located within the same housing as the first display generating component, or attached to the same support structure as the first display generating component (e.g., built-in speakers of the HMD)), and one or more input devices (e.g., cameras, controllers, touch-sensitive surfaces, joysticks, buttons, gloves, watches, motion sensors, or orientation sensors). In some embodiments, the first display generation component is a user-oriented display component and provides a CGR experience to the user. In some embodiments, the computer system is an integrated device having one or more processors and memory enclosed in the same housing as at least some of the first display generating component, the one or more audio output devices, and the one or more input devices. In some embodiments, the computer system includes a computing component (e.g., a server, a mobile electronic device such as a smart phone or tablet device, a wearable device such as a watch, wristband or earpiece, a desktop computer, or a laptop computer) that includes one or more processors and memory separate from one or more of the display generation component (e.g., a heads-up display, a touch screen, or a stand-alone display), one or more output devices (e.g., an earpiece or external speaker), and one or more input devices. In some embodiments, the display generating component and the one or more audio output devices are integrated and enclosed in the same housing.

Method 13000 is performed at a device that includes, or is in communication with, one or more display generating components and one or more input devices (e.g., buttons, dials, rotatable input elements, switches, movable hardware input devices, or solid state hardware input devices that detect local sensor input, such as intensity or force sensor input; in some embodiments, the device (sometimes referred to herein as a computer system) uses the local sensor input from the solid state hardware input device to trigger a corresponding operation and optionally provides haptic feedback, such as haptic feedback corresponding to the detected input). While the device displays an application user interface via the one or more display generating components (e.g., while application user interface 7018, as shown and described with reference to fig. 7B, is being displayed), the device detects (13002) a first input to an input device of the one or more input devices (e.g., the device detects an input to hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), as shown and described with reference to fig. 7B), the input device being disposed on (e.g., integrated into) a housing of the device that includes the one or more display generating components. In response to detecting (13004) the first input to the input device disposed on the housing of the device (e.g., a solid state button or a hardware button), the device replaces (13006) the display of at least a portion of the application user interface (e.g., ceases the display of at least a portion of the application user interface; ceases the display of application user interface 7018, as shown and described with reference to fig. 7C; or occludes at least a portion of the application user interface with the main menu user interface) by displaying a main menu user interface via the one or more display generating components (e.g., main menu user interface 7110, as shown and described with reference to fig. 7C, which includes application icons (e.g., representations 7112-7126, as shown and described with reference to fig. 7C), widgets, communication options (e.g., representations 7138, 7140, and 7142, as shown and described with reference to fig. 7D), and/or affordances for displaying an extended reality (XR) context (e.g., representations 7144 and 7146, as shown and described with reference to fig. 7E)). For example, the main menu user interface is superimposed over the application user interface (UI), and objects in the main menu UI (e.g., application icons, virtual UI icons, and other objects) are opaque or partially transparent, thereby blocking or occluding the corresponding portions of the application UI (e.g., those portions of the application UI that are positioned behind the main menu UI). In some embodiments, the main menu UI includes an album having a plurality of objects on the album, and the album is opaque or partially transparent, thereby blocking or occluding those portions of the application UI that are positioned behind the main menu UI. While the main menu user interface is displayed via the one or more display generating components, the device detects (13008) a second input to the input device disposed on the housing of the device (e.g., the device detects an input to hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), as shown and described with reference to fig. 7H). In response to detecting (13010) the second input to the input device disposed on the housing of the device, the device cancels (13012) the main menu user interface (e.g., ceases display of main menu user interface 7110, as shown and described with reference to fig. 7J, or ceases display of at least a portion of the main menu user interface).
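The core of method 13000, in which the same housing-mounted input device first summons and then cancels the main menu user interface, can be sketched as a simple toggle. The class and attribute names below are hypothetical, and the sketch deliberately does not model what is shown after the main menu is cancelled, since that depends on the embodiment.

```python
class DisplayModel:
    """Hypothetical model of what the display generating components show."""

    def __init__(self):
        self.application_ui_visible = True
        self.main_menu_visible = False

    def on_housing_input(self):
        """Handle one input to the input device disposed on the housing."""
        if self.main_menu_visible:
            # Second input (13008-13012): cancel the main menu user interface.
            self.main_menu_visible = False
        else:
            # First input (13002-13006): the main menu user interface replaces
            # the display of the application user interface.
            self.main_menu_visible = True
            self.application_ui_visible = False
```

Calling on_housing_input once displays the main menu in place of the application user interface; calling it again cancels the main menu, so no additional on-screen control is needed for either step.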

Using a single input to an input device (e.g., hardware input element 7108 (e.g., a button, crown, or depressible input element), as shown and described with reference to fig. 7B, disposed on the housing of the one or more display generating components 7100 through which portions of the physical environment and the virtual environment are made visible) provides intuitive top-level access to different sets of representations (e.g., a set of representations of applications, as shown and described with reference to fig. 7C; a set of representations of people with whom the user 7002 may initiate or maintain communication, as shown and described with reference to fig. 7D; and a set of representations of selectable virtual environments, as shown and described with reference to fig. 7E) without displaying additional controls (e.g., without requiring the user to navigate through user interface elements), thereby improving the operational efficiency of user-machine interactions based on a single input. Using a single input to an input device (e.g., hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), as shown and described with reference to fig. 7B) reduces the amount of time required to navigate within or transition out of the virtual environment. The physical location of the input device provides an intuitive and reliable mechanism (e.g., a tactile touch/mechanical actuation mechanism) for receiving user input, which improves the reliability and operational efficiency of the device (e.g., a computer system).

In some embodiments, the method 13000 includes providing, to a user of the device, a passthrough portion of the physical environment of the device or of one of the one or more display generating components concurrently with displaying the application user interface (e.g., displaying the passthrough portion that includes representation 7014' while also displaying the mini-player application user interface 7154, as shown and described with reference to fig. 7M), and optionally also displaying a portion of the virtual environment. For example, in some embodiments, the application user interface is displayed on or in front of a portion of a mixed reality environment that includes both a passthrough portion of the physical environment of the device (or of the physical environment of one of the one or more display generating components) and a virtual element (e.g., a portion of the virtual environment, such as box 7016, as shown and described with reference to fig. 7M) that is different from the application user interface.

In some embodiments, the device is a head-mounted device that includes the input device and the one or more display generating components, and the method includes generating a user interface that is visible to a user when the head-mounted device is positioned on the user's head so as to cover the user's eyes. In some embodiments, the device is a tablet or other computer system having one or more integrated cameras and integrated displays (e.g., camera 10010 integrated on the tablet, as shown and described with reference to fig. 10C). In some embodiments, the input device is provided on a housing of the head-mounted device (e.g., integrated into the housing, such as hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) or button 7508, as shown and described with reference to fig. 10A-10D) rather than on a separate controller.

The physical location of the input device on the head-mounted device facilitates direct user control of the head-mounted device (e.g., the user does not have to hold a separate physical controller, so one or both of the user's hands remain free) without displaying additional controls, and provides an intuitive and reliable mechanism for receiving user input (e.g., tactile touch/mechanical actuation of hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) or button 7508, as shown and described with reference to fig. 10A-10D), which improves the reliability and operational efficiency of the head-mounted device.

In some embodiments, the main menu user interface is presented substantially in a central portion of the field of view of the user of the device (e.g., in the middle portion 7104 of the virtual environment 7000, as shown and described with reference to fig. 7B-7C, 7H-7I, 7L-7M, 8C, 9B, and 11D; not below the display of the application; and in the central portion of the user's field of view along the user's gaze direction).

Presenting the main menu user interface substantially in the central portion of the field of view of the user of the device avoids the need for further input (e.g., lowering or raising the user's gaze, visually searching for the main menu user interface, and/or tilting or rotating the user's head to focus on the main menu user interface) and reduces the amount of time required to begin navigating within the main menu user interface, thereby increasing the operational efficiency of the device (e.g., computer system).

In some embodiments, the input device is a hardware button or a solid state button (e.g., button or rotatable input element 7108 as shown and described with reference to fig. 7A-7O, or button 7508 as shown and described with reference to fig. 10A).

Hardware buttons or solid state buttons provide an efficient mechanism for a user to transition out of or navigate within a virtual environment without displaying additional controls (e.g., browsing user interface elements), which increases the reliability and operational efficiency of the device (e.g., computer system). Solid state buttons reduce the number of moving parts and allow the system to be reconfigurable (e.g., through firmware updates that allow the solid state buttons to provide different feedback, provide other functionality, receive additional types of input), thereby improving the performance and efficiency of the device (e.g., computer system).

In some embodiments, the device detects a rotational input to the hardware button, and in response to detecting the rotational input, the device performs a second operation (e.g., changing the level of immersion presented to a user of the device, as shown and described with reference to fig. 11B-11D; changing the volume of audio provided to the user of the device; scrolling through a user interface presented to the user of the device) that is different from displaying or canceling the main menu user interface.

Providing multiple system operations (e.g., application-specific operations) in response to different inputs to a single input device reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may implement M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device, and instead the processor may be programmed to interpret different inputs from a fewer number of input devices.
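The idea that one input device can serve several operations (N input devices implementing M operations, with N < M) can be sketched as a dispatcher that distinguishes press inputs from rotational inputs. The event format, the state keys, and the choice of immersion level as the rotation-controlled quantity are assumptions made for illustration; the text lists volume and scrolling as alternatives.

```python
def handle_input(event, state):
    """Dispatch one event from a single rotatable, depressible input element.

    event: dict with a hypothetical "kind" key ("press" or "rotate") and,
           for rotations, a signed "delta".
    state: dict with "main_menu_visible" (bool) and "immersion" (0.0 to 1.0).
    """
    if event["kind"] == "press":
        # Press inputs display or cancel the main menu user interface.
        state["main_menu_visible"] = not state["main_menu_visible"]
    elif event["kind"] == "rotate":
        # Rotational inputs perform a different operation; here the
        # immersion level is adjusted and clamped to the range [0.0, 1.0].
        level = state["immersion"] + event["delta"]
        state["immersion"] = max(0.0, min(1.0, level))
    return state
```

In this sketch a single physical control (N = 1) drives two operations (M = 2), and further operations could be added by interpreting additional input kinds in software rather than by adding hardware.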

In some embodiments, in response to detecting the first input to the input device, the device cancels the application user interface prior to or concurrent with displaying the main menu user interface (e.g., cancel application user interface 7018 when main menu user interface 7110 is displayed, as shown and described with reference to fig. 7B-7C; cancel resized application user interface 8002 when main menu user interface 7110 is displayed, as shown and described with reference to fig. 8B-8C; and cancel application user interfaces 9004, 9006, and 9008 when main menu user interface 7110 is displayed, as shown and described with reference to fig. 9A-9B).

Using the first input to the input device to cancel the application user interface prior to or concurrent with displaying the main menu user interface makes the main menu user interface easier to focus. Instead of requiring the user to individually instruct to turn off the application and/or to navigate to a special user interface control element to manually select the display of the main menu user interface, responding to the first input by both canceling the application and bringing the main menu user interface into focus without having to display additional controls increases the operating efficiency of the device and more efficiently utilizes the user's time.

In some embodiments, the device generates and displays a first user interface object associated with the application user interface (e.g., a "speed" object dragged out of the application) before detecting a first input to an input device of the one or more input devices, and in response to detecting the first input to the input device (e.g., an input device disposed on a housing of the device, such as rotatable input element 7108 or button 7508, as shown in fig. 7A and 10A), the device cancels the application user interface while maintaining display of the first user interface object (e.g., the "speed" object is retained (continues to be displayed) when the application is cancelled).

Providing a first user interface object (e.g., the first user interface object is an instance of an application or an object extracted or dragged from an application, sometimes referred to herein as a "speed object") allows a user to continue using the application (e.g., using an instance of the application) or to keep data associated with the application displayed even after the primary application is cancelled (e.g., the speed object is an instance copied from the application). Maintaining the display of such user interface objects allows the user to continue controlling the application (e.g., while navigating through the main menu user interface) while multitasking, without displaying additional controls. The multitasking functionality is not affected by the presence of the main menu user interface triggered by the first input, thereby improving the performance and efficiency of the device (e.g., computer system).

In some implementations, prior to detecting the first input, the device generates and displays a first user interface object associated with the application user interface by extracting the first user interface object from the application user interface based on a third input (e.g., a user gesture that includes pulling the first user interface object out of the application user interface) directed to the application user interface (e.g., corresponding to or on the application user interface).

Providing a first user interface object (e.g., the first user interface object is an object extracted or dragged from an application, sometimes referred to herein as an "overview object") allows a user to continue using the application (e.g., an instance of the application) or to keep data associated with the application displayed even after the main menu user interface is displayed (e.g., the overview object is an instance copied from the application). Maintaining the display of such user interface objects allows the user to continue controlling the application (e.g., while navigating through the main menu user interface) while multitasking, without displaying additional controls. The multitasking functionality is not affected by the presence of the main menu user interface triggered by the first input, thereby improving the performance and efficiency of the device (e.g., computer system).

In some implementations, in response to detecting the second input (e.g., the second button press), the device cancels both the first user interface object and the main menu user interface.

Using a single input (e.g., a second button press) to cancel both the first user interface object and the main menu user interface eliminates the need to display additional controls. The user does not have to spend time separately closing the first user interface object and/or navigating to a particular user interface control element to close the first user interface object manually, thereby improving performance and operational efficiency of the device (e.g., computer system).
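The lifetime of the first user interface object across the two button presses can be sketched as follows. The Scene class and its attribute names are hypothetical, and the sketch models only the embodiment in which the second input cancels both the first user interface object and the main menu user interface.

```python
class Scene:
    """Hypothetical model of the displayed user interface elements."""

    def __init__(self, extracted_objects):
        self.application_ui_visible = True
        self.main_menu_visible = False
        # Objects previously extracted (dragged out) of the application UI.
        self.extracted_objects = list(extracted_objects)

    def press(self):
        if not self.main_menu_visible:
            # First input: the application user interface is cancelled and the
            # main menu is displayed; extracted objects remain displayed.
            self.application_ui_visible = False
            self.main_menu_visible = True
        else:
            # Second input: the main menu and the extracted objects are
            # cancelled together.
            self.main_menu_visible = False
            self.extracted_objects.clear()
```

After the first press the extracted object is still listed alongside the main menu; after the second press neither remains, without any separate close control having been used.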

In some embodiments, while the device displays the main menu user interface and the first user interface object (e.g., the mini-player user interface 7154, as shown and described with reference to fig. 7M) via the one or more display generating components, the device detects a fourth input directed to (e.g., corresponding to or on) a representation of a second application (e.g., different from the first application, as shown and described with reference to fig. 7M) displayed on the main menu user interface (e.g., the representation 7126 of the web browsing application) and, in response to detecting the fourth input, displays an application user interface of the second application concurrently with displaying the first user interface object (e.g., the application user interface 7178 of the web browsing application is displayed concurrently with the mini-player user interface 7154, as shown and described with reference to fig. 7L-7N).

Launching the second application from the main menu user interface when the first user interface object is displayed eliminates the need to display additional controls. Maintaining the display of the first user interface object provides a visual alert to the user that can facilitate selection of the appropriate second application. In some cases, the displayed first user interface object provides information that can be used in the second application without requiring the user to restart the first application after starting the second application, thereby allowing multiple tasks to be completed simultaneously, thereby improving performance and operating efficiency of the device (e.g., computer system).

In some implementations, the device detects a fifth input (e.g., a drag input) that moves a first user interface object (e.g., a video clip, an audio clip, a text file, or a message) onto an application user interface (e.g., a message or document) of a second application, and in response to detecting the fifth input, the device performs an operation in the second application (e.g., adds the video clip to the message or document) based on the first user interface object.

The ability to drag the first user interface object directly into the second application allows operations in the second application to be performed based on the first user interface object without displaying additional controls. Dragging the first user interface object allows for more direct and efficient user-machine interaction than having to sequentially open a particular application through the main menu user interface, and furthermore, the first user interface object is displayed and easily accessible when the user interacts with the second application, thereby improving performance and operating efficiency of the device (e.g., computer system).
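The drag interaction described above can be sketched as a drop handler on the second application. The class names, the object-kind labels, and the choice of attaching the dropped item as the performed operation are assumptions for illustration, following the example of adding a video clip to a message or document.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedObject:
    kind: str  # hypothetical label, e.g., "video", "audio", "text", "message"
    name: str


@dataclass
class MessageDraft:
    """Hypothetical second application that accepts dropped objects."""
    attachments: list = field(default_factory=list)

    def accept_drop(self, obj):
        """Perform an operation based on the dropped object; return success."""
        if obj.kind in {"video", "audio", "text", "message"}:
            self.attachments.append(obj.name)
            return True
        return False
```

Dropping a video clip onto the draft attaches it directly, without reopening the first application or displaying additional controls.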

In some embodiments, the device cancels the main menu user interface by replacing the display of the main menu user interface with a presentation of a passthrough portion of the physical environment of the device via the one or more display generating components (e.g., main menu user interface 7110 is cancelled when the passthrough portion containing representation 7014' is presented, as shown and described with reference to fig. 7K). In some embodiments, the passthrough portion is an optical passthrough, in which a portion of the head-mounted display or heads-up display is made translucent or transparent so that the user can view the real world around the user through it without removing the head-mounted display or moving away from the heads-up display. In some embodiments, the passthrough portion gradually transitions from translucent or transparent to completely opaque as the virtual or mixed reality environment is displayed. In some embodiments, the passthrough portion is a virtual passthrough, in which a portion of the display generating component displays a live feed of images or video of at least a portion of the physical environment (e.g., representation 7014' of physical table 7014 is displayed in a virtual passthrough) captured by one or more cameras (e.g., a rear camera of a mobile device, a camera associated with the head-mounted display, or another camera that feeds image data to the device), as shown and described with reference to fig. 7K.

Canceling the main menu user interface by replacing the display of the main menu user interface with a presentation of a passthrough portion of the physical environment of the head-mounted device via a display generation component improves user safety: after the user has finished navigating the main menu user interface, the passthrough portion keeps the user aware of the physical environment of the device (e.g., so that the user can respond to an emergency or other situation requiring the user's attention or interaction with the physical environment). Using the second input to activate the display of the passthrough portion allows the user to exit from the virtual environment and view at least a portion of the physical environment without displaying additional controls.

Canceling the main menu user interface includes ceasing to display the virtual environment in which the main menu user interface is displayed (e.g., the virtual environment includes virtual content that is computer-generated content that is different from the transparent portion of the physical environment).

Ceasing to display the virtual environment when canceling the main menu user interface causes the second input to function similarly to an input to an escape button, allowing the user to exit from the virtual environment (e.g., cancel the display of the virtual environment) and view at least a portion of the physical environment without displaying additional controls.

In some embodiments, the device detects a sixth input on the representation of the first virtual environment displayed in the main menu user interface and, in response to detecting the sixth input, the device replaces any currently displayed virtual environment with the first virtual environment (e.g., in response to a user selection directed to representation 7144, as shown and described with reference to FIG. 8C-8E, the virtual environment depicting an office environment that includes office table 7148 surrounded by office chairs, as shown and described with reference to FIG. 8C, is replaced with a virtual environment depicting beach scenery).

Displaying a main menu user interface that provides quick access to a set of selectable virtual environments provides a way to change the user's virtual experience without displaying additional controls, thereby minimizing the number of inputs required to select a desired virtual environment, thereby improving performance and operating efficiency of the device (e.g., computer system).

In some embodiments, the device displays representations of software applications capable of executing on the device (e.g., representations 7112-7126, as shown and described with reference to FIG. 7M) in the main menu user interface, detects a seventh input directed to a respective representation of the representations of software applications displayed in the main menu user interface (e.g., a user input directed to representation 7126, as shown and described with reference to FIG. 7M), and, in response to detecting the seventh input directed to the respective representation, displays an application user interface of the corresponding software application (e.g., in a perspective of the three-dimensional environment, such that the software application corresponding to the representation runs in the perspective as the focused application; e.g., application user interface 7178 is displayed, as shown and described with reference to FIG. 7N).

Allowing a single input to trigger the display of the main menu user interface allows the user to quickly access and navigate the set of applications in the main menu user interface, regardless of what is in progress (e.g., while the first application is running), without displaying additional controls. This minimizes the number of inputs required to select the desired operation and improves the performance and operating efficiency of the device (e.g., computer system).

In some embodiments, the device displays a first representation of the first person and a second representation of the second person in the main menu user interface, the first representation and the second representation being used to initiate (e.g., or continue) communication with the first person and the second person, respectively (e.g., as shown and described with reference to FIG. 7C), detects an eighth input directed to the first representation of the first person, and in response to detecting the eighth input directed to the first representation of the first person, displays a communication user interface for initiating a communication session with the first person.

In some embodiments, the device detects a ninth input directed to a representation of the collection displayed in the main menu user interface and, in response to detecting the ninth input directed to the representation of the collection, displays a representation of one or more virtual three-dimensional environments or one or more augmented reality environments (e.g., as shown and described with reference to FIG. 7E and FIGS. 8C-8E).

Allowing a single input to trigger the display of the main menu user interface allows the user to quickly access and navigate the set of representations to change the user's virtual environment, regardless of what is in progress (e.g., while the first application is running), without displaying additional controls. This minimizes the number of inputs required to select the desired operation, improving performance and operating efficiency of the device (e.g., computer system).

In some implementations, when the main menu user interface is displayed, the device detects a tenth input (e.g., a hand gesture, a gaze input, or a rotational input to a rotatable button, such as a user input of a hand movement provided by the hand 7020, as shown and described with reference to FIG. 7H), and in response to detecting the tenth input, the device scrolls through the main menu user interface based on the tenth input (e.g., based on a duration, a magnitude, or a speed of the tenth input) such that first content in at least a portion of the main menu user interface is replaced by second content (e.g., representations 7112-7126 are replaced by representations 7156-7174 as a second page of representations of applications, as shown and described with reference to FIGS. 7H-7I, by page-by-page scrolling or by continuous scrolling).

Scrolling through the main menu user interface allows the user to browse through a large number of items without being overwhelmed by too many items being presented at the same time, thereby assisting in timely selection of desired operations without displaying additional controls. Furthermore, providing a scrollable main menu user interface in response to a first input effectively provides a greater range of applications, people, virtual environments, or other operations than is possible with a static, non-scrollable main menu user interface.
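
The page-by-page scrolling described above can be illustrated with a minimal sketch. The function names `paginate` and `scroll_page`, the page size of eight, and the `app-N` labels are assumptions made for illustration only and do not appear in the embodiments.

```python
from typing import List, Sequence


def paginate(items: Sequence[str], page_size: int) -> List[List[str]]:
    """Split the menu items into fixed-size pages (the page size is an assumption)."""
    return [list(items[i:i + page_size]) for i in range(0, len(items), page_size)]


def scroll_page(current: int, magnitude: int, page_count: int) -> int:
    """Move by `magnitude` pages (negative scrolls back), clamped to valid pages."""
    return max(0, min(page_count - 1, current + magnitude))


# Hypothetical labels standing in for the application representations.
items = [f"app-{n}" for n in range(1, 17)]
pages = paginate(items, page_size=8)

# A scroll input of magnitude 1 replaces the first page with the second page.
visible = pages[scroll_page(current=0, magnitude=1, page_count=len(pages))]
```

Clamping keeps an oversized scroll input from moving past the last page; a continuous-scrolling variant would track a fractional offset instead of a page index.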

In some embodiments, when a main menu user interface having a first section is displayed, the device detects an eleventh input (e.g., a hand gesture, a gaze input, or a rotational input to a rotatable button, such as a user input directed to tab 7134, as shown and described with reference to FIG. 7C, or a user input directed to tab 7136, as shown and described with reference to FIG. 7D), and in response to detecting the eleventh input, the device displays a second section of the main menu user interface based on the eleventh input (e.g., displays a set of representations of people with whom the user 7002 may initiate or continue communication in response to the user input directed to tab 7134, as shown and described with reference to FIG. 7D, or displays a set of representations of selectable virtual environments in response to the user input directed to tab 7136, as shown and described with reference to FIG. 7E). Each section corresponds to a different set of selectable options (e.g., a set of representations of software applications capable of executing on the device, as shown and described with reference to FIGS. 7C, 7H, and 7I, or a set of representations of one or more virtual three-dimensional environments or one or more augmented reality environments).

Allowing a single input to trigger the display of the main menu user interface allows a user to quickly access and navigate the collection of applications in the main menu user interface and/or to change the user's virtual environment and/or to interact with additional users, regardless of what process is in progress (e.g., while the first application is running) without displaying additional controls, minimizing the number of inputs required to select a desired operation, improving performance and efficiency of the device (e.g., computer system). Furthermore, providing a main menu user interface having sections navigable by a user in response to a first input effectively provides the user with a greater range of applications, people, virtual environments, or other operations than is possible with a static main menu user interface.

In some embodiments, when a first section of the main menu user interface is displayed (e.g., each section corresponds to a different respective set of selectable options, such as a first set of representations of software applications capable of executing on the device and a second set of representations of one or more virtual three-dimensional environments or one or more augmented reality environments), the device detects a twelfth input to the input device disposed on the housing of the device and, in response to detecting the twelfth input, cancels the main menu user interface (e.g., although not shown, after the set of representations of virtual environments is displayed, as shown and described with reference to FIG. 7E, the main menu user interface 7110 displaying that set is dismissed in response to a user input (e.g., a press input) to the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element); when the user provides a next user input to the hardware input element 7108 within a first time period (e.g., within 5 minutes or within 1 minute), the main menu user interface 7110 again displays the set of representations of virtual environments, as shown and described with reference to FIG. 7E).
In some embodiments, while the main menu user interface 7110 is not displayed, the device detects a thirteenth input to the input device disposed on the housing of the device (e.g., a press input to the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)) and, in response to detecting the thirteenth input, the device displays the first section of the main menu user interface based on the thirteenth input (e.g., the first section includes a set of representations of applications, as shown and described with reference to FIG. 7C, and is displayed as shown by the transition from FIG. 7K to FIG. 7L).

Retaining information about the last accessed section on the main menu user interface reduces interference, allowing the user to quickly return to the previously accessed section of the main menu user interface without displaying additional controls when the user accesses the main menu user interface after leaving the main menu user interface. This feature helps save the user time, avoiding the need to re-browse various sections of the main menu user interface to return to previously accessed sections of the main menu user interface, for example, when the user briefly leaves the main menu user interface to perform a different operation, such as an operation in a particular application.

In some embodiments, in accordance with a determination that the time difference between detecting the twelfth input and detecting the thirteenth input is within a time threshold (e.g., the next day, the next session, or an hour, the time threshold optionally being dependent on a section of the main menu user interface (e.g., the application section being reset within a smaller time threshold than the people/contacts section)), the device displays a first section of the main menu user interface based on the thirteenth input (e.g., the first section of the main menu user interface is the same section that has been displayed before the user left the main menu user interface), and in accordance with a determination that the time difference exceeds the time threshold, the device resets the display of the main menu user interface to a predetermined section (e.g., the first page of the application or the first page of the contact).

Retaining information about the last accessed section on the main menu user interface reduces interference, allowing the user to quickly return to the previously accessed section of the main menu user interface without displaying additional controls when the user accesses the main menu user interface within a preset time threshold after leaving the main menu user interface. This feature helps save the user time, avoiding the need to re-browse various sections of the main menu user interface to return to previously accessed sections of the main menu user interface when the user briefly leaves the main menu user interface to perform a different operation, such as an operation in a particular application.
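
The time-threshold behavior described above can be sketched as follows. This is an illustrative model only: the class `MainMenuState`, the section names, and the threshold values are assumptions; the embodiments state only that the threshold may depend on the section (for example, the applications section resetting sooner than the people section).

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical per-section thresholds, in seconds.
RESET_THRESHOLDS: Dict[str, float] = {
    "applications": 60.0,
    "people": 300.0,
    "environments": 300.0,
}
DEFAULT_SECTION = "applications"  # assumed predetermined section


@dataclass
class MainMenuState:
    last_section: str = DEFAULT_SECTION
    dismissed_at: Optional[float] = None  # time of the dismissing (twelfth) input

    def dismiss(self, section: str, now: float) -> None:
        """Record which section was visible when the menu was dismissed."""
        self.last_section = section
        self.dismissed_at = now

    def section_on_reopen(self, now: float) -> str:
        """Pick the section to show for the reopening (thirteenth) input."""
        if self.dismissed_at is None:
            return DEFAULT_SECTION
        elapsed = now - self.dismissed_at
        threshold = RESET_THRESHOLDS.get(self.last_section, 0.0)
        # Within the threshold: restore the last section; otherwise reset.
        return self.last_section if elapsed <= threshold else DEFAULT_SECTION


state = MainMenuState()
state.dismiss("environments", now=100.0)
soon = state.section_on_reopen(now=160.0)   # 60 s later, within 300 s
later = state.section_on_reopen(now=500.0)  # 400 s later, past 300 s
```

Keeping the threshold lookup keyed by section is one simple way to let different sections reset on different schedules.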

In some embodiments, while an application user interface is displayed via the one or more display generating components, the application user interface including a first application user interface of a media content playback application (e.g., application user interface 7152, as shown and described with reference to FIG. 7F, or application user interface 110002, as shown and described with reference to FIG. 11C), and while media content is played using the media content playback application, the device detects a first input to an input device disposed on (e.g., integrated into) a housing of the device that includes the one or more display generating components. In response to detecting the first input to the input device, the device displays a main menu user interface (e.g., the main menu user interface includes application icons, gadgets, communication options, and/or affordances for displaying XR contexts) via the one or more display generating components, and replaces display of the first application user interface of the media content playback application with a second application user interface of the media content playback application that is smaller than the first application user interface (e.g., main menu user interface 7110 and mini-player user interface 7154 are displayed, as shown and described with reference to FIG. 7H, or main menu user interface 7110 and mini-player user interface 11012 are displayed in conjunction with dismissing the media application and displaying the mini-player, as shown and described with reference to FIG. 11D).

Providing a way for a user to multitask and continue playing media content while navigating to a main menu user interface increases the performance and efficiency of a computer system. Providing a mini-player allows a user to continue to control playback of media content and/or to indicate to the user the current "location" of media content playback (e.g., by displaying a time index corresponding to the current content media playback location or, for video content, displaying a video frame at the current location of video content) as the user navigates through the main menu user interface without displaying additional controls.

In some embodiments, replacing the display of the first application user interface of the media content playback application with the second application user interface of the media content playback application includes displaying a media player (e.g., a video picture-in-picture (PiP) player, such as the mini-player user interface 11012 shown and described with reference to FIG. 11D, or an audio player, such as the mini-player user interface 7154 in FIGS. 7G-7I), and the second application user interface of the media content playback application includes one or more of: a representation of the media content being played by the media content playback application (e.g., a name or other identifier of the currently played media content, a time index corresponding to a current media playback location, or, for video content, a video frame at a current location in the video content, such as in the mini-player user interface 11012 in FIG. 11D), and playback controls for the media content playback application (e.g., pause, play, fast forward, or rewind, such as in the mini-player user interface 7154 in FIG. 7H or the mini-player user interface 11012 in FIG. 11D).

Providing a means for a user to multitask and continue the media experience (at least with some capability) while navigating virtually through the main menu user interface increases the performance and efficiency of the device (e.g., computer system). Providing a video picture-in-picture (PiP) player or displaying an audio mini-player allows a user to control the media experience (e.g., by providing playback control in the mini-player) and/or indicate to the user the current "location" of the user's media experience (e.g., by displaying a time index, or for video content, displaying a representation of the current video frame) when navigating through a main menu user interface, without displaying additional controls.

In some embodiments, in response to detecting a second input to the input device while the main menu user interface is displayed, the device cancels the main menu user interface and continues to display a second application user interface of the media content playback application (e.g., the mini-player user interface 7154 persists after the main menu user interface 7110 is hidden, as shown and described with reference to fig. 7H-7J).

Having the mini-player persist after the main menu user interface is canceled provides an uninterrupted media experience even after navigation in the virtual environment via the main menu user interface has ended, thereby improving the operational efficiency of the device (e.g., computer system) (e.g., the user does not need to navigate the main menu user interface and then restart the media application after canceling the main menu user interface).
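
The mini-player behavior described in the preceding paragraphs can be sketched as a small state model. The `Display` fields and the `press_button` function are hypothetical names used only for illustration, and the final fallback branch is an assumption about the case where no media is playing.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Display:
    full_player: bool = False  # first (full-size) media application UI
    mini_player: bool = False  # second (smaller) media application UI
    main_menu: bool = False
    playing: bool = False


def press_button(d: Display) -> Display:
    """Model one press of the housing-mounted input device."""
    if d.main_menu:
        # Second input: dismiss the menu; any mini-player keeps showing.
        return replace(d, main_menu=False)
    if d.full_player and d.playing:
        # First input during playback: show the menu, shrink the player.
        return replace(d, full_player=False, mini_player=True, main_menu=True)
    # Assumed fallback: just show the menu.
    return replace(d, main_menu=True)


start = Display(full_player=True, playing=True)
with_menu = press_button(start)          # menu plus mini-player
after_dismiss = press_button(with_menu)  # mini-player persists
```

Playback state is never touched by either press, which mirrors the uninterrupted media experience described above.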

In some embodiments, the device detects a first number of inputs to an input device disposed on a housing of the device within a first period of time (e.g., two presses in quick succession, two presses within 1 second, or two presses within 0.5 seconds or less), and in response to detecting the first number of inputs within the first period of time, the device displays an application management user interface (e.g., in response to detecting two press inputs in quick succession, the device displays a system user interface 7180, such as a force quit menu or a multitasking user interface, as shown and described with reference to FIG. 7O and Table 3).

Using different types of inputs on a single input device to trigger multiple system operations (e.g., to trigger non-application specific operations) reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may implement M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device, and instead the processor may be programmed to interpret different inputs from a fewer number of input devices. Using the same user input device, the user can quickly reach the application management user interface without having to present additional/intermediate controls.
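
As a hedged illustration of how one input device can drive several operations, the following sketch groups press timestamps into single and double presses. The function `classify_presses`, the operation names, and the choice of a 0.5-second window (one of the example values above) are assumptions for illustration.

```python
from typing import Dict, List

DOUBLE_PRESS_WINDOW = 0.5  # seconds; assumed from the example values in the text


def classify_presses(timestamps: List[float],
                     window: float = DOUBLE_PRESS_WINDOW) -> List[str]:
    """Group ascending press timestamps into 'single' and 'double' events."""
    events: List[str] = []
    i = 0
    while i < len(timestamps):
        has_next = i + 1 < len(timestamps)
        if has_next and timestamps[i + 1] - timestamps[i] <= window:
            events.append("double")
            i += 2  # both presses are consumed by the double press
        else:
            events.append("single")
            i += 1
    return events


# Hypothetical mapping: one input device (N = 1) triggers two operations (M = 2).
OPERATIONS: Dict[str, str] = {
    "single": "show_main_menu",
    "double": "show_application_management_ui",
}

events = classify_presses([0.0, 0.3, 2.0])
actions = [OPERATIONS[e] for e in events]
```

A real device would typically wait for the window to expire before committing to a single press; this offline classifier sidesteps that by working on a recorded list of timestamps.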

In some embodiments, when a system user interface (e.g., system user interface 7180, as shown and described with reference to FIG. 7O) is displayed via the one or more display generating components, the device detects a corresponding input to the input device disposed on (e.g., integrated into) the housing of the device, the corresponding input being of the same type as the first input to the input device, and in response to detecting the corresponding input while the system user interface is displayed, the device replaces display of at least a portion of the system user interface by displaying the main menu user interface via the one or more display generating components (e.g., although not shown in FIG. 7O, in response to detecting a user input, such as a press input to the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), the device dismisses the system user interface 7180 and presents the main menu user interface 7110, which includes application icons, gadgets, communication options, and/or affordances for displaying XR context, as shown and described with reference to FIG. 7C).

Displaying the main menu user interface in response to detecting a corresponding input of the same type as the first input, regardless of which user interface (e.g., a system user interface or an application user interface) is currently displayed, streamlines (e.g., standardizes) display of the main menu user interface, reduces the number of different control elements required by the device, and allows the user to view different sets of representations (e.g., representations of applications, people, and/or virtual environments) without displaying additional controls.

In some embodiments, after canceling the main menu user interface, and when the main menu user interface is not displayed, the device detects a fourteenth input to an input device disposed on a housing of the device. In response to detecting a fourteenth input to an input device disposed on the housing of the device, the device redisplays the main menu user interface (e.g., the main menu user interface includes application icons, gadgets, communication options, and/or affordances for displaying XR context) via the one or more display generation components.

The additional input enables the main menu user interface to be redisplayed after it has been cancelled without displaying additional controls. Allowing additional inputs to redisplay the main menu user interface provides a simple way for the user to return to the main menu user interface based on a single input, regardless of what process the user may have used on the device after canceling the main menu user interface. The input serves as a generic mechanism that enables a user to navigate directly to the top-level main menu user interface and then browse through different sets of representations (e.g., representations of applications, people, and/or virtual environments) in the main menu user interface without displaying additional controls.

FIG. 14 is a flow diagram (also referred to as a flowchart) of an example method 14000 for performing different operations based on an input to an input device depending on a current display mode, according to some embodiments.

In some embodiments, the method 14000 is performed at a computer system (e.g., computer system 101 in fig. 1) that includes a display generation component (e.g., display generation component 120 in fig. 1A, 3, and 4) (e.g., heads-up display, touch screen, or projector) and one or more cameras (e.g., cameras directed downward toward the user's hand (e.g., color sensors, infrared sensors, and other depth sensing cameras) or cameras directed forward from the user's head). In some embodiments, method 14000 is managed by instructions stored in a non-transitory computer-readable storage medium and executed by one or more processors of a computer system, such as one or more processors 202 of computer system 101 (e.g., control unit 110 in fig. 1A). Some operations in method 14000 are optionally combined and/or the order of some operations is optionally changed.

In some embodiments, the method 14000 is performed at a computer system (e.g., computer system 101 in fig. 1) in communication with a display generating component (e.g., display generating component 120 or display generating component 7100 in fig. 1A, 3, and 4) (e.g., heads-up display, HMD, display, touch screen, or projector) and one or more input devices (e.g., camera, controller, touch-sensitive surface, joystick, button, glove, watch, motion sensor, orientation sensor, and/or rotatable input mechanism, such as a crown). In some embodiments, the display generation component is a user-oriented display component and provides an XR experience to the user. In some embodiments, the computer system is an integrated device having one or more processors and memory enclosed in the same housing as at least some of the first display generating component, the one or more audio output devices, and the one or more input devices. In some embodiments, the computer system includes a computing component (e.g., a server, a mobile electronic device such as a smart phone or tablet device, a wearable device such as a watch, wristband or earpiece, a desktop computer, or a laptop computer) that includes one or more processors and memory separate from one or more of the display generation component (e.g., a heads-up display, a touch screen, or a stand-alone display), one or more output devices (e.g., an earpiece or external speaker), and one or more input devices. In some embodiments, the display generating component and the one or more audio output devices are integrated and enclosed in the same housing.

The method 14000 includes the computer system detecting (14002) a first input to an input device of the one or more input devices while the application user interface is displayed via the display generation component (e.g., detecting a press input to a hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element) while the application user interface 8000 is displayed, as shown and described with reference to fig. 8A). In response to detecting (14004) the first input to the input device, in accordance with a determination that the application user interface is in a first display mode (e.g., the application user interface 8000 is in the first display mode (e.g., the full immersion mode), as shown and described with reference to fig. 8A), the computer system displays (14006) the application user interface (e.g., displays the resized application user interface 8004, as shown and described with reference to fig. 8B) via the display generation component in a second display mode, wherein the first display mode includes an immersive mode that displays only content of the application user interface (e.g., displays content of the application user interface within a field of view of the user, without displaying content other than the content of the application user interface, and/or the content of the application user interface occupies substantially all of the field of view of the user), wherein the second display mode includes a non-immersive mode that simultaneously displays corresponding content of the application user interface and other content (e.g., displays the content of the application user interface and the content other than the content of the application user interface within the field of view of the user, the content of the application user interface occupying only a portion of the field of view of the user). 
On the other hand, in response to detecting (14004) the first input to the input device, in accordance with a determination that the application user interface is in the second display mode (e.g., a press input to the hardware input element 7108 (e.g., button, crown, or rotatable and pressable input element) is detected while the resized application user interface 8004 is displayed, as shown and described with reference to FIG. 8B), the computer system replaces (14008) the display of at least a portion of the application user interface by displaying, via the display generating component, a main menu user interface (e.g., main menu user interface 7110, as shown and described with reference to FIG. 7C), the main menu user interface providing access to different sets of user-navigable items, including applications, people or contact lists, and virtual environments. Optionally, the main menu user interface includes application icons, gadgets, communication options, and/or affordances for displaying XR context. For example, the main menu UI is overlaid on the application UI. Optionally, the objects in the main menu UI (e.g., application icons, virtual UI icons, and other objects) are opaque or partially transparent, thereby blocking or obscuring corresponding portions of the application UI (e.g., those portions of the application UI that are positioned behind the main menu UI). In some embodiments, the main menu UI includes an album having a plurality of objects on the album, and the album is opaque or partially transparent, thereby blocking or obscuring those portions of the application UI that are positioned behind the main menu UI.

The user may use a single input to the input device to transition the device from a high immersion level (e.g., a full immersion mode in which only the content of the respective application is displayed, as shown and described with reference to application user interface 8000 of FIG. 8A) to a lower immersion mode or a non-immersion mode (e.g., the resized application user interface 8004, as shown and described with reference to FIG. 8B), or from a non-immersion mode to a mode in which the main menu user interface is also displayed. The single input also provides intuitive top-level access to different sets of representations (e.g., main menu user interface 7110, as shown and described with reference to FIG. 8C) while the user is in a non-immersive experience, without displaying additional controls (e.g., without requiring the user to view user interface elements), thereby improving the operational efficiency of user-machine interaction. Using a single input to the input device (e.g., a single press input to hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), as shown and described with reference to FIGS. 8A and 8B) reduces the amount of time required to navigate within or transition out of the virtual environment.

In some implementations, when the main menu user interface is displayed via the display generating component (e.g., when the non-immersive experience is displayed, such as when the non-immersive application user interface 9002 is displayed, as shown and described with reference to FIG. 9B), the computer system detects a second input to the input device (e.g., a subsequent press input to the hardware input element 7108, such as a hardware button, a solid-state button, a crown, or a rotatable and pressable input element), and in response to detecting the second input to the input device (e.g., the second press input), the computer system cancels the main menu user interface (e.g., stops display of the main menu user interface 7110, as shown and described with reference to FIG. 9C).

When the computer system is operating in the non-immersion mode, using the second input to cancel the main menu user interface (e.g., while the mini-player user interface 7154 continues to provide the user with a non-immersive experience, as shown and described with reference to FIGS. 7I-7J) provides an efficient way to terminate navigation activities on the main menu user interface without interfering with the application user interface in the non-immersive experience. No additional controls need to be provided to the user, and the user does not need to browse any additional user interface control elements to exit the main menu user interface, thereby improving the operating efficiency of the computer system.
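
The mode-dependent branching of method 14000, together with dismissal of the main menu on a subsequent input, can be sketched as a three-state machine. The enum `Screen` and the function `on_press` are hypothetical names; the step numbers in the comments refer to the operations described above.

```python
from enum import Enum, auto


class Screen(Enum):
    IMMERSIVE_APP = auto()       # first display mode: only application content
    NON_IMMERSIVE_APP = auto()   # second display mode: application plus other content
    MAIN_MENU_OVER_APP = auto()  # main menu replaces part of the application UI


def on_press(screen: Screen) -> Screen:
    """Return the next screen after one press of the input device."""
    if screen is Screen.IMMERSIVE_APP:
        return Screen.NON_IMMERSIVE_APP   # corresponds to operation 14006
    if screen is Screen.NON_IMMERSIVE_APP:
        return Screen.MAIN_MENU_OVER_APP  # corresponds to operation 14008
    # A further press while the menu is shown dismisses it.
    return Screen.NON_IMMERSIVE_APP


first = on_press(Screen.IMMERSIVE_APP)
second = on_press(first)
third = on_press(second)
```

Because the same press is interpreted according to the current screen, no additional on-screen control is needed to step down from the immersive mode or to open and close the menu.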

In some embodiments, displaying the application user interface in the non-immersive mode includes simultaneously displaying the virtual environment and the application user interface (e.g., the resized application user interface 8004 is displayed in the non-immersive mode together with the virtual environment depicting the office environment, as shown and described with reference to FIG. 8B), and in response to detecting a first input to the input device while displaying the application user interface in the non-immersive mode, the computer system continues to display at least a portion of the virtual environment (e.g., continues to display the virtual environment depicting the office environment, as shown in FIG. 8C, while simultaneously displaying the main menu user interface 7110 in response to a press input on the hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element)), thereby displaying the application user interface in the virtual environment in the second display mode. Upon detecting the first input to the input device of the one or more input devices, the computer system maintains display of the application user interface in the second display mode in the virtual environment (e.g., displays a non-immersive experience in the virtual environment that continues to be displayed after the button is pressed, as shown and described with reference to FIGS. 8B and 8C).

When interacting with an application user interface in a non-immersive mode, the virtual environment forms part of the user experience. Displaying the application user interface in a non-immersive experience while maintaining the display of the virtual environment after the first input is detected minimizes interference to the user in navigating the main menu user interface without displaying additional controls. By maintaining the display of the virtual environment, the user does not need to re-initialize the virtual environment after navigating in the main menu user interface, thereby improving the performance and operating efficiency of the computer system.

In some implementations, when the main menu user interface is displayed, the computer system continues to display at least a portion of the virtual environment (e.g., as shown and described with reference to FIGS. 7H, 8B, and 8C; the resized application user interface 8004 is displayed in a non-immersive mode, as shown and described with reference to FIG. 8B, and the virtual environment depicting the office environment continues to be displayed, as shown in FIG. 8C, while the main menu user interface 7110 is displayed in response to a press input on the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)).

Continuing to display the virtual environment while the main menu user interface is displayed minimizes interference to the user while navigating the main menu user interface without displaying additional controls. By maintaining the display of the virtual environment, the user does not need to re-initialize the virtual environment after navigating in the main menu user interface, thereby improving the performance and efficiency of the computer system.

In some embodiments, the computer system displays representations of two or more virtual environments in a main menu user interface (e.g., as shown and described with reference to FIG. 8D), and in response to detecting a selection of a first virtual environment of the two or more virtual environments, replaces at least a respective portion of the virtual environment with the first virtual environment (e.g., the first virtual environment is different from the portion of the virtual environment as shown and described with reference to FIG. 8E).

Displaying a main menu user interface that provides quick access to a set of selectable virtual environments provides a way to change the user's virtual experience without displaying additional controls. This minimizes the number of inputs required to select a desired virtual environment, thereby improving the performance and efficiency of the computer system.

In some embodiments, when the computer system displays representations of software applications capable of executing on the device in the main menu user interface, the computer system detects a third input directed to a respective one of the representations of software applications displayed in the main menu user interface and, in response to detecting the third input directed to the respective representation, the computer system displays an application user interface of the corresponding software application (e.g., in a perspective of the three-dimensional environment, such that the software application corresponding to the representation is run in the perspective as the application in focus, such as application user interface 7178, as shown and described with reference to fig. 7L and 7M).

In some embodiments, when the computer system displays a first representation of a first person and a second representation of a second person in the main menu user interface (the first representation and the second representation being used to initiate or continue communication with the first person and the second person, respectively), the computer system detects a fourth input directed to the first representation of the first person and, in response to detecting the fourth input, the computer system displays a communication user interface for initiating a communication session with the first person (e.g., representation 7138 of the first person, representation 7140 of the second person, and representation 7142 of the third person are shown in fig. 7D; in response to user input directed to the representation of the first person, the computer system displays a communication user interface for initiating a communication session with the first person, such as the communication session shown and described with reference to fig. 9D).

In some embodiments, when the computer system displays representations of one or more virtual three-dimensional environments or one or more augmented reality environments in the main menu user interface, the computer system detects a fifth input directed to a respective one of the representations of one or more virtual three-dimensional environments or one or more augmented reality environments, and in response to detecting the fifth input directed to the respective one of the representations of one or more virtual three-dimensional environments or one or more augmented reality environments, the computer system replaces any currently displayed virtual environment with the virtual three-dimensional environment or augmented reality environment associated with the respective representation (e.g., in response to a user selection directed to representation 7144 as shown and described with reference to fig. 8C-8E, the virtual environment depicting an office environment including an office table 7148 surrounded by an office chair as shown and described with reference to fig. 8C is replaced with a virtual environment depicting beach scenery).

In some implementations, the input device is a hardware button or a solid state button. Using input to a hardware button or solid state button to control the level of immersion at which application content is provided (e.g., from a fully immersive mode to a non-immersive mode), or to display a main menu user interface, provides intuitive top-level access to basic operational functions of the computer system without displaying additional controls (e.g., without requiring the user to view user interface elements), thereby improving the operational efficiency of the computer system. A solid state button reduces the number of moving parts, which increases reliability, and allows the system to be reconfigurable (e.g., through firmware updates that allow the solid state button to provide different feedback, provide other functionality, or receive additional types of input), thereby increasing the performance and efficiency of the computer system.

FIG. 15 is a flow diagram (also referred to as a flowchart) of an example method 15000 for performing one or more different operations based on input to an input device, where the performed operations depend on characteristics of a displayed application user interface, according to some embodiments.

In some embodiments, the method 15000 is performed at a computer system (e.g., computer system 101 in fig. 1) that includes a display generation component (e.g., display generation component 120 in fig. 1A, 3, and 4) (e.g., heads-up display, touch screen, or projector) and one or more cameras (e.g., cameras directed downward toward the user's hand (e.g., color sensors, infrared sensors, and other depth sensing cameras) or cameras directed forward from the user's head). In some embodiments, method 15000 is managed by instructions stored in a non-transitory computer readable storage medium and executed by one or more processors of a computer system, such as one or more processors 202 of computer system 101 (e.g., control unit 110 in fig. 1A). Some operations in method 15000 are optionally combined and/or the order of some operations is optionally changed.

In some embodiments, the method 15000 is performed at a computer system (e.g., computer system 101 in fig. 1) in communication with a display generation component (e.g., display generation component 120 or display generation component 7100 in fig. 1A, 3, and 4) (e.g., heads-up display, HMD, display, touch screen, or projector) and one or more input devices (e.g., camera, controller, touch-sensitive surface, joystick, button, glove, watch, motion sensor, orientation sensor, and/or rotatable input mechanism, such as a crown). In some embodiments, the display generation component is a user-oriented display component and provides an XR experience to the user. In some embodiments, the computer system is an integrated device having one or more processors and memory enclosed in the same housing as at least some of the first display generating component, the one or more audio output devices, and the one or more input devices. In some embodiments, the computer system includes a computing component (e.g., a server, a mobile electronic device such as a smart phone or tablet device, a wearable device such as a watch, wristband or earpiece, a desktop computer, or a laptop computer) that includes one or more processors and memory separate from one or more of the display generation component (e.g., a heads-up display, a touch screen, or a stand-alone display), one or more output devices (e.g., an earpiece or external speaker), and one or more input devices. In some embodiments, the display generating component and the one or more audio output devices are integrated and enclosed in the same housing.

In method 15000, the computer system includes or communicates with a display generating component and one or more input devices (e.g., buttons, dials, rotatable input mechanisms, switches, movable components, or solid state components, for example, devices that detect local sensor inputs such as intensity or force sensor inputs, where the computer system uses the inputs to trigger corresponding operations and optionally provides haptic feedback corresponding to the detected inputs). In method 15000, while the computer system is displaying an application user interface of an application via the display generation component, the computer system detects (15002) a first input (e.g., a press input) to an input device (e.g., a button, a solid state button, a hardware button, or a rotatable input mechanism) of the one or more input devices. In response to detecting (15004) the first input to the input device, the computer system displays (15006) a main menu user interface (e.g., displays main menu user interface 7110, as shown and described with reference to FIG. 9B) via the display generation component. The main menu user interface is sometimes referred to as a home screen user interface, and the home screen user interface does not necessarily block or replace all other displayed content. The main menu user interface is a virtual user interface that is optionally displayed in the XR environment, rather than a default login user interface that is displayed to the user whenever an interaction with the computer system is initiated. The home screen user interface is different from a default login user interface that automatically displays various representations to a user without specific user input.
Further, in response to the first input, in accordance with a determination that the application is currently being shared in a content sharing session, wherein content of the application is concurrently visible to multiple participants in the content sharing session (e.g., sharing media content for consumption in conjunction with the multiple participants, sharing/streaming game content to multiple participants in a multiplayer game, and/or sharing videoconference content among multiple participants of the videoconference), the computer system maintains (15008) display of at least a portion of the application user interface while the main menu user interface is displayed (e.g., continues to display the application user interface 9002, as shown and described with reference to fig. 9B-9D). On the other hand, in response to the first input, in accordance with a determination that the application is not shared in the content sharing session (e.g., the application is not shared in any content sharing session and is not in the content sharing session), the computer system stops (15010) display of the application user interface (e.g., stops displaying the application user interfaces 9004, 9006, 9008, as shown and described with reference to fig. 9A and 9B).
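As a non-limiting illustration, the branching of operations 15006 to 15010 can be sketched in Python as follows. The names AppWindow, DisplayState, and handle_first_input are hypothetical and are introduced here only for illustration; the sketch assumes each displayed application user interface carries a flag indicating whether its application is currently shared in a content sharing session.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AppWindow:
    """Hypothetical record for one displayed application user interface."""
    name: str
    shared: bool  # True if the application is in a content sharing session


@dataclass
class DisplayState:
    """Hypothetical snapshot of what the display generation component shows."""
    windows: List[AppWindow] = field(default_factory=list)
    main_menu_visible: bool = False


def handle_first_input(state: DisplayState) -> DisplayState:
    """Display the main menu, keep shared windows, stop displaying private ones."""
    kept = [w for w in state.windows if w.shared]
    return DisplayState(windows=kept, main_menu_visible=True)


# Example loosely modeled on fig. 9A and 9B: one shared and three private windows.
before = DisplayState(windows=[
    AppWindow("video player 9002", shared=True),
    AppWindow("messaging 9004", shared=False),
    AppWindow("calendar 9006", shared=False),
    AppWindow("web browsing 9008", shared=False),
])
after = handle_first_input(before)
print([w.name for w in after.windows], after.main_menu_visible)
```

In this sketch a single input yields both results at once: the main menu becomes visible, and only the window whose application is shared remains displayed.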

Using the first input to the input device to cancel the user's private applications while not affecting any shared application minimizes interference to both the user and other users during the sharing experience, thereby improving the efficiency of the multi-user interaction. The ability to use the first input to distinguish between shared applications and private (e.g., non-shared) applications allows for separate control of both types of applications (e.g., prioritizing shared applications over private applications) without having to display additional controls. Using the first input to quickly cancel private applications and bring the shared application into focus reduces the amount of interference that a user may experience while in a group interaction session.

In some embodiments or in some cases, the computer system shares an application currently in a content sharing session with multiple participants in a real-time communication session (e.g., shares a video player application user interface 9002 with Abe, Mary, Edwin, and Isaac in a content sharing session, as shown and described with reference to fig. 9D). For example, multiple participants of a real-time communication session may communicate with each other using audio (e.g., via microphones and/or speakers in communication with the users' respective computer systems) and video and/or 3D representations, such as avatars that represent changes in the positioning and/or expression of the participants in the real-time communication session over time. In some embodiments, audio received from a respective user is simulated as being received from a location corresponding to the current location of the respective user in the three-dimensional environment (e.g., the sound of Edwin's voice is presented to the participants as sound originating from location 9408 within the shared three-dimensional environment 9200'). In some embodiments, the location in the virtual three-dimensional environment where the content sharing session occurs is different from the locations of the representations of the multiple participants (e.g., the content sharing session of the application user interface 9002 occurs near location 9010, which is different from the locations of Abe, Mary, Isaac, and Edwin at locations 9402, 9404, 9406, and 9408, respectively, as shown and described with reference to fig. 9D).

In some embodiments, the application user interface of the application currently being shared in the content sharing session, or an element or corresponding portion of that application user interface, has a shared spatial relationship in which one or more user interface objects visible to the multiple participants in the content sharing session have a consistent spatial relationship from the different perspectives of the multiple participants. For example, in the real-time communication session shown and described with reference to fig. 9D, the content sharing session of the application user interface 9002 occurs near location 9010, behind box 7016, which is a user interface object visible to the multiple participants in the content sharing session, and the spatial relationship between box 7016 and application user interface 9002 is consistent for Abe, Mary, Isaac, and Edwin at locations 9402, 9404, 9406, and 9408, respectively. Relative to Edwin at location 9408, box 7016 appears to Edwin's left, and a majority of application user interface 9002 appears to the right of box 7016; conversely, relative to Abe at location 9402, box 7016 appears to Abe's right, and a majority of application user interface 9002 also appears to the right of box 7016. Method 15000 optionally includes maintaining the shared spatial relationship for the application user interface, an element or portion of the application user interface, or one or more user interface objects visible to the multiple participants in the content sharing session, such that they have a consistent spatial relationship from the different perspectives of the multiple participants, as shown and described with reference to fig. 9D.
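As a non-limiting illustration, the consistency of a shared spatial relationship across viewpoints can be sketched in Python using a top-down two-dimensional map. The function lateral_offset and all coordinates below are hypothetical and are not taken from the figures; the sketch assumes that every participant's view is derived from a single set of shared-space positions.

```python
def lateral_offset(viewer_pos, viewer_forward, target_pos):
    """Signed sideways offset of a target in the viewer's frame (top-down 2D map).

    Positive means the target is to the viewer's right, negative to the left.
    """
    fx, fy = viewer_forward
    right = (fy, -fx)  # forward direction rotated 90 degrees clockwise
    dx = target_pos[0] - viewer_pos[0]
    dy = target_pos[1] - viewer_pos[1]
    return dx * right[0] + dy * right[1]


# Hypothetical shared-space positions (not taken from the figures).
box = (0, 5)       # stands in for box 7016
app_ui = (2, 6)    # stands in for application user interface 9002
abe = (-2, 0)      # stands in for Abe's viewpoint
edwin = (2, 0)     # stands in for Edwin's viewpoint
forward = (0, 1)   # both viewers face the same direction in this sketch

for name, pos in (("Abe", abe), ("Edwin", edwin)):
    box_side = lateral_offset(pos, forward, box)
    ui_side = lateral_offset(pos, forward, app_ui)
    print(name, "box offset:", box_side, "UI right of box:", ui_side > box_side)
```

Under these assumed coordinates, the box falls to the right of one viewer and to the left of the other, while the application user interface remains to the right of the box for both, because both views are computed from the same shared positions.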

Allowing a shared application to have a shared spatial relationship with respect to multiple users increases the three-dimensional realism of the experience and increases the ease of use for each user. Each user may independently position himself or herself at a location relative to a user interface object that represents the corresponding content selected/trimmed for the particular user. The spatial relationship selected for a particular user (the spatial relationship between the user interface object and the representation of the particular user) will not affect the spatial relationship desired by another user. Allowing different spatial relationships between an application or an element or portion of an application and different users enhances the ability of different users to control their individual interactions (e.g., viewing interactions) with the application or the element or portion of the application.

In some embodiments, the spatial relationship is shared such that the spatial relationship between the first user interface object representing the respective content to the first participant and the viewpoint of the first participant from the perspective of the first participant coincides with the spatial relationship between the second user interface object representing the respective content to the second participant and the representation of the first participant from the perspective of the second participant, and the spatial relationship between the second user interface object representing the respective content to the second participant and the viewpoint of the second participant from the perspective of the second participant coincides with the spatial relationship between the first user interface object representing the respective content to the first participant and the representation of the second participant from the perspective of the first participant (e.g., from the viewpoint of Abe at location 9402, the representation of Edwin at location 9408 appears to the right of Abe, and a majority of the application user interface 9002 appears to the right of box 7016; correspondingly, from the viewpoint of Edwin, the representation of Abe appears to the left of Edwin, as shown and described with reference to fig. 9D).

Allowing a shared application to have different spatial relationships with respect to multiple users increases the three-dimensional realism of the experience and increases the ease of use for each user. Each user may position himself or herself at a location relative to a user interface object representing the respective content selected/trimmed for the particular user. The spatial relationship selected for a particular user (the spatial relationship between the user interface object and the representation of the particular user) will not affect the spatial relationship desired by another user. Allowing different spatial relationships to be obtained between different users enhances the ability of the different users to control their individual interactions (e.g., viewing interactions) with each of the user interface objects.

In some embodiments, the computer system detects an input, by a first participant of the plurality of participants, directed to the application user interface of the application currently being shared in the content sharing session, and in response to detecting the input by the first participant, the computer system moves the application user interface of the application currently being shared in the content sharing session, or an element or corresponding portion of that application user interface, for both the first participant and a second participant of the plurality of participants (e.g., as shown and described with reference to FIG. 9D). For example, during and after the movement of the application user interface or an element or portion of the application user interface, the spatial relationship of the application user interface or the element or portion of the application user interface changes in a consistent manner with respect to the first participant and the second participant of the plurality of participants. If the second participant moves content (e.g., a second user interface object or the application user interface), the content also moves for the first participant (e.g., the "same content" moves in the representation of the application user interface or the representation of the three-dimensional environment provided to both the first participant and the second participant).
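As a non-limiting illustration, moving shared content for all participants at once can be sketched in Python by keeping a single shared-space position that every participant's view reads. The class SharedSession and its methods are hypothetical and are introduced only for illustration; the coordinates are arbitrary.

```python
class SharedSession:
    """Hypothetical model of one shared application user interface."""

    def __init__(self, participants, position):
        self.participants = set(participants)
        self.position = tuple(position)  # single position in shared space

    def move(self, moved_by, delta):
        """Apply a move made by any one participant to the shared position."""
        if moved_by not in self.participants:
            raise ValueError(f"{moved_by} is not in the content sharing session")
        self.position = tuple(p + d for p, d in zip(self.position, delta))

    def position_for(self, participant):
        """Every participant's view is derived from the same shared position."""
        if participant not in self.participants:
            raise ValueError(f"{participant} is not in the content sharing session")
        return self.position


session = SharedSession(["Abe", "Mary", "Isaac", "Edwin"], position=(0, 1, 5))
session.move("Mary", delta=(1, 0, -2))
print(session.position_for("Abe"), session.position_for("Edwin"))
```

Because there is only one stored position in this sketch, a move made by one participant does not need to be repeated sequentially or manually for the others.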

Allowing one participant to move the user interface of an application for another participant eliminates the need to apply the same changes to multiple participants sequentially or manually, thereby improving the communication efficiency of the multiple participants. Allowing simultaneous changes in spatial relationships between user interface objects representing respective content to different participants in a self-consistent manner increases the realism of a multi-user experience and better simulates the content sharing experience in a physical environment.

In some embodiments, the computer system displays a main menu user interface in front of the application user interface of the application (e.g., main menu user interface 7110 is presented closer to the user in the z-direction than application user interface 9002, as shown and described with reference to fig. 9B).

Displaying the main menu user interface in front of the application user interface of the application allows the user to navigate the set of applications in the main menu user interface, and/or to change the virtual environment of the user, and/or to interact with additional users while a content sharing session is ongoing. Displaying the main menu user interface in front of the application user interface improves operational efficiency by eliminating the need to interfere with (e.g., by having to close) the content sharing session of the shared application in order for a particular user to navigate the main menu user interface.

In some embodiments, the computer system concurrently displays application user interfaces for two or more applications (e.g., the two or more applications include private applications and/or applications used in a content sharing session, such as a video player application having an application user interface 9002 in a content sharing session with Abe, Mary, Isaac, and Edwin, a messaging application corresponding to application user interface 9004, a calendar application corresponding to application user interface 9006, and a web browsing application corresponding to application user interface 9008, as shown and described with reference to fig. 9A).

Displaying application user interfaces for two or more applications simultaneously allows a user to multitask and thereby provide more information to the user without additional user input, thereby improving the operating efficiency of the computer system.

In some embodiments, in response to the first input, the computer system ceases to display respective application user interfaces of the two or more applications while continuing to display another application user interface of the two or more applications (e.g., ceasing to display application user interfaces of the two or more applications includes ceasing to display application user interfaces of the private applications, such as application user interfaces 9004, 9006, and 9008, as shown and described with reference to fig. 9A and 9B, and continuing to display another application user interface of the two or more applications includes continuing to display application user interfaces of applications used in the content sharing session, such as application user interface 9002, as shown and described with reference to fig. 9B).

Using the first input to stop displaying the application user interface of a first application while continuing to display the application user interface of another application helps reduce the amount of interference that the user may experience while in a group interaction session, without having to display additional controls. Further, canceling the private application while continuing to display the shared application in response to the first input enables the user to focus on the shared application without having to display additional controls.

In some embodiments, in response to the first input, the computer system ceases to display a first plurality of the two or more applications (e.g., the application user interfaces of a group of private applications that do not have an ongoing content sharing session) while continuing to display at least one of the two or more applications (e.g., the application user interface of an application currently in the content sharing session, such as application user interface 9002, as shown and described with reference to fig. 9B).

Using the first input to stop displaying the application user interfaces of the first plurality of applications while continuing to display the application user interface of another application helps reduce the amount of interference that a user may experience while in a group interaction session. Canceling the private applications while continuing to display the shared application in response to the first input enables the user to focus on the shared application without having to display additional controls. Furthermore, the number of inputs required to cancel the private applications and to maintain the display of the shared application is reduced: instead of having to individually minimize or cancel the first plurality of applications, the first input is sufficient to stop the display of the first plurality of applications.

In some embodiments, in response to the first input, the computer system maintains the display of a second plurality of the two or more applications (e.g., the application user interfaces of applications currently in a content sharing session, such as application user interface 9002 that is in a content sharing session with Abe, Isaac, Mary, and Edwin, as shown and described with respect to fig. 9A) while the computer system ceases to display at least one of the two or more applications (e.g., the application user interface of a private application that does not have an ongoing content sharing session, such as application user interfaces 9004, 9006, and/or 9008, as shown and described with respect to fig. 9B).

Using the first input to maintain display of the application user interfaces of the second plurality of applications while ceasing to display the application user interface of another application helps reduce the amount of interference a user may experience while in a group interaction session. Canceling one or more private applications while continuing to display the shared applications in response to the first input enables the user to focus on the shared applications without having to display additional controls. Further, the number of inputs required to cancel the private application and maintain the display of the shared applications is reduced: instead of having to individually minimize or cancel at least one of the two or more applications, the first input is sufficient to maintain the display of the second plurality of the two or more applications while ceasing to display at least one of the two or more applications.

In some embodiments, while both the main menu user interface and at least a portion of the application user interface of the application currently being shared in the content sharing session are displayed (e.g., a state reached after the first input), the computer system detects a second input (e.g., a second press input), and in response to detecting the second input, the computer system stops the display of the main menu user interface but maintains the display of the portion of the application user interface of the application currently being shared in the content sharing session while the main menu user interface is not displayed (e.g., while the main menu user interface 7110 and the application user interface 9002 in the content sharing session are displayed, as shown and described with reference to FIG. 9B, the computer system detects the second press input on the hardware input element 7108 (e.g., a button, a crown, or a rotatable and pressable input element), and in response to detecting the second press input on the hardware input element 7108, the main menu user interface 7110 is canceled while the display of the application user interface 9002 is maintained, as shown and described with reference to FIG. 9C).
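As a non-limiting illustration, the sequence of a first press followed by a second press can be sketched in Python as a small state transition. The function handle_press and the tuple representation of windows are hypothetical and are used only for illustration; the sketch assumes that a press while the main menu is visible is treated as the second input.

```python
def handle_press(main_menu_visible, windows):
    """Return the new (main_menu_visible, windows) after one press input.

    windows is a list of (name, shared) tuples, where shared is True if the
    application is currently in a content sharing session.
    """
    if main_menu_visible:
        # Second input: cancel the main menu, leave displayed windows as they are.
        return False, list(windows)
    # First input: display the main menu and keep only shared windows.
    return True, [w for w in windows if w[1]]


windows = [("video player 9002", True), ("messaging 9004", False)]
menu, windows = handle_press(False, windows)   # first press
print(menu, windows)
menu, windows = handle_press(menu, windows)    # second press
print(menu, windows)
```

In this sketch the shared window is still displayed after both presses, while the main menu is displayed only between them.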

Using a second input, such as a press input, to cancel the main menu user interface provides an efficient way to terminate navigation activity on the main menu user interface without interfering with the content sharing session of the sharing application. No additional controls need to be provided to the user and the user does not need to browse any additional user interface control elements to exit the main menu user interface, thereby improving the operating efficiency of the device.

In some embodiments, the computer system concurrently displays, via the display generation component, the application currently being shared in the content sharing session (e.g., the application user interface of the application) and a pass-through portion of the physical environment of the computer system (e.g., both the application user interface 9002 in the content sharing session and the pass-through portion of the physical environment of the computer system are displayed, as described with reference to fig. 9C).

Allowing the shared application to be displayed concurrently with the pass-through content increases the safety of the user by allowing the user (via the pass-through portion of the physical environment of the computer system) to be aware of the physical environment of the computer system (e.g., to respond to an emergency or other situation requiring the user's attention or requiring the user to interact with the physical environment) while not interfering with an ongoing content sharing session involving more than one user.

In some implementations, when the main menu user interface is displayed, the computer system detects movement of the application user interface by a second participant of the plurality of participants, and in response to detecting movement of the application user interface by the second participant, the computer system moves the application user interface (e.g., application user interface 9002 moves from one location to another location) for the plurality of participants including the first participant and the second participant based on movement of the application user interface by the second participant, as described above with reference to fig. 9D.

Allowing other participants to move the user interface of the application while the first participant is navigating the main menu user interface on the first participant's separate computer system helps to minimize interference with the multi-user experience (e.g., the content sharing session of the application). For example, other participants may continue to interact with the user interface of the application in the content sharing session without regard to or being constrained by the fact that the main menu user interface is displayed for the first participant. Furthermore, allowing for simultaneous changes in spatial relationships between user interface objects representing respective content to different participants in a self-consistent manner increases the realism of a multi-user experience and better simulates a content sharing experience in a physical environment. The simultaneous change in the positioning of the user interface of the application for two or more participants also eliminates the need to apply the same change, either sequentially or manually, to the application user interfaces as seen by the multiple participants (e.g., as displayed by the respective computer systems of the multiple participants), thereby improving the communication efficiency of the multiple participants.

In some implementations, the first input to the input device includes a press input on a hardware button or a solid state button. In some embodiments, the hardware button includes a rotatable input element or mechanism, such as a digital crown.

Providing a dedicated button (e.g., a solid state button or a hardware button) for receiving the first input allows the user (e.g., without having to interact with the user interface of any software application) to more quickly and responsively distinguish the shared application from the private application. Instead of wasting time closing the application and/or navigating to a special user interface control element to manually select the shared application, a dedicated button (e.g., a hardware button or a solid state button) can quickly cancel the private application and bring the shared application into focus without having to display additional controls. Reducing the amount of input required to cancel the private application and bring the shared application into focus enhances the operability of the device and makes the user-device interface more efficient, which additionally reduces power usage and extends the battery life of the device by enabling the user to use the device more quickly and efficiently.

Fig. 16 is a flow diagram (also referred to as a flowchart) of an example method 16000 for resetting an input registration process, according to some embodiments.

In some embodiments, the method 16000 is performed at a computer system (e.g., computer system 101 in fig. 1) that includes a display generation component (e.g., display generation component 120 in fig. 1A, 3, and 4) (e.g., heads-up display, touch screen, or projector) and one or more cameras (e.g., cameras directed downward toward the user's hand (e.g., color sensors, infrared sensors, and other depth sensing cameras) or cameras directed forward from the user's head). In some embodiments, method 16000 is managed by instructions stored in a non-transitory computer-readable storage medium and executed by one or more processors of a computer system, such as one or more processors 202 of computer system 101 (e.g., control unit 110 in fig. 1A). Some operations in method 16000 are optionally combined, and/or the order of some operations is optionally changed.

In some embodiments, the method 16000 is performed at a computer system (e.g., computer system 101 in fig. 1) in communication with a display generation component (e.g., display generation component 120 or display generation component 7100 in fig. 1A, 3, and 4) (e.g., heads-up display, HMD, display, touch screen, or projector) and one or more input devices (e.g., camera, controller, touch-sensitive surface, joystick, button, glove, watch, motion sensor, orientation sensor, and/or rotatable input mechanism, such as a crown). In some embodiments, the display generation component is a user-oriented display component and provides an XR experience to the user. In some embodiments, the computer system is an integrated device having one or more processors and memory enclosed in the same housing as at least some of the first display generating component, the one or more audio output devices, and the one or more input devices. In some embodiments, the computer system includes a computing component (e.g., a server, a mobile electronic device such as a smart phone or tablet device, a wearable device such as a watch, wristband or earpiece, a desktop computer, or a laptop computer) that includes one or more processors and memory separate from one or more of the display generation component (e.g., a heads-up display, a touch screen, or a stand-alone display), one or more output devices (e.g., an earpiece or external speaker), and one or more input devices. In some embodiments, the display generating component and the one or more audio output devices are integrated and enclosed in the same housing.

In method 16000, a computer system includes or communicates with a display generation component and one or more input devices (e.g., buttons, dials, rotatable input elements, switches, movable or solid state components, cameras, infrared sensors, accelerometers, gyroscopes, inertial measurement sensors, touch sensitive surfaces (e.g., devices that detect local sensor inputs such as intensity or force sensor inputs that the computer system uses to trigger corresponding operations and optionally provide haptic feedback such as haptic feedback corresponding to the detected inputs), cameras, controllers, and/or joysticks). When the computer system is in operation (e.g., when an application is running or when an application to which a user is providing a first type of input is running), the computer system detects (16002) the first input of the first type of input via an input device (e.g., a camera, infrared sensor, and/or inertial measurement sensor, accelerometer, or gyroscope) of the one or more input devices, wherein the first type of input is determined based on a location and/or movement of the first biometric feature (e.g., a location and/or movement of an eye, pupil, face, head, body, arm, hand, finger, leg, foot, toe, or other biometric feature of the user of the device).

In response to detecting (16004) the first input via the input device, the computer system performs (16006) a first operation in accordance with the first input, wherein the first operation is determined at least in part from first input registration information from a previous input registration process for the first type of input. After performing the first operation in accordance with the first input (e.g., the first operation is not performed satisfactorily due to inaccurate calibration of the first type of input or other shortcomings of the previous input registration process, or the computer system needs to recalibrate due to a change in one or more characteristics of the user (e.g., a change in the appearance or other characteristics of the user's finger, wrist, arm, or eye (e.g., due to infection or a change in contact lens type/color), or a change in or impairment of the user's voice (e.g., due to illness))), the computer system detects (16008) a second input of a second type of input (e.g., different from the first type of input) via an input device of the one or more input devices (e.g., the same input device or a different input device). In response to detecting (16010) the second input, the computer system initiates (16012) an input registration process for the first type of input (e.g., by presenting one or more of user interface 10004, user interface 10006, user interface 10008, and visual indication 10012, as shown and described with reference to fig. 10B through 10D).
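The flow of operations 16002 through 16012 can be sketched as follows. This is a minimal illustration only, assuming that the first type of input is gaze and the second type is a button press; the names `InputType`, `RegistrationInfo`, `ComputerSystem`, and the single `offset` calibration value are hypothetical and do not appear in the described embodiments.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class InputType(Enum):
    GAZE = auto()          # assumed first type: depends on a biometric feature
    BUTTON_PRESS = auto()  # assumed second type: needs no calibration


@dataclass
class RegistrationInfo:
    # Hypothetical calibration value, e.g. a gaze offset in degrees.
    offset: float = 0.0


@dataclass
class ComputerSystem:
    registration: RegistrationInfo = field(default_factory=RegistrationInfo)
    log: list = field(default_factory=list)

    def handle(self, input_type: InputType, value: float = 0.0) -> None:
        if input_type is InputType.GAZE:
            # 16002-16006: perform an operation determined in part by the
            # stored input registration information.
            corrected = value - self.registration.offset
            self.log.append(("operation", corrected))
        elif input_type is InputType.BUTTON_PRESS:
            # 16008-16012: the second type of input initiates the input
            # registration process for the first type of input.
            self.log.append(("start_registration", InputType.GAZE))

    def finish_registration(self, new_offset: float) -> None:
        # Store the newly collected registration information.
        self.registration = RegistrationInfo(offset=new_offset)
```

In this sketch the button press never passes through the stored calibration, which is why it remains usable even when the gaze calibration is inaccurate.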

Initializing an input registration reset for the first type of input using the second type of input allows more accurate and precise input registration information to be used for calibration and/or performing operations based on the first type of input. Instead of having the user use the first type of input to navigate through the user interface element (e.g., menu or other control element) in order to reset the input registration for the first type of input (e.g., the first type of input may need to be reset due to inaccurate calibration, making it difficult to navigate the interface control element using the inaccurately calibrated first type of input), using the second type of input to initialize the input registration improves operational efficiency, reduces user frustration, and reduces the number of inputs required to initialize the input registration reset process. Resetting the input registration using the second type of input also helps to reduce the amount of time required to begin the input registration reset process. For example, using the second type of input enables an input registration reset to be initialized without displaying additional controls (e.g., using the first type of input to browse user interface elements).

In some implementations, the first type of input includes a gaze of the user, the first biometric feature includes a location and/or movement of an eye of the user (e.g., input registration for the first type of input includes determining or calibrating an inter-pupillary distance, iris size, and/or angular range of movement of the eye of the user), and the input device via which the first input of the first type of input is detected includes a camera (e.g., an RGB/visible spectrum camera and/or an infrared camera).

Allowing the input registration for eye/gaze input to be reset using a second type of input (e.g., pressure input or touch input) on a different input device (e.g., a hardware button or solid state button) allows an input in a first modality (e.g., tactile touch/mechanical actuation on button 7508, as shown and described with reference to fig. 10A-10D) to reset the calibration of an input in a different modality (e.g., gaze). A more reliable input mode that does not require calibration (e.g., tactile touch/mechanical actuation on a hardware/solid state button) can thus be used to initiate calibration correction in another modality (gaze/eye tracking), which improves the reliability and operational efficiency of the computer system.

In some embodiments, the first type of input includes a user's hand movement, the first biometric feature includes a location and/or movement of one or more portions of the user's hand (e.g., input registration for the first type of input includes determining or calibrating a size of a person's hand, a range of motion of the user's hand or wrist, lengths of different joints in the user's hand, and/or a range of motion of different joints in the hand), and the input device via which the first input of the first type of input is detected includes a camera (e.g., an RGB/visible spectrum camera and/or an infrared camera).

Allowing the input registration for hand tracking to be reset using a second type of input (e.g., pressure/touch) on a different input device (e.g., a hardware/solid state button) allows an input in a first modality (e.g., tactile touch/mechanical actuation) to reset the calibration of an input in a different modality (e.g., hand tracking, visual hand tracking, or infrared hand tracking). Calibration correction in one modality (hand tracking) can be initiated using a more reliable mode that does not require calibration (e.g., tactile touch/mechanical actuation on a hardware/solid state button, such as button 7508 shown and described with reference to fig. 10A-10D), which improves the reliability and operational efficiency of the computer system.

In some implementations, initiating an input registration process for a first type of input includes presenting instructions (e.g., user interface element 10006 and user interface element 10008, as shown and described with reference to FIG. 10C) to a user for input registration for the first type of input (e.g., instructions for slowly rotating the user's head, rotating the user's head in a particular direction (as shown and described with reference to FIG. 10C), gazing at a displayed virtual object (e.g., when the virtual object is moved or displayed at a fixed location), moving the user's hand to various positions, and/or performing different hand gestures), and collecting second input registration information for the first type of input based on user actions performed according to the presented instructions (e.g., such that a new input enrollment experience is presented to the user, as shown and described with reference to fig. 10B-10D).

Collecting the second input registration information after initiating the registration reset allows updating and improving the calibration of the first type of input, thereby improving the operational efficiency of the user-machine interaction based on the first type of input.

In some embodiments, the computer system detects a third input of the first type of input via an input device of the one or more input devices, and in response to detecting the third input via the input device, the computer system performs a second operation in accordance with the third input, wherein the second operation is determined at least in part by second input registration information for the first type of input. In some embodiments, the computer system extracts statistical information from the previous input registration process and the second input registration information, and the computer system calibrates the first type of input using a weighted average of all collected input registration information.
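As a minimal sketch of the weighted-average calibration described above, the following assumes that each input registration process yields a single numeric calibration value (here an inter-pupillary distance in millimeters) and that a weight is assigned to each process. The function name `combine_registrations`, the example values, and the weights are assumptions for illustration; the embodiments do not specify how weights are chosen.

```python
def combine_registrations(samples, weights):
    """Return the weighted average of per-registration calibration values."""
    if not samples or len(samples) != len(weights):
        raise ValueError("need exactly one weight per sample")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(samples, weights)) / total


# Hypothetical values: the previous registration measured 63.0 mm and the
# second registration measured 64.0 mm; the newer one is weighted 3x.
ipd_mm = combine_registrations([63.0, 64.0], [1.0, 3.0])
```

Weighting the most recent registration more heavily is one plausible design choice, since the reset was presumably triggered because the earlier calibration had become inaccurate.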

Performing the second operation based at least in part on the second input registration information for the first type of input allows new operations to be performed using the first type of input that is better calibrated, updated, and/or improved, thereby improving the operational efficiency of user-machine interactions based on the first type of input.

In some implementations, the input device includes buttons (e.g., hardware buttons, solid state buttons, rotatable input elements).

Providing dedicated buttons (e.g., solid state buttons, hardware buttons, buttons 7508 as shown and described with reference to fig. 10A-10D) for resetting other types of user inputs (e.g., hand tracking or gaze tracking) allows a user (e.g., when using any software application) to trigger an input registration reset more quickly and responsively. Instead of wasting time closing an application and/or navigating to a particular user interface control element using inaccurately calibrated biometric input, an actual button (e.g., a hardware button or a solid state button) can quickly trigger a user input enrollment reset for a first type of input, rather than relying on inaccurately calibrated input (e.g., biometric input) to trigger a user input enrollment reset.

In some embodiments, the button is further configured to turn the computer system on or off, and the method 16000 includes, when the computer system is not in operation, the computer system detecting a fourth input on the button and, in response to detecting the fourth input on the button, turning the computer system on. Optionally, the method 16000 further comprises, prior to shutting down the computer system, the computer system determining whether the previous input is a press and hold input, and shutting down the computer system in accordance with a determination that the previous input is a press and hold input. In accordance with a determination that the previous input is not a press and hold input, the computer system refrains from shutting down. In some embodiments, the press and hold input is provided to a rotatable input element (e.g., rotatable input element 7108, as shown and described with reference to fig. 11A-11F).
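The power behavior described above can be illustrated with a short sketch. The hold threshold `HOLD_THRESHOLD_S` and the function name `handle_power_button` are assumptions; the embodiments do not state how long an input must be held to count as a press and hold input.

```python
HOLD_THRESHOLD_S = 2.0  # assumed duration that qualifies as "press and hold"


def handle_power_button(powered_on: bool, press_duration_s: float) -> bool:
    """Return the new power state after an input on the button."""
    if not powered_on:
        # Fourth input: any input on the button turns the system on.
        return True
    if press_duration_s >= HOLD_THRESHOLD_S:
        # Press and hold input: shut down.
        return False
    # Not a press and hold input: refrain from shutting down.
    return True
```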

In some embodiments, prior to initiating the input registration process for the first type of input, the computer system determines whether the second input is a first series of press inputs and whether the input device is a button (e.g., or a rotatable input element configured to detect both press inputs and rotation inputs, such as rotatable input element 7108 shown and described with reference to fig. 11A-11F). In accordance with a determination that the second input is a first series of press inputs and the input device is a button (e.g., or a rotatable input element), the computer system initiates the input registration process for the first type of input. In accordance with a determination that the second input is not a first series of press inputs or the input device is not a button (e.g., is not a rotatable input element), the computer system refrains from initiating the input registration process for the first type of input. In some embodiments, the first series of press inputs includes four consecutive press inputs.
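The determination described above can be sketched as a simple check on press timestamps. The maximum gap between consecutive presses (`max_gap`), the device-kind strings, and both function names are hypothetical; only the count of four consecutive press inputs comes from the description.

```python
def is_press_series(press_times, count=4, max_gap=0.5):
    """True if press_times holds exactly `count` presses, each following
    the previous one within `max_gap` seconds (an assumed threshold)."""
    if len(press_times) != count:
        return False
    gaps = (later - earlier
            for earlier, later in zip(press_times, press_times[1:]))
    return all(0 <= gap <= max_gap for gap in gaps)


def should_start_registration(device_kind, press_times):
    """Initiate registration only for a press series on a button
    (or on a rotatable input element that also detects presses)."""
    is_button = device_kind in ("button", "rotatable_input_element")
    return is_button and is_press_series(press_times)
```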

Using different types of inputs on a single input device to trigger multiple system operations (e.g., to trigger non-application specific operations) reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may implement M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device, and instead the processor may be programmed to interpret different inputs from a fewer number of input devices.

In some embodiments, the computer system detects a fifth input on the button when the computer system is in the sleep mode, and in response to detecting the fifth input on the button, the computer system wakes up the computer system from the sleep mode (e.g., when a button 12016 on the computer system (e.g., watch 12010 or HMD 12011) is pressed, the computer system (e.g., watch 12010 or HMD 12011) wakes up from the sleep mode as shown and described with reference to fig. 12D to the standby mode of operation as shown and described with reference to fig. 12E). In some embodiments, prior to waking the computer system from the sleep mode, the computer system determines whether the fifth input is a press input. In accordance with a determination that the fifth input is a press input, the computer system wakes up from the sleep mode, and in accordance with a determination that the fifth input is not a press input, the computer system refrains from waking up from the sleep mode.

Providing multiple system operations (e.g., triggering an application-specific operation) in response to different inputs to a single input device reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may implement M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device, and instead the processor may be programmed to interpret different inputs from a fewer number of input devices.

In some embodiments, the computer system detects a sixth input on the button, and in response to detecting the sixth input on the button, the computer system captures media rendered visible via the display generating component. In some implementations, the computer system determines whether the sixth input is a press input before capturing the media rendered visible via the display generating component. In accordance with a determination that the sixth input is a press input, the computer system captures media rendered visible via the display generating component (e.g., media provided by the display generating component through the application user interface 11002 as shown in FIG. 11C when button 7508 is pressed), and in accordance with a determination that the sixth input is not a press input, the computer system refrains from capturing media rendered visible via the display generating component.

In some embodiments, the computer system detects a seventh input on the button in conjunction with detecting an eighth input on the second input device. In some embodiments, the seventh input and the eighth input are concurrent or overlapping inputs. In response to detecting a seventh input to the button in conjunction with an eighth input on the second input device, the computer system performs one or more system operations (e.g., captures a screen shot of the display as shown in FIG. 11E when the button 7508 and hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) are pressed and released together; all displays generated by the display generation component shown in FIG. 11E cease as the computer system is powered down when the button 7508 and hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) are pressed and held). In some embodiments, which system operation(s) to perform depends on the duration and style of the seventh input and the eighth input. The simultaneously detected inputs (such as the seventh input and the eighth input) are sometimes referred to as reconciliation inputs. In some embodiments, prior to performing one or more system operations, the computer system determines whether the seventh input is a press input, whether the eighth input is a press input, and whether the second input device is a rotatable input element. In accordance with a determination that the seventh input is a press input, the eighth input is a press input, and the second input device is a rotatable input element, the computer system performs one or more system operations. In accordance with a determination that the seventh input is not a press input, the eighth input is not a press input, or the second input device is not a rotatable input element, the computer system refrains from performing one or more system operations.
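A minimal sketch of how two concurrent press inputs might be resolved into a system operation is given below. It assumes that each press is recorded with start and end times, that "pressed and released together" means the shared hold time is below an assumed threshold `HOLD_S`, and that "pressed and held" means the shared hold time meets it. The `Press` type, the device strings, and the operation names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_S = 2.0  # assumed shared-hold duration that counts as "pressed and held"


@dataclass
class Press:
    device: str   # "button" or "rotatable_input_element"
    start: float  # press-down time in seconds
    end: float    # release time in seconds


def overlaps(a: Press, b: Press) -> bool:
    """True if the two presses are concurrent or overlapping in time."""
    return a.start < b.end and b.start < a.end


def system_operation(seventh: Press, eighth: Press) -> Optional[str]:
    # The second input device must be the rotatable input element.
    if seventh.device != "button" or eighth.device != "rotatable_input_element":
        return None
    if not overlaps(seventh, eighth):
        return None
    shared = min(seventh.end, eighth.end) - max(seventh.start, eighth.start)
    # Pressed and held together: power down; otherwise capture a screenshot.
    return "power_off" if shared >= HOLD_S else "screenshot"
```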

The combined use of more than one input device to request or instruct a corresponding system operation (e.g., an application-specific operation) reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may implement M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device, and instead, the processor may be programmed to interpret reconciled inputs from a fewer number of input devices.

In some embodiments, one or more system operations are selected from the group consisting of taking a screenshot, restarting the computer system, or resetting the computer system (e.g., as shown and described with reference to Table 2). The use of reconciliation inputs enables system operations (e.g., application-specific operations) such as capturing a screenshot, restarting a computer, and resetting a computer system to be performed without displaying additional controls.

Fig. 17 is a flow chart of a method 17000 for adjusting an immersion level of an extended reality (XR) experience of a user in a three-dimensional environment, according to some embodiments.

In some embodiments, the method 17000 is performed at a computer system (e.g., computer system 101 in fig. 1) that includes a display generation component (e.g., display generation component 120 in fig. 1A, 3, and 4) (e.g., heads-up display, touch screen, or projector) and one or more cameras (e.g., cameras directed downward toward the user's hand (e.g., color sensors, infrared sensors, and other depth sensing cameras) or cameras directed forward from the user's head).

In some embodiments, the method 17000 is managed by instructions stored in a computer readable storage medium (optionally, a non-transitory computer readable storage medium) and executed by one or more processors of a computer system, such as the one or more processors 202 of the computer system 101 (e.g., the control unit 110 in fig. 1A). Some of the operations in method 17000 are optionally combined and/or the order of some of the operations are optionally changed.

In some embodiments, the method 17000 is performed at a computer system (e.g., computer system 101 in fig. 1) that includes or is in communication with a display generation component (e.g., display generation component 120 or display generation component 7100 in fig. 1A, 3, and 4) (e.g., heads-up display, HMD, display, touch screen, or projector) and one or more input devices (e.g., camera, controller, touch-sensitive surface, joystick, button, glove, watch, motion sensor, orientation sensor, and/or rotatable input mechanism, such as a crown). In some embodiments, the display generation component is or includes a user-oriented display component and provides an XR experience to the user. In some embodiments, the computer system is an integrated device having one or more processors and memory enclosed in the same housing as at least some of the first display generating component, the one or more audio output devices, and the one or more input devices. In some embodiments, the computer system includes a computing component (e.g., a server, a mobile electronic device such as a smart phone or tablet device, a wearable device such as a watch, wristband or earpiece, a desktop computer, or a laptop computer) that includes one or more processors and memory separate from one or more of the display generation component (e.g., a heads-up display, a touch screen, or a stand-alone display), one or more output devices (e.g., an earpiece or external speaker), and one or more input devices. In some embodiments, the display generating component and the one or more audio output devices are integrated and enclosed in the same housing.

In method 17000, a computer system includes or communicates with a display generation component and one or more input devices (e.g., buttons, dials, rotatable input elements, switches, movable components, solid state components, touch sensitive input devices (e.g., devices that detect local sensor inputs, such as touches, touch movements, and/or touch intensities, that the computer system uses to trigger corresponding operations and optionally to provide haptic feedback, such as haptic feedback corresponding to the detected inputs), cameras, infrared sensors, accelerometers, and/or gyroscopes). The computer system detects (17002) a first input (e.g., a rotational input or a press input) on a rotatable input mechanism (e.g., a bi-directional rotatable input mechanism) of an input device of the one or more input devices. In response to detecting (17004) the first input on the rotatable input mechanism, in accordance with a determination that the first input is a first type of input (e.g., a rotational input), the computer system changes (17006) an immersion level associated with a display of an extended reality (XR) environment (e.g., a three-dimensional environment) generated by the display generation component. For example, when the application user interface 11002 is displayed at the first immersion level as shown in fig. 11B, in response to detecting a rotational input on the rotatable input element 7108 in a first rotational direction, the display generation component presents a second immersion level as shown in fig. 11C, in which a larger view of the user's field of view is presented to the user along with virtual content 11006; and when the application user interface 11002 is displayed at the second immersion level as shown in fig. 11C, in response to detecting a rotational input on the rotatable input element 7108 in a second rotational direction, the display generation component presents the first immersion level as shown in fig. 11B, in which a smaller view of the user's field of view is presented to the user along with virtual content 11004. As another example, immediately prior to detecting the first input, the computer system displays an XR environment, and in response to the first input, the computer system reduces the immersion level of the displayed environment from an initial immersion level, in which the computer system displays a virtual reality (VR) environment in which no passthrough portion of the physical environment of the computer system is displayed (e.g., three-dimensional environment 11000 shown in fig. 11C does not have a passthrough portion of the physical environment of the computer system), to a first immersion level in which the display of the XR environment includes both virtual content from the application and a passthrough portion of the physical environment of the computer system (e.g., the first immersion level includes a passthrough region showing representations 7004', 7006', 7008', and 7014' of the physical environment 7000 and virtual content such as box 7016, as shown and described with reference to fig. 11A).

The immersion level affects the perceived experience of the user by changing the properties of the mixed reality three-dimensional environment. Changing the immersion level changes the relative salience of the virtual content to content (visual and/or audio) from the physical world. For example, for an audio component, increasing the immersion level includes, for example, increasing noise cancellation, increasing the spatial nature of spatial audio associated with the XR environment (e.g., by moving the audio sources to more points around the user or increasing the number and/or volume of point sources of audio), and/or by increasing the volume of audio associated with the virtual environment. In some embodiments, increasing the immersion level changes the degree to which the mixed reality environment reduces (or eliminates) signals from the physical world (e.g., audio and/or visual transmission of a portion of the physical environment of the computer system) presented to the user. For example, increasing the immersion level includes increasing the proportion of the visual field of view in which the virtual content is displayed, or decreasing the significance of the representation of the real world (e.g., physical environment 7000 as shown and described with reference to fig. 7A) by dimming, fading, or reducing the amount of the representation of the real world presented to the user.
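To make the relationship between immersion level and presentation concrete, the following sketch maps a normalized immersion level to three presentation parameters. The linear mapping, the 0.0 to 1.0 range, and the names `Presentation` and `presentation_for` are assumptions chosen for illustration; the embodiments describe only the direction of each change, not a formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    virtual_fov_fraction: float    # share of the field of view showing virtual content
    passthrough_brightness: float  # 1.0 = physical world undimmed, 0.0 = hidden
    noise_cancellation: float      # 0.0 = off, 1.0 = maximum


def presentation_for(immersion: float) -> Presentation:
    """Map an immersion level (clamped to 0.0..1.0) to presentation values.

    Higher immersion shows more virtual content, dims the representation
    of the physical world, and increases noise cancellation.
    """
    level = min(1.0, max(0.0, immersion))
    return Presentation(
        virtual_fov_fraction=level,
        passthrough_brightness=1.0 - level,
        noise_cancellation=level,
    )
```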

Changing the immersion level may also include changing a visual presentation of the mixed reality environment, including the extent of the field of view and the extent to which the visibility of the external physical environment is reduced. Changing the immersion level may include changing the number or degree of sensory modalities that the user may use to interact with the mixed reality three-dimensional environment (e.g., through the user's voice, gaze, and body movements). Changing the immersion level may also include changing the fidelity and resolution with which the mixed reality environment simulates a desired environment. Changing the immersion level may also include modifying the degree to which the viewpoint of the mixed reality environment matches the viewpoint or perspective of the user, for example, by capturing movement of the user and promptly adjusting the portion of the three-dimensional environment that is within the field of view. In some implementations, the change in the immersion level optionally has a magnitude based on the magnitude of the rotation and/or has a direction based on the direction of the rotation. For example, changing the immersion level includes increasing the proportion of the visual field of view in which virtual content is displayed, or decreasing the salience of the representation of the real world (e.g., by dimming, fading, or decreasing the amount of the representation of the real world that is displayed). For audio components, changing the immersion level includes, for example, increasing noise cancellation, increasing the spatial nature of spatial audio associated with the virtual environment (e.g., by moving sources to more points around the user or increasing the number and/or volume of point sources of audio), and/or increasing the volume of audio associated with the virtual environment. In some embodiments, the first input is a press input and the number of presses corresponds to the immersion level (e.g., three presses correspond to a higher immersion level than two presses).
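The dependence of the immersion change on the magnitude and direction of the rotation can be sketched as below. The scale factor `DEGREES_PER_FULL_RANGE`, the normalized 0.0 to 1.0 immersion range, and the sign convention (positive degrees for the first rotational direction) are assumptions for illustration.

```python
DEGREES_PER_FULL_RANGE = 360.0  # assumed: one full turn spans the whole range


def apply_rotation(immersion: float, rotation_degrees: float) -> float:
    """Return the new immersion level after a rotational input.

    The size of the change follows the size of the rotation, and its sign
    follows the direction of rotation. The result is clamped to 0.0..1.0.
    """
    delta = rotation_degrees / DEGREES_PER_FULL_RANGE
    return min(1.0, max(0.0, immersion + delta))
```

Under these assumptions, a quarter turn in the first direction raises the immersion level from 0.25 to 0.5, and the same quarter turn in the opposite direction restores it.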

In accordance with a determination that the first input is a second type of input (e.g., a press input), the computer system performs (17008) an operation that is different from changing the immersion level associated with the display of the XR environment. Optionally, the operation is displaying a main menu user interface 7110, as shown and described with reference to fig. 11D, where the virtual content is computer-generated content that is different from the passthrough portion of the physical environment, or the operation is taking a screenshot, powering down the device, rebooting the device, entering a hardware reset mode, or the like, as described with reference to table 2. In some embodiments, the operation is optionally performed without changing the immersion level. In some embodiments, the first input is a combined input that begins with a first type of input and ends with a second type of input, and/or the first input terminates when physical contact with the rotatable input mechanism ceases.
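Operations 17004 through 17008 amount to a dispatch on the type of input received at the rotatable input mechanism. The sketch below assumes a state dictionary with hypothetical keys `immersion` and `main_menu_visible`, and uses displaying the main menu user interface as the example of the different operation; none of these names come from the described embodiments.

```python
def handle_rotatable_input(state: dict, kind: str, amount: float = 0.0) -> dict:
    """Return a new state after an input on the rotatable input mechanism.

    kind == "rotation": change the immersion level (17006).
    kind == "press":    perform a different operation (17008), here showing
                        the main menu, without changing the immersion level.
    """
    new_state = dict(state)
    if kind == "rotation":
        new_state["immersion"] = min(1.0, max(0.0, state["immersion"] + amount))
    elif kind == "press":
        new_state["main_menu_visible"] = True
    return new_state
```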

Using a single input device (e.g., rotatable input element 7108, as shown and described with reference to fig. 11A-11F) that accepts two or more different types of inputs reduces the number of different input devices that must be provided to request or indicate that different functionalities be performed. The use of a rotatable input mechanism allows a user to provide a continuous input range, and the bi-directionality of the rotatable input mechanism allows the input to be easily and intuitively changed in either direction without having to display additional controls to the user. The same rotatable input mechanism is capable of receiving a second type of input (e.g., a press input) that implements a discrete function (e.g., cancel or display a user interface object). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. The use of a rotatable input mechanism provides direct access to changes in immersion levels and execution of different operations, reducing the amount of time required to achieve a particular result (e.g., the user does not have to navigate a menu or visually displayed control elements to make selections for executing the operations and/or changing the immersion levels), thereby improving the operating efficiency of the computer system. Increasing the immersion level helps remove constraints in the physical environment of the computer system (e.g., realistically simulating a more spacious virtual environment by blocking sensory inputs from the physical environment (e.g., blocking visual inputs of small/restricted rooms, removing (audio) echoes from small physical spaces) to provide a virtual environment that is more beneficial for a user to interact with an application).

In some implementations, in response to a second input of the first type of input, the computer system changes the immersion level associated with the display of the XR environment generated by the display generation component to a second immersion level, at which the display of the XR environment concurrently includes different virtual content. For example, additional virtual content is displayed at a second immersion level that is higher than the first immersion level: compared to the first immersion level, which is associated with smaller virtual content 11004 (e.g., virtual content extending over a smaller viewing angle) as shown and described with reference to fig. 11B, the second immersion level presents larger virtual content 11006 (e.g., virtual content extending over a larger viewing angle) to the user, as shown and described with reference to fig. 11C. As another example, virtual content is removed at a second immersion level that is lower than the first immersion level, or virtual content is displayed at a different fidelity level than the virtual content displayed when the first immersion level is associated with the display of the XR environment (e.g., the same virtual content is displayed at a higher fidelity (e.g., with sharper contrast, at a higher resolution, and/or more realistically) or, at a second immersion level that is lower than the first immersion level, at a lower fidelity (e.g., with lower contrast, at a lower resolution, and/or less realistically, blending more into the display of the XR environment)). For example, the second immersion level includes virtual content that is different from that of the first immersion level, or the second immersion level has virtual content that differs from the passthrough portion of the physical environment (e.g., virtual content 11006, as shown and described with reference to fig. 11C, has computer-generated content that is different from the passthrough portion of the physical environment, as compared to virtual content 11004 as shown and described with reference to fig. 11B).

The use of a rotary input mechanism allows the user to provide a continuous input range and observe direct visual changes in the XR environment without having to display additional controls to the user. The use of a rotational input mechanism provides direct access to changes in immersion levels and execution of different operations, reducing the amount of time required to achieve a particular result (e.g., the user does not have to navigate a menu or visually displayed control elements to make a selection for changing immersion levels), thereby improving the operating efficiency of the computer system.

In some implementations, the second type of input includes a press input. The computer system detects a third input provided to the rotatable input mechanism and, in response to detecting that the third input is a press input, performs an operation selected from the group consisting of cancelling an active application, cancelling a virtual object displayed via the display generating component, displaying an application manager user interface, enabling an accessibility mode, and redisplaying a plurality of previously displayed user interface elements in the XR environment. For example, in response to detecting a user input (e.g., a press input on hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element)) while an active application such as application user interface 7018 is displayed, as shown in FIG. 7B, the computer system cancels the application user interface 7018; and in response to detecting two press inputs in quick succession, as shown in FIG. 7B, the computer system displays a system user interface 7180, as shown and described with reference to FIG. 7O. In some implementations, redisplaying the plurality of previously displayed user interface elements in the XR environment includes presenting the user interface elements at new locations in the XR environment in response to the user re-centering the field of view. In some embodiments, the computer system performs one or more operations described with reference to table 3 in response to detecting the third input.

The same rotary input mechanism can receive a second type of input (e.g., a press input or a sequence of press inputs) that requests a corresponding discrete or binary (e.g., on/off) function (e.g., cancelling an active application, cancelling a virtual object displayed via a display generation component, displaying an application manager user interface, enabling an accessibility mode, and/or redisplaying a plurality of previously displayed user interface elements in an XR environment, as described with reference to table 3).

In some embodiments, changing the immersion level associated with the display of the XR environment (17006) is based on detecting a rotational input to the rotatable input mechanism (e.g., as shown and described with reference to fig. 11A-11F).

The use of a rotary input mechanism allows a user to provide a continuous or semi-continuous input range (e.g., five (or eight or ten) or more different input values or levels), and the bi-directionality of the rotary input mechanism allows easy and intuitive change of input in either direction without having to display additional controls to the user.

In some embodiments, changing the immersion level associated with the display of the XR environment (17006) based on detecting the rotational input includes: in accordance with a determination that the first input is a rotational input in a first direction, the computer system increasing the immersion level (e.g., in response to detecting a clockwise rotational input on the rotational input element 7108 when the application user interface 110002 is displayed at the first immersion level as shown in FIG. 11B, the display-generating component presents a second immersion level as shown in FIG. 11C, in which virtual content 11006 occupies a larger portion of the user's field of view); and in accordance with a determination that the first input is a rotational input in a second direction different from (e.g., opposite to) the first direction, the computer system decreasing the immersion level (e.g., in response to detecting a counterclockwise rotational input on the rotational input element 7108 when the application user interface 110002 is displayed at the second immersion level as shown in FIG. 11C, the display-generating component presents a first immersion level as shown in FIG. 11B, in which virtual content 11004 occupies a smaller portion of the user's field of view). In some embodiments, the first direction is clockwise and the second direction is counterclockwise (or vice versa), such that a clockwise rotational input increases the immersion level and a counterclockwise rotational input decreases the immersion level (or vice versa).

The use of a rotary input mechanism allows a user to provide a continuous input range, and the bi-directionality of the rotary input mechanism allows the input to be easily and intuitively changed in either direction without having to display additional controls to the user.
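
By way of a non-limiting illustration, the bidirectional mapping from rotation to immersion level described above can be sketched as follows. The class name `ImmersionController`, the integer level range of 0 to 10, and the one-level-per-detent step are assumptions made for this sketch and are not taken from the embodiments; the `clockwise_increases` flag reflects the "or vice versa" alternative.

```python
from dataclasses import dataclass

# Assumed bounds for this sketch; the description gives no numeric range.
MIN_LEVEL = 0    # no immersion (passthrough only)
MAX_LEVEL = 10   # highest immersion level


@dataclass
class ImmersionController:
    """Hypothetical holder of the current immersion level."""
    level: int = 5
    clockwise_increases: bool = True  # set to False for the reversed mapping

    def on_rotation(self, detents: int) -> int:
        """Apply a rotation of `detents` steps (positive = clockwise).

        Rotation in one direction raises the level, rotation in the other
        direction lowers it, and the result is clamped to the valid range.
        """
        delta = detents if self.clockwise_increases else -detents
        self.level = max(MIN_LEVEL, min(MAX_LEVEL, self.level + delta))
        return self.level
```

Clamping keeps the level within its range when the user continues to rotate past the lowest or highest level, so further rotation in that direction has no effect.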

In some embodiments, the first type of input comprises a rotational input of a rotatable input mechanism, and the second type of input comprises a pressing input of the rotatable input mechanism.

Using a single input device (e.g., rotatable input element 7108, as shown and described with reference to fig. 11A-11F) that accepts two (or more) different types of inputs reduces the number of different input devices that must be provided to achieve different functionalities. Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact.

In some embodiments, in response to detecting the first input, in accordance with a determination that the first input is a second type of input and includes a first number of press inputs, the computer system performs a first operation, and in accordance with a determination that the first input is a second type of input and includes a second number of press inputs different from the first number, the computer system performs a second operation different from the first operation. In some embodiments, the computer system is configured to perform different operations based on the number of detected press inputs, as described with reference to table 3. For example, in response to a single press input, (1) a main menu user interface is displayed (as shown and described with reference to fig. 7A and 7B), (2) a passthrough portion of a physical environment is provided (as shown and described with reference to fig. 7J-7K), or (3) an application exits a full-screen or immersive display mode (as shown and described with reference to fig. 8A, 8B, 8F, and 8G). In response to a two-press input, a forced exit menu is displayed, as shown and described with reference to fig. 7O. In response to a three-press input, the computer system switches between an accessibility mode being active and the accessibility mode being inactive, or displays an option for enabling or disabling the accessibility mode.

Using the number of press inputs to affect the operation reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may implement M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device, and alternatively, the processor may be programmed to interpret more types of input from a particular input device (e.g., based on the number of press inputs).
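
By way of a non-limiting illustration, selecting an operation from the number of press inputs can be sketched as follows. The grouping window `MULTI_PRESS_WINDOW_S`, the helper names, and the choice to map a single press only to the main menu (rather than to the passthrough or full-screen-exit alternatives) are assumptions made for this sketch.

```python
from enum import Enum, auto
from typing import Sequence

# Hypothetical value: the description does not state how close together
# presses must be to count as one multi-press input.
MULTI_PRESS_WINDOW_S = 0.35


class Operation(Enum):
    SHOW_MAIN_MENU = auto()             # single press (one of several options)
    SHOW_FORCED_EXIT_MENU = auto()      # two presses
    TOGGLE_ACCESSIBILITY_MODE = auto()  # three presses
    NO_OPERATION = auto()               # any other count (assumed)


def count_presses(press_times: Sequence[float],
                  window: float = MULTI_PRESS_WINDOW_S) -> int:
    """Count the presses in the leading burst of ascending timestamps.

    A press belongs to the burst if it follows the previous press by no
    more than `window` seconds.
    """
    if not press_times:
        return 0
    count = 1
    for previous, current in zip(press_times, press_times[1:]):
        if current - previous > window:
            break
        count += 1
    return count


_OPERATION_BY_COUNT = {
    1: Operation.SHOW_MAIN_MENU,
    2: Operation.SHOW_FORCED_EXIT_MENU,
    3: Operation.TOGGLE_ACCESSIBILITY_MODE,
}


def operation_for(press_times: Sequence[float]) -> Operation:
    """Map a sequence of press timestamps to the operation to perform."""
    return _OPERATION_BY_COUNT.get(count_presses(press_times),
                                   Operation.NO_OPERATION)
```

In this sketch a single lookup table lets one input device request several operations, which corresponds to the N < M relationship noted above.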

In some embodiments, the computer system detects a first number of press inputs directed to the rotatable input mechanism, and in response to detecting the first number of press inputs directed to the rotatable input mechanism, the computer system cancels the application user interface 7018 by causing the active application to run in the background and/or displays the main menu user interface 7110 via the display generating component (e.g., when the application user interface 7018 is displayed as shown in FIG. 7B, in response to detecting a user input (e.g., a press input on hardware input element 7108 (e.g., a button, crown, or rotatable and pressable input element)), the computer system cancels the application user interface 7018 and presents the main menu user interface 7110, as shown and described with reference to FIG. 7C).

In some embodiments, the computer system detects a second number of press inputs directed to the rotatable input mechanism, and in response to detecting the second number of press inputs directed to the rotatable input mechanism, the computer system displays an application manager user interface. In some embodiments, the second number of press inputs is optionally different from the first number of press inputs. In some embodiments, the application manager user interface includes a system interface 7180, i.e., a forced exit menu from which a user can close a currently running application, as shown and described with reference to fig. 7O.

In some embodiments, the computer system detects a third number of press inputs directed to the rotatable input mechanism, and in response to detecting the third number of press inputs directed to the rotatable input mechanism, the computer system performs or enables an accessibility mode operation. In some embodiments, the first number of press inputs, the second number of press inputs, and/or the third number of press inputs are optionally different from one another, as shown and described with reference to table 2. In some implementations, a three-press input switches between the accessibility mode being active and the accessibility mode being inactive, or displays an option for enabling or disabling the accessibility mode.

In some embodiments, the computer system detects a fourth number of press inputs directed to the rotatable input mechanism, and in response to detecting the fourth number of press inputs directed to the rotatable input mechanism, the computer system cancels the virtual object by displaying a corresponding passthrough portion of the physical environment of the computer system (e.g., in response to detecting a press input to 7108, the computer system cancels the blank virtual background shown in FIG. 7J and presents a passthrough portion including representation 7014', as shown and described with reference to FIG. 7K, and in response to detecting a press input to 7108, the computer system cancels the blank virtual background shown in main menu user interface 7110 depicted in FIG. 9B and presents a passthrough portion including representation 7014', as shown and described with reference to FIG. 9C). In some embodiments, the first number of inputs, the second number of inputs, the third number of press inputs, and/or the fourth number of press inputs are optionally different from one another. In some embodiments, in response to a pressing input to the rotatable input element, all virtual objects are cancelled and the computer system is transitioned to (or maintained at) the minimum level of immersion (e.g., no immersion). In some implementations, in response to a pressing input to the rotatable input element, the immersion level is reduced (e.g., more aspects of the physical environment are presented to the user 7002), but virtual content from applications running in the foreground is still displayed (e.g., rendering a pass-through portion including representation 7014', but box 7016 is still displayed, as shown and described with reference to fig. 7K and 9C).

In some embodiments, in response to detecting the first input, in accordance with a determination that the first input is a second type of input and has a duration that satisfies a first criterion, the computer system performs a first operation, and in accordance with a determination that the first input is a second type of input and has a duration that satisfies a second criterion different from the first criterion (e.g., the duration that satisfies the second criterion is different from the duration that satisfies the first criterion), the computer system performs a second operation different from the first operation. In some embodiments, the computer system is configured to perform different operations for press inputs of different durations. In some implementations, pressing and holding a rotatable input element (e.g., hardware input element 7108 (e.g., button, crown, or a rotatable and depressible input element)) causes the display to be re-centered (e.g., fade-out and fade-in), while a tap or short single-press input causes (1) the main menu user interface to be displayed (e.g., as shown and described with reference to fig. 7C), (2) the pass-through portion of the physical environment to be provided (e.g., as shown and described with reference to fig. 7K and 9C), or (3) the application (e.g., application user interface 8000) to exit full-screen mode, as shown and described with reference to fig. 8A, 8B, 8F, and 8G.

Using input durations to affect system operation (e.g., operation that is not specific to a particular application) reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may implement M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device, and instead, the processor may be programmed to interpret more types of input (e.g., short presses, long presses, and holds) from a particular input device.
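
By way of a non-limiting illustration, distinguishing press inputs by duration can be sketched as follows. The threshold `LONG_PRESS_THRESHOLD_S` and all names are assumptions made for this sketch; the short-press result is left as a single placeholder because the description lists several alternative short-press operations.

```python
from enum import Enum, auto

# Hypothetical threshold; the description does not give a numeric duration.
LONG_PRESS_THRESHOLD_S = 0.5


class PressKind(Enum):
    SHORT = auto()  # tap or short single press
    LONG = auto()   # press and hold


def classify_press(pressed_at: float, released_at: float,
                   threshold: float = LONG_PRESS_THRESHOLD_S) -> PressKind:
    """Classify a press by how long the input element was held down."""
    duration = released_at - pressed_at
    if duration < 0:
        raise ValueError("release time precedes press time")
    return PressKind.LONG if duration >= threshold else PressKind.SHORT


def operation_for(kind: PressKind) -> str:
    """Return the operation for a press kind.

    A long press re-centers the display; a short press triggers one of the
    short-press operations (main menu, passthrough, or exiting full-screen
    mode), which this sketch does not distinguish further.
    """
    if kind is PressKind.LONG:
        return "recenter_display"
    return "short_press_operation"
```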

In some embodiments, in accordance with a determination that the first input is a second type of input (e.g., a press input), the computer system displays a main menu user interface in the XR environment (e.g., main menu user interface 7110 appears in the foreground closer to the user than other objects or features of the XR environment, as shown and described with reference to fig. 9B, 11D, and 11F).

The ability to navigate the main menu user interface while the user is in an XR environment (e.g., by accessing a collection of applications on the main menu user interface or a collection of contacts that are capable of interacting with the user) reduces the amount of time required to complete the user's desired operation, regardless of the current display mode (e.g., VR or AR). Navigation of the main menu user interface is not limited to a particular mode and does not require additional controls to be displayed to the user to access the main menu user interface.

In some embodiments, the computer system detects a fourth input of the second type of input in conjunction with detecting a fifth input on a second input device. In some embodiments, the fourth input and the fifth input are concurrent or overlapping inputs. In some embodiments, the second input device is a hardware button or a solid state button (e.g., the second input device is button 7508 and the first input device is rotatable input element 7108, as shown and described with reference to fig. 10A-10D). In some implementations, the second input device is a camera. In response to detecting the fourth input of the second type of input in conjunction with the fifth input on the second input device, the computer system performs one or more third operations (e.g., when the button 7508 and hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) are pressed and released together, a screenshot of the display as shown in FIG. 11E is captured; when the button 7508 and hardware input element 7108 are pressed and held, all displays generated by the display generating component shown in FIG. 11E cease as the computer system is powered down). In some embodiments, which operation or operations are performed as the one or more third operations depends on the duration and pattern of the fourth input and the fifth input. In some embodiments, the computer system optionally performs the one or more third operations without performing the operations that are performed when the rotatable input element is pressed by itself. Concurrently detected inputs (such as the fourth input and the fifth input) are sometimes referred to as chorded inputs.

The combined use of more than one input device to request a corresponding system operation (e.g., an operation that is not specific to a particular application) reduces the number of different input devices that must be provided to accomplish different tasks (e.g., N input devices may request M operations, where N < M). Reducing the number of input devices that must be provided reduces physical clutter on the device, freeing up more physical space on the device and helping to prevent accidental input from inadvertent contact. Reducing the number of input devices also reduces the need to provide additional hardware wiring within the device; instead, the processor may be programmed to interpret chorded inputs from a smaller number of input devices.

In some implementations, the third operation is an element selected from the group consisting of taking a screenshot, powering down the computer system, restarting the computer system, and entering a hardware reset mode of the computer system (e.g., as shown and described with reference to Table 2). In some implementations, the third operation is to take a screenshot when the button and the rotatable input element are pressed and released together (e.g., when the button 7508 and hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) are pressed and released together, a screenshot of the display as shown in fig. 11E is captured). In some embodiments, the third operation is to power down the computer system when the button and the rotatable input element are pressed and held (e.g., when the button 7508 and hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) are pressed and held, all displays generated by the display generation component shown in fig. 11E cease as the computer system is powered down). In some embodiments, the third operation is to force a reboot of the computer system when the button and the rotatable input element are pressed and held. In some embodiments, the third operation is to enter a hardware reset mode when the button and the rotatable input element are initially pressed and held together and the button is then released (e.g., the fifth input, which is a press input on the button, ends) while the fourth input, which is a press input on the rotatable input element, continues.

The use of chorded inputs enables system operations (e.g., operations that are not specific to a particular application), such as capturing a screenshot, powering off, restarting the computer system, and resetting the computer system, to be performed without displaying additional controls.
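
By way of a non-limiting illustration, interpreting concurrent press inputs on the button and the rotatable input element can be sketched as follows. The thresholds, the `Press` record, and the function name are assumptions made for this sketch. The description does not state how a forced reboot is distinguished from powering down when both elements are pressed and held, so the sketch does not model the forced reboot.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the description gives no numeric values.
HOLD_THRESHOLD_S = 2.0      # minimum overlap that counts as "pressed and held"
RELEASE_TOLERANCE_S = 0.2   # releases this close together count as "together"


@dataclass(frozen=True)
class Press:
    start: float  # time the element was pressed, in seconds
    end: float    # time the element was released, in seconds


def classify_combined_input(button: Press, crown: Press) -> Optional[str]:
    """Map a button press plus a crown press to a system operation."""
    overlap = min(button.end, crown.end) - max(button.start, crown.start)
    if overlap <= 0:
        return None  # the presses did not overlap, so this is not a combined input
    if overlap < HOLD_THRESHOLD_S:
        return "screenshot"  # pressed and released together
    # Both elements were held; check whether the button was released first
    # while the press on the crown continued.
    if crown.end - button.end > RELEASE_TOLERANCE_S:
        return "hardware_reset_mode"
    return "power_off"
```

For example, under these assumed thresholds, a brief simultaneous press yields a screenshot, whereas holding both elements and then releasing only the button yields the hardware reset mode.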

FIG. 18 is a flow chart of a method 18000 for controlling a computer system based on physical positioning of the computer system relative to a user and changes in physical positioning and a state of the computer system, according to some embodiments.

In some embodiments, method 18000 is performed at a computer system (e.g., computer system 101 in fig. 1) that includes a display generation component (e.g., display generation component 120 in fig. 1A, 3, and 4) (e.g., heads-up display, touch screen, or projector) and one or more cameras (e.g., cameras directed downward toward the user's hand (e.g., color sensors, infrared sensors, and other depth sensing cameras) or cameras directed forward from the user's head). In some embodiments, the method 18000 is governed by instructions that are stored in a (optionally non-transitory) computer-readable storage medium and that are executed by one or more processors of a computing system (such as the one or more processors 202 of the computing system 101) (e.g., the control unit 110 in fig. 1A). Some of the operations in method 18000 are optionally combined and/or the order of some of the operations is optionally changed.

In some embodiments, method 18000 is performed at a computer system (e.g., computer system 101 in fig. 1) in communication with a display generation component (e.g., display generation component 120 or display generation component 7100 in fig. 1A, 3, and 4) (e.g., heads-up display, HMD, display, touch-screen, or projector) and one or more input devices (e.g., camera, controller, touch-sensitive surface, joystick, button, glove, watch, motion sensor, orientation sensor, and/or rotatable input mechanism, such as a crown). In some embodiments, the display generation component is a user-oriented display component and provides an XR experience to the user. In some embodiments, the computer system is an integrated device having one or more processors and memory enclosed in the same housing as at least some of the first display generating component, the one or more audio output devices, and the one or more input devices. In some embodiments, the computer system includes a computing component (e.g., a server, a mobile electronic device such as a smart phone or tablet device, a wearable device such as a watch, wristband or earpiece, a desktop computer, or a laptop computer) that includes one or more processors and memory separate from one or more of the display generation component (e.g., a heads-up display, a touch screen, or a stand-alone display), one or more output devices (e.g., an earpiece or external speaker), and one or more input devices. In some embodiments, the display generating component and the one or more audio output devices are integrated and enclosed in the same housing.

In method 18000, the computer system is a wearable device (e.g., a watch, a pair of headphones, a headset, and/or a strapped device) that includes or communicates with a display generating component and one or more input devices (e.g., buttons, dials, rotatable input elements, switches, movable or solid state components, devices that detect localized sensor input such as intensity or force sensor input (which the computer system uses to trigger a corresponding operation and optionally to provide haptic feedback corresponding to the detected input), a biometric sensor, a pulse sensor, a thermal sensor, a camera, and/or an inertial measurement sensor). While a respective session (e.g., media consumption session, recording session, and/or content sharing session) is active (e.g., running and/or displayed in the foreground of the user interface) in a respective application (e.g., media application, conferencing application, telephony application, gaming application, web content browsing application, or other local application or third party application) and while the wearable device is being worn (e.g., while the wearable device is in a state corresponding to the wearable device being worn), the wearable device detects (18002) a first signal indicating that the wearable device has been removed (e.g., indicating that the wearable device is in the process of being removed; for example, the computer system (e.g., watch 12010 or HMD 12011) is removed from the wrist of the user, as shown and described with reference to fig. 12C). In addition to the computer system (e.g., watch 12010 or HMD 12011) shown in fig. 12B through 12G, in some embodiments the wearable device is a head-mounted display device (HMD). When the HMD no longer covers the eyes of the user, the HMD is unable to track or locate the eyes of the user.

In some embodiments, the first signal is a signal provided by a biometric sensor, e.g., the biometric sensor may include a camera and an image processing component. When the image processing component is unable to locate the presence of any eyes in the image captured by the camera, the biometric sensor outputs a signal indicating that the wearable device has not been placed in front of the eyes of the user. In some embodiments, the biometric sensor is a pulse sensor (e.g., for detecting a pulse of a user) that returns a signal as an output to indicate that a pulse has not been detected. In some implementations, the first signal is a signal provided by an inertial measurement device (e.g., an accelerometer or gyroscope) when the inertial measurement device determines that it is oriented in a manner incompatible with the wearable device being worn (e.g., the wearable device is positioned upside down, the wearable device is lying on its side, the camera in the wearable device is pointing to the sky or the ground due to the orientation of the wearable device). In some embodiments, the first signal is a signal provided by a thermal sensor. For example, the thermal sensor detects when it has been removed from the body temperature source of the wearer. In some embodiments, signals from multiple biometric sensors are analyzed together to determine whether the wearable device is being worn, has been removed, or is in the process of being removed. 
For example, when a user places the wearable device on her forehead (e.g., as a result of the user moving the wearable device from a position over the user's eyes to a position on the user's forehead), the camera of the wearable device does not detect the presence of any eyes, but the thermal sensor of the wearable device detects the body temperature and the inertial measurement device of the wearable device detects the "upright" position of the wearable device, and the wearable device determines that the wearable device is being worn, but may be in the process of being removed, based on those signals.
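
By way of a non-limiting illustration, combining several sensor signals into a single wear-state determination can be sketched as follows. The three boolean inputs, the `WearState` names, and the rule that eye detection alone suffices for the worn state are assumptions made for this sketch; only the forehead case (no eyes detected, body temperature detected, upright orientation) is taken from the example above.

```python
from dataclasses import dataclass
from enum import Enum


class WearState(Enum):
    WORN = "worn"
    WORN_POSSIBLY_BEING_REMOVED = "worn_possibly_being_removed"
    REMOVED = "removed"


@dataclass(frozen=True)
class SensorSnapshot:
    eyes_detected: bool       # camera plus image processing located eyes
    body_heat_detected: bool  # thermal sensor senses a body temperature source
    upright: bool             # inertial orientation compatible with being worn


def estimate_wear_state(snapshot: SensorSnapshot) -> WearState:
    """Fuse the sensor signals into one wear-state estimate."""
    if snapshot.eyes_detected:
        # Assumed rule: locating the user's eyes is treated as sufficient.
        return WearState.WORN
    if snapshot.body_heat_detected and snapshot.upright:
        # Forehead case: still on the body, but no longer over the eyes.
        return WearState.WORN_POSSIBLY_BEING_REMOVED
    return WearState.REMOVED
```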

In response to detecting (18004) the first signal, the wearable device causes (18006) the respective session of the respective application to become inactive (e.g., when the computer system (e.g., watch 12010 or HMD 12011) is removed from the wrist of the user, the computer system (e.g., watch 12010 or HMD 12011) pauses the song, as shown and described with reference to fig. 12C), and when the respective application is inactive (e.g., when the respective session of the respective application is inactive, when the respective application is running in the background, pauses, does not receive any user input, and/or does not provide any output to the user), the wearable device detects (18008) a second signal indicating that the wearable device is being worn. In some embodiments, the second signal is a signal provided by a biometric sensor. For example, the biometric sensor may include a camera and an image processing component. When the image processing component is able to locate the presence of any eyes in the image captured by the camera, the biometric sensor outputs a signal indicating that the wearable device has now been placed in front of the eyes of the user. In some embodiments, the biometric sensor may be a pulse sensor that returns a signal as an output to indicate that a pulse has been detected. In some implementations, the second signal is a signal provided by an inertial measurement device (e.g., an accelerometer or gyroscope) when the inertial measurement device determines that it is oriented in a manner compatible with the wearable device being worn (e.g., the wearable device is not positioned upside down, the wearable device is not on its side, and a camera in the wearable device is not directed to the sky or the ground). In some embodiments, the second signal is a signal provided by a thermal sensor. For example, the thermal sensor detects a source of body temperature of the wearer. 
In some embodiments, signals from multiple biometric sensors are analyzed together to determine whether the wearable device is being worn. For example, when a user places the wearable device on her forehead, the camera will not detect the presence of any eyes, but the thermal sensor will still detect body temperature, and the inertial measurement device will detect an "upright" positioning of the wearable device.

In response (18010) to detecting the second signal: in accordance with a determination that respective criteria are met, the wearable device resumes (18012) the respective session of the respective application; and in accordance with a determination that the respective criteria are not met, the wearable device forgoes resuming (18014) the respective session of the respective application, wherein the respective criteria include a criterion that is met when a current user of the wearable device is determined to be an authorized user of the wearable device. In some implementations, the wearable device determines that the user is an authorized user based on automatic biometric verification (e.g., using biometric features 12026 as shown and described with reference to fig. 12F and 12G), based on password entry, or based on a sharing mode being active. When the wearable device is a head-mounted display device, instead of biometric features such as 12026, the head-mounted device relies on gaze location, iris pattern, facial expression, eye color, and/or shape to authenticate whether the user wearing the head-mounted device is an authorized user.

Using the respective criteria to determine whether to automatically resume the respective session of the respective application enables the respective session to be resumed without any active user input and without displaying additional controls. Using the respective criteria causes the device to automatically resume the respective session when the respective criteria are met, thereby providing a more efficient human-machine interface for the wearable device: the user can control the wearable device with minimal disruption and without having to navigate additional control elements before the respective session can be resumed. Determining whether the current user of the wearable device is an authorized user of the wearable device provides improved security and/or privacy protection by ensuring that the respective session of the respective application is resumed only when the authorized user is detected.

In some implementations, the respective criteria include a criterion that is met when the type of the respective session satisfies predefined criteria with respect to a predefined set of session types (e.g., the type belongs to, or does not belong to, the predefined set of session types). In some implementations, in accordance with a determination that the respective criteria are met because the respective session of the respective application is a first type of session (e.g., media consumption or real-time communication), the wearable device resumes the respective session of the respective application, and in accordance with a determination that the respective criteria are not met because the respective session of the respective application is a second type of session (e.g., a recording session), the wearable device forgoes resuming the respective session of the respective application.

Using the characteristics of the respective session of the respective application to determine whether to resume the respective session provides improved security/privacy by ensuring that certain types of sessions that warrant more security/privacy protection (e.g., recording sessions) do not automatically restart after the wearable device has been removed from the user's body, even when an authorized user is detected.

In some implementations, the respective criteria are met when the respective session of the respective application is configured to deliver media content to an authorized user of the wearable device. In some embodiments, the respective criteria are met when the respective session of the respective application is configured to allow real-time audio data or real-time video data of a participant to be generated by the participant of the respective session, and the respective session is configured to provide information regarding the positioning of the participant in the three-dimensional environment (e.g., although not shown in fig. 12G, if the user 7002 removes the computer system (e.g., watch 12010 or HMD 12011) from her wrist and then places the computer system (e.g., watch 12010 or HMD 12011) back on her wrist, the user interface 12024 allows the user 7002 to resume the telephone call with Abe, as shown in fig. 12G). In some implementations, the respective criteria include session criteria that are met if the session is a playback session of a media application or a non-recording session of a real-time communication application (e.g., although not shown in fig. 12C, if the user 7002 places the computer system (e.g., watch 12010 or HMD 12011) back on her wrist after removing the computer system, as shown in fig. 12C, the user interface 12018 resumes delivery of the media content, as shown in fig. 12B). In some embodiments, the wearable device resuming the respective session of the respective application includes resuming a content sharing session, wherein content of the respective application is concurrently visible to a plurality of participants in the content sharing session.

In some implementations, when the respective session of the respective application includes recording of content (e.g., audio data and/or video data) generated during the respective session, the respective criterion is not satisfied and the wearable device forgoes resuming the respective session of the respective application. In some embodiments, if the session is a recording session in the application, the session criteria are not met.

Not automatically resuming the recording session increases security/privacy by ensuring that, after the wearable device has been removed from the body of the authorized user, additional user input (e.g., permission from other participants, or navigation of additional control elements) is required before the recording session is resumed, even when the authorized user is detected.
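As a non-limiting illustration, the resume decision described above (authorized user plus session type) can be sketched as follows. The names `SessionType`, `Session`, `AUTO_RESUMABLE`, and `should_resume` are hypothetical and do not appear in the figures; the sketch assumes that user authorization has already been determined elsewhere (e.g., by biometric verification).

```python
from dataclasses import dataclass
from enum import Enum, auto


class SessionType(Enum):
    MEDIA_PLAYBACK = auto()
    REALTIME_COMMUNICATION = auto()
    RECORDING = auto()


# Hypothetical predefined set of session types eligible for automatic resumption.
AUTO_RESUMABLE = {SessionType.MEDIA_PLAYBACK, SessionType.REALTIME_COMMUNICATION}


@dataclass
class Session:
    app_name: str
    session_type: SessionType


def should_resume(session: Session, user_is_authorized: bool) -> bool:
    """Resume only for an authorized user and a session type in the resumable set."""
    if not user_is_authorized:
        return False
    return session.session_type in AUTO_RESUMABLE
```

In this sketch, a recording session is never resumed automatically, even for an authorized user, which mirrors the privacy rationale given above.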

In some embodiments, the respective criteria are met when the time between detection of the first signal and detection of the second signal is less than a predetermined threshold (e.g., less than a timeout period between a second (e.g., low power, standby, or hibernate) state and a fourth (e.g., authenticated) state), in which case the wearable device resumes the respective session of the respective application. The respective criteria are not met when the time between detection of the first signal and detection of the second signal is equal to or greater than the predetermined threshold (e.g., equal to or greater than a timeout period between the first (e.g., sleep or low power) or second (e.g., low power, standby, or hibernate) state and the fourth (e.g., authenticated) state), in which case the wearable device forgoes resuming the respective session of the respective application (e.g., ends the respective session of the respective application; although not shown in fig. 12C, if the user 7002 places the computer system (e.g., watch 12010 or HMD 12011) back on her wrist after the predetermined threshold has passed since removal of the computer system, the user interface 12018 does not resume delivery of the media content).

Not automatically resuming the respective session of the respective application after the predetermined time threshold helps to save battery power of the wearable device.
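As a non-limiting illustration, the time-based criterion can be sketched as a comparison of the elapsed time between the first signal (removal) and the second signal (donning) against a threshold. The function name, parameter names, and the use of seconds are assumptions made for illustration only.

```python
def resume_allowed_by_time(removed_at_s: float, donned_at_s: float, threshold_s: float) -> bool:
    """Return True when the off-body interval is strictly shorter than the threshold.

    An interval equal to or greater than the threshold does not meet the criterion.
    """
    elapsed_s = donned_at_s - removed_at_s
    return elapsed_s < threshold_s
```

For example, with a hypothetical threshold of 60 seconds, `resume_allowed_by_time(0.0, 30.0, 60.0)` evaluates to `True`, while an interval of exactly 60 seconds does not meet the criterion.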

In some implementations, the wearable device causes the respective session of the respective application to become inactive by pausing playback of media content (e.g., pausing video, audio, and/or other media playback, as indicated by the application user interface 12018 in fig. 12C) from the respective session of the respective application.

Automatically pausing media playback helps ensure that an authorized user obtains an uninterrupted media experience once the wearable device is donned again. The authorized user need not actively or manually pause the media consumption session when the wearable device is removed. The authorized user also does not need to actively or manually resume the media consumption session after the wearable device is donned again. The authorized user does not need to rewind the media to an earlier point in time, as would be necessary if playback of the media were not paused.

In some implementations, the wearable device causes the respective session of the respective application to become inactive by at least one of muting audio data associated with the respective session of the respective application (e.g., the application user interface 12018 shown and described with reference to fig. 12C, the application user interface 12024 shown and described with reference to fig. 12F), or suspending video recording of content (e.g., video, audio, and/or other multimedia data) generated in the respective session of the respective application. In some embodiments, the real-time communication session of the application provides information about the positioning (e.g., real-time positioning) of participants (e.g., in the form of avatars) in the three-dimensional environment.

Automatically muting audio (e.g., application user interface 12018 as shown and described with reference to fig. 12C) and stopping video recording eliminates the need for an authorized user to actively/manually mute audio and stop video recording when the wearable device is removed. Such automatic audio muting and stopping of video recording also improves security/privacy by ensuring that audio is not played and video is not recorded in the absence of an authorized user.

In some embodiments, the wearable device causes the respective session of the respective application to become inactive by pausing mirroring (e.g., screen mirroring) of the output from the display generation component of the computer system on a different device (e.g., although not shown in fig. 12B, the display of the user interface 12018 is mirrored on a display component (e.g., a separate display monitor) that is separate from the computer system (e.g., watch 12010 or HMD 12011), and when the respective session of the media playback application becomes inactive, the mirrored display of the user interface 12018 on the display monitor is paused (e.g., the display of the user interface 12018 becomes blurred and is not updated on the display monitor)).

Automatically pausing the mirroring of the output from the display generation component eliminates the need for an authorized user to actively/manually pause the mirroring. Such automatic suspension of the mirroring of the output also improves security/privacy by ensuring that data from the wearable device is not shared with others in the absence of an authorized user.

In some embodiments, in conjunction with pausing the mirroring of the output from the display generation component of the computer system on a different device, the wearable device displays, via the display generation component, an indication that the mirroring of the output from the display generation component is paused (e.g., an indication of "screen mirroring paused" is displayed).

Providing an indication that the mirroring of the output from the display generation component is paused automatically conveys the interruption to other participants without requiring active input from an authorized user. This indication helps to minimize confusion and reduces the chance that other participants will misinterpret the pause in mirroring as a fault that requires troubleshooting.
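As a non-limiting illustration, the ways in which a session becomes inactive (pausing playback, muting audio, suspending recording, and pausing mirroring with an indication) can be sketched together. The `SessionState` fields and the `deactivate` function are hypothetical names; an actual embodiment may apply only a subset of these actions.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    playing: bool = False
    audio_muted: bool = False
    recording: bool = False
    mirroring: bool = False
    mirroring_indicator: str = ""


def deactivate(state: SessionState) -> SessionState:
    """Make the session inactive when the device is removed from the body."""
    state.playing = False        # pause media playback
    state.audio_muted = True     # mute audio associated with the session
    state.recording = False      # suspend recording of generated content
    if state.mirroring:
        state.mirroring = False  # pause mirroring on the separate display
        state.mirroring_indicator = "Screen mirroring paused"
    return state
```

In this sketch the indication text is set only when mirroring was actually in progress, so a session that was never mirrored shows no indication.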

In some embodiments, after the first signal has been detected (e.g., when the wearable device is off the head or off the body), the wearable device monitors the context of the wearable device (e.g., position, orientation, operating state, presence of trackable eyes, or presence of trackable hands or trackable portions of hands) using one or more sensors (e.g., cameras, photodiodes, inertial measurement devices, accelerometers, gyroscopes, and/or GPS systems). For example, the computer system (e.g., watch 12010 or HMD 12011), as shown and described with reference to fig. 12E, uses a pulse sensor while in a standby state of operation to monitor the context of the watch (e.g., whether the user has put the computer system (e.g., watch 12010 or HMD 12011) on the user's body).

As described above with reference to fig. 12E, providing one or more intermediate (e.g., standby) states when the wearable device is off the body of the authorized user allows the wearable device to warm up more quickly and be ready to deliver an experience once the wearable device is on the body of the authorized user (e.g., a media experience provided by the application user interface 12018 as shown and described with reference to fig. 12B, or a communication session provided by the application user interface 12024 as shown and described with reference to fig. 12G). In the intermediate state, the wearable device senses its surroundings and is better prepared to transition (e.g., transition faster) to the active carry-on state when an authorized user interacts with it, making authentication more time-efficient for the authorized user.

In some implementations, the wearable device uses one or more sensors (e.g., cameras, photodiodes, inertial measurement devices, accelerometers, gyroscopes, or GPS) to detect characteristics of the physical environment of the wearable device (e.g., presence of a user, presence of an authorized user, location of the wearable device within the physical environment, orientation of the wearable device within the physical environment, and/or brightness of the physical environment) to monitor the context (e.g., location, orientation, or operational state) of the wearable device.

Providing one or more intermediate (e.g., standby) states in which the wearable device senses its physical environment allows the wearable device to warm up more quickly and be ready to deliver an experience (e.g., a media experience or a communication session) once the wearable device is on the body of an authorized user. The wearable device senses its surroundings and is better prepared to transition (e.g., transition faster) to the active carry-on state when an authorized user interacts with it, making the experience more time-efficient for the authorized user.

In some embodiments, the wearable device uses one or more sensors (e.g., a camera, infrared sensor, and/or pulse sensor) to detect the presence of a biometric feature (e.g., the presence of an eye or a portion of a hand or arm within the field of view of the camera, the pulse of a user, or biometric feature 12026 as shown and described with reference to fig. 12F and 12G) to monitor the context of the wearable device. When the wearable device corresponds to a head-mounted display device, gaze location, iris pattern, facial expression, eye color, and/or shape are used to authenticate whether a user wearing the head-mounted display device matches an authorized user. In some embodiments, the wearable device determines whether the wearable device is in proximity to any user.

Providing one or more intermediate (e.g., standby) states in which the wearable device senses biometric input allows the wearable device to warm up more quickly and be ready to interact with (e.g., authenticate) or receive hand gesture input from an authorized user once the wearable device is on the body of the authorized user. The wearable device senses biometric input and is better prepared to transition (e.g., transition faster) to the active carry-on state when an authorized user interacts with it, thereby making the experience more time-efficient for the authorized user.

In some embodiments, in accordance with a determination that a threshold amount of time (e.g., a predetermined timeout period) has elapsed since the first signal was detected and the second signal has not been detected, the wearable device transitions to a sleep state of operation (e.g., although not shown in fig. 12C, after the predetermined timeout period has elapsed, the screen on the computer system (e.g., watch 12010 or HMD 12011) as shown and described with reference to fig. 12C is turned off and the computer system (e.g., watch 12010 or HMD 12011) enters a sleep state in which the sensor(s) no longer detect the device context), wherein the wearable device reduces the frequency with which the sensor(s) are used to monitor the context of the wearable device (this optionally includes ceasing to use the sensor(s) to monitor the context of the wearable device). In some embodiments, the sleep state is a state in which one or more sensors no longer detect the context of the wearable device. In some embodiments, the sensor is a camera on or in the wearable device, the sensor is an inertial measurement device in the wearable device, and/or the sensor is a device separate from but in communication with the wearable device (e.g., an external beacon that transmits a signal to a detector on the wearable device, or an external beacon that detects a signal transmitted by the wearable device). In some embodiments, the context of the wearable device is a physical orientation of the wearable device, a location of the wearable device, a presence state indicating whether the presence of an eye or hand is detected, as determined from an image captured by a camera of the wearable device, and/or a presence state indicating whether the presence or movement of a living being is detected by an infrared thermal sensor.

Entering a sleep state after a timeout period helps to save battery power of the wearable device and reduces the amount of charge required to operate the wearable device.

In some embodiments, when the wearable device is in a sleep state, the wearable device detects an upward displacement of at least a portion of the wearable device (e.g., a lifting of the entire wearable device, a lifting of a portion of the wearable device, or a displacement that results in a change in a height of at least a portion of the wearable device, such as a lifting of the computer system (e.g., watch 12010 or HMD 12011), as shown and described with reference to fig. 12E), and in response to detecting the upward displacement of at least the portion of the wearable device, the wearable device transitions from the sleep state to a standby state of operation (e.g., the computer system (e.g., watch 12010 or HMD 12011) transitions to a standby state of operation, as shown and described with reference to fig. 12E). In some implementations, the wearable device monitors the context of the wearable device while in the standby state of operation. The standby state is a lower power state than the carry-on state of the wearable device. For example, when the respective session of the respective application becomes inactive, the wearable device enters the standby state as it is removed from a portion of the user's body. Entering the lower power standby state helps to conserve battery power as compared to the carry-on state. Continuing device context monitoring while the wearable device is in the standby state (e.g., such that the wearable device is warmed up), rather than stopping device context monitoring (as is done when the wearable device is in the sleep state), allows the wearable device to provide output more quickly once the wearable device is in the carry-on state.

Transitioning from the sleep state to the standby state while the wearable device is still away from the body of the authorized user (but after the user lifts the wearable device, as shown and described with reference to fig. 12A and 12E) allows the wearable device to warm up more quickly and be ready to interact with the authorized user once it is on the body of the authorized user, thereby making the experience more time-efficient for the authorized user.

In some embodiments, when the wearable device is in a sleep state, the wearable device detects a first input (e.g., a press input on button 12014 or button 12016) to one or more input devices (e.g., a hardware button, a solid state button, a crown, a camera, and/or a thermal sensor), and in response to detecting the first input, the wearable device transitions from the sleep state to a standby state of operation (e.g., the computer system (e.g., watch 12010 or HMD 12011) transitions to the standby state of operation in response to detecting the press input on button 12014 or button 12016, as shown and described with reference to fig. 12E), wherein the standby state of operation includes monitoring the context of the wearable device.

Transitioning from the sleep state to the standby state while the wearable device is still away from the body of the authorized user (but after providing the first input to the wearable device in the sleep state) allows the wearable device to more quickly warm up and be ready to interact with the authorized user once the wearable device is on the body of the authorized user, thereby making the experience more time-efficient for the authorized user.
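As a non-limiting illustration, the transitions among the sleep, standby, and carry-on states described above can be sketched as a small state machine. The three-state model, the enum names, and the rule that unlisted state/event pairs leave the state unchanged are simplifying assumptions. The sketch also models the sleep state as ceasing context monitoring entirely, which the description presents as only one option (the sleep state may instead reduce the monitoring frequency).

```python
from enum import Enum


class PowerState(Enum):
    SLEEP = "sleep"
    STANDBY = "standby"
    ON_BODY = "on_body"  # the carry-on state


class Event(Enum):
    REMOVED = "removed"            # first signal: device taken off the body
    TIMEOUT = "timeout"            # threshold time elapsed without a second signal
    LIFTED = "lifted"              # upward displacement of the device
    BUTTON_PRESS = "button_press"  # press input on a hardware input device
    DONNED = "donned"              # second signal: device placed on the body


TRANSITIONS = {
    (PowerState.ON_BODY, Event.REMOVED): PowerState.STANDBY,
    (PowerState.STANDBY, Event.TIMEOUT): PowerState.SLEEP,
    (PowerState.SLEEP, Event.LIFTED): PowerState.STANDBY,
    (PowerState.SLEEP, Event.BUTTON_PRESS): PowerState.STANDBY,
    (PowerState.STANDBY, Event.DONNED): PowerState.ON_BODY,
}


def next_state(state: PowerState, event: Event) -> PowerState:
    """Look up the next state; unlisted pairs keep the current state (an assumption)."""
    return TRANSITIONS.get((state, event), state)


def monitors_context(state: PowerState) -> bool:
    """In this sketch, context monitoring runs in every state except sleep."""
    return state is not PowerState.SLEEP
```

A table-driven design keeps each transition described in the text visible as a single entry, so adding further intermediate states would only require extending `TRANSITIONS`.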

It should be understood that the particular order in which the operations in fig. 18 are described is merely an example and is not intended to suggest that the order is the only order in which the operations may be performed. Those of ordinary skill in the art will recognize a variety of ways to reorder the operations described herein. Additionally, it should be noted that the details of other processes described herein with respect to other methods described herein (e.g., methods 13000, 14000, 15000, 16000, 17000, 18000, and 20000) are likewise applicable in a similar manner to method 18000 described above with respect to fig. 18. For example, the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generation components, surfaces, representations of physical objects, virtual objects, audio output patterns, reference frames, viewpoints, physical environments, representations of physical environments, views of three-dimensional environments, immersion levels, visual effects, and/or animations described above with reference to method 18000 optionally have one or more of the characteristics of the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generation components, surfaces, representations of physical objects, virtual objects, audio output patterns, reference frames, viewpoints, physical environments, representations of physical environments, views of three-dimensional environments, immersion levels, visual effects, and/or animations described with reference to other methods described herein (e.g., methods 13000, 14000, 15000, 16000, 17000, 18000, and 20000). For the sake of brevity, these details are not repeated here.

Fig. 19A-19P illustrate example techniques for navigating an accessibility menu during system configuration. Fig. 20 is a flowchart of an exemplary method 20000 for navigating an accessibility menu during system configuration, according to some embodiments. The user interfaces in fig. 19A to 19P are used to illustrate the processes described below, including the process in fig. 20.

Fig. 19A illustrates an example physical environment 7000 including a user 7002 interacting with the computer system 101. As shown in the example of fig. 19A, computer system 101 is placed on a piece of luggage. In some embodiments, computer system 101 is a handheld computer system, a head-mounted computer system, or other computer system, as described in more detail above with reference to fig. 7A. Physical environment 7000 includes physical walls 7004 and 7006, floor 7008, and physical object 7014, as described in more detail above with reference to fig. 7A.

Fig. 19A illustrates a view of a three-dimensional environment 7000' visible to a user (such as user 7002 in fig. 7A) via a display generation component of a computer system (such as display generation component 7100 of computer system 101), as described in further detail with reference to fig. 7A. The view of three-dimensional environment 7000' of fig. 19A (also referred to as view 7000' for ease of reference) includes a representation (or optical view) of a portion of physical environment 7000 including physical walls 7004 and 7006, floor 7008, and physical object 7014, as described herein with reference to fig. 7A. For example, view 7000' includes a representation of an object in physical environment 7000 (e.g., digital passthrough as captured by one or more cameras of computer system 101) or an optical view of an object in the physical environment (e.g., optical passthrough as seen through one or more transparent or translucent portions of display generation component 7100). For example, in fig. 19A, view 7000' includes representations (or optical views) of wall 7004', wall 7006', floor 7008', and box 7014', as described herein with reference to fig. 7A-7B.

In some embodiments, during initial configuration of a computer system (e.g., computer system 101), input on a hardware input device (e.g., button 7508 or hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)) is detected, which causes computer system 101 to display an accessibility configuration menu, such as accessibility configuration menu 1900. The accessibility configuration menu is navigated, and options in the accessibility configuration menu are selected, using various inputs (e.g., press inputs and/or rotational inputs) detected on one or more hardware input devices (e.g., button 7508 or hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)), as described in further detail with reference to fig. 19B-19P. Even if the computer system is enabled to be controlled with air gestures (e.g., optionally in combination with gaze input), during initial configuration of the computer system, the computer system is enabled to be controlled with alternative inputs (e.g., press inputs and/or rotational inputs on one or more hardware input devices) that are different from air gestures, as described in further detail with reference to fig. 19B-19P. Further, while the accessibility configuration menu is navigated, verbal descriptions of the options and controls available in the accessibility configuration menu are provided, as described in further detail with reference to fig. 19B-19P. During this initial configuration of the computer system, the user is thus able to select a preferred modality for interacting with the computer system with little assistance from others.

Fig. 19B illustrates a view 7000' of a three-dimensional environment (e.g., a mixed reality three-dimensional environment) that is viewable by user 7002 when computer system 101 is powered on. For example, view 7000' shows a portion of physical environment 7000 when computer system 101 is first powered on, or when first activated after factory settings or other settings have been reset on the computer system, or otherwise during initial settings of the computer system (e.g., when input modalities have not been registered, personalized, and/or calibrated, e.g., user speech, hand gestures, and/or gaze have not been registered and/or calibrated). In the scenario of fig. 19B, view 7000' does not yet include any virtual content.

Fig. 19C (e.g., fig. 19C1, 19C2, and 19C3, where a user interface similar to that shown in fig. 19C3 is shown on HMD 7100a in fig. 19C1) illustrates a transition from fig. 19B in response to an input 1910 detected on hardware button 7508. The hardware button 7508 is responsive to tactile and mechanical actuation (including pressing) as described in further detail with reference to fig. 7A-7D, 8A-8G, 9A-9D, 10A-10D, 11A-11F, 12A-12G, and tables 1, 2, 3, and 4. In some implementations, the input 1910 is a multiple press input, such as three presses detected on the hardware button 7508. In some embodiments, in response to input 1910, computer system 101 displays accessibility configuration menu 1900 in view 7000', as shown in fig. 19C. The accessibility configuration menu 1900 includes options 1902, 1904, 1906, and 1908 (collectively options 1902-1908) for configuring one or more interaction models, such as assistive and/or adaptive interaction models for persons with visual, motor, auditory, and/or cognitive impairments and/or other accessibility needs. For example, option 1902 is used to configure an interaction model for a person with visual impairment, e.g., an interaction model that uses an interaction modality that enables a user to interact with the computer system even if the user cannot see. Option 1904 is used to configure an interaction model for a person with motor impairment, e.g., an interaction model that uses an interaction modality that enables a user to interact with the computer system even if the user is unable to move a portion of their body. Option 1906 is used to configure an interaction model for a person with hearing impairment, e.g., an interaction model that uses an interaction modality that enables a user to interact with the computer system even if the user cannot hear. Option 1908 is used to configure an interaction model for a person with cognitive impairment, e.g., an interaction model that uses an interaction modality that assists the user in interacting with the computer system even though the user has some cognitive impairment.
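As a non-limiting illustration, detection of a multiple press input (e.g., three presses on hardware button 7508) can be sketched from a list of press timestamps. The function name and the maximum gap of 0.5 seconds between consecutive presses are hypothetical values chosen for illustration; the description does not specify a timing window.

```python
def is_multi_press(press_times_s, count=3, max_gap_s=0.5):
    """Return True if the last `count` presses are each within `max_gap_s` of the previous press."""
    if len(press_times_s) < count:
        return False
    recent = press_times_s[-count:]
    # Compare each press with the one before it.
    return all(later - earlier <= max_gap_s for earlier, later in zip(recent, recent[1:]))
```

In this sketch, a caller would append a timestamp for each detected press and display the configuration menu when the function returns `True`.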

In some embodiments, in connection with displaying the accessibility configuration menu 1900, the computer system 101 generates and optionally outputs a verbal description of the accessibility configuration menu 1900. In some implementations, the verbal description of the accessibility configuration menu 1900 includes a verbal description of the accessibility configuration menu 1900 as a whole and/or verbal descriptions of the options 1902-1908 (e.g., regardless of whether the user 7002 navigates through the options 1902-1908). In some embodiments, computer system 101 outputs the verbal description of accessibility configuration menu 1900 so that it can be heard by bystanders and people nearby. Thus, based on the verbal description, a person other than the user 7002 may assist in navigating the accessibility configuration menu 1900 without requiring the user 7002 to doff (e.g., remove or take off) the computer system 101 (and/or the display generation component 7100), for example, if the computer system 101 is mounted on the head of the user 7002.

In some embodiments, in response to an input (such as input 1920 in fig. 19B) directed to another hardware input device (e.g., hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)), an accessibility mode (such as a "read aloud" mode) is directly activated (e.g., without further user input, such as input directed to accessibility configuration menu 1900, and without interaction with an accessibility menu, such as accessibility configuration menu 1900). In the "read aloud" mode, a navigational input (e.g., an air gesture, such as a pinch gesture performed with one or both hands) causes the computer system to move the focus selector between user interface elements in the displayed user interface and correspondingly output an audio description of the element that has the input focus, thereby enabling the user to interact with virtual content that the user cannot see. In some implementations, the input 1920 is a multiple press input detected on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). Hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is responsive to tactile and mechanical actuation (including both press inputs and rotational inputs) as described in further detail with reference to fig. 7A-7D, 8A-8G, 9A-9D, 10A-10D, 11A-11F, 12A-12G, and tables 1, 2, 3, and 4.

In some implementations, inputs detected on a single hardware input device (e.g., hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)) are enabled to control the accessibility configuration menu 1900 (e.g., to navigate and select options in the accessibility configuration menu 1900), for example, using rotational inputs on a rotatable mechanism and press inputs on a depressible mechanism of hardware input element 7108, as described in further detail with respect to fig. 19C-19P.

In some implementations, the accessibility configuration menu 1900 is displayed before the user's gaze has been calibrated and/or enrolled (e.g., calibration and/or enrollment that enables the computer system 101 to respond to inputs that are based on, or that include, gaze).

In some embodiments, in conjunction with displaying the accessibility configuration menu 1900, the computer system 101 displays the dwell control 1901 in response to the input 1910 on the hardware button 7508. For example, computer system 101 displays dwell control 1901 without requiring user 7002 to provide additional input (e.g., input other than the input that causes computer system 101 to display accessibility configuration menu 1900). In the scenario of fig. 19C, the computer system 101 activates the dwell control mode in accordance with a determination that the gaze input 1903 is directed to, and remains directed to, the dwell control 1901 for a threshold amount of time (e.g., remains directed to the dwell control 1901 until the threshold amount of time has elapsed, without the gaze direction moving outside of the dwell control 1901). The dwell control mode is an accessibility mode in which the gaze and/or head direction of the user 7002 is used to perform various actions that are otherwise performed with a mouse, keyboard, touch gestures, and/or air gestures (e.g., without requiring use of hardware input devices and/or the hands of the user 7002).
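As a non-limiting illustration, activation of the dwell control mode after gaze remains on a control for a threshold amount of time can be sketched as a timer that resets whenever gaze leaves the control. The class name `DwellDetector`, the `update` method, and the use of caller-supplied timestamps in seconds are assumptions made for illustration.

```python
from typing import Optional


class DwellDetector:
    """Reports activation once gaze has stayed on a control for `threshold_s` seconds."""

    def __init__(self, threshold_s: float) -> None:
        self.threshold_s = threshold_s
        self._gaze_start_s: Optional[float] = None

    def update(self, gaze_on_control: bool, now_s: float) -> bool:
        if not gaze_on_control:
            # Gaze moved outside the control: restart the dwell timer.
            self._gaze_start_s = None
            return False
        if self._gaze_start_s is None:
            self._gaze_start_s = now_s
        return now_s - self._gaze_start_s >= self.threshold_s
```

In this sketch, a brief glance away resets the timer, so only continuous gaze for the full threshold activates the mode.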

In the scenario of fig. 19C, rotational input detected on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is used to scroll through the options 1902-1908 of the accessibility configuration menu 1900. For example, rotational input on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) in a first direction (e.g., clockwise or counterclockwise) moves the input focus (e.g., optionally in combination with a visual indicator of the location of the input focus) one option down in the list of options 1902-1908, and rotational input on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) in a second, different direction (e.g., counterclockwise or clockwise) moves the input focus one option up in the list of options 1902-1908. In some implementations, continued rotational input on the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) in the same direction moves the input focus successively from one option (e.g., of options 1902-1908) to the next. In the scenario of fig. 19C, a press input detected on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) is used to select the option of options 1902-1908 that has the input focus.
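As a non-limiting illustration, navigation of a list of options with rotational and press inputs on a single hardware input element can be sketched as follows. The class name `CrownMenu`, the option labels, the mapping of clockwise rotation to "down", and the choice to stop at the ends of the list rather than wrap around are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrownMenu:
    options: List[str]
    focus: int = 0  # input focus defaults to the first option

    def rotate(self, clockwise: bool) -> str:
        """Move focus one option down (clockwise) or up (counterclockwise)."""
        step = 1 if clockwise else -1
        # Clamp at the ends of the list instead of wrapping (an assumption).
        self.focus = max(0, min(len(self.options) - 1, self.focus + step))
        return self.options[self.focus]

    def press(self) -> str:
        """Select the option that currently has input focus."""
        return self.options[self.focus]
```

The label returned by `rotate` could, for example, be passed to a speech output routine so that the newly focused option is described verbally as the focus moves.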

In some embodiments, after the accessibility configuration menu 1900 is displayed, the input focus is by default on the first option (e.g., option 1902 in the scenario of fig. 19C) in the accessibility configuration menu 1900. In some embodiments, in response to selection of an option from options 1902-1908, a respective submenu of accessibility configuration menu 1900 corresponding to the selected option is displayed. In the scenario of fig. 19C, an input 1930 detected on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) selects the option 1902 for configuring input modalities that assist a user with a vision impairment.

Fig. 19D illustrates a transition from fig. 19C in response to detecting the input 1930 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In some implementations, the input 1930 is a single press on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response to the input 1930 on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element), a vision submenu 1902a of the accessibility configuration menu 1900 is displayed, as shown in fig. 19D. The vision submenu 1902a corresponds to the option 1902. In some implementations, the user can exit the vision submenu 1902a and return to the main accessibility configuration menu 1900 by selecting the return control 1918 (e.g., via one or more press inputs on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)). In some embodiments, in conjunction with displaying vision submenu 1902a, computer system 101 outputs a verbal description of vision submenu 1902a. The verbal description optionally corresponds to or includes information also included in the written description 1910. In some embodiments, both the verbal description and the written description 1910 explain the functionality and purpose of the vision submenu 1902a, how the vision submenu 1902a may be navigated (e.g., the description may explain that input A is used to scroll through options, while input B is used to select the option with input focus, as shown in fig. 19D), and/or what options are included in the vision submenu 1902a (e.g., the zoom option 1912 for enabling zooming). In the scenario of fig. 19D, vision submenu 1902a is a submenu for configuring functionality for magnifying content (e.g., passthrough and/or virtual content) in view 7000'. For example, the computer system activates a virtual magnifier when the zoom option 1912 for enabling zooming is enabled. In some embodiments, when the virtual magnifier is active, computer system 101 automatically enlarges virtual content and/or representations of physical objects within the bounds of the virtual magnifier (e.g., as the viewpoint of the user changes). In some implementations, the zoom option 1912 may be toggled on or off (e.g., enabled or disabled) in response to a press input (e.g., a single press input) detected on the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element).

In the scenario of fig. 19D, an input 1932 selecting the continue button 1914 is detected. Input 1932 is detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) while continue button 1914 has input focus (e.g., optionally, computer system 101 navigates to continue button 1914, such as by changing the input focus from zoom option 1912 to continue button 1914, in response to rotational input detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element)). In some embodiments, in response to selection of continue button 1914, computer system 101 navigates to another submenu associated with the currently selected option in main accessibility configuration menu 1900, such as submenu 1902b, if present; otherwise, computer system 101 automatically navigates back to main accessibility configuration menu 1900, as described in further detail below with respect to fig. 19E.

Fig. 19E illustrates a transition from fig. 19D in response to detecting the input 1932 selecting the continue button 1914. In some implementations, the input 1932 is a subsequent press input (e.g., a single press that activates the functionality associated with the continue button 1914) detected on the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element). In response to detecting input 1932 selecting continue button 1914, computer system 101 navigates to submenu 1902b, which is another submenu associated with the currently selected option 1902 in main accessibility configuration menu 1900.

The vision submenu 1902b is a menu for configuring a read-aloud mode, in which navigational inputs (e.g., air gestures, such as pinch gestures performed with one or both hands) cause the computer system to move a focus selector between user interface elements in a displayed user interface and, accordingly, to output an audio description of the element that has input focus. In some embodiments, in conjunction with displaying vision submenu 1902b, computer system 101 outputs a verbal description of vision submenu 1902b. The verbal description optionally corresponds to or includes information also included in the written description 1910b. In some embodiments, both the verbal description and the written description 1910b explain the functionality and purpose of the vision submenu 1902b, how the vision submenu 1902b may be navigated (e.g., the description may explain that input A is used to scroll through options, while input B is used to select the option with input focus, as shown in fig. 19E), and/or what options are included in the vision submenu 1902b (e.g., the read-aloud control option 1912b for enabling and/or disabling the read-aloud mode).

Vision submenu 1902b includes the read-aloud control option 1912b for enabling and/or disabling the read-aloud mode and a continue button 1914b that, when selected, causes the computer system to navigate forward to the next available submenu or return to the main accessibility configuration menu 1900. In the scenario of fig. 19E, the read-aloud mode is activated in response to an input 1936 detected on the hardware button. For example, input 1936 is a single press input detected on a hardware button (e.g., input 1936 toggles on read-aloud control option 1912b). In some implementations, the user can exit the vision submenu 1902b and return to the previous submenu 1902a by selecting the return control 1918b (e.g., via one or more press inputs on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)).

In some embodiments, when a control is enabled in one of the submenus of the accessibility configuration menu 1900, the computer system 101 automatically proceeds to the next available submenu. For example, in the scenario of fig. 19F, upon enabling the read-aloud mode by toggling on control 1912b, the computer system moves to the next available menu, such as vision submenu 1902d (e.g., without requiring further user input selecting continue button 1914b).

In the scenario of fig. 19E, computer system 101 detects input 1934 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). Input 1934 is a rotational input that moves the input focus from the read-aloud control option 1912b to the continue button 1914b, as shown in fig. 19F.

Fig. 19F illustrates a transition from fig. 19E in response to detecting the input 1934 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response to input 1934 on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element), the computer system moves the input focus from read-aloud control option 1912b to continue button 1914b, as shown in fig. 19F. In the scenario of fig. 19F, when the input focus is positioned on the continue button 1914b, the computer system 101 detects an input 1938 on the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) that selects the continue button 1914b.

Fig. 19G illustrates a transition from fig. 19F in response to detecting the input 1938, which selects the continue button 1914b, on the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element). In some implementations, the input 1938 is a subsequent press input (e.g., a single press that selects the continue button 1914b) detected on the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response to detecting input 1938 selecting continue button 1914b, computer system 101 navigates to submenu 1902d, which is another submenu associated with the currently selected option 1902 in main accessibility configuration menu 1900.

Vision submenu 1902d is a menu for configuring how a cursor or focus indicator is controlled in view 7000'. For example, vision submenu 1902d includes three alternatives for controlling a focus indicator: gaze cursor control 1911d, head cursor control 1912d, and wrist cursor control 1913d. When gaze cursor control 1911d is activated (e.g., optionally via a press input detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element)), the computer system determines the location of the focus indicator based on the gaze direction of user 7002. When the head cursor control 1912d is activated (e.g., optionally via a press input detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element)), the computer system determines the location of the focus indicator based on the head direction of the user 7002. When wrist cursor control 1913d is activated (e.g., optionally via a press input detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element)), the computer system determines the location of the focus indicator based on the direction in which the wrist of user 7002 is pointing.

In some embodiments, in conjunction with displaying vision submenu 1902d, computer system 101 outputs a verbal description of vision submenu 1902d. The verbal description optionally corresponds to or includes information also included in the written description 1910d. In some embodiments, both the verbal description and the written description 1910d explain the functionality and purpose of the vision submenu 1902d, how the vision submenu 1902d may be navigated (e.g., the description may explain that input A is used to scroll through options, while input B is used to select the option with input focus, as shown in fig. 19G), and/or what options are included in the vision submenu 1902d (e.g., gaze cursor control 1911d, head cursor control 1912d, and wrist cursor control 1913d). In some implementations, the user can exit the vision submenu 1902d and return to the previous submenu 1902b by selecting the return control 1918d (e.g., via one or more press inputs on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element)).

In the scenario of fig. 19G, when the continue button 1914d is in focus, the computer system 101 detects an input 1940 on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response to input 1940 selecting continue button 1914d, computer system 101 automatically returns to main accessibility menu 1900 because there are no additional submenus available for the selected option 1902, as shown in fig. 19H. In some embodiments, after all submenus of the selected option have been browsed, computer system 101 displays accessibility configuration menu 1900 with the next available option (such as option 1904, associated with motor impairment) selected by default, as shown in fig. 19H.
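The submenu sequencing described for the continue buttons and return controls can be summarized in a short sketch. The reference numerals are taken from the figures, but the table `SUBMENUS`, the class `ConfigMenu`, and its method names are hypothetical; keeping the focus on the last option after its submenus are exhausted is likewise an assumption.

```python
from typing import Dict, List, Optional

# Submenus per option, keyed by the reference numerals used in the figures.
# The data layout is an assumption made for illustration.
SUBMENUS: Dict[str, List[str]] = {
    "1902": ["1902a", "1902b", "1902d"],
    "1904": ["1904a", "1904b", "1904c"],
    "1906": ["1906a"],
    "1908": ["1908a"],
}
OPTIONS: List[str] = list(SUBMENUS)


class ConfigMenu:
    """Tracks which option has focus and which submenu, if any, is displayed."""

    def __init__(self) -> None:
        self.focused_option = OPTIONS[0]
        self.submenu: Optional[str] = None  # None means the main menu is shown

    def select_focused_option(self) -> None:
        """Press on the main menu: open the first submenu of the focused option."""
        self.submenu = SUBMENUS[self.focused_option][0]

    def continue_pressed(self) -> None:
        """Continue button: next submenu, or back to the main menu after the last."""
        if self.submenu is None:
            return
        pages = SUBMENUS[self.focused_option]
        i = pages.index(self.submenu)
        if i + 1 < len(pages):
            self.submenu = pages[i + 1]
        else:
            # No further submenus: return to the main menu and move the focus
            # to the next option (clamped at the last option, an assumption).
            self.submenu = None
            j = OPTIONS.index(self.focused_option)
            self.focused_option = OPTIONS[min(j + 1, len(OPTIONS) - 1)]

    def return_pressed(self) -> None:
        """Return control: previous submenu, or the main menu from the first one."""
        if self.submenu is None:
            return
        pages = SUBMENUS[self.focused_option]
        i = pages.index(self.submenu)
        self.submenu = pages[i - 1] if i > 0 else None


menu = ConfigMenu()
menu.select_focused_option()   # opens 1902a
for _ in range(3):
    menu.continue_pressed()    # 1902b, then 1902d, then back to the main menu
```

In embodiments where enabling a control advances automatically, the same `continue_pressed` step could be invoked when the control is toggled on, without a separate press on the continue button.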

Fig. 19H illustrates a transition from fig. 19G in response to selection of the continue button 1914d by input 1940. In response to input 1940 selecting continue button 1914d, computer system 101 automatically returns to main accessibility menu 1900 because there are no additional submenus available for the selected option 1902. In the scenario of fig. 19H, computer system 101 moves input focus from control option 1902 to control option 1904 in response to rotational input 1942 detected on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). Further, in the scenario of fig. 19H, when the input focus is positioned on control option 1904, computer system 101 detects a press input 1944 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element).

Fig. 19I illustrates a transition from fig. 19H in response to detecting the press input 1944 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) when the input focus is positioned on control option 1904. When the input focus is positioned on control option 1904, in response to detecting the press input 1944 on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element), the computer system selects control option 1904 for configuring input modalities that assist a user with a motor impairment, and displays a motor submenu 1904a of accessibility configuration menu 1900, as shown in fig. 19I.

The motor submenu 1904a is a menu for configuring (e.g., enabling/disabling) a switch control mode (e.g., also sometimes referred to as a switch interaction mode). In the switch control mode, a target location in a three-dimensional environment (such as view 7000') is selected for interaction using ray and point scanning. In the switch control mode, respective actions are optionally performed in response to one or more inputs detected on a different (e.g., remote or separate) hardware device (e.g., an auxiliary device). The motor submenu 1904a includes a switch access control option 1942a for enabling and/or disabling the switch control mode and a continue button 1944a for proceeding to the next available submenu. When the input focus is positioned on the switch access control option 1942a, the switch control mode is activated in response to an input (e.g., a press input) detected on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In some embodiments, upon activation of the switch control mode, computer system 101 automatically displays a menu for configuring a wireless connection with an auxiliary input device (e.g., a hardware input device). In some embodiments, upon establishment of the connection between the computer system and the auxiliary input device (e.g., upon detection that the auxiliary input device is connected), [...]. In some embodiments, in conjunction with displaying the motor submenu 1904a, the computer system 101 outputs a verbal description of the motor submenu 1904a. The verbal description optionally corresponds to or includes information also included in the written description 1940a. In some embodiments, both the verbal description and the written description 1940a explain the functionality and purpose of the motor submenu 1904a, how the motor submenu 1904a may be navigated (e.g., the description may explain that input A is used to scroll through options, while input B is used to select the option with input focus, as shown in fig. 19I), and/or what options are included in the motor submenu 1904a (e.g., the switch access control option 1942a for enabling and/or disabling the switch control mode).

In the scenario of fig. 19I, when the input focus is on switch access control option 1942a, computer system 101 detects a rotational input 1946 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response to the rotational input 1946, the computer system 101 moves the input focus from the switch access control option 1942a to the continue button 1944a. In the scenario of fig. 19I, when the input focus is on the continue button 1944a, the computer system 101 detects a press input 1948 on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response to input 1948 selecting continue button 1944a, computer system 101 advances or navigates to the next available submenu, such as motor submenu 1904b, as shown in fig. 19J.

Fig. 19J illustrates a transition from fig. 19I in response to selection of the continue button 1944a by input 1948. In response to input 1948 selecting continue button 1944a, computer system 101 displays motor submenu 1904b (e.g., optionally in place of motor submenu 1904a). The motor submenu 1904b is a menu for configuring (e.g., enabling/disabling) a voice control mode (e.g., configuring whether the computer system 101 is responsive to voice commands as an alternative input modality). In the voice control mode, verbal commands provide instructions (e.g., for navigation, selection, and/or execution of other tasks) to the computer system. The motor submenu 1904b includes a voice access control option 1942b for enabling and/or disabling the voice control mode, and a continue button 1944b for proceeding to the next available submenu. When the input focus is positioned on the voice access control option 1942b, the voice control mode is activated in response to an input (e.g., a press input) detected on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). After the voice control mode is activated, computer system 101 responds to voice commands (e.g., voice commands detected via one or more microphones in communication with computer system 101). In some embodiments, as is the case for each submenu of the accessibility configuration menu 1900 and each selected one of the options 1902-1908, in conjunction with displaying the motor submenu 1904b, the computer system 101 outputs a verbal description of the motor submenu 1904b. The verbal description optionally corresponds to or includes information also included in the written description 1940b. In some embodiments, both the verbal description and the written description 1940b explain the functionality and purpose of the motor submenu 1904b, how the motor submenu 1904b may be navigated (e.g., the description may explain that input A is used to scroll through options, while input B is used to select the option with input focus, as shown in fig. 19J), and/or what options are included in the motor submenu 1904b (e.g., the voice access control option 1942b).

In the scenario of fig. 19J, when the continue button 1944b has input focus, the computer system 101 detects an input 1950 on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response to input 1950 selecting continue button 1944b, computer system 101 proceeds to the next available menu, such as motor submenu 1904c (e.g., as opposed to returning to main accessibility menu 1900, because there is another submenu available), as shown in fig. 19K.

Fig. 19K illustrates a transition from fig. 19J in response to selection of the continue button 1944b by the input 1950. In response to input 1950 selecting continue button 1944b, computer system 101 displays a motor submenu 1904c (e.g., optionally in place of motor submenu 1904b or in conjunction with ceasing to display a previous submenu such as motor submenu 1904b). The motor submenu 1904c is a menu for configuring (e.g., enabling/disabling) the dwell control mode. In the dwell control mode, a respective action is performed in response to a gaze input remaining on a respective action control for longer than a dwell threshold amount of time (e.g., a control that is enabled to respond to the gaze input without the need for additional types of input). The motor submenu 1904c includes a dwell access control option 1942c for enabling and/or disabling the dwell control mode and a continue button 1944c for proceeding to the next available submenu (e.g., if there are any available submenus) or returning to the main accessibility menu 1900 (e.g., if there are no additional available submenus). When the input focus is positioned on the dwell access control option 1942c, the dwell control mode is activated in response to an input (e.g., a press input) detected on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). After the dwell control mode is activated, the computer system 101 responds to gaze input (e.g., gaze input detected via one or more cameras in communication with the computer system 101) (e.g., without need for control by an alternative input mechanism or modality). In some embodiments, as is the case for each submenu of the accessibility configuration menu 1900 and each selected one of the options 1902-1908, in conjunction with displaying the motor submenu 1904c, the computer system 101 outputs a verbal description of the motor submenu 1904c.

In the scenario of fig. 19K, when the continue button 1944c has input focus, the computer system 101 detects an input 1952 on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response to input 1952 selecting continue button 1944c, computer system 101 automatically returns to main accessibility menu 1900, as described in further detail below with respect to fig. 19L.

Fig. 19L illustrates a transition from fig. 19K in response to selection of the continue button 1944c by the input 1952. In response to input 1952 selecting continue button 1944c, computer system 101 automatically returns to main accessibility menu 1900 because there are no additional submenus available for the selected option 1904. In the scenario of fig. 19L, computer system 101 moves input focus from control option 1904 to control option 1906 in response to rotational input 1954 detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element). In some implementations, computer system 101 automatically moves input focus to control option 1906 in conjunction with returning from the last submenu available for selected option 1904 (e.g., from submenu 1904c, which is the last submenu of selected option 1904) to main accessibility menu 1900. Further, in the scenario of fig. 19L, when the input focus is positioned on control option 1906, computer system 101 detects a press input 1956 on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element).

Fig. 19M illustrates a transition from fig. 19L in response to detecting the press input 1956 on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) when the input focus is positioned on control option 1906. When input focus is positioned on control option 1906, in response to detecting the press input 1956 on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element), computer system 101 selects control option 1906 for configuring input modalities that assist a user who is deaf or hard of hearing, or who otherwise uses hearing assistance features, and displays an auditory submenu 1906a of accessibility configuration menu 1900 (e.g., optionally in conjunction with ceasing to display main accessibility configuration menu 1900, or displaying submenu 1906a in place of main accessibility configuration menu 1900), as shown in fig. 19M.

The auditory submenu 1906a is a menu for configuring (e.g., enabling/disabling) whether one or more types of captions are provided. For example, the auditory submenu 1906a includes controls for two different types of captions: a live caption control option 1962a and a closed caption control option 1964a. The live caption control option 1962a enables and/or disables display of text generated in real time by live transcription of audio such as spoken dialog (e.g., audio played in an application executing on the computer system 101 and/or a conversation occurring around the user that is detected as ambient sound via one or more microphones of the computer system 101). The closed caption control option 1964a enables and/or disables display of closed caption text (e.g., a transcription that is generated for recorded content before the content is played by the user and that is optionally stored with the recorded content, such as in metadata; such a transcription may generally be more accurate than live transcription, but is limited to content for which closed caption text has been prepared and is available). In some embodiments, as is the case for each submenu of the accessibility configuration menu 1900 and each selected one of the options 1902-1908, in conjunction with displaying the auditory submenu 1906a, the computer system 101 outputs a verbal description of the auditory submenu 1906a.

When the input focus is positioned on the live caption control option 1962a, live captions are enabled in response to an input (e.g., a press input) detected on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). Once live captions are enabled, computer system 101 provides live captions when spoken dialog is detected. In the scenario of fig. 19M, a rotational input 1958 is detected on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element).

Fig. 19N illustrates a transition from fig. 19M in response to detecting the rotational input 1958 on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) when the input focus is positioned on the live caption control option 1962a. When the input focus is positioned on the live caption control option 1962a, the input focus is moved from the live caption control option 1962a to the closed caption control option 1964a in response to the rotational input 1958 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In the scenario of fig. 19N, when the input focus is positioned on the closed caption control option 1964a, the computer system detects a press input 1969 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). In response, closed captioning is enabled and the closed caption control option 1964a is toggled on. In addition, in conjunction with enabling closed captioning, computer system 101 automatically navigates to the next available submenu or, as in the scenario of figs. 19N-19O, computer system 101 automatically returns to main accessibility menu 1900 because there are no additional submenus available for the selected option 1906, as shown in fig. 19O. Thus, once closed captioning is enabled, computer system 101 automatically proceeds through the accessibility configuration menu 1900 (e.g., without requiring further user input, such as an input selecting continue button 1966a).

Fig. 19O illustrates a transition from fig. 19N in response to the press input 1969 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) when the input focus is positioned on the closed caption control option 1964a. When the input focus is positioned on the closed caption control option 1964a, in response to the press input 1969 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element), and in conjunction with enabling closed captioning, the computer system 101 automatically returns to the main accessibility menu 1900 from the last submenu available for the selected option 1906, including automatically moving the input focus to the control option 1908.

Fig. 19P illustrates a transition from fig. 19O in response to detecting a press input 1962 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) when the input focus is positioned on control option 1908. When the input focus is positioned on the control option 1908, in response to detecting the press input 1962 on the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element), the computer system 101 selects the control option 1908 for configuring input modalities that assist a user who uses cognitive assistance features, and displays a cognitive submenu 1908a of the accessibility configuration menu 1900 (e.g., optionally in conjunction with ceasing to display the main accessibility configuration menu 1900, or displaying the submenu 1908a in place of the main accessibility configuration menu 1900), as shown in fig. 19P. In some embodiments, as is the case for each submenu of the accessibility configuration menu 1900 and each selected one of the options 1902-1908, in conjunction with displaying the cognitive submenu 1908a, the computer system 101 outputs a verbal description of the cognitive submenu 1908a.

The cognitive submenu 1908a is a menu for configuring (e.g., enabling/disabling) whether typing feedback is provided. For example, the cognitive submenu 1908a includes a typing feedback control option 1982a for enabling/disabling typing feedback. When typing feedback is enabled, computer system 101 generates and/or outputs a verbal description of the letters, words, or other text content that is typed. In some implementations, the computer system 101 outputs the verbal description while the user 7002 is typing (e.g., substantially simultaneously, such as with a delay of less than one second or half a second, or with substantially no delay). When the input focus is positioned on the typing feedback control option 1982a, typing feedback is enabled in response to an input (e.g., a press input) detected on a hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). Once typing feedback is enabled, computer system 101 provides typing feedback upon detection of a subsequent typing input.

In some implementations, for each of the options 1902, 1904, 1906, and 1908, the respective one or more submenus (e.g., vision submenus 1902a, 1902b, and 1902d; motor submenus 1904a, 1904b, and 1904c; auditory submenu 1906a; and/or cognitive submenu 1908a) are displayed one at a time (e.g., without displaying other submenus corresponding to the selected option). For example, as the user 7002 advances through the vision submenus associated with option 1902, vision submenu 1902b is displayed instead of vision submenu 1902a, and vision submenu 1902d is displayed instead of vision submenu 1902b. In some implementations, for each of the options 1902, 1904, 1906, and 1908, after all submenus of the selected option have been browsed, the computer system 101 automatically moves the input focus to the next available control option in conjunction with returning from the last submenu available for the respective selected option (e.g., option 1904) to the main accessibility menu 1900. In some implementations, in each of vision submenus 1902a, 1902b, and 1902d, motor submenus 1904a, 1904b, and 1904c, auditory submenu 1906a, and/or cognitive submenu 1908a, rotational input detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) advances through (e.g., navigates or scrolls through) the control options, and a press input detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) selects the respective control option that has input focus. In some implementations, instead of performing both navigation input and selection input with the same hardware input element (e.g., detecting both navigation input and selection input on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element), as in the scenario of figs. 19B-19P), navigation input and selection input may be performed on separate hardware input devices (e.g., rotational input is performed on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element), and selection input is performed on hardware button 7508). In some embodiments, when one input modality is enabled, the other input modality is automatically disabled. In other embodiments, more than one input modality may be enabled simultaneously.
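The two input arrangements described above, one in which element 7108 handles both navigation and selection and one in which selection moves to hardware button 7508, can be expressed as alternative binding tables. The table names, the `(device, gesture)` key format, and the function `action_for` are hypothetical and serve only as a sketch.

```python
from typing import Dict, Optional, Tuple

# Maps (device reference numeral, gesture) to a menu action.
Bindings = Dict[Tuple[str, str], str]

# Navigation and selection both on hardware input element 7108.
SINGLE_ELEMENT: Bindings = {
    ("7108", "rotate"): "navigate",
    ("7108", "press"): "select",
}

# Navigation on hardware input element 7108, selection on hardware button 7508.
SEPARATE_DEVICES: Bindings = {
    ("7108", "rotate"): "navigate",
    ("7508", "press"): "select",
}


def action_for(bindings: Bindings, device: str, gesture: str) -> Optional[str]:
    """Return the menu action bound to this input, or None if it is unbound."""
    return bindings.get((device, gesture))


same_element = action_for(SINGLE_ELEMENT, "7108", "press")    # "select"
other_button = action_for(SEPARATE_DEVICES, "7508", "press")  # "select"
```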

In some implementations, after the configuration process is complete and the user 7002 exits the unobstructed configuration menu 1900 (e.g., via an input directed to a control for closing the unobstructed configuration menu 1900), input directed to the hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) no longer invokes the unobstructed configuration menu 1900. For example, after the configuration process is complete, the user may reconfigure (e.g., enable or disable) any desired assistive features using different settings menus, and the inputs detected on hardware input element 7108 are reserved for different functions, as described in further detail with reference to fig. 7A-7D, 8A-8G, 9A-9D, 10A-10D, 11A-11F, 12A-12G, and Tables 1, 2, 3, and 4.

Fig. 20 is a flow chart of a method 20000 for providing a menu for selecting an input mechanism, according to some embodiments.

In some embodiments, method 20000 is performed at a computer system (e.g., computer system 101 in fig. 1) in communication with a display generation component (e.g., display generation component 120 or display generation component 7100 in fig. 1A, 3, and 4) (e.g., heads-up display, HMD, display, touch screen, or projector) and one or more input devices (e.g., camera, controller, touch-sensitive surface, joystick, button, glove, watch, motion sensor, orientation sensor, and/or rotatable input mechanism, such as a crown). In some embodiments, the display generation component is a user-facing display component and provides an XR experience to the user. In some embodiments, the computer system is an integrated device having one or more processors and memory enclosed in the same housing as at least some of the display generation component, the one or more audio output devices, and the one or more input devices. In some embodiments, the display generation component and the one or more audio output devices are integrated and enclosed in the same housing. In some embodiments, method 20000 is governed by instructions that are stored in an (optionally non-transitory) computer-readable storage medium and executed by one or more processors of a computer system, such as one or more processors 202 of computer system 101 (e.g., control unit 110 in fig. 1A). Some operations in method 20000 are optionally combined and/or the order of some operations is optionally changed.

When a configuration of the computer system is being performed (e.g., during an initial configuration such as a first configuration, activation, or setup of the computer system, such as when the computer system is first activated or powered on, or when the computer system is first activated after factory settings or other settings have been reset on the computer system), the computer system detects (20002) a first input (e.g., a press input or multiple press inputs, such as a double press, a triple press, and/or a long press) directed to a first input device of the one or more input devices (e.g., a button, dial, crown, switch, movable hardware input device, or solid-state hardware input device (such as a button, dial, crown, or switch), or a combination thereof, and/or a device that detects localized sensor input such as intensity or force sensor input, where the computer system uses that input to trigger a corresponding operation and optionally provides tactile feedback corresponding to the detected input). In some embodiments, during such initial setup of the computer system, the input mechanisms have not been registered, personalized, and/or calibrated; e.g., the user's voice, hand gestures, and/or gaze have not been registered and/or calibrated. In some embodiments, during this initial setup, the input mechanisms and/or modalities are selected and/or personalized. The computer system includes one or more sensors that detect inputs including one or more of air gestures and gaze inputs. In some embodiments, the first input device is a hardware input device that is disposed on (e.g., integrated into) a housing of the computer system, rather than external to the device.

In response to detecting the first input to the first input device, the computer system displays (20004) a menu (e.g., a configuration menu for an unobstructed interaction model) comprising a plurality of selectable options for configuring one or more interaction models (e.g., assistive and/or adaptive interaction models for persons having visual, motor, auditory, and/or cognitive impairment and/or other unobstructed needs) (e.g., other than interaction via the first input device). For example, in the scenario of fig. 19B-19C, computer system 101 displays unobstructed configuration menu 1900 in view 70000' in response to input 1910 detected on hardware button 7508 (e.g., a multiple press input such as a triple press). In some embodiments, the interaction model includes multiple input modalities, e.g., multiple channels of human-machine interaction (e.g., including inputs and outputs) based on different sensory systems (e.g., visual, auditory, haptic, and/or other sensory systems) and different forms of processed data (e.g., images, text, sounds, gaze, movement of parts of a user's body, and/or others). In some embodiments, the computer system is multi-modal, and some tasks may be performed by more than one input modality. In some embodiments, the first selectable option corresponds to a control option for configuring a barrier-free interaction model associated with visual impairment, the second selectable option corresponds to a control option for configuring a barrier-free interaction model associated with motor impairment, the third selectable option corresponds to a control option for configuring a barrier-free interaction model associated with hearing impairment, and/or the fourth selectable option corresponds to a control option for configuring a barrier-free interaction model associated with cognitive impairment.
In some embodiments, respective selectable options of the plurality of selectable options are associated with one or more submenus for configuring the barrier-free interaction model (e.g., the computer system displays the one or more submenus in response to selection of the respective selectable option). In some embodiments, the menu comprising a plurality of selectable options comprises a plurality of sections (e.g., also referred to as submenus). In some implementations, the sections are displayed one at a time (e.g., without displaying other sections of the menu). In some implementations, multiple sections are displayed simultaneously.

In some embodiments, the menu is displayed in a view of the mixed reality three-dimensional environment that is visible via a display generating component in communication with the computer system. Providing (e.g., by displaying and/or reading aloud) a menu of options for different models of interacting with the computer system during configuration of the computer system (e.g., during initial setup of the computer system) enables users to select in advance the preferred manner in which they interact with the computer system, including a manner that is more intuitive for the user, which later reduces the amount and/or extent of input and/or the amount of time required to interact with the computer system. In particular, it enables users who use interaction models other than the default, and who would otherwise require assistance to use the computer system, to need assistance only once (e.g., at the beginning of initializing the computer system) to set up the computer system with an interaction model appropriate to the user, so that the user can later use the computer system independently.

In some implementations, the first input device is a hardware input device (e.g., as opposed to a user interface element) that is a hardware button. In some embodiments, the hardware button includes a depressible input mechanism. For example, hardware button 7508 includes a depressible input mechanism, and hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) includes a depressible mechanism (fig. 19B). In some implementations, the hardware button is a solid state button. In some implementations, the hardware button detects a press input and/or a rotation input. For example, hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) includes a rotatable mechanism (fig. 19E). In some embodiments, the hardware buttons are referred to as side buttons (e.g., where the hardware buttons are disposed on the sides of a physical housing of the computer system). Providing hardware buttons that are available to interact with (e.g., provide selection inputs to) an options menu that is used to configure different interaction models with a computer system reduces the amount and/or extent of input and/or the amount of time required for a user or someone assisting the user to set a computer system with an interaction model that is appropriate for the user.

In some embodiments, the first input device is a hardware input device that includes a rotatable input mechanism (e.g., a digital crown, a rotating ring, a rotating control wheel, and/or a rotatable hardware input mechanism). For example, hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) includes a rotatable mechanism (fig. 19E). In some embodiments, the rotatable input mechanism is bi-directional. In some implementations, the rotational input detected on the hardware button can be used to navigate complex user interfaces (e.g., having multiple elements across multiple user interface levels), scroll, adjust immersion levels, adjust volume, control other functions based on a range of values (e.g., continuous, semi-continuous, or discrete values), and/or perform other functions. In some embodiments, the computer system includes more than one hardware input device, such as a digital crown and a side button (e.g., optionally, both the digital crown and the side button are disposed on or integrated into the housing of the computer system). Providing a rotatable input mechanism that is available to interact with (e.g., scroll through, change focus within, or otherwise navigate) an options menu (which is used to configure different interaction models with a computer system) reduces the amount and/or extent of input and/or the amount of time required for a user or someone assisting the user to set up a computer system having an interaction model appropriate for the user.

In some embodiments, the one or more input devices include a second input device (e.g., a digital crown) that is different from the first input device (optionally, the first input device and the second input device are hardware input devices), and the computer system detects a second input to the second input device (e.g., multiple presses or clicks, such as two presses, three presses, or another number of presses detected in rapid or immediate succession (e.g., within a threshold amount of time of each other)). In some embodiments, in response to detecting the second input to the second input device, the computer system activates a first unobstructed mode (e.g., of one or more unobstructed modes), such as a "read-aloud" mode in which a verbal description of a virtual object (e.g., a user interface, user interface elements in the user interface, selectable options for configuring one or more interaction models and selectable options in submenus thereof, and/or other virtual objects) is provided in response to user input (e.g., navigation input that moves a focus selector sequentially forward or backward across multiple user interface levels of a single application and/or across multiple applications). For example, in the "read-aloud" mode, a navigation input (e.g., an air gesture, such as a pinch gesture performed with one or both hands) causes the computer system to move the focus selector between user interface elements in the displayed user interface and correspondingly output an audio description of the element that has input focus. For example, in the scenario of fig. 19B, three presses or clicks on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) invoke the "read-aloud" mode. In some implementations, the navigation input used in the "read-aloud" mode is an air gesture. In some implementations, navigation input in the "read-aloud" mode is performed on a hardware device such as the first input device and/or the second input device.
In some embodiments, three presses (or clicks) on a digital crown (e.g., a hardware button that detects press and/or rotation inputs) cause the computer system to display a menu (e.g., a configuration menu for an unobstructed interaction model), and three presses (or clicks) on a side button (e.g., a second hardware button that is different from the digital crown) cause the computer system to directly activate an unobstructed mode (e.g., the "read-aloud" mode) without a selectable option having to be selected from the menu. Providing an input device that is usable to directly activate a "read-aloud" unobstructed mode, in which an audio description of the currently focused user interface element is provided during navigation within an options menu for configuring different interaction models with a computer system, reduces the amount and/or extent of input and/or the amount of time required for a user or someone assisting the user to set up the computer system with an interaction model that is appropriate for the user.

In some embodiments, the first input includes two or more presses (e.g., a double-click, a triple-click, and/or other immediately consecutive presses (e.g., within a threshold amount of time of each other, such as 0.5 seconds, 1 second, 2 seconds, 3 seconds, 5 seconds, or other threshold amount of time)) on the first input device (e.g., optionally, a hardware input device, such as a button or digital crown). For example, in the scenario of fig. 19B-19C, in response to input 1910 detected on hardware button 7508, the computer system displays barrier-free configuration menu 1900 in view 70000'. In some embodiments, the first input includes a first number of presses, and a second input on the first input device includes a second number of presses (e.g., different from the first number of presses) that causes the computer system to perform a different operation (e.g., cancel or exit the configuration menu, display a different menu (such as a main menu user interface), launch a user interface of a corresponding application, and/or another function). In some implementations, a third input on the first input device that includes a third number of presses (optionally different from the first number of presses or the second number of presses) is used to activate or deactivate an unobstructed mode (e.g., a "read-aloud" mode). Directly activating the "read-aloud" unobstructed mode in response to a quick succession of presses via the corresponding input device provides the user or someone assisting the user with an input shortcut that reduces the amount and/or extent of input and/or the amount of time required for the user or assistant to set up the computer system with an interaction model appropriate to the user.
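
The press-count behavior described above can be illustrated with a minimal sketch. It is not taken from the described embodiments: the function names, the 0.5-second gap (one of the example thresholds listed above), and the mapping from press counts to operations are all hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical threshold; the description lists 0.5 s to 5 s as examples.
PRESS_GAP_THRESHOLD_S = 0.5

# Hypothetical mapping from press count to operation (illustration only).
ACTIONS: Dict[int, str] = {
    2: "dismiss_configuration_menu",
    3: "show_configuration_menu",
    4: "toggle_read_aloud_mode",
}


def count_consecutive_presses(timestamps: List[float],
                              gap: float = PRESS_GAP_THRESHOLD_S) -> int:
    """Count the presses in the final run of presses, where each press
    follows the previous one within `gap` seconds."""
    if not timestamps:
        return 0
    count = 1
    for earlier, later in zip(timestamps, timestamps[1:]):
        # A press that arrives too late starts a new run.
        count = count + 1 if later - earlier <= gap else 1
    return count


def action_for_presses(timestamps: List[float]) -> str:
    """Map a sequence of press times (in seconds) to an operation name."""
    return ACTIONS.get(count_consecutive_presses(timestamps), "ignore")
```

Under these assumptions, `action_for_presses([0.0, 0.3, 0.6])` would resolve to the menu-display operation, while a single isolated press would be ignored.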

In some embodiments, the computer system detects a third input (optionally, the third input is a subsequent input detected after the first input) directed to a first hardware input device (e.g., the first input device or another hardware input device) of the one or more input devices (e.g., the third input is a navigation input such as a rotational input on a rotatable mechanism of the hardware input device in one or more directions (e.g., clockwise or counter-clockwise), one or more press inputs on a depressible mechanism of the hardware input device, a combination of one or more rotational inputs and press inputs, and optionally a gaze input). In some implementations, in response to detecting the third input (e.g., a navigation input) directed to the first hardware input device, the computer system positions (optionally in conjunction with positioning a visual focus indicator representing the input focus) the input focus on a first selectable option of a plurality of selectable options (e.g., for configuring one or more interaction models). For example, in the scenario of fig. 19C-19P, for each of the options 1902-1908 of the barrier-free configuration menu 1900, and similarly for the vision submenus 1902a, 1902b, and 1902d, the motion submenus 1904a, 1904b, and 1904c, the auditory submenu 1906a, and/or the cognitive submenu 1908a, a rotational input detected on the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) advances through (e.g., navigates or scrolls) the control options, and a press input detected on the hardware input element 7108 selects the corresponding control option that has input focus.
In some embodiments, positioning the input focus on the first selectable option includes moving the input focus from a second selectable option of the plurality of selectable options (optionally in combination with moving a visual focus indicator representing the input focus) (optionally, sequentially browsing the first selectable option and the second selectable option). In some embodiments, the third input is a rotational input on a hardware input device (e.g., a hardware input device including a rotatable input mechanism, such as a digital crown). Moving the input focus between selectable options in the menu for configuring different interaction models with the computer system in response to input via the hardware input device enables navigation within the menu to be performed without displaying additional controls, which reduces the amount and/or extent of input and/or the amount of time required for the user or someone assisting the user to set up a computer system having an interaction model appropriate for the user.

In some embodiments, the computer system detects a fourth input (e.g., a selection input, optionally detected after detecting the first input (e.g., the first input that causes the computer system to display a configuration menu for the barrier-free interaction model)) directed to a second hardware input device of the one or more input devices (e.g., a hardware input device that is the same as or different from the first hardware input device). In some implementations, in response to detecting the fourth input (e.g., a selection input) directed to the second hardware input device, the computer system selects a first selectable option (e.g., having input focus) of the plurality of selectable options (e.g., for configuring the one or more interaction models). In some embodiments, selecting the first selectable option includes activating or enabling a respective interaction model of the one or more interaction models. In some implementations, the fourth input is a press input on a hardware button (e.g., a side button). For example, in the scenario of fig. 19C, and similarly for each of the options 1902-1908 of the barrier-free configuration menu 1900, the one of the options 1902-1908 that has input focus may be selected using a press input detected on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). Selecting an option in the menu for configuring different interaction models with the computer system in response to input via the hardware input device reduces the amount and/or extent of input and/or the amount of time required for the user or someone assisting the user to set up a computer system with an interaction model appropriate for the user.

In some implementations, the computer system detects a third input (e.g., the third input is a navigation input or a selection input) directed to a hardware input device (e.g., the first input device or another hardware input device) of the one or more input devices. In some embodiments, in response to detecting the third input directed to the respective hardware input device (e.g., the hardware input device has a rotatable input mechanism and/or a depressible input mechanism): in accordance with determining that the third input satisfies a first input criterion (e.g., the third input is a first type of input via the hardware input device, such as a navigation input, such as a rotational input), the computer system positions the input focus on a first selectable option of the plurality of selectable options (e.g., for configuring one or more interaction models); and, in accordance with determining that the third input satisfies a second input criterion (e.g., the third input is a different, second type of input via the same hardware input device, such as a selection input, such as a press or click input), the computer system selects a second selectable option (e.g., having input focus) of the plurality of selectable options. For example, in the scenario of fig. 19C-19P, the unobstructed configuration menu 1900 is controlled with, or interacted with via, inputs detected on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) (e.g., browsing the unobstructed configuration menu 1900 and selecting an option in the unobstructed configuration menu 1900 using rotational input on the rotatable mechanism and press input on the depressible mechanism of hardware input element 7108, e.g., without using other inputs detected on other devices). In some embodiments, the first selectable option is the same as the second selectable option. In some embodiments, the first selectable option and the second selectable option are different.
In some implementations, the third input is a subsequent input detected after the first input is detected (e.g., the first input causes the computer system to display a configuration menu for the barrier-free interaction model). In some embodiments, a configuration menu for the barrier-free interaction model may be navigated and/or interacted with using a single input device, optionally one with a rotatable input mechanism and/or a depressible input mechanism. Providing an input device that is operable to perform both navigation inputs and selection inputs within an options menu (which is used to configure different interaction models with the computer system) reduces the amount and/or extent of input and/or the amount of time required for a user or someone assisting the user to set up a computer system having an interaction model that is appropriate for the user.
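
As a rough illustration of handling navigation and selection on a single hardware input device, the following sketch routes a rotational input to focus movement and a press input to selection. The class names, the "rotate" and "press" labels, and the wrap-around behavior at the ends of the menu are assumptions made for the example; the description does not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HardwareInput:
    kind: str        # "rotate" or "press" (hypothetical labels)
    steps: int = 0   # signed rotation steps; positive = clockwise (assumed)


@dataclass
class ConfigMenu:
    options: List[str]
    focus: int = 0
    selected: Optional[str] = None

    def handle(self, event: HardwareInput) -> None:
        if event.kind == "rotate":
            # Navigation input: move input focus. Wrapping around the
            # ends of the menu is an assumption of this sketch.
            self.focus = (self.focus + event.steps) % len(self.options)
        elif event.kind == "press":
            # Selection input: select the option that has input focus.
            self.selected = self.options[self.focus]
        # Any other kind of input is ignored in this sketch.


menu = ConfigMenu(["vision", "motor", "hearing", "cognitive"])
menu.handle(HardwareInput("rotate", steps=2))  # focus moves to "hearing"
menu.handle(HardwareInput("press"))            # selects "hearing"
```

Keeping both criteria in one handler mirrors the single-device case; with separate devices for navigation and selection, the same two branches would simply be fed from different event sources.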

In some embodiments, positioning the input focus on a first selectable option of the plurality of selectable options is performed in response to detecting a rotational input on the hardware input device. In some implementations, the third input that satisfies the first input criterion is a navigation input that includes rotation in one or more directions (e.g., clockwise or counterclockwise) on a rotatable mechanism of the hardware input device (e.g., the first input criterion requires the third input to include rotation in one or more directions in order to be satisfied). For example, in the scenario of fig. 19C, the options 1902-1908 of the barrier-free configuration menu 1900 are scrolled using a rotational input detected on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element). Providing an input device that can be rotated to navigate within a menu of options for configuring different interaction models with a computer system reduces the amount and/or extent of input and/or the amount of time required for a user or someone assisting the user to set up a computer system having an interaction model that is appropriate for the user.

In some implementations, selecting a second selectable option (e.g., having input focus) of the plurality of selectable options is performed in response to detecting a press input on a hardware input device (e.g., the first input device or another hardware input device) (e.g., the second input criterion requires the third input to include a press input on the hardware input device in order to be satisfied). For example, a press input 1930 on hardware input element 7108 (e.g., a button, crown, or rotatable and depressible input element) in fig. 19C selects option 1902 of the unobstructed configuration menu 1900. Providing an input device that can be pressed to select or activate options within a menu for configuring different interaction models with a computer system (e.g., where the input device optionally can also be rotated to navigate within the menu) reduces the amount and/or extent of input and/or the amount of time required for a user or someone assisting the user to set up a computer system having an interaction model appropriate for the user.

In some embodiments, in conjunction with positioning the input focus (optionally, in conjunction with positioning a visual focus indicator representing the input focus) on a first selectable option of a plurality of selectable options (e.g., for configuring one or more interaction models), the computer system outputs an audio description of the first selectable option (e.g., a verbal description of what type of user interface element the first selectable option is, what function is associated with the first selectable option, and/or a name or label associated with the first selectable option). For example, in the scenario of fig. 19B-19C, in conjunction with displaying the unobstructed configuration menu 1900, the computer system 101 generates and optionally outputs a verbal description of the unobstructed configuration menu 1900, optionally including a verbal description of the unobstructed configuration menu 1900 as a whole and/or verbal descriptions of options 1902-1908 (e.g., regardless of whether the user 7002 browses options 1902-1908). In some implementations, when the third input is detected (e.g., the third input that causes the computer system to position the input focus on the first selectable option), a respective unobstructed mode (e.g., a "read-aloud" mode), in which verbal descriptions of virtual objects (e.g., user interfaces, user interface elements in the user interfaces, and/or other virtual objects) are provided in response to user input, is inactive. In some implementations, the computer system detects a fourth input (optionally, the fourth input is a subsequent input detected after the first input) directed to the first hardware input device (e.g., the first input device or another hardware input device) (e.g., the fourth input is a navigation input) and, in response to detecting the fourth input directed to the first hardware input device, the computer system outputs an audio description of a different selectable option of the plurality of selectable options (e.g., a verbal description of what type of user interface element the different selectable option is, what function is associated with the different selectable option, and/or a name or label associated with the different selectable option) in conjunction with positioning the input focus (optionally, positioning a visual focus indicator representing the input focus) on the different selectable option, while the respective unobstructed mode (e.g., the "read-aloud" mode) is active. In some embodiments, an audio description of selectable options for configuring one or more interaction models is provided while the configuration of the computer system is being performed (e.g., and/or while a configuration menu for an unobstructed interaction model is being browsed). For example, the computer system outputs an audio description of one or more selectable options (e.g., including the first selectable option) that are browsed in response to detecting one or more navigation inputs for navigating among the plurality of selectable options. In some embodiments, the computer system outputs an audio description of the browsed one or more selectable options (e.g., including the first selectable option) regardless of whether an unobstructed mode of the one or more unobstructed modes is active or inactive (e.g., even if the "read-aloud" mode is inactive).
During interaction with a menu of options for configuring different interaction models with a computer system, providing an audio description of the currently focused user interface element or input target, regardless of whether a "read-aloud" barrier-free mode is enabled, provides improved feedback that reduces the amount and/or extent of input and/or the amount of time required for a user or someone assisting the user to set up a computer system with an interaction model appropriate to the user.
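
The rule that a focused option is described aloud during configuration, whether or not the "read-aloud" mode is active, can be expressed as a small predicate. This is a sketch under assumed names; the function names and the format of the spoken text are hypothetical, and the actual speech output is outside its scope.

```python
from typing import Optional


def should_describe_focus(configuring: bool, read_aloud_active: bool) -> bool:
    """During configuration, the focused option is described regardless of
    the read-aloud mode; otherwise only when the read-aloud mode is active."""
    return configuring or read_aloud_active


def audio_description(label: str, element_type: str,
                      configuring: bool,
                      read_aloud_active: bool) -> Optional[str]:
    """Return the text to speak for the focused element, or None when no
    description should be output. The text format is hypothetical."""
    if not should_describe_focus(configuring, read_aloud_active):
        return None
    return f"{label}, {element_type}"
```

For instance, with `configuring=True` and `read_aloud_active=False`, focusing a hypothetical option labeled "Vision" would still yield the text "Vision, menu option".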

In some embodiments, the computer system displays a control (e.g., a user interface element such as a dwell control indicator) for activating a dwell control mode while the configuration of the computer system is being performed. In some implementations, the dwell control mode is an unobstructed mode in which the user's gaze and/or head direction is used to perform various actions that are otherwise performed with a mouse, keyboard, touch gesture, and/or air gesture (e.g., without requiring the use of hardware input devices and/or the user's hands). In some embodiments, in the dwell control mode, when the user's gaze rests on a dwell action control for a respective amount of time (e.g., a dwell threshold amount of time), the operation associated with the dwell action control (e.g., clicking, dragging, scrolling, and/or another action) is performed after the respective amount of time has elapsed (e.g., provided that the user's gaze remains on the dwell action control). In some embodiments, the computer system detects a gaze input directed to the control for activating the dwell control mode and, in response (e.g., after the gaze input has been directed to the control for more than a threshold amount of time, optionally without the gaze input moving in a different direction, e.g., away from the control), automatically activates the dwell control mode (e.g., without requiring additional user input and/or other conditions to be satisfied). For example, in the scenario of fig. 19C, the computer system 101 activates the dwell control mode in accordance with determining that a gaze input is directed to, and remains directed to, dwell control 1901 for a threshold amount of time (e.g., a predefined amount of time measured from when the gaze input is first directed to the control, or an amount of time dynamically determined by the computer system, such as 3 seconds, 4 seconds, 5 seconds, 6 seconds, or other threshold amount of time), and the computer system forgoes activating the dwell control mode in accordance with determining that the gaze input moved away from the control for activating the dwell control mode before the threshold amount of time elapsed. Directly and automatically activating the "dwell control" barrier-free mode in response to a gaze input that remains directed to the control for activating the "dwell control" mode for at least a threshold amount of time, where in that mode a user interface element can be activated by input directed to the element for at least a threshold amount of time, provides a means for a user to independently enable an interaction model appropriate for the user without assistance.
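
A minimal sketch of the dwell (stay) threshold check described above follows. It assumes gaze is available as time-ordered samples of (timestamp, whether the gaze is on the control); that sample format, the function name, and the 3-second default (one of the example thresholds listed above) are assumptions of the example, not details from the described embodiments.

```python
from typing import Iterable, Optional, Tuple

DWELL_THRESHOLD_S = 3.0  # hypothetical; the description lists 3-6 s as examples


def dwell_activates(samples: Iterable[Tuple[float, bool]],
                    threshold: float = DWELL_THRESHOLD_S) -> bool:
    """Return True if gaze stays on the control for at least `threshold`
    seconds without leaving it. `samples` are (timestamp_seconds,
    gaze_on_control) pairs in time order."""
    dwell_start: Optional[float] = None
    for t, on_control in samples:
        if not on_control:
            # Gaze moved away before the threshold elapsed: forgo
            # activation and restart the timer on the next fixation.
            dwell_start = None
            continue
        if dwell_start is None:
            dwell_start = t
        if t - dwell_start >= threshold:
            return True
    return False
```

With these assumptions, a gaze held on the control from t = 0 s through t = 3 s activates the mode, whereas a gaze that leaves the control at t = 2.5 s and returns at t = 3 s must dwell for a further full threshold.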

In some embodiments, after the configuration of the computer system is complete (e.g., after the initial configuration of the computer system has ended), the computer system detects a subsequent input directed to the first input device (e.g., the subsequent input is another instance of the same type of input as the first input, but is performed after the configuration of the computer system is complete, whereas the first input is performed while the configuration is being performed). In some embodiments, in response to detecting the subsequent input to the first input device after configuration of the computer system is complete, the computer system forgoes displaying the menu (e.g., a configuration menu for the barrier-free interaction model) that includes a plurality of selectable options for configuring one or more interaction models. For example, after the configuration process is complete and the user 7002 exits the barrier-free configuration menu 1900, input directed to the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) no longer invokes the barrier-free configuration menu 1900, and the same input on the hardware input element 7108 that previously invoked the barrier-free configuration menu 1900 is reserved for different functionality, as described in further detail with reference to fig. 7A-7D, 8A-8G, 9A-9D, 10A-10D, 11A-11F, 12A-12G, and tables 1, 2, 3, and 4.
Enabling the menu of options for configuring different interaction models with the computer system to be invoked in response to input via a particular input device during configuration of the computer system and not outside of the configuration of the computer system (e.g., after the configuration has ended) reduces the chance of accidentally invoking the menu and/or changing interaction model settings during subsequent use of the computer system outside of the initial configuration process.

In some embodiments, in response to detecting a subsequent input to the first input device after configuration of the computer system is complete, the computer system performs an operation distinct from displaying a menu (e.g., a configuration menu for the barrier-free interaction model) that includes a plurality of selectable options for configuring one or more interaction models. In some embodiments, the operation is a preset operation. In some embodiments, the preset operation is different depending on the number of presses detected on the depressible mechanism of the hardware input device, depending on whether a hold of a press is detected and/or for which time periods and/or depending on the state of the computer system, depending on the direction of the rotational input, and other criteria, as described in further detail herein with reference to tables 1,2,3, and 4. Example preset operations include taking a screenshot, powering down the device, restarting the device, entering a hardware reset mode, answering a phone call, recording video or audio, changing immersion levels, displaying a main user interface, and/or other operations. In some embodiments, the operation depends on what mode (such as what unobstructed mode) is active when the first input is detected. In some embodiments, the operation (optionally, additionally) depends on what element has an input focus. For example, after the configuration process is complete and the user 7002 exits the barrier-free configuration menu 1900, the same inputs on the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) that invoke the barrier-free configuration menu 1900, as well as other inputs detected on the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element), are reserved for different functionalities, as described in further detail with reference to fig. 7A-7D, 8A-8G, 9A-9D, 10A-10D, 11A-11F, 12A-12G, and tables 1,2,3, and 4. 
During configuration of the computer system and not outside of the configuration of the computer system (e.g., after the configuration has ended), the menu of options for configuring the different interaction models with the computer system is enabled to be invoked in response to input via the particular input device, making the input device available for other operations when not needed for configuring the computer system, such that more operations can be performed without additional controls being displayed.
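
As a rough sketch of how the same hardware input can be routed differently during and after configuration, consider the following. The press count that invokes the menu, the operation names, and the single-press and double-press mappings are all placeholders assumed for illustration; the actual mapping is defined by the tables referenced above and is not reproduced here. Only the triple-press mapping follows the example given in this description.

```python
from enum import Enum, auto


class SetupState(Enum):
    CONFIGURING = auto()  # initial configuration is still in progress
    COMPLETE = auto()     # initial configuration has ended


# Placeholder post-setup mapping from press count to operation name.
# The 1- and 2-press entries are assumptions made for this sketch.
POST_SETUP_OPERATIONS = {
    1: "show_home_ui",
    2: "take_screenshot",
    3: "activate_read_aloud",
}


def handle_press(state: SetupState, press_count: int, menu_press_count: int = 3) -> str:
    """Return the name of the operation triggered by a press input."""
    if state is SetupState.CONFIGURING:
        # During setup, the designated input opens the configuration menu.
        if press_count == menu_press_count:
            return "show_barrier_free_menu"
        return "no_op"
    # After setup, the menu is no longer reachable from this input.
    return POST_SETUP_OPERATIONS.get(press_count, "no_op")
```

The point of the sketch is that the menu operation is reachable only in the `CONFIGURING` branch, so later presses cannot reopen it by accident.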

In some embodiments, after the configuration of the computer system is complete (e.g., after the initial configuration of the computer system has ended), the computer system detects a press input (e.g., including one or more presses or clicks) to a first input device (e.g., the first input device is a hardware input device having a depressible mechanism). In some implementations, in response to detecting the press input to the first input device, the computer system activates a respective barrier-free function (e.g., including activating a "read aloud" mode in which verbal descriptions of virtual objects (e.g., user interfaces, user interface elements in the user interfaces, and/or other virtual objects) are provided in response to user input (e.g., navigation input that moves the focus selector forward or backward sequentially across multiple user interface levels of a single application and/or across multiple applications)). For example, in the scenario of fig. 19B, three presses or clicks on hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) invoke the "read aloud" mode. Activating the barrier-free function in response to input via the same input device that is used, during initial configuration of the computer system, to invoke and/or interact with the options menu (which is used to configure different interaction models with the computer system) provides the user or someone assisting the user with an input shortcut that reduces the number, extent, and/or duration of inputs required to set up the computer system with an interaction model appropriate to the user, and provides that shortcut intuitively, using the same input device previously used to configure the barrier-free features, without displaying additional controls.

In some implementations, the computer system detects a fifth input on the first input device (e.g., a rotational input detected on a hardware input device (such as the first input device) having a rotatable input mechanism (e.g., a digital crown)). In some embodiments, in response to detecting the fifth input on the first input device, the computer system, in accordance with a determination that the fifth input was detected prior to completion of configuration of the computer system (e.g., while configuration of the computer system is being performed), positions (optionally in conjunction with positioning a visual focus indicator representing the input focus) the input focus on a respective selectable option of a plurality of selectable options (e.g., for configuring one or more interaction models) (e.g., or more generally, performs a navigation operation relative to the plurality of selectable options, such as scrolling or moving the input focus, during configuration of the computer system). In some embodiments, in accordance with a determination that the fifth input is detected after configuration of the computer system is complete, the computer system performs an operation (e.g., a system level operation) that is different from positioning the input focus on the respective selectable option (e.g., or more generally, performs an operation that is not a navigation operation with respect to one or more selectable options). For example, after the configuration process is complete and the user 7002 exits the barrier-free configuration menu 1900, the same inputs on the hardware input element 7108 (e.g., button, crown, or rotatable and depressible input element) that invoke the barrier-free configuration menu 1900, as well as other inputs detected on the hardware input element 7108, are reserved for different functionalities, as described in further detail with reference to fig. 7A-7D, 8A-8G, 9A-9D, 10A-10D, 11A-11F, 12A-12G, and tables 1, 2, 3, and 4. In some implementations, the operation that is different from positioning the input focus on the respective selectable option is changing the immersion level (e.g., associated with a mixed reality three-dimensional environment). For example, the computer system increases the immersion level in accordance with a determination that the fifth input is a rotational input in a first direction, and decreases the immersion level in accordance with a determination that the fifth input is a rotational input in a second direction that is different from (e.g., opposite) the first direction. In some embodiments, if the fifth input is detected before configuration of the computer system is complete, the computer system scrolls through or browses the plurality of selectable options in response to the fifth input, and if the fifth input is detected after configuration of the computer system is complete, the computer system changes the immersion level of the three-dimensional environment available for viewing via a display generation component in communication with the computer system. Performing one or more operations other than menu navigation operations in response to input via the input device that is used, during initial configuration of the computer system, to navigate the menu of options for configuring different interaction models with the computer system makes the input device available for other types of operations when it is not needed to configure the computer system, such that more types of operations can be performed without displaying additional controls.
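
The context-dependent handling of a rotational input can be sketched as follows. This is an illustrative model only: the `CrownContext` type, the discrete immersion scale from 0 to `MAX_IMMERSION`, the option count, and the wrap-around behavior of the focus are all assumptions introduced for the example.

```python
from dataclasses import dataclass

MAX_IMMERSION = 10  # assumed number of discrete immersion steps


@dataclass
class CrownContext:
    configuring: bool       # True while initial configuration is in progress
    option_count: int = 4   # number of selectable options in the menu
    focus_index: int = 0    # option that currently has input focus
    immersion: int = 0      # current immersion level, 0..MAX_IMMERSION


def handle_rotation(ctx: CrownContext, steps: int) -> CrownContext:
    """Apply a rotation of `steps` detents (positive means the first direction)."""
    if ctx.configuring:
        # During setup, rotation navigates the menu (wrap-around is assumed).
        ctx.focus_index = (ctx.focus_index + steps) % ctx.option_count
    else:
        # After setup, the same rotation raises or lowers the immersion level.
        ctx.immersion = max(0, min(MAX_IMMERSION, ctx.immersion + steps))
    return ctx
```

Rotation in one direction increases the value and rotation in the opposite direction decreases it; which quantity changes depends only on whether configuration is still in progress.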

In some embodiments, the computer system displays a first user interface in a first subset of user interfaces for configuring a first interaction model of the one or more interaction models (e.g., wherein the first subset of user interfaces for configuring the first interaction model is associated with a respective (e.g., selected) selectable option of the plurality of selectable options). In some embodiments, a second subset of user interfaces for configuring a second interaction model (e.g., of the one or more interaction models) that is different from the first subset of user interfaces is associated with a different selectable option of the plurality of selectable options. In some embodiments, the first subset of user interfaces for configuring the first interaction model are ordered in a sequence such that the computer system sequentially displays respective user interfaces from the subset of user interfaces in response to the navigational input. In some embodiments, the first subset of user interfaces for configuring the first interaction model corresponds to a submenu of configuration menus for the barrier-free interaction model, and the first user interface corresponds to the first submenu. In some embodiments, the first user interface is displayed without displaying other user interfaces in the first subset of user interfaces. In some embodiments, the computer system detects one or more user inputs (e.g., selection inputs and/or navigation inputs). 
In some embodiments, the one or more inputs include inputs to toggle configuration options on and off (e.g., to enable or disable respective barrier-free modes, such as a "read aloud" mode, a "switch interaction mode" (e.g., a mode in which some commands are performed with an external or remote hardware device in communication with the computer system), and a mode in which the virtual magnifier is active (e.g., in which the computer system automatically magnifies virtual content and/or representations of physical objects within the bounds of the virtual magnifier as the viewpoint of the user changes)). In some embodiments, the one or more inputs include an input confirming the selection. In some embodiments, in response to detecting the one or more user inputs, the computer system activates a function of the first interaction model (e.g., in a scenario in which the first interaction model includes an interaction mode for a visually impaired person, activating the function includes activating a "read aloud" mode, activating a virtual magnifier, and/or selecting a method for determining input focus (e.g., based on where a portion of the user's body is pointing)), and automatically displays a second user interface in the first subset of user interfaces (and optionally, ceases to display the first user interface in the first subset of user interfaces). In some embodiments, after the last user interface in the first subset of user interfaces (e.g., interfaces for configuring the first interaction model) has been browsed (e.g., after scrolling through the first subset of user interfaces), the computer system redisplays the menu comprising the plurality of selectable options (e.g., the configuration menu for the barrier-free interaction model). For example, in the scenario of fig. 19F, upon enabling the "read aloud" mode by toggling on control 1912b, the computer system moves to the next available menu, such as vision submenu 1902d (e.g., without requiring further user input selecting continue button 1914b). In response to detecting an activation input for configuring a first interaction model of a plurality of interaction models capable of being configured via the provided menu, automatically proceeding to a next portion or page of the menu for configuring a different second interaction model reduces the number, extent, and/or duration of inputs required for a user or someone assisting the user to set up the computer system, through the menu, with an interaction model appropriate to the user.
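
The automatic progression through an ordered subset of configuration pages might be modeled roughly as below. The `SubmenuFlow` class, the page names, and the `MAIN_MENU` sentinel are hypothetical and are used only to show the control flow: activating a feature records it and advances to the next page, and advancing past the last page returns to the top-level menu.

```python
from dataclasses import dataclass, field
from typing import List, Set

MAIN_MENU = "main_menu"  # hypothetical name for the top-level options menu


@dataclass
class SubmenuFlow:
    pages: List[str]                                # ordered pages of one submenu
    index: int = 0                                  # page currently displayed
    enabled: Set[str] = field(default_factory=set)  # features switched on so far

    def current(self) -> str:
        """Return the displayed page, or the main menu once all pages are done."""
        return self.pages[self.index] if self.index < len(self.pages) else MAIN_MENU

    def advance(self) -> str:
        """Move to the next page without changing any setting."""
        if self.index < len(self.pages):
            self.index += 1
        return self.current()

    def activate(self, feature: str) -> str:
        """Enable a feature and advance automatically, with no extra input."""
        self.enabled.add(feature)
        return self.advance()
```

Here `activate` plays the role of toggling a control on, and `advance` plays the role of an explicit continue action.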

In some embodiments, the plurality of selectable options includes a first set of one or more controls for enabling control of the focus selector with a respective portion of the user's body different from the user's eyes. In some embodiments, the first set of one or more controls includes a first control for controlling the focus indicator using the user's gaze (optionally selected by default), a second control for controlling the focus indicator using the user's head direction (e.g., including orientation and/or height), and/or a third control for controlling the focus indicator using the user's wrist. In some embodiments, the first set of one or more controls provides alternative methods for controlling the focus selector (e.g., for controlling the position of the focus selector in a mixed reality three-dimensional environment). In some implementations, the computer system detects a gaze input and, in response, in accordance with a determination that control of the focus selector with a respective portion of the user's body other than the user's eyes is not enabled, the computer system positions the focus selector in accordance with the gaze input. In some implementations, in accordance with a determination that control of the focus selector with a respective portion of the user's body other than the user's eyes is enabled, the computer system forgoes positioning the focus selector in accordance with the gaze input (e.g., the computer system is not responsive to the gaze input). For example, in fig. 19G, vision submenu 1902d includes three alternatives for controlling a focus indicator: gaze cursor control 1911d, head cursor control 1912d, and wrist cursor control 1913d.
Providing a menu of options for different interaction models with the computer system, where the menu may be used to select alternatives to using the user's gaze to determine which user interface element(s) should have focus or be the target of the input, may be particularly beneficial to users who are blind or have low vision or whose gaze is otherwise difficult to track, thereby making user interaction with the computer system more accessible to a wider population.
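
The selection among gaze, head, and wrist as the source that drives the focus selector can be sketched as a simple filter. The `FocusSource` enum, the 2D `Point` type, and the `next_focus` function are assumptions for illustration; a real system would work with three-dimensional poses.

```python
from enum import Enum
from typing import Tuple

Point = Tuple[float, float]  # simplified 2D focus position (assumption)


class FocusSource(Enum):
    GAZE = "gaze"
    HEAD = "head"
    WRIST = "wrist"


def next_focus(active: FocusSource, sample_source: FocusSource,
               sample: Point, current: Point) -> Point:
    """Return the new focus position after one tracking sample.

    Only samples from the active source move the focus selector; samples
    from any other source (for example gaze, when head control is enabled)
    are ignored and the focus stays where it is.
    """
    if sample_source is active:
        return sample
    return current
```

With `FocusSource.HEAD` active, a gaze sample leaves the focus unchanged, which mirrors the system forgoing positioning of the focus selector in accordance with gaze input.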

In some embodiments, a menu including a plurality of selectable options for configuring one or more interaction models is displayed prior to performing (e.g., starting or completing) a calibration process of the user's gaze (e.g., prior to the user's gaze being calibrated and/or registered for use as an input modality). For example, in the scenario of fig. 19B-19C, the unobstructed configuration menu 1900 is displayed prior to calibrating and/or registering the user's gaze so that the computer system 101 can respond to gaze-based or gaze-including input. In some implementations, the calibration process involves detecting movement of the user's gaze using one or more cameras and using the movement information to interpret further movement of the user's gaze to determine what input, if any, was performed based on the user's gaze. Providing a menu of options for different interaction models with the computer system (where the menu may be used to select an alternative to using the user's gaze to determine which user interface element(s) should have focus or be the target of the input) prior to calibrating the computer system for the particular user's gaze enables the user to more efficiently access the menu and reduces the number of inputs required to activate the menu, thereby making user interaction with the computer system more unobstructed for a wider population.

In some embodiments, the plurality of selectable options includes a second set of one or more controls corresponding to a set of one or more input models that enable control of the device with alternative inputs that are different from air gestures (e.g., the motion submenus 1904a, 1904b, and 1904c in fig. 19I-19K). In some implementations, the second set of one or more controls includes a first control for enabling/disabling dwell control (e.g., wherein an action is performed in response to gaze input remaining at the respective action control for more than a threshold amount of time), a second control for enabling/disabling voice control (e.g., the computer system accepts and/or responds to voice commands as an input model), and a third control for enabling/disabling switch control (e.g., a separate hardware device (e.g., a wireless switch device) is used to operate the computer system, optionally in combination with other input models). In some embodiments, the computer system detects an air gesture and, in response, in accordance with a determination that the device is enabled to be controlled with air gestures (e.g., rather than with alternative inputs such as button presses, knob rotations, verbal inputs, and/or other inputs), the computer system performs an operation in accordance with (e.g., in response to) the air gesture. In some implementations, in accordance with a determination that the device is enabled to be controlled with input that does not require an air gesture, the computer system forgoes performing the operation in accordance with the air gesture (e.g., the computer system does not respond to the air gesture).
Providing a menu of options for different models of interaction with the computer system (where the menu may be used to select alternatives for providing input other than using air gestures so that the user may provide input using a non-motion based modality) may be particularly beneficial to users with movement disorders, thereby making user interaction with the computer system more unobstructed to a wider population.

In some implementations, a first control in the second set of one or more controls corresponds to a control for activating (e.g., a toggle control for enabling and/or disabling) a stay control mode (e.g., a stay control 1901 in fig. 19C). In some implementations, when the dwell control mode is active, the respective action is performed in response to the gaze input being directed at the respective action control beyond a dwell threshold amount of time. Providing a menu of options for different interaction models with a computer system, where the menu may be used to toggle on and/or toggle off a stay control such that a user may provide input using a non-motion-based modality (such as gaze), reduces the amount and/or extent of input and/or the amount of time required to perform an operation, which may be particularly beneficial to users with movement disorders, thereby making user interaction with the computer system more unobstructed to a wider population.

In some embodiments, a second control in the second set of one or more controls corresponds to a control (e.g., a toggle control for enabling and/or disabling) for activating a switch control mode (e.g., or a switch interaction mode) (e.g., in fig. 19I, switch access control option 1942a for enabling and/or disabling the switch control mode). In some embodiments, when the switch control mode is active, the respective actions are performed in response to one or more inputs detected on a different (e.g., remote or separate) hardware device (e.g., an auxiliary device) (e.g., the hardware device optionally communicates wirelessly with the computer system), wherein the respective actions are optionally performed in other ways with one or more air gestures (e.g., when the switch control mode is inactive, or when the computer system is in a normal mode (e.g., wherein the barrier-free mode, particularly the barrier-free mode related to athletic functionality, is inactive)). In some implementations, the switched interaction mode of the computer system is enabled (e.g., automatically enabled) in response to detecting that communication is established between the computer system and the auxiliary input device (e.g., upon detecting that the auxiliary input device is connected to the computer system, the device automatically becomes responsive to input via the auxiliary input device). In the switch control mode, a target location in a three-dimensional environment is selected for interaction using ray and point scanning. 
Providing a menu of options for different interaction models with a computer system, where the menu may be used to enable use of a physical input device (such as a physical button, a microphone, a straw, and/or optionally another input device that does not require hand movement) so that a user may provide input using a non-gesture-based modality (e.g., the physical input device), reduces the number, extent, and/or duration of inputs required to perform an operation, which may be particularly beneficial to users with movement disorders, thereby making user interaction with a computer system more accessible to a wider population.

In some embodiments, the computer system detects an input selecting the control for activating the switch control mode and, in response to detecting that input, the computer system activates the switch control mode and displays a corresponding menu for configuring a wireless connection with a hardware input device (e.g., a wireless switch accessory) for providing input in the switch control mode. For example, in the scenario of fig. 19I, in response to selection of switch access control option 1942a, the switch control mode is activated and computer system 101 automatically displays a corresponding menu for configuring a wireless connection with an auxiliary input device (e.g., a hardware input device). In some implementations, the options in the menu for configuring the wireless connection with the hardware input device are verbally described (e.g., spoken aloud) by the computer system in response to user input (e.g., rotational input directed to the first input device) that browses the options. In response to a user enabling a physical input device, automatically providing a menu (e.g., optionally, a menu provided via speech) for setting up and/or connecting the physical input device (such as a physical button, a microphone, a straw, and/or optionally another input device that does not require hand movement), so that the user may provide input using a non-gesture-based modality (e.g., the physical input device), reduces the number, extent, and/or duration of inputs required to perform an operation, which may be particularly beneficial to users with movement impairments, thereby making user interaction with the computer system more accessible to a wider population.

In some embodiments, a menu comprising a plurality of selectable options for configuring one or more interaction models is displayed prior to performing (e.g., starting or completing) a calibration process of the user's hand (e.g., prior to the user's hand and/or gaze being calibrated for use as an input modality, as described herein with reference to fig. 19B). In some embodiments, the calibration process involves detecting movement of the user's hand using one or more cameras and/or sensors, and using the movement information to interpret further movement of the user's hand to determine what, if any, input (e.g., an air gesture) was performed. Before calibrating the computer system for a particular user's hand, providing a menu of options for different interaction models with the computer system (where the menu may be used to select an alternative to using the user's gaze to determine which user interface element(s) should have focus or be the target of the input) enables the user to more efficiently access the menu and reduces the number of inputs required to activate the menu, thereby making user interaction with the computer system more unobstructed for a wider population.

In some embodiments, the computer system detects an input selecting a first option of the plurality of selectable options that corresponds to a vision-unobstructed mode. In some embodiments, the computer system activates the vision-unobstructed mode in response to detecting an input selecting a first option corresponding to the vision-unobstructed mode. For example, the vision submenu 1902b is used to configure a speakable mode (fig. 19E). In some implementations, the vision-unobstructed mode corresponds to a "speakable" mode in which verbal descriptions of virtual objects (e.g., user interfaces, user interface elements in a user interface, and/or other virtual objects) are provided in response to user input (e.g., navigation input that moves a focus selector forward or backward sequentially across multiple user interface levels of a single application and/or across multiple applications). Providing a menu of options that can be used to activate a screen reader and/or other visually unobstructed modes can be particularly beneficial to users that are blind or have low vision or whose gaze is otherwise difficult to track, thereby making user interaction with the computer system more unobstructed to a wider population.

In some embodiments, the computer system detects an input selecting a second option of the plurality of selectable options that corresponds to a hearing-unobstructed mode. In some embodiments, the computer system activates the hearing-unobstructed mode in response to detecting the input selecting the second option corresponding to the hearing-unobstructed mode. For example, the auditory submenu 1906a is used to configure auxiliary features such as live subtitles and closed captioning (fig. 19M-19N). In some implementations, activating the hearing-unobstructed mode includes activating one or more of mono audio, live subtitles, and/or closed captioning. Providing a menu of options that can be used to activate a subtitle display and/or other hearing-unobstructed modes can be particularly beneficial to users who are deaf or hard of hearing, or who otherwise prefer to view text subtitles rather than only listen to audio, thereby making user interaction with the computer system more accessible to a wider population.

In some embodiments, the computer system detects an input selecting a third option of the plurality of selectable options corresponding to a display setting, and in response to detecting the input selecting the third option corresponding to the display setting, the computer system activates the display setting. For example, in the scenario of fig. 19D, vision submenu 1902a is a submenu for configuring the functionality of content (e.g., passthrough and/or virtual content) in enlarged view 7000'. In some implementations, the display setting corresponds to one or more color filters and/or a virtual magnifier (e.g., a virtual magnifier is used to automatically magnify virtual content and/or representations of physical objects within the bounds of the virtual magnifier as the viewpoint of the user changes). In some embodiments, the positioning of the virtual magnifier is controlled by user input (e.g., pinching and holding, followed by a movement input and/or another user input). In some embodiments, the positioning of the virtual magnifier is controlled by movement of a portion of the user's body (e.g., based on the orientation, direction, pose, and/or other characteristics of a corresponding portion of the user's body, such as the head, eyes, wrists, and/or other portions). Providing a menu of options that can be used to select different display settings (e.g., to cause the computer system to display different sets of colors and/or to apply color filters and/or to zoom in on displayed content) can be particularly beneficial to users who are blind or have low vision or whose gaze is otherwise difficult to track, thereby making user interaction with the computer system more accessible to a wider population.

It should be understood that the particular order in which the operations in fig. 20 are described is merely an example and is not intended to suggest that the order is the only order in which the operations may be performed. Those of ordinary skill in the art will recognize a variety of ways to reorder the operations described herein. Additionally, it should be noted that the details of other processes described herein with respect to other methods described herein (e.g., methods 13000, 14000, 15000, 16000, 17000, and 18000) are likewise applicable in a similar manner to method 20000 described above with respect to fig. 20. For example, the gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generating components, surfaces, representations of physical objects, virtual objects, audio output patterns, reference frames, viewpoints, physical environments, representations of physical environments, views of three-dimensional environments, immersion levels, visual effects, and/or animations described above with reference to method 20000 optionally have one or more of the features of gestures, gaze inputs, physical objects, user interface objects, controls, movements, criteria, three-dimensional environments, display generating components, surfaces, representations of physical objects, virtual objects, audio output patterns, reference frames, viewpoints, physical environments, representations of physical environments, views of three-dimensional environments, immersion levels, visual effects, and/or animations described with reference to other methods described herein (e.g., methods 13000, 14000, 15000, 16000, 17000, and 18000). For the sake of brevity, these details are not repeated here.

In some embodiments, aspects/operations of methods 13000, 14000, 15000, 16000, 17000, 18000, and 20000 can be interchanged, substituted, and/or added between the methods. For example, the method of displaying a main menu user interface in a three-dimensional environment as described with reference to method 13000 is optionally used to display the main menu user interface at varying immersion levels in method 17000, or is optionally used to control the display of a shared application and the display of a private application in method 15000. For the sake of brevity, these details are not repeated here.

The foregoing description, for purposes of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application to thereby enable others skilled in the art to best utilize the invention and various described embodiments with various modifications as are suited to the particular use contemplated.

As described above, one aspect of the present technology is to collect and use data from various sources to improve the XR experience of the user. The present disclosure contemplates that in some instances, such collected data may include personal information data that uniquely identifies or may be used to contact or locate a particular person. Such personal information data may include demographic data, location-based data, telephone numbers, email addresses, Twitter IDs, home addresses, data or records related to the user's health or fitness level (e.g., vital sign measurements, medication information, exercise information), date of birth, or any other identifying or personal information.

The present disclosure recognizes that the use of such personal information data in the present technology may be used to benefit users. For example, personal information data may be used to improve the XR experience of the user. In addition, the present disclosure contemplates other uses for personal information data that are beneficial to the user. For example, the health and fitness data may be used to provide insight into the general health of the user, or may be used as positive feedback to individuals who use the technology to pursue health goals.

The present disclosure contemplates that entities responsible for the collection, analysis, disclosure, delivery, storage, or other use of such personal information data will adhere to well-established privacy policies and/or privacy practices. In particular, such entities should implement and consistently use privacy policies and practices that are generally recognized as meeting or exceeding industry or government requirements for maintaining the privacy and security of personal information data. Such policies should be easily accessible to users and should be updated as the collection and/or use of data changes. Personal information from users should be collected for legitimate and reasonable uses of the entity and must not be shared or sold outside of those legitimate uses. Further, such collection/sharing should occur only after receiving the informed consent of the users. Additionally, such entities should consider taking any necessary steps to safeguard and secure access to such personal information data and to ensure that others with access to the personal information data adhere to their privacy policies and procedures. Moreover, such entities may subject themselves to evaluation by third parties to certify their adherence to widely accepted privacy policies and practices. In addition, policies and practices should be adapted to the particular types of personal information data being collected and/or accessed, and to applicable laws and standards, including jurisdiction-specific considerations. For example, in the United States, the collection of or access to certain health data may be governed by federal and/or state laws, such as the Health Insurance Portability and Accountability Act (HIPAA), while health data in other countries may be subject to other regulations and policies and should be handled accordingly. Thus, different privacy practices should be maintained for different personal data types in each country.

Despite the foregoing, the present disclosure also contemplates embodiments in which a user selectively blocks the use of, or access to, personal information data. That is, the present disclosure contemplates that hardware elements and/or software elements may be provided to prevent or block access to such personal information data. For example, with respect to an XR experience, the present technology may be configured to allow a user to "opt in" to or "opt out" of participation in the collection of personal information data during registration with a service or at any time thereafter. As another example, the user may choose not to provide data for service customization. As yet another example, the user may choose to limit the length of time that data is maintained or to prohibit development of the customized service altogether. In addition to providing the "opt in" and "opt out" options, the present disclosure contemplates providing notifications related to accessing or using personal information. For example, the user may be notified when the application is downloaded that their personal information data will be accessed, and then be reminded again just before the personal information data is accessed by the application.

Furthermore, it is intended that personal information data should be managed and processed in a manner that minimizes the risk of inadvertent or unauthorized access or use. Risk can be minimized by limiting the collection of data and by deleting data once it is no longer needed. Further, and when applicable, including in certain health-related applications, data de-identification may be used to protect the privacy of the user. De-identification may be facilitated by removing specific identifiers (e.g., date of birth), controlling the amount or specificity of stored data (e.g., collecting location data at a city level instead of at an address level), controlling how data is stored (e.g., aggregating data among users), and/or other methods, as appropriate.
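As a purely illustrative sketch of the de-identification measures listed above (removing specific identifiers, keeping location at a city level, and aggregating among users), the following Python fragment may help. The field names (`date_of_birth`, `email`, `street_address`, `city`) and both helper functions are assumptions made for this example and are not taken from the disclosure.

```python
from collections import Counter

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"date_of_birth", "email", "street_address"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record without direct identifiers.

    Dropping the street address while keeping the city leaves location
    data at a city level rather than an address level.
    """
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def aggregate_by_city(records: list) -> dict:
    """Aggregate across users: keep only a per-city count."""
    return dict(Counter(r["city"] for r in records))
```

In this sketch, `aggregate_by_city([deidentify(r) for r in records])` yields per-city counts only, so no individual record needs to be retained.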

Thus, while the present disclosure broadly covers the use of personal information data to implement one or more of the various disclosed embodiments, the present disclosure also contemplates that the various embodiments may be implemented without the need to access such personal information data. That is, the various embodiments of the present technology are not rendered inoperable by the lack of all or a portion of such personal information data. For example, an XR experience may be generated by inferring preferences based on non-personal information data or a bare minimum amount of personal information, such as content requested by a device associated with the user, other non-personal information available to the service, or publicly available information.

Claims

1. A method, comprising:

at a device comprising or in communication with one or more display generation components and one or more input devices:

while displaying an application user interface via the one or more display generation components, detecting a first input to an input device of the one or more input devices, the input device being disposed on a housing of the device that includes the one or more display generation components;

in response to detecting the first input to the input device disposed on the housing of the device:

replacing display of at least a portion of the application user interface by displaying a main menu user interface via the one or more display generation components;

while displaying the main menu user interface via the one or more display generation components, detecting a second input to the input device disposed on the housing of the device; and

in response to detecting the second input to the input device disposed on the housing of the device:

canceling the main menu user interface.
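The sequence recited in claim 1 can be pictured as a small state machine. The Python sketch below is only an illustration under stated assumptions: the names `Screen`, `Device`, and `press_housing_button` are hypothetical, and the rule that a press while no main menu is shown brings the main menu up again is borrowed from the redisplay behavior recited in a later dependent claim rather than from claim 1 itself.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Screen(Enum):
    APPLICATION = auto()  # an application user interface is displayed
    MAIN_MENU = auto()    # the main menu user interface is displayed
    NONE = auto()         # the main menu has been canceled


@dataclass
class Device:
    screen: Screen = Screen.APPLICATION

    def press_housing_button(self) -> Screen:
        """Handle one input to the input device disposed on the housing."""
        if self.screen is Screen.MAIN_MENU:
            # Second input: cancel the main menu user interface.
            self.screen = Screen.NONE
        else:
            # First input: the main menu replaces at least part of what
            # was displayed (assumption: same rule when nothing is shown).
            self.screen = Screen.MAIN_MENU
        return self.screen
```

Using the same physical input for both showing and canceling the menu keeps the handler a two-branch toggle, which is the design choice this sketch is meant to highlight.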

2. The method according to claim 1, wherein the device is a head-mounted device that includes the input device and the one or more display generation components, and the method comprises: generating a user interface that is visible to a user when the head-mounted device is positioned on the user's head so as to cover the user's eyes.

3. A method according to any one of claims 1 to 2, wherein the main menu user interface is presented substantially in a central portion of a field of view of a user of the device.

4. The method of any one of claims 1 to 3, wherein the input device is a hardware button or a solid-state button.

5. The method according to claim 4, further comprising: detecting a rotation input to the hardware button; and in response to detecting the rotation input, performing a second operation different from displaying or canceling the main menu user interface.

6. The method according to any one of claims 1 to 5, comprising: in response to detecting the first input to the input device, canceling the application user interface before displaying the main menu user interface or while displaying the main menu user interface.

7. The method according to claim 5, further comprising:

Prior to detecting the first input to the input device of the one or more input devices, generating and displaying a first user interface object associated with the application user interface; and

In response to detecting the first input to the input device: maintaining display of the first user interface object while canceling the application user interface.

8. The method according to claim 7, further comprising: before detecting the first input, generating and displaying the first user interface object associated with the application user interface by extracting the first user interface object from the application user interface based on a third input directed to the application user interface.

9. The method of any one of claims 7 to 8, further comprising: in response to detecting the second input, canceling both the first user interface object and the main menu user interface.

10. The method according to any one of claims 7 to 9, further comprising: while the main menu user interface and the first user interface object are displayed via the one or more display generation components, detecting a fourth input directed to a representation of a second application displayed in the main menu user interface; and in response to detecting the fourth input, displaying an application user interface of the second application while displaying the first user interface object.

11. The method according to claim 10, further comprising:

detecting a fifth input moving the first user interface object to the application user interface of the second application; and

in response to detecting the fifth input, performing an operation in the second application based on the first user interface object.

12. The method according to any one of claims 1 to 11, wherein canceling the main menu user interface comprises: replacing the display of the main menu user interface with a presentation of a pass-through portion of the physical environment of the device via the one or more display generation components.

13. The method according to any one of claims 1 to 12, wherein canceling the main menu user interface comprises: stopping displaying the virtual environment in which the main menu user interface is displayed.

14. The method of claim 13, further comprising: detecting a sixth input on a representation of a first virtual environment displayed in the main menu user interface; and in response to detecting the sixth input on the representation of the first virtual environment displayed in the main menu user interface: replacing any currently displayed virtual environment with the first virtual environment.

15. The method according to any one of claims 1 to 13, further comprising:

displaying, in the main menu user interface, representations of software applications executable on the device;

detecting a seventh input directed to a respective representation of a software application among the representations of software applications executable on the device displayed in the main menu user interface; and

In response to detecting the seventh input directed to the corresponding representation of the software application: displaying an application user interface of the software application.

16. The method according to any one of claims 1 to 13, the method further comprising: displaying a first representation of a first person and a second representation of a second person in the main menu user interface, the first representation and the second representation being used to initiate communication with the first person and the second person, respectively;

detecting an eighth input directed toward the first representation of the first person; and

in response to detecting the eighth input directed toward the first representation of the first person: displaying a communication user interface for initiating a communication session with the first person.

17. The method according to any one of claims 1 to 13, further comprising:

detecting a ninth input directed to a representation of a collection displayed in the main menu user interface; and

In response to detecting the ninth input directed to the representation of the collection:

displaying representations of one or more virtual three-dimensional environments or one or more augmented reality environments.

18. The method according to any one of claims 1 to 17, further comprising: while the main menu user interface is displayed, detecting a tenth input; and in response to detecting the tenth input: scrolling the main menu user interface based on the tenth input so that first content in at least a portion of the main menu user interface is replaced by second content.

19. The method according to any one of claims 1 to 18, further comprising:

While displaying the main menu user interface having the first section, detecting an eleventh input; and

In response to detecting the eleventh input: displaying a second section of the main menu user interface based on the eleventh input, the first section being different from the second section.

20. The method according to any one of claims 1 to 19, further comprising:

while displaying the first section of the main menu user interface, detecting a twelfth input to the input device disposed on the housing of the device, and in response to detecting the twelfth input to the input device disposed on the housing of the device: canceling the main menu user interface; and

Detecting a thirteenth input to the input device disposed on the housing of the device, and in response to detecting the thirteenth input to the input device disposed on the housing of the device: displaying the first section of the main menu user interface based on the thirteenth input.

21. The method according to claim 20, further comprising:

in accordance with a determination that a time difference between detecting the twelfth input and detecting the thirteenth input is within a time threshold, displaying the first section of the main menu user interface based on the thirteenth input; and

in accordance with a determination that the time difference exceeds the time threshold, resetting the display of the main menu user interface to a predetermined section.
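The time-threshold behavior of claims 20 and 21 might be sketched as follows. This is a hedged illustration only: the class `MainMenu`, the section names, and the 60-second value of `RESTORE_THRESHOLD_S` are assumptions chosen for the example, since the claims do not specify a threshold value or a particular predetermined section.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_SECTION = "apps"     # hypothetical predetermined section
RESTORE_THRESHOLD_S = 60.0   # hypothetical time threshold, in seconds


@dataclass
class MainMenu:
    section: str = DEFAULT_SECTION
    visible: bool = False
    dismissed_at: Optional[float] = None

    def dismiss(self, now: float) -> None:
        """Cancel the menu, remembering when and on which section."""
        self.visible = False
        self.dismissed_at = now

    def show(self, now: float) -> str:
        """Redisplay the menu and return the section that is shown."""
        if (self.dismissed_at is not None
                and now - self.dismissed_at > RESTORE_THRESHOLD_S):
            # Threshold exceeded: reset to the predetermined section.
            self.section = DEFAULT_SECTION
        # Otherwise the section shown before dismissal is restored.
        self.visible = True
        return self.section
```

Timestamps are passed in explicitly rather than read from a clock so that the two branches can be exercised deterministically.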

22. The method according to any one of claims 1 to 21, wherein:

Displaying the application user interface via the one or more display generation components includes: displaying a first application user interface of a media content playback application, and

The method comprises:

while playing media content using the media content playback application and displaying the first application user interface of the media content playback application, detecting the first input to the input device; and

in response to detecting the first input to the input device:

displaying the main menu user interface via the one or more display generation components, and replacing the display of the first application user interface of the media content playback application with a second application user interface of the media content playback application, wherein the second application user interface of the media content playback application is smaller in size than the first application user interface of the media content playback application.

23. The method of claim 22, wherein:

Replacing the display of the first application user interface of the media content playback application with the second application user interface of the media content playback application includes: displaying a media player; and

the second application user interface includes one or more of: a representation of the media content playing in the media content playback application; and playback controls for the media content playback application.

24. The method according to any one of claims 22 to 23, further comprising:

in response to detecting the second input to the input device while the main menu user interface is displayed, canceling the main menu user interface and continuing to display the second application user interface of the media content playback application.
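A minimal sketch of the media-playback variant in claims 22 to 24 is given below, assuming a single media application and a toggle-style button handler. The class `MediaSession` and the labels `"full"` and `"mini"` are hypothetical names for the first (larger) and second (smaller) application user interfaces.

```python
from dataclasses import dataclass


@dataclass
class MediaSession:
    player_ui: str = "full"          # "full" or "mini" (hypothetical labels)
    main_menu_visible: bool = False
    playing: bool = True

    def press_housing_button(self) -> None:
        """Handle one input to the input device disposed on the housing."""
        if not self.main_menu_visible:
            # First input: show the main menu and shrink the player.
            self.main_menu_visible = True
            self.player_ui = "mini"
        else:
            # Second input: cancel the main menu only; the smaller
            # player stays on screen and playback is untouched.
            self.main_menu_visible = False
```

Note that neither branch changes `playing`, reflecting that the claims describe a change of presentation rather than an interruption of playback.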

25. The method according to any one of claims 1 to 24, further comprising:

Detecting inputs to a first number of input devices disposed on the housing of the device within a first time period, and displaying an application management user interface in response to detecting inputs to the first number of input devices disposed on the housing of the device within the first time period.

26. The method according to any one of claims 1 to 25, further comprising:

while displaying a system user interface via the one or more display generation components, detecting a corresponding input to the input device disposed on the housing of the device, the corresponding input being the same type of input as the first input to the input device, and

In response to detecting the corresponding input to the input device disposed on the housing of the device while the system user interface is displayed:

replacing display of at least a portion of the system user interface by displaying the main menu user interface via the one or more display generation components.

27. The method according to any one of claims 1 to 26, further comprising:

After canceling the main menu user interface and while the main menu user interface is not displayed, detecting a fourteenth input to the input device disposed on the housing of the device;

In response to detecting the fourteenth input to the input device disposed on the housing of the device:

redisplaying the main menu user interface via the one or more display generation components.

28. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system communicating with one or more display generating components and one or more input devices, the one or more programs comprising instructions for executing the method according to any one of claims 1 to 27.

29. A computer system in communication with one or more display generation components and one or more input devices, the computer system comprising:

one or more processors; and

A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for executing the method according to any one of claims 1 to 27.

30. A computer system in communication with one or more display generation components and one or more input devices, the computer system comprising:

means for performing the method according to any one of claims 1 to 27.

31. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with one or more display generation components and one or more input devices, the one or more programs comprising instructions for:

In response to detecting the second input to the input device disposed on the housing of the device: canceling the main menu user interface.

32. A computer system in communication with one or more display generation components and one or more input devices, the computer system comprising:

one or more processors; and

A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs including instructions for:

canceling the main menu user interface.

33. A computer system in communication with one or more display generation components and one or more input devices, the computer system comprising:

means, enabled while an application user interface is displayed via the one or more display generation components, for detecting a first input to an input device of the one or more input devices, the input device being disposed on a housing of the device that includes the one or more display generation components;

means, enabled in response to detecting the first input to the input device disposed on the housing of the device, for replacing display of at least a portion of the application user interface by displaying a main menu user interface via the one or more display generation components;

means, enabled while the main menu user interface is displayed via the one or more display generation components, for detecting a second input to the input device disposed on the housing of the device; and

means, enabled in response to detecting the second input to the input device disposed on the housing of the device, for canceling the main menu user interface.

34. A method comprising:

At a computer system including or in communication with a display generation component and one or more input devices:

While displaying an application user interface via the display generation component, detecting a first input to an input device of the one or more input devices; and

In response to detecting the first input to the input device:

in accordance with a determination that the application user interface is in a first display mode, displaying the application user interface in a second display mode via the display generation component, wherein the first display mode includes an immersive mode in which only content of the application user interface is displayed, and the second display mode includes a non-immersive mode in which respective content of the application user interface and other content are displayed concurrently; and

in accordance with a determination that the application user interface is in the second display mode, replacing display of at least a portion of the application user interface by displaying a main menu user interface via the display generation component.
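The mode-dependent branching in claim 34 can be illustrated with a single pure function. The sketch assumes that the user interface state is fully captured by a display mode plus a flag for main menu visibility; `Mode` and `handle_button` are hypothetical names introduced only for this example.

```python
from enum import Enum, auto
from typing import Tuple


class Mode(Enum):
    IMMERSIVE = auto()       # only the application's content is displayed
    NON_IMMERSIVE = auto()   # application content alongside other content


def handle_button(mode: Mode, main_menu_shown: bool) -> Tuple[Mode, bool]:
    """Return the (mode, main_menu_shown) state after one input."""
    if mode is Mode.IMMERSIVE:
        # First display mode: leave immersion, do not show the menu yet.
        return Mode.NON_IMMERSIVE, main_menu_shown
    # Second display mode: the main menu replaces part of the app UI.
    return mode, True
```

Under these assumptions, reaching the main menu from an immersive application takes two inputs: the first exits immersion and the second brings up the menu.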

35. The method of claim 34, further comprising:

While displaying the main menu user interface via the display generation component, detecting a second input to the input device; and

in response to detecting the second input to the input device, canceling the main menu user interface.

36. The method according to claim 34 or 35, wherein displaying the application user interface in the non-immersive mode includes simultaneously displaying a virtual environment and the application user interface, and the method further comprises: in response to detecting the first input to the input device while the application user interface is displayed in the non-immersive mode, continuing to display at least a portion of the virtual environment.

37. The method of any one of claims 35 or 36, further comprising continuing to display at least the portion of the virtual environment while displaying the main menu user interface.

38. The method according to any one of claims 36 or 37, further comprising:

displaying representations of two or more virtual environments in the main menu user interface; and

In response to detecting a selection of a first virtual environment of the two or more virtual environments: replacing at least a corresponding portion of the virtual environment with the first virtual environment.

39. The method according to any one of claims 34 to 38, further comprising:

displaying in the main menu user interface representations of software applications executable on the computer system;

detecting a third input directed to a corresponding representation of a software application among the representations of software applications executable on the computer system displayed in the main menu user interface; and

In response to detecting the third input directed to the corresponding representation of the software application: displaying an application user interface of the software application.

40. The method according to any one of claims 34 to 38, further comprising:

displaying, in the main menu user interface, a first representation of a first person and a second representation of a second person, the first representation and the second representation being used to initiate communication with the first person and the second person, respectively;

detecting a fourth input directed toward the first representation of the first person; and

in response to detecting the fourth input directed toward the first representation of the first person: displaying a communication user interface for initiating a communication session with the first person.

41. The method according to any one of claims 34 to 38, further comprising:

displaying representations of one or more virtual three-dimensional environments or one or more extended reality environments in the main menu user interface;

detecting a fifth input directed toward a corresponding one of the representations of the one or more virtual three-dimensional environments or the one or more extended reality environments; and

In response to detecting the fifth input directed toward the respective one of the representations of one or more virtual three-dimensional environments or one or more extended reality environments:

replacing any currently displayed virtual environment with the virtual three-dimensional environment or the extended reality environment associated with the respective representation.

42. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, the one or more programs comprising instructions for executing a method according to any one of claims 34 to 41.

43. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

one or more processors; and

A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for executing the method according to any one of claims 34 to 41.

44. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

Means for carrying out the method according to any one of claims 34 to 41.

45. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, the one or more programs comprising instructions for:

In response to detecting the first input to the input device:

46. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

one or more processors; and

In response to detecting the first input to the input device:

47. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

means for detecting a first input to an input device of the one or more input devices enabled when an application user interface is displayed via the display generation component; and

In response to detecting the first input to the input device, means for:

48. A method comprising:

while displaying an application user interface of an application via the display generation component, detecting a first input to an input device of the one or more input devices;

In response to detecting the first input to the input device:

displaying a main menu user interface via the display generation component;

in accordance with a determination that the application is currently being shared in a content sharing session, in which content of the application is simultaneously visible to a plurality of participants in the content sharing session, maintaining display of at least a portion of the application user interface while displaying the main menu user interface; and

in accordance with a determination that the application is not being shared in the content sharing session, ceasing to display the application user interface.
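The sharing-dependent branch of claim 48 could be sketched as below. Claim 48 speaks of one application; applying the same rule across a list of windows is an assumption of this example, as are the names `AppWindow` and `show_main_menu`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppWindow:
    name: str
    shared: bool          # currently shared in a content sharing session?
    visible: bool = True


def show_main_menu(windows: List[AppWindow]) -> List[str]:
    """Apply the rule on showing the main menu; return visible window names."""
    for w in windows:
        # Shared windows stay displayed; unshared windows cease to be.
        w.visible = w.shared
    return [w.name for w in windows if w.visible]
```

Keeping shared windows on screen means that content other participants can still see does not vanish from the local user's view merely because the menu was opened.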

49. The method of claim 48, further comprising: sharing the application currently in the content sharing session with the plurality of participants in a real-time communication session.

50. The method of claim 49, wherein the application user interface of the application currently being shared in the content sharing session or elements or corresponding portions of the application user interface of the application currently being shared in the content sharing session have a shared spatial relationship, and wherein one or more user interface objects visible to the multiple participants in the content sharing session have a consistent spatial relationship from different viewpoints of the multiple participants in the content sharing session.

51. The method of claim 50, wherein the shared spatial relationship is such that:

a spatial relationship between a first user interface object that represents the corresponding content to a first participant and a viewpoint of the first participant from the perspective of the first participant is consistent with a spatial relationship between a second user interface object that represents the corresponding content to a second participant and a representation of the first participant from the perspective of the second participant; and

The spatial relationship between the second user interface object that represents the corresponding content to the second participant and the viewpoint of the second participant from the perspective of the second participant is consistent with the spatial relationship between the first user interface object that represents the corresponding content to the first participant and the representation of the second participant from the perspective of the first participant.

52. The method of claim 51, further comprising:

detecting input by the first of the plurality of participants to move the application user interface of the application currently being shared in the content sharing session; and

In response to detecting the input of moving the application user interface by the first participant, moving the application user interface of the application currently shared in the content sharing session or the element or the corresponding portion of the application user interface of the application currently shared in the content sharing session for both the first participant and the second participant among the multiple participants.

53. The method of any one of claims 48 to 52, further comprising: displaying the main menu user interface in front of the application user interface of the application.

54. The method according to any one of claims 48 to 53, the method further comprising: simultaneously displaying application user interfaces of two or more applications.

55. The method according to claim 54, further comprising: in response to the first input: ceasing to display a respective application user interface of the two or more applications while continuing to display another application user interface of the two or more applications.

56. The method of claim 54, further comprising: in response to the first input: ceasing to display a first plurality of applications among the two or more applications while continuing to display at least one application among the two or more applications.

57. The method of claim 54, further comprising: in response to the first input: maintaining display of a second plurality of applications among the two or more applications while ceasing to display at least one application among the two or more applications.

58. The method according to any one of claims 48 to 57, further comprising:

detecting a second input while both the main menu user interface and at least the portion of the application user interface of the application currently being shared in the content sharing session are displayed; and

in response to detecting the second input:

ceasing to display the main menu user interface; and

maintaining display of the portion of the application user interface of the application currently being shared in the content sharing session while the main menu user interface is not displayed.

59. The method of claim 58, further comprising: simultaneously displaying, via the display generation component, the application currently being shared in the content sharing session and a pass-through portion of the physical environment of the computer system.

60. The method according to any one of claims 48 to 57, further comprising:

while the main menu user interface is displayed, detecting movement of the application user interface by the second participant of the plurality of participants; and

in response to detecting the movement of the application user interface by the second participant:

moving the application user interface, based on the movement, for the plurality of participants including the first participant and the second participant.

61. The method of any one of claims 48 to 60, wherein the first input to the input device comprises a press input on a hardware button or solid-state button.

62. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, the one or more programs comprising instructions for executing a method according to any one of claims 48 to 61.

63. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

one or more processors; and

A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for executing the method according to any one of claims 48 to 61.

64. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

Means for carrying out the method according to any one of claims 48 to 61.

65. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, the one or more programs comprising instructions for:

In response to detecting the first input to the input device:

displaying a main menu user interface via the display generation component;

Based on determining that the application is currently being shared in a content sharing session, wherein content of the application is simultaneously visible to a plurality of participants in the content sharing session, maintaining display of at least a portion of the application user interface while displaying the main menu user interface; and

66. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

one or more processors; and

In response to detecting the first input to the input device:

displaying a main menu user interface via the display generation component;

67. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

means, enabled while an application user interface of an application is displayed via the display generation component, for detecting a first input to an input device of the one or more input devices; and

means, enabled in response to detecting the first input to the input device, for:

displaying a main menu user interface via the display generation component;

68. A method comprising:

while the computer system is in operation, detecting, via an input device of the one or more input devices, a first input of a first type of input, wherein the first type of input is determined based on a position and/or movement of a first biometric feature of a user;

in response to detecting the first input via the input device, performing a first operation based on the first input, wherein the first operation is determined at least in part by first input registration information from a previous input registration process for the first type of input;

after performing the first operation according to the first input, detecting a second input of a second type of input via an input device of the one or more input devices; and

in response to detecting the second input, initiating an input registration process for the first type of input.

69. A method according to claim 68, wherein the first type of input includes the user's gaze, the first biometric feature includes the position and/or movement of the user's eyes, and the input device via which the first input of the first type of input is detected includes a camera.

70. A method according to claim 68, wherein the first type of input includes movement of the user's hand, the first biometric feature includes the position and/or movement of one or more parts of the user's hand, and the input device via which the first input of the first type of input is detected includes a camera.

71. A method according to any one of claims 68 to 70, wherein initiating the input registration process for the first type of input includes: presenting instructions for input registration for the first type of input to the user, and collecting second input registration information for the first type of input based on user actions performed according to the presented instructions.

72. The method of claim 71, further comprising:

detecting, via the input device of the one or more input devices, a third input of the first type of input;

in response to detecting the third input via the input device, performing a second operation based on the third input, wherein the second operation is determined at least in part by the second input registration information for the first type of input.
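
The registration flow recited in claims 68, 71 and 72 can be illustrated with a minimal Python sketch. All names here (`InputSystem`, the `offset` field, the method names) are hypothetical and chosen only for illustration; the claims do not specify how registration information is stored or applied. The sketch assumes that registration information is a single calibration offset applied to each biometric sample.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InputSystem:
    """Toy model of claims 68, 71 and 72; every name is an assumption."""

    # Registration (calibration) data per biometric input type, e.g. "gaze".
    registrations: Dict[str, dict] = field(default_factory=dict)
    # Input type whose registration process is currently running, if any.
    enrolling: Optional[str] = None

    def handle_biometric_input(self, input_type: str, sample: float) -> float:
        """Perform an operation whose result depends on stored registration data."""
        offset = self.registrations[input_type]["offset"]
        return sample - offset  # the "operation": a calibrated reading

    def handle_button_input(self, input_type: str) -> None:
        """A second type of input (e.g. a button press) starts registration."""
        self.enrolling = input_type

    def complete_registration(self, measured_offset: float) -> None:
        """Store the second registration information collected from the user."""
        if self.enrolling is None:
            raise RuntimeError("no registration process is in progress")
        self.registrations[self.enrolling] = {"offset": measured_offset}
        self.enrolling = None
```

In this sketch, an input of the first type detected after `complete_registration` is interpreted with the new offset rather than the earlier one, which mirrors the relationship between the first and second input registration information in claim 72.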

73. A method according to any one of claims 68 to 72, wherein the input device comprises a button.

74. The method of claim 73, wherein the button is further configured to turn the computer system on or off, and the method comprises:

detecting a fourth input on the button when the computer system is not in operation; and

In response to detecting the fourth input on the button: turning on the computer system.

75. The method according to claim 73 or 74, wherein the method comprises:

while the computer system is in a sleep mode, detecting a fifth input on the button; and

in response to detecting the fifth input on the button, waking the computer system from the sleep mode.

76. A method according to any one of claims 73 to 75, wherein the method comprises:

When the computer system is in operation, detecting a sixth input on the button; and

in response to detecting the sixth input on the button:

capturing media rendered visible via the display generation component.

77. The method of any one of claims 73 to 76, wherein the method comprises:

detecting a seventh input on the button in conjunction with detecting an eighth input on a second input device; and

in response to detecting the seventh input on the button in conjunction with the eighth input on the second input device, performing one or more system operations.

78. The method of claim 77, wherein the one or more system operations are selected from the group consisting of: taking a screenshot, restarting the computer system, and resetting the computer system.

79. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, the one or more programs comprising instructions for executing a method according to any one of claims 68 to 78.

80. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

one or more processors; and

A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for executing the method according to any one of claims 68 to 78.

81. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

Means for carrying out the method according to any one of claims 68 to 78.

82. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, the one or more programs comprising instructions for:

83. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

one or more processors; and

84. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

means, enabled while the computer system is in operation, for detecting a first input of a first type of input via an input device of the one or more input devices, wherein the first type of input is determined based on a position and/or movement of a first biometric feature of a user;

means, enabled in response to detecting the first input via the input device, for performing a first operation based on the first input, wherein the first operation is determined at least in part by first input registration information from a previous input registration process for the first type of input;

means, enabled after performing the first operation based on the first input, for detecting a second input of a second type of input via an input device of the one or more input devices; and

means, enabled in response to detecting the second input, for initiating an input registration process for the first type of input.

85. A method comprising:

detecting a first input on a rotatable input mechanism of an input device among the one or more input devices;

in response to detecting the first input on the rotatable input mechanism:

in accordance with a determination that the first input is an input of a first type:

changing an immersion level associated with a display of an extended reality (XR) environment generated by the display generation component to a first immersion level in which the display of the XR environment includes both virtual content from an application and a pass-through portion of the physical environment of the computer system; and

in accordance with a determination that the first input is an input of a second type:

performing an operation other than changing the immersion level associated with display of the XR environment.

86. The method of claim 85, further comprising, in response to a second input of the first type of input, changing the immersion level associated with the display of the XR environment generated by the display generation component to a second immersion level, in which the display of the XR environment includes virtual content that is different from, or displayed at a different level of fidelity than, the virtual content displayed when the first immersion level is associated with the display of the XR environment.

87. The method of any one of claims 85 to 86, wherein the second type of input comprises a press input, the method further comprising:

detecting a third input provided to the rotatable input mechanism; and

in response to detecting the third input on the rotatable input mechanism as a press input, performing a corresponding operation selected from the group consisting of: dismissing an active application; dismissing a virtual object displayed via the display generation component; displaying an application manager user interface; enabling an accessibility mode; and redisplaying a plurality of previously displayed user interface elements in the XR environment.

88. The method of any one of claims 85 to 87, wherein changing the immersion level associated with display of the XR environment is based on detecting a rotational input to the rotatable input mechanism.

89. The method of claim 88, wherein changing the immersion level associated with display of the XR environment based on detecting the rotational input comprises:

increasing the immersion level based on determining that the first input is a rotational input in a first direction; and

decreasing the immersion level based on determining that the first input is a rotational input in a second direction different from the first direction.
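
The direction-dependent behavior of claims 88 and 89 can be sketched as follows. This is only an illustration under stated assumptions: the integer 0 to 100 immersion scale, the step size, and the function name `apply_rotation` are hypothetical and do not appear in the claims.

```python
MIN_IMMERSION = 0    # assumed: physical environment pass-through only
MAX_IMMERSION = 100  # assumed: fully virtual
STEP_PER_DETENT = 5  # assumed change in immersion per rotation detent


def apply_rotation(level: int, detents: int) -> int:
    """Return the immersion level after a rotational input.

    A positive `detents` value stands for rotation in the first direction
    (increase); a negative value stands for rotation in the second
    direction (decrease). The result is clamped to the allowed range.
    """
    new_level = level + detents * STEP_PER_DETENT
    return max(MIN_IMMERSION, min(MAX_IMMERSION, new_level))
```

Clamping is a design choice of the sketch, so that continued rotation past either end of the range has no further effect.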

90. The method of any one of claims 85 to 88, wherein the first type of input comprises a rotational input of the rotatable input mechanism, and the second type of input comprises a pressing input of the rotatable input mechanism.

91. The method of claim 90, the method comprising: in response to detecting the first input:

performing a first operation based on determining that the first input is the second type of input and includes a first number of press inputs, and

performing a second operation different from the first operation based on determining that the first input is the second type of input and includes a second number of press inputs different from the first number.

92. The method of claim 91, comprising:

detecting the first number of press inputs directed to the rotatable input mechanism; and

in response to detecting the first number of press inputs directed to the rotatable input mechanism, dismissing the active application by causing the active application to run in the background and/or displaying a main menu user interface via the display generation component.

93. The method of claim 92, comprising:

detecting the second number of press inputs directed to the rotatable input mechanism; and

in response to detecting the second number of press inputs directed to the rotatable input mechanism, displaying an application manager user interface.

94. The method according to any one of claims 92 to 93, comprising:

detecting a third number of press inputs directed to the rotatable input mechanism; and

in response to detecting the third number of press inputs directed to the rotatable input mechanism, performing or enabling an accessibility mode operation.

95. The method of any one of claims 92 to 94, comprising:

detecting a fourth number of press inputs directed to the rotatable input mechanism; and

in response to detecting the fourth number of press inputs directed to the rotatable input mechanism, dismissing a virtual object by displaying a corresponding pass-through portion of the physical environment of the computer system.
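
The press-count dispatch of claims 91 to 95 can be sketched as a lookup table. The concrete counts (1 to 4) are an assumption made for illustration; the claims only require that the numbers of press inputs differ from one another. `Operation` and `operation_for_presses` are hypothetical names.

```python
from enum import Enum, auto
from typing import Optional


class Operation(Enum):
    SHOW_MAIN_MENU = auto()          # claim 92: dismiss active application / show main menu
    SHOW_APP_MANAGER = auto()        # claim 93: display application manager user interface
    ACCESSIBILITY_MODE = auto()      # claim 94: perform or enable accessibility mode operation
    DISMISS_VIRTUAL_OBJECT = auto()  # claim 95: reveal pass-through in place of a virtual object


# Assumed press counts; the claims do not state the actual numbers.
PRESS_COUNT_TO_OPERATION = {
    1: Operation.SHOW_MAIN_MENU,
    2: Operation.SHOW_APP_MANAGER,
    3: Operation.ACCESSIBILITY_MODE,
    4: Operation.DISMISS_VIRTUAL_OBJECT,
}


def operation_for_presses(press_count: int) -> Optional[Operation]:
    """Return the operation mapped to a press sequence, or None if unmapped."""
    return PRESS_COUNT_TO_OPERATION.get(press_count)
```

Keeping the mapping in a table rather than in branching code makes it easy to see that each press count selects exactly one operation.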

96. The method of claim 85, the method comprising: in response to detecting the first input:

performing a first operation based on determining that the first input is an input of the second type and has a duration that satisfies a first criterion, and

performing a second operation different from the first operation based on determining that the first input is an input of the second type and has a duration that satisfies a second criterion different from the first criterion.

97. The method of any one of claims 85 to 91 or 93 to 96, the method comprising: displaying a main menu user interface in the XR environment based on the determination that the first input is the second type of input.

98. The method of any one of claims 85 to 97, wherein the method comprises:

detecting a fourth input of the second type of input in conjunction with detecting a fifth input on a second input device; and

in response to detecting the fourth input of the second type of input in conjunction with the fifth input on the second input device, performing one or more third operations.

99. The method of claim 98, wherein a corresponding third operation of the one or more third operations is selected from the group consisting of: taking a screenshot, powering off the computer system, restarting the computer system, and entering a hardware reset mode of the computer system.

100. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that communicates with a display generation component and one or more input devices, the one or more programs comprising instructions for executing a method according to any one of claims 85 to 99.

101. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

one or more processors; and

A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for executing the method according to any one of claims 85 to 99.

102. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

Means for performing the method according to any one of claims 85 to 99.

103. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with a display generation component and one or more input devices, the one or more programs comprising instructions for:

in response to detecting the first input on the rotatable input mechanism:

in accordance with a determination that the first input is an input of a first type:

changing an immersion level associated with a display of an extended reality (XR) environment generated by the display generation component to a first immersion level in which the display of the XR environment includes both virtual content from an application and a pass-through portion of a physical environment of the computer system; and

in accordance with a determination that the first input is an input of a second type:

104. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

one or more processors; and

in response to detecting the first input on the rotatable input mechanism:

in accordance with a determination that the first input is an input of a first type:

in accordance with a determination that the first input is an input of a second type:

105. A computer system in communication with a display generation component and one or more input devices, the computer system comprising:

means for detecting a first input on a rotatable input mechanism of an input device of the one or more input devices;

means, enabled in response to detecting the first input on the rotatable input mechanism, for:

in accordance with a determination that the first input is an input of a first type:

in accordance with a determination that the first input is an input of a second type:

106. A method comprising:

at a wearable device comprising or in communication with a display generation component and one or more input devices:

while a corresponding session is active in a corresponding application and while the wearable device is being worn, detecting a first signal indicating that the wearable device has been taken off;

In response to detecting the first signal:

causing the corresponding session of the corresponding application to become inactive; and

detecting, when the corresponding application is inactive, a second signal indicating that the wearable device is being worn;

In response to detecting the second signal:

based on a determination that corresponding criteria are met:

resuming the corresponding session of the corresponding application; and

based on a determination that the corresponding criteria are not met:

forgoing resuming the corresponding session of the corresponding application, wherein the corresponding criteria include a criterion that is satisfied when a current user of the wearable device is determined to be an authorized user of the wearable device.

107. The method of claim 106, wherein the corresponding criteria comprise a criterion that a type of the corresponding session satisfies predefined criteria relative to a set of predefined session types, and the method comprises:

based on determining that the corresponding criteria are satisfied because the corresponding session of the corresponding application is a session of a first type: resuming the corresponding session of the corresponding application; and

based on determining that the corresponding criteria are not satisfied because the corresponding session of the corresponding application is a session of a second type: forgoing resuming the corresponding session of the corresponding application.

108. The method of claim 107, wherein the respective criteria are satisfied when the respective session of the respective application is configured to deliver media content to the authorized user of the wearable device; and

The corresponding criterion is met when the corresponding session of the corresponding application is configured to allow real-time audio data or real-time video data of the participants to be generated by the participants of the corresponding session and the corresponding session is configured to provide information about the positioning of the participants in the three-dimensional environment.

109. The method of claim 108, wherein the corresponding criteria are not satisfied when the corresponding application includes a record of content generated during the corresponding session, such that resuming the corresponding session of the corresponding application is forgone.

110. The method of any one of claims 106 to 109, wherein:

the corresponding criteria are satisfied when the time between detecting the first signal and detecting the second signal is less than a predetermined threshold, in which case the corresponding session of the corresponding application is resumed; and

the corresponding criteria are not satisfied when the time between detecting the first signal and detecting the second signal is equal to or greater than the predetermined threshold, in which case resuming the corresponding session of the corresponding application is forgone.
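
The resume decision described in claims 106, 107 and 110 can be sketched as below. The sketch assumes that the criteria are combined with a logical AND, which the claims do not state; the session type names, the threshold value, and all identifiers are likewise hypothetical.

```python
from dataclasses import dataclass

# Assumed set of session types that may be resumed (claim 107 leaves this open).
RESUMABLE_SESSION_TYPES = {"media_playback"}
# Assumed threshold for claim 110, in seconds.
MAX_OFF_HEAD_SECONDS = 300.0


@dataclass
class Session:
    kind: str
    active: bool = True


def on_device_removed(session: Session) -> None:
    """First signal: the device was taken off, so the session becomes inactive."""
    session.active = False


def on_device_worn(session: Session, user_is_authorized: bool,
                   seconds_off_head: float) -> bool:
    """Second signal: resume only if every (assumed) criterion is met.

    Returns True if the session is active after the call.
    """
    criteria_met = (
        user_is_authorized                           # claim 106
        and session.kind in RESUMABLE_SESSION_TYPES  # claim 107
        and seconds_off_head < MAX_OFF_HEAD_SECONDS  # claim 110
    )
    if criteria_met:
        session.active = True
    return session.active
```

When any criterion fails, the function leaves the session inactive, which corresponds to forgoing resumption.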

111. The method of any one of claims 106, 107, 109 or 110, wherein:

Causing the corresponding session of the corresponding application to become inactive includes pausing playback of media content from the corresponding session of the corresponding application.

112. The method of any one of claims 106 to 111, wherein:

Causing the respective session of the respective application to become inactive includes at least one of: muting audio data associated with the respective session of the respective application; or pausing video recording of content generated in the respective session of the respective application.

113. The method of any one of claims 106 to 112, wherein:

Causing the respective session of the respective application to become inactive includes pausing mirroring of output from the display generation component of the wearable device on a different device.

114. The method of claim 113, further comprising:

in conjunction with pausing the mirroring of the output from the display generation component of the wearable device on the different device, displaying, via the display generation component, an indication that the mirroring of the output from the display generation component is paused.

115. The method according to any one of claims 106 to 114, further comprising:

after the first signal has been detected, monitoring a context of the wearable device using one or more sensors included in or in communication with the wearable device.

116. The method of claim 115, further comprising:

detecting, using the one or more sensors, characteristics of a physical environment of the wearable device to monitor the context of the wearable device.

117. The method of claim 115, further comprising:

detecting, using the one or more sensors, biometric features to monitor the context of the wearable device.

118. The method according to any one of claims 115 to 117, further comprising:

based on determining that a threshold amount of time has elapsed since the first signal was detected without the second signal being detected:

transitioning the wearable device to a sleep state of operation in which the wearable device reduces the frequency of using the one or more sensors to monitor the context of the wearable device.

119. The method of claim 118, further comprising:

while the wearable device is in the sleep state, detecting an upward displacement of at least a portion of the wearable device; and

in response to detecting the upward displacement of at least the portion of the wearable device, transitioning the wearable device from the sleep state to a standby state of operation.

120. The method of claim 118, further comprising:

while the wearable device is in the sleep state, detecting a first input to the one or more input devices; and

in response to detecting the first input, transitioning the wearable device from the sleep state to a standby state of operation.
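
The off-head state handling of claims 118 to 120 can be sketched as a small state machine. The state name `OFF_HEAD_MONITORING`, the event strings, and the timeout value are assumptions made for illustration; the claims name only a sleep state and a standby state.

```python
from enum import Enum, auto


class PowerState(Enum):
    OFF_HEAD_MONITORING = auto()  # hypothetical: removed, sensors polled at the normal rate
    SLEEP = auto()                # sensors polled less often (claim 118)
    STANDBY = auto()              # reached from sleep (claims 119 and 120)


SLEEP_AFTER_SECONDS = 120.0  # assumed threshold amount of time


def next_state(state: PowerState, event: str,
               seconds_since_removed: float = 0.0) -> PowerState:
    """Return the next power state for an event: 'tick', 'lifted' or 'input'."""
    if state is PowerState.OFF_HEAD_MONITORING and event == "tick":
        # No "worn" signal yet; enter sleep once the threshold has elapsed.
        if seconds_since_removed >= SLEEP_AFTER_SECONDS:
            return PowerState.SLEEP
    elif state is PowerState.SLEEP and event in ("lifted", "input"):
        # Upward displacement or an input wakes the device to standby.
        return PowerState.STANDBY
    return state
```

Events that are not relevant to the current state leave the state unchanged, so a lift detected before the device has entered sleep has no effect in this sketch.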

121. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a wearable device that communicates with a display generation component and one or more input devices, the one or more programs comprising instructions for executing a method according to any one of claims 106 to 120.

122. A wearable device in communication with a display generation component and one or more input devices, the wearable device comprising:

one or more processors; and

A memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for executing the method according to any one of claims 106 to 120.

123. A wearable device in communication with a display generation component and one or more input devices, the wearable device comprising:

Means for performing the method according to any one of claims 106 to 120.

124. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a wearable device in communication with a display generation component and one or more input devices, the one or more programs comprising instructions for:

In response to detecting the first signal:

In response to detecting the second signal:

based on a determination that corresponding criteria are met:

Resuming the corresponding session of the corresponding application;

Based on the determination that the corresponding criteria are not met:

125. A wearable device in communication with a display generation component and one or more input devices, the wearable device comprising:

one or more processors; and

In response to detecting the first signal:

In response to detecting the second signal:

based on a determination that corresponding criteria are met:

Resuming the corresponding session of the corresponding application;

Based on the determination that the corresponding criteria are not met:

126. A wearable device in communication with a display generation component and one or more input devices, the wearable device comprising:

means, enabled while a corresponding session is active in a corresponding application and while the wearable device is being worn, for detecting a first signal indicating that the wearable device has been taken off;

means, enabled in response to detecting the first signal, for causing the corresponding session of the corresponding application to become inactive;

means, enabled while the corresponding application is inactive, for detecting a second signal indicating that the wearable device is being worn; and

means, enabled in response to detecting the second signal, for:

based on a determination that corresponding criteria are met:

resuming the corresponding session of the corresponding application; and

Based on the determination that the corresponding criteria are not met:

127. A method comprising:

At a computer system in communication with one or more display generation components and one or more input devices:

while a configuration of the computer system is being performed, detecting a first input directed to a first input device of the one or more input devices, wherein the computer system includes one or more sensors to detect input, the input including one or more of an air gesture and a gaze input; and

in response to detecting the first input to the first input device, displaying a menu including a plurality of selectable options for configuring one or more interaction models.

128. The method of claim 127, wherein the first input device is a hardware input device that is a hardware button.

129. The method of claim 127, wherein the first input device is a hardware input device comprising a rotatable input mechanism.

130. A method according to any one of claims 127 to 129, wherein the one or more input devices comprises a second input device different from the first input device, and the method comprises:

detecting a second input to the second input device;

in response to detecting the second input to the second input device, activating a first accessibility mode in which a verbal description of a virtual object is provided in response to user input.

131. A method according to any one of claims 127 to 130, wherein the first input comprises two or more presses on the first input device.

132. The method of any one of claims 127 to 131, comprising:

detecting a third input directed to a first hardware input device of the one or more input devices; and

in response to detecting the third input directed to the first hardware input device, positioning input focus on a first selectable option of the plurality of selectable options.

133. The method of claim 132, comprising:

detecting a fourth input directed to a second hardware input device of the one or more input devices; and

in response to detecting the fourth input directed to the second hardware input device, selecting the first selectable option of the plurality of selectable options.

134. The method of any one of claims 127 to 131, comprising:

detecting a third input directed to a corresponding hardware input device of the one or more input devices; and

in response to detecting the third input directed to the corresponding hardware input device:

based on determining that the third input satisfies first input criteria, positioning input focus on a first selectable option of the plurality of selectable options; and

based on determining that the third input satisfies second input criteria, selecting a second selectable option of the plurality of selectable options.

135. The method of claim 134, wherein positioning the input focus on the first of the plurality of selectable options is performed in response to detecting a rotational input on the hardware input device.

136. A method according to any one of claims 134 to 135, wherein selecting the second of the plurality of selectable options is performed in response to detecting a press input on the hardware input device.

137. The method of any one of claims 132 to 136, comprising:

in conjunction with positioning the input focus on the first selectable option of the plurality of selectable options, outputting an audio description of the first selectable option.
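
The focus and selection behavior of claims 134 to 137 can be sketched with a single rotatable, pressable input. This is an illustration under stated assumptions: the class `ConfigurationMenu`, the wrap-around focus movement, and the `speak` callback standing in for audio output are all hypothetical, and the sketch assumes that a press selects whichever option currently has input focus.

```python
from typing import Callable, List


class ConfigurationMenu:
    """Toy menu driven by one rotatable, pressable hardware input (hypothetical)."""

    def __init__(self, options: List[str], speak: Callable[[str], None]) -> None:
        if not options:
            raise ValueError("menu needs at least one option")
        self.options = options
        self.speak = speak  # stands in for outputting an audio description
        self.focus = 0

    def rotate(self, steps: int) -> str:
        """Move input focus by `steps` (wrapping) and describe the focused option."""
        self.focus = (self.focus + steps) % len(self.options)
        focused = self.options[self.focus]
        self.speak(focused)
        return focused

    def press(self) -> str:
        """Select the option that currently has input focus."""
        return self.options[self.focus]
```

Passing the audio output in as a callback keeps the sketch independent of any particular speech API.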

138. The method of any one of claims 127 to 137, comprising:

while the configuration of the computer system is being performed:

displaying a control for activating a dwell control mode;

detecting a gaze input directed toward the control for activating the dwell control mode; and

in response to detecting the gaze input directed toward the control for activating the dwell control mode, automatically activating the dwell control mode.

139. The method of any one of claims 127 to 138, comprising:

after the configuration of the computer system is complete, detecting a subsequent input directed to the first input device; and

in response to detecting the subsequent input to the first input device after the configuration of the computer system is complete, forgoing displaying the menu including the plurality of selectable options for configuring the one or more interaction models.

140. The method of claim 139, comprising:

in response to detecting the subsequent input to the first input device after the configuration of the computer system is complete, performing an operation other than displaying the menu including the plurality of selectable options for configuring the one or more interaction models.

141. The method of any one of claims 139 to 140, comprising:

after the configuration of the computer system is complete, detecting a press input to the first input device; and

in response to detecting the press input to the first input device, activating a corresponding accessibility function.

142. The method of any one of claims 127 to 131, comprising:

detecting a fifth input on the first input device; and

in response to detecting the fifth input on the first input device:

based on determining that the fifth input is detected before the configuration of the computer system is complete, positioning input focus on a corresponding selectable option of the plurality of selectable options; and

based on determining that the fifth input is detected after the configuration of the computer system is complete, performing an operation different from positioning the input focus on the corresponding selectable option.
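
For illustration only, and not part of the claims: the configuration-dependent branching of claim 142 can be sketched as below. `SetupState`, `on_first_device_input`, and the returned labels are hypothetical names chosen for this sketch, and the specific "different operation" is left abstract because the claim does not name it.

```python
from dataclasses import dataclass


@dataclass
class SetupState:
    configuration_complete: bool = False
    option_count: int = 3
    focused: int = -1  # -1 means no option holds input focus yet


def on_first_device_input(state: SetupState) -> str:
    """Handle one input on the first input device; return an illustrative label."""
    if not state.configuration_complete:
        # During configuration the input moves focus through the menu options.
        state.focused = (state.focused + 1) % state.option_count
        return "move_focus"
    # After configuration the same input leaves focus alone and triggers
    # some other operation instead.
    return "other_operation"


state = SetupState()
during = on_first_device_input(state)
state.configuration_complete = True
after = on_first_device_input(state)
```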

143. The method of any one of claims 127 to 142, comprising:

displaying a first user interface of a first subset of user interfaces for configuring a first interaction model of the one or more interaction models;

detecting one or more user inputs; and

in response to detecting the one or more user inputs:

activating a function of the first interaction model and automatically displaying a second user interface of the first subset of user interfaces.

144. The method of any one of claims 127 to 143, wherein:

the plurality of selectable options includes a first set of one or more controls for enabling control of a focus selector using a corresponding portion of a user's body other than an eye of the user;

and the method comprises:

detecting a corresponding gaze input; and

in response to detecting the corresponding gaze input:

based on determining that control of the focus selector using the corresponding portion of the user's body other than the eye of the user is not enabled, positioning the focus selector in accordance with the corresponding gaze input, wherein, when control of the focus selector using the corresponding portion of the user's body other than the eye of the user is enabled, the computer system does not respond to the corresponding gaze input by positioning the focus selector in accordance with the corresponding gaze input.
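
For illustration only, and not part of the claims: the gating of gaze input recited in claim 144 can be sketched as below. `FocusSelector`, its field names, and the two-dimensional position are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FocusSelector:
    # True when a body part other than the eyes has been enabled to drive
    # the focus selector (hypothetical flag for this sketch).
    alternative_control_enabled: bool = False
    position: Tuple[float, float] = (0.0, 0.0)

    def on_gaze(self, gaze_point: Tuple[float, float]) -> bool:
        """Apply a gaze input; return True only if the selector moved."""
        if self.alternative_control_enabled:
            # Gaze is ignored while the alternative control is enabled.
            return False
        self.position = gaze_point
        return True


gaze_driven = FocusSelector()
moved = gaze_driven.on_gaze((0.4, 0.7))

body_driven = FocusSelector(alternative_control_enabled=True)
ignored = body_driven.on_gaze((0.4, 0.7))
```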

145. The method of claim 144, wherein the menu including the plurality of selectable options for configuring the one or more interaction models is displayed before a calibration process for the user's gaze is performed.

146. The method of any one of claims 127 to 145, wherein:

the plurality of selectable options includes a second set of one or more controls corresponding to a set of one or more input models, the set of one or more input models enabling control of the computer system using an alternative input other than an air gesture;

and the method comprises:

detecting an air gesture; and

in response to detecting the air gesture:

based on determining that the computer system is enabled to be controlled using air gestures, performing an operation in accordance with the air gesture, wherein, when the computer system is enabled to be controlled using the alternative input, the computer system does not respond to the air gesture by performing the operation.

147. The method of claim 146, wherein a first control in the second set of one or more controls corresponds to a control for activating a dwell control mode.

148. The method of any one of claims 146 to 147, wherein a second control in the second set of one or more controls corresponds to a control for activating a switch control mode.

149. The method of claim 148, comprising:

detecting an input selecting the control for activating the switch control mode; and

in response to detecting the input selecting the control for activating the switch control mode:

activating the switch control mode; and

displaying a corresponding menu for configuring a wireless connection with a hardware input device for providing input in the switch control mode.

150. The method of any one of claims 127 to 149, wherein the menu including the plurality of selectable options for configuring the one or more interaction models is displayed before a calibration process for the user's hand is performed.

151. The method of any one of claims 127 to 150, comprising:

detecting an input selecting a first option, of the plurality of selectable options, that corresponds to a vision accessibility mode; and

in response to detecting the input selecting the first option corresponding to the vision accessibility mode, activating the vision accessibility mode.

152. The method of any one of claims 127 to 150, comprising:

detecting an input selecting a second option, of the plurality of selectable options, that corresponds to a hearing accessibility mode; and

in response to detecting the input selecting the second option corresponding to the hearing accessibility mode, activating the hearing accessibility mode.

153. The method of any one of claims 127 to 150, comprising:

detecting an input selecting a third option, of the plurality of selectable options, that corresponds to a display setting; and

in response to detecting the input selecting the third option corresponding to the display setting, activating the display setting.

154. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system that is in communication with one or more display generation components and one or more input devices, the one or more programs comprising instructions for performing the method of any one of claims 127 to 153.

155. A computer system in communication with one or more display generation components and one or more input devices, the computer system comprising:

one or more processors; and

memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of any one of claims 127 to 153.

156. A computer system in communication with one or more display generation components and one or more input devices, the computer system comprising:

means for performing the method of any one of claims 127 to 153.

157. A computer-readable storage medium storing one or more programs configured to be executed by one or more processors of a computer system in communication with one or more display generation components and one or more input devices, the one or more programs comprising instructions for:

158. A computer system in communication with one or more display generation components and one or more input devices, the computer system comprising:

one or more processors; and

159. A computer system in communication with one or more display generation components and one or more input devices, the computer system comprising:

means, enabled while a configuration of the computer system is being performed, for detecting a first input directed to a first input device of the one or more input devices, wherein the computer system includes one or more sensors that detect inputs including one or more of air gestures and gaze inputs; and

means, enabled in response to detecting the first input to the first input device, for displaying a menu including a plurality of selectable options for configuring one or more interaction models.