DeepMetricEye: Metric Depth Estimation in Periocular VR Imagery
Presented at ISMAR 2023

Read Paper Online: https://doi.ieeecomputersociety.org/10.1109/ISMAR59233.2023.00058





Virtual Reality (VR) technology has advanced rapidly, offering immersive experiences across applications such as gaming, medicine, education, and training simulation. The recent re-imagination of the 'Digital Universe' concept has further illuminated a compelling vision of globally interconnected interaction. Yet despite continual improvements in content quality, VR headsets still cause physiological discomfort that substantially shortens usage sessions. Much of this discomfort can be attributed to digital eye strain (DES), dry eye, and visual impairment caused by excessive artificial light stimulation from VR displays, as well as to periocular swelling, increased intraocular pressure, and muscle displacement induced by the pressure of the headset's face mask. These visual health issues, however, have not yet received attention proportionate to the pace of VR development.

Recent VR headsets are increasingly equipped with eye-oriented monocular cameras, which existing methods use to segment periocular feature maps, annotate the pupil edge, and estimate gaze direction for content interaction. While these methods offer preliminary insight into eye activity during VR use, they cannot be linked to medical standards, such as light-stimulus calculation protocols and periocular-condition clinical guidelines, that rigorous visual health assessment and advanced user interaction studies require. The fundamental limitation is that current methods cannot convert 2D relative feature annotations (e.g., a pupil edge segmentation) into the spatial metrics (e.g., pupil diameter in millimetres) that these standards demand. Proposed remedies, such as adding stereo or depth cameras to acquire metric measurements, impose substantial costs in hardware, computational power, battery life, and headset design.
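To make the gap concrete, the following is a minimal sketch of how a 2D pupil edge segmentation could be lifted to a metric diameter once a per-pixel metric depth map and the eye camera intrinsics are available. The function name, the pinhole intrinsics fx, fy, cx, cy, and the max-chord diameter approximation are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def pupil_diameter_mm(pupil_edge_mask, depth_map, fx, fy, cx, cy):
    """Estimate pupil diameter in millimetres from a binary pupil-edge mask
    and a per-pixel metric depth map (both HxW), using a pinhole camera model.

    fx, fy, cx, cy are the eye-camera intrinsics in pixels; they are
    placeholders here, not values from the paper.
    """
    ys, xs = np.nonzero(pupil_edge_mask)   # pixel coordinates on the pupil edge
    z = depth_map[ys, xs]                  # metric depth (e.g. mm) at each edge pixel
    # Back-project each edge pixel to a 3D point in the camera frame.
    X = (xs - cx) * z / fx
    Y = (ys - cy) * z / fy
    pts = np.stack([X, Y, z], axis=1)
    # Approximate the diameter as the largest pairwise distance between
    # edge points (the edge mask only contains a few hundred pixels).
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return dists.max()
```

Without the metric depth map, only the pupil's size in pixels is known; the back-projection step is what turns a relative annotation into a measurement comparable against clinical thresholds.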

To convert 2D periocular feature annotations into 3D metric dimensions, we propose a framework that uses only the eye-oriented monocular camera already present in many VR headsets to estimate a metric periocular depth map. The framework is built on a U-Net 3+ deep learning backbone that we re-optimise to estimate depth maps accurately while remaining lightweight enough for VR deployment. To ease the difficulty of collecting facial data for training, we introduce a Dynamic Periocular Data Generation (DPDG) environment that leverages a small quantity of real facial-scan data to generate thousands of synthetic periocular images and corresponding ground-truth depth maps using Unreal Engine (UE) MetaHuman.
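The sketch below is not the paper's re-optimised U-Net 3+; it is a deliberately small U-Net-style encoder-decoder in PyTorch that only illustrates the input/output contract described above: a single eye-camera frame in, a dense positive depth map of the same resolution out. The TinyDepthNet name, channel widths, and layer choices are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions with BatchNorm/ReLU, the usual U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyDepthNet(nn.Module):
    """Minimal encoder-decoder mapping a 1-channel eye image to a 1-channel
    depth map at the same resolution (units depend on the training targets)."""
    def __init__(self, base=16):
        super().__init__()
        self.enc1 = conv_block(1, base)
        self.enc2 = conv_block(base, base * 2)
        self.enc3 = conv_block(base * 2, base * 4)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        # Softplus keeps the predicted depth strictly positive.
        return F.softplus(self.head(d1))

# Shape check: one 128x128 grayscale eye frame in, one depth map out.
model = TinyDepthNet()
depth = model(torch.randn(1, 1, 128, 128))
print(depth.shape)  # torch.Size([1, 1, 128, 128])
```

Such a network would be trained against the DPDG synthetic image/depth pairs; the keep-it-small design reflects the paper's constraint that inference must fit the compute and power budget of a standalone headset.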

The main contributions of this study are as follows:

  • We introduce a lightweight depth estimation framework for VR headsets that reconstructs metric periocular depth maps, providing the metric size of periocular features for light-stimulus standard calculations and periocular condition monitoring.

  • To address the challenge of facial data collection, our DPDG environment, based on UE MetaHuman, generates thousands of periocular training images and ground-truth depth maps from a small number of real facial scans.

  • We evaluate our method's accuracy and usability on two tasks: 1) global depth precision over the periocular area, and 2) pupil diameter measurement (a minimal evaluation sketch follows this list).

  • We have open-sourced the DPDG environment, the code and dataset for the depth estimation model, and all metadata from the experiments.
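As referenced in the evaluation bullet above, the snippet below shows how the two tasks could be scored with standard monocular-depth error measures. AbsRel and RMSE are conventional depth benchmarks and the millimetre diameter error is a straightforward choice; both are offered as illustrations of the kind of metrics involved, not as the paper's exact evaluation protocol.

```python
import numpy as np

def depth_metrics(pred, gt, mask):
    """Standard monocular depth errors over a region of interest.

    pred, gt : HxW metric depth maps; mask : HxW boolean periocular region.
    Returns absolute relative error (AbsRel) and root-mean-square error (RMSE).
    """
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    rmse = np.sqrt(np.mean((p - g) ** 2))
    return abs_rel, rmse

def pupil_diameter_error_mm(pred_mm, gt_mm):
    # Absolute error in millimetres between estimated and reference pupil diameters.
    return abs(pred_mm - gt_mm)
```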


Dataset Generation Demo Video: