
- Pod
- RGB camera for texture
- A pair of monochrome NIR cameras for stereo
- ➡️ Create depth maps at 60 Hz by combining information from overlapping time windows of 5 NIR image pairs
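
To make the "overlapping time windows" idea concrete, here is a minimal sketch of a sliding window over incoming NIR image pairs. It only shows the buffering pattern. The function name `sliding_windows`, the string placeholders for images, and the stride of one pair per window are assumptions for illustration; the notes do not say how the paper schedules its windows or how the stereo matching itself works.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple

WINDOW = 5  # NIR image pairs per window, as stated in the notes

# Placeholder type: a (left, right) NIR image pair. Real code would hold image arrays.
Pair = Tuple[str, str]


def sliding_windows(pairs: Iterable[Pair], size: int = WINDOW) -> Iterator[List[Pair]]:
    """Yield the most recent `size` pairs each time a new pair arrives.

    Assumption: the window advances by one pair at a time, so consecutive
    windows share `size - 1` pairs.
    """
    buf: Deque[Pair] = deque(maxlen=size)  # oldest pair is dropped automatically
    for pair in pairs:
        buf.append(pair)
        if len(buf) == size:  # wait until the buffer has warmed up
            yield list(buf)


# Toy input: 7 labelled pairs instead of real images.
frames = [(f"L{i}", f"R{i}") for i in range(7)]
windows = list(sliding_windows(frames))
```

With this stride, one window (and so one depth estimate) is available per incoming pair after the first four pairs have been buffered.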

- 3D Face Tracking
- Eye locations: determine stereo viewpoints for rendering
- Mouth position: enable beamforming in audio capture
- 4 synchronized monochrome cameras detect the face and locate 34 facial landmarks
- Determine the 2D locations of face features (eyes, mouth and ears) as weighted combinations of nearby landmarks
- For each feature, they use triangulation to obtain its 3D position
- Mitigate the latency
- Extrapolate the 3D positions of the tracked features
- Apply double exponential smoothing
- Remove the remaining small noise using a “change band” hysteresis filter
- Compression
- They transmit multiple color images and stereo-reconstructed depth maps using traditional video compression
- Both the color and depth streams are encoded using the H.265 codec with YUV420 chroma subsampling.
- Reduce encoding and decoding latency by omitting bidirectionally encoded frames (B-frames).
- Fusion of the color and depth streams is delayed until the left- and right-eye views are rendered on the receiving client (Section 4.6 of the paper)
- Transmission
- For each frame, they gather the encoded video packets from all 7 video streams (together with the tracked face points) into a single data payload and transmit it using WebRTC
- Rendering
- On the receiving client, decompress the 3 depth maps and 4 color images
- (1) For each of the 4 color cameras, compute a shadow map by raycasting: for each ray, find the first intersection with a surface fused from the input depth maps.
- (2) For each of the 2 user views (left and right eye), compute an output depth map using the same raycasting algorithm.
- (3) For each output depth map point, compute a weighted color blend of the images determined visible by the shadow maps computed in step 1.
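
Step (3) of the rendering can be sketched for a single output point as below. This is only an illustration of a visibility-masked weighted average. The function name `blend_colors`, the boolean visibility flags (standing in for the shadow-map test from step 1), the example weights, and the black fallback when no camera sees the point are all assumptions; the notes do not say how the paper computes its blend weights.

```python
from typing import Sequence, Tuple

Color = Tuple[float, float, float]  # RGB, each channel in [0, 1]


def blend_colors(
    colors: Sequence[Color],
    visible: Sequence[bool],
    weights: Sequence[float],
) -> Color:
    """Blend per-camera colors for one surface point.

    Cameras whose shadow-map test failed (visible=False) are skipped,
    and the remaining weights are renormalized to sum to 1.
    """
    total = 0.0
    acc = [0.0, 0.0, 0.0]
    for color, vis, w in zip(colors, visible, weights):
        if not vis or w <= 0.0:
            continue  # occluded or zero-weight camera contributes nothing
        total += w
        for c in range(3):
            acc[c] += w * color[c]
    if total == 0.0:
        return (0.0, 0.0, 0.0)  # assumption: fall back to black if nothing is visible
    return (acc[0] / total, acc[1] / total, acc[2] / total)


# Toy example with 4 color cameras; cameras 2 and 4 are occluded.
cams = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
blended = blend_colors(cams, [True, False, True, False], [1.0, 5.0, 3.0, 2.0])
```

Here only the red and blue cameras contribute, with weights 1 and 3, so the result is one quarter red and three quarters blue.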
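
For the latency-mitigation part of 3D face tracking, the sketch below combines double exponential smoothing (Holt's linear method) for extrapolation with a simple dead-band filter as one possible reading of the “change band” hysteresis filter. It works on a single coordinate; a 3D point would use one instance per axis. The class names, the `alpha`/`beta` values, and the band width are assumptions, and the dead-band behaviour is a guess at what “change band” means, not the paper's stated definition.

```python
from typing import Optional


class HoltPredictor:
    """Double exponential smoothing with linear extrapolation (Holt's method)."""

    def __init__(self, alpha: float = 0.5, beta: float = 0.5) -> None:
        self.alpha = alpha  # smoothing factor for the level
        self.beta = beta    # smoothing factor for the trend
        self.level: Optional[float] = None
        self.trend = 0.0

    def update(self, x: float) -> None:
        if self.level is None:  # first sample initializes the level
            self.level = x
            return
        prev = self.level
        self.level = self.alpha * x + (1 - self.alpha) * (prev + self.trend)
        self.trend = self.beta * (self.level - prev) + (1 - self.beta) * self.trend

    def predict(self, steps_ahead: float) -> float:
        """Extrapolate `steps_ahead` sample intervals into the future."""
        if self.level is None:
            raise ValueError("no samples yet")
        return self.level + steps_ahead * self.trend


class ChangeBand:
    """Dead-band hysteresis: the output moves only when the input leaves the band."""

    def __init__(self, half_width: float) -> None:
        self.half_width = half_width
        self.output: Optional[float] = None

    def filter(self, x: float) -> float:
        if self.output is None:
            self.output = x
        elif x > self.output + self.half_width:
            self.output = x - self.half_width  # drag the band up
        elif x < self.output - self.half_width:
            self.output = x + self.half_width  # drag the band down
        return self.output


# Toy usage: a steadily moving coordinate, predicted two samples ahead.
predictor = HoltPredictor(alpha=1.0, beta=1.0)
for sample in (0.0, 1.0, 2.0):
    predictor.update(sample)
predicted = predictor.predict(2)

band = ChangeBand(half_width=0.5)
stable = [band.filter(v) for v in (0.0, 0.3, -0.4, 1.0, 0.8)]
```

Small jitter inside the band leaves the output unchanged, while a real movement shifts it; the price is a lag of up to `half_width` behind the input.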
Reference)
https://research.google/pubs/pub50903/
Project Starline: A high-fidelity telepresence system – Google Research