empowering live perception with mediapipe · 2019. 10. 30. · ming guang yong, google research...

21
Proprietary + Confidential Ming Guang Yong, Google Research [link to presentation] [email protected] mediapipe.dev MediaPipe Empowering Live Perception with

Upload: others

Post on 05-Oct-2020

13 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential

Ming Guang Yong, Google Research [link to presentation]

[email protected]

mediapipe.dev

MediaPipe

Empowering Live Perception with

Page 2: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential

Outline

● Introduction

● Example Pipeline for Live Perception

● MediaPipe Toolkit

Page 3: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Widely used at Google in research & products to process

and analyze video, audio and sensor data:

● Dataset preparation pipelines for ML training

● ML inference pipelines

● Media processing pipelines

MediaPipe is Google's open source

cross-platform framework for building perception pipelines

Introduction

Page 4: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

AR:

YouTube, ARCore, Duo

Mobile:

Visual Search, Lens

Cross-platform:

Motion Photos

Server:

Video Previews

Android

Server

Page 5: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

MediaPipe OSS

● Initial release to GitHub in June,

2019

● 3200+ stars on GitHub to date

● Platforms: Mobile/Desktop

○ Linux (C++ Native APIs)

○ iOS (Obj C APIs)

○ Android (Java APIs)

● ML Acceleration

○ EdgeTPU

○ Mobile GPU

○ Desktop GPU (mesa libraries)

mediapipe.dev (github.com/google/mediapipe)

Page 6: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

docs.mediapipe.dev viz.mediapipe.dev

Page 7: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Mobile Examples

Page 8: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Live Perception Example

Page 9: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Goal: Hand Tracking

Page 10: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

ML Model to Localize Hand Landmarks

Model

Image/Video Landmarks

There's more to it

Page 11: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Simple Perception Pipeline Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

Geometric transformation to resize and rotate etc

For a given ML model, runs ML inference

Renders landmarks onto associated images

Model

Video input, e.g., from live camera viewfinder

Decodes tensors into high-level metadata

Video output, e.g., display

Converts images to tensors

Page 12: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Pipeline via MediaPipe Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

Model

A MediaPipe Graph represents a perception

pipeline

Each node in the pipeline is a MediaPipe Calculator

Two nodes can be connected by a Stream, which

carries a sequence of Packets with ascending

timestamps

Page 13: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Calculator Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

Model

● Written in C++

● Declares input/output ports

○ Type of packet payload through the port

○ Ports can be declared as optional

● Implements Open/Process/Close methods, called by

framework

○ Open - Before a full graph run

○ Process - Repeatedly with incoming packet(s)

ready to be processed

○ Close - After a full graph run

Page 14: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Built-in Calculators Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

Model

Family of image and media

processing calculators

Native integration with TensorFlow and TF Lite

for ML inference

Family of ML post-processing calculators for

common ML tasks, e.g., detection, segmentation

and classification

Family of utility calculators for, e.g., image

annotation, flow control

Page 15: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Synchronization Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

Model

Synchronization handled by MediaPipe framework

to align time-series data

● By default all inputs to a calculator are

synchronized

● Process method in a calculator is invoked with

packets with the same timestamp across inputs

Page 16: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Subgraph Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

Video

Video

HandLandmark

IMAGE

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

A (partial) graph can be declared as a

subgraph and used as a regular calculator in

other graphs Model

Page 17: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Simple Perception Pipeline Video

Video

HandLandmark

IMAGE

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

Remaining issues:

● In practice, a hand can appear anywhere in an

image, at very different scales

● Takes a lot of model capacity to deal with the

large variation in location & scale

● The model is either large/slow or inaccurate

Page 19: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

MediaPipe Toolkit

Page 20: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential

MediaPipe Tech Stack

Graphs

Cross-platform Framework (C++)

TensorFlow,

OpenCV,

OpenGL/Metal,

Eigen etc

Graph

Construction

API

(Protobuf)

Calculator

API (C++)

Graph

Execution

API (C++,

Java, Obj-C)

Helper

Utils

(Android,

iOS)

Example

Applications

Example

Graphs

Built-in

Calculators Calculators

Graphs

Applications (Desktop/Server, Android, iOS, Embedded)

Page 21: Empowering Live Perception with MediaPipe · 2019. 10. 30. · Ming Guang Yong, Google Research [link to presentation] mgyong@google.com mediapipe.dev MediaPipe Empowering Live Perception

Proprietary + Confidential Proprietary + Confidential

Thank you!

mediapipe.dev