empowering live perception with mediapipe · 2019. 10. 30. · ming guang yong, google research...

Proprietary + Confidential

Ming Guang Yong, Google Research [link to presentation]

[email protected]

mediapipe.dev

MediaPipe

Empowering Live Perception with

https://docs.google.com/presentation/d/1V7QlrP0RJuJM9tESG-2102biNa3rqewjXCqqiqPtttc/edit

http://mediapipe.dev


Outline

● Introduction

● Example Pipeline for Live Perception

● MediaPipe Toolkit

Proprietary + Confidential Proprietary + Confidential

Widely used at Google in research & products to process

and analyze video, audio and sensor data:

● Dataset preparation pipelines for ML training

● ML inference pipelines

● Media processing pipelines

MediaPipe is Google's open source

cross-platform framework for building perception pipelines

Introduction


AR:

YouTube, ARCore, Duo

Mobile:

Visual Search, Lens

Cross-platform:

Motion Photos

Server:

Video Previews

Android

Server


MediaPipe OSS

● Initial release to GitHub in June,

2019

● 3200+ stars on GitHub to date

● Platforms: Mobile/Desktop

○ Linux (C++ Native APIs)

○ iOS (Obj C APIs)

○ Android (Java APIs)

● ML Acceleration

○ EdgeTPU

○ Mobile GPU

○ Desktop GPU (mesa libraries)

mediapipe.dev (github.com/google/mediapipe)

https://mediapipe.dev


docs.mediapipe.dev viz.mediapipe.dev

https://docs.mediapipe.dev

https://viz.mediapipe.dev


Mobile Examples


Live Perception Example


Goal: Hand Tracking


ML Model to Localize Hand Landmarks

Model

Image/Video Landmarks

There's more to it


Simple Perception Pipeline Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS

Renderer RENDERED_IMAGE

LANDMARKS IMAGE

Geometric transformation to resize and rotate etc

For a given ML model, runs ML inference

Renders landmarks onto associated images

Model

Video input, e.g., from live camera viewfinder

Decodes tensors into high-level metadata

Video output, e.g., display

Converts images to tensors


Pipeline via MediaPipe Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS


LANDMARKS IMAGE

Model

A MediaPipe Graph represents a perception

pipeline

Each node in the pipeline is a MediaPipe Calculator

Two nodes can be connected by a Stream, which

carries a sequence of Packets with ascending

timestamps


Calculator Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS


LANDMARKS IMAGE

Model

● Written in C++

● Declares input/output ports

○ Type of packet payload through the port

○ Ports can be declared as optional

● Implements Open/Process/Close methods, called by

framework

○ Open - Before a full graph run

○ Process - Repeatedly with incoming packet(s)

ready to be processed

○ Close - After a full graph run


Built-in Calculators Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS


LANDMARKS IMAGE

Model

Family of image and media

processing calculators

Native integration with TensorFlow and TF Lite

for ML inference

Family of ML post-processing calculators for

common ML tasks, e.g., detection, segmentation

and classification

Family of utility calculators for, e.g., image

annotation, flow control


Synchronization Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS


LANDMARKS IMAGE

Model

Synchronization handled by MediaPipe framework

to align time-series data

● By default all inputs to a calculator are

synchronized

● Process method in a calculator is invoked with

packets with the same timestamp across inputs


Subgraph Video

Video

ImageTransform

IMAGE

TRANSFORMED_IMAGE

ImageToTensor

IMAGE

TENSOR

Inference

TENSORS

TENSORS

TensorToLandmar

ks

TENSORS

LANDMARKS


LANDMARKS IMAGE

Video

Video

HandLandmark

IMAGE

LANDMARKS


LANDMARKS IMAGE

A (partial) graph can be declared as a

subgraph and used as a regular calculator in

other graphs Model


Simple Perception Pipeline Video

Video

HandLandmark

IMAGE

LANDMARKS


LANDMARKS IMAGE

Remaining issues:

● In practice, a hand can appear anywhere in an

image, at very different scales

● Takes a lot of model capacity to deal with the

large variation in location & scale

● The model is either large/slow or inaccurate


"On-Device, Real-Time Hand Tracking with MediaPipe", Aug '19

Google AI Blog

http://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html







MediaPipe Toolkit


MediaPipe Tech Stack

Graphs

Cross-platform Framework (C++)

TensorFlow,

OpenCV,

OpenGL/Metal,

Eigen etc

Graph

Construction

API

(Protobuf)

Calculator

API (C++)

Graph

Execution

API (C++,

Java, Obj-C)

Helper

Utils

(Android,

iOS)

Example

Applications

Example

Graphs

Built-in

Calculators Calculators

Graphs

Applications (Desktop/Server, Android, iOS, Embedded)


Thank you!

mediapipe.dev

https://mediapipe.dev

empowering live perception with mediapipe · 2019. 10. 30. · ming guang yong, google research...

Documents