MPEG ARAF Tutorial @ ISMAR 2014

Post on 02-Jul-2015


DESCRIPTION

A set of slides introducing ARAF - An MPEG standard for Mixed Reality

TRANSCRIPT

MPEG for Augmented Reality

ISMAR, September 9, 2014, Munich

AR Standards Community Meeting September 12, 2014

Marius Preda, MPEG 3DG Chair

Institut Mines TELECOM

http://www.slideshare.net/MariusPreda/mpeg-augmented-reality-tutorial

What you will learn today

• Who MPEG is and why MPEG is doing AR

• MPEG ARAF design principles and the main features

• Create ARAF experiences: two exercises

Tidy City

Portal Hunt

Elements

ARQuiz

Augmented Books

Event LOOV

Available on the App Store, Android stores and MyMultimediaWorld.com

• Collecting virtual money in the real world for buying real services and products

Summer School (1 week) Games

What is common in these “games”?

Based on MPEG ARAF

Augmented Reality Application Format

Why MPEG AR?

MPEG Augmented Reality

Answers to (some of) Christine’s (non-technical) questions

• Who is MPEG?

• What does MPEG do successfully?

• Who are the members?

• IPR policy

What is MPEG?

A suite of ~130 ISO/IEC standards for:

• Coding/compression of elementary media: audio (MPEG-1, 2 and 4), video (MPEG-1, 2 and 4), 2D/3D graphics (MPEG-4)

• Transport: MPEG-2 Transport Stream, File Format, Dynamic Adaptive Streaming over HTTP (DASH)

• Hybrid (natural & synthetic) scene description and user interaction (MPEG-4)
• Metadata (MPEG-7)
• Media management and protection (MPEG-21)
• Sensors and actuators, Virtual Worlds (MPEG-V)
• Advanced user interaction (MPEG-U)
• Media-oriented middleware (MPEG-M)

More ISO/IEC standards under development for:
• Coding and Delivery in Heterogeneous Environments (incl.)
• 3D Video
• …

• A standardization activity continuing for 25 years

– Supported by several hundred companies/organisations from ~25 countries

– ~500 experts participating in quarterly meetings

– More than 2300 active contributors

– Many thousands of experts working in companies

• A proven manner to organize the work to deliver useful and used standards

– Developing standards by integrating individual technologies

– Well defined procedures

– Subgroups with clear objectives

– Ad hoc groups continuing coordinated work between meetings

• MPEG standards are widely referenced by industry

– 3GPP, ARIB, ATSC, DVB, DVD-Forum, BDA, ETSI, SCTE, TIA, DLNA, DECE, OIPF…

• Billions of software and hardware devices built on MPEG technologies

– MP3 players, cameras, mobile handsets, PCs, DVD/Blu-ray players, STBs, TVs, …

• Business friendly IPR policy established at ISO level

What is MPEG?

MPEG technologies related to AR: 1st pillar

[Timeline:]

• 1992/4 – MPEG-1/2 (AV content)

• 1997 – VRML

• 1998 – MPEG-4 v.1
– Part 11 – BIFS: binarisation of VRML; extensions for streaming, server commands and 2D graphics; real-time augmentation with audio & video
– Part 2 – Visual: 3D mesh compression, face animation

• 1999 – MPEG-4 v.2
– Part 2 – Visual: body animation
First form of broadcast signal augmentation

• 2003 – MPEG-4
– Part 16 – AFX: a rich set of 3D graphics tools; compression of geometry, appearance, animation

• 2005 – MPEG-4
– AFX 2nd Edition: animation by morphing, multi-texturing

• 2007 – MPEG-4
– AFX 3rd Edition: WSS for terrain and cities, frame-based animation

• 2011 – MPEG-4
– AFX 4th Edition: scalable complexity mesh coding

A rich set of scene and graphics representation and compression tools

MPEG technologies related to AR: 1st pillar

MPEG technologies related to AR: 2nd pillar

MPEG-V – Media Context and Control

[Timeline:]

• 2011 – MPEG-V 1st Edition: sensors and actuators; interoperability between Virtual Worlds

• 2012 – MPEG-U – Advanced User Interfaces

• 2013 – MPEG-V 2nd Edition: GPS, biosensors, 3D camera

• 2014 – 3D Video: compression of video + depth

• 201x – CDVS: feature-point based descriptors for image recognition

• MPEG-H – 3D Audio

A rich set of sensors and actuators

MPEG technologies related to AR: 2nd pillar

MPEG-V – Media Context and Control

Actuators: Light, Flash, Heating, Cooling, Wind, Vibration, Sprayer, Scent, Fog, Color correction, Initialize color correction parameter, Rigid body motion, Tactile, Kinesthetic, Global position command

Sensors: Light, Ambient noise, Temperature, Humidity, Distance, Atmospheric pressure, Position, Velocity, Acceleration, Orientation, Angular velocity, Angular acceleration, Force, Torque, Pressure, Motion, Intelligent camera type, Multi interaction point, Gaze tracking, Wind, Dust, Body height, Body weight, Body temperature, Body fat, Blood type, Blood pressure, Blood sugar, Blood oxygen, Heart rate, Electrograph (EEG, ECG, EMG, EOG, GSR), Weather, Facial expression, Facial morphology, Facial expression characteristics, Geomagnetic, Global position, Altitude, Bend, Gas

MPEG technologies related to AR: 2nd pillar

MPEG-V – Media Context and Control

• All AR-related data is available from MPEG standards

• Real time composition of synthetic and natural objects

• Access to

– Remotely/locally stored scene/compressed 2D/3D mesh objects

– Streamed real-time scene/compressed 2D/3D mesh objects

• Inherent object scalability (e.g. for streaming)

• User interaction & server-generated scene changes

• Physical context

– Captured by a broad range of standard sensors

– Affected by a broad range of standard actuators

Main features of MPEG AR technologies

MPEG vision on AR

MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V

[Diagram: a compression-capable Authoring Tool produces the ARAF file, which is downloaded to the MPEG Player.]

MPEG vision on AR

MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V

[Diagram: the same chain with the generic MPEG Player replaced by an ARAF Browser: the Authoring Tool produces the ARAF file, which is downloaded to the ARAF Browser.]

End to end chain

[Diagram: the Authoring Tools produce the MPEG ARAF file, which the ARAF Browser downloads. The browser serves the User, connects to Media Servers and Service Servers, and reaches the Local Real World Environment through Local Sensors & Actuators and the Remote Real World Environment through Remote Sensors & Actuators.]

• A set of scene graph nodes/PROTOs as defined in MPEG-4 Part 11

– Existing nodes: audio, image, video, graphics, programming, communication, user interactivity, animation

– New standard PROTOs: Map, MapMarker, Overlay, Local & Remote Recognition, Local & Remote Registration, CameraCalibration, AugmentedRegion, Point of Interest

• Connection to sensors and actuators as defined in MPEG-V

– Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude

– Local and/or remote camera sensor

– Flash, Heating, Cooling, Wind, Sprayer, Scent, Fog, RigidBodyMotion, Kinesthetic

• Compressed media

Three main components: scene, sensors/actuators, media
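The scene component can be illustrated with a toy parser. Everything below is an assumption for illustration: the element names (`Scene`, `ProtoInstance`, `field`) and the field values mimic the flavour of ARAF's PROTO-based scenes (e.g. a Point of Interest with a GPS position), but they are not the normative schema of the 73 XML elements defined in MPEG-A Part 13.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ARAF-like scene fragment -- element and field
# names are illustrative only, not the normative ARAF/XMT schema.
SCENE_XML = """
<Scene>
  <ProtoInstance name="PointOfInterest">
    <field name="name" value="Munich Marienplatz"/>
    <field name="gpsPosition" value="48.1374 11.5755"/>
  </ProtoInstance>
  <ProtoInstance name="Overlay">
    <field name="mediaURL" value="http://example.org/poi_icon.png"/>
  </ProtoInstance>
</Scene>
"""

def load_protos(xml_text):
    """Collect PROTO instances and their fields into plain dicts."""
    root = ET.fromstring(xml_text)
    protos = []
    for proto in root.findall("ProtoInstance"):
        fields = {f.get("name"): f.get("value") for f in proto.findall("field")}
        protos.append({"proto": proto.get("name"), "fields": fields})
    return protos

protos = load_protos(SCENE_XML)
print(protos[0]["proto"])                  # PointOfInterest
print(protos[0]["fields"]["gpsPosition"])  # 48.1374 11.5755
```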

MPEG-A Part 13 ARAF

Scene: 73 XML Elements

MPEG-A Part 13 ARAF

Documentation available online:

http://wg11.sc29.org/augmentedReality/

Event LOOV: how does it look?

Exercises

MPEG-A Part 13 ARAF

AR Quiz Augmented Book

Exercises

MPEG-A Part 13 ARAF

AR Quiz Augmented Book

http://youtu.be/LXZUbAFPP-Y
http://youtu.be/la-Oez0aaHE

AR Quiz setting: preparing the media

MPEG-A Part 13 ARAF

images, videos, audio, 2D/3D assets

GPS location

AR Quiz XML inspection

MPEG-A Part 13 ARAF

http://tiny.cc/MPEGARQuiz

AR Quiz Authoring Tool

MPEG-A Part 13 ARAF

www.MyMultimediaWorld.com, go to Create / Augmented Reality

Augmented Book setting

MPEG-A Part 13 ARAF

images, audio

Augmented Book XML inspection

MPEG-A Part 13 ARAF

http://tiny.cc/MPEGAugBook

Augmented Book Authoring Tool

MPEG-A Part 13 ARAF

www.MyMultimediaWorld.com, go to Create / Augmented Books

• ARAF Browser is Open Source

– iOS, Android, WS, Linux

– distributed at www.MyMultimediaWorld.com

• ARAF V1 published early 2014

• ARAF V2 in progress

– Visual Search (client side and server side)

– 3D Video, 3D Audio

– Connection to Social Networks

– Connection to POI servers

Conclusions

• Other slides that may help

MPEG 3DG Report

ARAF 2nd Edition

MPEG 3DG Report

ARAF 2nd Edition, items under discussion

1. Local vs Remote recognition and tracking

2. Social Networks

3. 3D video

4. 3D audio

MPEG 3DG Report

Server side object recognition: a real system*

[Diagram: client–server pipeline. Client: key points are detected in the query image, descriptors are extracted, and both are sent as binary data in an HTTP POST (binary descriptor + key points). Server: the payload is decoded and the query descriptors are matched against the DB descriptors (the DB stores descriptors, images and the corresponding information); the matched ID is resolved and either the corresponding information or an error/no-match message is returned as a string in the HTTP response. Client: the answer is parsed and displayed.]

* Wine recognizer: GooT and IMT
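The HTTP POST step (binary descriptor + key points) implies some compact wire format; the slides do not specify one, so the byte layout and function names below are assumptions for illustration. The sketch packs key points and binary descriptors into a base64 payload on the client and decodes it on the server.

```python
import base64
import struct

def pack_query(keypoints, descriptors):
    """Client side: serialize (x, y) key points and raw binary descriptors
    into a base64 payload for the HTTP POST. The layout (counts + packed
    float32 pairs + length-prefixed byte strings) is an assumption."""
    buf = struct.pack("<I", len(keypoints))
    for x, y in keypoints:
        buf += struct.pack("<ff", x, y)
    buf += struct.pack("<I", len(descriptors))
    for d in descriptors:
        buf += struct.pack("<I", len(d)) + d
    return base64.b64encode(buf).decode("ascii")

def unpack_query(payload):
    """Server side: the decode step -- inverts pack_query."""
    buf = base64.b64decode(payload)
    off = 0
    (n_kp,) = struct.unpack_from("<I", buf, off); off += 4
    kps = []
    for _ in range(n_kp):
        x, y = struct.unpack_from("<ff", buf, off); off += 8
        kps.append((x, y))
    (n_d,) = struct.unpack_from("<I", buf, off); off += 4
    descs = []
    for _ in range(n_d):
        (ln,) = struct.unpack_from("<I", buf, off); off += 4
        descs.append(buf[off:off + ln]); off += ln
    return kps, descs

# Round trip: values chosen to be exactly representable as float32.
kps = [(10.0, 20.0), (32.5, 7.25)]
descs = [bytes(range(32)), bytes(32)]
kps2, descs2 = unpack_query(pack_query(kps, descs))
assert kps2 == kps and descs2 == descs
```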

MPEG 3DG Report

Server side object recognition: ARAF version

[Diagram: the MAR Experience Creator + Content Creator author a MAR scene that references a video source (video URL), an optional recognition region and the Processing Server URLs. On the end-user device, the ARAF Browser takes the video stream, extracts key points and descriptors, and sends them binary (base64) encoded to the Processing Servers. The servers run image recognition libraries (e.g. ORB-based detection libraries) against a large image DB and return the corresponding media data from the media DB.]
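Since the recognition libraries include ORB, whose descriptors are binary strings compared by Hamming distance, the server-side matching step can be sketched as a brute-force nearest-neighbour search. The DB keys and the distance threshold below are hypothetical.

```python
def hamming(d1, d2):
    """Hamming distance between two equal-length binary descriptors."""
    return sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))

def match(query, db, max_dist=64):
    """Brute-force nearest neighbour over a {id: descriptor} DB.
    Returns (db_id, distance) for the best match within max_dist, else None."""
    best = None
    for db_id, d in db.items():
        dist = hamming(query, d)
        if best is None or dist < best[1]:
            best = (db_id, dist)
    if best is not None and best[1] <= max_dist:
        return best
    return None

# Hypothetical 256-bit ORB-style descriptors (32 bytes each).
db = {"wine_label_42": bytes([0b10101010] * 32),
      "wine_label_7":  bytes([0b11110000] * 32)}
q = bytes([0b10101011] * 32)  # one bit per byte away from wine_label_42
print(match(q, db))  # ('wine_label_42', 32)
```

A production matcher would add a ratio test and an index structure (e.g. multi-probe LSH) instead of a linear scan, but the distance metric is the same.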

MPEG 3DG Report

Server side object recognition: ARAF version

Discussions on:

- Does the content creator specify the form of the request (full image or descriptors), or does the browser take the best decision?

- Is the server’s answer formalized in ARAF?

MPEG 3DG Report

ARAF – Social Network Data in ARAF scene

Scenario: display posts from a Social Network (SN) in a geo-localized manner

ARAF can do this directly, by programming the access to the SN service at the scene level

MPEG 3DG Report

ARAF – Social Network Data in ARAF scene

At minimum, the user logs in to the SN; at maximum, the MPEG UD (User Description) is used

MPEG 3DG Report

ARAF – Social Network Data in ARAF scene

Connect to a UD server to get all the necessary data

Two categories of “SNS Data”:

– Static data
• Name, photo, email, phone number, address, sex, interests, …

– Social Network related activity
• Reported location, SNS post title, SNS text, SNS media
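Displaying SNS posts in a geo-localized manner implies filtering the activity records by distance from the user's GPS position. A minimal sketch, where the post fields, coordinates and radius are illustrative assumptions:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two GPS points."""
    R = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def nearby_posts(user_pos, posts, radius_m=500.0):
    """Keep only posts whose reported location lies within radius_m."""
    lat, lon = user_pos
    return [p for p in posts
            if haversine_m(lat, lon, p["lat"], p["lon"]) <= radius_m]

posts = [  # hypothetical SNS activity records
    {"title": "Lunch at Marienplatz", "lat": 48.1374, "lon": 11.5755},
    {"title": "Hiking near Garmisch", "lat": 47.4920, "lon": 11.0953},
]
user = (48.1372, 11.5750)  # user somewhere in central Munich
print([p["title"] for p in nearby_posts(user, posts)])
# ['Lunch at Marienplatz']
```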

MPEG 3DG Report

ARAF – Social Network scenario

Obtained from the UD server

MPEG 3DG Report

ARAF 2nd Edition – introducing 3D Video

Modeling of 3 AR classes for 3D video:

1. Pre-created 3D model of the environment, using visual search and other sensors to obtain camera position and orientation; 3D video used to handle occlusions

2. No a priori 3D model of the scene; depth captured in real time and used to handle occlusions at the rendering step

3. No a priori model of the scene, but one created during the AR experience (SLAM – Simultaneous Localization and Mapping)
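Class 2 above boils down to a per-pixel depth test at render time: a virtual pixel is shown only where it is closer to the camera than the captured real-world depth. A deliberately tiny sketch, with flat lists standing in for images:

```python
def composite_with_occlusion(camera_rgb, camera_depth, virtual_rgb, virtual_depth):
    """Per-pixel depth test for occlusion handling (class 2 above).
    Images are flat lists; None in virtual_rgb means 'no virtual content'."""
    out = []
    for c_rgb, c_z, v_rgb, v_z in zip(camera_rgb, camera_depth,
                                      virtual_rgb, virtual_depth):
        if v_rgb is not None and v_z < c_z:
            out.append(v_rgb)   # virtual object in front of the real scene
        else:
            out.append(c_rgb)   # real scene occludes, or no virtual pixel
    return out

# 4-pixel toy frame: a hand close to the camera should occlude the robot.
cam    = ["wall", "wall", "hand", "hand"]
cam_z  = [4.0,    4.0,    0.5,    0.5]    # metres from the camera
virt   = [None,  "robot", "robot", None]
virt_z = [1e9,    1.0,     1.0,    1e9]
print(composite_with_occlusion(cam, cam_z, virt, virt_z))
# ['wall', 'robot', 'hand', 'hand']
```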

MPEG 3DG Report

ARAF – 3D Audio: local spatialisation

[Diagram: the MAR Experience Creator + Content Creator produce the ARAF file that drives the scene in the ARAF Browser on the mobile device. The camera delivers the video/audio stream; the position & orientation sensor delivers sensed data, which a coordinate-mapping step turns into the user location & direction plus the sound location. The 3D Audio Engine receives the relative sound location, the audio source and (optionally) an acoustic scene, and outputs a spatialized audio source; a mixer combines it with the microphone input into the synthesized audio stream.]
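The coordinate-mapping step (sensed position & orientation in, relative sound location out) can be sketched as a world-to-listener transform. The axis conventions below (x east, y north, heading in degrees clockwise from north) are assumptions for illustration, not something the slides specify.

```python
import math

def world_to_listener(sound_pos, user_pos, user_heading_deg):
    """Map a world-space sound position (x east, y north, metres) into
    listener-relative coordinates for a 3D audio engine.
    Returns (azimuth_deg, distance_m); azimuth 0 = straight ahead,
    positive clockwise, in (-180, 180]."""
    dx = sound_pos[0] - user_pos[0]
    dy = sound_pos[1] - user_pos[1]
    dist = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy))  # 0 = north, CW positive
    azimuth = (bearing - user_heading_deg + 180.0) % 360.0 - 180.0
    return azimuth, dist

# User at the origin facing north; a sound 10 m due east is 90° to the right.
az, d = world_to_listener((10.0, 0.0), (0.0, 0.0), 0.0)
print(round(az), round(d))  # 90 10
```

As the slide on the joint 3D Audio meeting notes, the renderer needs exactly this pairing through an API: user position/orientation on one side, per-object sound locations on the other.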

MPEG 3DG Report

ARAF – 3D Audio: remote spatialisation

[Diagram: the same chain as the local case, but the 3D Audio Engine runs remotely. The ARAF file authored by the MAR Experience Creator + Content Creator additionally carries the Processing Server URL; the ARAF Browser sends the user location & direction and the sound location to a Proxy Server, whose 3D Audio Engine receives the relative sound location, the audio source and (optionally) an acoustic scene, and streams the spatialized audio source back to the device, where a mixer combines it with the microphone input into the synthesized audio stream.]

[Diagram: local audio recognition. The MAR Experience Creator + Content Creator provide the target resources or descriptors. On the mobile device, the ARAF Browser feeds the audio source (microphone/audio URL), with optional detection window, sampling rate and detection delay, into an on-device Audio Detection Library, which returns the ID mask of the recognized targets.]

MPEG 3DG Report

ARAF – Audio recognition: local
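The optional detection window, sampling rate and detection delay parameters suggest a sliding-window matcher. The sketch below is a toy stand-in for a real audio fingerprint (MPEG-7 defines proper audio tools for this): it reduces both streams to a coarse energy envelope and slides a normalized (Pearson) correlation over it. All names and thresholds are illustrative assumptions.

```python
import math

def envelope(samples, rate, window_s=0.05):
    """Coarse energy envelope: mean absolute amplitude per window.
    A toy stand-in for a real audio fingerprint."""
    step = max(1, int(rate * window_s))
    return [sum(abs(s) for s in samples[i:i + step]) / step
            for i in range(0, len(samples) - step + 1, step)]

def detect(stream, target, rate, threshold=0.95):
    """Slide the target envelope over the stream envelope and report the
    window offset with the best normalized correlation, or None."""
    s_env, t_env = envelope(stream, rate), envelope(target, rate)
    n = len(t_env)
    t_mean = sum(t_env) / n
    t_c = [v - t_mean for v in t_env]
    t_norm = math.sqrt(sum(v * v for v in t_c))
    best_off, best_score = None, 0.0
    for off in range(len(s_env) - n + 1):
        seg = s_env[off:off + n]
        m = sum(seg) / n
        seg_c = [v - m for v in seg]
        den = math.sqrt(sum(v * v for v in seg_c)) * t_norm
        score = (sum(a * b for a, b in zip(seg_c, t_c)) / den) if den else 0.0
        if score > best_score:
            best_off, best_score = off, score
    return best_off if best_score >= threshold else None

rate = 1000  # toy 1 kHz sampling rate
# Target: a 440 Hz tone with a rising amplitude ramp, 200 samples long.
target = [(0.2 + 0.8 * t / 200) * math.sin(2 * math.pi * 440 * t / rate)
          for t in range(200)]
silence = [0.001] * 300
stream = silence + target + silence  # the target embedded in the stream
print(detect(stream, target, rate))  # 6  (target starts at window 6)
```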

[Diagram: audio recognition through a Proxy Server. The scene carries the URL of the Processing Server; the ARAF Browser forwards the microphone/audio stream together with the target resources or descriptors + IDs and the optional detection window, sampling rate and detection delay. The Audio Detection Library on the server computes the ID mask and returns it to the browser.]

MPEG 3DG Report

ARAF – Audio recognition: remote (proxy server)

[Diagram: audio recognition with local descriptor extraction. The ARAF Browser extracts descriptors from the microphone/audio stream on the device and sends only the descriptors to the Processing Server, together with the target resources or descriptors + IDs and the optional detection window, sampling rate and detection delay. The server's Audio Detection Library computes the ID mask and returns it.]

MPEG 3DG Report

ARAF – Audio recognition: remote (processing server)

MPEG 3DG Report

ARAF – joint meeting with 3D Audio

Spatialisation

• The 3D audio renderer needs an API to get the user position and orientation

• It may be more complex to update in real time the position and orientation of all the acoustic objects

Recognition

• MPEG-7 has several tools for audio fingerprinting

• Investigate the ongoing work on “Audio synchronisation” and check whether it is suitable for AR
