mpeg araf tutorial @ ismar 2014
DESCRIPTION
A set of slides introducing ARAF - An MPEG standard for Mixed Reality
TRANSCRIPT
MPEG for Augmented Reality
ISMAR, September 9, 2014, Munich
AR Standards Community Meeting September 12, 2014
Marius Preda, MPEG 3DG Chair
Institut Mines TELECOM
http://www.slideshare.net/MariusPreda/mpeg-augmented-reality-tutorial
What you will learn today
• Who MPEG is and why MPEG is doing AR
• MPEG ARAF design principles and the main features
• Create ARAF experiences: two exercises
Tidy City
Portal Hunt
Elements
ARQuiz
Augmented Books
Event LOOV
Available on AppStore, AndroidStores and MyMultimediaWorld.com
• Collecting virtual money in the real world to buy real services and products
Summer School (1 week) Games
What do these "games" have in common?
Based on MPEG ARAF
Augmented Reality Application Format
Why MPEG AR?
MPEG Augmented Reality
Answers to (some of) Christine’s (non-technical) questions
• Who is MPEG?
• What does MPEG do successfully?
• Who are the members?
• IPR policy
What is MPEG?
A suite of ~130 ISO/IEC standards for:
• Coding/compression of elementary media:
– Audio (MPEG-1, 2 and 4), Video (MPEG-1, 2 and 4), 2D/3D graphics (MPEG-4)
• Transport:
– MPEG-2 Transport, File Format, Dynamic Adaptive Streaming over HTTP (DASH)
• Hybrid (natural & synthetic) scene description, user interaction (MPEG-4)
• Metadata (MPEG-7)
• Media management and protection (MPEG-21)
• Sensors and actuators, Virtual Worlds (MPEG-V)
• Advanced User interaction (MPEG-U)
• Media-oriented middleware (MPEG-M)
More ISO/IEC standards under development for:
• Coding and Delivery in Heterogeneous Environments (incl.)
• 3D Video
• …
• A standardization activity continuing for 25 years,
– Supported by several hundred companies/organisations from ~25 countries
– ~500 experts participating in quarterly meetings
– More than 2300 active contributors
– Many thousands of experts working in companies
• A proven manner to organize the work to deliver useful and used standards
– Developing standards by integrating individual technologies
– Well defined procedures
– Subgroups with clear objectives
– Ad hoc groups continuing coordinated work between meetings
• MPEG standards are widely referenced by industry
– 3GPP, ARIB, ATSC, DVB, DVD-Forum, BDA, ETSI, SCTE, TIA, DLNA, DECE, OIPF…
• Billions of software and hardware devices built on MPEG technologies
– MP3 players, cameras, mobile handsets, PCs, DVD/Blu-ray players, STBs, TVs, …
• Business friendly IPR policy established at ISO level
What is MPEG?
MPEG technologies related to AR: 1st pillar
• 1992/4: MPEG-1/2 (AV content)
• 1997: VRML
• 1998: MPEG-4 v.1
– Part 11 - BIFS: binarisation of VRML, extensions for streaming, extensions for server command, extensions for 2D graphics, real time augmentation with audio & video
– Part 2 - Visual: 3D mesh compression, face animation
• 1999: MPEG-4 v.2
– Part 2 - Visual: body animation
First form of broadcast signal augmentation
• 2003: MPEG-4 Part 16 - AFX: a rich set of 3D graphics tools; compression of geometry, appearance, animation
• 2005: MPEG-4 AFX 2nd Edition: animation by morphing, multi-texturing
• 2007: MPEG-4 AFX 3rd Edition: WSS for terrain and cities, frame based animation
• 2011: MPEG-4 AFX 4th Edition: scalable complexity mesh coding
A rich set of Scene and Graphics representation and compression tools
MPEG technologies related to AR: 2nd pillar
• 2011: MPEG-V - Media Context and Control, 1st Edition: sensors and actuators, interoperability between Virtual Worlds
• 2012: MPEG-U – Advanced User Interface
• 2013: MPEG-V 2nd Edition: GPS, biosensors, 3D camera
• 2014: MPEG-H (3D Video: compression of video + depth; 3D Audio)
• 201x: CDVS (feature-point based descriptors for image recognition)
A rich set of Sensors and Actuators
MPEG-V – Media Context and Control
Actuators
Light
Flash
Heating
Cooling
Wind
Vibration
Sprayer
Scent
Fog
Color correction
Initialize color correction parameter
Rigid body motion
Tactile
Kinesthetic
Global position command
Sensors
Light
Ambient noise
Temperature
Humidity
Distance
Atmospheric pressure
Position
Velocity
Acceleration
Orientation
Angular velocity
Angular acceleration
Force
Torque
Pressure
Motion
Intelligent camera type
Multi Interaction point
Gaze tracking
Wind
Dust
Body height
Body weight
Body temperature
Body fat
Blood type
Blood pressure
Blood sugar
Blood oxygen
Heart rate
Electrograph
EEG, ECG, EMG, EOG, GSR
Weather
Facial expression
Facial morphology
Facial expression characteristics
Geomagnetic
Global position
Altitude
Bend
Gas
• All AR-related data is available from MPEG standards
• Real time composition of synthetic and natural objects
• Access to
– Remotely/locally stored scene/compressed 2D/3D mesh objects
– Streamed real-time scene/compressed 2D/3D mesh objects
• Inherent object scalability (e.g. for streaming)
• User interaction & server generated scene changes
• Physical context
– Captured by a broad range of standard sensors
– Affected by a broad range of standard actuators
Main features of MPEG AR technologies
MPEG vision on AR
[Diagram labels: MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V; Authoring Tool; Compression; Produce; ARAF; Download; MPEG Player]
MPEG vision on AR
[Diagram labels: MPEG-4/MPEG-7/MPEG-21/MPEG-U/MPEG-V; Authoring Tool; Compression; Produce; ARAF; Download; ARAF Browser]
End to end chain
[Diagram labels: MPEG ARAF; Authoring Tools; Media Servers; Service Servers; ARAF Browser; User; Local Sensors & Actuators; Remote Sensors & Actuators; Local Real World Environment; Remote Real World Environment]
• A set of scene graph nodes/protos as defined in MPEG-4 Part 11
– Existing nodes: audio, image, video, graphics, programming, communication, user interactivity, animation
– New standard PROTOs: Map, MapMarker, Overlay, Local & Remote Recognition, Local & Remote Registration, CameraCalibration, AugmentedRegion, Point of Interest
• Connection to sensors and actuators as defined in MPEG-V
– Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude
– Local and/or remote camera sensor
– Flash, Heating, Cooling, Wind, Sprayer, Scent, Fog, RigidBodyMotion, Kinesthetic
• Compressed media
Three main components: scene, sensors/actuators, media
MPEG-A Part 13 ARAF
Scene: 73 XML Elements
Documentation available online:
http://wg11.sc29.org/augmentedReality/
Event LOOV: what does it look like?
Exercises
AR Quiz, Augmented Book
http://youtu.be/LXZUbAFPP-Y
http://youtu.be/la-Oez0aaHE
AR Quiz setting: preparing the media
images, video, audio, 2D/3D assets
GPS location
AR Quiz XML inspection
http://tiny.cc/MPEGARQuiz
AR Quiz Authoring Tool
www.MyMultimediaWorld.com: go to Create / Augmented Reality
Augmented Book setting
images, audio
Augmented Book XML inspection
http://tiny.cc/MPEGAugBook
Augmented Book Authoring Tool
www.MyMultimediaWorld.com: go to Create / Augmented Books
• ARAF Browser is Open Source
– iOS, Android, WS, Linux
– distributed at www.MyMultimediaWorld.com
• ARAF V1 published early 2014
• ARAF V2 in progress
– Visual Search (client side and server side)
– 3D Video, 3D Audio
– Connection to Social Networks
– Connection to POI servers
Conclusions
• Other slides that may help
MPEG 3DG Report
ARAF 2nd Edition
ARAF 2nd Edition, items under discussion
1. Local vs Remote recognition and tracking
2. Social Networks
3. 3D video
4. 3D audio
Server side object recognition: a real system*
[Diagram: client/server recognition pipeline]
• Client: query image → [Detection] key points → [Extraction] descriptors → HTTP POST (binary descriptors + key points)
• Server: decode → match query descriptors against DB descriptors → ID → corresponding information → HTTP Response (data as string, or error/no message)
• Client: decode, parse and display the answer
• [DB]: descriptors, images and information
* Wine recognizer: GooT and IMT
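The server-side matching step can be sketched as follows. This is only an illustration under stated assumptions, not the code of the wine recognizer: it assumes binary descriptors compared by Hamming distance, and the names `hamming` and `match_query`, the thresholds and the toy database are all hypothetical.

```python
from typing import Dict, List, Optional


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_query(query: List[bytes],
                db: Dict[str, List[bytes]],
                max_dist: int = 8,        # assumed threshold, in bits
                min_matches: int = 2) -> Optional[str]:
    """Return the ID of the DB object with the most close descriptors, or None."""
    best_id, best_count = None, 0
    for obj_id, descriptors in db.items():
        if not descriptors:
            continue
        # Count query descriptors that have a close enough DB descriptor.
        count = sum(
            1 for q in query
            if min(hamming(q, d) for d in descriptors) <= max_dist
        )
        if count > best_count:
            best_id, best_count = obj_id, count
    return best_id if best_count >= min_matches else None


# Toy database: object ID -> binary descriptors (4 bytes each, for brevity).
DB = {
    "wine_A": [bytes([0x00, 0x00, 0x00, 0x00]), bytes([0xFF, 0x00, 0xFF, 0x00])],
    "wine_B": [bytes([0xAA, 0xAA, 0xAA, 0xAA]), bytes([0x0F, 0x0F, 0x0F, 0x0F])],
}

# A query whose descriptors differ from those of wine_A by one bit each.
QUERY = [bytes([0x01, 0x00, 0x00, 0x00]), bytes([0xFF, 0x00, 0xFF, 0x01])]
result = match_query(QUERY, DB)  # ID to look up the corresponding information
```

When no object collects enough matches the function returns `None`, which corresponds to the "error/no message" branch of the response.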
Server side object recognition: ARAF version
[Diagram: server side recognition with the ARAF Browser]
• MAR Experience Creator + Content Creator: MAR Scene with Source (video URL), optional recognition region, Processing Server URLs
• End-user Device: ARAF Browser, video source (video stream)
• Exchanged between the ARAF Browser and the Processing Servers: video stream, media data, binary (base64) key points + descriptors
• Processing Servers: Detection Libraries, Image Recognition Libraries, ORB, Large Image DB, Corresponding media DB
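One possible way to carry "binary (base64) key points + descriptors" in a request is sketched below. The byte layout (a count followed by x, y and a 32-byte descriptor per key point) is an assumption made for this example and is not taken from the ARAF specification; `encode_features` and `decode_features` are hypothetical names.

```python
import base64
import struct
from typing import List, Tuple

KeyPoint = Tuple[float, float]  # (x, y) position in pixels
DESC_LEN = 32                   # bytes per descriptor (ORB descriptors are 256 bits)


def encode_features(points: List[KeyPoint], descs: List[bytes]) -> str:
    """Pack key points and descriptors into bytes, then base64-encode them."""
    if len(points) != len(descs):
        raise ValueError("one descriptor is expected per key point")
    payload = struct.pack("<I", len(points))        # assumed header: count
    for (x, y), d in zip(points, descs):
        if len(d) != DESC_LEN:
            raise ValueError("unexpected descriptor length")
        payload += struct.pack("<ff", x, y) + d     # assumed record layout
    return base64.b64encode(payload).decode("ascii")


def decode_features(text: str) -> Tuple[List[KeyPoint], List[bytes]]:
    """Inverse of encode_features, as a processing server might run it."""
    raw = base64.b64decode(text)
    (count,) = struct.unpack_from("<I", raw, 0)
    offset = 4
    points: List[KeyPoint] = []
    descs: List[bytes] = []
    for _ in range(count):
        x, y = struct.unpack_from("<ff", raw, offset)
        offset += 8
        descs.append(raw[offset:offset + DESC_LEN])
        offset += DESC_LEN
        points.append((x, y))
    return points, descs


# Round trip on two made-up key points.
pts = [(10.5, 20.25), (3.0, 4.0)]
dsc = [bytes(range(32)), bytes([255] * 32)]
text = encode_features(pts, dsc)
decoded = decode_features(text)
```

Sending descriptors instead of the full image keeps the request small, which is one side of the "full image or descriptors" choice.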
Discussions on:
- Does the content creator specify the form of the request (full image or descriptors), or will the browser make the best decision?
- Is the server’s answer formalized in ARAF?
ARAF – Social Network Data in ARAF scene
Scenario: display posts from SN in a geo-localized manner
ARAF can do this directly by programming the access to the SN service at the scene level
At minimum: user login to the SN; at maximum: the MPEG UD
Connect to a UD server to get all the necessary data
Two categories of “SNS Data”
– Static data
• Name, photo, email, phone number, address, sex, interest, …
– Social Network related activity
• Reported location, SNS post title, SNS text, SNS media, SNS media
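The geo-localized display scenario can be illustrated with a small sketch that keeps only the posts reported near the user. The `SnsPost` fields mirror the activity items listed above, but the class, the function names and the sample coordinates are assumptions for this example; how the data is actually fetched from a UD server is not shown.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SnsPost:
    """Hypothetical container for social-network activity data."""
    title: str
    text: str
    lat: float  # reported location, degrees
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def posts_near(posts: List[SnsPost], lat: float, lon: float,
               radius_m: float) -> List[SnsPost]:
    """Keep the posts whose reported location lies within radius_m of the user."""
    return [p for p in posts if haversine_m(lat, lon, p.lat, p.lon) <= radius_m]


# Made-up posts: one near the user, one several hundred kilometres away.
posts = [
    SnsPost("Nearby", "Hello from around the corner", 48.1374, 11.5755),
    SnsPost("Far away", "Hello from another city", 48.8566, 2.3522),
]
visible = posts_near(posts, 48.1375, 11.5756, radius_m=500.0)
```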
ARAF – Social Network scenario
Obtained from the UD server
ARAF 2nd Edition – introducing 3D Video
Modeling of 3 AR classes for 3D video:
1. Pre-created 3D model of the environment, using visual search and other sensors to obtain camera position and orientation; 3D video used to handle occlusions
2. No a priori 3D model of the scene; depth captured in real time and used to handle occlusions at the rendering step
3. No a priori model of the scene, but one is created during the AR experience (SLAM – Simultaneous Localization and Mapping)
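For the second class, occlusion handling at the rendering step amounts to a per-pixel depth test between the captured depth and the depth of the virtual object. The sketch below is a minimal illustration with plain Python lists; the function name and the data layout are assumptions, and a real renderer would do this on the GPU.

```python
from typing import List, Optional

Colour = str  # stand-in for a pixel colour, to keep the example readable


def composite(real_rgb: List[List[Colour]],
              real_depth: List[List[float]],
              virt_rgb: List[List[Optional[Colour]]],
              virt_depth: List[List[Optional[float]]]) -> List[List[Colour]]:
    """Draw a virtual pixel only where it is closer to the camera than the real scene."""
    out: List[List[Colour]] = []
    for y, real_row in enumerate(real_rgb):
        row: List[Colour] = []
        for x, real_pixel in enumerate(real_row):
            vd = virt_depth[y][x]
            vc = virt_rgb[y][x]
            if vd is not None and vc is not None and vd < real_depth[y][x]:
                row.append(vc)          # virtual object is in front
            else:
                row.append(real_pixel)  # real surface occludes it, or no object
        out.append(row)
    return out


# One image row: the virtual object covers the first two pixels at 1.5 m.
frame = composite(
    real_rgb=[["R", "R", "R"]],
    real_depth=[[2.0, 1.0, 2.0]],    # the middle real surface is closer (1.0 m)
    virt_rgb=[["V", "V", None]],
    virt_depth=[[1.5, 1.5, None]],
)
```

In this example the middle pixel keeps the real image because the real surface at 1.0 m hides the virtual object at 1.5 m.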
ARAF – introducing 3D Audio
• Spatialisation
• Recognition: use sounds from the real world to trigger events in an AR scene
ARAF – 3DAudio: local spatialisation
[Diagram: local spatialisation on the mobile device]
• MAR Experience Creator + Content Creator: ARAF file (Scene)
• Mobile device: Camera and Microphone (video/audio stream), Position & orientation sensor (sensed data)
• ARAF Browser: coordination mapping; user location & direction + sound location
• 3D Audio Engine: relative sound location + (acoustic scene) + audio source → spatialized audio source
• Mixer: synthesized audio stream
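The "relative sound location" given to the 3D Audio Engine can be derived from the user location and direction plus the sound location. The sketch below assumes a flat local frame in metres (x east, y north) and a compass heading in degrees; the function name and these conventions are assumptions, not part of ARAF.

```python
import math
from typing import Tuple


def relative_sound_location(user_xy: Tuple[float, float],
                            user_heading_deg: float,
                            sound_xy: Tuple[float, float]) -> Tuple[float, float]:
    """Return (distance in metres, azimuth in degrees) of a sound for a listener.

    Azimuth is relative to where the user faces: 0 = straight ahead,
    positive = to the right, in the range [-180, 180).
    """
    dx = sound_xy[0] - user_xy[0]  # east offset
    dy = sound_xy[1] - user_xy[1]  # north offset
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dx, dy))  # 0 = north, clockwise positive
    azimuth = (bearing - user_heading_deg + 180.0) % 360.0 - 180.0
    return distance, azimuth


# User at the origin facing north, sound source 10 m to the east.
dist, az = relative_sound_location((0.0, 0.0), 0.0, (10.0, 0.0))
```

Here the sound is 10 m away at an azimuth of about 90 degrees, i.e. on the user's right; turning the user towards the east would bring the azimuth to about 0.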
ARAF – 3DAudio: remote spatialisation
[Diagram: remote spatialisation on a Proxy Server]
• MAR Experience Creator + Content Creator: ARAF file (Scene) with the Processing Server URL
• Mobile device: Camera and Microphone (video/audio stream), Position & orientation sensor (sensed data)
• ARAF Browser: coordination mapping; user location & direction + sound location
• Proxy Server: Detection Libraries, 3D Audio Engine (relative sound location + audio source + (acoustic scene) → spatialized audio source)
• Mixer: synthesized audio stream
ARAF – Audio recognition: local
[Diagram: audio recognition on the mobile device]
• MAR Experience Creator + Content Creator: Scene with Target Resources or descriptors and Source (microphone/audio URL); optional: detection window, sampling rate, detection delay
• Mobile device: microphone/audio stream as audio source
• ARAF Browser: Audio Detection Library (Detection Libraries), Target Resources, ID Mask
ARAF – Audio recognition: local
[Diagram: audio recognition through a Proxy Server]
• MAR Experience Creator + Content Creator: Scene with Target Resources or descriptors, Source (microphone/audio URL) and URL of Processing Server; optional: detection window, sampling rate, detection delay
• Mobile device: microphone/audio stream as audio source
• ARAF Browser to Proxy Server: Target Resources or descriptors + IDs + optional detection window, sampling rate, detection delay
• Proxy Server: Audio Detection Library (Detection Libraries), ID Mask
ARAF – Audio recognition: local
[Diagram: audio recognition through a Processing Server, with Descriptor Extraction]
• MAR Experience Creator + Content Creator: Scene with Target Resources or descriptors, Source (microphone/audio URL) and URL of Processing Server; optional: detection window, sampling rate, detection delay
• Mobile device: microphone/audio stream as audio source
• ARAF Browser: Descriptor Extraction on the microphone/audio stream (Descriptors)
• ARAF Browser to Processing Server: Target Resources or descriptors + IDs + optional detection window, sampling rate, detection delay
• Processing Server: Audio Detection Library (Detection Libraries), ID Mask
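The optional parameters (detection window, sampling rate, detection delay) suggest a windowed detection loop over the audio stream. The sketch below assumes that the detection window is the length of audio analysed at once and that the detection delay is the time between two detection attempts; both readings, the function names and the toy energy detector are assumptions, not definitions from ARAF.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Detector = Callable[[Sequence[float]], Optional[str]]


def detect_over_stream(samples: Sequence[float],
                       sampling_rate: int,       # samples per second
                       detection_window: float,  # seconds analysed per attempt
                       detection_delay: float,   # seconds between attempts
                       detect: Detector) -> List[Tuple[float, str]]:
    """Run a detector on successive windows; return (time in s, target ID) events."""
    win = int(detection_window * sampling_rate)
    hop = int(detection_delay * sampling_rate)
    if win <= 0 or hop <= 0:
        raise ValueError("window and delay must cover at least one sample")
    events: List[Tuple[float, str]] = []
    for start in range(0, len(samples) - win + 1, hop):
        target_id = detect(samples[start:start + win])
        if target_id is not None:
            events.append((start / sampling_rate, target_id))
    return events


def loud_sound_detector(window: Sequence[float]) -> Optional[str]:
    """Toy stand-in for a detection library: reports 'clap' on a loud window."""
    energy = sum(abs(s) for s in window) / len(window)
    return "clap" if energy > 0.5 else None


# Four seconds of audio at 10 Hz, with a loud burst during the third second.
stream = [0.0] * 20 + [1.0] * 10 + [0.0] * 10
events = detect_over_stream(stream, sampling_rate=10, detection_window=1.0,
                            detection_delay=1.0, detect=loud_sound_detector)
```

Each event could then be mapped to the ID of the matching target resource in order to trigger the corresponding behaviour in the scene.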
ARAF – joint meeting with 3DAudio
Spatialisation
• The 3D audio renderer needs an API to get the user position and orientation
• It may be more complex to update in real time the position and orientation of all the acoustic objects
Recognition
• MPEG-7 has several tools for audio fingerprinting
• Investigate the ongoing work on “Audio synchronisation” and check if it is suitable for AR