course outline - electrical & computer engineering dept...
EE8108
Multimedia Processing & Communications
Course Instructor:
Prof. Ling Guan
Department of Electrical & Computer Engineering
Room 315, ENG Building
Tel: (416)979-5000 ext 6072
Email: [email protected]
Participating Instructor:
Dr. Yifeng He
1/22/2018
Course Outline
Introduction and the MPEG standards
Introduction to statistical pattern recognition & neural networks
Multimodal information fusion
• Fusion: Why and How?
• Data/Feature level
• Interaction level
• Score/Decision level
Media indexing and retrieval
• Past, present and future
• Content‐based retrieval (CBR)
• Metasearch engines
Course Outline (2)
Human‐signature recognition
• Overview
• Human body movement analysis and recognition
• Human emotion recognition
• Human hand gesture recognition
Media transmission
• Distributed visual media communications
• Dynamic resource allocation in media transmission
Extra material (If time permits)
• Multimedia in immersive environment
Lectures and Assessment
Lecture time: 3 hours/week from week 2 to week 11 (including a one‐week break)
Assessment
• Project 60%
– Presentation 10%
– Report 50%
• In‐class test (Test 1) 20%
• Final test (Test 2) 20%
Project
• Choose your own topic
• Speak to me if you cannot find a suitable topic
• Submit your topic and a one‐page proposal after the Reading Week
• Presentation time: Week 14 class time
• Report due: to be determined
Tests
• Test 1: week 8, the week after reading week (1 hour, in classroom)
• Test 2: week 13 (1 hour, in classroom)
Teaching Material
Lecture notes will be available at the course website. Check your EE8108 D2L
References
Multimedia Image and Video Processing, edited by L. Guan, Y. He and S.‐Y. Kung, CRC Press, 2012, 2nd edition
IEEE Transactions on Multimedia
ACM Multimedia
Other IEEE/ACM Transactions (talk to me if you need more information)
Proceedings IEEE Int. Conf. on Multimedia and Expo (ICME)
Proceedings ACM Multimedia Conference
Project Requirement
You are required to work on a technical topic, either chosen by yourself in consultation with the instructor, or provided by the instructor. You are encouraged to choose your own topic.
The topic of your project could be one of the following:
– comparison of two or more methods you found in the literature
– further development/analysis of an existing method/idea
– novel approach/technique, analysis or algorithm
You may use any programming language of your choice (MATLAB, C/C++, etc.).
Project Requirement (2)
You are required to demonstrate that your system and/or algorithm works as described in your report/presentation. Ideally, you will give the demonstration during your presentation.
You are encouraged to work in a team of two students.
A Note on Academic Integrity
• Please familiarize yourself with Ryerson’s regulations on academic integrity by
– Reading Ryerson SENATE POLICY 60: ACADEMIC INTEGRITY, pages 1‐4, and acting accordingly: http://www.ryerson.ca/senate/policies/pol60.pdf
– Attending the mandatory departmental graduate seminar series which is offered every semester, covering research methods, research writing, library, ethics and integrity.
Introduction and the MPEG Standards
What is Multimedia?
What is multimedia?
o A brief history of multimedia is available at http://people.ucalgary.ca/~edtech/688/hist.htm
What is multimedia processing & communications (MMPC)?
What impact has signal processing brought to multimedia technology?
Where are the multimedia technologies taking us?
…?
Are These the Answers?
Multimedia is a multi‐faceted domain
It is easy to define each facet individually, but challenging to consider them as a combined whole
Coherent integration of media contents obtained from different sources/sensors
Humans are natural and generic multimedia processing machines (human intelligence)
Can we teach computers/machines to do the same via artificial intelligence?
What Are We Sure about MMPC?
It offers a forum for interaction among researchers in several media processing areas
MMPC opens up opportunities for information processing that falls in‐between the domains of traditional areas, such as speech, audio, music, text, graphics, image and video
MMPC brings together the signal processing community with computer, communication and systems engineers
• IEEE Conference on Multimedia & Expo
• ACM Multimedia Conference
• Various IEEE and ACM Transactions and Journals
• …
Current Trend in MMPC
Single media vs. multimedia: about 50% of the research in multimedia is still concerned with single media
Due to the maturity of standards, coding to some extent dictates the direction of research in multimedia
Multiple media vs. multimedia
Real multimedia
Multimedia in immersive environment (VR/AR)
So plenty of room for new research, and your participation and contribution to this important area are very welcome
What can be categorized as MMPC?
Media coding and compression
• Media compression
• Compressed domain processing
• Joint audio‐video coding and processing
Multimedia databases
• Indexing, retrieval, archiving, and management
• Authoring, sharing and editing
• Content recommendation
Digital library
Multimodal information fusion
• Fusibility
• Fusion levels
• Fusion of methodology
Human‐machine interaction and perception
• Content recognition/analysis/synthesis
• Emotion/intention and attention recognition
• Analysis and recognition of human gestures and activities
• Perceptual quality and human factors
What can be categorized as MMPC (2)?
Multimedia communications
• Transport protocols
• QoS control
• Media streaming
• Error concealment and loss recovery
• Rate control and hierarchical coding
• Multimedia cloud computing
Media security and watermarking
Multimedia applications
Standards and related issues
• ITU‐T H‐series standards for a/v communications
• MPEG standards
• JPEG standards
• Convergence of ITU‐T H‐series and MPEG -> H.264
• MHEG, MJPEG, HTML, VRML and more
Why Standards?
Instead of hiding and protecting your inventions, you publicly share your ideas with your colleagues
Standards encourage collaborations of experts to jointly work on a particular topic
Due to increased commercial interest in video communications, the need for image/video compression standards arose
Standardization has proven to be a powerful vehicle for promoting new technology
Competition is very intense
The MPEG Standards
Coding & multimedia standards developed and managed by the Moving Picture Experts Group (MPEG)
MPEG‐1: VCD
MPEG‐2: DVD, HDTV
MPEG‐3:???
MPEG‐4: Content‐based video coding
MPEG‐7: Multimedia indexing and retrieval
MPEG‐21:???
MPEG‐A/B/C/D/E/V/M/U/H/DASH
FTV Standard
For more information on MPEG standards: http://en.wikipedia.org/wiki/Moving_Picture_Experts_Group
The MPEG‐1 Standard
Released in 1992
A standard for the coded representation of
• Moving pictures
• Associated audio
• And their combination
when used for storage and retrieval on digital media with bit rates of up to 1.5 Mbit/s
Typical application – video CD (VCD)
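To get a feel for what the 1.5 Mbit/s ceiling implies, the following back-of-the-envelope sketch estimates the compression ratio MPEG‐1 must reach. The source format (352×240 pixels, 30 frames/s, 4:2:0 sampling at 12 bits per pixel) is an assumption chosen for illustration and is not stated in the slides; the whole 1.5 Mbit/s budget is also charged to video, although in practice part of it carries audio.

```python
def raw_video_bitrate(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate in bit/s."""
    return width * height * bits_per_pixel * fps


# Assumed source format (not from the slides): SIF-sized frames, 4:2:0 sampling.
WIDTH, HEIGHT, FPS = 352, 240, 30
BITS_PER_PIXEL = 12  # 8 bits of luma plus, on average, 4 bits of chroma per pixel

raw = raw_video_bitrate(WIDTH, HEIGHT, FPS, BITS_PER_PIXEL)
target = 1.5e6  # upper bit rate quoted on the slide, in bit/s
ratio = raw / target

print(f"raw: {raw / 1e6:.1f} Mbit/s, required compression: {ratio:.1f}:1")
# prints "raw: 30.4 Mbit/s, required compression: 20.3:1"
```

Under these assumptions the codec has to remove roughly 95% of the raw bits, which is why both spatial and temporal redundancy must be exploited.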
The MPEG‐2 Standard
Released in 1994, still one of the most popular standards
A standard to provide video quality not lower than NTSC/PAL, with target bit rates of 2‐10 Mbit/s
Applications
• Digital cable TV distribution
• Networked database service via ATM
• Digital video tape recorder (VTR)
• Satellite and terrestrial digital broadcasting distribution
It also supports HDTV applications, and so pre‐empted the MPEG‐3 standard
Lost to JPEG‐2000 (MJPEG) in the coding competition for digital cinema in 2002
The MPEG‐4 Standard
First released in 1998, and targeted at content‐based multimedia applications and low bit‐rate video coding.
Algorithms and tools for coding and flexible representation of audio/video to meet the challenges of multimedia applications
It addresses the needs for
• Universal accessibility and robustness in error‐prone environments
• High interactive functionality
• Coding of natural and synthetic data (image/graphics), setting the stage for AR
• Scalable coding
• High compression efficiency
Bit rates
• PSTN – 5‐64 kbit/s
• TV/film – 4 Mbit/s
Ironically, the objective of low bit‐rate video coding was later accomplished by H.264, the convergence of ITU‐T H.263 and MPEG‐2.
The MPEG‐7 Standard
First released in 2001
Official name: Multimedia Content Description Interface
Objective: To allow efficient search for multimedia content using standardized descriptors
The main research issues
• Optimum search engine
• (Content‐based) feature analysis & query design
Unfortunately, the challenges were under‐estimated…
MPEG-7 Architecture
The MPEG-7 Working Group focuses on description interchange (the normative components, shown as shaded blocks in the XM)
The rest is left for open competition from industry and research organizations
[Figure: MPEG-7 Experimental Model (XM) Architecture. Block labels: Media Data, AV File, AV Decoder, Feature Extraction, D/DS, Coding Scheme, MPEG-7 File, Decoding Scheme, Matching and Filtering]
MPEG‐7 Research Issues
Optimal search (retrieval) engine in Internet/wireless multimedia communications
Feature extraction and query design in information retrieval, especially image/video retrieval.
[Figure: Scope of MPEG-7. Block labels: Feature Extraction, Standard Description, Search Engine]
The MPEG‐21 Standard
MPEG for the 21st Century!
Aims at defining a normative open framework for multimedia delivery and consumption, for use by all the players in the delivery and consumption chain.
Provide content creators, producers, distributors and service providers with equal opportunities in the MPEG‐21 enabled open market.
Benefit the content consumer by providing them access to a large variety of content in an interoperable manner
Never got close to this bold objective! Abolished in 2009.
http://www.chiariglione.org/mpeg/standards/mpeg‐21/mpeg‐21.htm
MPEG‐A
• Multimedia Application Formats (MAFs)
– Facilitate the swift development of multimedia applications and services.
– Standardize MAFs for multimedia products and software.
– Stimulate the increased use of MPEG technology through interoperability of different media types.
MPEG‐B
• Systems technologies
• MPEG‐B Part 1: Binary MPEG format for XML
– Standard defining a generic binary format for encoding XML documents
– Relies on schema knowledge shared between encoder and decoder to achieve high compression efficiency.
– Provides fragmentation mechanisms for ensuring transmission flexibility.
MPEG‐E
• Multimedia middleware (M3W)
• Interfaces of audio/video broadcast decoding, processing, and rendering.
• Interfaces of support API (application program interface): interaction with remote services, resource management, component download, fault management, integrity management.
MPEG‐V
• Media context and control
• Provides architecture and specifies associated information between
– virtual worlds (digital content, gaming simulation), and the real world (sensors, vision, rendering, robotics).
• A well‐defined connection between the virtual and the real world for better design methodology and tools.
MPEG‐DASH
• Suite of standards for efficient streaming of multimedia, using existing internet infrastructure.
– Servers, CDNs, as well as proxies, caches
• Support on‐demand and live streaming.
• Supports media segments in the MPEG‐4 file format and in MPEG‐2 Transport Streams.
• Control streaming sessions with DASH client.
• Enable dynamic ad‐insertion and on‐demand content.
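Because the DASH client controls the streaming session, rate adaptation is a client-side decision. The sketch below shows one simple heuristic for it: pick the highest available bitrate that fits within a fraction of the measured throughput. The function name `select_bitrate`, the `safety` factor, and the bitrate ladder are all hypothetical, chosen for illustration; the DASH standard itself does not prescribe an adaptation algorithm.

```python
def select_bitrate(available_kbps, throughput_kbps, safety=0.8):
    """Pick the highest representation that fits within a fraction of the
    measured throughput; fall back to the lowest one if nothing fits."""
    budget = throughput_kbps * safety
    candidates = [b for b in sorted(available_kbps) if b <= budget]
    return candidates[-1] if candidates else min(available_kbps)


# Hypothetical bitrate ladder (kbit/s) advertised by the server.
ladder = [400, 800, 1500, 3000, 6000]

for measured in [600, 2000, 10000, 300]:
    print(measured, "->", select_bitrate(ladder, measured))
```

A real client would also account for buffer occupancy and avoid switching too often; this sketch only covers the throughput-based choice.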
Other MPEG Standards
• MPEG‐C: a suite of video standards that do not fall in other well‐established MPEG standards.
• MPEG‐D: a suite of standards for Audio technologies that do not fall in other MPEG standards
• MPEG‐M: a suite of standards to enable design and implementation of media‐handling value chains.
• MPEG‐U: provides a general purpose technology with innovative functionality that enables its use in heterogeneous scenarios such as broadcast, mobile, home network and web domains.
• MPEG‐H: Suite of standards for heterogeneous environment delivery of audio‐visual information compressed with high efficiency.
FTV Standard
FTV ‐ Free Viewpoint TV
Started in January, 2004
Objective: To achieve an efficient and standard method for coding and view generation of FTV
1st phase standardization
• Standardize the coding part of FTV as Multiview Video Coding (MVC)
• Completed in May, 2009
2nd phase standardization
• Targeted the standardization of 3‐D Video (3DV)
• In progress
MVC Standard
To remove the correlation among multiview video data
Major approaches
• Combining inter‐view and temporal prediction
• Motion compensation
20‐30% bitrate savings compared to simulcast coding using H.264
[Figure: Inter-view prediction structure for MVC]
3DV Standard
Motivation
• Decouple production from coding format
• MVC is optimized only for 2D color video, not for depth information
The main research issues
• Data format
• 3D video coding method
• Intermediate view generation
Current progress
• Define FTV reference model
• Adopt N view + N depth format as FTV Data Unit (FDU)
An Instance
[Figure: “Bullet time” from “The Matrix”. Warner Bros. 2000]
Multimedia Information Retrieval (Driven by Machine Learning)
• Content‐based Image and Video Retrieval
– Low‐level visual features
– Relevance feedback
• Information Fusion for Retrieval
– Combining visual and audio information
– Combining audio/visual and contextual information
• Visual Re‐ranking
• Large‐scale Search
– Data driven approaches
– Annotation and indexing
Research Challenges Ahead
[Figure labels: Computer-Computer Interactions, Semantic Gaps, Human-Computer Interactions]
Large‐scale Visual Indexing, Search and Application
• Large‐scale visual search and analysis is important.
• Efficient feature extraction.
• Suitable data structures
– Bag‐of‐visual‐words
– Tree structure
• Search mechanism: term frequency‐inverse document frequency (TF‐IDF)
• Applications
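The TF‐IDF search mechanism over a bag‐of‐visual‐words representation can be sketched as follows. Each image is assumed to have already been quantized into a list of visual‐word ids (the feature extraction and codebook steps are omitted), and the toy data, function names, and the use of cosine similarity for ranking are illustrative assumptions rather than the method of any particular system.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: non-empty lists of visual-word ids, one list per image.
    Returns one {word: weight} dict per image."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each word once per image
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def cosine(a, b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy database: three images, each a bag of visual-word ids.
images = [[1, 1, 2, 3], [1, 2, 2, 4], [5, 5, 6, 1]]
w = tf_idf(images)
print(cosine(w[0], w[1]), cosine(w[0], w[2]))
```

Word 1 occurs in every image, so its inverse document frequency is zero and it contributes nothing to the ranking; images 0 and 1 are matched through the rarer word 2.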
• Iris recognition
• Fingerprint recognition
• Speech recognition
• Face recognition
• Hand gesture recognition
• Emotion recognition
• Human movement modeling and recognition
• HSR for multimodal HCI and modality fusion
Iris Recognition
• Iris – The most accurate and reliable biometric
• 249 degrees of freedom (DOF) and good discrimination entropy
• Changes little with aging
• Reliably recognizing 9 million with no false positives [Daugman 2002]
• The projection – 1 in 10 billion false positive: more than the population of the planet
• Requiring the complete co‐operation of the people being screened – highly invasive
Fingerprint Recognition
• Also very accurate and inexpensive
• Used extensively by police and in security checks (try crossing the US border today!)
• Artificial fingers made of cheap and readily available gelatin expose a serious flaw
• Highly invasive
Face Recognition
• One of the most actively studied areas in HSR
• Potential applications (commercial and law enforcement):
– Allow access to an ATM
– Control entry to restricted areas
– Recognize people in specific areas (bank, store)
– Retrieve people in a specific database (police)
• The accuracy is not up to the required level, and it is at least semi‐invasive
• The most popular methods in face recognition:
– EigenFaces
– Hidden Markov Model Recognition
– Compressed Sensing
Speech Recognition
• One of the most actively studied areas in HSR
• The modeling is very elegant for the English language: it is based on 50 smallest contrastive phonetic units (phonemes)
• Prosodic and phonetic features
• Hidden Markov Model in Recognition
• Could be very accurate with millions of features and lengthy training
Hand Gesture Recognition
• One of the most studied areas of human body movement
• It has found many applications
• It contributed significantly to computer vision‐based full body human movement recognition
[Figures: marker-based gesture HCI; natural free‐hand HCI (Magic Leap)]
Human Emotion Recognition
• Vocal emotion recognition
• Emotion recognition using visual cues
• Bimodal emotion recognition
• 3D techniques in emotion recognition
• Language, culture and context independence
• Realistic data collection
• It is not a well‐studied field
Human Movement Recognition
Marker/tracker-based approaches (since 1970), requiring
• Extensive hardware setup (lessened recently)
• Significant setup time (lessened recently)
• Invasive in nature
Computer vision-based approaches
• Mainly software based
• 2D vs 3D
• Rigid model vs non-rigid model
• Highly dependent on the state of the art in image processing and computer vision
HSR in Multimodal HCI
Dynamic fusion of multimodal biometrics
• Combine face, fingerprints, emotion, hand gesture, speech, and gait/action
• Dynamically select the level of multimodality
The significance
• The difficulty of simultaneously forging multiple biometrics
• More effective HCI in the design of immersive systems
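As a rough illustration of fusing several biometric modalities, the sketch below combines per-modality match scores by a weighted sum (score‐level fusion) and per-modality accept/reject decisions by majority vote (decision‐level fusion). The scores, weights, and the 0.5 threshold are made-up values, and both rules are common textbook choices rather than the specific methods covered in this course. Because only the modalities present in `scores` are used, the level of multimodality can change from one call to the next.

```python
from collections import Counter


def score_level_fusion(scores, weights):
    """Weighted average of per-modality match scores, each assumed to be
    normalized to [0, 1]. Only modalities present in `scores` are used."""
    total_weight = sum(weights[m] for m in scores)
    return sum(weights[m] * s for m, s in scores.items()) / total_weight


def decision_level_fusion(decisions):
    """Majority vote over per-modality accept (True) / reject (False)
    decisions; a tie is treated as a reject."""
    votes = Counter(decisions.values())
    return votes[True] > votes[False]


# Hypothetical normalized match scores and modality weights.
scores = {"face": 0.62, "fingerprint": 0.91, "speech": 0.40}
weights = {"face": 1.0, "fingerprint": 2.0, "speech": 1.0}

fused = score_level_fusion(scores, weights)
decisions = {m: s >= 0.5 for m, s in scores.items()}
accepted = decision_level_fusion(decisions)
print(f"fused score: {fused:.2f}, majority decision: {accepted}")
```

Score‐level fusion keeps more information than voting on hard decisions, at the cost of requiring the scores to be normalized to a common range first.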
Example Areas Where HCI Plays a Critical Role
• Multimedia information mining
• Media indexing and retrieval
• Media manipulation
• Smart city/smart home
• Security and surveillance
• Learning of special needs
• Bioinformatics
• Creation of lifelike experience
Transmission of Multimedia Data
Coding and compression
Distributed video coding
Multiple‐description coding
Streaming media
Peer‐to‐peer networks
Multi‐path transmission
Resource allocation
Multimedia in the cloud
H.264 PFGS Video Codec
[Figure: H.264 PFGS Encoder (Courtesy Microsoft Research). Block labels: Video, ME, MC, Intra Prediction, Frame Buffer 0, Frame Buffer 1, Loop Filter, DCT, Q, IDCT, Entropy Coding, Bit Plane VLC, Base-Layer, Enhancement-Layer, UEP Channel Coder]
[Figure: the original image and its one‐level DWT]
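A one‐level 2D DWT splits an image into four half-size subbands: a coarse approximation and three detail bands. The sketch below uses the Haar wavelet in its simple averaging form, which is an assumption made for brevity; the slide does not say which wavelet produced the figure. Images are plain lists of rows with even width and height.

```python
def haar_1d(signal):
    """One level of the averaging Haar transform on an even-length sequence.
    Returns (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail


def transpose(matrix):
    return [list(row) for row in zip(*matrix)]


def filter_columns(half):
    """Apply haar_1d down every column; returns (low, high) sub-images."""
    low_cols, high_cols = [], []
    for column in transpose(half):
        approx, detail = haar_1d(column)
        low_cols.append(approx)
        high_cols.append(detail)
    return transpose(low_cols), transpose(high_cols)


def haar_2d_one_level(image):
    """Returns (ll, lh, hl, hh). Naming convention assumed here: the first
    letter is the filter applied along rows, the second along columns."""
    low_rows, high_rows = [], []
    for row in image:
        approx, detail = haar_1d(row)
        low_rows.append(approx)
        high_rows.append(detail)
    ll, lh = filter_columns(low_rows)
    hl, hh = filter_columns(high_rows)
    return ll, lh, hl, hh


# A 4x4 test image made of four flat 2x2 blocks.
image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
ll, lh, hl, hh = haar_2d_one_level(image)
print(ll)  # prints [[10.0, 20.0], [30.0, 40.0]]
```

For this piecewise-flat image all three detail subbands are zero, which is the property compression schemes exploit: most of the energy ends up in the small approximation band.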
Close Relationship Between Content & Traffic
• “Forrest Gump” (Bocheck‐Chang)
Dynamic Resource Allocation System
[Figure: block diagram with labels VBR Streams, Source 1, Source 2, ..., Source N, Buffer/Scheduler, Link, Network, Resource Allocation and Admission Control]
Need to determine
• the renegotiation time
• how much resource to request
[Figure: reserved bandwidth versus time, with renegotiation points marked]
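The two questions on this slide (when to renegotiate, and how much to request) can be made concrete with a toy policy. In the sketch below the demand is estimated with a short moving average, the request adds a fixed headroom, and a renegotiation is triggered only when the new request differs from the current reservation by more than a tolerance. The function name, the parameters `window`, `headroom`, and `tolerance`, and the rate trace are all hypothetical; this is a simple heuristic for illustration, not the allocation scheme presented in the lecture.

```python
def plan_renegotiations(rates, window=3, headroom=0.2, tolerance=0.25):
    """Return a list of (time_index, reserved_bandwidth) renegotiation
    points for a VBR rate trace (same units in and out)."""
    points = []
    reserved = None
    for t in range(len(rates)):
        recent = rates[max(0, t - window + 1): t + 1]
        estimate = sum(recent) / len(recent)   # how much is needed now
        target = estimate * (1 + headroom)     # how much to request
        # When to renegotiate: only if the request moved far enough.
        if reserved is None or abs(target - reserved) > tolerance * reserved:
            reserved = target
            points.append((t, reserved))
    return points


# Hypothetical VBR trace in Mbit/s: a quiet scene, a busy scene, quiet again.
trace = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 1.0, 1.0, 1.0]
points = plan_renegotiations(trace)
for t, bandwidth in points:
    print(f"t={t}: reserve {bandwidth:.1f} Mbit/s")
```

A larger `tolerance` gives fewer renegotiations at the cost of more over- or under-reservation, which is the trade-off a dynamic resource allocation system has to balance.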
Cloud Computing
A model for enabling convenient, on‐demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
Source: National Institute of Standards and Technology (NIST) http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
Other Research Issues
Media security
Data hiding & watermarking
Multimedia for security/surveillance
Multimedia computing in the immersive environment
Wireless multimedia
Multimedia information mining
Hardware design for multimedia
Other Research Issues (2)
Pattern recognition/computer vision
Image processing and analysis
Speech processing/recognition/synthesis
Information and pattern mining
Artificial/Computational intelligence
Bioinformatics
My Perception of Multimedia
Indexing & retrieval: Arguably the core (you get what you want)
Multimodality and fusion: Real multimedia
Human‐computer interaction: User friendly
Coding & transmission: Efficiency & quality in storage & delivery of multimedia data
Immersive 3D: Creation of lifelike experience
Media Security: IP and business consideration
Wireless multimedia: Make it handy
Applications of Multimedia
Healthcare & telemedicine
Life science
Arts and cultural heritage
Digital asset management
Security/Surveillance/Military
Education (distance education)
Business/service on demand
Entertainment (Digital Cinema)
Gaming
Smart universe/city/grid…
Application in Distributed Environment
[Figure labels: Server 1, Server 2, Server 3, Client 1, Client 2, Client 3, Client 4, Backbone Network, External Network]