multimedia data management m - unibo.it · pdf filewith respect to hypertext, an hypermedia...

33
ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA Multimedia Data Management M Second cycle degree programme (LM) in Computer Engineering University of Bologna Multimedia Data and Data Types Classification Home page: http://www-db.disi.unibo.it/courses/MDM/ Electronic version: 1.01.MultimediaDataTypes.pdf Electronic version: 1.01.MultimediaDataTypes-2p.pdf I. Bartolini

Upload: dinhduong

Post on 26-Mar-2018

222 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA

Multimedia

Data Management M

Second cycle degree programme (LM) in Computer Engineering

University of Bologna

Multimedia Data and Data Types Classification

Home page: http://www-db.disi.unibo.it/courses/MDM/

Electronic version: 1.01.MultimediaDataTypes.pdf

Electronic version: 1.01.MultimediaDataTypes-2p.pdf

I. Bartolini

Page 2: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Multimedia (MM) data and applications

Basics on structured data models

Basics on semi-strucutred data models

Basics on unstructured data models

Outline

2 I. Bartolini Multimedia Data Management M

Page 3: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Media (or medium)

A way to distribute and represent information such as

books, newspapers, music, radio news, TV news, etc.

E.g.: text, graphics, images, voice, sound, music, animation,

video, etc.

text sound image graphic video animation

3 I. Bartolini Multimedia Data Management M

Page 4: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Media description

Perception

auditory media (voice, audio, music)

visual media (text, graphics, images, moving images)

Representation

ASCII (text), JPEG (images), MP3 (audio), etc.

Presentation

input: keyboard, mouse, digital camera, scanner

output: paper, monitor, printer, speaker

Storage

disks (floppy, hard, optical), magnetic tapes, CD-ROM, DVD-ROM

Transmission

coaxial cable, optical fiber, satellite

Information exchange

CD, JAZ-Drives, optical fiber

4 I. Bartolini Multimedia Data Management M

Page 5: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Media types (1)

5

discrete

(or static)

continuous

(or time-based)

still images text

graphics

moving

images

sound

animations

created

using a PC captured from

real world

digital

music

5 I. Bartolini Multimedia Data Management M

digital

movie

video

Page 6: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Media types (2)

Represented in term of the dimensions of the space the

data are in:

0-dimensional data: this type of data is the regular, alphanumeric

data (e.g., text)

1-dimensional data: this type of data has one dimension (i.e., time)

of the space imposed into them (e.g., audio)

2-dimensional data: this type of data has two dimensions (i.e., x, y)

of the space imposed into them (e.g., images and graphics)

3-dimensional data: this type of data has tree dimensions

(i.e., x, y, and time) of a space imposed into them

(e.g., video and animation)

6 I. Bartolini Multimedia Data Management M

Page 7: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Multimedia data

Multimedia data: a combination of a number of media objects (i.e., text,

graphics, sound, animation, video, etc.) that must be presented in a

coherent, synchronized way

It must contains at least a discrete and a continuous media

Multimedia system/application:

a system/application that uses both discrete and continuous media

7 I. Bartolini Multimedia Data Management M

Page 8: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Multimedia applications

When a multimedia application involves

the user and includes a navigation structure (i.e., linked elements)

we obtain a Hypermedia system =

(interactive multimedia application + navigation structure)

(= multimedia + user interactivity)

8 I. Bartolini Multimedia Data Management M

Page 9: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Hypermedia (1)

Hypertext :

text that contains links to other texts

it signifies the overcoming of the previous linear constraints of written text

With respect to hypertext, an hypermedia system is not constrained to be

text-based. It can include graphics, images and especially continuous

media such as sound and video

The author Ted Nelson coined both terms in 1963

9

Hypertext

Multimedia

Hypermedia

I. Bartolini Multimedia Data Management M

Page 10: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Hypermedia (2)

Users can interact with digital multimedia in novel ways,

leading to non-linear structures

Linear structures in conventional media

Non-linear structures in digital MM

!Digital multimedia can interact with other sorts of data and

computation, also serving as a user interface to databases

and applications

10 I. Bartolini Multimedia Data Management M

Page 11: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Application domains (1)

An effective and efficient management of MM data is

required in a variety of application domains, including

“general purpose” applications

Web search engines (e.g., Bing, Google, Yahoo, etc.)

E-commerce (where electronic catalogues have to be browsed

and/or searched)

MM digital libraries (images, audio interviews, videos etc.)

Web digital libraries such as Wikipedia, CiteSeer,

Google Scholar, …

Cultural Heritage (literary/artistic encyclopedia, virtual museums,

etc.)

11 I. Bartolini Multimedia Data Management M

Page 12: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Application domains (2)

Edu-tainment (for example, to search in clipart repositories, or to

search and organize personal photo albums in mobile phones or

PDAs)

On line and print advertising

Media object classification

to search for similar logo images for copyright infringement issues

for the detection of pornography images

MM object annotation

which can be based on assigning to a unlabelled object the

semantic keywords associated to the objects most “similar”

to a given one

12 I. Bartolini Multimedia Data Management M

Page 13: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Application domains (3)

“specific” (or domain dependent) applications

Medical DBs (ECG’s, X-rays, Magnetic Resonance Images (MRI))

Biometric systems (fingerprints, faces, handwriting)

Molecular DBs (DNA sequences, proteins)

Scientific DBs (sensor data, e.g., traffic control, surveillance)

Financial DBs (stock prices)

13 I. Bartolini Multimedia Data Management M

Page 14: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Facilitate and improve the “access” to documentary data repositories for

general users, conjunctively exploiting:

low level features (e.g., document keywords) unstructured data

semi-automatically provided annotations semi-structured data

dedicated users manually provided metadata structured data

Let’s go back to our main goal…

Trimotore Fiat G212

Data: 1947

Collezione: Tema di cultura

industriale

Tipologia: Immagine

Aereo, Motore, Ali

Das Cabinet des Dr. Caligari

Data: 1920

Nazione: Germania

Regista: Robert Wiene

Genere: Horror

Espressionismo, Ipnosi,

Sonnambulismo

La Gioconda

Sito: Museo Louvre, Parigi

Secolo: XVI

Autore: Leonardo da Vinci

Periodo: Rinascimento

Data: 1503

Dipinto, Ritratto, Sorriso

Archivio Storico Fiat Cineteca Archivio Artistico

14 14 I. Bartolini Multimedia Data Management M

Page 15: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Make a note of all the different media and combinations of

media you are exposed to in the course of a single day

Figurate some concrete examples of MM applications

(like the ones just illustrated) by separately describing, in

natural language, relevant structured, semi-structured,

and unstructured data

Exercise 1.A

Trimotore Fiat G212

Data: 1947

Collezione: Tema di cultura

industriale

Tipologia: Immagine

Aereo, Motore, Ali

Archivio Storico Fiat

15 15 I. Bartolini Multimedia Data Management M

what else? …

Page 16: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Recall on structured data

Structured data base on a predefined schema able to

describe the content of the document collection

The schema does not change over time

A database (DB) can be seen as a collection of objects representing

some information of interest in a structured way (i.e., through a schema)

A relational database management systems (RDBMS or just DBMS) is a

software system able to “manage” collections of objects which can be

very large (Giga-Tera byte and more) and shared by different applications

in a persistent way (even in presence of faults)

“manage” = obtain, elaborate, maintain, produce, distribute

Examples of DBMSs: Oracle, IBM (DB2 UDB), Microsoft (SQL Server),

Sybase, mySQL, PostgreSQL, InterBase

16 I. Bartolini Multimedia Data Management M

Page 17: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Relations as tables

DBMSs use the relational model (Codd, 1970) to describe the data, that

is the information is organized in tables (“relations”)

The rows of table corresponds to records, while the columns correspond

to attributes (schema)

The language to store/retrieve information from such tables is the

Structured Query Language (SQL)

Example: if we want to create a table with employees records, so that we

can store their employee number, name, age and salary, we can use the

following SQL statement:

create table EMPLOYEE (

empN integer PRIMARY KEY;

name char(50);

age integer;

salary float );

empN name age salary

17 I. Bartolini Multimedia Data Management M

Page 18: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Populating and querying tables

Tables can be populated with the SQL insert command, e.g.,:

We can retrieve information using the select command. E.g., if we want

to find all the employees with salary less that 50000, we use the query:

Select *

From EMPLOYEE

Where salary <= 50000.00

empN name age salary

123 Smith, John 30 38000.00

456 Johnson, Tom 25 55000.00

insert into EMPLOYEE values (

123, ‘Smith, John’, 30, 38000.00);

insert into EMPLOYEE values (

456, ‘Johnson, Tom’, 25, 55000.00);

18 I. Bartolini Multimedia Data Management M

Page 19: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Query execution

In absence of access methods (e.g., an index), the DBMS will perform a

sequential scanning, checking the salary of each and every employee

record against the desired threshold of 50000!!!

To accelerate queries execution, we can create an index (usually a B-tree index, as we will see in few minutes) with the command create index

E.g., to build an index on the employee’s salary, we would issue the SQL

statement:

In general the DBMS relies on an “optimizer” component to decide which

is the more efficient way to execute a given query

sequential vs. index-based evaluation

which index is the most appropriate

create index salIdx on EMPLOYEE(salary)

19 I. Bartolini Multimedia Data Management M

Page 20: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Storage hierarchies

First level is typically main memory or core or RAM

Fast (access time of micro-seconds or faster), small, expensive

Second level (secondary store) is typically magnetic disk

Much slower (5-10 msec. access time), but much larger and cheaper

Database researchers has focused on “large databases” that do not fit in

main memory and thus have to be stored in secondary memory

Secondary store is organized into block (= pages)

The reason is that, accessing data from the disk involves the mechanical

move of the read/write head of the disk above the appropriate track on the

disk

Because these moves (‘seeks’) are slow and expensive, every time we

do a disk read we bring into main memory a whole disk block (of the

order of 1KB - 8 KB)

So, it makes a huge difference of performance if we mange to group

similar data in the same disk blocks!!

20 I. Bartolini Multimedia Data Management M

Page 21: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

B-tree

Access methods, like B-tree, try exactly to achieve good clustering of

data in order to minimize the number of disk-reads

5 8o o

1 3o o 6 7o o 9 12o o

oPr

Data pointer

P

Tree node pointerNull tree pointer

- Balanced tree of order p

- Node: <P1, <K1,Pr1>,

P2, <K2, Pr2>, ...Pq >

q p

21 I. Bartolini Multimedia Data Management M

Page 22: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

B+-tree

B-tree variant

more commonly used than B-tree

K1

Ki

Kq-1

P1

Pi

... Pq

X X X

X K1

Ki < X K

i

Ki-1

Kq-1

X

...

K1

Kq-1

Pri

... Pnext

Ki

...Pr1

Prq-1

data pointer data pointer data pointer

pointer to next leaf

node in tree

Internal node

Leaf node

- Data pointers only at the leaf nodes

- All leaf nodes linked together

° allows ordered access!

22 I. Bartolini Multimedia Data Management M

Page 23: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

What else…

The relational model and SQL provide a large number of

additional features, such as:

the ability to retrieve information from several tables (‘joins’); the

matching is based on values!

the ability to perform aggregate operations (e.g., sums, averages,

etc.)

We just restricted the discussion to the above few features

which are the essential ones for our purposes

For a complete treatment of the topic, please refer the course of the first

cycle degree programme in Computer Engineering “Information System

T” and the course of the second cycle degree programme (LM) in

Computer Engineering “Database and Big Data systems technologies”

23 23 I. Bartolini Multimedia Data Management M

Page 24: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Recall on semi-structured data (1)

Semi-structured data is a form of structured data that does

not conform with the formal structure of data models

associated with relational databases, but nonetheless

contains tags or other markers to separate semantic

elements and enforce hierarchies of records and

fields within the data

Also known as self-describing structure

There is no separation between the data and the schema,

and the amount of structure used depends on the purpose

Among advantages of this model

More “flexible” than the “schema-based” relational one

It can represent the information of some data sources that cannot be

constrained by schema

24 I. Bartolini Multimedia Data Management M

Page 25: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Recall on semi-structured data (2)

Hierarchical or graph-based data models

Among relevant models:

XML,

RDF,

OWL,

CSV

JSON

For a complete treatment of the XML instance, please refer the course

of the first cycle degree programme in Computer Engineering

“Web technologies T”

25 I. Bartolini Multimedia Data Management M

Page 26: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

XML by example

26

XML document example:

Specific languages (e.g., XQuery, XPath) are used for querying

<Article>

<Author>

<FirstName>Bob</FirstName>

<Surname>Smith</Surname>

</Author>

<Abstract>This paper concerns.... </Abstract>

<Section n="1">

<Title>Introduction</Title>

<Para>...

</Section>

</Article>

I. Bartolini Multimedia Data Management M

Page 27: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

XML: from physical to logical representation

27

There is a direct correspondence between the physical representation of

an XML document and its logical representation (or document tree)

<root>

<child>

<subchild>

</subchild>

</child>

<child>

</child>

</root>

root

child child

subchild

I. Bartolini Multimedia Data Management M

Page 28: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Unstructured data

Unstructured data is data without a predefined schema

able to describe them or to assign a specific semantic

It often include text and multimedia content

E.g., e-mail messages, word processing documents, videos,

photos, audio files, presentations, Web pages and many

other kinds of business documents, etc.

Unstructured data represent the stronger data type

(more than 80% of all data)

The course mainly focuses on unstructured (and semi-

structured) data

by exploiting, possibly available structured data (metadata)

28 I. Bartolini Multimedia Data Management M

Page 29: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Why unstructured data are that important? (1)

On the basis of studies conducted in 90’s, users preferred

to receive information by other people rather than using

an information retrieval system

E.g., travel booking, insurance agencies, buying airline and

train tickets

The trend has been reversed in the last 10 years thanks

to the success of Web technologies and Web search

engines

E.g., already in 2004 the 92% of the population considered

the Web a suitable source for the daily retrieve of useful

information

29 I. Bartolini Multimedia Data Management M

Page 30: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Why unstructured data are that important? (2)

Let’s keep in mind that nowadays:

“85% of all stored data is held in unstructured formats”

“80% of business is conducted on unstructured data”

The unstructured data growing quickest than the other

double every 3 months!!

Exploitation of unstructured data could help in business

decision

extracting “value” from each single massive MM collection

extracting possible “correlations” among different types of

data involved in a same context

30 I. Bartolini Multimedia Data Management M

Page 31: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Structured vs. unstructured data: in 1996

31 I. Bartolini Multimedia Data Management M

Page 32: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Structured vs. unstructured data: in 2009

32 I. Bartolini Multimedia Data Management M

Page 33: Multimedia Data Management M - unibo.it · PDF fileWith respect to hypertext, an hypermedia system is not ... There is no separation between the data and the ... I. Bartolini Multimedia

Starting from descriptions in natural language of relevant

structured, semi-structured, and unstructured data

selected for your examples of Exercise 1.A, model the

data according to:

relational model (for structured data), and

XML model (for semi-structured data)

Provide a definition accurate as much as possible of the

low-level features you chose for describing the “content” of

involved MM data (unstructured data)

Exercise 1.B

33 33 I. Bartolini Multimedia Data Management M