ALMA MATER STUDIORUM - UNIVERSITÀ DI BOLOGNA
Multimedia
Data Management M
Second cycle degree programme (LM) in Computer Engineering
University of Bologna
Multimedia Data and Data Types Classification
Home page: http://www-db.disi.unibo.it/courses/MDM/
Electronic version: 1.01.MultimediaDataTypes.pdf
Electronic version: 1.01.MultimediaDataTypes-2p.pdf
I. Bartolini
Outline
Multimedia (MM) data and applications
Basics on structured data models
Basics on semi-structured data models
Basics on unstructured data models
2 I. Bartolini Multimedia Data Management M
Media (or medium)
A way to distribute and represent information such as
books, newspapers, music, radio news, TV news, etc.
E.g.: text, graphics, images, voice, sound, music, animation,
video, etc.
Media description
Perception
auditory media (voice, audio, music)
visual media (text, graphics, images, moving images)
Representation
ASCII (text), JPEG (images), MP3 (audio), etc.
Presentation
input: keyboard, mouse, digital camera, scanner
output: paper, monitor, printer, speaker
Storage
disks (floppy, hard, optical), magnetic tapes, CD-ROM, DVD-ROM
Transmission
coaxial cable, optical fiber, satellite
Information exchange
CD, JAZ-Drives, optical fiber
Media types (1)
discrete (or static): text, still images, graphics
continuous (or time-based): moving images (video, digital movie), sound (digital music), animations
Media can be created using a PC or captured from the real world
Media types (2)
Classified in terms of the dimensions of the space the data live in:
0-dimensional data: regular alphanumeric data (e.g., text)
1-dimensional data: data with one dimension (i.e., time) imposed on them (e.g., audio)
2-dimensional data: data with two dimensions (i.e., x, y) imposed on them (e.g., images and graphics)
3-dimensional data: data with three dimensions (i.e., x, y, and time) imposed on them (e.g., video and animation)
Multimedia data
Multimedia data: a combination of several media objects (e.g., text, graphics, sound, animation, video) that must be presented in a coherent, synchronized way
It must contain at least one discrete and one continuous medium
Multimedia system/application:
a system/application that uses both discrete and continuous media
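The definition above can be turned into a small check. The following is only a sketch: the `MEDIA_KIND` table and the `is_multimedia` function are hypothetical names introduced for illustration, and the discrete/continuous assignment follows the classification given in these slides.

```python
from enum import Enum


class MediaKind(Enum):
    DISCRETE = "discrete"      # static, time-independent media
    CONTINUOUS = "continuous"  # time-based media


# Illustrative lookup table (an assumption, not part of the course material).
MEDIA_KIND = {
    "text": MediaKind.DISCRETE,
    "still image": MediaKind.DISCRETE,
    "graphics": MediaKind.DISCRETE,
    "sound": MediaKind.CONTINUOUS,
    "video": MediaKind.CONTINUOUS,
    "animation": MediaKind.CONTINUOUS,
}


def is_multimedia(media):
    """True if the collection mixes at least one discrete and one continuous medium."""
    kinds = {MEDIA_KIND[m] for m in media}
    return MediaKind.DISCRETE in kinds and MediaKind.CONTINUOUS in kinds


print(is_multimedia(["text", "graphics"]))  # False: discrete media only
print(is_multimedia(["text", "video"]))     # True: one discrete, one continuous
```

Under this definition a slideshow of text and still images is not multimedia, while a captioned video is.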
Multimedia applications
When a multimedia application involves the user (= multimedia + user interactivity)
and includes a navigation structure (i.e., linked elements)
we obtain a hypermedia system =
(interactive multimedia application + navigation structure)
Hypermedia (1)
Hypertext :
text that contains links to other texts
it signifies the overcoming of the previous linear constraints of written text
Compared with hypertext, a hypermedia system is not constrained to be
text-based. It can include graphics, images and especially continuous
media such as sound and video
The author Ted Nelson coined both terms in 1963
[Figure: relationship between hypertext, multimedia, and hypermedia]
Hypermedia (2)
Users can interact with digital multimedia in novel ways,
leading to non-linear structures
Linear structures in conventional media
Non-linear structures in digital MM
Digital multimedia can interact with other sorts of data and
computation, also serving as a user interface to databases
and applications
Application domains (1)
Effective and efficient management of MM data is
required in a variety of application domains, including
“general purpose” applications
Web search engines (e.g., Bing, Google, Yahoo, etc.)
E-commerce (where electronic catalogues have to be browsed
and/or searched)
MM digital libraries (images, audio interviews, videos etc.)
Web digital libraries such as Wikipedia, CiteSeer,
Google Scholar, …
Cultural Heritage (literary/artistic encyclopedia, virtual museums,
etc.)
Application domains (2)
Edutainment (for example, to search clipart repositories, or to
search and organize personal photo albums on mobile phones or
PDAs)
Online and print advertising
Media object classification
to search for similar logo images in copyright infringement cases
to detect pornographic images
…
MM object annotation
which can be based on assigning to an unlabelled object the
semantic keywords associated with the objects most “similar”
to it
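A minimal sketch of similarity-based annotation follows. It assumes that every object is already described by a numeric feature vector and that Euclidean distance measures similarity. The function name `annotate`, the two-dimensional vectors and the keyword lists are all invented for illustration.

```python
import math
from collections import Counter


def annotate(query, labelled, k=2):
    """Propose keywords for `query` from its k nearest labelled objects.

    `labelled` is a list of (feature_vector, keywords) pairs. Keywords are
    returned most frequent first.
    """
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    counts = Counter(kw for _, keywords in nearest for kw in keywords)
    return [kw for kw, _ in counts.most_common()]


# Hypothetical labelled collection: (features, keywords).
labelled = [
    ((0.9, 0.1), ["airplane", "wings"]),
    ((0.8, 0.2), ["airplane", "engine"]),
    ((0.1, 0.9), ["portrait", "smile"]),
]

result = annotate((0.88, 0.12), labelled)
print(result)  # "airplane" comes first: both nearest neighbours carry it
```

The quality of the proposed keywords depends entirely on how well the chosen features and distance capture perceived similarity.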
Application domains (3)
“specific” (or domain dependent) applications
Medical DBs (ECGs, X-rays, Magnetic Resonance Images (MRI))
Biometric systems (fingerprints, faces, handwriting)
Molecular DBs (DNA sequences, proteins)
Scientific DBs (sensor data, e.g., traffic control, surveillance)
Financial DBs (stock prices)
Let’s go back to our main goal…
Facilitate and improve “access” to documentary data repositories for
general users, jointly exploiting:
low-level features (e.g., document keywords): unstructured data
semi-automatically provided annotations: semi-structured data
metadata manually provided by dedicated users: structured data
Archivio Storico Fiat (Fiat historical archive)
Trimotore Fiat G212
Date: 1947
Collection: Industrial culture theme
Type: Image
Airplane, Engine, Wings
Cineteca (film archive)
Das Cabinet des Dr. Caligari
Date: 1920
Country: Germany
Director: Robert Wiene
Genre: Horror
Expressionism, Hypnosis, Somnambulism
Archivio Artistico (art archive)
La Gioconda
Location: Louvre Museum, Paris
Century: 16th
Author: Leonardo da Vinci
Period: Renaissance
Date: 1503
Painting, Portrait, Smile
Exercise 1.A
Make a note of all the different media and combinations of
media you are exposed to in the course of a single day
Devise some concrete examples of MM applications
(like the ones just illustrated) by separately describing, in
natural language, the relevant structured, semi-structured,
and unstructured data
Archivio Storico Fiat (Fiat historical archive)
Trimotore Fiat G212
Date: 1947
Collection: Industrial culture theme
Type: Image
Airplane, Engine, Wings
what else? …
Recall on structured data
Structured data are based on a predefined schema able to
describe the content of the document collection
The schema does not change over time
A database (DB) can be seen as a collection of objects representing
some information of interest in a structured way (i.e., through a schema)
A relational database management system (RDBMS or just DBMS) is a
software system able to “manage” collections of objects which can be
very large (gigabytes, terabytes and more) and shared by different applications
in a persistent way (even in the presence of faults)
“manage” = obtain, elaborate, maintain, produce, distribute
Examples of DBMSs: Oracle, IBM (DB2 UDB), Microsoft (SQL Server),
Sybase, MySQL, PostgreSQL, InterBase
Relations as tables
DBMSs use the relational model (Codd, 1970) to describe the data; that
is, the information is organized in tables (“relations”)
The rows of a table correspond to records, while the columns correspond
to attributes (schema)
The language to store/retrieve information from such tables is the
Structured Query Language (SQL)
Example: to create a table of employee records that stores each
employee's number, name, age and salary, we can use the
following SQL statement:
create table EMPLOYEE (
empN integer PRIMARY KEY,
name char(50),
age integer,
salary float );
empN | name | age | salary
Populating and querying tables
Tables can be populated with the SQL insert command, e.g.:
insert into EMPLOYEE values (
123, 'Smith, John', 30, 38000.00);
insert into EMPLOYEE values (
456, 'Johnson, Tom', 25, 55000.00);
empN | name | age | salary
123 | Smith, John | 30 | 38000.00
456 | Johnson, Tom | 25 | 55000.00
We can retrieve information using the select command. E.g., to find
all the employees with a salary less than 50000, we use the query:
select *
from EMPLOYEE
where salary < 50000.00
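To try these statements without installing a DBMS, one option is Python's built-in sqlite3 module. This is a sketch only: SQLite is used here for convenience and is an assumption of the example (it is not among the DBMSs named in the course). The table and the two records are the ones from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

conn.execute("""
    CREATE TABLE EMPLOYEE (
        empN   INTEGER PRIMARY KEY,
        name   CHAR(50),
        age    INTEGER,
        salary FLOAT
    )
""")

# Populate the table with the two example records.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [(123, "Smith, John", 30, 38000.00), (456, "Johnson, Tom", 25, 55000.00)],
)

# Employees with a salary below 50000.
rows = conn.execute("SELECT * FROM EMPLOYEE WHERE salary < 50000.00").fetchall()
print(rows)  # [(123, 'Smith, John', 30, 38000.0)]
```

The `?` placeholders let the driver handle quoting, which avoids problems with values such as names containing commas or quotes.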
Query execution
In the absence of access methods (e.g., an index), the DBMS performs a
sequential scan, checking the salary of each and every employee
record against the desired threshold of 50000!
To speed up query execution, we can create an index (usually a B-tree index, as we will see shortly) with the command create index
E.g., to build an index on the employees' salary, we would issue the SQL
statement:
create index salIdx on EMPLOYEE(salary)
In general the DBMS relies on an “optimizer” component to decide which
is the most efficient way to execute a given query
sequential vs. index-based evaluation
which index is the most appropriate
…
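The effect of an index on the optimizer's choice can be observed with a sketch like the one below. It assumes SQLite (via Python's sqlite3 module) as a stand-in DBMS. `EXPLAIN QUERY PLAN` is SQLite-specific and the wording of its output varies between versions, so the plans are printed rather than compared against fixed text. The helper `plan` is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (empN INTEGER PRIMARY KEY, name CHAR(50), "
    "age INTEGER, salary FLOAT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [(123, "Smith, John", 30, 38000.00), (456, "Johnson, Tom", 25, 55000.00)],
)

query = "SELECT * FROM EMPLOYEE WHERE salary < 50000.00"


def plan(sql):
    """Return the textual plan steps the SQLite optimizer reports for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(query)  # without an index, expect a full table scan
conn.execute("CREATE INDEX salIdx ON EMPLOYEE(salary)")
plan_after = plan(query)   # the optimizer may now choose salIdx

print("before:", plan_before)
print("after: ", plan_after)

index_names = [
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
rows = conn.execute(query).fetchall()
```

Whichever plan is chosen, the query result is the same. The index changes only how the answer is computed.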
Storage hierarchies
First level is typically main memory (also called core or RAM)
Fast (access time of microseconds or faster), small, expensive
Second level (secondary store) is typically magnetic disk
Much slower (5-10 ms access time), but much larger and cheaper
Database research has focused on “large databases” that do not fit in
main memory and thus have to be stored in secondary memory
Secondary store is organized into blocks (= pages)
The reason is that accessing data on disk involves mechanically
moving the disk's read/write head above the appropriate track
Because these moves (“seeks”) are slow and expensive, every time we
do a disk read we bring into main memory a whole disk block (of the
order of 1 KB - 8 KB)
So, it makes a huge difference in performance if we manage to group
similar data in the same disk blocks!
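A back-of-the-envelope calculation shows how much clustering can matter. The block size and access time are taken from the upper ends of the ranges quoted above. The record size and the number of matching records are assumptions chosen only for illustration.

```python
import math

BLOCK_SIZE = 8 * 1024     # bytes per disk block (upper end of the 1-8 KB range)
ACCESS_TIME_MS = 10       # time per block read (upper end of the 5-10 ms range)
RECORD_SIZE = 100         # bytes per record (assumed)
MATCHING_RECORDS = 1000   # records a query has to fetch (assumed)

records_per_block = BLOCK_SIZE // RECORD_SIZE

# Best case: the matching records are packed into consecutive blocks.
clustered_reads = math.ceil(MATCHING_RECORDS / records_per_block)

# Worst case: every matching record sits in a different block.
scattered_reads = MATCHING_RECORDS

print(f"records per block: {records_per_block}")
print(f"clustered: {clustered_reads} reads, {clustered_reads * ACCESS_TIME_MS} ms")
print(f"scattered: {scattered_reads} reads, {scattered_reads * ACCESS_TIME_MS} ms")
```

With these numbers the clustered layout needs 13 block reads (about 0.13 s) against 1000 reads (about 10 s) for the scattered one.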
B-tree
Access methods such as the B-tree aim precisely at clustering data
well, in order to minimize the number of disk reads
[Figure: example B-tree with root keys 5 and 8 and child nodes (1, 3), (6, 7), (9, 12); legend: data pointer (Pr), tree node pointer (P), null tree pointer]
- Balanced tree of order p
- Node: <P1, <K1, Pr1>,
P2, <K2, Pr2>, ..., Pq>,
with q ≤ p
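The lookup procedure on such a tree can be sketched as follows. This is a simplified in-memory illustration, not a disk-based implementation: data pointers are omitted, `Node` and `search` are hypothetical names, and each visited node stands for one disk block read. The example tree uses the keys shown in the slide's figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted keys K1..Kq-1
    children: list = field(default_factory=list)  # P1..Pq, empty for a leaf


def search(node, key, reads=1):
    """Return (found, nodes_read) for a lookup of `key` starting at `node`."""
    for i, k in enumerate(node.keys):
        if key == k:
            return True, reads
        if key < k:
            break                   # descend into the subtree left of k
    else:
        i = len(node.keys)          # key is larger than every key in the node
    if not node.children:           # reached a leaf without finding the key
        return False, reads
    return search(node.children[i], key, reads + 1)


root = Node([5, 8], [Node([1, 3]), Node([6, 7]), Node([9, 12])])

print(search(root, 7))  # (True, 2): root, then the middle child
print(search(root, 5))  # (True, 1): found in the root itself
print(search(root, 4))  # (False, 2): absent, detected after two node reads
```

Because the tree is balanced, the number of node reads is bounded by its height, which grows only logarithmically with the number of keys.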
B+-tree
B-tree variant
more commonly used than B-tree
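[Figure: B+-tree node structure]
Internal node: <P1, K1, ..., Pi, Ki, ..., Kq-1, Pq>
subtree of P1: X ≤ K1; subtree of Pi: Ki-1 < X ≤ Ki; subtree of Pq: Kq-1 < X
Leaf node: <<K1, Pr1>, ..., <Ki, Pri>, ..., <Kq-1, Prq-1>, Pnext>
each Pr is a data pointer; Pnext is the pointer to the next leaf node in the tree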
- Data pointers only at the leaf nodes
- All leaf nodes linked together
allows ordered access!
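The benefit of linking the leaves can be illustrated with a range query. The sketch below models only the leaf level and a single internal node. It assumes the common convention that the i-th subtree holds the keys X with K(i-1) < X ≤ K(i). The names `Leaf`, `build_leaves` and `range_scan` are hypothetical, and data pointers are left out.

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, keys):
        self.keys = keys   # sorted keys (data pointers omitted)
        self.next = None   # Pnext: link to the next leaf


def build_leaves(groups):
    """Create the leaves and chain them left to right."""
    leaves = [Leaf(list(g)) for g in groups]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(start_leaf, lo, hi):
    """Yield all keys in [lo, hi] by following the leaf chain."""
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys[bisect_left(leaf.keys, lo):]:
            if k > hi:
                return
            yield k
        leaf = leaf.next


leaves = build_leaves([[1, 3], [5, 6, 7], [8, 9, 12]])
separators = [3, 7]  # keys of the single internal node above the leaves

lo, hi = 6, 9
start = leaves[bisect_left(separators, lo)]  # one descent finds the first leaf
result = list(range_scan(start, lo, hi))
print(result)  # [6, 7, 8, 9]
```

After the single descent, the scan never returns to the internal nodes. It simply walks the chain of leaves, which is what makes ordered access cheap.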
What else…
The relational model and SQL provide a large number of
additional features, such as:
the ability to retrieve information from several tables (‘joins’); the
matching is based on values!
the ability to perform aggregate operations (e.g., sums, averages,
etc.)
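A value-based join combined with aggregates might look like the sketch below, again run on SQLite through Python's sqlite3 module. The DEPARTMENT table, the deptN column and the third employee are hypothetical additions made only to have something to join and group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (deptN INTEGER PRIMARY KEY, dname CHAR(30));
    CREATE TABLE EMPLOYEE (empN INTEGER PRIMARY KEY, name CHAR(50),
                           salary FLOAT, deptN INTEGER);
    INSERT INTO DEPARTMENT VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO EMPLOYEE VALUES (123, 'Smith, John', 38000.00, 1),
                                (456, 'Johnson, Tom', 55000.00, 1),
                                (789, 'Brown, Ann', 42000.00, 2);
""")

# Join: rows are matched on equal deptN values, not on stored pointers.
# Aggregates: COUNT and AVG are computed per department.
rows = conn.execute("""
    SELECT d.dname, COUNT(*), AVG(e.salary)
    FROM EMPLOYEE e JOIN DEPARTMENT d ON e.deptN = d.deptN
    GROUP BY d.dname
    ORDER BY d.dname
""").fetchall()
print(rows)  # [('Research', 2, 46500.0), ('Sales', 1, 42000.0)]
```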
We have restricted the discussion to the few features above,
which are the essential ones for our purposes
For a complete treatment of the topic, please refer to the first
cycle degree programme course in Computer Engineering “Information System
T” and to the second cycle degree programme (LM) course in
Computer Engineering “Database and Big Data systems technologies”
Recall on semi-structured data (1)
Semi-structured data is a form of structured data that does
not conform to the formal structure of data models
associated with relational databases, but nonetheless
contains tags or other markers to separate semantic
elements and enforce hierarchies of records and
fields within the data
Also known as self-describing structure
There is no separation between the data and the schema,
and the amount of structure used depends on the purpose
Among the advantages of this model
More “flexible” than the “schema-based” relational one
It can represent the information of some data sources that cannot be
constrained by a schema
Recall on semi-structured data (2)
Hierarchical or graph-based data models
Among relevant models:
XML
RDF
OWL
CSV
JSON
…
For a complete treatment of the XML case, please refer to the first
cycle degree programme course in Computer Engineering
“Web technologies T”
XML by example
XML document example:
<Article>
  <Author>
    <FirstName>Bob</FirstName>
    <Surname>Smith</Surname>
  </Author>
  <Abstract>This paper concerns....</Abstract>
  <Section n="1">
    <Title>Introduction</Title>
    <Para>...</Para>
  </Section>
</Article>
Specific languages (e.g., XQuery, XPath) are used for querying
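As an illustration of path-based querying, the example document can be loaded with Python's standard xml.etree.ElementTree module. Note that this module supports only a limited subset of XPath, not full XPath or XQuery, so the sketch gives just a flavour of such languages. The `<Para>` element is closed here so that the document is well-formed.

```python
import xml.etree.ElementTree as ET

doc = """
<Article>
  <Author>
    <FirstName>Bob</FirstName>
    <Surname>Smith</Surname>
  </Author>
  <Abstract>This paper concerns....</Abstract>
  <Section n="1">
    <Title>Introduction</Title>
    <Para>...</Para>
  </Section>
</Article>
"""

article = ET.fromstring(doc.strip())

# Path expression: the Surname child of the Author child of the root.
surname = article.findtext("Author/Surname")

# Predicate on an attribute: the Title of the Section whose n equals "1".
title = article.find("Section[@n='1']/Title").text

print(surname)  # Smith
print(title)    # Introduction
```

The tags play the role of a schema embedded in the data, so queries navigate the structure by name without any separate table definition.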
XML: from physical to logical representation
There is a direct correspondence between the physical representation of
an XML document and its logical representation (or document tree)
<root>
  <child>
    <subchild>
    …
    </subchild>
  </child>
  <child>
  …
  </child>
</root>
Document tree:
root
  child
    subchild
  child
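The correspondence between the serialized document and its tree can be made concrete with a short recursive traversal. This is a sketch using Python's standard xml.etree.ElementTree module. The function name `outline` is hypothetical, and the elided element contents are represented by literal dots.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the document tree as a list of lines indented by nesting depth."""
    lines = ["  " * depth + element.tag]
    for child in element:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(
    "<root><child><subchild>...</subchild></child><child>...</child></root>"
)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each opening tag becomes a node and each level of nesting becomes one level of depth, so the printed outline has root at depth 0, two child nodes at depth 1, and subchild at depth 2.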
Unstructured data
Unstructured data are data without a predefined schema
able to describe them or to assign them specific semantics
They often include text and multimedia content
E.g., e-mail messages, word processing documents, videos,
photos, audio files, presentations, Web pages and many
other kinds of business documents
Unstructured data are the predominant data type
(more than 80% of all data)
The course mainly focuses on unstructured (and semi-structured) data,
also exploiting structured data (metadata) where available
Why are unstructured data so important? (1)
According to studies conducted in the 1990s, users preferred
to receive information from other people rather than from
an information retrieval system
E.g., travel booking, insurance agencies, buying airline and
train tickets
The trend has reversed in the last 10 years thanks
to the success of Web technologies and Web search
engines
E.g., already in 2004, 92% of the population considered
the Web a suitable source for the daily retrieval of useful
information
Why are unstructured data so important? (2)
Let’s keep in mind that nowadays:
“85% of all stored data is held in unstructured formats”
“80% of business is conducted on unstructured data”
Unstructured data grow faster than other data,
doubling every 3 months!
Exploiting unstructured data could help in business
decisions
extracting “value” from each single massive MM collection
extracting possible “correlations” among the different types of
data involved in the same context
Structured vs. unstructured data: in 1996
Structured vs. unstructured data: in 2009
Exercise 1.B
Starting from the natural-language descriptions of the relevant
structured, semi-structured, and unstructured data
selected for your examples in Exercise 1.A, model the
data according to:
the relational model (for structured data), and
the XML model (for semi-structured data)
Provide a definition, as accurate as possible, of the
low-level features you chose for describing the “content” of
the MM data involved (unstructured data)