INFORMATION THEORY AND CODING
Željko Jeričević, Ph.D.
Department of Computing, Faculty of Engineering
&
Department of Biology and Medical Genetics, Faculty of Medicine
51000 Rijeka, Croatia
Phone: (+385) 51-651 594 E-mail: [email protected]
http://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html
11 February 2012 Zeljko Jericevic, Ph.D. 2
INFORMATION THEORY AND CODING
Lecturer: Željko Jeričević
2-54 651-594 [email protected] http://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html
Overview
• Preliminaries: logarithms, probability, etc.
• Historical overview and notable researchers
• Model of a communication system
• What is information?
• Information transmission, information measures
• Communication channels
• Coding of information
• Compression
• Secure transmission of information
• Information theory in data processing
• Information flow in a biological system
Information theory
“Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Historically, information theory was developed by Claude E. Shannon to find fundamental limits on compressing and reliably storing and communicating data.”
From Wikipedia
Claude E. Shannon (1916-2001)
Information theory
“Since its inception it has broadened to find applications in many other areas, including statistical inference, natural language processing, cryptography generally, networks other than communication networks — as in neurobiology,[1] the evolution[2] and function[3] of molecular codes, model selection[4] in ecology, thermal physics,[5] quantum computing, plagiarism detection[6] and other forms of data analysis.[7]”
From Wikipedia
Information theory history
A constant thread in the history of information technology is the search for faster and better methods of communication, because the hunger for information, and people's willingness to spend money to satisfy it, seems to have no end.
It is no accident that the foundations of information theory were developed at Bell Labs (the research division of the American Telephone & Telegraph Company).
Nor is the interaction between Bell Labs and the Massachusetts Institute of Technology accidental.
Information theory history
Optical telegraph:
1787 Joseph Chudy (1752-1813)
1789 Claude Chappe (1763-1805)
Napoleonic era: 10 km between stations, 20 s per character, 8 min per 210 km
Information theory history
Claude Chappe's optical telegraph on the Litermont near Nalbach, Germany
Information theory history
Construction schematic of a Prussian optical telegraph (or semaphore) tower, c. 1835
Information theory history
1832 Needle telegraph by Paul Schilling (1786-1837)
Information theory history
1833 Carl Friedrich Gauss (1777-1855) & Wilhelm Weber (1804-1891) telegraph
Information theory history
1837 five-needle telegraph
William Fothergill Cooke (1806-1879)
Charles Wheatstone (1802-1875)
Information theory history
Samuel Morse
Information theory
1838: Morse devised a code for transmitting text by telegraph that is still used today in some kinds of signalling.
Morse determined at a printing shop which letters are used most often in English text, and constructed the code so that the most frequently used letters have the shortest codewords.
He achieved a saving of roughly 15% in message length.
Morse devised his code empirically, following principles that David Huffman later defined theoretically.
Information theory history
Samuel F.B. Morse (1791-1872)
The letter e is the most frequently used.
Information theory history
Morse's telegraph, patented in 1837, was adopted as the standard in Europe in 1851 (except in Great Britain).
"What hath God wrought", a message in American Morse code sent by Samuel F. B. Morse to officially open the Baltimore-Washington telegraph line on May 24, 1844
Information theory history
19th-century map showing the early telegraph cables which connected Britain with the rest of the world.
Information theory history
Major telegraph lines in 1891
Information theory
Thomas Edison (1847-1931)
1874: Edison introduced the quadruplex telegraph with four levels of current strength. With this device it was possible to send two messages simultaneously.
It was established empirically that using twice as many symbols (four current levels instead of two) allows twice as many messages to be transmitted.
Information theory
Quadruplex telegraph with four levels of current strength
Current   Message 1   Message 2
+3        on          on
+1        off         on
-1        off         off
-3        on          off
Information theory history
1866 Transatlantic cable
1902 Transpacific cable
On 27 January 2006, Western Union discontinued all telegram and commercial messaging services, though it still offered its money transfer services.
Information theory history
His early theoretical work on determining the bandwidth requirements for transmitting information laid the foundations for later advances by Claude Shannon, which led to the development of information theory. In 1927 Nyquist determined that the number of independent pulses that could be put through a telegraph channel per unit time is limited to twice the bandwidth of the channel. Nyquist published his results in the paper Certain topics in Telegraph Transmission Theory (1928). This rule is essentially a dual of what is now known as the Nyquist–Shannon sampling theorem.
From Wikipedia
Harry Nyquist (1889-1976)
Information theory history
1917: Harry Nyquist starts working at the American Telephone and Telegraph Company after receiving his doctorate from Yale University.
He studies the transmission speed of telegraph signals, and in 1924 is the first to give a quantitative form to the empirical observations about signal transmission speed. He showed that by sending k symbols per second, where each symbol can take one of m different values, we can reach a theoretical transmission rate W:
$$W = k \log_2 m \quad [\mathrm{bit/s}]$$
Information theory history
Nyquist explained theoretically what Edison had achieved empirically with his quadruplex telegraph (m is the number of current levels, log2 m is the factor by which the transmission rate increases).
m     log2 m
1     0
2     1
3     1.6
4     2
8     3
16    4
Information theory
Harry Nyquist studied the frequency components of signals (Fourier transform) and found that transmitting and reconstructing a band-limited signal requires a sampling rate at least twice the highest frequency in the signal, which Shannon later proved (Nyquist-Shannon theorem).
The Nyquist frequency in the Fourier transform is the highest frequency that can be present in a band-limited signal sampled at a given rate, i.e. half the sampling rate.
Information theory history
Ralph Hartley (1888-1970)
Hartley, R.V.L., "Transmission of Information", Bell System Technical Journal, July 1928, pp. 535–563. http://www.dotrose.com/etext/90_Miscellaneous/transmission_of_information_1928b.pdf
Hartley, R.V.L., "A More Symmetrical Fourier Analysis Applied to Transmission Problems," Proc. IRE 30, pp. 144–150 (1942).
Discrete Hartley transform
From Wikipedia
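Information theory
Ralph Hartley studied the problem of communication in terms of a source and a receiver. He defined the quantity H, the information contained in a message of n symbols, each chosen from s possible symbols (the base of the logarithm sets the unit):
$$H = n \log s$$
Nyquist and Hartley were Shannon's predecessors, and their work was important for his.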
Information theory history
Claude Elwood Shannon (April 30, 1916 – February 24, 2001), an American electronic engineer and mathematician, is known as "the father of information theory". Shannon is famous for having founded information theory with one landmark paper published in 1948.
From Wikipedia
http://cm.bell-labs.com/cm/ms/what/shannonday/paper.html
Claude E. Shannon
Information theory history
Claude E. Shannon
Model of a communication channel
Information entropy H
$$H = -\sum_{i=1}^{I} p_i \log_2 p_i, \qquad \lim_{p \to 0} p \log_2 p = 0$$
$p_i$ is the probability of state $i$
Information theory history
Claude E. Shannon
… he is also credited with founding both digital computer and digital circuit design theory in 1937, when, as a 21-year-old master's student at MIT, he wrote a thesis demonstrating that electrical application of Boolean algebra could construct and resolve any logical, numerical relationship. It has been claimed that this was the most important master's thesis of all time.
From Wikipedia
http://dspace.mit.edu/bitstream/handle/1721.1/11173/34541425.pdf?sequence=1
Information theory history
1950: Richard Wesley Hamming (1915-1998) was the first to specify a coding scheme that allows automatic correction of some errors.
Information theory history
David Huffman is best known for his legendary Huffman code, a compression scheme for lossless variable length encoding. It was the result of a term paper he wrote while a graduate student at the Massachusetts Institute of Technology (MIT), where he earned a D.Sc. degree on a thesis named The Synthesis of Sequential Switching Circuits, advised by Samuel H. Caldwell (1953). "Huffman Codes" are used in nearly every application that involves the compression and transmission of digital data, such as fax machines, modems, computer networks, and high-definition television (HDTV), to name a few.
From Wikipedia
David A. Huffman (1925-1999)
Information theory history
http://www.its.bldrdoc.gov/fs-1037/fs-1037c.htm
Federal Standard 1037C
Information theory
“A key measure of information in the theory is known as entropy, which is usually expressed by the average number of bits needed for storage or communication. Intuitively, entropy quantifies the uncertainty involved when encountering a random variable. For example, a fair coin flip (2 equally likely outcomes) will have less entropy than a roll of a dice (6 equally likely outcomes).”
From Wikipedia
Information entropy H
$$H = -\sum_{i=1}^{I} p_i \log_2 p_i, \qquad \lim_{p \to 0} p \log_2 p = 0$$
$p_i$ is the probability of state $i$
Information theory
“Applications of fundamental topics of information theory include lossless data compression (e.g. ZIP files), lossy data compression (e.g. MP3s), and channel coding (e.g. for DSL lines). The field is at the intersection of mathematics, statistics, computer science, physics, neurobiology, and electrical engineering.”
From Wikipedia
Information Entropy
“Shannon's entropy represents an absolute limit on the best possible lossless compression of any communication, under certain constraints: treating messages to be encoded as a sequence of independent and identically-distributed random variables, Shannon's source coding theorem shows that, in the limit, the average length of the shortest possible representation to encode the messages in a given alphabet is their entropy divided by the logarithm of the number of symbols in the target alphabet.”
From Wikipedia
Information Entropy
“A fair coin has an entropy of one bit. However, if the coin is not fair, then the uncertainty is lower (if asked to bet on the next outcome, we would bet preferentially on the most frequent result), and thus the Shannon entropy is lower. ... A long string of repeating characters has an entropy rate of 0, since every character is predictable. The entropy rate of English text is between 1.0 and 1.5 bits per letter,[1] or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments.”
From Wikipedia
Exercise
If the average entropy of a letter in English is 1.6 bits, and texts are stored in a format where each character takes 1 byte, what is the theoretical maximum compression?
8/1.6 = 5, i.e. a compression ratio of at most 5:1
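Exercise: histogram program
    #include <stdio.h>
    #define ID 256   /* assumed: number of byte values; ID is not defined on the slide */

    int main(void)
    {
        double his[ID], total = 0.0;
        int i, c;

        for (i = 0; i < ID; i++) his[i] = 0.0;    /* clear the histogram */
        while ((c = getc(stdin)) != EOF) {        /* count every input byte */
            his[c] += 1.0;
            total += 1.0;
        }
        for (i = 0; i < ID; i++) printf("%03d %e\n", i, his[i]);
        printf("%g\n", total);
        return 0;
    }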
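Exercise: entropy program
    /* Fragment: assumes <math.h>, the histogram his[] and total from the
       histogram program, i holding the number of bins, and doubles
       ln2, max_entropy, hold and entropy. */
    ln2 = log(2.0);                        /* renamed from log2, which is also a <math.h> function */
    max_entropy = log((double)i) / ln2;    /* entropy if all i values were equally likely */
    entropy = 0.0;
    for (j = 0; j < i; j++) {
        if (his[j] > 0.0) {                /* skip empty bins: p log p -> 0 */
            hold = his[j] / total;         /* probability of byte value j */
            entropy -= hold * log(hold) / ln2;
        }
    }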
Exercise: histogram program
From the Project Gutenberg website, download the English text of H.G. Wells's "The Time Machine" and analyse it with the histogram program to compute its information entropy.
Convert the text to doc and pdf format and compute the information entropy again.
Compress the text with several different programs (zip, rar, gzip, …) and compute the information entropy of the compressed files.
Comment on the results.
http://www.gutenberg.org/files/35/35.txt
Data, information, knowledge, wisdom
Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?
-- from T.S. Eliot, "Choruses from 'The Rock'"
Data, information, knowledge, wisdom
Information is not knowledge,
Knowledge is not wisdom,
Wisdom is not truth,
Truth is not beauty,
Beauty is not love,
Love is not music, and
Music is THE BEST.
-- from Frank Zappa, "Packard Goose"
Pyramid representation
Wisdom
Knowledge
Information
Data
Pyramid representation
Evaluate any choice
Know how (useful info.)
Who, what, where, how many?
Exists
Graphical representation
DIKW
From Futurist