Formatting and Source Coding

Kim Jong-Deok, School of Computer Science and Engineering, Pusan National University


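Lecture Objectives

Understand the main techniques for formatting information such as text, speech, and images into digital data (Information → Digital Data).

Code, Encoding/Decoding, CODEC

Hangul codes / PCM (Pulse Code Modulation)

Understand the relationship between bandwidth and information representation.

Source coding for representing data efficiently

Compressor/Decompressor, CODEC

Size of multimedia data

Digital audio / digital video compression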

2

Digital Info. → Digital Data

Coding Schemes

Encoding/Decoding, CODEC

Alphabet, Digits and other characters…

ASCII, EBCDIC, …

MIDI

Musical Instrument Digital Interface

This is information about music, but is it digital info?

Hangul codes?

Precomposed (Wansung) or combinational (Johab)?

KS C 5601, Unicode

3

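Hangul Codes (ANSI Code? / Unicode)

>> fid = fopen('song.txt', 'r');                      % open the ANSI (code page) encoded text file

>> ansi_string = fread(fid);                          % read the raw bytes

>> fclose(fid);

>> uni_string = native2unicode(ansi_string);          % decode the bytes into Unicode characters

>> unicode_start_code = [255, 254];                   % byte order mark FF FE (UTF-16, little-endian)

>> fid2 = fopen('uni_song.txt', 'w');

>> fwrite(fid2, unicode_start_code, 'uint8');         % write the byte order mark first

>> fwrite(fid2, uni_string, 'uint16');                % write each character as a 16-bit code unit

>> fclose(fid2);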
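For comparison, the same conversion can be sketched in Python. This is only an illustration: it assumes the "ANSI" text is CP949 (the Korean Windows code page), which the slide does not state, and the function name `ansi_to_utf16le` is made up for this example.

```python
import codecs

def ansi_to_utf16le(ansi_bytes: bytes, codepage: str = "cp949") -> bytes:
    """Decode legacy-encoded bytes and re-encode them as UTF-16LE with a BOM."""
    text = ansi_bytes.decode(codepage)                      # bytes -> Unicode characters
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")   # FF FE + 16-bit code units

# Stand-in for the contents of song.txt (assumed to be CP949 encoded).
sample = "한글 코드".encode("cp949")
converted = ansi_to_utf16le(sample)
print(converted[:2].hex(), len(sample), len(converted))
```

The leading `FF FE` bytes play the same role as `unicode_start_code` in the MATLAB session: they tell a reader that the 16-bit units that follow are little-endian.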

4

Data Acquisition System

Data Acquisition H/W

At the heart of any data acquisition system lies the data acquisition hardware. The main function of this hardware is to convert analog signals to digital signals, and to convert digital signals to analog signals (ADC / DAC).

Sensors and Actuators (Transducers)

Sensors and actuators can both be transducers. A transducer is a device that converts input energy of one form into output energy of another form. For example, a microphone is a sensor that converts sound energy (in the form of pressure) into electrical energy, while a loudspeaker is an actuator that converts electrical energy into sound energy.

Signal Conditioning H/W

Sensor signals are often incompatible with data acquisition hardware. To overcome this incompatibility, the signal must be conditioned. For example, you might need to condition an input signal by amplifying it or by removing unwanted frequency components. Output signals might need conditioning as well.

[Figure: data acquisition chain: Physical Phenomena, Sensor / Actuator, Signal Conditioning, Acquisition H/W, Computer]

5

Analog Info. → Digital Data

Sound

PCM (Pulse Code Modulation)

Sampling Rate, Bits/Sample, Channels

Image

Pixel, RGB, Bits/Pixel

VGA (640×480), QVGA, CIF (352×288), QCIF

Video

Frame, 24/30 FPS

480p, 720p, 1080i, 1080p, …

6

Pulse Code Modulation (PCM)

Nyquist Sampling Theorem

If a signal is sampled at regular intervals at a rate higher than twice the highest signal frequency, the samples contain all the information of the original signal.

Ex) Voice data limited to below 4000 Hz → requires 8000 samples per second

Analog samples: Pulse Amplitude Modulation (PAM)

Each sample is assigned a digital value: quantization

Quantizing error or noise

Approximation means it is impossible to recover the original signal exactly

Ex) An 8-bit sample gives 256 levels

• 8000 samples per second of 8 bits each gives 64 kbps
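The sampling and quantization steps can be sketched as follows. This is a minimal illustration with a uniform quantizer on the range [-1, 1]; the 1 kHz test tone and the function names are assumptions made for the example, and real telephone PCM (G.711) uses non-uniform A-law or μ-law quantization instead.

```python
import math

def pcm_encode(signal, sample_rate, num_samples, bits):
    """Sample signal(t) (values in [-1, 1]) and quantize uniformly to `bits` bits."""
    levels = 2 ** bits
    step = 2.0 / levels                                  # width of one quantization interval
    codes = []
    for n in range(num_samples):
        x = signal(n / sample_rate)                      # PAM sample
        codes.append(min(int((x + 1.0) / step), levels - 1))
    return codes

def pcm_decode(codes, bits):
    """Map each code back to the midpoint of its quantization interval."""
    step = 2.0 / (2 ** bits)
    return [-1.0 + (c + 0.5) * step for c in codes]

fs, bits = 8000, 8

def tone(t):
    return 0.9 * math.sin(2 * math.pi * 1000 * t)        # 1 kHz test tone (assumed input)

codes = pcm_encode(tone, fs, 80, bits)                   # 80 samples = 10 ms
decoded = pcm_decode(codes, bits)
max_error = max(abs(tone(n / fs) - y) for n, y in enumerate(decoded))
bit_rate = fs * bits                                     # bits per second
print(bit_rate, max_error)
```

The quantization error never exceeds half a quantization step, which is why the original samples cannot be recovered exactly.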

7

PCM Example

8

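Size of Multimedia Information

How much data is produced when recording 48 kHz, 16 bits/sample, stereo (2-channel) digital audio for one hour?

How much data can a CD-ROM hold?

What is the data rate per second of uncompressed VGA, 16 bits/pixel, 30 FPS video?

How much data is produced when recording it for one hour?

High-definition multimedia broadcasting?

5.1 Channel, 720p, 1080i, 1080p

DVD, Blu-ray, HD-DVD

Compression is essential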
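The raw data sizes asked about on this slide follow directly from the sampling parameters. A small sketch, with helper names chosen only for this example:

```python
def audio_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Uncompressed PCM audio size in bytes."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8

def video_bytes(width, height, bits_per_pixel, fps, seconds):
    """Uncompressed video size in bytes."""
    return width * height * bits_per_pixel * fps * seconds // 8

audio_hour = audio_bytes(48_000, 16, 2, 3600)        # 48 kHz, 16-bit, stereo, 1 hour
video_second = video_bytes(640, 480, 16, 30, 1)      # VGA, 16 bits/pixel, 30 FPS, 1 second
video_hour = video_bytes(640, 480, 16, 30, 3600)     # the same video for 1 hour

print(f"audio, 1 hour  : {audio_hour / 1e6:.1f} MB")
print(f"video, 1 second: {video_second / 1e6:.1f} MB")
print(f"video, 1 hour  : {video_hour / 1e9:.1f} GB")
```

One hour of uncompressed VGA video is roughly a hundred times larger than one hour of uncompressed stereo audio, which is the motivation for the compression techniques that follow.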

9

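Multimedia Compression

Audio compression techniques

Digital Speech Coding

The main goal is low consumption of transmission resources, so a high compression ratio is important

Exploits the characteristics of the human vocal system; vocoder

Core algorithms and techniques: LPC (Linear Predictive Coding) & CELP (Code-Excited Linear Prediction)

Major compression standards: AMR (Adaptive Multi-Rate), G.722, G.723.1, G.726, G.728, G.729, …

Digital Audio Coding

A high compression ratio is desirable, but being able to reproduce good sound quality is what matters

Exploits the characteristics of the human auditory system: psychoacoustic model

Key elements: Hearing Sensitivity, Frequency Masking, Temporal Masking

Major compression standards: MPEG-1 Audio Layers (1, 2, 3), Dolby AC3, MPEG-2 Advanced Audio Coding (AAC), MPEG-4 AAC (HE-AAC), …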

11

Multimedia compression and container formats (wiki)

Video

ISO/IEC: MJPEG · Motion JPEG 2000 · MPEG-1 · MPEG-2 (Part 2) · MPEG-4 (Part 2/ASP · Part 10/AVC) · HEVC

ITU-T: H.120 · H.261 · H.262 · H.263 · H.264 · HEVC

Others: AVS · Bink · CineForm · Cinepak · Dirac · DV · Indeo · Microsoft Video 1 · OMS Video · Pixlet · RealVideo · RTVideo · SheerVideo · Smacker · Sorenson Video & Sorenson Spark · Theora · VC-1 · VC-2 · VC-3 · VP3 · VP6 · VP7 · VP8 · WMV

Audio

ISO/IEC: MPEG-1 Layer III (MP3) · MPEG-1 Layer II (Multichannel) · MPEG-1 Layer I · AAC · HE-AAC · MPEG Surround · MPEG-4 ALS · MPEG-4 SLS · MPEG-4 DST · MPEG-4 HVXC · MPEG-4 CELP

ITU-T: G.711 · G.718 · G.719 · G.722 · G.722.1 · G.722.2 · G.723 · G.723.1 · G.726 · G.728 · G.729 · G.729.1

Others: AC-3 · AMR · AMR-WB · AMR-WB+ · Apple Lossless · ATRAC · CELT · DRA · DTS · EVRC · EVRC-B · FLAC · GSM-HR · GSM-FR · GSM-EFR · iLBC · iSAC · Monkey's Audio · TTA (True Audio) · MT9 · A-law · μ-law · Musepack · Nellymoser · OptimFROG · OSQ · QCELP · RealAudio · RTAudio · SD2 · SHN · SILK · Siren · SMV · Speex · SVOPC · TwinVQ · VMR-WB · Vorbis · WavPack · WMA

Image

ISO/IEC/ITU-T: JPEG · JPEG 2000 · JPEG XR · lossless JPEG · JBIG · JBIG2 · PNG · TIFF/EP · TIFF/IT

Others: APNG · BMP · DjVu · EXR · GIF · ICER · ILBM · MNG · PCX · PGF · TGA · QTVR · TIFF · WBMP · WebP

Container

ISO/IEC: MPEG-PS · MPEG-TS · ISO base media file format · MPEG-4 Part 14 · Motion JPEG 2000 · MPEG-21 Part 9

ITU-T: H.222.0 · T.802

Others: 3GP and 3G2 · AMV · ASF · AIFF · AVI · AU · Bink · DivX Media Format · DPX · EVO · Flash Video · GXF · M2TS · Matroska · MXF · Ogg · QuickTime File Format · RealMedia · REDCODE RAW · RIFF · Smacker · MOD and TOD · VOB · WAV · WebM

12

Digital Speech Coding

In relation to the opening and closing vibrations of the vocal cords

as air blows over them, speech signals can be roughly categorized

into two types of signals: voiced speech and unvoiced speech.

The Human Speech Production System

13

Voiced vs. Unvoiced Speech

[Figure panels: Voiced Speech, Unvoiced Speech]

14

Linear Predictive Coding

A speech signal $s(n)$ can be approximated as an auto-regressive (AR) formulation

$$s(n) = e(n) + \sum_{k=1}^{p} a_k\, s(n-k)$$

The coefficients $\{a_k\}$ are derived on the basis of a 20~30 ms block of data (frame)
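One common way to derive the coefficients for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The slide does not say which method is intended, so the sketch below is only one possible choice; the function names and the synthetic second-order test signal are assumptions for illustration.

```python
def autocorrelation(frame, max_lag):
    """r(k) = sum over n of s(n) * s(n + k), for k = 0 .. max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def lpc(frame, order):
    """Return ([a_1 .. a_p], residual energy) for s(n) = e(n) + sum a_k s(n-k)."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)                 # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m              # prediction error energy shrinks
    return a[1:], err

# Synthetic "frame": s(n) = 1.3 s(n-1) - 0.4 s(n-2) + e(n), with e(n) a unit impulse.
frame = [1.0, 1.3]
for n in range(2, 200):
    frame.append(1.3 * frame[n - 1] - 0.4 * frame[n - 2])

coeffs, residual = lpc(frame, order=2)
print(coeffs, residual)
```

Because the test signal is generated by the same AR model that LPC assumes, the estimated coefficients come out very close to the generating values 1.3 and -0.4. A real speech frame would only be approximated by the model.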

15

Digital Audio Coding – Auditory System

1) The outer ear directs sounds through the ear canal towards the eardrum.

2) The middle ear transforms sound pressure waves into mechanical movement of three small bones called "ossicles" (the hammer, anvil, and stirrup).

3) The inner ear houses the cochlea, a spiral-shaped structure for human hearing which sits in an extremely sensitive membrane called the basilar membrane. The cochlea converts the middle ear's mechanical movements to basilar membrane movement and eventually into the firing of auditory neurons, which in turn send electrical signals to the brain.

16

Hearing Sensitivity

[Figure: hearing sensitivity curve; horizontal axis: Frequency (Hz)]

17

Exploiting Hearing Sensitivity

If we uniformly quantize each audio sample with 12 bits, the resulting quantization noise can be as low as -26 dB, which is far below the threshold of hearing.

We can divide the audible frequency range (20 Hz to 20 kHz) into several bands, and the audio samples in different bands can be quantized with different numbers of bits to accommodate different tolerances of quantization noise.

18

Frequency Masking

19

Frequency Masking

20

Temporal Masking

A weak sound emitted soon after the end of a louder sound is masked by the louder sound. (Post-masking)

Even a weak sound just before a louder sound can be masked by the louder sound. (Pre-masking)

The combined frequency and temporal masking effect

21

Frequency Domain Analysis ?

Applying the digital speech/audio coding techniques discussed so far requires spectral (frequency) analysis of the audio signal

Fourier Analysis

22

Digital Audio Standards

MPEG-1 Audio Layer I, II, III

Layer I : MP1

• one of three audio formats included in the MPEG-1 standard. While supported by most media

players, the codec is considered largely outdated, and replaced by MP2 or MP3.

Layer II : MP2, (sometimes incorrectly called MUSICAM)

• While MP3 is much more popular for PC and internet applications, MP2 remains a dominant

standard for audio broadcasting.

• The basic audio codec of DAB (Digital Audio Broadcasting), also known as Eureka-147, which can be regarded as the forerunner of Korea's terrestrial DMB (T-DMB)

• The basic audio codec of DVB (Digital Video Broadcasting), the European DTV standard

• Supports multichannel audio through the MPEG-2 Audio Layer II extension

Layer III : MP3

• a patented digital audio encoding format using a form of lossy data compression. It is a common

audio format for consumer audio storage, as well as a de facto standard of digital audio

compression for the transfer and playback of music on digital audio players.

23

Digital Audio Standards

Dolby AC3 Audio Codec

Multi-Channel Support

http://en.wikipedia.org/wiki/Dolby_AC3

24

Digital Audio Standards

Advanced Audio Coding (AAC)

Designed to be the successor of the MP3 format, AAC generally achieves better sound

quality than MP3 at similar bit rates

• AAC has been standardized by ISO and IEC, as part of the MPEG-2 and MPEG-4 specifications.

Part of the AAC known as High-Efficiency Advanced Audio Coding (HE-AAC) which is part of

MPEG-4 Audio is also adopted into digital radio standards like DAB+ and Digital Radio Mondiale,

as well as mobile television standards DVB-H and ATSC-M/H.

• AAC supports inclusion of 48 full-bandwidth (up to 96 kHz) audio channels in one stream plus 16

low frequency effects (LFE, limited to 120 Hz) channels, up to 16 "coupling" or dialog channels,

and up to 16 data streams. The quality for stereo is satisfactory to modest requirements at 96

kbit/s in joint stereo mode; however, hi-fi transparency demands data rates of at least 128 kbit/s

(VBR). The MPEG-2 audio tests showed that AAC meets the requirements referred to as

"transparent" for the ITU at 128 kbit/s for stereo, and 320 kbit/s for 5.1 audio.

• AAC is also the default or standard audio format for iPhone, iPod, iPad, Nintendo DSi, iTunes,

DivX Plus Web Player and PlayStation 3. It is supported on PlayStation Portable, Wii (with the

Photo Channel 1.1 update installed for Wii consoles purchased before late 2007), Sony Walkman

MP3 series and later, mobile phones made by Sony Ericsson and Nokia and Android-based mobile

phones.

25

VIDEO COMPRESSION

Video processing pipeline

Analog / Digital Conversion

RGB-YUV Conversion

Subsampling

Encoding (H.264 / MPEG-4 AVC)

27

RGB & YUV

YUV

The YUV model defines a color space in terms of one luma (Y) and two chrominance (UV) components. The YUV color model is used in the PAL, NTSC, and SECAM composite color video standards. Previous black-and-white systems used only luma (Y) information, and color information (U and V) was added so that a black-and-white receiver would still be able to display a color picture as a normal black-and-white picture.

YUV models human perception of color in a different way from the standard RGB model used in computer graphics hardware.

Y stands for the luma component (the brightness) and U and V are the chrominance (color) components. The YPbPr color model used in analog component video and its digital version YCbCr used in digital video are more or less derived from it (Cb/Pb and Cr/Pr are deviations from grey on blue-yellow and red-cyan axes, whereas U and V are blue-luminance and red-luminance differences), and are sometimes inaccurately called "YUV". The YIQ color space used in the analog NTSC television broadcasting system is related to it, although in a more complex way.

28

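Color Space

YUV

A color system expressed in terms of brightness (luminance) and chrominance

Y: brightness (luminance)

U: chrominance (blue difference: B − Y)

V: chrominance (red difference: R − Y)

RGB ↔ YUV conversion is possible

Why use it

To place more emphasis on luminance

• The human eye is more sensitive to brightness than to color

Subsampling

• Keep the luminance (Y) intact and reduce the amount of color information (U, V)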
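The RGB ↔ YUV conversion can be sketched with the commonly quoted analog YUV weights (luma weights 0.299, 0.587, 0.114 and scale factors 0.492 and 0.877). The slides do not give numeric coefficients, so these values and the function names are assumptions of this example; digital YCbCr uses different scaling and offsets.

```python
def rgb_to_yuv(r, g, b):
    """Convert RGB values in [0, 1] to Y, U, V."""
    y = 0.299 * r + 0.587 * g + 0.114 * b    # luma: weighted sum of R, G, B
    u = 0.492 * (b - y)                      # scaled blue difference (B - Y)
    v = 0.877 * (r - y)                      # scaled red difference (R - Y)
    return y, u, v

def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv."""
    b = y + u / 0.492
    r = y + v / 0.877
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

white = rgb_to_yuv(1.0, 1.0, 1.0)
restored = yuv_to_rgb(*rgb_to_yuv(0.2, 0.5, 0.8))
print(white, restored)
```

A grey or white pixel has zero chrominance, so a black-and-white receiver can use Y alone.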

29

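Subsampling

A method that samples Y, U, and V at different ratios

4:4:4 sampling is lossless (no chroma samples are discarded)

4:2:2 (cameras), 4:2:0 (various compression technologies)

4:2:0 reduces the data to 50% of the original 4:4:4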

[Figure: Y, U, and V samples for a block of 8 pixels. 4:4:4: 24 samples; 4:2:2: 16 samples; 4:2:0: 12 samples]
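The sample counts in the figure can be reproduced from the J:a:b notation, applied to a block J pixels wide and 2 rows high. A small sketch (the function name is illustrative):

```python
def samples_per_block(j, a, b):
    """Total Y + U + V samples in a J x 2 pixel block for J:a:b subsampling.

    a = chroma samples in the first row, b = additional chroma samples in the second row.
    """
    luma = 2 * j                    # every pixel keeps its Y sample
    chroma_per_plane = a + b        # samples in each of the U and V planes
    return luma + 2 * chroma_per_plane

counts = {scheme: samples_per_block(*scheme) for scheme in [(4, 4, 4), (4, 2, 2), (4, 2, 0)]}
ratio_420 = counts[(4, 2, 0)] / counts[(4, 4, 4)]
print(counts, ratio_420)
```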

30

Video Compression Methods

Spatial compression (Spatial Model)

Probabilistic compression (Entropy Model)

Temporal compression (Temporal Model)

31

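Spatial Compression (Spatial Model)

Spatial frequency

• Variation of color or structure across space

DCT (Discrete Cosine Transform)

• Pixel values -> spatial frequencies

• A transform similar to the Fourier transform

• For typical images, the DCT coefficients tend to concentrate at the low-frequency end

[Figure: examples of low and high spatial frequency; Image Block → DCT → DCT Coefficient Matrix]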

32

Discrete Cosine Transform

For the reduction of spatial redundancy

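Convert the spatial representation of an 8×8 image block to the frequency domain

Similar to the FFT

$$\mathrm{DCT}(i,j) = \frac{1}{4}\, C(i)\, C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} \mathrm{pixel}(x,y)\, \cos\left[\frac{(2x+1)\, i\pi}{16}\right] \cos\left[\frac{(2y+1)\, j\pi}{16}\right]$$

$$\mathrm{pixel}(x,y) = \frac{1}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} C(i)\, C(j)\, \mathrm{DCT}(i,j)\, \cos\left[\frac{(2x+1)\, i\pi}{16}\right] \cos\left[\frac{(2y+1)\, j\pi}{16}\right]$$

where

$$C(x) = \begin{cases} \dfrac{1}{\sqrt{2}} & x = 0 \\ 1 & \text{otherwise} \end{cases}$$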
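The 8×8 DCT pair on this slide can be written out directly as nested sums. The sketch below is a plain, unoptimized illustration of the definition (real codecs use fast factorizations); the function names are chosen for this example.

```python
import math

N = 8

def c(k):
    """Normalization factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def basis(x, i):
    return math.cos((2 * x + 1) * i * math.pi / 16)

def dct2(block):
    """Forward 8x8 DCT: pixel(x, y) -> DCT(i, j)."""
    return [[0.25 * c(i) * c(j) * sum(block[x][y] * basis(x, i) * basis(y, j)
                                      for x in range(N) for y in range(N))
             for j in range(N)] for i in range(N)]

def idct2(coeffs):
    """Inverse 8x8 DCT: DCT(i, j) -> pixel(x, y)."""
    return [[0.25 * sum(c(i) * c(j) * coeffs[i][j] * basis(x, i) * basis(y, j)
                        for i in range(N) for j in range(N))
             for y in range(N)] for x in range(N)]

flat = [[100.0] * N for _ in range(N)]                        # uniform block: no spatial variation
flat_coeffs = dct2(flat)
ramp = [[float(8 * x + y) for y in range(N)] for x in range(N)]
ramp_restored = idct2(dct2(ramp))
print(round(flat_coeffs[0][0], 6))
```

For the uniform block all the energy lands in the DC coefficient DCT(0,0) and every other coefficient is numerically zero, which is the extreme case of coefficients concentrating at low frequencies.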

33

Example

[Figure: 8×8 Source Image Block → DCT → DCT Coefficient Matrix → Quantization (using a Quantization Table) → Quantized Coefficient Matrix → Zigzag Scanning & RLE]
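The steps after the DCT can be sketched as follows. The quantization table values from the slide's figure are not recoverable from the transcript, so a flat table with step 16 is used purely as a placeholder, and the run-length output uses simple (value, count) pairs rather than the exact symbol format of any particular standard.

```python
from itertools import groupby

def zigzag_indices(n=8):
    """Visit an n x n matrix along anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]   # row index increasing
        order.extend(diagonal if s % 2 else diagonal[::-1])
    return order

def quantize(coeffs, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(coeffs[i][j] / table[i][j]) for j in range(8)] for i in range(8)]

def run_length(values):
    """Collapse runs of equal values into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# Hypothetical inputs: a sparse coefficient matrix and a flat quantization table.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1] = 800.0, 33.0
table = [[16] * 8 for _ in range(8)]

quantized = quantize(coeffs, table)
scanned = [quantized[i][j] for i, j in zigzag_indices()]
encoded = run_length(scanned)
print(encoded)
```

Zigzag scanning places the low-frequency coefficients first, so the many zeros produced by quantization end up in one long run that RLE represents compactly.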

34

Temporal Compression (Temporal Model)

Temporal redundancy

• Television: about 30 fps (frames per second); film: about 24 fps

• The duration of one frame is very short compared with the motion of objects

• As a result, video contains a great deal of temporal redundancy

35

Temporal Compression (Temporal Model)

Motion Estimation

• The process of finding the current block in a past frame

• The process of obtaining the motion vector

Motion Compensation

http://en.wikipedia.org/wiki/Motion_compensation

[Figure: past frame and current frame]

36

Temporal Compression (Temporal Model)

Reconstruct the current frame using the motion vectors and the reference frame

[Figure: past frame and current frame]
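Motion estimation and compensation can be illustrated with exhaustive block matching. The sketch below uses the sum of absolute differences (SAD) as the matching cost, a 4×4 block, and a ±2 pixel search window; all of these are assumptions made for the example, since the slides do not specify a search method.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))

def estimate_motion(cur, ref, bx, by, size=4, search=2):
    """Full search: return the (dx, dy) into the reference frame with the smallest SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + size <= height and
                      0 <= bx + dx and bx + dx + size <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]

def compensate(ref, bx, by, mv, size=4):
    """Rebuild the current block from the reference frame and the motion vector."""
    dx, dy = mv
    return [row[bx + dx: bx + dx + size] for row in ref[by + dy: by + dy + size]]

# Synthetic frames: the current frame is the past frame moved 2 pixels right and 1 pixel down.
ref = [[x * x + 10 * y * y for x in range(12)] for y in range(12)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)] for y in range(12)]

mv = estimate_motion(cur, ref, bx=4, by=4)
predicted = compensate(ref, 4, 4, mv)
actual = [row[4:8] for row in cur[4:8]]
print(mv, predicted == actual)
```

Only the motion vector (and, in a real codec, the prediction residual) needs to be transmitted for the block, instead of its pixel values.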

37

Group of Pictures

I frame

transformed without using prediction

restarting point for prediction

random access point

P frame

unidirectional prediction

B frame

bidirectional prediction

not used for predicting other frames

38

Group of Pictures

Display (playback) order: 1 2 3 4 5 6 7 8 9 10 11 12 13

Coding and transmission order: 1 5 2 3 4 9 6 7 8 13 10 11 12

[Figure: each frame is marked as an I, P, or B frame]
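The reordering follows from the fact that a B frame can only be decoded after both of its anchor frames. The sketch below assumes the frame-type pattern I B B B P B B B P B B B P, which is inferred from the two orders shown on the slide rather than stated in it.

```python
def coding_order(frame_types):
    """Reorder display-order frames so each B frame follows both of its anchors."""
    order, pending_b = [], []
    for number, frame_type in enumerate(frame_types, start=1):
        if frame_type == "B":
            pending_b.append(number)        # must wait for the next I/P anchor
        else:
            order.append(number)            # send the anchor first
            order.extend(pending_b)         # then the B frames displayed before it
            pending_b = []
    return order + pending_b

display_types = "IBBBPBBBPBBBP"             # assumed pattern for frames 1..13
order = coding_order(display_types)
print(order)
```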

39

Representation - Entropy

The Concept of Entropy from Information Theory

For a given set of symbols, $A = \{a_1, a_2, \ldots, a_N\}$

Each symbol $a_n$ is associated with an event or an observation that has its own occurrence probability $p_n$; $p_n \in \{p_1, p_2, \ldots, p_N\}$

The information measure $I(a_n)$ of the symbol $a_n$ is defined as

$$I(a_n) = -\log_b p_n = \log_b \frac{1}{p_n}$$

The average amount (expected value) of information we can get from each symbol emitted in the stream from the source is defined as the entropy $H(A)$ for the discrete set of probabilities $P = \{p_1, p_2, \ldots, p_N\}$:

$$H(A) = E[I(a_n)] = \sum_{n=1}^{N} p_n \log_b \frac{1}{p_n}$$

http://en.wikipedia.org/wiki/Information_entropy
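The entropy formula is straightforward to evaluate numerically. As an illustration, the sketch below uses the four symbol probabilities 0.6, 0.15, 0.2, and 0.05 that appear in the entropy coding table of this lecture; the function name is chosen for this example.

```python
import math

def entropy(probabilities, base=2):
    """H = sum of p * log_b(1 / p) over all symbols with p > 0."""
    return sum(p * math.log(1 / p) / math.log(base) for p in probabilities if p > 0)

h_fair_coin = entropy([0.5, 0.5])                    # two equally likely symbols
h_source = entropy([0.6, 0.15, 0.2, 0.05])           # four symbols with unequal probabilities
print(h_fair_coin, round(h_source, 4))
```

With base 2 the result is in bits per symbol. The four-symbol source needs fewer than 2 bits per symbol on average because its symbols are not equally likely.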

40

Entropy Coding

A coding scheme that assigns codes to symbols so that code lengths match the probabilities of the symbols.

The more frequent a symbol, the shorter its codeword

• According to Shannon's source coding theorem, the optimal code length for a symbol is $\log_b \frac{1}{p}$, where $p$ is the probability of the input symbol

Example: Huffman coding, Lempel-Ziv coding

ex) 2 bits per sample -> 1.6 bits per sample

Input Codeword    Frequency (Prob.)    Output Codeword

00                0.6                  0

01                0.15                 100

10                0.2                  11

11                0.05                 101

Run-Length Encoding

ex) 000000001122222 ==> (0;8)(1;2)(2;5)
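Both examples on this slide can be checked with a short sketch. The Huffman routine below computes only the code lengths (the exact bit patterns depend on how ties and branches are labeled), and the function names are chosen for this illustration.

```python
import heapq
from itertools import groupby

def huffman_code_lengths(probabilities):
    """Return {symbol: code length} by repeatedly merging the two least likely nodes."""
    heap = [(p, i, [sym]) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in probabilities}
    counter = len(heap)                          # unique tie-breaker for merged nodes
    while len(heap) > 1:
        p1, _, symbols1 = heapq.heappop(heap)
        p2, _, symbols2 = heapq.heappop(heap)
        for sym in symbols1 + symbols2:
            lengths[sym] += 1                    # one level deeper in the code tree
        heapq.heappush(heap, (p1 + p2, counter, symbols1 + symbols2))
        counter += 1
    return lengths

def run_length_encode(text):
    """Collapse runs of identical characters into (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(text)]

probs = {"00": 0.6, "01": 0.15, "10": 0.2, "11": 0.05}
lengths = huffman_code_lengths(probs)
average_bits = sum(probs[s] * lengths[s] for s in probs)
runs = run_length_encode("000000001122222")
print(lengths, average_bits, runs)
```

The code lengths match the output codewords in the table (1, 3, 2, and 3 bits), and the weighted average reproduces the 1.6 bits per sample quoted on the slide.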

41

SOFTWARE DEFINED RADIO, VISIBLE LIGHT COMMUNICATION, ACOUSTIC COMMUNICATION

The GNU Software Radio

http://www.gnuradio.org

GNU Radio is a free & open-source software development toolkit that provides signal processing blocks to implement software radios. It can be used with readily-available low-cost external RF hardware to create software-defined radios, or without hardware in a simulation-like environment. It is widely used in hobbyist, academic and commercial environments to support both wireless communications research and real-world radio systems.

Hardware - USRP

The Universal Software Radio Peripheral is the recommended device for interfacing GNU Radio with the real world. The USRP has been developed especially for GNU Radio, and is available from Ettus Research.

43

Exploring GNU Radio

http://www.gnu.org/software/gnuradio/doc/exploring-gnuradio.html

44

USRP

45

Listening to FM Radio using GNU Radio

http://www.linuxjournal.com/article/7505

Daughter Board (TVRX, 50 MHz to 870 MHz receiver)

Bandpass signal with 6 MHz bandwidth at an IF (Intermediate Frequency) of 5.75 MHz

ADC

Up to 64M samples per second, 12 bits/sample

FPGA

Digital Down Converter

46

Listening to FM Radio using GNU Radio

Angle Modulation (Phase Modulation, Frequency Modulation)

$$s(t) = A_c \cdot \cos[2\pi f_c t + \phi(t)]$$

PM: $\phi(t) = k \cdot m(t)$

FM: $\phi'(t) = k \cdot m(t)$

Instantaneous frequency

48

Listening to FM Radio using GNU Radio

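Extracting $\phi'(t) = k \cdot m(t)$ from $s(t) = A_c \cdot \cos[2\pi f_c t + \phi(t)]$

Digital Down Converter

IF (Intermediate Frequency) → baseband; remove the $2\pi f_c t$ term

Performed in the FPGA

$$\cos[2\pi f_c t + \phi(t)] \cdot \cos(2\pi f_c t) = \frac{1}{2}\left(\cos[4\pi f_c t + \phi(t)] + \cos\phi(t)\right)$$

Quadrature Demodulator

Differential, difference?

$$e^{i\phi(t_1)}, \qquad e^{i\phi(t_2)}$$

$$e^{i\phi(t_2)} \cdot e^{-i\phi(t_1)} = e^{i(\phi(t_2) - \phi(t_1))}$$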
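The quadrature demodulator idea, taking the phase difference between consecutive complex baseband samples, can be sketched in plain Python. This is an idealized, noise-free illustration and does not use the GNU Radio API; the sample rate, deviation constant, and function names are assumptions of the example.

```python
import cmath
import math

def fm_modulate_baseband(message, k, fs):
    """Complex baseband FM: the phase is the running sum (integral) of k * m(t)."""
    phase, samples = 0.0, []
    for m in message:
        phase += k * m / fs
        samples.append(cmath.exp(1j * phase))
    return samples

def quadrature_demod(samples, k, fs):
    """Recover m(t) from the phase of z[n] * conj(z[n-1]), i.e. phi(t2) - phi(t1)."""
    return [cmath.phase(samples[n] * samples[n - 1].conjugate()) * fs / k
            for n in range(1, len(samples))]

fs = 1000.0                                   # sample rate in Hz (assumed)
k = 2 * math.pi * 50.0                        # deviation constant in rad/s per unit of m(t)
message = [0.5 * math.sin(2 * math.pi * 5 * n / fs) for n in range(400)]

baseband = fm_modulate_baseband(message, k, fs)
recovered = quadrature_demod(baseband, k, fs)
max_error = max(abs(r - m) for r, m in zip(recovered, message[1:]))
print(max_error)
```

The phase difference divided by the sample interval approximates $\phi'(t)$, which is proportional to the message for FM. The per-sample phase step must stay below π for this to work, which the chosen parameters satisfy.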

49

GNU Radio Applications

In addition to the examples discussed above, GNU Radio comes with a complete HDTV transmitter and

receiver, a spectrum analyzer, an oscilloscope, concurrent multichannel receiver and an ever-growing

collection of modulators and demodulators.

Projects under investigation or in progress include:

A TiVo equivalent for radio, capable of recording multiple stations simultaneously.

Time Division Multiple Access (TDMA) waveforms.

A passive radar system that takes advantage of broadcast TV for its signal source

TETRA transceiver.

Digital Radio Mondiale (DRM).

Software GPS.

Distributed sensor networks.

Distributed measurement of spectrum utilization.

Amateur radio transceivers.

Ad hoc mesh networks.

RFID detector/reader.

Multiple input multiple output (MIMO) processing.

50

Visible Light Communication

http://blog.skbroadband.com/938

http://www.disneyresearch.com/project/visible-light-communication/

51

Communication over Screen-Camera Links?

2D barcodes are everywhere !!!

“Transmitting” information (vs linking)

[Figure: transmitter and receiver; panels: original frame, single frame, 2-frame mix; the mixing pattern varies by line]

52

Acoustic Communication / Soundcode

Acoustic Communication?

A traditional communication method commonly used in nature

Technical value? Underwater communication

Integration with smartphones? - http://digxtal.egloos.com/v/2654784

2 Approaches

Sonic Notify inserts ultra-high-frequency sounds into the carrier audio. These frequencies are beyond the hearing range of most people, so listeners perceive the audio as unaltered. https://sonicnotify.com/

Intrasonics modifies the carrier audio by adding artificial echoes to it. The human brain perceives these as natural echoes and ignores them, as if a few insignificant objects were bouncing the original sound.

53