Media Processing – Audio Part


Page 1: Media Processing – Audio Part

1

Media Processing – Audio Part

Dr Wenwu Wang

Centre for Vision Speech and Signal Processing

Department of Electronic Engineering

[email protected]

http://personal.ee.surrey.ac.uk/Personal/W.Wang/teaching.html

Page 2: Media Processing – Audio Part

2

Tentative outline

Week 6: Fundamentals of audio

Week 7: Audio acquisition, recording, and standards

Week 8: Audio processing, coding, and standards

Week 9: Audio production and reproduction

Week 10: Audio perception and audio quality assessment

Page 3: Media Processing – Audio Part

3

Fundamentals of audio

Concepts and topics to be covered:

What is sound

Physical characteristics of audio

Amplitude, frequency, envelope, phase, etc

Pure tone versus complex sounds

Sound pressure level

Haas effect, inverse square law

Free and reverberant sound field

Audio signal chain

Early recording systems (mechanical, electrical, and more)

Modern recording chain (stereo, multitrack)

Broadcast distribution


Page 4: Media Processing – Audio Part

4

What is sound

Sound is produced by a vibrating source around which the carrying medium (e.g. air) is caused to move. From the vibrating source, a sound wave radiates omnidirectionally, and the sound energy is transferred to the carrying medium through compressions and rarefactions (similar to waves moving on the surface of the sea).

Source: Francis Rumsey and Tim McCormick (1994)

Page 5: Media Processing – Audio Part

5

Displaying a sound wave

Time-domain method: e.g. as a waveform, using an oscilloscope.

Frequency-domain method: e.g. as a spectrum, using a spectrum analyser.

Source: Francis Rumsey and Tim McCormick (1994)

Spectrum analyser demos:
http://www.youtube.com/watch?v=LIOUXr9v2RI
http://www.youtube.com/watch?v=YS6jaqeXVok
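As a rough illustration of the frequency-domain view (not part of the original slides), the Python sketch below computes a magnitude spectrum with an FFT, which is essentially what a spectrum analyser displays. NumPy and the 440/880 Hz test tone are my own assumptions.

```python
import numpy as np

# A minimal sketch of a frequency-domain display: sample a test signal,
# take its FFT, and report the strongest frequency components.
fs = 8000                      # sampling rate in Hz (arbitrary choice)
t = np.arange(fs) / fs         # one second of time samples
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

spectrum = np.abs(np.fft.rfft(x)) / len(x)      # magnitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1 / fs)       # frequency axis in Hz

# Print the two largest peaks (expected near 440 Hz and 880 Hz).
peaks = np.argsort(spectrum)[-2:]
for k in sorted(peaks):
    print(f"{freqs[k]:.0f} Hz  amplitude {2 * spectrum[k]:.2f}")
```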

Page 6: Media Processing – Audio Part

6

Basic characteristics of a sound wave

Frequency: the rate at which the vibrating source oscillates, quoted in hertz (Hz) or cycles per second (cps). The human ear is able to perceive sounds with frequencies between approximately 20 Hz and 20 kHz (known as the audio frequency range).

Amplitude: the amount of compression and rarefaction of the carrying media resulting from the motion of the vibrating source, related to the loudness of the sound when perceived by human ears.

Wavelength: the distance between two adjacent peaks of compression or rarefaction as the wave travels through the carrying media, often represented by the Greek letter lambda.

Velocity: the speed of the sound energy transfer. The velocity of sound in air is about 344 metres per second; it depends on the carrying medium and on its density. Wavelength = Velocity / Frequency (a short worked example follows at the end of this slide).

Envelope: the shape of the sound's evolution over time. It includes four main parts: the attack, the initial decay, the sustain (i.e. internal dynamic), and the final decay (i.e. release).

Phase: the time course of a signal relative to a reference, as it arrives at a receiver (i.e. the ear).
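To make the relationship Wavelength = Velocity / Frequency concrete, here is a small Python sketch (mine, not from the slides), assuming the 344 m/s figure quoted above.

```python
# Worked example of wavelength = velocity / frequency,
# assuming sound travels at about 344 m/s in air (as quoted above).
SPEED_OF_SOUND = 344.0  # metres per second

def wavelength(frequency_hz, velocity=SPEED_OF_SOUND):
    """Return the wavelength in metres for a given frequency in Hz."""
    return velocity / frequency_hz

for f in (20, 1000, 20000):   # roughly the limits and middle of the audio range
    print(f"{f:>6} Hz  ->  {wavelength(f):.4f} m")
# A 20 Hz wave is about 17 m long, while a 20 kHz wave is only about 17 mm.
```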

Page 7: Media Processing – Audio Part

7

A simple sound – pure tone (sine wave)

Source: Francis Rumsey and Tim McCormick (1994)

A demo for sound waves:
http://www.youtube.com/watch?v=dbeK1fg1Rew&list=PLC9626B413EC82543
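For hands-on experimentation, a pure tone can also be synthesised directly. The Python sketch below (an illustration, not part of the slides) writes a one-second 440 Hz sine wave to a WAV file using only the standard library; the 440 Hz and 44.1 kHz values are arbitrary choices.

```python
import math
import struct
import wave

# Synthesise a 1-second 440 Hz pure tone (sine wave) and write it to a
# mono 16-bit WAV file.
fs = 44100          # samples per second
freq = 440.0        # frequency of the pure tone in Hz
amplitude = 0.5     # fraction of full scale, to leave some headroom

samples = [
    int(amplitude * 32767 * math.sin(2 * math.pi * freq * n / fs))
    for n in range(fs)
]

with wave.open("pure_tone.wav", "wb") as wav:
    wav.setnchannels(1)        # mono
    wav.setsampwidth(2)        # 16-bit samples
    wav.setframerate(fs)
    wav.writeframes(struct.pack("<" + "h" * len(samples), *samples))
```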

Page 8: Media Processing – Audio Part

8

Harmonic (repetitive) sounds

Source: Alan P. Kefauver and David Patschke (2007)

Harmonics

Unlike a pure tone (a single frequency, which is rarely heard in real life), real musical audio is much more complex: it is a mixture of harmonic components (multiple frequencies).

Harmonic frequencies are integer multiples of the fundamental frequency. For example, the first harmonic is equal to the fundamental frequency, the second harmonic (also known as the first overtone, or partial) is double the fundamental frequency, the third harmonic (i.e. the second overtone) is three times the fundamental frequency, and so on.
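A minimal sketch of the harmonic series described above (illustrative Python; the 110 Hz fundamental is an arbitrary choice, not from the slides):

```python
# The n-th harmonic is n times the fundamental, and for n >= 2 it is
# also called the (n-1)-th overtone.
def harmonics(fundamental_hz, count):
    """Return the first `count` harmonic frequencies, including the fundamental."""
    return [n * fundamental_hz for n in range(1, count + 1)]

for n, f in enumerate(harmonics(110.0, 5), start=1):
    label = "fundamental" if n == 1 else f"harmonic {n} (overtone {n - 1})"
    print(f"{label:<26} {f:6.1f} Hz")
```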

Page 9: Media Processing – Audio Part

9

Harmonic (repetitive) sounds

Source: Francis Rumsey and Tim McCormick (1994)

Modes of vibration of a stretched string: (a) fundamental, (b) second harmonic, (c) third harmonic.

Page 10: Media Processing – Audio Part

10

Harmonic (repetitive) sounds

The existence of harmonics is due to the fact that most vibrating sources are able to vibrate in a number of harmonic modes simultaneously.

As shown in the figure on the previous slide, a stretched string may be made to vibrate in any of a number of modes, corresponding to integer multiples of its fundamental frequency of vibration.

The fundamental corresponds to the mode in which the string moves up and down as a whole, while the harmonics correspond to modes in which the vibration pattern is divided into points of maximum and minimum motion along the string (antinodes and nodes respectively).

It is possible that the overtones in a sound spectrum are not exact integer multiples of the fundamental. In this case they are more correctly referred to simply as overtones or inharmonic partials. This tends to happen for vibrating sources with a complicated shape, such as a bell or a percussion instrument.

Page 11: Media Processing – Audio Part

11

Harmonic (repetitive) sounds

Source: Francis Rumsey and Tim McCormick (1994)

Spectra of a selection of simple waveforms. (a) The sine wave contains only a single frequency. (b) The sawtooth wave consists of components at the fundamental and its integer multiples, with gradually decreasing amplitudes. (c) The square wave consists of frequencies at odd multiples of the fundamental.
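The square-wave statement in (c) can be checked by additive synthesis: summing only odd harmonics, with amplitudes falling as 1/n, approximates a square wave. The sketch below assumes NumPy and a 100 Hz fundamental of my own choosing.

```python
import numpy as np

# Build an approximate square wave by summing odd harmonics of a 100 Hz
# fundamental with amplitudes proportional to 1/n (its Fourier series).
fs = 8000
t = np.arange(fs) / fs
fundamental = 100.0

square = np.zeros_like(t)
for n in range(1, 40, 2):                          # odd harmonics only: 1, 3, 5, ...
    square += (1.0 / n) * np.sin(2 * np.pi * n * fundamental * t)
square *= 4 / np.pi                                # scale so the ideal wave swings +/-1

# The waveform now alternates between roughly +1 and -1 each half cycle.
print(square[:40].round(2))
```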

Page 12: Media Processing – Audio Part

12

Sound complexity

A sound like a pure sine wave is not heard very often; a person whistling or a wind instrument can produce sounds similar to a sinusoidal wave.

Real sounds are usually made up of a combination of more complex waveforms. The more complex the waveform, the more like noise the sound becomes.

When the waveform has a highly random pattern, the sound becomes noise (as shown on the next slide).

Page 13: Media Processing – Audio Part

13

Non-repetitive sound

Source: Francis Rumsey and Tim McCormick (1994)

Waveform and frequency spectra of non-repetitive waveforms: (a) pulse, (b) noise.

Page 14: Media Processing – Audio Part

14

Non-repetitive sound

Non-repetitive sound waves do not have a recognisable frequency as in harmonic sounds.

The frequency spectrum of non-repetitive sound consists of a collection of unrelated frequency components.

As shown in the examples, short pulses have continuous frequency spectra extending over a wide frequency range. The shorter the pulse, the wider its frequency spectrum but usually the lower its total energy.

A completely random waveform is known as white noise, where the frequency, amplitude and phase of the components are equally probable and constantly varying. Its spectrum is flat, and it has equal energy in any given bandwidth.
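A rough numerical check of the "flat spectrum" property (illustrative Python, assuming NumPy; the split into four bands is arbitrary):

```python
import numpy as np

# Generate white noise and compare the average energy in a few
# equal-width frequency bands; they should come out roughly equal.
rng = np.random.default_rng(0)
fs = 8000
noise = rng.standard_normal(fs * 4)                 # 4 seconds of white noise

spectrum = np.abs(np.fft.rfft(noise)) ** 2          # energy per frequency bin
bands = np.array_split(spectrum[1:], 4)             # four equal bands (skip DC)
for i, band in enumerate(bands):
    print(f"band {i}: mean energy {band.mean():.1f}")
# The four values are roughly equal, i.e. equal energy per bandwidth.
```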

Page 15: Media Processing – Audio Part

15

The envelope of an audio signal

Source: Alan P. Kefauver and David Patschke (2007)

Page 16: Media Processing – Audio Part

16

The envelope of an audio signal

The attack is the time it takes the sound generator (e.g. a musical instrument) to respond (i.e. vibrate) to a strike, which depends on the materials the instrument is made from. A softly blown flute has a longer attack time than a sharply struck snare drum. Struck instruments have attack times (in the 1 to 20 ms range) that are much shorter than those of wind instruments (in the 60 to 200 ms range).

The initial decay is caused by the cessation of the striking force that set the instrument vibrating.

The sustain refers to the levelling-off period of the sound, when the sound energy becomes stable.

The final decay occurs when the sound is no longer driven by the player or by the resonance of the vibrating medium. It varies from as short as 250 ms to as long as 100 s, depending on the vibrating medium. In addition, different frequencies decay at different rates, causing a change in the sound's timbre.
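The four-part envelope can be sketched as a piecewise-linear gain curve applied to a tone; the segment durations and sustain level below are illustrative choices, not values from the slides.

```python
import numpy as np

# A minimal sketch of the attack / initial decay / sustain / final decay
# envelope, built as a piecewise-linear gain curve.
fs = 8000
attack, initial_decay, sustain, release = 0.02, 0.05, 0.5, 0.3   # seconds

envelope = np.concatenate([
    np.linspace(0.0, 1.0, int(fs * attack)),          # attack: rise to peak
    np.linspace(1.0, 0.7, int(fs * initial_decay)),   # initial decay: drop to sustain level
    np.full(int(fs * sustain), 0.7),                  # sustain: level holds steady
    np.linspace(0.7, 0.0, int(fs * release)),         # final decay (release): fade to silence
])

# Applying the envelope to a tone shapes its loudness over time.
t = np.arange(len(envelope)) / fs
note = envelope * np.sin(2 * np.pi * 440 * t)
print(f"{len(note) / fs:.2f} s of audio, peak level {note.max():.2f}")
```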

Page 17: Media Processing – Audio Part

17

What is phase

Source: Francis Rumsey and Tim McCormick (1994)

The height of the spot varies sinusoidally with the angle of rotation of the wheel, i.e. it rises and falls regularly as the wheel rotates at a constant speed. The phase angle of a sine wave can be understood in terms of the number of degrees of rotation of the wheel.

Page 18: Media Processing – Audio Part

18

Phase difference versus angles

Source: Francis Rumsey and Tim McCormick (1994)

For the phase relationship between two waves of the same frequency, if each cycle is considered as corresponding to 360 degrees, then the phase difference between the two waves can be determined by comparing the 0-degree point on one wave with the 0-degree point on the other. In the above example, the top signal is 90 degrees out of phase with the lower signal.

Page 19: Media Processing – Audio Part

19

Phase difference versus time delay

Source: Francis Rumsey and Tim McCormick (1994)

If two signals (of the same frequency) start out simultaneously from sources equidistant from a listener, then they arrive at the listener in phase. If one source is more distant than the other, its signal will be delayed, and the phase relationship between the two is dependent on the amount of delay.

Sound travels about 30 cm per millisecond; therefore, if speaker 2 is 1 metre more distant than speaker 1, its signal will be delayed by roughly 3 ms. The resulting phase difference also depends on the frequency of the sound: for a sound around 330 Hz, a 3 ms delay corresponds to one complete wavelength, so at this frequency the delayed signal would be back in phase with the first.
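The delay-to-phase relationship above can be written as phase = delay x frequency x 360 degrees; a small worked sketch (illustrative Python, using the 344 m/s figure quoted earlier):

```python
# Worked example of the relationship between path difference, time delay
# and phase, using a speed of sound of about 344 m/s.
SPEED_OF_SOUND = 344.0   # m/s

def phase_shift_degrees(extra_distance_m, frequency_hz):
    """Phase lag (in degrees) of the more distant source at a given frequency."""
    delay_s = extra_distance_m / SPEED_OF_SOUND
    return (delay_s * frequency_hz * 360.0) % 360.0

extra = 1.0   # speaker 2 is one metre further away
for f in (110, 330, 1000):
    print(f"{f:>5} Hz: {phase_shift_degrees(extra, f):6.1f} degrees")
# Near 344 Hz the delay amounts to one full cycle, so the two signals
# are back in phase.
```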

Page 20: Media Processing – Audio Part

20

Some facts about phase

Phase difference between signals can be caused by the time delays between them. It is usually measured as a number of degrees relative to some reference.

Phase is a relevant concept in the case of continuous repetitive waveforms, and has little meaning for impulsive or transient sounds where time difference is a more relevant quantity.

For a given time delay between two signals, the higher the frequency, the greater the phase difference.

It is possible for the phase difference between two signals to be greater than 360 degrees if the delay of the second signal is great enough to amount to more than one cycle.

Demo for sound and its characteristics:
http://www.youtube.com/watch?v=cK2-6cgqgYA

Page 21: Media Processing – Audio Part

21

In phase & out of phase

Source: Francis Rumsey and Tim McCormick (1994)

In phase: the compression (positive) and rarefaction (negative) half-cycles of two waves of the same frequency coincide exactly in time and space. If the two signals are added together, they produce another signal of the same frequency but twice the amplitude.

Out of phase: the positive half-cycle of one signal coincides with the negative half-cycle of the other. If the two are added together, they cancel each other.

Partially out of phase: when the two are added together, the phase and amplitude will be the point-by-point sum of the two, resulting in partial addition or cancellation, with a phase somewhere between those of the original two.
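A quick numerical check of these three cases (illustrative Python, assuming NumPy; the 440 Hz tone is an arbitrary choice):

```python
import numpy as np

# Add two equal-amplitude sinusoids with different phase offsets and
# compare the peak amplitude of the sum.
t = np.arange(0, 1, 1 / 8000)
f = 440.0

for offset_deg in (0, 90, 180):
    a = np.sin(2 * np.pi * f * t)
    b = np.sin(2 * np.pi * f * t + np.radians(offset_deg))
    peak = np.max(np.abs(a + b))
    print(f"phase offset {offset_deg:>3} deg -> peak of sum = {peak:.2f}")
# 0 deg doubles the amplitude, 180 deg cancels it, 90 deg gives about 1.41.
```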

Page 22: Media Processing – Audio Part

22

Decibel

The decibel is a widely used unit in audio engineering for measuring the ratio of one signal's amplitude to another's. It is usually used in preference to absolute units such as volts or watts, as it approximates more closely to one's subjective impression of changes in the amplitude of a signal.

It helps to compress the range of values between the maximum and minimum sound levels encountered in real signals. For example, the human ear can perceive sound intensities ranging from about 0.000 000 000 001 W per square metre to around 100 W per square metre; described in decibels, this range runs from 0 dB to 140 dB.

It is also used to describe the voltage gain of a device. For example, a microphone amplifier with a gain of 60 dB multiplies the input voltage by a factor of 1000, i.e. 20 log10(1000/1) = 60 dB.

The decibel applies both to acoustical sound pressure (analogous to electrical voltage), i.e. dB = 20 log10(V1/V2), and to sound power (analogous to electrical power), i.e. dB = 10 log10(P1/P2).

If a signal level is quoted in decibels, then a reference must be given. Therefore, 0 dB means the signal concerned is at the same level as the reference, not "no signal". The reference level for sound pressure level (SPL) is defined worldwide as 20 µPa.
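The decibel definitions above translate directly into code; the sketch below (illustrative Python, not a standard library) reproduces the 60 dB gain and 140 dB range examples and uses the 20 µPa SPL reference.

```python
import math

# Decibel calculations following the definitions above:
# 20*log10 for pressure/voltage ratios, 10*log10 for power ratios.
def db_from_voltage_ratio(v1, v2):
    return 20.0 * math.log10(v1 / v2)

def db_from_power_ratio(p1, p2):
    return 10.0 * math.log10(p1 / p2)

REF_PRESSURE_PA = 20e-6   # 20 micropascals, the 0 dB SPL reference

print(db_from_voltage_ratio(1000, 1))                 # 60.0  (the amplifier gain example)
print(db_from_power_ratio(100, 1e-12))                # 140.0 (threshold of pain vs. hearing)
print(db_from_voltage_ratio(2e-2, REF_PRESSURE_PA))   # 60.0 dB SPL for a pressure of 0.02 Pa
```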

Page 23: Media Processing – Audio Part

23

Sound in electrical form

In order to perform operations (such as amplification, recording, and mixing) on sound, it is often converted from an acoustical form into an electrical form. This is achieved by a microphone.

The equivalence between the acoustical and electrical signals:

Voltage of the electrical signal at the output of the microphone <-> acoustic compression and rarefaction of the air

Current flowing down the wire from the microphone <-> acoustic wave carried in the motion of the air particles

Source: Francis Rumsey and Tim McCormick (1994)

Page 24: Media Processing – Audio Part

24

Sound power and sound pressure

A sound source generates a certain amount of power, measured in watts. Sound pressure is the effect of that sound power on its surroundings (like the relationship between the heat energy generated by a radiator and the temperature in the room).

Sound pressure is measured in newtons per square metre, and sound pressure level (SPL) is conveniently quoted in decibels. SPL = 0 dB is approximately equivalent to the threshold of hearing (the quietest sound perceivable by an average person) at a frequency of 1 kHz.

The amount of acoustical power generated by real sound sources is surprisingly small, compared with the number of watts of electrical power involved in lighting a light bulb. Most daily sources generate fractions of a watt of sound power. An acoustical source radiating 20 watts would produce a sound pressure level close to the threshold of pain if the listener is close to the source.

Page 25: Media Processing – Audio Part

25

Haas effect & Inverse square law

Sound power is spread over an increasingly large area as the wave travels away from the source.

The law of decreasing power per unit area (intensity) of a wavefront with increasing distance from the source is known as the inverse square law, as the intensity of the sound drops in proportion to the inverse square of the distance from the source. In practice, the level drops by about 6 dB for every doubling of distance from the source in free space (i.e. with no nearby reflecting surfaces).

The Haas effect, also known as the precedence effect, states that the sound arriving first (with the shortest delay in reaching the ears) appears louder and therefore closer. Haas found that when the delay was greater than 5 ms but less than 30 ms, the delayed sound had to be about 10 dB louder than the non-delayed sound for the two to be perceived as equally loud.
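The 6 dB-per-doubling figure for the inverse square law quoted above can be verified numerically; a small illustrative Python sketch:

```python
import math

# In a free field the sound pressure level drops by 20*log10(distance ratio),
# i.e. about 6 dB for every doubling of distance.
def level_drop_db(distance, reference_distance=1.0):
    """Drop in sound pressure level relative to the reference distance."""
    return 20.0 * math.log10(distance / reference_distance)

for d in (1, 2, 4, 8, 16):
    print(f"{d:>2} m: {level_drop_db(d):5.1f} dB below the level at 1 m")
# Each doubling of distance adds roughly 6 dB of attenuation.
```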

Page 26: Media Processing – Audio Part

26

Haas effect & Inverse square law

Source: Francis Rumsey and Tim McCormick (1994)

Page 27: Media Processing – Audio Part

27

Measuring sound pressure level

An SPL meter, which is used to measure the level of sound at a particular point, is a device consisting of a microphone, an amplifier, filters and a meter, as shown in the following figure. The weighting filter attenuates different frequency bands according to the sensitivity of human hearing.

Source: Francis Rumsey and Tim McCormick (1994)

Page 28: Media Processing – Audio Part

28

A professional sound level meter (B&K Corp.)

Source: Alan P. Kefauver and David Patschke (2007)

Page 29: Media Processing – Audio Part

29

Typical sound pressure levels in dBs

Source: Alan P. Kefauver and David Patschke (2007)

Page 30: Media Processing – Audio Part

30

Free and reverberant fields

As the distance from a source increases, the direct sound level drops but the reverberant sound level remains roughly constant. The resultant sound level experienced at different distances from the source depends on the reverberation time of the room, as in a reverberant room the level of reflected sound is higher than in a 'dead' or 'dry' room (with little reverberation).

Source: Francis Rumsey and Tim McCormick (1994)
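A simplified numerical sketch of this behaviour, assuming the direct level follows the inverse square law while the reverberant level is a fixed, room-dependent constant; the 80 dB and 60 dB figures are hypothetical, not from the slides.

```python
import math

# Direct level falls 6 dB per doubling of distance; reverberant level is
# taken as constant for the room. Numbers are illustrative only.
DIRECT_LEVEL_AT_1M = 80.0    # dB SPL, hypothetical source
REVERBERANT_LEVEL = 60.0     # dB SPL, hypothetical "live" room

def total_level(distance_m):
    direct = DIRECT_LEVEL_AT_1M - 20.0 * math.log10(distance_m)
    # Combine the two fields by adding their energies, then convert back to dB.
    return 10.0 * math.log10(10 ** (direct / 10) + 10 ** (REVERBERANT_LEVEL / 10))

for d in (1, 2, 4, 8, 16):
    print(f"{d:>2} m: {total_level(d):.1f} dB SPL")
# Close to the source the direct sound dominates; far away the total level
# flattens out at the reverberant level.
```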

Page 31: Media Processing – Audio Part

31

Free and reverberant fields

Source: Francis Rumsey and Tim McCormick (1994)

Direct sound: sound that arrives at the listener directly from the source, without reflections.

Early reflections: the reflected sounds (echoes) from nearby surfaces in a room, which arrive at the listener within the first few milliseconds (up to about 50 ms) after the direct sound.

Late reflections: the echoes that reach the listener's ears more than about 50 ms after the direct sound.

Page 32: Media Processing – Audio Part

32

Early mechanical recording machines

Source: Francis Rumsey and Tim McCormick (1994)

Page 33: Media Processing – Audio Part

33

Early mechanical recording machines (cont.)

The early recording machines, developed by Edison and Berliner in the late 19th century, were completely mechanical or 'acoustic'. Such systems (Edison's 'phonograph' and Berliner's 'gramophone'), as shown on the previous slide, typically consisted of a horn, a diaphragm, a stylus, and a cylinder covered with soft foil.

The recordist spoke or sang into the horn, causing the diaphragm and stylus to vibrate and inscribe a modulated groove into the surface of the soft foil or gramophone disk. On reproduction, the modulated groove would cause the stylus and diaphragm to vibrate, resulting in a sound wave being emitted from the horn.

The sounds recorded and reproduced by such early systems had a limited frequency range and were heavily distorted.

Recordings were made directly onto the disk and could last only the duration of one side of the disk, at the time a maximum of around 4 minutes, with no possibility of editing; longer pieces of music had to be recorded in short sections with gaps while the disk was changed. In addition, the instruments had to be placed quite tightly around the pickup horn in order to be heard on the recording.

Page 34: Media Processing – Audio Part

34

Electrical recording machines

Source: Francis Rumsey and Tim McCormick (1994)

Page 35: Media Processing – Audio Part

35

Electrical recording (cont.)

Early electrical recording machines, developed during the 1920s, were based on the principle of electromagnetic transduction.

In such systems, the outputs of several microphones could be mixed together before being fed to the disk cutter. Variable resistors could be inserted into the system to control the level from each microphone, and valve amplifiers could be used to raise the electrical level to one suitable for driving the stylus transducer.

The microphones could be placed remotely from the recording machine, giving more flexibility in their positioning.

The sound quality of electrical recordings was markedly better than that of mechanical recordings, with a wider frequency range and a greater dynamic range.

Page 36: Media Processing – Audio Part

36

Electromagnetic transducers

Source: Francis Rumsey and Tim McCormick (1994)

Page 37: Media Processing – Audio Part

37

Electromagnetic transducers (cont.)

If a wire is made to move in a magnetic field, perpendicular to the lines of flux linking the poles of the magnet, then an electric current is induced in the wire (as shown in the diagram on the previous slide).

The direction of motion governs the direction of the current flow in the wire, and thus if the wire is made to move back and forth then an alternating current can be induced in the wire. Conversely, if a current is made to flow through the wire which cuts the lines of a magnetic field then the wire will move.

A simple moving-coil microphone would involve a wire moving in a magnetic field, by means of a coil attached to a flexible diaphragm which vibrates in sympathy with the sound wave. The frequency of the electrical signal (i.e. the output of the microphone) is the same as that of the sound wave, and the amplitude of the electrical signal is proportional to the velocity of the coil.
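A back-of-the-envelope sketch of the induction principle (e = B x l x v for a conductor of total length l moving at velocity v through flux density B); the flux density and wire length below are purely illustrative, not from the slides.

```python
# Sketch of the induction principle behind a moving-coil microphone:
# the induced voltage is e = B * l * v, where B is the flux density,
# l the length of coil wire in the gap and v the coil velocity.
B_TESLA = 1.0          # flux density in the magnet gap (illustrative)
WIRE_LENGTH_M = 5.0    # total length of coil wire moving in the field (illustrative)

def induced_voltage(coil_velocity_m_per_s):
    return B_TESLA * WIRE_LENGTH_M * coil_velocity_m_per_s

# The output voltage tracks the coil velocity, so it follows the sound wave.
for v in (0.001, 0.002, 0.004):            # metres per second
    print(f"velocity {v:.3f} m/s -> {induced_voltage(v) * 1000:.1f} mV")
# Doubling the velocity doubles the voltage, i.e. the amplitude of the
# electrical signal is proportional to the coil velocity, as stated above.
```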

Page 38: Media Processing – Audio Part

38

Later developments in sound recording

1940s: the introduction of the first AC-biased tape recorders. Tape was first made of paper coated with metal oxide, which tended to deteriorate rather quickly, and later of plastic, which lasted longer and was easier to handle.

1950s: the introduction of microgroove LP (long-playing) records, with lower surface noise and better frequency response.

1960s: the development of stereo recording and the introduction of multitrack tape recorders.

Later developments include high-quality digital recording, with Compact Disc and digital tape systems.

Page 39: Media Processing – Audio Part

39

Examples of modern recording chain - stereo recording

Source: Francis Rumsey and Tim McCormick (1994)

Page 40: Media Processing – Audio Part

40

Examples of modern recording chain - stereo recording (cont.)

Recording: Microphone sources are mixed ‘live’ to create a stereo session master, either analogue or digital, which is a collection of recordings. The balance between the sources must be correct at this stage.

Editing: The sound source materials are assembled in an artistically satisfactory manner, under the control of the producer, to create a final master.

Mastering: In this stage, additional and/or special information is added to the master for commercial release of the recordings to LPs, cassettes and CDs. For example, an LP master requires special equalisation to prepare it for cutting to the disk.

Stereo recording is a cheaper and less time-consuming way of producing recordings than multitrack recording, but it requires skill to achieve a usable balance quickly.

Page 41: Media Processing – Audio Part

41

Examples of modern recording chain - multitrack recording

Source: Francis Rumsey and Tim McCormick (1994)

Page 42: Media Processing – Audio Part

42

Examples of modern recording chain - multitrack recording (cont.)

Recording: Acoustic and electrical sources are fed into a mixer and recorded onto multitrack tape. The resulting tape contains a collection of individual sources on multiple tracks, with individual songs or titles recorded in separate places on the tape. In track laying, a master is usually built up by first recording the backing tracks (drums, keyboards, rhythm guitars, etc.) for a complete song.

Mixing: The multiple tracks are mixed down into the final format, usually stereo.

Mastering: In this stage, additional and/or special information is added to the master for commercial release of the recordings to LPs, cassettes and CDs. For example, an LP master requires special equalisation to prepare it for cutting to the disk.

Multitrack recording is usually created in the recording studio.

Page 43: Media Processing – Audio Part

43

Broadcast distribution

Source: Francis Rumsey and Tim McCormick (1994)

Page 44: Media Processing – Audio Part

44

Broadcast distribution (cont.)

A typical television sound signal from an outside broadcast location may have travelled over a large number of miles (as shown by the figure on the previous slide).

A radio microphone may transmit locally to the outside broadcast (OB) vehicle, which may then use a microwave radio link to send the signal back to the studio centre.

The sound signal may then travel through a length of internal cabling within the studio centre, finally to be connected to the network which distributes the signals around the country.

The network takes the form of either 'land lines' (equalised to compensate for losses over distance) or wireless radio-frequency (RF) links (possibly digitally encoded).

On the receiving side, the signal is distributed from the transmitter to the consumer through a further RF link.

Page 45: Media Processing – Audio Part

45

References

Francis Rumsey and Tim McCormick, Sound and Recording: an Introduction, 1994.

Alan P. Kefauver and David Patschke, Fundamentals of Digital Audio, 2007.