

Best Practices in Digital Preservation of the Spoken Word

http://www.historicalvoices.org/oralhistory

EMELD Workshop of Digitizing Lexical Information

August 2002

Bartek Plichta, Michigan State University


Why Best Practices?


• Highest possible audio “quality” of the speech signal.

• Platform and hardware-independent storage format.

• Platform and hardware-independent storage medium.

• Comprehensive metadata.


Part I. Recording Speech

Common recording situations:
– Field recording
– Studio and lab recording
– Telephone recording

Issues to be considered:
– Recording techniques
– Hardware
– Software


Analog or Digital?

Acoustic properties of the speech signal:
– Frequency response: < 10000 Hz
– Dynamic range: 30-40 dB

Analog tape:
– Frequency response: < 10000-15000 Hz
– Dynamic range: 45 dB

DAT (Digital Audio Tape, 16 bit, 48 kHz):
– Frequency response: < 24000 Hz
– Dynamic range: 96 dB

Digital telephone (ISDN):
– Frequency response: < 4000 Hz
– Dynamic range: 48 dB
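The digital figures above are consistent with two rules of thumb: the usable bandwidth is at most half the sample rate (the Nyquist limit), and each bit of linear PCM contributes about 6 dB of dynamic range. The sketch below is only an illustration of that arithmetic. The function names are made up for this example, and the 8 kHz, 8 bit telephone parameters are an assumption chosen because they reproduce the 4000 Hz and 48 dB values; the slide does not state them.

```python
import math

def nyquist_hz(sample_rate_hz):
    """Upper bound on representable frequency: half the sample rate."""
    return sample_rate_hz / 2.0

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: 20*log10(2**bits), about 6.02 dB per bit."""
    return 20.0 * math.log10(2 ** bits)

# Sample rate and bit depth per format; the telephone entry is an assumption.
formats = {
    "DAT (48 kHz, 16 bit)": (48000, 16),
    "Telephone (8 kHz, 8 bit, assumed)": (8000, 8),
}

for name, (rate, bits) in formats.items():
    print(f"{name}: < {nyquist_hz(rate):.0f} Hz, about {dynamic_range_db(bits):.0f} dB")
```

Both digital formats comfortably exceed the 30-40 dB dynamic range of speech, but only DAT covers its full frequency range.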


Microphones

• omnidirectional built-in

• omnidirectional lavalier

• hand-held dynamic

• hand-held condenser

• head-set dynamic or condenser

• highly directional shotgun

• digital telephone

Room noise


Recommended Microphones

1. Unidirectional (cardioid polar pattern) head-worn microphone.

– Shure SM 10A or AKG C 420

2. Unidirectional shotgun (highly directional polar pattern)

– Shure SM 89

3. Unidirectional dynamic or condenser

– Shure SM 58 or AKG C 1000S


Microphone Preamplifier

A premium quality microphone preamp is CRUCIAL to obtaining a reliable speech signal:
– 2 balanced XLR inputs
– High gain (< 65 dB)
– Phantom power (+48 V)

Built-in preamps (generally not recommended):
– Marantz PMD222
– TASCAM DA-P1

Stand-alone preamps:
– Symetrix 628
– M-Audio DMP2
– Shure FP24


Recorders

Avoid using automatic level settings, EQ, Dolby, etc.

• Field
– Portable analog [Marantz PMD222]
– Portable DAT (16 bit, 48 kHz) [TASCAM DA-P1]
– Portable hard disk (24 bit, 48 kHz) [USB Pre]

• Studio/Lab
– Reel-to-reel analog
– Hard disk (24 bit, 48 kHz) [stand-alone ADC]


Recording Technique

• Position the directional microphone close to the talker’s lips, 45 degrees off to the side, keeping the distance constant.

• Avoid low-frequency noise (refrigerator, traffic, fluorescent light buzz, computer hum)

• Use ONLY balanced XLR cables.

• Use reliable microphone stands and clips.

• Use manual gain control if possible.

• Monitor ADC gain control to avoid clipping.

• ALWAYS monitor your input with an earpiece.
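Clipping can also be checked after the fact on the digitized samples. The following is a minimal sketch, assuming signed integer PCM samples already loaded into a Python list; the function names and the sample values are hypothetical and only illustrate the idea of watching the peak level and counting samples that hit full scale.

```python
import math

def peak_dbfs(samples, bits=16):
    """Peak level relative to full scale (0 dBFS) for signed integer PCM samples."""
    full_scale = 2 ** (bits - 1)
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20.0 * math.log10(peak / full_scale)

def count_clipped(samples, bits=16):
    """Count samples sitting at the positive or negative full-scale limit."""
    upper = 2 ** (bits - 1) - 1
    lower = -(2 ** (bits - 1))
    return sum(1 for s in samples if s >= upper or s <= lower)

# Hypothetical 16-bit block with three samples pinned at the limits.
block = [0, 12000, -20000, 32767, 32767, -32768, 15000]
print("clipped samples:", count_clipped(block))
print("peak level (dBFS):", round(peak_dbfs(block), 2))
```

A recording whose peaks stay a few dB below 0 dBFS and whose clipped-sample count is zero has usable headroom; a nonzero count suggests the input gain was set too high.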


Part II. Processing

Analog to Digital Conversion

The main goal of A/D conversion is to obtain THE BEST POSSIBLE digital representation of the analog original for the purposes of:

• Preservation

• Analysis


Recommended Digitization Settings

• Sample rate: 48 kHz (96 kHz even better)

• Quantization: 24 bit

• Hardware: stand-alone, oversampling delta-sigma A/D converter with dither added prior to sampling.

• S/PDIF I/O interface.

• Store in an uncompressed (PCM) digital audio file format.

PC noise
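As an illustration of the storage recommendation, the sketch below writes mono samples as an uncompressed 48 kHz, 24 bit PCM WAV file using only Python's standard `wave` module. It is not part of the original workflow: the helper name, the output file name, and the 440 Hz test tone are all hypothetical, and a real transfer would take its samples from the A/D converter rather than from a synthetic signal.

```python
import math
import wave

def write_pcm24_wav(path, samples, sample_rate_hz=48000):
    """Write mono float samples in [-1.0, 1.0] as uncompressed 24-bit PCM."""
    max_int = 2 ** 23 - 1
    frames = bytearray()
    for s in samples:
        limited = max(-1.0, min(1.0, s))          # guard against out-of-range input
        value = int(round(limited * max_int))
        frames += value.to_bytes(3, byteorder="little", signed=True)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(3)                       # 3 bytes = 24 bits per sample
        wav.setframerate(sample_rate_hz)
        wav.writeframes(bytes(frames))

# One second of a hypothetical 440 Hz test tone at half of full scale.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
write_pcm24_wav("tone_48k_24bit.wav", tone)
```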


Processing

• Preservation
– No further processing necessary.
– Store as 48 kHz, 24 bit wav, aiff, or headerless (e.g. raw)

• Analysis
– Save as 16 bit, signed wav, aiff or raw
– Downsample to 11025 Hz with anti-alias filter
– Apply restoration processing if necessary
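The arithmetic behind the two copies can be made explicit. This sketch, with illustrative names that do not come from the slides, computes the exact rational factor needed to go from 48 kHz to 11025 Hz, the frequency below which the anti-alias filter has to cut off, and the storage cost of one hour of mono audio in each format.

```python
from fractions import Fraction

MASTER_RATE_HZ = 48000
ANALYSIS_RATE_HZ = 11025

def resample_ratio(source_hz, target_hz):
    """Exact resampling factor: numerator = upsampling, denominator = downsampling."""
    return Fraction(target_hz, source_hz)

def bytes_per_hour(sample_rate_hz, bits, channels=1):
    """Size of one hour of uncompressed PCM audio, excluding any file header."""
    return sample_rate_hz * (bits // 8) * channels * 3600

ratio = resample_ratio(MASTER_RATE_HZ, ANALYSIS_RATE_HZ)
print(f"resample by {ratio.numerator}/{ratio.denominator}")
print(f"anti-alias cutoff must lie below {ANALYSIS_RATE_HZ / 2} Hz")
print(f"master copy:   {bytes_per_hour(MASTER_RATE_HZ, 24) / 1e6:.1f} MB per hour")
print(f"analysis copy: {bytes_per_hour(ANALYSIS_RATE_HZ, 16) / 1e6:.1f} MB per hour")
```

Because the ratio is not an integer, the analysis copy cannot be made by simply dropping samples; a proper resampler with a low-pass stage is needed, which is why the anti-alias filter is listed explicitly.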


Restoration processing

Restoration processing must be applied carefully to avoid removing information from the speech signal itself.

• Hiss removal

• Click and crackle removal

• Clipped peak restoration

• Volume normalization


Example: preparing a DARE tape for analysis

• Original digitized at 48/24

• Converted to 16 bit

• Downsampled to 11025 Hz

• Band-pass filtered to remove low- and high-frequency noise

• 2:1 dynamic range compression above a –15 dB threshold

• Volume adjustment
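The 2:1 compression step can be read as a static gain curve: below the threshold the level is unchanged, and above it every 2 dB of input produces 1 dB of output. The sketch below assumes that reading (dynamic range compression with a –15 dB threshold relative to full scale) and ignores attack and release behavior, which the slide does not specify. The function name is illustrative.

```python
def compress_level_db(level_db, threshold_db=-15.0, ratio=2.0):
    """Static compressor curve: above the threshold, output rises at 1/ratio of the input rate."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

for level in (-30.0, -15.0, -9.0, -3.0, 0.0):
    print(f"{level:6.1f} dB in -> {compress_level_db(level):6.1f} dB out")
```

With these settings a full-scale peak comes out at –7.5 dB, which is why a volume adjustment follows: the headroom gained by compression is used to raise the overall level.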


Before and After

• DM 0735 – S1 tape from DARE (Michigan)


Recommended Software

Operating Systems: Windows 2000 Professional, Mac OS 9.1, Red Hat Linux

Audio Editors:
– Windows – GoldWave 2.24
– Mac OS – Peak VST

Analysis Software:
– Windows – Praat, WaveSurfer, MultiSpeech, MATLAB, SpeechStation2, Spectrogram 6.0
– Mac OS – Praat
– Linux – Praat, MATLAB


What to Avoid

• Using inexpensive, generic hardware (microphones, portable recorders, cables, sound cards, etc.)

• Outputting digital audio through analog D/A outputs (e.g., DAT to PC transfer)

• Capturing audio directly into analysis software

• Ignoring metadata


Metadata

• It is recommended to ALWAYS enter metadata in a common database or XML format.

• IDEALLY, metadata should be encoded in an OAI-compliant format (OLAC, METS)

• What about MPEG 7?
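As a rough illustration of entering metadata in an XML format, the sketch below builds a small record with Python's standard `xml.etree.ElementTree`, using element names from the Dublin Core element set (on which OLAC is based). It is not a valid OLAC or METS document: the `record` wrapper element and the helper function are hypothetical, and every field value is a placeholder.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

def build_record(fields):
    """Build a small XML record with one Dublin Core element per (name, value) pair."""
    record = ET.Element("record")  # hypothetical wrapper, not part of OLAC or METS
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record

# All values below are placeholders for illustration only.
record = build_record([
    ("title", "Interview 001 (placeholder)"),
    ("creator", "Interviewer name (placeholder)"),
    ("date", "2002-08-01"),
    ("format", "audio/x-wav; 48 kHz; 24 bit"),
    ("language", "en"),
])
print(ET.tostring(record, encoding="unicode"))
```

Keeping the technical parameters (sample rate, bit depth, file format) in the record alongside the descriptive fields is what lets a later user interpret the audio file without the original hardware or software.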


METS

• Descriptive metadata

• Administrative metadata

• File groups

• Structural map

• Behavior


METS – Audio metadata

• File group
– One master => many derivatives

• Structural map
– Time alignment

• Technical metadata
– Platform and hardware-independent storage

• Digital provenance
– How? Why? Who?

• Behavior
– Executable code in metadata




Built-in, omnidirectional microphone

‘Bob was positive that he heard his wife...’


Omnidirectional, lavalier microphone

‘Bob was positive that he heard his wife...’


Dynamic microphone, Shure SM58

‘Bob was positive that he heard his wife...’


Unidirectional, shotgun microphone, Shure SM89

‘Bob was positive that he heard his wife...’


Head-worn unidirectional microphone

‘Bob was positive that he heard his wife...’


Digital telephone

‘Bob was positive that he heard his wife...’


Room Noise


LPC Comparison

                f1 (formant; bandwidth, Hz)    f2 (formant; bandwidth, Hz)
built-in mic    871; 138                       1670; 263
head-set mic    521; 146                       1770; 45

[Figure: paired panels labeled Built-in and Head-set; token “Shannon”]

frame length = 20ms, filter order = 12, pre-emphasis = 0.9
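The analysis parameters listed above (20 ms frames, filter order 12, pre-emphasis 0.9) fit a conventional LPC analysis. The sketch below shows one common way to compute LPC coefficients, the autocorrelation method with the Levinson-Durbin recursion. That choice of method is an assumption, since the slide names only the parameters; the function names are illustrative, no analysis window is applied, and the decaying test signal is synthetic rather than speech.

```python
def pre_emphasize(x, coeff=0.9):
    """First-order high-pass: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0]."""
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]

def frame_length_samples(sample_rate_hz, frame_ms=20.0):
    """Number of whole samples in one analysis frame."""
    return int(sample_rate_hz * frame_ms / 1000.0)

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for the prediction-error filter [1, a1, ..., ap] and residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

# Synthetic check: a decaying exponential is a one-pole signal with pole at 0.5.
signal = [0.5 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorrelation(signal, 2), 2)
print("frame length at 11025 Hz:", frame_length_samples(11025), "samples")
print("LPC coefficients:", [round(c, 6) for c in coeffs])
```

For speech, each frame would first be pre-emphasized, the recursion would be run at order 12, and formant frequencies and bandwidths would then be read from the roots of the resulting polynomial; that root-finding step is left out here.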