Cross-Correlations and Cleaning Up Data
Jessica Ferguson


Upload: cristian-dolliver

Post on 01-Apr-2015

TRANSCRIPT

Page 1

Cross-Correlations and Cleaning Up Data

Jessica Ferguson

Page 2

Senior, Computer Science major
English: Creative Writing minor
Pacific University

Page 3

Project Aims: HSD Project

Aim 1: Collecting and transcribing spoken language data
Aim 2: Automatically deriving features from spoken language samples
Aim 3: Characterizing features derived from Aim 2

Page 4

My Task

Falls under Aim 1
Improving the quality of the recordings in the corpus
Reducing noise to give clearer speech

Page 5

Subjects

Currently enrolled in studies at the Layton Aging and Alzheimer’s Disease Center at OHSU
Individuals over 90
Individuals with Mild Cognitive Impairment (MCI)

Page 6

Test Battery

Wechsler Logical Memory I/II (Story Recall)
Category Fluency (Fruits, States)
Picture Description Task
Autobiographical reflections
Conversational Speech

Page 7

Recording Setup

Same for all sessions
Four different microphones set up
Tests administered by examiner

Page 8

Characteristics of Recordings

Similarly-shaped waves
Shifted horizontally

Page 9

Sample Waves

Page 10

Shifting Files

Shifting files is relatively easy
But how far to shift?
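A minimal sketch of the shifting step, assuming a recording is held in memory as a plain list of samples. The name `shift_signal` and the choice to pad with zeros (silence) are illustrative assumptions, not taken from the slides.

```python
def shift_signal(samples, shift):
    """Delay a signal by `shift` samples (advance it if `shift` is negative).

    The output has the same length as the input. Positions that have no
    corresponding input sample are filled with zeros (silence).
    """
    n = len(samples)
    if shift >= 0:
        # Pad the front, drop whatever falls off the end.
        return [0] * min(shift, n) + list(samples[: max(n - shift, 0)])
    # Negative shift: drop samples from the front, pad the end.
    return list(samples[-shift:]) + [0] * min(-shift, n)


delayed = shift_signal([1, 2, 3, 4], 1)    # one sample later
advanced = shift_signal([1, 2, 3, 4], -1)  # one sample earlier
```

The shifting itself is only list slicing; the open question the slide raises is which value of `shift` to use.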

Page 11

Close-up of Comments Files

Page 12

Observed shift:
380
320
315

Page 13

Calculating Shift – Cross-Correlation

Cross-correlation: a measure of how similar one signal is to another
To calculate: split the file into overlapping windows
Take windows of the same length in another file
Multiply them together, sample by sample, and sum the products
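The windowing and multiplication steps above can be sketched as follows. This is a rough illustration, not the project's code: the helper names `window_starts` and `multiply_and_sum` are hypothetical, and signals are assumed to be plain lists of samples.

```python
def window_starts(num_samples, window_len, hop):
    """Start indices of all full windows, spaced `hop` samples apart.

    Windows overlap whenever `hop` is smaller than `window_len`.
    """
    return list(range(0, num_samples - window_len + 1, hop))


def multiply_and_sum(window_a, window_b):
    """Multiply two equally long windows sample by sample and add up the products."""
    return sum(a * b for a, b in zip(window_a, window_b))


starts = window_starts(10, 4, 2)                      # windows of 4 samples, every 2 samples
same = multiply_and_sum([1, -1, 2], [1, -1, 2])       # identical windows: large positive sum
flipped = multiply_and_sum([1, -1, 2], [-1, 1, -2])   # inverted window: large negative sum
```

A large positive sum indicates that the two windows have a similar shape; a large negative sum indicates that one looks like the other turned upside down.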

Page 14

Cross-Correlation Cont.

The window we multiply it by in the other file keeps getting moved by one sample (1/16 msec)
If corresponding values have the same sign, they contribute positively
If one is negative and the other is positive, they contribute negatively
We take the highest value from the range
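The sliding search described above might look like the sketch below. It is written under assumptions: signals are plain lists, samples that fall outside the second file are skipped, and the names `window_score` and `best_lag` are hypothetical rather than taken from the project.

```python
def window_score(ref, other, start, length, lag):
    """Sum of products between ref[start:start+length] and the window of
    `other` that begins `lag` samples later. Out-of-range samples are skipped."""
    total = 0.0
    for i in range(start, min(start + length, len(ref))):
        j = i + lag
        if 0 <= j < len(other):
            total += ref[i] * other[j]
    return total


def best_lag(ref, other, start, length, min_lag, max_lag):
    """Try every lag from min_lag to max_lag (inclusive), moving the second
    window one sample at a time, and return the lag with the highest score."""
    best = min_lag
    best_score = window_score(ref, other, start, length, min_lag)
    for lag in range(min_lag + 1, max_lag + 1):
        score = window_score(ref, other, start, length, lag)
        if score > best_score:
            best, best_score = lag, score
    return best


# A short pulse, and the same pulse arriving three samples later.
ref = [0, 0, 1, 3, -2, 1, 0, 0, 0, 0, 0, 0]
other = [0, 0, 0, 0, 0, 1, 3, -2, 1, 0, 0, 0]
estimated_shift = best_lag(ref, other, start=0, length=8, min_lag=-3, max_lag=5)
```

Samples with matching signs add to the score and samples with opposite signs subtract from it, so the score peaks at the lag where the two waveforms line up.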

Page 15

Issues with Cross-Correlation

With original parameters:
Window length: 1280 samples
Lag: -400 to 400 samples
For one value: 1280 * 800 = 1,024,000 multiplications
One value every 10 msec: 100 values per second of file correlated
This gets unmanageable very quickly
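The cost estimate can be checked with a few lines of arithmetic. The sketch below uses the parameters from the slide (1280-sample window, 800 lags, one value every 10 msec); the function and variable names are illustrative.

```python
def multiplications_per_value(window_samples, num_lags):
    """One multiplication per sample in the window, repeated for every lag."""
    return window_samples * num_lags


def values_per_second(hop_msec):
    """How many correlation values are produced per second of audio."""
    return 1000 // hop_msec


per_value = multiplications_per_value(1280, 800)
per_second = per_value * values_per_second(10)

print(f"{per_value:,} multiplications per value")
print(f"{per_second:,} multiplications per second of audio")
```

At roughly a hundred million multiplications per second of audio, for each pair of files, the cost grows quickly with recording length.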

Page 16

Time Under Original Parameters

Correlating 1.5 s of files: up to 20 minutes
Relatively high accuracy, but impractical
Task: Reduce time while maintaining accuracy

Page 17

Optimizing Parameters

Parameters that could be adjusted:
Window Size
Lag
Number of correlations (how much of the file gets correlated)

Page 18

Window Size

Initial parameters were 200 msec
Decreasing below 80 msec resulted in unacceptable loss of accuracy
Runtime was improved, but not by enough
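To relate these window sizes to sample counts, the sketch below assumes 16 samples per msec, which follows from the earlier statement that one sample is 1/16 msec. The helper name `msec_to_samples` is hypothetical.

```python
SAMPLES_PER_MSEC = 16  # assumption: one sample is 1/16 msec


def msec_to_samples(msec):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(msec * SAMPLES_PER_MSEC))


initial_window = msec_to_samples(200)   # 3200 samples
smallest_window = msec_to_samples(80)   # 1280 samples

# Cost per lag is proportional to window length, so the saving is the ratio.
speedup = initial_window / smallest_window  # 2.5
```

Because the work per lag scales linearly with window length, shrinking the window from 200 msec to 80 msec saves at most a factor of 2.5, which fits the observation that this alone was not enough.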

Page 19

Number of Correlations

Unfortunately, correlations are not always perfect
We take the mode of the correlations produced
n = 150 was the minimum, and still had a high error rate
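Taking the mode means keeping the lag that the individual windows agree on most often, so a few bad windows do not skew the result. A minimal sketch using the standard library, with made-up lag values for illustration:

```python
from collections import Counter


def most_common_lag(lags):
    """Return the lag that occurs most often among the per-window estimates."""
    if not lags:
        raise ValueError("need at least one lag estimate")
    lag, _count = Counter(lags).most_common(1)[0]
    return lag


# Hypothetical per-window estimates: most agree, a few are outliers.
estimates = [380, 380, 12, 380, -200, 381]
chosen = most_common_lag(estimates)
```

Unlike an average, the mode is not pulled toward outliers such as `12` or `-200`, but it needs enough windows for one value to clearly dominate.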

Page 20

Lag

Recall the sound wave images from before:

Page 21

Lag cont.

Assume that these are representative
Lag values should all be between 300-400 samples (18.75-25 msec)
Add this to previous improvements:
Runtime for one set of four files decreases to about 5-6 min
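The effect of narrowing the lag range can be estimated with the sketch below. It assumes 16 samples per msec (one sample is 1/16 msec) and counts lags as the distance between the endpoints, matching the figure of 800 used earlier for -400 to 400. The helper names are hypothetical.

```python
SAMPLES_PER_MSEC = 16  # assumption: one sample is 1/16 msec


def samples_to_msec(samples):
    """Convert a sample count to milliseconds."""
    return samples / SAMPLES_PER_MSEC


def lag_span(min_lag, max_lag):
    """Number of one-sample steps between the two ends of the lag range.
    (Counting both endpoints would give one more.)"""
    return max_lag - min_lag


full_span = lag_span(-400, 400)       # 800
restricted_span = lag_span(300, 400)  # 100
reduction = full_span / restricted_span

low_msec = samples_to_msec(300)   # 18.75
high_msec = samples_to_msec(400)  # 25.0
```

Under these assumptions the restricted range needs about one eighth as many lags per correlation value, in addition to the savings from the other parameters.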

Page 22

Other Benefits

If the assumption holds:
Error from optimal value decreases
Max. error decreases from 50 msec to 6 msec

Page 23

Original File

Taken from a picture description task

Page 24

Shifted File

The same file, but correlated and shifted

Page 25

Acknowledgements

Paul Hosom and Brian Roark
Fellow Interns
Everyone who has made me welcome at CSLU

Page 26

Questions?