1
Cross-Correlations and Cleaning Up Data
Jessica Ferguson
Senior Computer Science Major
English: Creative Writing Minor
Pacific University
3
Project Aims: HSD Project
Aim 1: Collecting and transcribing spoken language data
Aim 2: Automatically deriving features from spoken language samples
Aim 3: Characterizing features derived from Aim 2
4
My Task
Falls under Aim 1
Improving the quality of the recordings in the corpus
Reducing noise to give clearer speech
Subjects
Currently enrolled in studies at the Layton Aging and Alzheimer’s Disease Center at OHSU
Individuals over 90
Individuals with Mild Cognitive Impairment (MCI)
6
Test Battery
Wechsler Logical Memory I/II (Story Recall)
Category Fluency (Fruits, States)
Picture Description Task
Autobiographical reflections
Conversational Speech
7
Recording Setup
Same for all sessions
Four different microphones set up
Tests administered by examiner
8
Characteristics of Recordings
Similarly-shaped waves
Shifted horizontally
9
Sample Waves
10
Shifting Files
Shifting files is relatively easy
But how far to shift?
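As a rough illustration of why the shifting step itself is easy, here is a minimal Python sketch. It assumes the recording is already loaded as a plain list of samples; the function name `shift_samples` and the choice to pad with zeros (silence) are assumptions made for this example, not details taken from the project.

```python
def shift_samples(samples, shift):
    """Delay a signal by `shift` samples (advance it if `shift` is negative).

    The output has the same length as the input. Gaps are filled with
    zeros (silence), and samples pushed past either end are dropped.
    """
    n = len(samples)
    if shift >= 0:
        # Delay: prepend silence, then trim back to the original length.
        shifted = [0] * shift + list(samples)
        return shifted[:n]
    # Advance: drop the first -shift samples, then pad the end with silence.
    shifted = list(samples[-shift:]) + [0] * (-shift)
    return shifted[:n]
```

The hard part is choosing the value of `shift`, which is what the rest of the talk addresses.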
11
Close-up of Comments Files
12
Observed shift: 380, 320, 315
13
Calculating Shift – Cross-Correlation
Cross-correlation: a measure of how similar one signal is to another
To calculate: split the file into overlapping windows
Take windows of the same length in another file
Multiply them together
14
Cross-Correlation Cont.
The window we multiply by in the other file keeps getting moved by one sample (1/16 msec)
If corresponding values have the same sign, they contribute positively
If one is negative and the other is positive, they contribute negatively
We take the highest value from the range
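The procedure described on these two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: signals are plain lists of numbers, the function name `best_lag` and its parameters are hypothetical, and the score is the raw sum of products with no normalization, which may differ from what the project actually used.

```python
def best_lag(reference, other, start, window_len, min_lag, max_lag):
    """Return (lag, score) for the lag with the highest cross-correlation.

    A window of `window_len` samples starting at `start` in `reference`
    is multiplied sample by sample with the window starting at
    `start + lag` in `other`, and the products are summed. Returns None
    if no lag in the range keeps the shifted window inside `other`.
    """
    window = reference[start:start + window_len]
    best = None
    for lag in range(min_lag, max_lag + 1):  # slide one sample at a time
        lo = start + lag
        hi = lo + window_len
        if lo < 0 or hi > len(other):
            continue  # shifted window would run off the file
        # Same-sign pairs add to the score; opposite-sign pairs subtract.
        score = sum(a * b for a, b in zip(window, other[lo:hi]))
        if best is None or score > best[1]:
            best = (lag, score)
    return best


# Toy example: `other` is `reference` delayed by 3 samples.
reference = [0, 0, 1, -2, 3, -1, 0, 0, 0, 0, 0, 0]
other = [0, 0, 0, 0, 0, 1, -2, 3, -1, 0, 0, 0]
print(best_lag(reference, other, start=2, window_len=4, min_lag=-2, max_lag=5))
# → (3, 15)
```

The nested work (one multiplication per window sample, repeated for every lag) is what makes the cost grow so quickly with window length and lag range.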
15
Issues with Cross-Correlation
With original parameters:
Window length: 1280 samples
Lag: -400 to 400 samples
For one value: 1280 * 800 = 1,024,000 multiplications
One value every 10 msec: 100 values per second of file correlated
This gets unmanageable very quickly
16
Time Under Original Parameters
Correlate 1.5s of files: up to 20 minutes
Relatively high accuracy, but impractical
Task: Reduce time while maintaining accuracy
17
Optimizing Parameters
Parameters that could be adjusted:
Window Size
Lag
Number of correlations (how much of the file gets correlated)
18
Window Size
Initial window size was 200 msec
Decreasing below 80 msec resulted in unacceptable loss of accuracy
Runtime improved, but not by enough
19
Number of Correlations
Unfortunately, correlations are not always perfect
We take the mode of the correlations produced
n = 150 was the minimum, and still had a high error rate
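Taking the mode of the per-window results can be sketched as follows. The function name `consensus_lag` and the extra "agreement" fraction it returns are assumptions added for illustration; the slides only say that the mode is taken.

```python
from collections import Counter


def consensus_lag(lag_estimates):
    """Return the most common lag and the fraction of windows agreeing with it."""
    if not lag_estimates:
        raise ValueError("need at least one lag estimate")
    counts = Counter(lag_estimates)
    # most_common(1) yields the (value, count) pair with the highest count;
    # on a tie, the value that appeared first in the input wins.
    lag, votes = counts.most_common(1)[0]
    return lag, votes / len(lag_estimates)


# Toy example: three of five windows agree on a lag of 380 samples.
print(consensus_lag([380, 380, 12, 380, -250]))
# → (380, 0.6)
```

A low agreement fraction is one possible signal that too few windows were correlated for the mode to be trusted.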
20
Lag
Recall the sound wave images from before:
21
Lag cont.
Assume that these are representative
Lag values should all be between 300-400 samples (18-25 msec)
Add this to previous improvements:
Runtime for one set of four files decreases to about 5-6 min
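A back-of-the-envelope check of the lag figures, as a small Python sketch. It assumes a 16 kHz sample rate (implied by one sample being 1/16 msec) and counts lags inclusively, so the full range comes out as 801 rather than a round 800; the helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000  # assumption: implied by one sample = 1/16 msec


def samples_to_msec(n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a sample count to milliseconds."""
    return 1000.0 * n_samples / sample_rate_hz


def lag_count(min_lag, max_lag):
    """Number of lags tried when both endpoints are included."""
    return max_lag - min_lag + 1


print(samples_to_msec(300), samples_to_msec(400))  # → 18.75 25.0

full = lag_count(-400, 400)    # original search range
narrow = lag_count(300, 400)   # restricted search range
print(full, narrow)            # → 801 101
print(round(full / narrow, 1))  # → 7.9
```

Under these assumptions, restricting the lag range alone cuts the multiplications per correlation value by a factor of roughly eight, independent of the window size.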
22
Other Benefits
If the assumption holds:
Error from optimal value decreases
Max. error decreases from 50 msec to 6 msec
23
Original File
Taken from a picture description task
24
Shifted File
The same file, but correlated and shifted
Acknowledgements
Paul Hosom and Brian Roark
Fellow Interns
Everyone who has made me welcome at CSLU
Questions?