Carnegie Mellon University
Department of Computer Science

15-826 Multimedia Databases and Data Mining
C. Faloutsos, Spring 2016

Homework 4 - Solutions
Due: Soft + hard copy, 3:00pm, on 03/30/2016

VERY IMPORTANT:
1. On Blackboard, deposit a tar-file with your code, as described on page 2.
2. Deposit a hard copy of your answers, in class. As before:

• Separate your answers, and
• type your full info on each page (andrewId, course#, hw#, q# etc).

Reminders:
• Plagiarism: Homework is to be completed individually.
• Typeset all your answers.
• Late homeworks: email it
– to all TAs
– with the subject line exactly: 15-826 Homework Submission (HW 4)
– and the count of slip-days you are using.

For your information:
• Graded out of 100 points; 4 questions total
• Rough time estimate: ≈ 20-25 hours
Revision : 2016/04/17 23:58

Question                 Points   Score
Projections              25
Tensors with SQL         25
Hadoop and map-reduce    25
Fourier, Wavelets        25
Total:                   100


15-826 Homework 4 , Page 2 of 18 03/30/2016, 3:00pm

Preliminaries: Code-packaging info

These instructions are the same as in earlier homeworks, and repeated for your convenience.

Summary

Submit your code to blackboard, in a single tar file, called

andrewId-hw4.tar.gz

We will refer to that as “your-tar-file”.
As before, for your convenience, we provide a template-tar-file package, at

http://www.cs.cmu.edu/~christos/courses/826.S16/HOMEWORKS/HW4/template-hw4.tar.gz

It has 4 directories /Q1, /Q2, /Q3, /Q4. We will refer to it as the “template tar-file”.

To Do:

• tar xvfz template-hw4.tar.gz; make;
• Delivery:
– In each directory Q2-Q4, add your code or replace the place-holder code with your solutions (notice, for Q1 we do not ask for code)
– package the necessary files (make package)
– and submit to blackboard.

Hints:

Please explore the makefiles we have created for your convenience. For example, from the top directory:
• make hw4 will run the code for all 4 questions
• make package should create the tar-file, for submission
• make spotless will/should clean up all the derived files

FAQs

Q: May I change the makefiles?
A: Yes.
Q: What language should we use?
A: As described in each question: MATLAB/Python/SQL.
Q: Will you auto-grade, with scripts?
A: No - we’ll grade manually.


Question 1: Projections [25 points]

On separate page, with ‘[course-id] [hw#] [question#] [andrew-id] [your-name]’

Grading info: Varshaa NAGANATHAN

Problem Description: Consider the given similarity (dot-product) score matrix S in /Q1/mystery.csv. It represents N=639 points in a k-dimensional unit hypersphere. These could be, e.g., patient records with k numerical attributes, like (normalized) body-height, body-weight, etc. In such cases, as a data analyst, you would like to see whether there are groups, clusters, or outliers. Projecting into a 2-d space would help you visualize and analyze the dataset.

Mathematically, you want to recover the N × k matrix A, given the N × N similarity matrix S = A × A^T.

Answer the questions that follow. We recommend you use MATLAB - although expensive, it is excellent for matrix manipulations.

(a) [5 points] Find N points in a 2-d space that maximally, provably, preserve the similarities; submit the resulting scatter-plot.

Figure 1: Projection on 2-D plane

Grading info: -5 for wrong plot.

(b) [8 points] What is the dimensionality k of our mystery dataset? (Do not use fractals - we have even better tools, now). Briefly justify your answer.


Solution: k = 3. Justification: S has three dominant singular values (288.26, 190.62, 160.12).

Grading info: -4 for wrong k. -4 for wrong justification
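
One way to approach parts (a) and (b) is through the eigen-decomposition of S. The sketch below is in Python with NumPy rather than MATLAB, and it runs on a small synthetic matrix, not on mystery.csv; the function names and the relative tolerance are illustrative assumptions. Because S = A × A^T is symmetric and positive semi-definite, its eigenvalues coincide with its singular values, the count of non-negligible ones estimates k, and the top eigenvectors scaled by the square roots of their eigenvalues give coordinates whose dot products approximate S as closely as possible in the least-squares sense.

```python
import numpy as np

def embed_from_similarity(S, dims):
    """Return (coords, eigvals): N x dims coordinates and all eigenvalues, largest first."""
    eigvals, eigvecs = np.linalg.eigh(S)        # S is symmetric; eigh sorts ascending
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    top = np.clip(eigvals[:dims], 0.0, None)    # clamp round-off negatives to zero
    coords = eigvecs[:, :dims] * np.sqrt(top)   # scale each eigenvector column
    return coords, eigvals

def estimate_dimensionality(eigvals, rel_tol=1e-8):
    """Count eigenvalues that are non-negligible relative to the largest one."""
    return int(np.sum(eigvals > rel_tol * eigvals[0]))

# Synthetic stand-in for the mystery data: 50 points with k = 3 attributes.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
S = A @ A.T

coords_2d, eigvals = embed_from_similarity(S, dims=2)   # points for a scatter-plot
coords_3d, _ = embed_from_similarity(S, dims=3)          # full recovery when k = 3
k_est = estimate_dimensionality(eigvals)
print("estimated k:", k_est)
print("largest eigenvalues:", np.round(eigvals[:4], 2))
```

On real, noisy data the small eigenvalues are rarely exactly zero, so looking for a clear gap in the sorted eigenvalues is usually more informative than a fixed tolerance. The recovered coordinates match A only up to a rotation or reflection, which does not change the dot products.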

(c) [6 points] Give the range of values along each of the k dimensions, within 10%.

Solution: Range: [-1.0,1.0] [-1.0,0.7] [-1,0.8]

Grading info: -2 for every wrong range. -2 if min and max not mentioned, but difference between min and max mentioned correctly

(d) [6 points] If k ≤ 2, give the scatter-plot. If k > 2, give all the pair-plots, that is, the scatterplots of all the k-choose-2 possibilities.

Figure 2: XY plane

Grading info: -2 for every wrong plot.

What to turn in:

• Answers: On hard copy, please submit

1. Your numerical answers

2. Your justification

3. Your plots.

• Code: no code required


Figure 3: XZ plane

Figure 4: YZ plane

Question 2: Tensors with SQL [25 points]

On separate page, with ‘[course-id] [hw#] [question#] [andrew-id] [your-name]’

Grading info: Di Jin


Problem Description: In this question, you will learn about the CP/PARAFAC tensor decomposition, and how to use it to spot interesting structures in data. Suppose that you have a purchasing trace from the Internet, consisting of a list of quadruplets

user product timestamp price

This can be envisioned as a 4-mode tensor. For example the quadruplet

121 115 5432 2

means that user number 121 purchased product number 115 at timestamp 5432 for 2 dollars (rounded to integer).

You will use the CP/PARAFAC decomposition of such a tensor to discover (4-mode) communities, where a set of users purchase a set of products, during a set of timestamps, with different prices. Such a “community” may indicate a burst: for example, customers are likely to purchase a set of products during a period of time, say, Black Friday, at relatively cheaper prices.

For your convenience, we re-numbered the users, the products, the timestamps and the prices to make them integers. Thus, the dataset is in tab separated format, with one quadruplet per line, of the form

i j k price value

where the first four values are integers, and value is always 1.

Implementation Details:

(a) [0 points] Download and install the Tensor Toolbox for Matlab, from http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.5.html

(b) [0 points] Read the tab separated file from the following link:

https://www.dropbox.com/s/ps5oyuxwp3cionc/mystery.dat?dl=0

into a sparse tensor in Tensor Toolbox. [Hint: type help dlmread and help sptensor].

(c) [6 points] Run the CP/PARAFAC decomposition on the tensor you created, for rank R = 3. Provide 8 plots, one for each of the components (a1, a2, b1, b2, c1, c2, d1, d2). That is, for, say, component a1, plot the score a1,i versus the index i. A high value for a1,i means that user i participates in the first ’concept’.
Hint: Check the documentation of Tensor Toolbox by typing help tensor_toolbox. The function you will need is cp_als.


(d) [7 points] Consider the first component (with the four vectors a1, b1, c1, d1). Entries with non-zero values of the rank-one component “belong” to that 4-mode community. Give

1. [1 pt] the count of users in this community
Solution: 10

2. [1 pt] the user-IDs of the community
Solution: 2, 6, 8, 9, 11, 13, 15, 21, 24, 26

3. [1 pt] the count of products in this community
Solution: 11

4. [1 pt] the product-IDs of those users
Solution: 2, 5, 8, 9, 14, 20, 26, 29, 31, 32, 33

5. [1 pt] the count of timestamps in this community
Solution: 12

6. [1 pt] the timestamps used in this community
Solution: 3, 4, 7, 10, 16, 18, 20, 21, 22, 23, 26, 27

7. [1 pt] the prices in this community
Solution: 1, 2, 3, 4, 6, 7, 8, 9

(e) [5 points] SQL, for tensor analysis: It turns out that SQL is enough to help us find the rank-1 decomposition of a tensor. Implement the method in http://www.cs.cmu.edu/~maraujo/papers/pakdd14.pdf (Equation 1, page 6), to find the rank-one decomposition. Attach your SQL code.

Hint: Use SQLite or PostgreSQL for this question.
Hint: After every iteration, normalize the vectors, by dividing each entry by the (absolute) max value for that vector. I.e., if the original vector is (3, 2.2, -5), we divide everything by |−5|, and we get (3/5, 2.2/5, -1).

For one iteration loop:

    INSERT INTO new_usr
    SELECT T.row_id, SUM(T.val * D.val * H.val * P.val)
    FROM prod D, time H, tensor T, pric P
    WHERE D.column_id = T.column_id AND H.height_id = T.height_id AND P.time_id = T.time_id
    GROUP BY T.row_id;
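
The query above updates one factor vector; a full iteration repeats the same pattern for every mode and normalizes after each update. A minimal sketch of how that loop could be driven from Python's built-in sqlite3 module follows. The schema is hypothetical (the column names user_id, product_id, time_id, price_id, the factor tables usr, prod, tim, pric, and the scratch table new_vec are all made up for illustration), the data is a toy tensor rather than mystery.dat, and this is not claimed to be the reference solution.

```python
import sqlite3

# Hypothetical schema: tensor(user_id, product_id, time_id, price_id, val)
# plus one factor table (id, val) per mode.
MODES = {"usr": "user_id", "prod": "product_id", "tim": "time_id", "pric": "price_id"}

def rank_one_sql(entries, iterations=20):
    """Run the rank-one iteration in SQL; return the open connection."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tensor (user_id INT, product_id INT, "
                "time_id INT, price_id INT, val REAL)")
    con.executemany("INSERT INTO tensor VALUES (?, ?, ?, ?, ?)", entries)
    for table, col in MODES.items():
        # Every factor vector starts as all ones.
        con.execute(f"CREATE TABLE {table} (id INT PRIMARY KEY, val REAL)")
        con.execute(f"INSERT INTO {table} SELECT DISTINCT {col}, 1.0 FROM tensor")
    for _ in range(iterations):
        for table, col in MODES.items():
            others = [(t, c) for t, c in MODES.items() if t != table]
            joins = " ".join(f"JOIN {t} ON {t}.id = T.{c}" for t, c in others)
            product = " * ".join(f"{t}.val" for t, _ in others)
            # Multiply each tensor entry by the other three factors, sum per index.
            con.execute(f"CREATE TABLE new_vec AS "
                        f"SELECT T.{col} AS id, SUM(T.val * {product}) AS val "
                        f"FROM tensor T {joins} GROUP BY T.{col}")
            # Normalize by the absolute maximum, as the hint suggests.
            con.execute(f"DELETE FROM {table}")
            con.execute(f"INSERT INTO {table} SELECT id, "
                        f"val / (SELECT MAX(ABS(val)) FROM new_vec) FROM new_vec")
            con.execute("DROP TABLE new_vec")
    return con

# Toy data: a dense block (users 1-2, products 10-11, time 5, prices 3-4)
# plus two stray purchases that should fade out of the component.
block = [(u, p, 5, c, 1.0) for u in (1, 2) for p in (10, 11) for c in (3, 4)]
strays = [(7, 20, 9, 1, 1.0), (8, 21, 6, 2, 1.0)]
con = rank_one_sql(block + strays)
user_scores = dict(con.execute("SELECT id, val FROM usr ORDER BY id"))
print(user_scores)
```

In the toy run the users inside the dense block keep a score of 1 while the stray users decay toward 0, which is the behaviour the normalization hint is meant to produce.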

(f) [7 points] Apply your SQL code from the previous question to the same dataset as in (b). Use SQL, and do 20 iterations to answer the following questions:

1. [1 pt] Report the count of significant users in this first component. We define a_i to be significant if it is within 10% of the maximum, that is, if its value is larger than 0.9 ∗ max(a_i) for i in 1, 2, . . ..
Solution: 10

2. [1 pt] List the significant users in this component.
Solution: 2, 6, 8, 9, 11, 13, 15, 21, 24, 26

3. [1 pt] Report the count of significant products in this component.
Solution: 11

4. [1 pt] List the significant products in this component.
Solution: 2, 5, 8, 9, 14, 20, 26, 29, 31, 32, 33

5. [1 pt] Give the count of significant timestamps in this component.
Solution: 12

6. [1 pt] List the significant timestamps in this component.
Solution: 3, 4, 7, 10, 16, 18, 20, 21, 22, 23, 26, 27

7. [1 pt] List the significant prices in this component.
Solution: 2, 4, 5
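
The significance rule in (f) translates directly into a WHERE clause with a scalar sub-query. The snippet below is a small illustration using sqlite3 from Python; the table name usr and the scores in it are invented for the example and are not taken from the mystery dataset.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE usr (id INT PRIMARY KEY, val REAL)")   # hypothetical factor table
con.executemany("INSERT INTO usr VALUES (?, ?)",
                [(1, 1.0), (2, 0.95), (3, 0.90), (4, 0.2), (5, 0.0)])

# An entry is significant when its value is strictly larger than 0.9 * max.
rows = con.execute(
    "SELECT id FROM usr "
    "WHERE val > 0.9 * (SELECT MAX(val) FROM usr) "
    "ORDER BY id").fetchall()
significant = [r[0] for r in rows]
print(len(significant), significant)
```

Note that the comparison is strict, so an entry sitting exactly at 0.9 times the maximum (id 3 here) is not reported.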

What to turn in:

• Code: In the [andrewid-hw4.tar.gz] file, put all the code you used to generate these results
• Answers: On hard copy, please submit your answers, plots and code for Q2.b - Q2.f.


Question 3: Hadoop and map-reduce [25 points]

On separate page, with ‘[course-id] [hw#] [question#] [andrew-id] [your-name]’

Grading info: Wan-Shu LAI

Problem Description: You are given a mystery dataset of numbers, and you have to find the mean and standard-deviation. The goal is to introduce you to Hadoop 2.7.2, which is suitable for such calculations on huge, distributed datasets. Hadoop developed from the parallel processing concept named MapReduce, proposed by Google; Apache Hadoop is the open-source framework that provides MapReduce as its analytics engine, along with other components such as HDFS. Although Hadoop usually manages resources on grouped cluster nodes for scalability, you will install and operate it on a single node instead.

The purpose is to learn how to deploy and execute a program on the Hadoop framework.

Note, this question has been designed for and tested on the GHC Andrew machines (@ghcXX.ghc.andrew.cmu.edu, where XX ranges from 25-86). Each student has a specific machine assigned to them in the grade center on Blackboard that they should use (under the column Project - Machine & Port Number).

Setup:

• Download the homework package, which contains the setup scripts (see footnote 1) and the dataset, from http://www.contrib.andrew.cmu.edu/~wanshul/mapreduce.tar.gz (this is NOT in the “template tar-file” - our apologies for the confusion).
• Place it in the directory where you want to run Hadoop and solve this question
• Execute command:

./build-hadoop-2.7.sh; source ~/.bashrc

FYI (For your information): You will likely need to enter your Andrew password a few times while the script is running. This script will install Hadoop and HDFS locally in the directory where you placed the original script (see footnote 2). When the script is done running, Hadoop can be started and stopped on the machine with the shell scripts ./sbin/start-all.sh and ./sbin/stop-all.sh respectively, and the necessary resources for this question will be in the homework package, which we will go over later.

Careful: Note, if another student is currently running Hadoop on the machine you are on, you will not be able to also run Hadoop on that machine. If this happens, you do not need to reinstall Hadoop on a different GHC machine (assuming you’ve already installed it), but rather just log into another GHC machine and try starting your Hadoop instance there. (Please try the machines from 81-86 since these will not be assigned to any students.) The script will warn you if there is an issue.

Footnote 1: Single node setup documents: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html

Footnote 2: Note that the script assumes that your default shell on the Andrew machines is bash; if you have changed the shell environment, you will need to add the correct environment variables.

However, please shut down your Hadoop service when you log out!!!!!

(If it appears someone else is using the machine, you can look for long running java instances in top and email the user to turn off their Hadoop.) When using Hadoop on a new machine, you may get an error when running your commands that Hadoop is in “safe mode.” If you get this error, go to the hadoop-2.7.2 directory and run the following command:

    bin/hadoop dfsadmin -safemode leave

Implementation Details: The MapReduce programming model is designed to process big data using a large number of machines. Hadoop is an open-source framework, implemented in Java, that realizes the MapReduce model. With Hadoop, various applications can process large data sets in a distributed fashion across clusters of computers, using simple programming models.
Below is a brief description of the main Hadoop modules that are configured during installation:

• HDFS is a distributed file system deployed by Hadoop
• Hadoop YARN is responsible for job scheduling and cluster resource management
• Hadoop MapReduce is a YARN-based system for parallel processing of huge data

Although programs running on Hadoop are usually written in Java and compiled into an executable jar file, the Hadoop Streaming utility allows us to create and run map-reduce jobs with other executables, such as Python scripts (see footnote 3).

After environment setup, please solve the following problems in Python:

(a) [5 points] First, let’s start with some basic file operations on HDFS. Upload the mapreduce/dataset.txt file to HDFS and then verify it uploaded correctly. To do this, you need to use the command hadoop fs (you should be able to invoke this command anywhere after running source ~/.bashrc). Entering that command will give you a list of possible commands to run, such as -ls and -mkdir, which of course correspond to their Unix equivalents ls and mkdir. Your home path is just /. When this step is complete, your HDFS should have a directory /input with a file /input/dataset.txt.

Solution:
    $ hadoop fs -mkdir /input
    $ hadoop fs -put dataset.txt /input    (or: hadoop fs -copyFromLocal dataset.txt /input)

Footnote 3: Tutorial documents of Hadoop streaming interface: https://hadoop.apache.org/docs/r2.7.0/hadoop-streaming/HadoopStreaming.html


Grading info: -1 if operation commands are ambiguous

Grading info: -3 if failed to list related commands

(b) [5 points] We will now test that Hadoop works properly, with a classic example, WordCount. In the “template tar-file”, there are two example programs named wordReducer.py and wordSplitter.py. Upload these two programs into HDFS and run the WordCount example on the dataset by executing

    hadoop jar $JAVA_HOME/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar -input input -output output -mapper wordSplitter.py -reducer wordReducer.py -file wordSplitter.py -file wordReducer.py

Please report how many unique words exist.

Solution:
    $ hadoop fs -cat /output/part-00000 | wc -l
Number of unique words: 10000

Grading info: -2 if failed to list related commands

Grading info: -1 if number of words is slightly different

Careful: Note, the output directory created by a previous map-reduce task needs to be deleted if you want to reuse the same command. Alternatively, you can change the name of the output directory. Otherwise, you will get error messages when trying to launch a new job.
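
The template's wordSplitter.py and wordReducer.py are not reproduced in this document. For orientation only, a streaming word-count pair typically has the shape sketched below; the function names are invented, and the map, sort and reduce stages are simulated locally on two made-up lines instead of reading standard input under Hadoop Streaming.

```python
from itertools import groupby

def split_words(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_counts(sorted_records):
    """Reducer: sum the counts of consecutive records that share a key."""
    pairs = (rec.rstrip("\n").split("\t") for rec in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)

# Local simulation of map -> shuffle/sort -> reduce on made-up input.
lines = ["12 7 12", "7 3"]
counts = dict(reduce_counts(sorted(split_words(lines))))
print(len(counts), "unique words")
```

Under Hadoop Streaming the mapper and reducer would be two separate scripts that iterate over sys.stdin and print their records; the framework performs the sort between them. The number of unique words is then the number of output lines, which is what the wc -l in the solution counts.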

(c) Analyze the mystery dataset

i. [5 points] Write the necessary program(s) in Python to compute the average and standard-deviation. Please hand-in your map-reduce code.

Solution:
    $ hadoop jar ./mapreduce/hadoop-2.7.2/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar -input /input -output /output -mapper ./mapper.py -reducer ./reducer.py -file mapper.py -file reducer.py

Grading info: -3 if the execution command is not listed, or if you failed to convince us that the Python program ran on Hadoop

ii. [5 points] What is the mean value of the given dataset?

Solution: 5000.779900

iii. [5 points] What is the standard deviation?

Solution: 2888.645410
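
The solution above lists only the launch command and the final numbers, not the contents of mapper.py and reducer.py. One common way to structure such a job, sketched here under assumptions, is for each mapper to emit a single partial record (count, sum, sum of squares) and for the reducer to merge them. The function names, the record layout, the whitespace-separated input format, and the choice of the population (divide-by-N) standard deviation are all assumptions of this sketch.

```python
import math

def map_partial_stats(lines):
    """Mapper: collapse one input split into a (count, sum, sum of squares) record."""
    n, total, total_sq = 0, 0.0, 0.0
    for line in lines:
        for token in line.split():
            x = float(token)
            n += 1
            total += x
            total_sq += x * x
    if n:
        yield f"stats\t{n}\t{total!r}\t{total_sq!r}"

def reduce_stats(records):
    """Reducer: merge the partial records and return (mean, standard deviation)."""
    n, total, total_sq = 0, 0.0, 0.0
    for rec in records:
        _key, count, part_sum, part_sq = rec.rstrip("\n").split("\t")
        n += int(count)
        total += float(part_sum)
        total_sq += float(part_sq)
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)   # population variance
    return mean, math.sqrt(variance)

# Local simulation with two input splits.
splits = [["1 2"], ["3", "4"]]
records = [rec for split in splits for rec in map_partial_stats(split)]
mean, std = reduce_stats(records)
print(mean, std)
```

Emitting partial sums keeps the shuffle tiny, since each mapper sends one record regardless of input size. The sum-of-squares formula can lose precision when the mean is large relative to the spread; for data on the scale reported here that is unlikely to matter, but a two-pass or Welford-style update is the safer choice in general.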

What to turn in:

• Answers:
1. On hard copy, please give your answers, and also give
2. A print out of all your code (including bash commands, etc)

• Code:
1. Turn in all the code you wrote, in “your-tar-file”
2. Also include a text file with the answers to the questions.


Question 4: Fourier, Wavelets [25 points]

On separate page, with ‘[course-id] [hw#] [question#] [andrew-id] [your-name]’

Grading info: Zhoucheng LI

Problem Description: The goal is to see the power of Fourier and wavelets in spotting frequencies, and in denoising. Suppose you are a private investigator, and your target is calling from a touch-tone pay-phone. You want to find the number he is calling. (Or, alternatively, you are studying dolphins and how they communicate - you want to spot the special frequencies that each dolphin uses).

Let’s stay with the touch-tone pay-phone scenario, for which we know the following: The DTMF (Dual Tone Multi Frequency) telephone keypad is laid out in a 4 × 4 matrix of buttons. Each row and each column represents a unique frequency. Pressing a key sends a combination of the row and column frequencies. For example, key 6 produces a superimposition of tones of 770 and 1477 Hz, and key 9 produces a superimposition of tones of 852 and 1477 Hz. Your goals are to solve the challenges given below, with MATLAB.

Image taken from Wikipedia

Setup: The GHC machines already have MATLAB installed. For help on a function, say, (function_name), try help function_name in MATLAB’s command window.

Implementation Details:

(a) There are 10 mystery audio files (mystery_a.wav to mystery_j.wav) inside the “template tar-file”, in the ./Q4 directory. Each one is the sound that is produced when one presses a number key (i.e. 0 to 9), but the mystery files are not necessarily in this order.

i. The DTMF components in a phone generate time-continuous signals, so you cannot use the DFT to analyze them directly. The good thing is, we have sampled them for you, just like the signals in the course slides! Use audioread to read in the audio data and its sampling frequency f_sample. When we calculate the DFT of this audio signal, the DFT result should have the same length as the data. Now, recalling the symmetry property of the DFT amplitude, we can keep only the first half of the DFT amplitude spectrum (also called the Single-Sided spectrum), covering the range from 0Hz to f_sample/2.

α) [2 points] Implement the function defined in Q4/q1.m. Please use fft (a Fast DFT algorithm) in MATLAB to compute the Single-Sided spectrum. The x-axis of the figure should be Frequency (unit: Hz) ranging from 0Hz to f_sample/2. Give your code.
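
The assignment asks for MATLAB; as an illustration of the same idea, the sketch below computes a single-sided amplitude spectrum in Python with NumPy and picks the two strongest, well-separated peaks. The function names, the 8000 Hz sampling rate, the 50 Hz minimum separation, and the synthetic two-tone signal are assumptions of this sketch; it does not read the mystery files.

```python
import numpy as np

def single_sided_spectrum(data, f_sample):
    """Return (freqs, amplitude) covering 0 Hz up to f_sample / 2."""
    n = len(data)
    amplitude = np.abs(np.fft.rfft(data)) / n      # rfft keeps non-negative frequencies
    if n % 2 == 0:
        amplitude[1:-1] *= 2.0    # every bin except DC and Nyquist has a mirror image
    else:
        amplitude[1:] *= 2.0      # odd length: no Nyquist bin
    freqs = np.fft.rfftfreq(n, d=1.0 / f_sample)
    return freqs, amplitude

def strongest_frequencies(freqs, amplitude, count=2, min_separation=50.0):
    """Pick the `count` largest peaks that are at least `min_separation` Hz apart."""
    picked = []
    for idx in np.argsort(amplitude)[::-1]:
        f = float(freqs[idx])
        if all(abs(f - g) >= min_separation for g in picked):
            picked.append(f)
        if len(picked) == count:
            break
    return sorted(picked)

# Synthetic stand-in for one key press: two superimposed tones, 0.5 s long.
f_sample = 8000                      # assumed sampling rate
t = np.arange(4000) / f_sample
data = np.sin(2 * np.pi * 770 * t) + np.sin(2 * np.pi * 1336 * t)

freqs, amplitude = single_sided_spectrum(data, f_sample)
peaks = strongest_frequencies(freqs, amplitude)
print(peaks)
```

The minimum-separation check matters on real recordings, where spectral leakage spreads each tone over several neighbouring bins and the two largest bins may belong to the same peak.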

β) [2 points] Use mystery_a.wav as input to test your function and include the amplitude spectrum plot in “your-tar-file”. Please also zoom in on the two main frequencies, and give two more plots, as in Q4/q1-example.png

Grading info: -1 point, if one plot is missing

[Figure: fft of mystery_a.wav]


γ) [2 points] What are the 2 frequencies with strongest amplitudes? (full points, if within 5Hz).
Grading info: -1 point, if error is larger than +/- 5Hz

Solution: About 768Hz and about 1336Hz; we accept answers with errors less than +/-5 Hz.

δ) [1 point] Which number key does this file represent?


Solution: Number 5
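
Mapping a measured frequency pair to a key is a nearest-neighbour lookup in the DTMF grid. The sketch below uses the standard DTMF row and column frequencies (consistent with the 770/852/1477 Hz examples given in the problem description); the function names are illustrative.

```python
ROW_FREQS = (697, 770, 852, 941)        # Hz, one per keypad row
COL_FREQS = (1209, 1336, 1477, 1633)    # Hz, one per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")

def nearest_index(freq, candidates):
    """Index of the candidate frequency closest to the measured one."""
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - freq))

def key_from_frequencies(f_low, f_high):
    """Translate a (row tone, column tone) pair into the keypad symbol."""
    row = nearest_index(f_low, ROW_FREQS)
    col = nearest_index(f_high, COL_FREQS)
    return KEYPAD[row][col]

print(key_from_frequencies(768, 1336))
```

Snapping to the nearest nominal frequency absorbs small measurement errors such as the 768 Hz reading above, which lands on the 770 Hz row.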

ii. [3 points] For the remaining 9 mystery files, please use your code to guess the pressed key. Give your answers in the format, e.g.:
mystery_z.wav:8

Grading info: 0.33 point for each

Solution:
mystery_b.wav:9
mystery_c.wav:1
mystery_d.wav:8
mystery_e.wav:7
mystery_f.wav:0
mystery_g.wav:6
mystery_h.wav:3
mystery_i.wav:4
mystery_j.wav:2

(b) Fourier and wavelets are powerful with respect to denoising, too. In the private investigator scenario, one day you noticed that your target was dialing someone but there was some noise, not necessarily a single frequency, not necessarily throughout the whole dialing.

i. [1 point] Guess the type of noise first, inside audio record Q4/q2.wav. Later, you’ll have to guess the duration (t_start, t_end) of each noise component. For the type of noise, choose among the following:

A. Noise within range of 85Hz to 95Hz

B. Noise within range of 525Hz to 535Hz

C. White Noise (flat spectrum, like a hissing noise)

D. Both A and C

E. Both B and C

F. Both A and B

G. All of A, B and C

H. None of the above (explain).

Solution: D

ii. [2 points] Now, on to the duration(s) of the noise(s). For the type(s) of noise you found, specify the starting and ending time-stamp, rounded to the nearest thousand. We expect answers of the form, e.g.,
white noise: 5,000, 10,000
10kHz noise: everywhere

Grading info: 1 point for each. However, even if you had the right answers, if you didn’t do part iii, we give 0 points for this question, since we have no idea how you got the correct answers.

Solution:
White noise: everywhere
85Hz to 95Hz noise: 14000, 28000

iii. [2 points] Justify your answer, briefly, and with as few plots as you can. Consider FFT or Haar wavelets, with the built-in function wscalogram. Give a hard copy of your justification and the plots; put your code in Q4/q2.m.
Grading info: -1 point, if justification is not convincing
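
To see when a narrow-band noise is present, one option is to measure the power at that frequency window by window. The assignment suggests FFT or wscalogram; the sketch below swaps in Goertzel's single-frequency recurrence instead, purely to stay within the Python standard library. The sampling rate, the 1000-sample window, the 25% threshold, and the synthetic signal (a 770 Hz tone with a 90 Hz hum in samples 2000 to 3999) are all assumptions for illustration, and the sketch does not address the white-noise component.

```python
import math

def goertzel_power(samples, freq, f_sample):
    """Power of `samples` at a single frequency, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / f_sample)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s_prev, s_prev2 = x + coeff * s_prev - s_prev2, s_prev
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def windows_with_tone(samples, freq, f_sample, window=1000, fraction=0.25):
    """Indices of the windows whose power at `freq` exceeds a fraction of the maximum."""
    powers = [goertzel_power(samples[i:i + window], freq, f_sample)
              for i in range(0, len(samples) - window + 1, window)]
    threshold = fraction * max(powers)
    return [i for i, p in enumerate(powers) if p > threshold]

# Synthetic example: a 770 Hz tone throughout, 90 Hz hum only in samples 2000-3999.
F_SAMPLE = 8000   # assumed sampling rate
signal = [math.sin(2 * math.pi * 770 * n / F_SAMPLE)
          + (math.sin(2 * math.pi * 90 * n / F_SAMPLE) if 2000 <= n < 4000 else 0.0)
          for n in range(6000)]
noisy = windows_with_tone(signal, 90.0, F_SAMPLE)
print([(i * 1000, (i + 1) * 1000) for i in noisy])
```

A window length of 1000 samples matches the "rounded to the nearest thousand" precision the question asks for; shorter windows localize the noise more finely at the cost of frequency resolution.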

(c) You must be tired of decoding these phone numbers one by one. Let’s write a function to decode the audio files, automatically.

i. [8 points] Implement the function defined in Q4/q3.m. Your program should accept an audio file (in wav format) as input, and output the phone number that is dialed in the audio file. Assume that the pause time between consecutive numbers is fixed (1638 data points), and that each key-press time is also fixed (3277 data points).
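
With fixed key-press and pause lengths, the first step of a decoder is to cut the recording into one slice per key. The sketch below shows only that slicing step in Python; it assumes the recording starts with a key press and that presses and pauses alternate strictly, which the problem statement does not spell out. Each slice would then go through a spectrum analysis and keypad lookup.

```python
KEY_LEN = 3277     # samples per key press (from the problem statement)
PAUSE_LEN = 1638   # samples per pause (from the problem statement)

def key_segments(samples, key_len=KEY_LEN, pause_len=PAUSE_LEN):
    """Yield the slice of samples for each key press, skipping the pauses."""
    step = key_len + pause_len
    for start in range(0, len(samples) - key_len + 1, step):
        yield samples[start:start + key_len]

# Dummy recording: three key presses separated by two pauses.
recording = list(range(3 * KEY_LEN + 2 * PAUSE_LEN))
segments = list(key_segments(recording))
print(len(segments), [seg[0] for seg in segments])
```

Because the loop only requires a full key press to remain, it works whether or not the recording ends with a trailing pause.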

ii. [2 points] Use q3.wav to test your program and output the result.
Grading info: -1 point (each), if result has some minor errors, e.g. failed to decode one tone

Grading info: -2 point, if result is fully wrong

Grading info: -2 point, if code has minor bugs

Grading info: -4 point, if code has some big problems

Solution: 9324715806

What to turn in:

• Code: In “your-tar-file”, put all the code you used to generate these results
• Answers: On hard copy, please submit

1. your answers/justifications for all questions
2. your plots
3. and your code

End of Homework 4