MATLAB Data Analysis Greg Reese, Ph.D Research Computing Support Group Academic Technology Services Miami University


Post on 28-Dec-2015



MATLAB

Data Analysis

Greg Reese, Ph.D

Research Computing Support Group

Academic Technology Services

Miami University


© 2010-2013 Greg Reese. All rights reserved.


Data analysis

MATLAB has functions for the basic statistical analysis of numbers stored in a vector. The table that follows shows some of them. For more details, type

help datafun

at the command line.


Basic statistics

Function   Description
max        Largest component
min        Smallest component
mean       Average or mean value
median     Median value
std        Standard deviation
var        Variance
sum        Sum of elements
prod       Product of elements
hist       Histogram


Basic statistics

Example

Class’s quiz grades:

2, 9, 8, 5, 4, 5, 8, 10, 8, 7

Store grades in vector and compute the average quiz score:
>> grades = [2 9 8 5 4 5 8 10 8 7];
>> mean(grades)
ans =
    6.6000
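The other functions in the table are called the same way. As a minimal sketch (the variable names total, product, spread, and variance are illustrative and not from the slides), the same grades vector can be passed to sum(), prod(), std(), and var():

```matlab
% Quiz grades from the example above
grades = [2 9 8 5 4 5 8 10 8 7];

total    = sum(grades);    % sum of all grades
product  = prod(grades);   % product of all grades
spread   = std(grades);    % sample standard deviation (divides by N-1)
variance = var(grades);    % sample variance, the square of std

fprintf('sum = %d, std = %.4f, var = %.4f\n', total, spread, variance);
```

Note that std() and var() use the N-1 (sample) normalization by default.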


Basic statistics

Try It

Make a vector of a class’s quiz grades: 2, 9, 8, 5, 4, 5, 8, 10, 8, 7

• Compute the mean, minimum, maximum, median, and mode

• Show the number of grades


Basic statistics

Try It
>> grades = [ 2 9 8 5 4 5 8 10 8 7 ];
>> mean(grades)
ans =
    6.6000
>> min(grades)
ans =
     2
>> max(grades)
ans =
    10


Basic statistics

Try It
>> grades = [ 2 9 8 5 4 5 8 10 8 7 ];
>> median(grades)
ans =
    7.5000
>> mode(grades)
ans =
     8
>> length( grades ) % number of grades
ans =
    10


Basic statistics

Statistics can also be computed on matrices. For a two-dimensional matrix, MATLAB operates on each column separately and produces a row vector whose length is the number of columns in the original matrix.


Basic statistics

Example

Class’s quiz grades on one Friday (column 1) and on the following Friday (column 2):
>> grades = [2 9 8 5 4 8 10; 4 9 9 2 1 4 6]'
grades =
     2     4
     9     9
     8     9
     5     2
     4     1
     8     4
    10     6


Basic statistics

Example
>> mean( grades )
ans =
    6.5714    5.0000
>> min( grades )
ans =
     2     1
>> max( grades )
ans =
    10     9


Basic statistics

There are two ways to compute a statistic on all of the data. The first is to apply the function twice: once down the columns, then once across the resulting row vector.

Example
>> mean( mean( grades ) )
ans =
    5.7857
>> min( min( grades ) )
ans =
     1
>> max( max( grades ) )
ans =
    10


Basic statistics

The second way is to convert the matrix to 1D, then compute the statistic

If M is a matrix (of any dimension), M(:) produces a one-dimensional column vector

– Both have the same number of elements
– M(:) is made by stacking the columns, i.e., concatenating the second column under the first, the third column under the second, etc.


Basic statistics

Example
>> m1 = [ 1 2 3; 4 5 6]
m1 =
     1     2     3
     4     5     6
>> m2 = m1(:)
m2 =
     1
     4
     2
     5
     3
     6


Basic statistics

Example
>> m1 = [ 1 2 3; 4 5 6]
m1 =
     1     2     3
     4     5     6
>> mean( m1 )
ans =
    2.5000    3.5000    4.5000
>> mean( m1(:) )
ans =
    3.5000
>> mean( mean(m1) )
ans =
    3.5000


Basic statistics

Try It
>> m1 = [ 1 2 3; 4 5 6];

Compute the minimum and maximum of all elements in m1 using the m1(:) notation
>> min( m1(:) )
ans =
     1
>> max( m1(:) )
ans =
     6


Basic statistics

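Caution - depending on the statistic, the two methods may not give the same result

Try It
>> m1 = [ 1 2 3; 4 5 6];

Compute the standard deviation both ways using std()
>> std( std( m1 ) )
ans =
     0
>> std( m1(:) )
ans =
    1.8708

What happened? std( m1 ) returns the standard deviation of each column, which is 2.1213 for all three columns. The standard deviation of three identical numbers is zero, so std( std( m1 ) ) measures the spread of the column statistics, not the spread of the data.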
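The same caution applies to other statistics. As a sketch (the matrix M below is a hypothetical example chosen for illustration, not from the slides), max() and sum() agree between the two methods, while median() does not:

```matlab
% Hypothetical matrix chosen so the two methods disagree for the median
M = [ 1 10   2;
      3 20   4;
      5 30 100 ];

% max and sum give the same answer either way
sameMax = max( max(M) ) == max( M(:) );
sameSum = sum( sum(M) ) == sum( M(:) );

% median does not: the median of the column medians is not the overall median
medianTwice = median( median(M) );   % column medians are [3 20 4]
medianAll   = median( M(:) );        % median of all nine elements
```

When in doubt, the M(:) form is the safer choice because it always operates on the full data set.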


Basic statistics

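Can also take statistics of each row by specifying dimension 2. For mean(), median(), sum(), and prod() the dimension is the second parameter, e.g.

Example
>> m1
m1 =
     1     2     3
     4     5     6
>> mean( m1, 2 )
ans =
     2
     5

For min(), max(), std(), and var() the dimension is the third parameter.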
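Not every function takes the dimension as its second argument. A minimal sketch of the row-wise calls, using m1 from the example (the result variable names are illustrative):

```matlab
m1 = [ 1 2 3; 4 5 6 ];

rowSum = sum( m1, 2 );        % dimension is the second argument
rowMax = max( m1, [], 2 );    % empty second argument, dimension is third
rowMin = min( m1, [], 2 );
rowStd = std( m1, 0, 2 );     % 0 selects the default N-1 normalization
```

In max( m1, 2 ) the 2 would be treated as a value to compare against, not a dimension, which is why the empty placeholder is needed.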


Basic statistics

Try It

Compute the median of each row of
>> m = [ 3:3:15; 1:5 ]
m =
     3     6     9    12    15
     1     2     3     4     5

using the function median()
>> median( m, 2 )
ans =
     9
     3


Basic statistics on vectors and matrices

Questions?


Sorting

To get more than one of the smallest or largest numbers, sort the data first.

sort( data, direction )
• data is a one-dimensional vector
• direction is 'ascend' or 'descend' (if omitted, uses 'ascend')


Sorting

Example

Find 2 lowest and 2 highest grades
>> sortedGrades = sort( grades )
sortedGrades =
     2     4     5     5     7     8     8     8     9    10
>> sortedGrades(1:2)
ans =
     2     4
>> sortedGrades(end-1:end)
ans =
     9    10


Sorting

Often have rows of data, each row representing one object. Want to sort the rows based on the values in one column (called the key). Use sortrows()

aSorted = sortrows( a, col )
• a is a matrix
• col is the column to base the sort on
– col > 0 sorts column |col| in ascending order
– col < 0 sorts column |col| in descending order


Sorting

Try It
% col 1=ID, col 2=graduation year
% col 3=money donated
>> alumni = [ 7885 2008 5;...      % recent grad, can only afford $5
              1202 1972 22900;...  % old guy, gives a little each year
              4580 2000 350000 ];  % hit it rich in less than 10 years


Sorting

Try It
% sort by ID, lowest first
>> sortrows( alumni, 1 )
ans =
        1202        1972       22900
        4580        2000      350000
        7885        2008           5


Sorting

Example
% sort by $, highest first
>> sortrows( alumni, -3 )
ans =
        4580        2000      350000
        1202        1972       22900
        7885        2008           5
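To see what sortrows() is doing, the same ordering can be built from sort() alone. This is only a sketch of one equivalent approach (the names order, byDonation, and matches are illustrative): sort the key column, keep the permutation that sort() returns as its second output, and apply it to whole rows.

```matlab
alumni = [ 7885 2008      5;
           1202 1972  22900;
           4580 2000 350000 ];

% Sort the key column alone and keep the permutation that sort() reports
[~, order] = sort( alumni(:,3), 'descend' );

% Apply the same permutation to whole rows
byDonation = alumni( order, : );

% Same result as the sortrows() call in the example
matches = isequal( byDonation, sortrows( alumni, -3 ) );
```

The index vector is also useful on its own, for example to reorder a second array that is stored separately from the key.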


Coefficient of correlation

Correlation quantifies the strength of a linear relationship between two variables. When there is no correlation between the two quantities, then there is no tendency for the values of one quantity to increase or decrease with the values of the second quantity.


Coefficient of correlation

Coefficient of correlation (r)
– Common way to quantify correlation
– -1 ≤ r ≤ 1
– r close to 1 means two signals trend the same way
  • As one increases the other also increases
  • As one decreases the other also decreases
– r close to -1 means two signals are anticorrelated (trend the opposite way), i.e., as one increases the other decreases and vice versa
– r close to zero means there's little correlation
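The slides do not give a formula for r. Assuming the usual Pearson definition (which is what corrcoef() computes), the sketch below calculates r by hand for a small made-up data set and compares it with the built-in function. The data and variable names are illustrative only.

```matlab
% Illustrative data (not from the slides)
x = [ 1 2 3 4 5 ];
y = [ 2 4 5 4 5 ];

% Deviations from the means
dx = x - mean(x);
dy = y - mean(y);

% Pearson coefficient: co-variation scaled by both spreads
rManual = sum( dx .* dy ) / sqrt( sum(dx.^2) * sum(dy.^2) );

% corrcoef returns a 2x2 matrix; the off-diagonal element is r
R = corrcoef( x, y );
rBuiltin = R(1,2);

% A perfectly linear, decreasing relationship gives r = -1
Rneg = corrcoef( x, -3*x + 7 );
```

Because of the scaling in the denominator, r does not depend on the units of either variable.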


Coefficient of correlation

Try It

Real data collected by a Miami graduate

>> data = xlsread( 'fuelcons.xls' );
>> size( data )
ans =
     8     4

Column 1 – Week
Column 2 – Average weekly temperature
Column 3 – Average weekly wind chill
Column 4 – Millions of cubic feet of natural gas per week required to heat the homes and businesses in a small city


Coefficient of correlation

Try It

How should the amount of gas used be correlated with wind chill, and why?

– Positive correlation, because a higher wind chill (taken here as a stronger chilling effect, so larger values feel colder) means more heat is desired, which means more gas is used

How should the amount of gas used be correlated with temperature, and why?

– Negative correlation, because a lower temperature means more heat is desired, which means more gas is used


Coefficient of correlation

Try It

Graph columns 2-4 vs. column 1 on the same plot and determine if the data matches your previous determination of correlation

>> plot( data(:,1), data(:,2:4) )
>> legend( 'Temperature', 'Wind chill', 'Gas' )

[Figure: Temperature, Wind chill, and Gas plotted against week (1 to 8); vertical axis from 0 to 70]


Coefficient of correlation

Try It

R = corrcoef(X)
• R is matrix of correlation coefficients
– R contains coefficients for all pairs of columns
• X is matrix whose rows are observations (data points) and columns are variables

Compute all correlation coefficients using the matrix of the last three columns as input


Coefficient of correlation

Try It
>> corrcoef( data(:,2:4) )
ans =
    1.0000   -0.7182   -0.9484
   -0.7182    1.0000    0.8706
   -0.9484    0.8706    1.0000

Element (1,1): correlation of temperature with itself
Element (1,2): correlation of temperature with wind chill
Element (1,3): correlation of temperature with gas
Element (2,3): correlation of wind chill with gas

Column 2 – Average weekly temperature
Column 3 – Average weekly wind chill
Column 4 – Millions of cubic feet of natural gas


Coefficient of correlation

The term "highly correlated" is common but not precisely defined. A conventional meaning*, stated in terms of the magnitude |r|, is something like this:

– no or negligible correlation: 0.0 ≤ |r| < 0.2
– low correlation: 0.2 ≤ |r| < 0.4
– moderate correlation: 0.4 ≤ |r| < 0.6
– marked correlation: 0.6 ≤ |r| < 0.8
– high correlation: 0.8 ≤ |r| ≤ 1.0

* Franzblau, Abraham Norman. A Primer of Statistics for Non-Statisticians. New York: Harcourt, Brace & World, 1958.
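As a sketch of how these bands could be applied in code (the names edges, labels, and strength are illustrative, and the band edges are simply the ones listed above), each coefficient's magnitude is matched to the last band whose lower edge it reaches:

```matlab
% Lower edge of each band and its label, from the list above
edges  = [ 0.0 0.2 0.4 0.6 0.8 ];
labels = { 'negligible', 'low', 'moderate', 'marked', 'high' };

% Coefficients from the fuel-consumption example
r = [ -0.7182 -0.9484 0.8706 ];

strength = cell( size(r) );
for k = 1:numel(r)
    % Last band whose lower edge does not exceed |r|
    band = find( abs(r(k)) >= edges, 1, 'last' );
    strength{k} = labels{band};
end
```

Using abs() means that strongly negative coefficients are rated by strength in the same way as strongly positive ones.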


Coefficient of correlation

To refine the correlation analysis, often compute the p-value:

– The probability of getting a correlation as large as the observed value by chance, when the true correlation is zero


Coefficient of correlation

If p is small, we say the correlation is significant

– Conventionally say that correlations with p-value less than 0.05 are significant
– Often see results written as "r value of 0.34 with p < 0.05"
– In words, "p < 0.05" means "there is less than a 5% chance that a truly uncorrelated data set would produce a correlation coefficient at least as large in magnitude as the one observed"


Coefficient of correlation

To compute p-values, use

[ r p ] = corrcoef( X )
• X is matrix of data (as before)
• r is matrix of correlation coefficients (as before)
• p is matrix of corresponding p-values, i.e., p(i,j) is the p-value for r(i,j)


Coefficient of correlation

Try It

Compute correlation coefficients and p-values for fuelcons.xls data
>> [ r p ] = corrcoef( data(:,2:4) )
r =
    1.0000   -0.7182   -0.9484
   -0.7182    1.0000    0.8706
   -0.9484    0.8706    1.0000
p =
    1.0000    0.0448    0.0003
    0.0448    1.0000    0.0049
    0.0003    0.0049    1.0000


Coefficient of correlation

Follow Along

Which of the three coefficients of correlation are significant?
p =
    1.0000    0.0448    0.0003
    0.0448    1.0000    0.0049
    0.0003    0.0049    1.0000

1. Values below main diagonal are same as those above
2. Values on main diagonal always one§
3. Therefore, want to look at only values above main diagonal

§ The definition of the p-value is "the probability of getting a correlation as large as the observed value by chance, when the true correlation is zero". p(1,1) comes from data that is correlated with itself and so its true correlation is 1, not zero. p(1,1) is always 1 and in fact, the main diagonal of the p-value matrix is always all ones


Coefficient of correlation

Follow Along

Which of the three coefficients of correlation are significant and what are their p-values?
p =
    1.0000    0.0448    0.0003
    0.0448    1.0000    0.0049
    0.0003    0.0049    1.0000
>> aboveDiagonal = triu( ones(size(p)), 1 )
aboveDiagonal =
     0     1     1
     0     0     1
     0     0     0
>> significantLocations = p<0.05 & aboveDiagonal == 1
significantLocations =
     0     1     1
     0     0     1
     0     0     0
>> p(significantLocations)
ans =
    0.0448
    0.0003
    0.0049



Coefficient of correlation

MATLAB function triu(M,k) returns M with values below kth diagonal set to zero, those on kth diagonal and above kept as is

– k = 0 is main diagonal
>> triu( p, 0 )
ans =
    1.0000    0.0448    0.0003
         0    1.0000    0.0049
         0         0    1.0000
>> triu( p, 1 )
ans =
         0    0.0448    0.0003
         0         0    0.0049
         0         0         0

Want to test nonzero values of triu(p,1)


Coefficient of correlation

Can't just test which elements of the triu() matrix are < 0.05, because triu() fills the rest of the matrix with zeros, which also pass the test. A genuine p-value can itself be zero, so the zeros can't simply be ignored either
>> upperTriangle = triu( p, 1 )
upperTriangle =
         0    0.0448    0.0003
         0         0    0.0049
         0         0         0

>> upperTriangle < 0.05
ans =
     1     1     1
     1     1     1
     1     1     1


Coefficient of correlation

Need to find p-values that are < 0.05 and above main diagonal

1. Mark elements above diagonal
>> aboveDiagonal = triu( ones(size(p)), 1 )
aboveDiagonal =
     0     1     1
     0     0     1
     0     0     0

2. Mark elements also < 0.05
>> significant = p<0.05 & aboveDiagonal == 1
significant =
     0     1     1
     0     0     1
     0     0     0
>> p(significant)
ans =
    0.0448
    0.0003
    0.0049
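Logical indexing returns the significant p-values but not which pair of variables each one belongs to. As a sketch of one way to recover that (using find() with two outputs; the names row and col are illustrative), the same mask can be turned into row and column indices:

```matlab
% p-values from the fuel-consumption example
p = [ 1.0000 0.0448 0.0003;
      0.0448 1.0000 0.0049;
      0.0003 0.0049 1.0000 ];

aboveDiagonal = triu( ones(size(p)), 1 ) == 1;   % logical mask
[ row, col ]  = find( p < 0.05 & aboveDiagonal );

% One line per significant pair of variables
for k = 1:numel(row)
    fprintf( 'variables %d and %d: p = %.4f\n', ...
             row(k), col(k), p(row(k),col(k)) );
end
```

Here variable 1 is temperature, 2 is wind chill, and 3 is gas, matching the column order of the input matrix.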


Data analysis

MATLAB has other analysis functions
• Histograms
• Cumulative products
• Finite differences
• Fourier transforms

For more information, type help datafun


Sorting and correlation

Questions?


Polynomial fitting

Find polynomial of specified degree that matches data with least error

Example

Make 100 data points of a uniform, random constant from [0,5) with additive Gaussian noise of zero mean and standard deviation 4 (variance 16)

>> yIntercept = 5 * rand(1,1)
yIntercept =
    3.2980
>> data1 = yIntercept * ones(1,100);
>> data1 = data1 + 4*randn(1,100);
>> plot( data1 )


Polynomial fitting

Example

[Figure: plot of data1 against sample index (0 to 100); vertical axis from -10 to 15]


Polynomial fitting

P = POLYFIT(X,Y,N) finds the coefficients of a polynomial P(X) of degree N that fits the data Y best in a least-squares sense. P is a row vector of length N+1 containing the polynomial coefficients in descending powers,

P(1)*X^N + P(2)*X^(N-1) +...+ P(N)*X + P(N+1).
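A quick way to check the coefficient ordering is to fit noise-free samples of a known polynomial. The sketch below uses a made-up quadratic (not from the slides) and should recover its coefficients to within rounding error:

```matlab
% Noise-free samples of y = 2x^2 - 3x + 1 (illustrative, not from the slides)
x = 0:5;
y = 2*x.^2 - 3*x + 1;

% Fit a degree-2 polynomial; coefficients come back highest power first
P = polyfit( x, y, 2 );

fprintf( 'P = [%.4f %.4f %.4f]\n', P );
```

P has N+1 = 3 elements, with P(1) multiplying x^2 and P(3) the constant term.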


Polynomial fitting

Example

Find polynomial of degree 0 that matches data with least error
>> x = 1:100;
>> poly1 = polyfit( x, data1, 0 );
>> y1 = poly1(1) + zeros(1,100);
>> plot( x, data1, 'r-', x, y1, 'b-' );
>> poly1
poly1 =
    2.9677

(Data was created with yIntercept = 3.2980)


Polynomial fitting

Example

Result (blue line) does look like it's right in the middle, i.e., it's the mean

[Figure: data1 (red) and the degree-0 fit (blue) against x (0 to 100); vertical axis from -10 to 15]
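The claim that the degree-0 fit is the mean can be checked directly: the constant c that minimizes the sum of squared errors sum((data - c).^2) is the average of the data. A minimal sketch with made-up values (not the random data from the example):

```matlab
data = [ 1 4 2 8 5 ];          % illustrative values
x    = 1:numel(data);

c0 = polyfit( x, data, 0 );    % single coefficient: the best-fit constant
difference = abs( c0 - mean(data) );
```

The difference should be zero to within rounding error, whatever data is used.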


Polynomial fitting

Example

Make 100 data points of a line with y-intercept from a uniform, random distribution on [0,5), with slope from a uniform, random distribution on (-1,0], and with additive Gaussian noise of zero mean, variance 1


Polynomial fitting

Example
>> yIntercept = 5 * rand(1,1)
yIntercept =
    2.0381
>> slope = -rand(1,1)
slope =
   -0.8200
>> data2 = slope*(1:100)+yIntercept;
>> data2 = data2 + randn(1,100);
>> plot( data2 );


Polynomial fitting

Example

[Figure: plot of data2 against sample index (0 to 100); vertical axis from -90 to 10]


Polynomial fitting

Example

Find polynomial of degree 1 that matches data with least error
>> x = 1:100;
>> poly2 = polyfit( x, data2, 1 );
>> y2 = poly2(1)*(1:100)+poly2(2);
>> plot( x, data2, 'r-', x, y2, 'b-' );
>> poly2
poly2 =
   -0.8212    1.9858

(Data was created with slope = -0.8200 and yIntercept = 2.0381)


Polynomial fitting

Example

[Figure: data2 (red) and the degree-1 fit (blue) against x (0 to 100); vertical axis from -90 to 10]


Polynomial fitting

Example

Suppose we want to compare what remains of each data set (the residual) after removing the output of the model

If the model is constant or linear, its effect can be removed by detrending


Polynomial fitting

Example

Y = detrend( X, trend_type )
– X is data vector
– trend_type is 'constant' or 'linear'

removes the best fit straight-line or constant vector from X and returns the residual in vector Y
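Detrending is equivalent to fitting with polyfit() and subtracting the fit. The sketch below checks this on a made-up signal (the signal and all variable names are illustrative, not from the slides):

```matlab
t = 1:50;
x = 3 - 0.5*t + sin(t);        % illustrative signal: a line plus a wiggle

% Residual from detrend()
residDetrend = detrend( x, 'linear' );

% Residual from an explicit degree-1 fit
p = polyfit( t, x, 1 );
residFit = x - polyval( p, t );

% Largest disagreement between the two residuals
maxDiff = max( abs( residDetrend(:) - residFit(:) ) );

% 'constant' just removes the mean
residConst   = detrend( x, 'constant' );
maxDiffConst = max( abs( residConst(:) - ( x(:) - mean(x) ) ) );
```

Both differences should be at the level of rounding error. The (:) is used only so the comparison does not depend on whether the vectors are rows or columns.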


Polynomial fitting

Example

Now can see that the noise in the first data set is much larger than the noise in the second

[Figure: residuals after detrending, plotted against sample index (0 to 100); vertical axis from -15 to 10]


Polynomial fitting

Previously, evaluated best-fit polynomial "by hand", i.e., by explicitly multiplying the polynomial coefficients by powers of the independent variable
>> x = 1:100;
>> poly2 = polyfit( x, data2, 1 );
>> y2 = poly2(1)*(1:100)+poly2(2);

For higher powers, this is clumsy to code and inefficient to run


Polynomial fitting

A more general way to evaluate a polynomial at various points is to use

y = polyval(p,x)
– returns value of polynomial of degree n evaluated at x
– p is a vector of length n+1 whose elements are the coefficients in descending powers of the polynomial to be evaluated
  • NOTE - p is the same as the output from polyfit()!
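To make the comparison concrete, the sketch below evaluates a made-up cubic both "by hand" and with polyval() (the coefficients are illustrative, not from a fit in the slides):

```matlab
p = [ 2 -1 0 3 ];      % 2x^3 - x^2 + 0x + 3 (illustrative coefficients)
x = -2:2;

% Explicit powers: one term per coefficient
byHand = p(1)*x.^3 + p(2)*x.^2 + p(3)*x + p(4);

% Same evaluation, independent of the polynomial's degree
byFunc = polyval( p, x );
```

The polyval() call stays the same if the degree of the fit changes, whereas the by-hand expression has to be rewritten.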


Polynomial fitting

Try It

Make an input of ½ cycle of a sinusoid and a noisy input equal to the previous input with additive Gaussian noise of mean 0 and standard deviation ½. On one graph, plot the input, the noisy input, and the best-fit cubic


Polynomial fitting

Try It
>> x = 0:pi/100:pi;
>> input = sin(x);
>> noisyInput = input + 0.5 * randn( size(input) );
>> p = polyfit( x, noisyInput, 3 );
>> bestFit = polyval( p, x );
>> plot( x, [ input; noisyInput; bestFit ] )
>> legend( 'Input', 'Noisy input', 'Best fit cubic' );

[Figure: Input, Noisy input, and Best fit cubic plotted against x (0 to 3.5); vertical axis from -1 to 2]


Polynomial fitting

Be careful of extrapolating your model too far! The further past the range of your input independent values you go, the more likely the model is to be wrong.


Polynomial fitting

Example

Repeat previous example but 1) let input go from 0 to 2π; 2) fit model only from 0 to π; 3) plot all three from 0 to 2π
>> x = 0:pi/100:2*pi;
>> input = sin(x);
>> noisyInput = input + 0.5 * randn( size(input) );
>> midIx = round( length(x)/2 );
>> p = polyfit( x(1:midIx), noisyInput(1:midIx), 3 );
>> bestFit = polyval( p, x );
>> plot( x, [ input; noisyInput; bestFit ] )
>> legend( 'Input', 'Noisy input', 'Best fit cubic' );

[Figure: Input, Noisy input, and Best fit cubic plotted against x (0 to 7); vertical axis from -10 to 2. Bad fit in extrapolation]


Polynomial fitting

Sometimes may want to know what value of independent variable produces a specified value of the model, i.e., if p(x) is the polynomial model, for what x is p(x) = p* ?

– If h(x) is the altitude of a rocket x miles from its launch site, where does the rocket land, i.e., for what x is h(x) = 0?
– If v(t) is the voltage over a capacitor at time t, when does the voltage reach 10V ?


Polynomial fitting

polyfit( x, y, n ) returns a vector p of n+1 coefficients such that the best fit polynomial is

$$y(x) = p(1)x^n + p(2)x^{n-1} + \cdots + p(n)x + p(n+1)$$

We're given the constant y* and want to know what x (or x's) produces p(x) = y*

$$y^* = p(1)x^n + p(2)x^{n-1} + \cdots + p(n)x + p(n+1)$$

$$p(1)x^n + p(2)x^{n-1} + \cdots + p(n)x + p(n+1) - y^* = 0$$

So the x's we're looking for are the roots of the polynomial p(x) - y* = 0


Polynomial fitting

To find the roots of a polynomial use the MATLAB function

r = roots( p )
• p is a polynomial defined as before
• r is a column vector

Note that roots can be complex. To find roots that have non-zero imaginary parts, i.e., that have complex values, use the expression imag(r) ~= 0

(See help on isreal() for details of why this works)
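As a small sketch of the whole procedure on polynomials with known answers (both polynomials and the names q, yStar, and realOnly are illustrative), subtract the target value from the constant coefficient, call roots(), and discard any complex results:

```matlab
% Solve x^2 = 4, i.e., p(x) = yStar with p = x^2 and yStar = 4
p     = [ 1 0 0 ];
yStar = 4;

q = p;
q(end) = q(end) - yStar;     % coefficients of p(x) - yStar
r = roots( q );              % expect +2 and -2

% x^2 + 1 has no real roots; keeping only real ones leaves nothing
c = roots( [ 1 0 1 ] );
realOnly = c( imag(c) == 0 );
```

Only the last coefficient changes because y* is a constant, so it affects only the x^0 term.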


Polynomial fitting

Example

The amount of material left after radioactive decay is given by

$$A(t) = A_0 e^{-\frac{\ln 2}{T_{1/2}} t}$$

– A(t) is amount remaining at time t
– A0 is amount at t = 0
– T1/2 (called the half-life) is time for amount to be cut in half from amount at start
  • At t = T1/2 the amount of material is A0 / 2
  • At t = 2T1/2 the amount of material is A0 / 4
  • At t = 3T1/2 the amount of material is A0 / 8
  • etc.


Polynomial fitting

Example

Assume that there are initially 50 grams of a substance whose half-life is 5 hours. Simulate measurements of the decay by using the radioactive decay equation and adding zero-mean Gaussian noise with a standard deviation of 1. Make data points for every quarter hour of one day, starting at time zero.

1. Fit a cubic polynomial to the data and plot the fit and the noisy data
2. Determine when the fit predicts there will be 6.25 grams left
3. Compare this to the correct answer


Polynomial fitting

Follow Along

For A0 = 50 and T1/2 = 5 the equation becomes

$$A(t) = 50 e^{-\frac{\ln 2}{5} t}$$

• Simulate the measurements
>> t = 0:0.25:23.75;
>> decay = 50*exp( -t*log(2)/5 ) + randn( size(t) );

• Find the best-fit cubic
>> p = polyfit( t, decay, 3 );
>> bestFit = polyval( p, t );

• Plot both
>> plot( t, [ decay; bestFit ] )

[Figure: noisy decay data and best-fit cubic plotted against t (0 to 25 hours); vertical axis from 0 to 50]


Polynomial fitting

Follow Along

• Find the values of t for which p(t) = 6.25, i.e., the roots of p(t) - 6.25
>> p(end) = p(end) - 6.25;
>> r = roots( p )
r =
  22.5719 + 9.8810i
  22.5719 - 9.8810i
  14.8484
• Eliminate all complex roots (answer must be real)
>> r( imag(r) ~= 0 ) = [];
>> r
r =
   14.8484


Polynomial fitting

Follow Along

To compare answers, note that 6.25 g = 50/8 g, and the decay equation says this happens in three half-lives, i.e., in 15 hours. The best fit predicts 14.85 hours, which is quite close
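The exact answer can also be computed by solving the decay equation for t, which gives t = T1/2 * ln(A0/A) / ln(2). A minimal sketch (the variable names are illustrative; tFit is the root reported by the fit):

```matlab
A0       = 50;      % initial amount, grams
halfLife = 5;       % hours
target   = 6.25;    % grams

% Invert A(t) = A0 * exp( -t*log(2)/halfLife ) for t
tExact = halfLife * log( A0/target ) / log(2);

tFit = 14.8484;     % root found from the cubic fit
relativeError = abs( tFit - tExact ) / tExact;

fprintf( 'exact = %.4f h, fit = %.4f h\n', tExact, tFit );
```

Since the noise in the simulated data is random, the fitted root, and therefore the error, will differ somewhat from run to run.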


Polynomial fitting

Questions?


The End