chapter 4 bivariate data notes pwe 2015 - paul...


Upload: lamkien

Post on 03-Mar-2018


Page 1: Chapter 4 Bivariate Data Notes PWE 2015 - PAUL WEISERdrweiser.weebly.com/uploads/5/2/6/4/52647653/... · Chapter 4 Bivariate Data Notes PWE 2015

Page 1 of 30  

 

   

 

Year  11    

General  Maths  

 

 

Year  10  

General  Mathematics  Unit  2  

 

Bivariate  Data  –  Chapter  4  

 

Chapter Four

Exercise   1st Edition                                2nd Edition 2013
4A         1, 2, 3, 4, 6, 7, 8, 9, 10, 11             1, 2, 3, 4, 6, 7, 8, 9, 10, 11
2F (FM)    1, 2(ii), 3, 4                             1, 2(ii), 3, 4
4C         1 (a, b, c, d, g, h), 2, 3, 4, 5, 6, 7, 8  1 (a, b, c, d, g, h), 2, 3, 4, 5, 6, 7, 8
4D         1, 3, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16    1, 3, 5, 6, 7, 8, 9, 10, 13, 14, 15, 16
3A (FM)    1                                          1
3B (FM)    1, 2, 3, 8, 9                              1, 2, 3, 8, 9
3C (FM)    1, 2, 3, 4, 5                              1, 2, 3, 4, 5
3E (FM)    1, 2, 3, 5, 8                              1, 2, 3, 5, 8

Please note that exercises marked (FM) refer to exercises from the Year 12 Further Maths textbook. These exercises ARE included as part of these handout notes.

More  resources  

http://drweiser.weebly.com        


     

Table  of  Contents  

 

Bivariate Data – Chapter 4
4A – SCATTERPLOTS
    Drawing conclusions/causation
2F (Further Mathematics)
    Pearson’s Product-Moment Correlation Coefficient (r)
    Exercise 2F
4C – LINEAR MODELLING
4D – MAKING PREDICTIONS
    Interpolation and Extrapolation
    Reliability of Results
3A INTRODUCTION TO REGRESSION (Further Maths)
    3A Method of Fitting Lines by Eye
    Exercise 3A
3B Fitting a straight line: the 3-median method
    Graphical approach
    Arithmetic approach
    CAS CALCULATOR: Fitting a Straight Line Using the 3-Median Method
    Exercise 3B
3C Fitting a straight line: least-squares regression
    Choosing Between 3-Median and Least-Squares Regression
    Calculating the least-squares regression line by hand
    Exercise 3C
3E Residual analysis
    Residual Plot
    Exercise 3E

 

 


4A – SCATTERPLOTS

Bivariate data result from measurements being made on each of two variables for a given set of items.

Bivariate data can be graphed on a scatterplot (or scattergraph) as shown at left.

Each data point is represented by a single visible point on the graph.

When drawing a scatterplot, we need to choose the correct variable to assign to each of the axes.

The convention is to place the independent variable on the x-axis and the dependent variable on the y-axis.

The independent variable in an experiment or investigation is the variable that is deliberately controlled or adjusted by the investigator.

The dependent variable is the variable that responds to changes in the independent variable.

Example  1  

The operators of a casino keep records of the number of people playing a ‘Jackpot’ type game. The table below shows the number of players for different prize amounts.

a) Draw a scatterplot of the data (no calculator).

 

 

 

 

 


Drawing conclusions/causation

When data are graphed, we can often estimate by eye (rather than measure) the type of correlation involved. Our ability to make these qualitative judgements can be seen from the following examples, which summarise the different types of correlation that might appear in a scatterplot.

 

     


Example  2  

Using the same data as in Example 1:

   

a) Draw  a  scatter  plot  of  the  data  using  your  CAS  calculator.  

   

b) State  the  type  of  correlation  that  the  scatterplot  shows.    

 

 c) Suggest  why  the  plot  is  not  perfectly  linear.  

 

 

 

 

 

 

   


2F  (Further  Mathematics)  

Pearson’s Product-Moment Correlation Coefficient (r)

A more precise tool to measure the correlation between the two variables is Pearson’s product-moment correlation coefficient (denoted by the symbol r). It is used to measure the strength of the linear relationship between two variables. The value of r ranges from −1 to 1. That is, −1 ≤ r ≤ 1.
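For readers who want to see how r comes out of raw data, here is a minimal Python sketch, assuming the standard definition of Pearson's coefficient (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the sample data are illustrative assumptions, not part of these notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's product-moment correlation coefficient r."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied and test marks
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]
r = pearson_r(hours, marks)
print(round(r, 3))
```

A value close to +1 or −1 suggests a strong linear relationship; a value near 0 suggests little or no linear relationship.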

Following  is  a  gallery  of  scatterplots  with  the  corresponding  value  of  r  for  each.  

 

   


Exercise  2F  

 

 

 

 

 

 

   


4C – LINEAR MODELLING

If a linear relationship exists between a pair of variables then it is useful to be able to summarise the relationship in terms of an equation. This equation can then be used to make predictions about the levels of one variable given the value of the other.

The process of finding the equation is known as linear modelling.

An equation can be found to represent the line which passes through any two points by using two coordinate geometry formulas.

The gradient of the line passing through $(x_1, y_1)$ and $(x_2, y_2)$ is given by:

$$m = \frac{y_2 - y_1}{x_2 - x_1}$$

The equation of a straight line with gradient m passing through $(x_1, y_1)$ is given by:

$$y - y_1 = m(x - x_1)$$

Or you can substitute m and the point $(x_1, y_1)$ into $y = mx + c$ and solve for c.

Example  1  

Find  the  equation  of  the  line  passing  through  the  points  (2,  6)  and  (5,  12).  
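The two formulas above can be sketched in Python as follows. This is one possible way to check the working for this example, not a replacement for the handwritten solution; the function name `line_through` is an assumption made for illustration.

```python
def line_through(p1, p2):
    """Return (m, c) for the line y = m*x + c through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: the gradient is undefined")
    m = (y2 - y1) / (x2 - x1)  # gradient formula
    c = y1 - m * x1            # substitute (x1, y1) into y = m*x + c
    return m, c

m, c = line_through((2, 6), (5, 12))
print(f"y = {m}x + {c}")
```

For the points (2, 6) and (5, 12) the gradient is (12 − 6)/(5 − 2) = 2 and c = 6 − 2 × 2 = 2, so the line is y = 2x + 2.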

 

 

 

 

 

 

 

 

 

 

   


To find the equation for a scatterplot that consists of many points we need to fit a straight line through the whole set of points.

The process of fitting a line to a set of points is often referred to as regression. The regression line or trend line (also known as the line of best fit) may be placed on a scatterplot by eye or by using the 3-median method (to be covered in Exercise 3B).

The line of best fit is the straight line which most closely fits the data. Its equation can then be found by using the method in the previous example, choosing any two points that are on the line.

[Figure: Ski Resort Data]

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The y-intercept is the value of y when the level of x is zero, that is, where the line crosses the y-axis.

The gradient (slope) of the equation represents the rate of change of variable y with changing x.

Sometimes after drawing a scatterplot it is clear that the points represent a relationship that is not linear. The relationship might be one of the non-linear types shown below.

In such cases it is not appropriate to try to model the data by attempting to fit a straight line through the points and find its equation. It is similarly inappropriate to attempt to fit a linear model (straight line) through a scatterplot if it shows that there is no correlation between the variables.

 


Example  2  

The  following  table  shows  the  fare  charged  by  a  bus  company  for  journeys  of  differing  length.  

   

a) Represent  the  data  using  a  scatterplot  and  place  in  the  trend  line  by  eye.  

   

b) Find  an  equation  that  relates  the  fare,  F,  to  distance  travelled,  d.  

 

 

 

 

 

 

 

 

 

c) Explain in words the meaning of the y-intercept and gradient of the line.

 

 

 

 


Example  3  

The  table  below  gives  the  times  (in  hours)  spent  by  8  students  studying  for  a  measurement  test  and  the  marks  (in  %)  obtained  on  the  test.  

   

a) Draw  the  scatterplot  to  represent  the  data.  Use  your  Calculator.  

 

 

b) Using your calculator, find the equation of the line of best fit. Write your equation in terms of the variables: time spent studying and test mark.

 

   


4D – MAKING PREDICTIONS

The equation of the trend line may be used to make predictions about the variables by substituting a value into the equation.

 

Example  1  

It   is   found   that   the   relationship  between   the  number  of  people  playing  a  casino   Jackpot  game  and   the  prize  money  offered  is  given  by  the  equation  N  =  0.07p  +  220,  where  N  is  the  number  of  people  playing  and  p  is  the  prize  money.  

 

a) Find  the  number  of  people  playing  when  the  prize  money  is  $2500.                        

 

b) Find  the  likely  prize  on  offer  if  there  were  500  people  playing.  
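As a rough check on parts (a) and (b), the substitution can be sketched in Python. The equation N = 0.07p + 220 comes from the example; the function names are assumptions made for illustration.

```python
def players_for_prize(p):
    """Predicted number of players N for prize money p, using N = 0.07p + 220."""
    return 0.07 * p + 220

def prize_for_players(n):
    """Rearranged equation: p = (N - 220) / 0.07."""
    return (n - 220) / 0.07

# Part (a): prize money of $2500
print(round(players_for_prize(2500)))
# Part (b): 500 people playing
print(round(prize_for_players(500)))
```

Part (a) is a direct substitution for p; part (b) needs the equation rearranged to make p the subject before substituting N = 500.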

 

 

 

 

 

 

 

Using  technology:  

 Alternatively,  a  prediction  could  be  made  from  the  graph’s  trend  line.  

 


Example  2  

The  scatterplots  below  show  the  depth  of  snow  and  the  corresponding  number  of  skiers.  

From  the  graph’s  trend  line  find:  

a) the number of skiers when snow depth was 3 m.

b) the depth of snow that would attract about 400 people.

   


Interpolation and Extrapolation

We use the term interpolation when we make predictions from a graph’s trend line within the bounds of the original experimental data.

We use the term extrapolation when we make predictions from a graph’s trend line outside the bounds of the original experimental data.

Data can be interpolated or extrapolated either algebraically or graphically.

Reliability of Results

Results predicted (whether algebraically or graphically) from the trend line of a scatterplot can be considered reliable only if:

1. a reasonably large number of points were used to draw the scatterplot,

2. a reasonably strong correlation was shown to exist between the variables (the stronger the correlation, the greater the confidence in predictions),

3. the predictions were made using interpolation and not extrapolation. Extrapolated results can never be considered to be reliable because when extrapolation is used we are assuming that the relationship holds true for untested values.
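The distinction between interpolation and extrapolation can be sketched as a simple range check. This is only an illustration: the function name `prediction_type` and the snow-depth values are hypothetical, not data from these notes.

```python
def prediction_type(x, observed_xs):
    """Classify a prediction at x against the range of the observed x-values."""
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x <= highest:
        return "interpolation"  # inside the bounds of the original data
    return "extrapolation"      # outside the bounds of the original data

# Hypothetical snow depths (m) used to draw a scatterplot
snow_depths = [0.5, 1.0, 1.5, 2.0, 2.5]
print(prediction_type(1.2, snow_depths))
print(prediction_type(3.0, snow_depths))
```

A prediction at 1.2 m falls within the observed depths, while a prediction at 3.0 m lies beyond them and so relies on the relationship holding for untested values.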

 

   


3A  INTRODUCTION  TO  REGRESSION  (Further  Maths)    

The  process  of  ‘fitting’  straight  lines  to  bivariate  data  enables  us  to  analyse  relationships  between  the  data  and  possibly  make  predictions  based  on  the  given  data  set.  

Regression  analysis  is  concerned  with  finding  these  straight  lines  using  various  methods  so  that  the  number  of  points  above  and  below  the  line  is  ‘balanced’.  

3A Method of Fitting Lines by Eye

There should be an equal number of points above and below the line.

Example 1:

Fit a straight line to the data in the figure using the equal-number-of-points method.

 

 

 

   


Exercise 3A

1. Fit a straight line to the data in the scatterplots using the equal-number-of-points method.

 

 

 

   


3B Fitting a straight line: the 3-median method

Fitting lines by eye is useful but it is not the most accurate of methods.

We  can  find  the  line  of  best  fit  in  the  form  of  ___________________________________  

One  method  to  find  the  line  of  best  fit  is  called  the  3-­‐median  method.  

This  method  is  as  follows:  

Step 1. Plot the points on a scatterplot.

Step 2. Divide the points into 3 groups (lower, middle and upper) using vertical divisions.

   (a) If the number of points is divisible by 3, divide them into 3 equal groups.

   (b) If there is 1 extra point, put the extra point in the middle group.

   (c) If there are 2 extra points, put 1 extra point in each of the outer groups.

Step 3. Find the median point of each of the 3 groups and mark each median on the scatterplot (the median of the x-values and the median of the y-values in the group).

   (a) The median of the lower group is denoted by $(x_L, y_L)$.

   (b) The median of the middle group is denoted by $(x_M, y_M)$.

   (c) The median of the upper group is denoted by $(x_U, y_U)$.

Note: Although the x-values are already in ascending order on the scatterplot, the y-values within each group may need re-ordering before you can find the median.

Steps 4 and 5 can be completed using 2 different approaches: graphical or arithmetic.

Graphical approach

Step 4. Place your ruler so that it passes through the lower and upper medians. Move the ruler a third of the way toward the middle group median while maintaining the slope. Hold the ruler there and draw the line.

Step 5. Find the equation of the line (general form y = mx + c).

There are two general methods.

(a) Method A: Choose two points which lie on the line and use these to find the gradient of the line and then the equation of the line.

$$m = \frac{y_2 - y_1}{x_2 - x_1}$$

Substitute the coordinates of one point and m into the equation to find c.

(b) Method B: If the scale on the axes begins at zero, you can read off the y-intercept of the line and calculate the gradient of the line.

Arithmetic approach

Step 4. Calculate the gradient (m) of the line. Use the rule:

$$m = \frac{y_U - y_L}{x_U - x_L}$$

Step 5. Calculate the y-intercept (c) of the line. Use the rule:

$$c = \frac{1}{3}\left[(y_L + y_M + y_U) - m(x_L + x_M + x_U)\right]$$

Thus, the equation of the regression line is y = mx + c.
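The arithmetic approach can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name `three_median_line` is hypothetical, the points are grouped purely by sorted position (tied x-values that straddle a group boundary are not treated specially), and the sample data are made up to lie exactly on y = 2x + 1 so the result is easy to check by hand.

```python
from statistics import median

def three_median_line(points):
    """Return (m, c) of the 3-median line y = m*x + c for a list of (x, y) points."""
    pts = sorted(points)  # order the points by x-value
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 points")
    base, extra = divmod(n, 3)
    # 1 extra point goes to the middle group; 2 extra points go to the outer groups
    lower_size = base + (1 if extra == 2 else 0)
    middle_size = base + (1 if extra == 1 else 0)
    lower = pts[:lower_size]
    middle = pts[lower_size:lower_size + middle_size]
    upper = pts[lower_size + middle_size:]

    def median_point(group):
        # Median of the x-values and median of the y-values, found separately
        return median([x for x, _ in group]), median([y for _, y in group])

    x_l, y_l = median_point(lower)
    x_m, y_m = median_point(middle)
    x_u, y_u = median_point(upper)
    if x_u == x_l:
        raise ValueError("lower and upper medians share an x-value")
    m = (y_u - y_l) / (x_u - x_l)                          # Step 4
    c = ((y_l + y_m + y_u) - m * (x_l + x_m + x_u)) / 3    # Step 5
    return m, c

# Hypothetical data lying exactly on y = 2x + 1
points = [(x, 2 * x + 1) for x in range(1, 10)]
m, c = three_median_line(points)
print(f"y = {m}x + {c}")
```

With these nine points the group medians are (2, 5), (5, 11) and (8, 17), which give m = 2 and c = 1, matching the line the data were generated from.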


Example  1:  Find  the  equation  of  the  regression  line  for  the  data  in  the  table  using  the  3-­‐median  method.  Give  coefficients  correct  to  2  decimal  places.  

 

 

 

 

1. Sketch  the  scatterplot  then  divide  it  into  3  groups.    

 

 

2. Use the graphical approach to find the equation of the line of best fit.

 

 

 

 

 

 

 

 


3. Use the arithmetic approach to find the equation of the line of best fit.

i. Find  the  gradient  of  the  line  

 

 

ii. Find the y-intercept

 

 

 

 

 

 

 

 

 

iii. Find  the  equation  of  the  line  

 

 

 

 

 

 

   


CAS CALCULATOR: Fitting a Straight Line Using the 3-Median Method

Example 2

Find the equation of the regression line for the data in the table below using the 3-median method. Give coefficients correct to 2 decimal places.

On a Lists & Spreadsheet page, enter x-values into column A and y-values into column B. Label the columns accordingly.

 

 

 

 

 

 

 

To draw a scatterplot of the data, add a Data & Statistics page.

Press Tab to move to each axis and select ‘Click to add variable’. Place x on the horizontal axis and y on the vertical axis.

The graph should appear as shown. If you move the pointer over any point and press Click twice, the coordinates for that point will be displayed.

 

 

 

 

To fit a regression line, complete the following steps. Press:

• MENU

• 4: Analyse

• 6: Regression

• 3: Show Median–Median

 

   


Exercise  3B  

 

 

 


3C Fitting a straight line: least-squares regression

Another method for finding the equation of a straight line fitted to data is known as the method of least-squares regression. It is used when data show a linear relationship and have no obvious outliers.

To  understand  the  underlying  theory  behind  least-­‐squares,  consider  the  regression  line  shown  below.  

 

 

 

 

 

 

 

 

 

We wish to minimise the total of the vertical distances, or ‘errors’, between the data points and the line in some way, for example by balancing the errors above and below the line. This is reasonable, but for sophisticated mathematical reasons it is preferable to minimise the sum of the squares of each of these errors. This is the essential mathematics of least-squares regression.
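Written as a formula, the quantity being minimised can be sketched as follows. The symbols are assumptions introduced here for illustration: there are n data points $(x_i, y_i)$, the fitted line is y = mx + c, and SSE stands for the sum of squared errors.

```latex
\[
\text{SSE}(m, c) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + c) \bigr)^2
\]
```

Each bracket is one vertical error (actual y-value minus the y-value on the line); the least-squares line is the choice of m and c that makes this sum as small as possible.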

Choosing Between 3-Median and Least-Squares Regression

The 3-median method should be used in preference to the least-squares regression method if there are clear outliers in the data.

The calculation of the equation of a least-squares regression line is simple using a CAS calculator.

Example  3  

A study shows the more calls a teenager makes on their mobile phone, the less time they spend on each call. Find the equation of the linear regression line for the number of calls made plotted against call time in minutes using the least-squares method on a CAS calculator. Express coefficients correct to 2 decimal places.

Number of minutes   1    3   4    7   10   12   14   15
Number of calls     11   9   10   6   8    4    3    1

On a Lists & Spreadsheet page, enter the minutes values into column A and the number of calls values into column B. Label the columns accordingly.

 

 

 

 

 

 

 


To draw a scatterplot of the data in a Data & Statistics page, press Tab to move to each axis and select ‘Click to add variable’. Place minutes on the horizontal axis and calls on the vertical axis. The graph will appear as shown.

 

 

 

 

To fit a least-squares regression line, complete the following steps. Press:

• MENU

• 4: Analyse

• 6: Regression

• 1: Show Linear (mx+b)

 

To find r and r², return to the Lists & Spreadsheet page by pressing Ctrl and then the left arrow. Summary variables are found by pressing:

• MENU

• 4: Statistics

• 1: Stat Calculations

• 3: Linear Regression (mx+b)

Complete the table as shown below and press OK to display the statistical parameters. Notice that the equation is stored and labelled as function f1.

The regression information is stored in the first available column on the spreadsheet.
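As a cross-check on the calculator output for the mobile phone data in Example 3, the least-squares slope and intercept can be sketched in Python. This assumes the standard formulas (slope equals the sum of products of deviations divided by the sum of squared x-deviations, and the line passes through the point of means); the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (m, b) of the least-squares regression line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx            # slope
    b = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, b

# Data from the Example 3 table
minutes = [1, 3, 4, 7, 10, 12, 14, 15]
calls = [11, 9, 10, 6, 8, 4, 3, 1]
m, b = least_squares_line(minutes, calls)
print(f"calls = {m:.2f} x minutes + {b:.2f}")
```

The negative slope agrees with the statement in the example that more time on calls goes with fewer calls.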

Calculating the least-squares regression line by hand

Summary data needed:

$\bar{x}$   the mean of the independent variable (x-variable)

$\bar{y}$   the mean of the dependent variable (y-variable)

$s_x$   the standard deviation of the independent variable

$s_y$   the standard deviation of the dependent variable

$r$   Pearson’s product-moment correlation coefficient

 

 

 

 


Formula to use:

The general form of the least-squares regression line is

$$y = mx + c$$

where the slope of the regression line is

$$m = r\,\frac{s_y}{s_x}$$

and the y-intercept of the regression line is

$$c = \bar{y} - m\bar{x}$$

 

 

 

Example  4:  

A   study   to   find   a   relationship   between   the   height   of   husbands   and   the   height   of   their   wives   revealed   the  following  details.  

Mean  height  of  the  husbands:  180  cm  

Mean  height  of  the  wives:  169  cm  

Standard  deviation  of  the  height  of  the  husbands:  5.3  cm  

Standard  deviation  of  the  height  of  the  wives:  4.8  cm  

Correlation  coefficient,  r  =  0.85  

The  form  of  the  least-­‐squares  regression  line  is  to  be:  Height  of  wife  =  m  ×  height  of  husband  +    c  

(a) Which  variable  is  the  dependent  variable?  ______________________________  

 

 

(b) Calculate  the  value  of  m  for  the  regression  line  (to  2  decimal  places).    

 

 

 

 

(c) Calculate  the  value  of  c  for  the  regression  line  (to  2  decimal  places).      

   

(d) Use  the  equation  of  the  regression  line  to  predict  the  height  of  a  wife  whose  husband  is  195  cm  tall  (to  the  nearest  cm).  
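Parts (b) to (d) can be checked with a short Python sketch, assuming the standard by-hand formulas m = r × s_y / s_x and c = ȳ − m × x̄ with the husband's height as the x-variable. The variable names are illustrative assumptions; the summary values are taken from the example.

```python
# Summary data from Example 4 (heights in cm)
mean_husband, mean_wife = 180, 169
sd_husband, sd_wife = 5.3, 4.8
r = 0.85

# Wife's height is the y-variable, husband's height is the x-variable
m = r * sd_wife / sd_husband      # slope: m = r * s_y / s_x
c = mean_wife - m * mean_husband  # intercept: c = y_bar - m * x_bar

predicted_wife = m * 195 + c      # prediction for a 195 cm husband

print(round(m, 2), round(c, 2), round(predicted_wife))
```

The sketch keeps the unrounded slope when computing c and the prediction; using the slope rounded to 2 decimal places gives a slightly different intercept, so check which convention your working is expected to follow.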

 

 

 

 

 


Exercise  3C  

 

 

 

 

   


3E Residual analysis

There are situations where the mere fitting of a regression line to some data is not enough to convince us that the data set is truly linear. Even if the correlation is close to +1 or −1 it still may not be convincing enough.

The next stage is to analyse the residuals, or deviations, of each data point from the straight line.

A residual is the vertical difference between a data point and the regression line.

 

 

 

 

 

When we plot the residual values against the original x-values and the points are randomly scattered above and below zero (the x-axis), then the original data are most likely to have a linear relationship.

 

 

 

 

 

 

 

If the residual plot shows some sort of pattern then the original data are probably not linear.

 

   


Residual Plot

To produce a residual plot, carry out the following steps:

Step 1. Draw up a table as follows

x                       1   2   3   4    5    6    7    8     9     10
y                       5   6   8   15   24   47   77   112   187   309
ypred
Residuals (y − ypred)

 

Step 2. Find the equation of the least-squares regression line y = mx + b using the graphics calculator.

Step 3. Calculate the predicted y-values (ypred) using the least-squares regression equation. The predicted y-values are the y-values on the regression line. Put these values into the table.

Step 4. Calculate the residuals.

Residual value = y − ypred

where y is the actual data value and ypred is the y-value from the regression line.

Enter these values into the table.

Note: the sum of all the residuals will always be zero (or very close to it).

Step 5. Plot the residual values against the original x-values.

If the data points in the residual plot are randomly scattered above and below zero (the x-axis), then the original data will probably be linear.

If the residual plot shows a pattern then the original data are probably not linear.
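Steps 2 to 4 can be sketched in Python using the x and y values from the Step 1 table. This is a minimal sketch assuming the standard least-squares formulas; the function name `least_squares_line` is hypothetical, and the calculator remains the method used in these notes.

```python
def least_squares_line(xs, ys):
    """Return (m, b) of the least-squares regression line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Data from the Step 1 table
xs = list(range(1, 11))
ys = [5, 6, 8, 15, 24, 47, 77, 112, 187, 309]

m, b = least_squares_line(xs, ys)                     # Step 2
predicted = [m * x + b for x in xs]                   # Step 3: ypred values
residuals = [y - yp for y, yp in zip(ys, predicted)]  # Step 4: y - ypred

for x, res in zip(xs, residuals):
    print(x, round(res, 2))
print("sum of residuals:", round(sum(residuals), 6))
```

For this data set the residuals are positive at both ends of the x-range and negative in the middle, which is the kind of pattern that suggests the original data are not linear.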

 

Example  8  

Use the data below to produce a residual plot and comment on the likely linearity of the data.

Step 1.

x                      1   2   3   4    5    6    7    8     9     10
y                      5   6   8   15   24   47   77   112   187   309
ypred
Residual (y − ypred)

Step 2. Equation of the least-squares regression line.

y = mx + b

     

   


Step  3.   Calculate  the  predicted  y-­‐values  using  the  equation  _________________________________  

  When  x  =  1                  ypred  =  

                                   =  

         =  

  When  x  =  2                ypred  =  

          =  

=  

Or use the CAS calculator to get the ypred values from the regression line: open a Graphs & Geometry page, enter the equation of the least-squares regression line and press Enter.

Once you have the graph, press:

• MENU

• 5: Trace

• 1: Graph Trace

• Type in the x-value and the corresponding y-value will appear.

Step  4.   Calculate  the  residuals.  

Residual  =  y  −  ypred    

  Residual  =       Residual  =    

                                   =                                      =  

                                   =                                        =    

 

Calculate  the  rest  of  the  residuals  and  enter  them  into  the  table.      

Add all the residuals to check that the sum equals zero (or is very close to it).

Step  5.   Plot  residual  values  against  original  x-­‐values.    

 

[Blank grid for the residual plot: x from 0 to 10 on the horizontal axis, Residual from −50 to 100 on the vertical axis]


The  residual  plot  shows    _____________________________________________________________________    

  ________________________________________________________________________________________    

  ________________________________________________________________________________________    

 

Using  a  CAS  calculator  

Find the equation of a least-squares regression line.

Enter the data on a Lists & Spreadsheet page.

To find the values of m and b for the equation y = mx + b, press:

• MENU

• 4: Statistics

• 1: Stat Calculations

• 3: Linear Regression (mx + b)

To generate the residual values in their own column, move to the shaded cell in column E and press:

• Ctrl

• MENU

• 4: Variables …

• 3: Link To:

• Select the list stat6.resid

Write down all of the residuals displayed in the column. Scroll down for the complete list of values.

Note: The stat number will vary depending on the calculator and previously stored data.

 

Example  9  

Using the same data as in Example 8, plot the residuals and discuss the features of the residual plot.

Generate the list of residuals as demonstrated in Example 8.

On the Data & Statistics page select x for the x-axis and stat.resid for the y-axis.

To identify if a pattern exists, it is useful to join the residual points.

To do this, press:

• MENU

• 2: Plot Properties

• 1: Connect Data Points

   


Exercise  3E