
LIVERPOOL JOHN MOORES UNIVERSITY

SCHOOL of the BUILT ENVIRONMENT

Applied Engineering Mathematics

Student Name Phillip Cooke

Student Number 643022

Module Code 5100BEUG

Assignment Name Assignment 1

Submission Date 01/12/2014

Lecturer Fil Ruddock

A = 6, B = 4, C = 3, D = 0, E = 2, F = 2

Binary = 10011100111111000000

Question 1

This question investigates the probabilities of winning the lottery in different scenarios, based on a random choice of "x" balls from a set of "y" balls. The values given are shown below.

x    y
4    10 + af
6    40 + b
5    40 + b
7    20 + cd

Student Number = 643022, so a = 6, b = 4, c = 3, d = 0, e = 2, f = 2 (where "af" and "cd" denote the two-digit numbers formed from those digits).

Subbing in the specific values gives the set number of balls in each lottery type.

x    y
4    72
6    44
5    44
7    50

A) First of all, the probability of winning the lottery when each lottery ball is replaced after each selection will be calculated. This means that when a ball is selected, it is put back into the draw. Since, in a lottery, you cannot have the same number twice on your ticket, a problem can occur: the ball that has been selected and replaced still has a chance of being selected again, in which case the lottery cannot be won.

So, for the second selection, the probability is worked out as the previous number of required balls minus 1, out of the same total number of balls. To then work out the probability of winning the lottery, the probability of getting the first ball is multiplied by the probability of getting the second ball, multiplied by the probability of getting the third ball, and so on. This is because in probability "AND" means "MULTIPLY": to win the lottery you need to select your first ball AND your second ball AND your third ball, etc.

A calculated example is shown below, with the probabilities of winning the lottery if 4 balls are selected out of 72 and then replaced back into the ball collection. The final results are also shown below the sample.

To have the 1st required ball selected, the probability (1) is: 4/72

The probability of correctly selecting a 2nd correct ball (2) is: 3/72

The probability of correctly selecting the 3rd correct ball (3) is: 2/72

The probability of correctly selecting the 4th and final correct ball (4) is: 1/72

To find the probability of winning the lottery in this scenario, the contender would need to have correctly picked ball (1) AND (2) AND (3) AND (4).

Therefore, using the knowledge that in probability “AND” means “MULTIPLY”, the probability of winning this specific lottery is as follows:

4/72 × 3/72 × 2/72 × 1/72 = 1/1119744 = 0.000000893061
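As a cross-check on the arithmetic (not part of the original Excel working), the with-replacement probability can be reproduced with a short script; the function name is my own:

```python
from fractions import Fraction

def p_win_with_replacement(x, y):
    """Probability of matching all x required balls when each ball is
    replaced after selection: x/y * (x-1)/y * ... * 1/y."""
    p = Fraction(1)
    for k in range(x, 0, -1):
        p *= Fraction(k, y)
    return p

p = p_win_with_replacement(4, 72)
print(p)         # 1/1119744
print(float(p))  # ≈ 0.000000893061
```

Using exact fractions avoids any rounding in the intermediate products.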

Probabilities of winning the lottery when the ball is replaced after being chosen

x   y    Ball 1      Ball 2      Ball 3      Ball 4      Ball 5      Ball 6      Ball 7      Probability of winning
4   72   0.0555556   0.0416667   0.0277778   0.0138889   -           -           -           0.000000893061
6   44   0.1363636   0.1136364   0.0909091   0.0681818   0.0454545   0.0227273   -           0.000000099224
5   44   0.1136364   0.0909091   0.0681818   0.0454545   0.0227273   -           -           0.000000727642
7   50   0.1400000   0.1200000   0.1000000   0.0800000   0.0600000   0.0400000   0.0200000   0.000000006451

B) Now to calculate the probability of winning the lottery when the selected balls are handled in a completely different way. This time, once a ball is selected, it is not replaced back into the selection. Instead, it is kept out so there is zero chance it can be selected again.

To calculate the first probability of selecting the first ball correctly, it is just the number of balls to be selected divided by the total number of balls.

For the next selection to be another required number, the probability is the previous numerator minus 1, divided by the total number of balls minus 1. This is because one ball has been selected and removed from the game.

A calculated example is shown below with the probabilities of winning the lottery if 4 balls are selected out of 72 then permanently removed from the collection and cannot be selected again. The final results are also shown below the sample.

To have the 1st required ball selected, the probability (1) is: 4/72

The probability of correctly selecting a 2nd correct ball (2) is: 3/71

The probability of correctly selecting the 3rd correct ball (3) is: 2/70

The probability of correctly selecting the 4th and final correct ball (4) is: 1/69

So to calculate the probability of getting all required balls selected and winning the lottery, use the “AND” means “MULTIPLY” method:

4/72 × 3/71 × 2/70 × 1/69 = 1/1028790 = 0.0000009720

Probabilities of winning the lottery when the ball is not replaced after being chosen

x   y    Ball 1      Ball 2      Ball 3      Ball 4      Ball 5      Ball 6      Ball 7      Probability of winning
4   72   0.0555556   0.0422535   0.0285714   0.0144928   -           -           -           0.0000009720
6   44   0.1363636   0.1162791   0.0952381   0.0731707   0.0500000   0.0256410   -           0.0000001417
5   44   0.1136364   0.0930233   0.0714286   0.0487805   0.0250000   -           -           0.0000009208
7   50   0.1400000   0.1224490   0.1041667   0.0851064   0.0652174   0.0444444   0.0227273   0.0000000100
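These no-replacement products can be reproduced the same way; it is also worth noting (a fact not used in the original working) that the product collapses to one chance in C(y, x), the number of combinations. The function name is my own:

```python
from fractions import Fraction
from math import comb

def p_win_no_replacement(x, y):
    """Probability of matching all x balls when selected balls are
    kept out of the draw: x/y * (x-1)/(y-1) * ... * 1/(y-x+1)."""
    p = Fraction(1)
    for k in range(x):
        p *= Fraction(x - k, y - k)
    return p

p = p_win_no_replacement(4, 72)
print(p)                              # 1/1028790
print(p == Fraction(1, comb(72, 4)))  # True: one chance in C(72, 4)
```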

By comparing the results, there is a higher probability of winning the lottery when the selected ball is not returned to the machine. This is because it eliminates the chance of selecting the same ball twice, which increases the probability of selecting the required balls.

The results above also show that the more balls required to win the lottery, the harder it is to win. This is because more and more probabilities are multiplied together, giving a smaller and smaller overall probability.

Question 2

A)

Using the raw data below, a graph was produced to show how the pollution levels vary with time. This graph is shown on the next page.

Date         NO2 (x)
02/05/2014   53
03/05/2014   46
04/05/2014   73
05/05/2014   80
06/05/2014   69
07/05/2014   74
08/05/2014   90
09/05/2014   78
10/05/2014   50
11/05/2014   27
12/05/2014   61
13/05/2014   53
14/05/2014   73
15/05/2014   96
16/05/2014   80
17/05/2014   76
18/05/2014   82
19/05/2014   40
20/05/2014   52
21/05/2014   65
22/05/2014   59
23/05/2014   103
24/05/2014   61.3
25/05/2014   65
26/05/2014   29
27/05/2014   71
28/05/2014   67
29/05/2014   59
30/05/2014   65
31/05/2014   67
01/06/2014   94
02/06/2014   101
03/06/2014   52
04/06/2014   65
05/06/2014   52
06/06/2014   67
07/06/2014   97
08/06/2014   73
09/06/2014   48
10/06/2014   74
11/06/2014   63
12/06/2014   61
13/06/2014   65
14/06/2014   63
15/06/2014   80
16/06/2014   44
17/06/2014   63
18/06/2014   69
19/06/2014   53
20/06/2014   73
21/06/2014   57
22/06/2014   71
23/06/2014   36
24/06/2014   57
25/06/2014   34
26/06/2014   38
27/06/2014   15
28/06/2014   19
29/06/2014   55
30/06/2014   52

To calculate the standard deviation of these values, the first thing that has to be calculated is the mean. This is done by selecting all NO2 values in the “AVERAGE” function on Excel. The formula used to find the mean and the calculated value of the mean is shown below.

Mean (x̄) = 62.58833


The next thing required was a column for NO2 minus the mean, (x − x̄). Each value in this column is squared to make a new column titled (x − x̄)². These are shown below. The (x − x̄)² column is totalled to give Σ(x − x̄)². This total is divided by the number of initial data values, giving Σ(x − x̄)²/n. The standard deviation (σ) is then calculated by taking the square root of that value. These values are all displayed below.

(x − x̄)     (x − x̄)²
-9.58833    91.93607
-16.5883    275.1727
10.41167    108.4029
17.41167    303.1663
6.41167     41.10951
11.41167    130.2262
27.41167    751.3997
15.41167    237.5196
-12.5883    158.4661
-35.5883    1266.529
-1.58833    2.522792
-9.58833    91.93607
10.41167    108.4029
33.41167    1116.34
17.41167    303.1663
13.41167    179.8729
19.41167    376.8129
-22.5883    510.2327
-10.5883    112.1127
2.41167     5.816152
-3.58833    12.87611
40.41167    1633.103
-1.28833    1.659794
2.41167     5.816152
-33.5883    1128.176
8.41167     70.75619
4.41167     19.46283
-3.58833    12.87611
2.41167     5.816152
4.41167     19.46283
31.41167    986.693
38.41167    1475.456
-10.5883    112.1127
2.41167     5.816152
-10.5883    112.1127
4.41167     19.46283
34.41167    1184.163
10.41167    108.4029
-14.5883    212.8194
11.41167    130.2262
0.41167     0.169472
-1.58833    2.522792
2.41167     5.816152
0.41167     0.169472
17.41167    303.1663
-18.5883    345.526
0.41167     0.169472
6.41167     41.10951
-9.58833    91.93607
10.41167    108.4029
-5.58833    31.22943
8.41167     70.75619
-26.5883    706.9393
-5.58833    31.22943
-28.5883    817.2926
-24.5883    604.586
-47.5883    2264.649
-43.5883    1899.943
-7.58833    57.58275
-10.5883    112.1127

Σ(x − x̄)² = 20923.72183

Σ(x − x̄)²/n = 348.7286972

σ = 18.67427903

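The mean and standard deviation can be checked outside Excel. A sketch (the variable names are mine), using the population form of the standard deviation as in the working above:

```python
# Daily NO2 readings, 02/05/2014 - 30/06/2014, in date order
no2 = [53, 46, 73, 80, 69, 74, 90, 78, 50, 27, 61, 53, 73, 96, 80,
       76, 82, 40, 52, 65, 59, 103, 61.3, 65, 29, 71, 67, 59, 65, 67,
       94, 101, 52, 65, 52, 67, 97, 73, 48, 74, 63, 61, 65, 63, 80,
       44, 63, 69, 53, 73, 57, 71, 36, 57, 34, 38, 15, 19, 55, 52]

mean = sum(no2) / len(no2)              # x-bar
ss = sum((x - mean) ** 2 for x in no2)  # sum of (x - x-bar) squared
sigma = (ss / len(no2)) ** 0.5          # population standard deviation
print(round(mean, 5), round(sigma, 5))  # 62.58833 18.67428
```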

Date         NO2 (x)   5 day moving average
02/05/2014   53
03/05/2014   46
04/05/2014   73        64.2
05/05/2014   80        68.4
06/05/2014   69        77.2
07/05/2014   74        78.2
08/05/2014   90        72.2
09/05/2014   78        63.8
10/05/2014   50        61.2
11/05/2014   27        53.8
12/05/2014   61        52.8
13/05/2014   53        62
14/05/2014   73        72.6
15/05/2014   96        75.6
16/05/2014   80        81.4
17/05/2014   76        74.8
18/05/2014   82        66
19/05/2014   40        63
20/05/2014   52        59.6
21/05/2014   65        63.8
22/05/2014   59        68.06
23/05/2014   103       70.66
24/05/2014   61.3      63.46
25/05/2014   65        65.86
26/05/2014   29        58.66
27/05/2014   71        58.2
28/05/2014   67        58.2
29/05/2014   59        65.8
30/05/2014   65        70.4
31/05/2014   67        77.2
01/06/2014   94        75.8
02/06/2014   101       75.8
03/06/2014   52        72.8
04/06/2014   65        67.4
05/06/2014   52        66.6
06/06/2014   67        70.8
07/06/2014   97        67.4
08/06/2014   73        71.8
09/06/2014   48        71
10/06/2014   74        63.8
11/06/2014   63        62.2
12/06/2014   61        65.2
13/06/2014   65        66.4
14/06/2014   63        62.6
15/06/2014   80        63
16/06/2014   44        63.8
17/06/2014   63        61.8
18/06/2014   69        60.4
19/06/2014   53        63
20/06/2014   73        64.6
21/06/2014   57        58
22/06/2014   71        58.8
23/06/2014   36        51
24/06/2014   57        47.2
25/06/2014   34        36
26/06/2014   38        32.6
27/06/2014   15        32.2
28/06/2014   19        35.8
29/06/2014   55
30/06/2014   52

C) With the given data, 5 day moving averages can be calculated. To do this, the first 5 values are averaged using the AVERAGE formula in Excel. The calculated average is placed in a new column alongside the middle of those 5 values, i.e. the third row down, as the table above shows. This is continued for values 1-5, 2-6, 3-7 and so on, until all possible moving averages are calculated. A graph can then be drawn to compare how the data varies compared with the original graph. This graph is shown on the next page.
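The moving-average step can be sketched as a small helper (names are mine); the centred average of each run of five readings lines up with the middle reading:

```python
def moving_average(values, window=5):
    """Mean of each run of `window` consecutive values; entry i is the
    average of values[i] .. values[i+window-1], centred on the middle one."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

no2_start = [53, 46, 73, 80, 69, 74, 90, 78, 50]  # first nine NO2 readings
print(moving_average(no2_start))  # [64.2, 68.4, 77.2, 78.2, 72.2]
```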

D) The first graph, displaying the raw data, shows no real correlation or trend; the time of year does not seem to affect the NO2 levels in the air. The 5 day moving average graph fluctuates much less: it is more stable and more closely packed, showing only the slight rise or fall within each 5 day window.

Question 3

A) The data given is the stiffness modulus of old and new bitumen additives. It is measured in MPa.

Old additive   New additive
2400           2500
2297           2374
2430           2211
2754           2953
2104           2763
2211           1999
2765           2127
2378           2262
2534           2491
2224           2777
2295           3015
2180           2211
2491           2170
1948           2435
2162           2543
2230           2501
2110           2100
2156           2323
2054           2394
2222           2240

The first task is to determine whether or not there is a difference between the two medians. This is carried out using the Mann-Whitney U test, which can be applied here because there are two unmatched samples with at least 5 measurements in each.

Looking at the data, hypotheses can be given:

H0 – the medians are the same. H1 – the medians are not the same.

The first thing required in the Mann-Whitney test is to sort each column of data from smallest to largest, labelling the first column (Old Additive) 'A' and the second column (New Additive) 'B'. So for the data above:

Old additive (A)   New additive (B)
1948               1999
2054               2100
2104               2127
2110               2170
2156               2211
2162               2211
2180               2240
2211               2262
2222               2323
2224               2374
2230               2394
2295               2435
2297               2491
2378               2500
2400               2501
2430               2543
2491               2763
2534               2777
2754               2953
2765               3015

The next thing to find is the value of U for each data set. This can be done since columns A and B are now in order from lowest to highest. To find UA: for each item in set A, count the number of items in data set B that are lower than it, making sure that if a number is exactly the same in both data sets it counts as 0.5. These 'lower than' counts are tallied and accumulated down the column; the final accumulated total of numbers in B lower than those in A gives the value of UA.

The same is done in set B, except the items in A lower than each selected item in B are tallied and then totalled to give the value of UB.

New additive (B)   Tally lower in (A)   Total
1999               1                    1
2100               2                    3
2127               4                    7
2170               6                    13
2211               7.5                  20.5
2211               7.5                  28
2240               11                   39
2262               11                   50
2323               13                   63
2374               13                   76
2394               14                   90
2435               16                   106
2491               16.5                 122.5
2500               17                   139.5
2501               17                   156.5
2543               18                   174.5
2763               19                   193.5
2777               20                   213.5
2953               20                   233.5
3015               20                   253.5

UB = 253.5

So UA = 146.5 and UB = 253.5. These two values can be checked using the equation: UA + UB = nA × nB

146.5 + 253.5 = 20 × 20
400 = 400

The smaller of the two U values is taken as U, therefore U = 146.5.

This value of U can be compared to the critical U value which can be calculated using the following equation:

Old additive (A)   Tally lower in (B)   Total
1948               0                    0
2054               1                    1
2104               2                    3
2110               2                    5
2156               3                    8
2162               3                    11
2180               4                    15
2211               5                    20
2222               6                    26
2224               6                    32
2230               6                    38
2295               8                    46
2297               8                    54
2378               10                   64
2400               11                   75
2430               11                   86
2491               12.5                 98.5
2534               15                   113.5
2754               16                   129.5
2765               17                   146.5

UA = 146.5

Ucrit = nm/2 − 1.96√(nm(n + m + 1)/12)

Ucrit = (20 × 20)/2 − 1.96√((20 × 20 × (20 + 20 + 1))/12)

Ucrit = 127.54

The lowest U value is 146.5 and the critical U value is 127.54. Since U > Ucrit, the result is not significant at the 5% level, so the null hypothesis is accepted: there is no significant difference between the medians.
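The tally-and-total procedure above amounts to the following sketch (the function and variable names are mine); it reproduces UA, UB and the critical value:

```python
def u_value(sample, other):
    """For each item in `sample`, count items in `other` that are lower
    (a tie counts as 0.5), then total the counts: the Mann-Whitney U."""
    return sum(sum(1.0 for o in other if o < s) +
               sum(0.5 for o in other if o == s)
               for s in sample)

old = [2400, 2297, 2430, 2754, 2104, 2211, 2765, 2378, 2534, 2224,
       2295, 2180, 2491, 1948, 2162, 2230, 2110, 2156, 2054, 2222]
new = [2500, 2374, 2211, 2953, 2763, 1999, 2127, 2262, 2491, 2777,
       3015, 2211, 2170, 2435, 2543, 2501, 2100, 2323, 2394, 2240]

ua, ub = u_value(old, new), u_value(new, old)
n, m = len(old), len(new)
u_crit = n * m / 2 - 1.96 * (n * m * (n + m + 1) / 12) ** 0.5
print(ua, ub, round(u_crit, 2))  # 146.5 253.5 127.54
```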

B) To investigate whether there is a difference in the means or not, a t test for the samples can be carried out. This involves the equation:

Where Ā is the mean of sample A, B̄ is the mean of sample B and SEt is the standard error of the difference in means, calculated by:

Where nA and nB are the number of values in sample A and sample B respectively, and σc² is the estimated common population variance, which is calculated by:

Where σA and σB are the standard deviations of samples A and B respectively, and nA and nB are the number of measurements in sample A and B respectively.

Using these equations will find a value of ‘t’ and this value can be used to compare with the critical value of t found from tables.

The first equation that has to be used is the one for σc², which then enables SEt to be worked out.

The first step is to work out the mean of the sample. This is subtracted from each measurement in the sample, and each of these differences is squared. The squared differences are then summed, and this sum is divided by the number of measurements in the sample. The square root of that value gives the standard deviation. The exact same method is carried out to find the standard deviation of set B.

Note, in this case I used the population equation assuming the values in the initial table are from a population and not a sample.

These standard deviations are then squared and subbed into the equation to find the common population variance. The two calculation tables for working out the standard deviation for data sets A and B are shown on the next page.

t = (Ā − B̄) / SEt

SEt = √(σc²/nA + σc²/nB)

σc² = ((nA − 1)σA² + (nB − 1)σB²) / ((nA − 1) + (nB − 1))

New additive (B)   (B − B̄)    (B − B̄)²
2500               80.55      6488.3025
2374               -45.45     2065.7025
2211               -208.45    43451.4025
2953               533.55     284675.6025
2763               343.55     118026.6025
1999               -420.45    176778.2025
2127               -292.45    85527.0025
2262               -157.45    24790.5025
2491               71.55      5119.4025
2777               357.55     127842.0025
3015               595.55     354679.8025
2211               -208.45    43451.4025
2170               -249.45    62225.3025
2435               15.55      241.8025
2543               123.55     15264.6025
2501               81.55      6650.4025
2100               -319.45    102048.3025
2323               -96.45     9302.6025
2394               -25.45     647.7025
2240               -179.45    32202.3025

Σ(B − B̄)² = 1501478.95
Σ(B − B̄)²/n = 75073.9475
σB = 273.9962545

Old additive (A)   (A − Ā)    (A − Ā)²
2400               102.75     10557.56
2297               -0.25      0.06
2430               132.75     17622.56
2754               456.75     208620.56
2104               -193.25    37345.56
2211               -86.25     7439.06
2765               467.75     218790.06
2378               80.75      6520.56
2534               236.75     56050.56
2224               -73.25     5365.56
2295               -2.25      5.06
2180               -117.25    13747.56
2491               193.75     37539.06
1948               -349.25    121975.56
2162               -135.25    18292.56
2230               -67.25     4522.56
2110               -187.25    35062.56
2156               -141.25    19951.56
2054               -243.25    59170.56
2222               -75.25     5662.56

Σ(A − Ā)² = 884241.75
Σ(A − Ā)²/n = 44212.09
σA = 210.2667056

The standard deviation for A is 210.2667056 and the standard deviation for B is 273.9962545. These values can be subbed into the following equation and the common population variance can be calculated.

This value is then subbed into the equation below to calculate the standard error of the means.

Then, the value of SEt can be subbed into the equation below to calculate the value of t. Where the means of sets A and B are calculated as:

This value of t can be compared to the critical value of t, found from tables at (nA + nB − 2) = 20 + 20 − 2 = 38 degrees of freedom. The magnitude of the t value is well below the critical t value, so there is no significant difference between the means.

Ā = 2297.25    B̄ = 2419.45

σc² = ((nA − 1)σA² + (nB − 1)σB²) / ((nA − 1) + (nB − 1))

σc² = ((20 − 1) × 210.2667056² + (20 − 1) × 273.9962545²) / ((20 − 1) + (20 − 1))

σc² = 59643.0175

SEt = √(σc²/nA + σc²/nB)

SEt = √(59643.0175/20 + 59643.0175/20)

SEt = 77.22889194

t = (2297.25 − 2419.45) / 77.22889194

t = −1.582
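The whole t calculation can be reproduced in a few lines (a sketch; the variable names are mine), using the population standard deviations as the working above does:

```python
old = [2400, 2297, 2430, 2754, 2104, 2211, 2765, 2378, 2534, 2224,
       2295, 2180, 2491, 1948, 2162, 2230, 2110, 2156, 2054, 2222]
new = [2500, 2374, 2211, 2953, 2763, 1999, 2127, 2262, 2491, 2777,
       3015, 2211, 2170, 2435, 2543, 2501, 2100, 2323, 2394, 2240]

na, nb = len(old), len(new)
mean_a, mean_b = sum(old) / na, sum(new) / nb
var_a = sum((x - mean_a) ** 2 for x in old) / na  # population variance of A
var_b = sum((x - mean_b) ** 2 for x in new) / nb  # population variance of B

# estimated common population variance
var_c = ((na - 1) * var_a + (nb - 1) * var_b) / ((na - 1) + (nb - 1))
se_t = (var_c / na + var_c / nb) ** 0.5  # standard error of the difference
t = (mean_a - mean_b) / se_t
print(round(var_c, 4), round(se_t, 4), round(t, 3))  # 59643.0175 77.2289 -1.582
```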

Question 4

A) The first scenario considered is where one reading in the "New Additive" table is recorded as zero. A random number from the results table was selected and set to zero. This is shown below.

The first task is to work out whether the misreading had an impact on the statistical calculations in question 3. First of all, the difference in medians was calculated, using the Mann-Whitney test. The results are shown below.

Old additive (A)   New additive (B)
2400               2500
2297               2374
2430               2211
2754               2953
2104               2763
2211               1999
2765               2127
2378               2262
2534               2491
2224               2777
2295               0
2180               2211
2491               2170
1948               2435
2162               2543
2230               2501
2110               2100
2156               2323
2054               2394
2222               2240

The U value for this section is 166.5 which is larger than the U value in question 3 which is 146.5 and even further out from the critical U value of 127.54.

The next thing to compare is whether or not the difference in means is affected when one of the New Additives is read as zero. The t-test is carried out to calculate the results. The results are shown below.

σA = 210.2667056   σA² = 44212.0875   (n − 1)σA² = 840029.6625
σB = 572.1025345   σB² = 327301.310   (n − 1)σB² = 6218724.890

σc² = 185756.69875   SEt = 136.29259   t = 0.20948

Mean A = 2297.25   Mean B = 2268.70

In this case, the value of t becomes a lot smaller, and closer to zero.

Old additive (A)   Tally lower in (B)   Total
1948               1                    1
2054               2                    3
2104               3                    6
2110               3                    9
2156               4                    13
2162               4                    17
2180               5                    22
2211               6                    28
2222               7                    35
2224               7                    42
2230               7                    49
2295               9                    58
2297               9                    67
2378               11                   78
2400               12                   90
2430               12                   102
2491               13.5                 115.5
2534               16                   131.5
2754               17                   148.5
2765               18                   166.5

New additive (B)   Tally lower in (A)   Total
0                  0                    0
1999               1                    1
2100               2                    3
2127               4                    7
2170               6                    13
2211               7.5                  20.5
2211               7.5                  28
2240               11                   39
2262               11                   50
2323               13                   63
2374               13                   76
2394               14                   90
2435               16                   106
2491               16.5                 122.5
2500               17                   139.5
2501               17                   156.5
2543               18                   174.5
2763               19                   193.5
2777               20                   213.5
2953               20                   233.5

B) The next scenario is where one value in the New Additive column is misread as ten times its true value. One reading was selected at random and multiplied by 10. The new data sets are shown below.

Old additive (A)   New additive (B)
2400               2500
2297               2374
2430               2211
2754               2953
2104               2763
2211               19990
2765               2127
2378               2262
2534               2491
2224               2777
2295               3015
2180               2211
2491               2170
1948               2435
2162               2543
2230               2501
2110               2100
2156               2323
2054               2394
2222               2240

To compare how this high value affects the difference in median the Mann Whitney test is used again. The results are shown next.

Old additive (A)   Tally lower in (B)   Total
1948               0                    0
2054               0                    0
2104               1                    1
2110               1                    2
2156               2                    4
2162               2                    6
2180               3                    9
2211               4                    13
2222               5                    18
2224               5                    23
2230               5                    28
2295               7                    35
2297               7                    42
2378               9                    51
2400               10                   61
2430               10                   71
2491               11.5                 82.5
2534               14                   96.5
2754               15                   111.5
2765               16                   127.5

New additive (B)   Tally lower in (A)   Total
2100               2                    2
2127               4                    6
2170               6                    12
2211               7.5                  19.5
2211               7.5                  27
2240               11                   38
2262               11                   49
2323               13                   62
2374               13                   75
2394               14                   89
2435               16                   105
2491               16.5                 121.5
2500               17                   138.5
2501               17                   155.5
2543               18                   173.5
2763               19                   192.5
2777               20                   212.5
2953               20                   232.5
3015               20                   252.5
19990              20                   272.5

The U value for this test is 127.5, which is much smaller than the result from the original data sets and almost identical to the 127.54 critical U value.

The next thing is to test how the increase in the value in the new additive table affects the difference in means. A t test was carried out and the answers are shown next.

Mean A = 2297.250   Mean B = 3319.000

σA = 210.267    σA² = 44212.088       (n − 1)σA² = 840029.663
σB = 3833.178   σB² = 14693256.200    (n − 1)σB² = 279171867.800

σc² = 7368734.14375   SEt = 858.41331   t = 1.19028

This shows that the t value calculated here is larger than the value calculated in part A, but slightly smaller than the value calculated with the original data items.

Bridge number   Span length (m)   Rank1   Approx. cost (£million)   Rank2   D    D²
1               2500              8       380                       6       2    4
2               7800              5       864                       3       2    4
3               126               10      22                        9       1    1
4               41580             2       230                       7       -5   25
5               55000             1       1032                      2       -1   1
6               36690             3       145                       8       -5   25
7               36000             4       1550                      1       3    9
8               5128              7       385                       5       2    4
9               6611              6       820                       4       2    4
10              295               9       3                         10      -1   1
TOTAL                                                                       0    78

Question 5

Given the list of values of the span length of each bridge and the cost required to build it, a graph can be plotted and the correlation calculated and compared.

Span length (m)   Approx. cost (£million)
2500              380
7800              864
126               22
41580             230
55000             1032
36690             145
36000             1550
5128              385
6611              820
295               3

First of all, each bridge was numbered from 1-10 so it can easily be traced back when needed. The bridges were then ranked by span length from highest to lowest, giving the next column (Rank1), and then ranked by approximate cost (Rank2). The ranks of each bridge by length and cost are shown below, 1 being the highest and 10 the lowest.

Span length (m)   Rank1   Approx. cost (£million)   Rank2
2500              8       380                       6
7800              5       864                       3
126               10      22                        9
41580             2       230                       7
55000             1       1032                      2
36690             3       145                       8
36000             4       1550                      1
5128              7       385                       5
6611              6       820                       4
295               9       3                         10

Once these two columns are ranked, the difference in the ranks is then found. These will be added into a new column and will be titled “D” meaning difference. In Excel, an equation was used to calculate these, simply the first rank number minus the second rank number.

Each value in the "D" column is squared, creating a "D²" column. This is then totalled and used in the equation for the correlation. The total of the D² column is 78.

rs = 1 − 6ΣD² / (n(n² − 1))

This is the equation for the correlation coefficient, where ΣD² = 78 and n = 10.

rs = 1 − (6 × 78) / (10 × (10² − 1))

rs = 1 − 468/990

rs = 0.5273

With this value of the correlation coefficient, there is a positive correlation, but only a moderate one. Because the value lies between 0 and 1 and is above zero, there is some positive correlation; however, it is still some way from 1, indicating the trend is not a strong one.
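The ranking and the coefficient can be reproduced as follows (a sketch; `ranks` is my own helper, valid here because there are no tied values):

```python
span = [2500, 7800, 126, 41580, 55000, 36690, 36000, 5128, 6611, 295]
cost = [380, 864, 22, 230, 1032, 145, 1550, 385, 820, 3]

def ranks(values):
    """Rank 1 = largest value; assumes no ties, as in this data."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks(span), ranks(cost)))
n = len(span)
rs = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(d2, round(rs, 4))  # 78 0.5273
```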

To see exactly what kind of correlation this will show, a scatter diagram can be plotted of bridge span length against the cost of the bridge. This can easily be done on Excel by selecting the data and inserting it into a scatter diagram.

The graph is shown on the next page. As can be seen, the graph starts off with a really strong positive correlation. This is what would be expected, because the longer the bridge, the higher the cost. However, this breaks down where outliers are present in the graph: some long spans are very low in cost compared with short ones, and some short-span bridges cost more than longer ones.

Overall the graph does show that the correlation is positive but very weak due to the outliers. They may be caused by the different materials used on each bridge, how they are made or who they are made by. These types of data were not included in the statistics.

Trend line

To calculate the trend line's equation: y = mx + c, where m = gradient and c = y-intercept.

Gradient = rise/run = (900 − 350)/55000 = 550/55000 = 0.01
y-intercept = 350

Therefore the equation is y = 0.01x + 350.

Question 6

A) Given the data below, a Chi-Square test will be carried out to work out whether there is a relationship between the age range of the respondents included in the survey and the answer they gave: add a 3rd runway at Heathrow, build a new airport at 'Boris Island', or add a 2nd runway at Gatwick.

The null hypothesis of the survey is that the age of the respondent does not affect the answer given by the respondent on how to increase London’s capacity of travel.

The first thing that needs to be calculated is the total of the number of people in each age group, and the accumulated number of answers in each option. This is shown below from an Excel table using the “SUM” equation to total each row or column. The totals are added together to give a final total which is the number of people that took part in the survey.

Now, to calculate the expected frequencies, we assume independence of the rows and columns. For example, to calculate the expected frequency for the 93 who responded with "3rd runway at Heathrow", we take the row total (243) and the column total (196). These two are multiplied together and divided by the overall total (575). This gives the expected frequency as:

(243 × 196)/575 = 82.8313

The full expected frequency table, with all expected frequencies entered, is shown at the end of this question.

Age range (years)   3rd runway at Heathrow (A)   'Boris Island' (B)   2nd runway at Gatwick (C)
< 30                93                           120                  30
30 – 45             50                           57                   42
> 45                53                           76                   54

Age range (years)   3rd runway at Heathrow (A)   'Boris Island' (B)   2nd runway at Gatwick (C)   Total
< 30                93                           120                  30                          243
30 – 45             50                           57                   42                          149
> 45                53                           76                   54                          183
Total               196                          253                  126                         575

In the next step, a table is created with one column showing the observed figures (O) and the next column listing the corresponding expected frequencies (E). The next column is a simple subtraction of E from O; it is titled (O − E) and is calculated in Excel with a subtraction formula, for example "=(I2-J2)". Each of the (O − E) values is squared and listed in another column titled (O − E)². These values are each divided by the corresponding value of E and listed in a final column titled (O − E)²/E. This table and its columns are shown below.

O     E          (O − E)    (O − E)²   (O − E)²/E
93    82.8313    10.1687    103.4024   1.248349
120   106.92     13.08      171.0864   1.600135
30    53.2487    -23.2487   540.5018   10.15052
50    50.78957   -0.78957   0.623413   0.012274
57    65.56      -8.56      73.2736    1.117657
42    32.65043   9.349565   87.41437   2.67728
53    62.37913   -9.37913   87.96809   1.410217
76    80.52      -4.52      20.4304    0.253731
54    40.10087   13.89913   193.1858   4.817497

The SUM formula in Excel was used to add all the values in the last column, which gives the value of the test statistic (χ²). The answer is shown below.

Total: χ² = 23.28766

The next thing that needs to be calculated is the number of degrees of freedom (df).

df = (number of rows − 1) × (number of columns − 1)
df = (3 − 1) × (3 − 1)
df = 2 × 2
df = 4

The tabulated 95% value of χ² with 4 degrees of freedom is 9.49. The value of χ² obtained from the calculations was 23.29, so the result is significant at the 5% level. The null hypothesis is therefore rejected, giving the conclusion that the age of the respondent does affect the answer given to the question asked.

Age range (years)   3rd runway at Heathrow (A)   'Boris Island' (B)   2nd runway at Gatwick (C)   Total
< 30                82.8313                      106.92               53.2486957                  243
30 – 45             50.7896                      65.56                32.6504348                  149
> 45                62.3791                      80.52                40.1008696                  183
Total               196                          253                  126                         575
Total 196 253 126 575