math portfolio 2 final


Upload: ibrahim795

Post on 17-Oct-2014


Page 1: Math Portfolio 2 Final

Math Portfolio 2 The following information was given to show the height (in centimeters) of men’s high jump at the Olympic games. Note that the Olympic games did not occur in 1940 and 1944.


Winning Men’s High Jump Height at Olympic Games

Year:         1932  1936  1948  1952  1956  1960  1964  1968  1972  1976  1980
Height (cm):   197   203   198   204   212   216   218   224   223   225   236

[Figure: Height of Jump at the Olympic Games, with window settings of the graph]


As shown in the graphs above, the winning height of the men’s high jump at each Olympic Games is plotted. The x-axis is labelled with the year in which the Olympic Games took place; the Games occur every four years starting from 1932, with the exception of 1940 and 1944, when they were not held. The y-axis is labelled with the height (in centimeters) of the gold medalists’ winning jumps.

Parameters The parameter of this data is the year in which the Olympic Games took place, because the year is what the high jump score depends on. In other words, the year determines the score of the high jump, as it reflects how well the gold medalist trained and what technology was available that year to help them achieve that score. For example, if person A first competed in 1932, they would be expected to improve and achieve higher scores as time progresses: by 1948, person A would have a higher score than in 1932 because they have undergone more training and have better equipment. Also, the chart lists the year in its first row and, as shown in the graph, the year is placed on the x-axis. Moreover, if the data is put into an equation, the x-value would determine the value of y, which in this case is the height of the high jump. For example, if an equation such as $y = 3x^2 + 7x + 8$ were used to model the data, y would be the height and x would be the year in which the event took place. The year (the independent variable) determines the height (the dependent variable) of the gold medalist's jump in that year, so the x-value is needed in order to find the height y. Thus, the year is the parameter of the above set of data on the heights of the high jump at the Olympic Games.

Constraints The constraints of this task are that, when performing a regression analysis on the data, it would be difficult to find an equation that models the data perfectly, as the data has some outliers and does not completely follow a pattern. The data does not follow an exact pattern like the points of $y = x^2$, so it would be difficult to determine an exact value (i.e. a future or a previous high jump score) from a regression equation. Furthermore, a regression weighs every point, so any outliers would pull the fitted curve and throw off estimates of future scores. Thus, a regression analysis would only give an estimate, not necessarily the actual value. Another constraint is that, because the Olympic Games did not take place in 1940 and 1944, two points are missing, which reduces the accuracy of any interpretation, analysis or equation derived from the data. The absence of those two years leaves a gap in the data and in any pattern it could have contained. Also, these two missing points are not the first or last points in the data set; rather, they fall within the data itself.

Standardizing the Data

In order to standardize the data, I will represent the years from 1932 to 1980 with smaller x-values: I will set x = 0 at 1932, x = 4 at 1936 and so on, so that x = year − 1932. The purpose of standardizing the data is to put the years on a common scale that starts at zero, which makes them easier to work with. The table of values of the standardized gold medalist high jump heights is shown below; the standardized year is given in the "Standardized Year" row. Note that the standardized year does not count Olympic Games (i.e. it is not the case that 1932 is x = 1, 1936 is x = 2 and so on). Rather, it counts every calendar year since 1932, regardless of whether or not the Olympic Games were held that year.

The graph of the standardized data looks as follows. There is not much change in the actual look of the graph: the window settings have been altered slightly to fit the data, but there is no major change to the shape of the graph itself. Please note that the first coordinate has an x-value of 0. Technical problems hid this first coordinate when Xmin was set to 0, which is why Xmin is set to −10.

[Figure: Standardized Year of High Jump, with window settings of the graph]

Year:               1932  1936  1948  1952  1956  1960  1964  1968  1972  1976  1980
Standardized Year:     0     4    16    20    24    28    32    36    40    44    48
Height (cm):         197   203   198   204   212   216   218   224   223   225   236
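The standardization rule above amounts to subtracting 1932 from each year. As a minimal Python sketch (the names `BASE_YEAR` and `standardize` are illustrative and not part of the original work), the standardized row can be reproduced as follows:

```python
# Years of the Games in the table above (1940 and 1944 were not held).
years = [1932, 1936, 1948, 1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980]

BASE_YEAR = 1932  # x = 0 at the first Games in the data set


def standardize(year):
    """Number of calendar years since 1932, whether or not Games were held."""
    return year - BASE_YEAR


xs = [standardize(y) for y in years]
print(xs)  # [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
```

The same rule gives the standardized values used later, for example x = 8 for 1940 and x = 84 for 2016.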


Modeling the Data

The best type of function to model the behavior of this set of data would be the root function. The parent function of the root function is $y = \sqrt{x}$. The graph of the root parent function is shown below.

This does not fully resemble the plot of the data, because shifts, stretches or compressions, and/or restrictions would need to be applied to the parent function. I chose this function because its shape is very similar to that of the data. The resemblance between the data points and the root function is difficult to see at such a zoomed-in setting; however, when the graph is zoomed out a little, it becomes easier to see that both graphs have a similar shape. Below is the graph of the Olympic high jump heights at a different zoom setting.

[Figure: Graph of $y = \sqrt{x}$, with window settings of the graph]

[Figure: High Jump Heights at a different zoom setting, with window settings of the graph]

I chose the root function to model this data partly because of its domain and range (i.e. all of the data values lie in the first quadrant of the Cartesian plane), and the root function has a similar domain and range. Its domain is $D = \{x \in \mathbb{R} \mid x \geq 0\}$ and its range is $R = \{y \in \mathbb{R} \mid y \geq 0\}$, which matches how the data is placed (i.e. all the values lie in the first quadrant). Another reason this function was chosen is its shape. The shape is very similar to the pattern that the high jump height data follows: the values rise up to a certain point and then keep increasing, but the rate of increase becomes extremely slow, to the point of being insignificant in context. Similarly, the high jump heights rise, but as each Games passes they increase so slowly that the increase is hardly noticeable over a short time period. For example, if the root function is zoomed in (representing a short time period), the change is difficult to see. The graph below illustrates this.

[Figure: Zoomed In Function $y = \sqrt{x}$, with window settings of the graph]

The zoomed-in view shows that the function is not increasing by a large amount; there is only a slow increase. This further emphasizes how the increase is so slow that it can be considered insignificant. In order to create the equation to model this data, the a, k, c and d values must be identified in the base function. The base function is $y = a f[k(x - d)] + c$, where a gives a vertical stretch or compression (and a reflection in the x-axis if negative), f is the parent function, k gives a horizontal stretch or compression (and a reflection in the y-axis if negative), x is the variable, d is a shift along the x-axis and c is a shift along the y-axis. All these values must be determined and inserted into $y = a\sqrt{k(x - d)} + c$, i.e. the root function. The c-value is 197, determined by the y-intercept of the graph of high jump heights being at 197. The d-value is 0, as there is no need for a horizontal shift. The a-value also has no change and thus stays at 1. The k-value, however, is 31. This means that the graph of $y = \sqrt{x}$ is compressed horizontally by a factor of $\frac{1}{31}$, which for a root function has the same effect as a vertical stretch by a factor of $\sqrt{31} \approx 5.57$. I obtained the value of 31 systematically by substituting values until my y-value was close to the last coordinate, (48, 236). My method was to substitute the x-value into the candidate equation and solve for y, i.e. $f(48) = \sqrt{31 \cdot 48} + 197 = \sqrt{1488} + 197 \approx 235.6 \approx 236$. Thus, the equation to model the set of data is $f(x) = \sqrt{31x} + 197$. This function is graphed below.
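The substitution check for k = 31 can also be done numerically. The following Python sketch assumes the model $f(x) = \sqrt{31x} + 197$ exactly as stated above; the function name `first_root_model` is hypothetical. It evaluates the model at every standardized year so that the residuals (model minus data) can be compared with the table:

```python
import math

# Standardized years and winning heights (cm) from the table above.
xs = [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
heights = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]


def first_root_model(x, a=1, k=31, d=0, c=197):
    """Evaluate y = a*sqrt(k*(x - d)) + c with the parameter values chosen in the text."""
    return a * math.sqrt(k * (x - d)) + c


# The last data point, which k = 31 was chosen to approximate.
print(round(first_root_model(48), 2))  # 235.57

# Residuals (model minus actual height) at every data point.
for x, h in zip(xs, heights):
    print(x, h, round(first_root_model(x) - h, 1))

# sqrt(31x) = sqrt(31) * sqrt(x): the equivalent vertical stretch factor.
print(round(math.sqrt(31), 3))  # 5.568
```

The residuals are small only at x = 0, 4 and 48 (within about 5 cm), which is consistent with the observation that the curve comes close to just three of the points.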

[Figure: Graph of the Function $f(x) = \sqrt{31x} + 197$, with window settings of the graph]

This makes more sense when put into context. Below is the graph of the function $f(x) = \sqrt{31x} + 197$ with the high jump score coordinates.

[Figure: High Jump Heights and Graph of $f(x) = \sqrt{31x} + 197$, with window settings of the graph]

As is evident in the above graph, the equation $y = \sqrt{31x} + 197$ does not model the data well and is, therefore, inaccurate. There are a number of differences between the curve and the data. Firstly, and most obviously, the graph of the root function only comes close to three points, i.e. (0, 197), (4, 203) and (48, 236). These points are close enough for the data to resemble the shape of the root function; however, the curve is simply not in the right position to be considered a model of the set of data. Another difference is that the shape of the root function does not precisely match the shape of the high jump height data. Furthermore, this equation is flawed because it tries to follow every data point, including outliers. The outliers throw off the accuracy of the root equation, so the model needs to follow the pattern that is evident within the graph instead. A limitation of the model is that the function only passes through two of the actual data points, (0, 197) and (approximately) (48, 236), so it is limited to representing only those two points. Moreover, if one were to estimate a given high jump score with the equation, the answer would be inaccurate. The same would happen when estimating a future high jump score, because the curve only touches two of the points and does not come close to most of the others. In order to refine the model, the pattern within the high jump heights needs to be clearly identified. This is illustrated below.

[Figure: Pattern in the High Jump Score Data]

The circled area represents a pattern; more specifically, it resembles the shape of the square root function. This makes the data easier to model, as the outliers are now identified and will not interfere with the accuracy of the model. After many attempts at trial and error, changing the original function $y = \sqrt{31x} + 197$ to model this data, the final equation to model the gold medal heights is $y = \sqrt{132x + 80} + 155$. The equation graphed alongside the gold medal high jump heights data is shown below.

[Figure: High Jump Heights and Graph of $y = \sqrt{132x + 80} + 155$, with window settings of the graph]

As is shown in the previous graph, the new, refined equation $y = \sqrt{132x + 80} + 155$ models the data much better, as it is closer to more points on the graph, making it a better curve of best fit. Moreover, it fits the pattern that was noted previously, making it a better function to model the set of data than the previous equation. Written in the form of the base function $y = a f[k(x - d)] + c$, the equation becomes $y = \sqrt{132\left(x + \frac{20}{33}\right)} + 155$, since $\frac{80}{132} = \frac{20}{33}$. So k = 132, c = 155 and the d-value is actually $-\frac{20}{33} \approx -0.61$: the two negative signs in $x - \left(-\frac{20}{33}\right)$ simplify into a positive. A negative d-value shifts the whole function to the left, so its starting point lies slightly in the upper left quadrant of the Cartesian plane and the domain becomes $D = \{x \in \mathbb{R} \mid x \geq -\frac{20}{33}\}$.
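To put a number on "closer to more points", the Python sketch below compares the sum of squared errors (SSE) of the first and refined root models. The SSE measure and the choice of x ≥ 16 (1948 onwards) as the circled pattern are assumptions made for illustration; the original graph does not state exactly which points were circled, and the function names are hypothetical.

```python
import math

xs = [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
heights = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]


def first_model(x):
    """First attempt: y = sqrt(31x) + 197."""
    return math.sqrt(31 * x) + 197


def refined_model(x):
    """Refined model: y = sqrt(132x + 80) + 155."""
    return math.sqrt(132 * x + 80) + 155


def sse(model, points):
    """Sum of squared vertical errors between a model and (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points)


points = list(zip(xs, heights))
# Assumption: the circled pattern is taken to be 1948 onwards (x >= 16).
pattern = [(x, y) for x, y in points if x >= 16]

for name, model in (("first", first_model), ("refined", refined_model)):
    print(name, "all points:", round(sse(model, points), 1),
          "pattern:", round(sse(model, pattern), 1))

# The refined model starts where 132x + 80 = 0.
print(round(-80 / 132, 3))  # -0.606
```

Under that assumption, the refined model's error on the pattern is far smaller than the first model's, while over all eleven points its total error is larger, because it deliberately treats 1932 and 1936 as outliers.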

Using the Linear Function to Model the Data

A linear function can also be used to model the data, because a linear function with a positive slope, like the root function, is always increasing. If the slope of the linear function is small, its rate of increase is similar to that of the root function, i.e. a slow increase. Below is an illustration of how similar the linear and root functions can be.

[Figure: Graph of Linear and Root Functions, with window settings of the graph]

The similarity of the slopes of the linear and root functions can be seen in the above diagram of the functions $y = \frac{1}{5}x + 1.5$ and $y = \sqrt{x}$. Over the range shown, the two graphs have similar slopes, which means that they both increase slowly. This is why a linear function can also be a good fit for modeling the data.
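A quick numerical check of this similarity, as a Python sketch comparing $y = \frac{1}{5}x + 1.5$ with $y = \sqrt{x}$ at a few perfect squares (the sample points and function names are arbitrary choices for illustration):

```python
import math


def line(x):
    """The linear function y = x/5 + 1.5 from the diagram."""
    return x / 5 + 1.5


def root(x):
    """The root parent function y = sqrt(x)."""
    return math.sqrt(x)


# x, line value, root value, gap between them
for x in (1, 4, 9, 16, 25):
    print(x, round(line(x), 2), round(root(x), 2), round(line(x) - root(x), 2))

# The slope of sqrt(x) is 1 / (2*sqrt(x)); it equals the line's slope 1/5 at x = 6.25.
print(1 / (2 * math.sqrt(6.25)))
```

The two curves stay within 0.7 of each other for 1 ≤ x ≤ 16 but drift apart beyond that (the gap is 1.5 at x = 25), since the line's slope stays at 1/5 while the slope of $\sqrt{x}$ keeps shrinking.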

Finding the Equation of the Linear Function Using a GDC

The equation for the line of best fit of the high jump heights data will be obtained using a GDC. The equation will be in the form $y = ax + b$, where a is the slope of the line and b is its y-intercept (a vertical shift). The GDC also gives the r and $r^2$ values, which are the correlation coefficient and the coefficient of determination respectively; together these make up the linear regression output for the set of data. The correlation coefficient is a number between −1 and 1 that measures how closely two variables are linearly related. If the data points have a perfect positive linear relationship, i.e. they lie exactly on a straight line with positive slope, then r = 1; if they lie exactly on a line with negative slope, then r = −1. The closer the absolute value of r is to 1, the more closely the variables are related, and vice versa. For a linear regression, the coefficient of determination is the square of r and gives the proportion of the variation in the heights that is explained by the line. A coefficient of determination can also be computed for non-linear regressions, where r on its own is not meaningful: for example, for data points that form a parabola, which increases up to a certain point and then decreases, the $r^2$ value of a quadratic regression would be used. The correlation coefficient will be used here to measure the degree of the data's correlation, as the data is generally increasing (although it is apparent that the high jump scores cannot increase forever, the data itself shows no sign of levelling off yet). The a, b, r and $r^2$ values are shown in the image below.

[Figure: Linear Regression of High Jump Heights]

The r value, or correlation coefficient, shows that the data points are closely related when analyzed with a linear regression (i.e. the line of best fit lies very close to the data points). As is evident, the equation of the linear model is $y = 0.755x + 194.138$. Notice that the slope of the line is low, i.e. below 1, as was observed previously. The graph below shows this line alongside the data points and the root function $y = \sqrt{132x + 80} + 155$.
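The GDC's linear regression output can be reproduced with the standard least-squares formulas. The Python sketch below is an illustration, not the GDC's own algorithm; `linear_regression` is a hypothetical helper name. It computes the slope a, intercept b and correlation coefficient r from the eleven data points:

```python
import math

xs = [0, 4, 16, 20, 24, 28, 32, 36, 40, 44, 48]
heights = [197, 203, 198, 204, 212, 216, 218, 224, 223, 225, 236]


def linear_regression(x, y):
    """Least-squares line y = a*x + b and Pearson's r, the quantities a GDC's linear regression reports."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # y-intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


a, b, r = linear_regression(xs, heights)
print(f"a = {a:.10f}, b = {b:.7f}, r = {r:.3f}, r^2 = {r * r:.3f}")
```

The slope and intercept agree with the unrounded GDC values 0.7550655542 and 194.1382598, and r comes out at about 0.94.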

[Figure: Linear Regression Line, $y = \sqrt{132x + 80} + 155$ and High Jump Heights, with window settings of the graph]

A few differences should be noted between the two graphs used to model the data. The first is that the linear function is fitted to all the data points, whereas the root function is restricted to the pattern that was identified previously. Another difference is that the domain and range of the linear function are $D = \{x \in \mathbb{R}\}$ and $R = \{y \in \mathbb{R}\}$ respectively. Note that the x- and y-values of the linear function include all real numbers, including negatives, whereas the y-values of the root function are only positive. Also note that the root function has a starting point (at $x = -\frac{20}{33}$) and a y-intercept of (0, 163.944), i.e. $y = \sqrt{132(0) + 80} + 155 \approx 163.944$, whereas the linear function has no starting point. Another difference is that the linear function has a constant slope of about 0.755, whereas the slope of the root function keeps decreasing. Over the whole data range the root function is actually the steeper of the two (even at x = 48 its slope is about 0.824), and it ends higher (about 235.1 cm at x = 48, compared with about 230.4 cm for the line), but beyond roughly x = 57 the line becomes the steeper of the two. This shows that if the data set were to continue, the linear model would keep increasing at a higher rate than the root function. In this case, the linear function would be less accurate for estimating future scores, as in reality the heights can only reach a certain point and cannot keep increasing. This is because high jumpers can only jump so high, even if they undergo many extra years of training. The linear function is thus accurate when modeling the data set within the range of the data, under the assumption that the data will continue to rise. However, the root function is more contextually sound: although it still increases, its rate of increase keeps shrinking, which better reflects that there is a limit to how high athletes can jump. After a certain point it increases very slowly, which can be expected of future scores, because this data comes from a human sport and humans are physically limited. Moreover, the data has a very low chance of increasing much further once it approaches that limit. The decreasing slope of the root function is illustrated by looking at its derivative at certain points along the curve. Using the GDC, the derivative at various points along the curve is found below.

[Figure: Derivative when x = 4; tangent line $y = 2.677x + 168.951$]

[Figure: Derivative when x = 6; tangent line $y = 2.235x + 171.119$]

[Figure: Derivative when x = 48; tangent line $y = 0.824x + 195.549$]

[Figure: Derivative when x = 50; tangent line $y = 0.808x + 196.355$]

[Figure: Window settings of the derivative graphs]

As shown above, the root function increases, but at a decreasing rate: its slope gets smaller as x increases. For example, the slope decreases by about 0.442 between x = 4 and x = 6 (from 2.677 to 2.235), but only by about 0.016 between x = 48 and x = 50 (from 0.824 to 0.808). So not only does the slope decrease as it moves along the curve, the amount by which it decreases also shrinks.
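The GDC's tangent lines can be checked by differentiating the refined model with the chain rule: $\frac{dy}{dx} = \frac{132}{2\sqrt{132x + 80}} = \frac{66}{\sqrt{132x + 80}}$. A short Python sketch (the helper names are illustrative) evaluates this slope and the tangent line at the four x-values used above:

```python
import math


def refined_model(x):
    """Refined root model y = sqrt(132x + 80) + 155."""
    return math.sqrt(132 * x + 80) + 155


def refined_slope(x):
    """dy/dx by the chain rule: 132 / (2*sqrt(132x + 80)) = 66 / sqrt(132x + 80)."""
    return 66 / math.sqrt(132 * x + 80)


def tangent_line(x0):
    """Return (slope, intercept) of the tangent line to the model at x = x0."""
    m = refined_slope(x0)
    return m, refined_model(x0) - m * x0


for x0 in (4, 6, 48, 50):
    m, c = tangent_line(x0)
    print(f"x = {x0}: y = {m:.3f}x + {c:.3f}")
```

The printed equations match the four GDC tangent lines listed above to three decimal places.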

Estimating the Scores for 1940 and 1944

To estimate the gold medalist high jump scores if the Olympic Games had taken place in 1940 and 1944, the standardized values for these two years will be substituted into the linear regression equation, $y = 0.7550655542x + 194.1382598$. Note that the values have not been rounded, because rounding intermediate values would reduce the accuracy of the estimates. First, the x-value of 8 (1940) is substituted into the equation: $y = 0.7550655542(8) + 194.1382598 \approx 200.179$. Therefore, the coordinate for the score in 1940 is (8, 200.179). The same steps are used to find the coordinate for the score in 1944: $y = 0.7550655542(12) + 194.1382598 \approx 203.199$. Therefore, the coordinate for the score in 1944 is (12, 203.199). Below is a graph of the 1940 and 1944 scores with arrows indicating their coordinates.
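The same substitutions can be written as a short Python sketch, using the unrounded GDC coefficients quoted above (`linear_model` is an illustrative name):

```python
A = 0.7550655542  # slope from the GDC linear regression
B = 194.1382598   # y-intercept from the GDC linear regression


def linear_model(x):
    """Estimated winning height (cm) at standardized year x."""
    return A * x + B


estimates = {}
for year in (1940, 1944):
    x = year - 1932  # standardized year
    estimates[year] = round(linear_model(x), 3)
    print(year, x, estimates[year])
```

This prints 200.179 for 1940 (x = 8) and 203.199 for 1944 (x = 12), matching the values above.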

[Figure: Estimates of the 1940 and 1944 Scores]

I chose to use the linear regression model to find these coordinates because I consider it more accurate than the root function model here. The root model was found by hand through trial and error, so it is more likely to contain errors and tends to be less precise than the linear function, which was found by the GDC using a least-squares regression. The GDC's calculations contain fewer errors than hand calculations and are more precise. Another sign that the estimates are sound is that they are not outliers and are close to the neighboring real values (203 cm in 1936 and 198 cm in 1948). Another reason the linear function was used is that it models all of the data, whereas the root function is limited to a pattern within the data set. The image below shows how the scores for 1940 and 1944 would be inaccurate if determined through the root function.

[Figure: High Jump Heights and Graph of $y = \sqrt{132x + 80} + 155$]

Another reason the linear regression function was used to estimate the values is that, when the 1940 and 1944 values are plotted alongside the linear regression line (i.e. the one obtained without the new coordinates), they lie on the line of best fit. This can be seen more clearly below.

[Figure: Graphs of Estimated High Jump Heights and Linear Regression]

As shown in the image above, both points lie on the linear regression line and are therefore consistent with the other, real values.

Estimating Scores for 1984 and 2016

To estimate the 1984 and 2016 scores for the gold medal high jump at the Olympic Games, the standardized values of the years will be substituted into the equation $y = \sqrt{132x + 80} + 155$. The reason for using the root equation rather than the linear equation is that, as stated previously, the linear function is less accurate when extrapolating, because it assumes the data will keep increasing at a constant rate. In contrast, the heights must level off in the long term, because the results come from human athletes, who have physical limits. The root function, whose rate of increase keeps shrinking, is more contextually based and is therefore more suitable for predicting the scores. The method for finding the value for 1984 is as follows: 52 is the standardized value for the year 1984, so 52 is substituted into $y = \sqrt{132x + 80} + 155$ as follows:

$$y = \sqrt{132(52) + 80} + 155 = \sqrt{6864 + 80} + 155 = \sqrt{6944} + 155 = 4\sqrt{434} + 155 \approx 238.331$$

Thus, the coordinate for the 1984 winning high jump score would be (52, 238.331). Below is the graph of the point alongside the original high jump data and the root model equation.

[Figure: Graph of High Jump Data, 1984 Value and Root Function, with window settings of the graph]

The data here is increasing, but only by a small increment, as the slope of the function reflects its very slow rate of increase. Next, the estimated score for 2016 will be calculated. The standardized value for the year 2016 is 84 (i.e. 2016 − 1932 = 84), so 84 is substituted into $y = \sqrt{132x + 80} + 155$ and solved for y:

$$y = \sqrt{132(84) + 80} + 155 = \sqrt{11088 + 80} + 155 = \sqrt{11168} + 155 = 4\sqrt{698} + 155 \approx 260.679$$

Thus, the coordinate for the height in 2016 is (84, 260.679). It is graphed alongside the extrapolated 1984 coordinate and the real high jump scores.
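Both extrapolations, including the simplification of the surds, can be reproduced with the Python sketch below. The helper `simplify_sqrt` is a hypothetical name; it pulls the largest square factor out of the number under the root:

```python
import math


def refined_model(x):
    """Refined root model y = sqrt(132x + 80) + 155."""
    return math.sqrt(132 * x + 80) + 155


def simplify_sqrt(n):
    """Return (outer, inner) with sqrt(n) == outer * sqrt(inner) and inner square-free."""
    outer, inner = 1, n
    f = 2
    while f * f <= inner:
        # Divide out f^2 as many times as it goes.
        while inner % (f * f) == 0:
            inner //= f * f
            outer *= f
        f += 1
    return outer, inner


for year in (1984, 2016):
    x = year - 1932          # standardized year
    inside = 132 * x + 80    # value under the square root
    print(year, x, inside, simplify_sqrt(inside), round(refined_model(x), 3))
```

For 1984 this gives $\sqrt{6944} = 4\sqrt{434}$ and 238.331; for 2016 it gives $\sqrt{11168} = 4\sqrt{698}$ and 260.679.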


Both values seem to follow the trend set by the root function, which is a slow, gradual increase. Also, note that the estimate rises by more between 1984 and 2016 (about 22.3 cm) than between 1980 and 1984 (about 2.3 cm), mainly because 2016 lies 32 years further along the curve. In context, this increase could reflect additional training time and/or technological advances that aid high jump scores.

Additional Data

The following data was given to show additional gold medalist high jump scores.

Year:               1896  1904  1908  1912  1920  1928  1984  1988  1992  1996  2000  2004  2008
Standardized Year:   -36   -28   -24   -20   -12    -4    52    56    60    64    68    72    76
Height (cm):         190   180   191   193   193   194   235   238   234   239   235   236   236

This data is graphed below alongside the root and linear model functions.

[Figure: Graph of High Jump Data, Extrapolated Values and Root Function, with window settings of the graph]

[Figure: Additional Points with Linear and Root Models, with window settings of the graph]

The root function does not model the data well, as its slope is too steep for the data: the root function keeps rising at a rate faster than the data itself increases. The linear function is a little better than the root function because it comes closer to more of the points; however, it does not model the newer data well either. The linear function is therefore only good for interpolating between the values that were given (i.e. the original scores from 1932 to 1980). In the negative x-region, the graph of the data starts off high, then dips down and then increases again. This pattern also occurs in the points after 1980, but more frequently. This dip-and-increase pattern occurs throughout the data, and after every few repetitions of the pattern, the heights slowly increase overall. The modification that needs to be made to the root function is a vertical compression and a horizontal stretch. The reason for the vertical compression is that it reduces the slope, making the function fit the data better. The horizontal stretch spreads each rise over a longer interval of x. For example, if before the horizontal stretch the function rises by 1 between x = 4 and x = 5, then after a horizontal stretch by a factor of 4 (about the y-axis) the same rise of 1 would occur between x = 16 and x = 20, so the slope there would be only 1/4.
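To check these observations numerically, the sketch below evaluates both models on the additional data. The mean absolute error is an illustrative choice of measure, not one used in the original, and the function names are hypothetical. Note that the refined root model is undefined for the pre-1932 years, because $132x + 80 < 0$ whenever $x < -\frac{20}{33}$:

```python
import math

A, B = 0.7550655542, 194.1382598  # GDC linear regression coefficients


def linear_model(x):
    return A * x + B


def refined_model(x):
    """Refined root model; returns None where 132x + 80 < 0 (outside its domain)."""
    inside = 132 * x + 80
    return math.sqrt(inside) + 155 if inside >= 0 else None


# Additional winning heights (cm) from the table above, keyed by year.
additional = {1896: 190, 1904: 180, 1908: 191, 1912: 193, 1920: 193, 1928: 194,
              1984: 235, 1988: 238, 1992: 234, 1996: 239, 2000: 235, 2004: 236,
              2008: 236}

earlier = {yr: h for yr, h in additional.items() if yr < 1932}
later = {yr: h for yr, h in additional.items() if yr > 1980}


def mean_abs_error(model, data):
    """Average of |model - actual| over the given years (model must be defined there)."""
    errors = [abs(model(yr - 1932) - h) for yr, h in data.items()]
    return sum(errors) / len(errors)


print("root model defined before 1932:",
      any(refined_model(yr - 1932) is not None for yr in earlier))
print("mean abs error after 1980, linear:", round(mean_abs_error(linear_model, later), 2))
print("mean abs error after 1980, root:  ", round(mean_abs_error(refined_model, later), 2))
```

For the years after 1980 the root model overshoots every point and has a larger mean error than the linear model, which supports the point that its slope is too steep for the data.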


Math SL Portfolio 2: Gold Medal Heights

Ibrahim Asadullah, MHF4UA, Wednesday, February 8, 2012