Meeting 1: Statistics for Economics and Business
First Meeting
• Introduction and Frequency Distributions
– Meaning and Uses of Statistics
– Types of Data: Quantitative Data and Qualitative Data
– Definition of Population and Sample
• Measures of Central Tendency and Dispersion
– Central values in brief
– Standard Deviation of Data
– Coefficient of Variation of Data
– Calculating Quartiles and Percentiles
Before We Begin
• Choose a class leader
• Create a class email account
• Teaching method to be used: SCL
• Division into groups
• Group assignments
• Class days before the midterm exam (UTS): (3, 10, 17 Sept + 24, 1, 8, 15 Oct)
Meaning and Uses of Statistics
• What is statistics?
– Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.
• What is statistics used for?
– Statistical techniques are used extensively by marketing, accounting, quality control, consumers, professional sports people, hospital administrators, educators, politicians, physicians, etc.
Types of Data: Qualitative Data and Quantitative Data
• A. Qualitative or Attribute Data (variable): the characteristic being studied is nonnumeric.
EXAMPLES: Gender, religious affiliation, type of automobile owned, state of birth, eye color.
• B. Quantitative Data (variable): information is reported numerically.
EXAMPLES: Balance in your checking account, minutes remaining in class, or number of children in a family.
Definition of Population and Sample
A population is a collection of all possible individuals, objects, or measurements of interest.
A sample is a portion, or part, of the population of interest.
Central Tendency
• The central tendency is the middle or typical values of a distribution.
• Central tendency can be assessed using a dot plot, histogram or more precisely with numerical statistics.
• Six Measures of Central Tendency
Statistic | Formula | Excel Formula | Pro | Con
Mean | \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\) | =AVERAGE(Data) | Familiar and uses all the sample information. | Influenced by extreme values.
Median | Middle value in sorted array | =MEDIAN(Data) | Robust when extreme data values exist. | Ignores extremes and can be affected by gaps in data values.
Mode | Most frequently occurring data value | =MODE(Data) | Useful for attribute data or discrete data with a small range. | May not be unique, and is not helpful for continuous data.
Midrange | \(\frac{x_{\min} + x_{\max}}{2}\) | =0.5*(MIN(Data)+MAX(Data)) | Easy to understand and calculate. | Influenced by extreme values and ignores most data values.
Geometric mean (G) | \(G = \sqrt[n]{x_1 x_2 \cdots x_n}\) | =GEOMEAN(Data) | Useful for growth rates and mitigates high extremes. | Less familiar and requires positive data.
Trimmed mean | Same as the mean except omit highest and lowest k% of data values (e.g., 5%) | =TRIMMEAN(Data, %) | Mitigates effects of extreme values. | Excludes some data values that could be relevant.
• Mean
– A familiar measure of central tendency.
– In Excel, use function =AVERAGE(Data) where Data is an array of data values.
Population Formula: \(\mu = \frac{1}{N}\sum_{i=1}^{N} x_i\)
Sample Formula: \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\)
– For the sample of n = 37 car brands:
\(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{87 + 93 + 98 + \cdots + 159 + 164 + 173}{37} = \frac{4639}{37} = 125.38\)
Brand | Defects Per 100
Lexus | 87
Cadillac | 93
Jaguar | 98
Honda | 99
Buick | 100
Mercury | 100
Hyundai | 102
Infiniti | 104
Toyota | 104
Mercedes-Benz | 106
Audi | 109
BMW | 109
Oldsmobile | 110
Volvo | 113
Acura | 117
Chevrolet | 119
Chrysler | 120
Dodge | 121
Lincoln | 121
Pontiac | 122
Subaru | 123
GMC | 127
Ford | 130
Mitsubishi | 130
Saab | 133
Jeep | 136
Mini | 142
Land Rover | 148
Saturn | 149
Suzuki | 149
Kia | 153
Nissan | 154
Mazda | 157
Scion | 158
Porsche | 159
Volkswagen | 164
Hummer | 173
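As a quick check of the arithmetic above, the sample mean of the 37 defect rates can be recomputed in a few lines. This is a minimal sketch; the names `defects` and `sample_mean` are illustrative and not part of the slides.

```python
# Defects per 100 vehicles for the 37 brands in the table above.
defects = [87, 93, 98, 99, 100, 100, 102, 104, 104, 106, 109, 109, 110,
           113, 117, 119, 120, 121, 121, 122, 123, 127, 130, 130, 133,
           136, 142, 148, 149, 149, 153, 154, 157, 158, 159, 164, 173]


def sample_mean(values):
    """Arithmetic mean: the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)


n = len(defects)
total = sum(defects)
x_bar = sample_mean(defects)
print(n, total, round(x_bar, 2))  # 37 4639 125.38
```

This mirrors what =AVERAGE(Data) does in Excel for a plain numeric range.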
• Characteristics of the Mean
– Arithmetic mean is the most familiar average.
– Affected by every sample item.
– The balancing point or fulcrum for the data.
– Regardless of the shape of the distribution, the signed deviations of the data points from the mean always sum to zero:
\(\sum_{i=1}^{n} (x_i - \bar{x}) = 0\)
– Consider the following asymmetric distribution of quiz scores (42, 60, 70, 75, 78) whose mean = 65:
\(\sum_{i=1}^{n} (x_i - \bar{x})\) = (42 – 65) + (60 – 65) + (70 – 65) + (75 – 65) + (78 – 65) = (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0
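The balancing-point property can be checked numerically with the five quiz scores from the example. The sketch below is illustrative only. It also shows that it is the signed deviations that cancel, while the absolute deviations do not.

```python
# Quiz scores from the example; their mean is 65.
scores = [42, 60, 70, 75, 78]

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]      # signed deviations x_i - mean

signed_sum = sum(deviations)                 # negatives and positives cancel
absolute_sum = sum(abs(d) for d in deviations)

print(mean)          # 65.0
print(deviations)    # [-23.0, -5.0, 5.0, 10.0, 13.0]
print(signed_sum)    # 0.0
print(absolute_sum)  # 56.0
```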
• Median
– The median (M) is the 50th percentile or midpoint of the sorted sample data.
– M separates the upper and lower half of the sorted observations.
– If n is odd, the median is the middle observation in the data array.
– If n is even, the median is the average of the middle two observations in the data array.
– For n = 8, the median is between the fourth and fifth observations in the data array.
– For n = 9, the median is the fifth observation in the data array.
– Consider the following n = 6 data values: 11 12 15 17 21 32
– What is the median?
For even n, Median = \(\frac{x_{n/2} + x_{(n/2+1)}}{2}\)
n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4
M = \((x_3 + x_4)/2\) = (15 + 17)/2 = 16
– Consider the following n = 7 data values: 12 23 23 25 27 34 41
– What is the median?
For odd n, Median = \(x_{(n+1)/2}\)
(n+1)/2 = (7+1)/2 = 8/2 = 4
M = \(x_4\) = 25
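The odd and even cases of the median rule can be sketched as a single function. The function name `median` is illustrative. Python lists are 0-based, so the slide's position (n+1)/2 corresponds to index n // 2 after sorting.

```python
def median(values):
    """Median of a sample: middle value (odd n) or mean of the two middle values (even n)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2 in 1-based terms is index n // 2 in 0-based terms.
        return data[mid]
    # Even n: average the observations at 1-based positions n/2 and n/2 + 1.
    return (data[mid - 1] + data[mid]) / 2


even_case = median([11, 12, 15, 17, 21, 32])     # n = 6
odd_case = median([12, 23, 23, 25, 27, 34, 41])  # n = 7
print(even_case)  # 16.0
print(odd_case)   # 25
```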
– Use Excel’s function =MEDIAN(Data) where Data is an array of data values.
– For the 37 vehicle quality ratings (odd n) the position of the median is (n+1)/2 = (37+1)/2 = 19.
– So, the median is \(x_{19}\) = 121.
– When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.
• Characteristics of the Median
– The median is insensitive to extreme data values.
– For example, consider the following quiz scores for 3 students:
Tom’s scores: 20, 40, 70, 75, 80; Mean = 57, Median = 70, Total = 285
Jake’s scores: 60, 65, 70, 90, 95; Mean = 76, Median = 70, Total = 380
Mary’s scores: 50, 65, 70, 75, 90; Mean = 70, Median = 70, Total = 350
– What does the median for each student tell you?
• Mode
– The most frequently occurring data value.
– Similar to mean and median if data values occur often near the center of sorted data.
– May have multiple modes or no mode.
– For example, consider the following quiz scores for 4 students:
Lee’s scores: 60, 70, 70, 70, 80; Mean = 70, Median = 70, Mode = 70
Pat’s scores: 45, 45, 70, 90, 100; Mean = 70, Median = 70, Mode = 45
Sam’s scores: 50, 60, 70, 80, 90; Mean = 70, Median = 70, Mode = none
Xiao’s scores: 50, 50, 70, 90, 90; Mean = 70, Median = 70, Modes = 50, 90
– What does the mode for each student tell you?
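The one-mode, no-mode and two-mode cases for Lee, Pat, Sam and Xiao can be reproduced with a small helper. This is a sketch under one stated assumption: following the slide's convention, a data set in which every value occurs exactly once is reported as having no mode (an empty list). The helper name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when no value repeats (assumed convention)."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: treated here as "no mode"
    return sorted(v for v, c in counts.items() if c == top)


quiz_scores = {
    "Lee": [60, 70, 70, 70, 80],
    "Pat": [45, 45, 70, 90, 100],
    "Sam": [50, 60, 70, 80, 90],
    "Xiao": [50, 50, 70, 90, 90],
}

for student, scores in quiz_scores.items():
    print(student, modes(scores))
# Lee [70]
# Pat [45]
# Sam []
# Xiao [50, 90]
```

All four students share the same mean and median (70), so only the mode separates them here.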
– Easy to define, not easy to calculate in large samples.
– Use Excel’s function =MODE(Array)
  - will return #N/A if there is no mode.
  - will return first mode found if multimodal.
– May be far from the middle of the distribution and not at all typical.
– Generally isn’t useful for continuous data since data values rarely repeat.
– Best for attribute data or a discrete variable with a small range (e.g., Likert scale).
• Example: Price/Earnings Ratios and Mode
– Consider the following P/E ratios for a random sample of 68 Standard & Poor’s 500 stocks.
7 8 8 10 10 10 10 12 13 13 13 13 13 13 13 14 14
14 15 15 15 15 15 16 16 16 17 18 18 18 18 19 19 19
19 19 20 20 20 21 21 21 22 22 23 23 23 24 25 26 26
26 26 27 29 29 30 31 34 36 37 40 41 45 48 55 68 91
– What is the mode?
– Excel’s descriptive statistics results are:
Mean | 22.7206
Median | 19
Mode | 13
Range | 84
Minimum | 7
Maximum | 91
Sum | 1545
Count | 68
– The mode 13 occurs 7 times, but what does the dot plot show?
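The descriptive statistics for the 68 P/E ratios can be reproduced with Python's standard `statistics` module. This is a sketch; the variable names `pe` and `summary` are illustrative.

```python
import statistics

# The 68 P/E ratios from the example, in sorted order.
pe = [7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
      14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
      19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
      26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91]

summary = {
    "mean": statistics.mean(pe),
    "median": statistics.median(pe),
    "mode": statistics.mode(pe),   # single most frequent value
    "range": max(pe) - min(pe),
    "minimum": min(pe),
    "maximum": max(pe),
    "sum": sum(pe),
    "count": len(pe),
}
mode_frequency = pe.count(summary["mode"])

for name, value in summary.items():
    print(name, value)
print("mode frequency", mode_frequency)
```

The mean rounds to 22.7206, matching the Excel output, and the mode 13 appears 7 times.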
– The dot plot shows local modes (a peak with valleys on either side) at 10, 13, 15, 19, 23, 26, 29.
– These multiple modes suggest that the mode is not a stable measure of central tendency.
• Example: Rose Bowl Winners’ Points
– Points scored by the winning NCAA football team tend to have modes in multiples of 7 because each touchdown, with its extra point, yields 7 points.
– Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games.
– What is the mode?
• Mode
– A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.
– Occurs when dissimilar populations are combined in one sample. For example,
• Skewness
– Compare mean and median or look at histogram to determine degree of skewness.
• Symptoms of Skewness
Distribution’s Shape | Histogram Appearance | Statistics
Skewed left (negative skewness) | Long tail of histogram points left (a few low values but most data on right) | Mean < Median
Symmetric | Tails of histogram are balanced (low/high values offset) | Mean ≈ Median
Skewed right (positive skewness) | Long tail of histogram points right (most data on left but a few high values) | Mean > Median
– For the sample of J.D. Power quality ratings, the mean (125.38) exceeds the median (121). What does this suggest?
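The mean-versus-median comparison can be written as a small rule of thumb. This is only a sketch of the symptom check, not a formal skewness statistic. The function name `skew_hint` and the optional tolerance `tol` are hypothetical additions for illustration.

```python
def skew_hint(mean, median, tol=0.0):
    """Rough symptom check: compare mean and median (tol is an assumed tolerance)."""
    if mean > median + tol:
        return "skewed right"
    if mean < median - tol:
        return "skewed left"
    return "symmetric"


# J.D. Power quality ratings: mean 125.38, median 121.
hint = skew_hint(125.38, 121)
print(hint)  # skewed right
```

A histogram should still be inspected, since a mean close to the median does not guarantee a symmetric shape.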
• Geometric Mean
– The geometric mean (G) is a multiplicative average.
\(G = \sqrt[n]{x_1 x_2 \cdots x_n}\)
– For the J. D. Power quality data (n=37):
\(G = \sqrt[37]{(87)(93)(98)\cdots(164)(173)} = \sqrt[37]{2.37667 \times 10^{77}} = 123.38\)
– In Excel use =GEOMEAN(Array)
– The geometric mean tends to mitigate the effects of high outliers.
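Multiplying 37 three-digit numbers produces a very large product, so a common way to compute a geometric mean is through logarithms. The sketch below assumes that approach and compares it with the direct n-th root. The function name `geometric_mean` is illustrative. The slide reports G = 123.38 for these data.

```python
import math

# Defects per 100 vehicles for the 37 brands (J.D. Power quality data).
defects = [87, 93, 98, 99, 100, 100, 102, 104, 104, 106, 109, 109, 110,
           113, 117, 119, 120, 121, 121, 122, 123, 127, 130, 130, 133,
           136, 142, 148, 149, 149, 153, 154, 157, 158, 159, 164, 173]


def geometric_mean(values):
    """n-th root of the product, computed as exp(mean(log x)) to avoid huge products."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean requires non-empty, strictly positive data")
    return math.exp(sum(math.log(v) for v in values) / len(values))


g = geometric_mean(defects)
direct = math.prod(defects) ** (1 / len(defects))  # direct n-th root, for comparison
arithmetic = sum(defects) / len(defects)
print(round(g, 2), round(direct, 2), round(arithmetic, 2))
```

For positive data that are not all equal, the geometric mean is always below the arithmetic mean, which is why it mitigates high outliers.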
• Growth Rates
– A variation on the geometric mean used to find the average growth rate for a time series.
\(G = \sqrt[n-1]{\frac{x_n}{x_1}} - 1\)
– For example, from 1998 to 2002, Spirit Airlines revenues are:
Year | Revenue (mil)
1998 | 131
1999 | 227
2000 | 311
2001 | 354
2002 | 403
– The average growth rate is given by taking the geometric mean of the ratios of each year’s revenue to the preceding year.
\(G = \sqrt[4]{\left(\frac{227}{131}\right)\left(\frac{311}{227}\right)\left(\frac{354}{311}\right)\left(\frac{403}{354}\right)} - 1\)
– Due to cancellations, only the first and last years are relevant. Five annual values span n – 1 = 4 growth periods, so:
\(G = \sqrt[4]{\frac{403}{131}} - 1 = 1.3244 - 1 = .324\) or 32.4% per year
– In Excel use =(403/131)^(1/4)-1
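The cancellation argument can be verified numerically. The sketch below assumes that five annual values define n - 1 = 4 year-over-year growth periods, so the root index is 4. It computes the rate both from the ratios and from the endpoints alone. The names `revenue` and `average_growth_rate` are illustrative.

```python
import math

# Spirit Airlines revenue (millions), 1998 through 2002.
revenue = [131, 227, 311, 354, 403]


def average_growth_rate(series):
    """Average growth per period from the first and last values (assumes n - 1 periods)."""
    periods = len(series) - 1
    if periods < 1:
        raise ValueError("need at least two observations")
    return (series[-1] / series[0]) ** (1 / periods) - 1


# Geometric mean of the year-over-year ratios, minus 1.
ratios = [later / earlier for earlier, later in zip(revenue, revenue[1:])]
g_from_ratios = math.prod(ratios) ** (1 / len(ratios)) - 1

g = average_growth_rate(revenue)
print(round(g, 4), round(g_from_ratios, 4))  # 0.3244 0.3244
```

Both routes agree because the intermediate years cancel in the product of ratios.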
• Midrange
– The midrange is the point halfway between the lowest and highest values of X.
– Easy to use but sensitive to extreme data values.
Midrange = \(\frac{x_{\min} + x_{\max}}{2}\)
– For the J. D. Power quality data (n=37):
Midrange = \(\frac{x_{\min} + x_{\max}}{2} = \frac{x_1 + x_{37}}{2} = \frac{87 + 173}{2} = 130\)
– Here, the midrange (130) is higher than the mean (125.38) or median (121).
• Trimmed Mean
– To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.
– For example, for the n = 68 P/E ratios, we want a 5 percent trimmed mean (i.e., k = .05).
– To determine how many observations to trim, multiply k × n = 0.05 × 68 = 3.4, which rounds down to 3 observations.
– So, we would remove the three smallest and three largest observations before averaging the remaining values.
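The trimming steps can be sketched as follows for the 68 P/E ratios. The function name `trimmed_mean` is illustrative, and the count to trim from each end is truncated to a whole number, as in the 3.4 to 3 step described in the text.

```python
# The 68 P/E ratios from the earlier example.
pe = [7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14,
      14, 15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
      19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
      26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91]


def trimmed_mean(values, k):
    """Mean after dropping the lowest and highest fraction k of the sorted data."""
    if not values:
        raise ValueError("trimmed mean of empty data is undefined")
    if not 0 <= k < 0.5:
        raise ValueError("k must satisfy 0 <= k < 0.5")
    data = sorted(values)
    cut = int(k * len(data))           # truncate: 0.05 * 68 = 3.4 becomes 3
    kept = data[cut:len(data) - cut]   # drop `cut` values from each end
    return sum(kept) / len(kept)


tm = trimmed_mean(pe, 0.05)
print(round(tm, 2))  # 21.1
```

Here the values 7, 8, 8 and 55, 68, 91 are dropped, leaving 62 observations.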
– Here is a summary of all the measures of central tendency for the n = 68 P/E values.
Mean: 22.72 =AVERAGE(PERatio)
Median: 19.00 =MEDIAN(PERatio)
Mode: 13.00 =MODE(PERatio)
Geometric Mean: 19.85 =GEOMEAN(PERatio)
Midrange: 49.00 =(MIN(PERatio)+MAX(PERatio))/2
5% Trim Mean: 21.10 =TRIMMEAN(PERatio,0.1)
(Excel’s second argument is the total fraction trimmed, so 0.1 removes 5% from each end.)
– The trimmed mean mitigates the effects of very high values, but still exceeds the median.
– The Federal Reserve uses a 16% trimmed mean to mitigate the effects of extremes in its analysis of the Consumer Price Index.
Dispersion
• Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of dispersion:
• Measures of Variation
Statistic | Formula | Excel | Pro | Con
Range | \(x_{\max} - x_{\min}\) | =MAX(Data)-MIN(Data) | Easy to calculate. | Sensitive to extreme data values.
Variance (s²) | \(\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}\) | =VAR(Data) | Plays a key role in mathematical statistics. | Non-intuitive meaning.
Standard deviation (s) | \(\sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}\) | =STDEV(Data) | Most common measure. Uses same units as the raw data ($, £, ¥, etc.). | Non-intuitive meaning.
Coefficient of variation (CV) | \(100 \times \frac{s}{\bar{x}}\) | None | Measures relative variation in percent so can compare data sets. | Requires non-negative data.
Mean absolute deviation (MAD) | \(\frac{\sum_{i=1}^{n}\lvert x_i - \bar{x}\rvert}{n}\) | =AVEDEV(Data) | Easy to understand. | Lacks “nice” theoretical properties.
• Range
– The difference between the largest and smallest observation.
Range = \(x_{\max} - x_{\min}\)
– For example, for the n = 68 P/E ratios, Range = 91 – 7 = 84
• Variance
– The population variance (σ²) is defined as the sum of squared deviations around the mean µ divided by the population size.
\(\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}\)
– For the sample variance (s²), we divide by n – 1 instead of n, otherwise s² would tend to underestimate the unknown population variance σ².
\(s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}\)
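The difference between dividing by N and dividing by n - 1 is easy to see on a small data set. The sketch below reuses the five quiz scores 42, 60, 70, 75, 78 (mean 65) from the discussion of the mean and cross-checks the hand computation against Python's `statistics` module. The variable names are illustrative.

```python
import statistics

scores = [42, 60, 70, 75, 78]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations around the mean: 529 + 25 + 25 + 100 + 169.
ss = sum((x - mean) ** 2 for x in scores)

population_variance = ss / n        # divide by N
sample_variance = ss / (n - 1)      # divide by n - 1

print(ss, population_variance, sample_variance)  # 848.0 169.6 212.0

# The standard library uses the same two definitions.
lib_population = statistics.pvariance(scores)
lib_sample = statistics.variance(scores)
```

In Excel terms, `pvariance` plays the role of =VARP and `variance` the role of =VAR.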
• Standard Deviation
– The square root of the variance.
– Units of measure are the same as X.
Population standard deviation: \(\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}\)
Sample standard deviation: \(s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}\)
– Explains how individual values in a data set vary from the mean.
– Excel’s built-in functions are:
Statistic | Excel population formula | Excel sample formula
Variance | =VARP(Array) | =VAR(Array)
Standard deviation | =STDEVP(Array) | =STDEV(Array)
• Calculating a Standard Deviation
– Consider the following five quiz scores for Stephanie.
– Now, calculate the sample standard deviation:
\(s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{2380}{5-1}} = \sqrt{595} = 24.39\)
– Somewhat easier, the two-sum formula can also be used:
\(s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{28300 - \frac{(360)^2}{5}}{5-1}} = \sqrt{\frac{28300 - 25920}{5-1}} = \sqrt{595} = 24.39\)
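The two-sum formula needs only n, the sum of the values and the sum of their squares. The sketch below applies it to the totals quoted in the calculation (n = 5, sum = 360, sum of squares = 28300) and then checks that it agrees with the definitional formula. The check uses a separate illustrative data set, because the individual scores are not listed here. The function names are hypothetical.

```python
import math


def std_from_sums(n, sum_x, sum_x2):
    """Sample standard deviation via the two-sum formula."""
    if n < 2:
        raise ValueError("need at least two observations")
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


def std_definitional(values):
    """Sample standard deviation from squared deviations around the mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


# Totals quoted in the worked example.
s = std_from_sums(5, 360, 28300)
print(round(s, 2))  # 24.39

# Illustrative data set (not the scores from the slide) to compare both formulas.
check = [42, 60, 70, 75, 78]
s_two_sum = std_from_sums(len(check), sum(check), sum(x * x for x in check))
s_definition = std_definitional(check)
```

The two-sum form is convenient by hand, but with floating-point data the definitional form is usually the numerically safer choice.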
– The standard deviation is nonnegative because deviations around the mean are squared.
– When every observation is exactly equal to the mean, the standard deviation is zero.
– Standard deviations can be large or small, depending on the units of measure.
– Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.
• Coefficient of Variation
– Useful for comparing variables measured in different units or with different means.
– A unit-free measure of dispersion.
– Expressed as a percent of the mean.
– Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.
\(CV = 100 \times \frac{s}{\bar{x}}\)
– For example:
Defect rates (n = 37): s = 22.89, \(\bar{x}\) = 125.38 gives CV = 100 × (22.89)/(125.38) = 18%
ATM deposits (n = 100): s = 280.80, \(\bar{x}\) = 233.89 gives CV = 100 × (280.80)/(233.89) = 120%
P/E ratios (n = 68): s = 14.08, \(\bar{x}\) = 22.72 gives CV = 100 × (14.08)/(22.72) = 62%
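The three CV examples can be recomputed from their standard deviations and means. This is a sketch with one assumption: for the P/E ratios it uses s = 14.08, the value that appears inside the CV calculation and that reproduces the reported 62%. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(s, mean):
    """CV in percent: 100 * s / mean. Only meaningful for a positive mean."""
    if mean <= 0:
        raise ValueError("CV is undefined when the mean is zero or negative")
    return 100 * s / mean


# (standard deviation, mean) pairs from the examples.
examples = {
    "Defect rates": (22.89, 125.38),
    "ATM deposits": (280.80, 233.89),
    "P/E ratios": (14.08, 22.72),   # assumed s = 14.08, see the note above
}

cv = {name: coefficient_of_variation(s, m) for name, (s, m) in examples.items()}
for name, value in cv.items():
    print(f"{name}: {value:.0f}%")
# Defect rates: 18%
# ATM deposits: 120%
# P/E ratios: 62%
```

Because CV is unit-free, the three data sets can be ranked by relative variation even though they are measured in different units.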
• Mean Absolute Deviation
– The Mean Absolute Deviation (MAD) reveals the average distance from an individual data point to the mean (center of the distribution).
– Uses absolute values of the deviations around the mean.
– Excel’s function is =AVEDEV(Array)
\(MAD = \frac{\sum_{i=1}^{n}\lvert x_i - \bar{x}\rvert}{n}\)
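A minimal sketch of the MAD calculation, using the five quiz scores 42, 60, 70, 75, 78 (mean 65) from the discussion of the mean as illustrative data. The function name `mean_absolute_deviation` is an assumption for this example.

```python
def mean_absolute_deviation(values):
    """Average absolute distance of the observations from their mean."""
    if not values:
        raise ValueError("MAD of empty data is undefined")
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


scores = [42, 60, 70, 75, 78]
mad = mean_absolute_deviation(scores)   # (23 + 5 + 5 + 10 + 13) / 5
print(mad)  # 11.2
```

Note that MAD divides by n, not n - 1, which matches the formula above and Excel's =AVEDEV.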
• Central Tendency vs. Dispersion: Manufacturing
– Consider the histograms of hole diameters drilled in a steel plate during manufacturing.
– The desired distribution is outlined in red.
– Captions on the two histograms (Machine A, Machine B):
  - Desired mean (5 mm) but too much variation.
  - Acceptable variation but mean is less than 5 mm.
– Take frequent samples to monitor quality.
• Central Tendency vs. Dispersion: Job Performance
– Consider student ratings of four professors on eight teaching attributes (10-point scale).
– Jones and Wu have identical means but different standard deviations.
– Smith and Gopal have different means but identical standard deviations.
– A high mean (better rating) and low standard deviation (more consistency) is preferred. Which professor do you think is best?