Meeting 1: Economic and Business Statistics

Economic and Business Statistics, Agus Salim


Posted on 21-Feb-2023



First Meeting

• Introduction and Frequency Distributions
  – Meaning and Uses of Statistics
  – Types of Data: Quantitative Data and Qualitative Data
  – Populations and Samples

• Measures of Central Tendency and Dispersion
  – Central values in brief
  – Standard deviation of data
  – Coefficient of variation of data
  – Calculating quartiles and percentiles

Before We Begin
• Choose a class leader
• Create a class email account
• Teaching method to be used: SCL (student-centered learning)
• Formation of groups
• Group assignments
• Lecture days before the midterm exam (UTS): 3, 10, 17 Sept + 24, 1, 8, 15 Oct

Meaning and Uses of Statistics

• What is statistics?
  – Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.

• What is statistics used for?
  – Statistical techniques are used extensively by marketing, accounting, quality control, consumers, professional sports people, hospital administrators, educators, politicians, physicians, etc.

Types of Data: Qualitative Data and Quantitative Data

• A. Qualitative or attribute data (variable): the characteristic being studied is nonnumeric. EXAMPLES: gender, religious affiliation, type of automobile owned, state of birth, and eye color.

• B. Quantitative data (variable): information is reported numerically. EXAMPLES: balance in your checking account, minutes remaining in class, or number of children in a family.

Summary of Types of Data

[Figure: Type of Data]

Populations and Samples

A population is a collection of all possible individuals, objects, or measurements of interest.

A sample is a portion, or part, of the population of interest.

Central Tendency

• The central tendency is the middle or typical value of a distribution.

• Central tendency can be assessed using a dot plot or a histogram, or more precisely with numerical statistics.

Six Measures of Central Tendency

Statistic | Formula | Excel formula | Pro | Con
Mean | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ | =AVERAGE(Data) | Familiar and uses all the sample information. | Influenced by extreme values.
Median | Middle value in sorted array | =MEDIAN(Data) | Robust when extreme data values exist. | Ignores extremes and can be affected by gaps in data values.
Mode | Most frequently occurring data value | =MODE(Data) | Useful for attribute data or discrete data with a small range. | May not be unique, and is not helpful for continuous data.
Midrange | $\frac{x_{\min} + x_{\max}}{2}$ | =0.5*(MIN(Data)+MAX(Data)) | Easy to understand and calculate. | Influenced by extreme values and ignores most data values.
Geometric mean (G) | $G = \sqrt[n]{x_1 x_2 \cdots x_n}$ | =GEOMEAN(Data) | Useful for growth rates and mitigates high extremes. | Less familiar and requires positive data.
Trimmed mean | Same as the mean except omit the highest and lowest k% of data values (e.g., 5%) | =TRIMMEAN(Data, 2k) (Excel's percent argument is the total fraction trimmed from both ends combined) | Mitigates effects of extreme values. | Excludes some data values that could be relevant.

Central Tendency: Mean

• A familiar measure of central tendency.

• In Excel, use the function =AVERAGE(Data), where Data is an array of data values.

Population formula: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

• For the sample of n = 37 car brands:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{87 + 93 + 98 + \cdots + 159 + 164 + 173}{37} = \frac{4639}{37} = 125.38$

Brand | Defects per 100
Lexus | 87
Cadillac | 93
Jaguar | 98
Honda | 99
Buick | 100
Mercury | 100
Hyundai | 102
Infiniti | 104
Toyota | 104
Mercedes-Benz | 106
Audi | 109
BMW | 109
Oldsmobile | 110
Volvo | 113
Acura | 117
Chevrolet | 119
Chrysler | 120
Dodge | 121
Lincoln | 121
Pontiac | 122
Subaru | 123
GMC | 127
Ford | 130
Mitsubishi | 130
Saab | 133
Jeep | 136
Mini | 142
Land Rover | 148
Saturn | 149
Suzuki | 149
Kia | 153
Nissan | 154
Mazda | 157
Scion | 158
Porsche | 159
Volkswagen | 164
Hummer | 173
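The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only built-ins; the names `defects` and `sample_mean` are illustrative choices, not part of the original slides.

```python
# J.D. Power defects per 100 vehicles for the n = 37 brands in the table above
defects = [87, 93, 98, 99, 100, 100, 102, 104, 104, 106, 109, 109, 110, 113,
           117, 119, 120, 121, 121, 122, 123, 127, 130, 130, 133, 136, 142,
           148, 149, 149, 153, 154, 157, 158, 159, 164, 173]


def sample_mean(data):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(data) / len(data)


print(len(defects), sum(defects))        # 37 4639
print(round(sample_mean(defects), 2))    # 125.38
```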

Central Tendency: Characteristics of the Mean

• The arithmetic mean is the most familiar average.
• It is affected by every sample item.
• It is the balancing point or fulcrum for the data.

• Regardless of the shape of the distribution, the deviations from the mean (signed distances, not absolute distances) always sum to zero:

$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

• Consider the following asymmetric distribution of quiz scores whose mean = 65:

$\sum_{i=1}^{n} (x_i - \bar{x}) = (42 - 65) + (60 - 65) + (70 - 65) + (75 - 65) + (78 - 65) = (-23) + (-5) + (5) + (10) + (13) = -28 + 28 = 0$
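A short Python check of the same arithmetic, using the five quiz scores above (the variable names are illustrative). It also shows that signed deviations cancel while absolute distances do not.

```python
scores = [42, 60, 70, 75, 78]          # quiz scores from the example, mean = 65

mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]
print(mean)              # 65.0
print(deviations)        # [-23.0, -5.0, 5.0, 10.0, 13.0]
print(sum(deviations))   # 0.0

# Absolute distances do NOT cancel: their sum is 56, not zero
print(sum(abs(d) for d in deviations))  # 56.0
```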

Central Tendency: Median

• The median (M) is the 50th percentile or midpoint of the sorted sample data.
• M separates the upper and lower halves of the sorted observations.
• If n is odd, the median is the middle observation in the data array.
• If n is even, the median is the average of the middle two observations in the data array.
• For n = 8, the median is between the fourth and fifth observations in the data array.
• For n = 9, the median is the fifth observation in the data array.

• Consider the following n = 6 data values: 11 12 15 17 21 32
• What is the median?

For even n, Median = $\frac{x_{n/2} + x_{(n/2)+1}}{2}$, where n/2 = 6/2 = 3 and n/2 + 1 = 6/2 + 1 = 4.

M = (x3 + x4)/2 = (15 + 17)/2 = 16, which falls halfway between 15 and 17 in the sorted data.

• Consider the following n = 7 data values: 12 23 23 25 27 34 41
• What is the median?

For odd n, Median = $x_{(n+1)/2}$, where (n+1)/2 = (7+1)/2 = 8/2 = 4.

M = x4 = 25
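The odd/even rule can be written as a small Python function. This is a minimal sketch (the name `median` is illustrative); Python's built-in `statistics.median` follows the same rule.

```python
def median(data):
    """Median of a list: middle value for odd n, average of the two middle values for even n."""
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # odd n: 1-based position (n + 1) / 2, which is 0-based index n // 2
        return x[n // 2]
    # even n: average the values at 1-based positions n/2 and n/2 + 1
    return (x[n // 2 - 1] + x[n // 2]) / 2


print(median([11, 12, 15, 17, 21, 32]))      # 16.0
print(median([12, 23, 23, 25, 27, 34, 41]))  # 25
```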

• Use Excel’s function =MEDIAN(Data), where Data is an array of data values.
• For the 37 vehicle quality ratings (odd n), the position of the median is (n+1)/2 = (37+1)/2 = 19.
• So, the median is x19 = 121.
• When there are several duplicate data values, the median does not provide a clean “50-50” split in the data.

Central Tendency: Characteristics of the Median

• The median is insensitive to extreme data values. For example, consider the following quiz scores for 3 students:

Tom’s scores: 20, 40, 70, 75, 80 | Mean = 57, Median = 70, Total = 285
Jake’s scores: 60, 65, 70, 90, 95 | Mean = 76, Median = 70, Total = 380
Mary’s scores: 50, 65, 70, 75, 90 | Mean = 70, Median = 70, Total = 350

• What does the median for each student tell you?
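A quick Python sketch reproducing the three students' summaries; the helper `median` and the dictionary layout are illustrative. The means move with the extreme scores while every median stays at 70.

```python
def median(data):
    """Middle value of the sorted data (average of the two middle values when n is even)."""
    x = sorted(data)
    mid = len(x) // 2
    return x[mid] if len(x) % 2 == 1 else (x[mid - 1] + x[mid]) / 2


students = {
    "Tom": [20, 40, 70, 75, 80],
    "Jake": [60, 65, 70, 90, 95],
    "Mary": [50, 65, 70, 75, 90],
}

# (mean, median, total) for each student
summary = {name: (sum(s) / len(s), median(s), sum(s)) for name, s in students.items()}
for name, (mean, med, total) in summary.items():
    print(f"{name}: mean = {mean}, median = {med}, total = {total}")
# Tom: mean = 57.0, median = 70, total = 285
# Jake: mean = 76.0, median = 70, total = 380
# Mary: mean = 70.0, median = 70, total = 350
```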

Central Tendency: Mode

• The mode is the most frequently occurring data value.
• It is similar to the mean and median if data values occur often near the center of the sorted data.
• A data set may have multiple modes or no mode.

• For example, consider the following quiz scores for 4 students:

Lee’s scores: 60, 70, 70, 70, 80 | Mean = 70, Median = 70, Mode = 70
Pat’s scores: 45, 45, 70, 90, 100 | Mean = 70, Median = 70, Mode = 45
Sam’s scores: 50, 60, 70, 80, 90 | Mean = 70, Median = 70, Mode = none
Xiao’s scores: 50, 50, 70, 90, 90 | Mean = 70, Median = 70, Modes = 50, 90

• What does the mode for each student tell you?

• The mode is easy to define, but not easy to calculate in large samples.
• Use Excel’s function =MODE(Array), which:
  – returns #N/A if there is no mode;
  – returns the first mode found if the data are multimodal.
• The mode may be far from the middle of the distribution and not at all typical.
• It generally isn’t useful for continuous data, since data values rarely repeat.
• It is best for attribute data or a discrete variable with a small range (e.g., a Likert scale).
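Unlike Excel's =MODE, which reports only the first mode, a small Python sketch can return every mode and an empty list when no value repeats (the "no mode" convention used for Sam above). The function name `modes` is illustrative. Note that Python's own `statistics.multimode` instead returns every value when none repeats.

```python
from collections import Counter


def modes(data):
    """Return every value tied for the highest frequency.

    If no value repeats, there is no mode, so an empty list is returned
    (Excel's =MODE would return #N/A instead).
    """
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([60, 70, 70, 70, 80]))   # [70]      Lee
print(modes([45, 45, 70, 90, 100]))  # [45]      Pat
print(modes([50, 60, 70, 80, 90]))   # []        Sam: no mode
print(modes([50, 50, 70, 90, 90]))   # [50, 90]  Xiao: bimodal
```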

Example: Price/Earnings Ratios and Mode

• Consider the following P/E ratios for a random sample of 68 Standard & Poor’s 500 stocks.
• What is the mode?

7 8 8 10 10 10 10 12 13 13 13 13 13 13 13 14 14 14 15 15 15 15 15 16 16 16 17 18 18 18 18 19 19 19
19 19 20 20 20 21 21 21 22 22 23 23 23 24 25 26 26 26 26 27 29 29 30 31 34 36 37 40 41 45 48 55 68 91

• Excel’s descriptive statistics results are:

Mean | 22.7206
Median | 19
Mode | 13
Range | 84
Minimum | 7
Maximum | 91
Sum | 1545
Count | 68

• The mode 13 occurs 7 times, but what does the dot plot show?
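The same descriptive statistics can be reproduced with Python's standard `statistics` module. This is a sketch assuming the 68 values listed above; the dictionary keys simply mirror Excel's labels.

```python
import statistics

# P/E ratios for the 68 sampled S&P 500 stocks, as listed above
pe = [7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14,
      15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
      19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
      26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91]

summary = {
    "Mean": statistics.mean(pe),
    "Median": statistics.median(pe),
    "Mode": statistics.mode(pe),
    "Range": max(pe) - min(pe),
    "Minimum": min(pe),
    "Maximum": max(pe),
    "Sum": sum(pe),
    "Count": len(pe),
}
# The values match the Excel summary above
for name, value in summary.items():
    print(f"{name:8} {round(value, 4)}")

print("13 occurs", pe.count(13), "times")   # 13 occurs 7 times
```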

• The dot plot shows local modes (a peak with valleys on either side) at 10, 13, 15, 19, 23, 26, 29.
• These multiple modes suggest that the mode is not a stable measure of central tendency.

Example: Rose Bowl Winners’ Points

• Points scored by the winning NCAA football team tend to have modes in multiples of 7 because each touchdown (with the extra point) yields 7 points.
• Consider the dot plot of the points scored by the winning team in the first 87 Rose Bowl games.
• What is the mode?

• A bimodal distribution refers to the shape of the histogram rather than the mode of the raw data.
• It occurs when dissimilar populations are combined in one sample. For example,

Central Tendency: Skewness

• Compare the mean and median, or look at the histogram, to determine the degree of skewness.

Symptoms of Skewness

Distribution’s Shape | Histogram Appearance | Statistics
Skewed left (negative skewness) | Long tail of histogram points left (a few low values but most data on right) | Mean < Median
Symmetric | Tails of histogram are balanced (low/high values offset) | Mean ≈ Median
Skewed right (positive skewness) | Long tail of histogram points right (most data on left but a few high values) | Mean > Median

• For the sample of J.D. Power quality ratings, the mean (125.38) exceeds the median (121). What does this suggest?
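The mean-versus-median comparison in the Symptoms of Skewness table can be expressed as a tiny rule-of-thumb function. This is only a sketch: the name `skew_direction` and its `tol` parameter (a tolerance for treating the two as "about equal") are hypothetical, since the slides give no tolerance.

```python
def skew_direction(mean, median, tol=0.0):
    """Rule of thumb: compare the mean with the median.

    `tol` is a hypothetical tolerance for calling the two "about equal".
    """
    if mean > median + tol:
        return "skewed right (positive skewness)"
    if mean < median - tol:
        return "skewed left (negative skewness)"
    return "roughly symmetric"


print(skew_direction(125.38, 121))   # J.D. Power ratings: skewed right (positive skewness)
print(skew_direction(22.72, 19))     # P/E ratios: skewed right (positive skewness)
```

This is a heuristic, not a formal skewness coefficient; a histogram should still be checked.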

Central Tendency: Geometric Mean

• The geometric mean (G) is a multiplicative average:

$G = \sqrt[n]{x_1 x_2 \cdots x_n}$

• For the J. D. Power quality data (n = 37):

$G = \sqrt[37]{(87)(93)(98)\cdots(164)(173)} = \sqrt[37]{2.37667 \times 10^{77}} = 123.38$

• In Excel, use =GEOMEAN(Array).
• The geometric mean tends to mitigate the effects of high outliers.
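A sketch of the geometric mean in Python, computed by averaging logarithms rather than forming the product of 37 values directly. The function name is illustrative; Python 3.8+ also provides `statistics.geometric_mean`.

```python
import math

defects = [87, 93, 98, 99, 100, 100, 102, 104, 104, 106, 109, 109, 110, 113,
           117, 119, 120, 121, 121, 122, 123, 127, 130, 130, 133, 136, 142,
           148, 149, 149, 153, 154, 157, 158, 159, 164, 173]


def geometric_mean(data):
    """n-th root of the product of n positive values, computed via logarithms.

    exp(mean of ln x) equals the n-th root of the product, but avoids
    building the huge product itself (about 2.4 x 10^77 here).
    """
    if any(x <= 0 for x in data):
        raise ValueError("geometric mean requires positive data")
    return math.exp(sum(math.log(x) for x in data) / len(data))


g = geometric_mean(defects)
print(round(g, 2))   # should be close to the slide's 123.38
print(g < sum(defects) / len(defects))   # True: G never exceeds the arithmetic mean
```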

Central Tendency: Growth Rates

• A variation on the geometric mean is used to find the average growth rate for a time series:

$G = \sqrt[n-1]{\frac{x_n}{x_1}} - 1$

• For example, from 1998 to 2002, Spirit Airlines revenues were:

Year | Revenue (mil)
1998 | 131
1999 | 227
2000 | 311
2001 | 354
2002 | 403

• The average growth rate is given by taking the geometric mean of the ratios of each year’s revenue to the preceding year’s revenue. With n = 5 years there are n − 1 = 4 such ratios.
• Due to cancellations, only the first and last years are relevant:

$G = \sqrt[4]{\frac{227}{131} \cdot \frac{311}{227} \cdot \frac{354}{311} \cdot \frac{403}{354}} - 1 = \sqrt[4]{\frac{403}{131}} - 1 = 1.3244 - 1 = 0.324$, or 32.4% per year

• In Excel use =(403/131)^(1/4)-1
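A sketch of the growth-rate formula $G = \sqrt[n-1]{x_n/x_1} - 1$ in Python. It assumes `periods` counts the year-to-year ratios, so five yearly values (1998 to 2002) give n − 1 = 4 periods; the name `average_growth_rate` is hypothetical. The loop also confirms that the product of the yearly ratios telescopes to 403/131.

```python
def average_growth_rate(first, last, periods):
    """Geometric-mean growth rate: (last / first) ** (1 / periods) - 1.

    `periods` is the number of year-to-year ratios, i.e. n - 1 for n yearly values.
    """
    return (last / first) ** (1 / periods) - 1


revenue = {1998: 131, 1999: 227, 2000: 311, 2001: 354, 2002: 403}
years = sorted(revenue)

# Product of the yearly ratios telescopes to last / first
ratio_product = 1.0
for prev, curr in zip(years, years[1:]):
    ratio_product *= revenue[curr] / revenue[prev]

g = average_growth_rate(revenue[years[0]], revenue[years[-1]], len(years) - 1)
print(round(ratio_product, 4), round(403 / 131, 4))
print(f"{g:.1%}")   # 32.4%
```

Compounding 131 at this rate for four years returns 403, which is a quick sanity check on the choice of n − 1.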

Central Tendency: Midrange

• The midrange is the point halfway between the lowest and highest values of X.
• It is easy to use but sensitive to extreme data values.

Midrange = $\frac{x_{\min} + x_{\max}}{2}$

• For the J. D. Power quality data (n = 37):

Midrange = $\frac{x_1 + x_{37}}{2} = \frac{87 + 173}{2} = 130$

• Here, the midrange (130) is higher than the mean (125.38) or median (121).

Central Tendency: Trimmed Mean

• To calculate the trimmed mean, first remove the highest and lowest k percent of the observations.
• For example, for the n = 68 P/E ratios, we want a 5 percent trimmed mean (i.e., k = 0.05).
• To determine how many observations to trim, multiply k × n = 0.05 × 68 = 3.4, which rounds down to 3 observations.
• So, we would remove the three smallest and three largest observations before averaging the remaining values.

• Here is a summary of all the measures of central tendency for the n = 68 P/E values.

Measure | Value | Excel formula
Mean | 22.72 | =AVERAGE(PERatio)
Median | 19.00 | =MEDIAN(PERatio)
Mode | 13.00 | =MODE(PERatio)
Geometric Mean | 19.85 | =GEOMEAN(PERatio)
Midrange | 49.00 | =(MIN(PERatio)+MAX(PERatio))/2
5% Trim Mean | 21.10 | =TRIMMEAN(PERatio,0.1)

• The trimmed mean mitigates the effects of very high values, but still exceeds the median.

• The Federal Reserve uses a 16% trimmed mean to mitigate the effects of extremes in its analysis of the Consumer Price Index.
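A Python sketch of the trimmed mean and midrange for the P/E data, alongside the other measures from the summary table. Here `k` is the fraction trimmed from each end (so 0.05 removes 3 values at each end of 68), whereas Excel's TRIMMEAN takes the total fraction removed from both ends, which is why the summary uses 0.1. The function name `trimmed_mean` is illustrative.

```python
import math
import statistics

pe = [7, 8, 8, 10, 10, 10, 10, 12, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14,
      15, 15, 15, 15, 15, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 19,
      19, 19, 20, 20, 20, 21, 21, 21, 22, 22, 23, 23, 23, 24, 25, 26, 26,
      26, 26, 27, 29, 29, 30, 31, 34, 36, 37, 40, 41, 45, 48, 55, 68, 91]


def trimmed_mean(data, k):
    """Drop the lowest and highest k fraction of sorted values, then average the rest.

    The count trimmed from each end is k * n rounded down (0.05 * 68 = 3.4 -> 3).
    """
    x = sorted(data)
    t = math.floor(k * len(x))
    kept = x[t:len(x) - t]
    return sum(kept) / len(kept)


midrange = (min(pe) + max(pe)) / 2
print(f"Mean:           {statistics.mean(pe):.2f}")             # 22.72
print(f"Median:         {statistics.median(pe):.2f}")           # 19.00
print(f"Mode:           {statistics.mode(pe):.2f}")             # 13.00
print(f"Geometric mean: {statistics.geometric_mean(pe):.2f}")
print(f"Midrange:       {midrange:.2f}")                        # 49.00
print(f"5% trim mean:   {trimmed_mean(pe, 0.05):.2f}")          # 21.10
```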

Dispersion: Measures of Variation

• Variation is the “spread” of data points about the center of the distribution in a sample. Consider the following measures of dispersion:

Statistic | Formula | Excel | Pro | Con
Range | $x_{\max} - x_{\min}$ | =MAX(Data)-MIN(Data) | Easy to calculate. | Sensitive to extreme data values.
Variance (s²) | $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ | =VAR(Data) | Plays a key role in mathematical statistics. | Non-intuitive meaning.
Standard deviation (s) | $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$ | =STDEV(Data) | Most common measure. Uses the same units as the raw data ($, £, ¥, etc.). | Non-intuitive meaning.
Coefficient of variation (CV) | $CV = 100 \times \frac{s}{\bar{x}}$ | None | Measures relative variation in percent, so can compare data sets. | Requires nonnegative data.
Mean absolute deviation (MAD) | $MAD = \frac{\sum_{i=1}^{n}\lvert x_i - \bar{x}\rvert}{n}$ | =AVEDEV(Data) | Easy to understand. | Lacks “nice” theoretical properties.

Dispersion: Range

• The range is the difference between the largest and smallest observations: Range = xmax − xmin
• For example, for the n = 68 P/E ratios, Range = 91 − 7 = 84.

Dispersion: Variance

• The population variance (σ²) is defined as the sum of squared deviations around the mean µ divided by the population size:

$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$

• For the sample variance (s²), we divide by n − 1 instead of n; otherwise s² would tend to underestimate the unknown population variance σ²:

$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$
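A from-scratch Python sketch of the two variance formulas, applied to the 37 J.D. Power ratings. The function names and the `sample` flag are illustrative. Dividing by n − 1 gives the sample value s = 22.89; dividing by N (treating the 37 brands as a whole population) gives a slightly smaller σ.

```python
# J.D. Power defects per 100 vehicles (n = 37), from the mean example
defects = [87, 93, 98, 99, 100, 100, 102, 104, 104, 106, 109, 109, 110, 113,
           117, 119, 120, 121, 121, 122, 123, 127, 130, 130, 133, 136, 142,
           148, 149, 149, 153, 154, 157, 158, 159, 164, 173]


def variance(data, sample=True):
    """Sum of squared deviations about the mean, divided by n - 1 (sample) or N (population)."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)
    return ss / (n - 1) if sample else ss / n


def std_dev(data, sample=True):
    """Square root of the variance, so it has the same units as the data."""
    return variance(data, sample) ** 0.5


s = std_dev(defects)                    # sample: divide by n - 1
sigma = std_dev(defects, sample=False)  # population: divide by N
print(round(s, 2), round(sigma, 2))     # 22.89 22.58
```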

Dispersion: Standard Deviation

• The standard deviation is the square root of the variance.
• Its units of measure are the same as X.
• It explains how individual values in a data set vary from the mean.

Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$

• Excel’s built-in functions are:

Statistic | Excel population formula | Excel sample formula
Variance | =VARP(Array) | =VAR(Array)
Standard deviation | =STDEVP(Array) | =STDEV(Array)
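For readers working in Python rather than Excel, the standard `statistics` module offers functions with the same population/sample split. The pairing below is a rough correspondence sketched on the five quiz scores from the mean example (sum of squared deviations = 848); it is not an official mapping.

```python
import statistics

scores = [42, 60, 70, 75, 78]   # quiz scores from the mean example (mean = 65)

# Rough correspondence with the Excel functions in the table above
print(statistics.pvariance(scores))  # like =VARP:   848 / 5 = 169.6
print(statistics.variance(scores))   # like =VAR:    848 / 4 = 212
print(statistics.pstdev(scores))     # like =STDEVP: sqrt(169.6)
print(statistics.stdev(scores))      # like =STDEV:  sqrt(212)
```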

Dispersion: Calculating a Standard Deviation

• Consider the following five quiz scores for Stephanie.

• Now, calculate the sample standard deviation:

$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{2380}{5-1}} = \sqrt{595} = 24.39$

• Somewhat more easily, the two-sum formula can also be used:

$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}} = \sqrt{\frac{28300 - \frac{(360)^2}{5}}{5-1}} = \sqrt{\frac{28300 - 25920}{5-1}} = \sqrt{595} = 24.39$
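A Python sketch of both routes to s. The two-sum version needs only n, Σx and Σx², so it can be checked with Stephanie's totals from the slide (her individual scores are not reproduced here); the agreement check uses the quiz scores from the mean example instead. Function names are illustrative.

```python
import math


def sd_deviation_formula(data):
    """Sample standard deviation from squared deviations about the mean."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))


def sd_two_sum_formula(n, sum_x, sum_x2):
    """Two-sum formula: only needs n, the sum of x, and the sum of x squared."""
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))


# Stephanie's totals from the slide: n = 5, sum of x = 360, sum of x^2 = 28300
print(round(sd_two_sum_formula(5, 360, 28300), 2))   # 24.39

# Both formulas agree, e.g. on the quiz scores from the mean example
scores = [42, 60, 70, 75, 78]
print(math.isclose(sd_deviation_formula(scores),
                   sd_two_sum_formula(len(scores), sum(scores), sum(x * x for x in scores))))  # True
```

On a computer the deviation form is usually preferred: when values are large relative to their spread, subtracting two nearly equal large sums in the two-sum formula can lose precision.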

• The standard deviation is nonnegative because deviations around the mean are squared.
• When every observation is exactly equal to the mean, the standard deviation is zero.
• Standard deviations can be large or small, depending on the units of measure.
• Compare standard deviations only for data sets measured in the same units and only if the means do not differ substantially.

Dispersion: Coefficient of Variation

• Useful for comparing variables measured in different units or with different means.
• A unit-free measure of dispersion.
• Expressed as a percent of the mean:

$CV = 100 \times \frac{s}{\bar{x}}$

• Only appropriate for nonnegative data. It is undefined if the mean is zero or negative.

• For example:

Data set | s | x̄ | CV
Defect rates (n = 37) | 22.89 | 125.38 | CV = 100 × (22.89)/(125.38) = 18%
ATM deposits (n = 100) | 280.80 | 233.89 | CV = 100 × (280.80)/(233.89) = 120%
P/E ratios (n = 68) | 14.08 | 22.72 | CV = 100 × (14.08)/(22.72) = 62%
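A minimal Python sketch of the CV formula applied to the three examples. The function name is illustrative; for the P/E ratios it uses s = 14.08, the value that enters the slide's own CV arithmetic and that the 68 listed P/E values give when recomputed.

```python
def coefficient_of_variation(s, mean):
    """CV = 100 * s / mean, in percent; only meaningful for a positive mean."""
    if mean <= 0:
        raise ValueError("CV is undefined for a zero or negative mean")
    return 100 * s / mean


examples = {
    "Defect rates (n = 37)": (22.89, 125.38),
    "ATM deposits (n = 100)": (280.80, 233.89),
    "P/E ratios (n = 68)": (14.08, 22.72),
}
for label, (s, mean) in examples.items():
    print(f"{label}: CV = {coefficient_of_variation(s, mean):.0f}%")
# Defect rates (n = 37): CV = 18%
# ATM deposits (n = 100): CV = 120%
# P/E ratios (n = 68): CV = 62%
```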

Dispersion: Mean Absolute Deviation

• The Mean Absolute Deviation (MAD) reveals the average distance from an individual data point to the mean (center of the distribution).
• It uses absolute values of the deviations around the mean:

$MAD = \frac{\sum_{i=1}^{n}\lvert x_i - \bar{x}\rvert}{n}$

• Excel’s function is =AVEDEV(Array).
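A minimal Python sketch of the MAD formula (the function name is illustrative), applied to the quiz scores from the mean example, whose absolute deviations are 23, 5, 5, 10 and 13.

```python
def mean_absolute_deviation(data):
    """Average absolute distance from each value to the mean (what Excel's =AVEDEV computes)."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


scores = [42, 60, 70, 75, 78]   # quiz scores from the mean example (mean = 65)
print(mean_absolute_deviation(scores))   # 11.2  (= 56 / 5)
```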

Central Tendency vs. Dispersion: Manufacturing

• Consider the histograms of hole diameters drilled in a steel plate during manufacturing.
• The desired distribution is outlined in red.

[Figure: histograms of hole diameters for Machine A and Machine B]
Machine A: desired mean (5 mm) but too much variation.
Machine B: acceptable variation, but the mean is less than 5 mm.

• Take frequent samples to monitor quality.

Central Tendency vs. Dispersion: Job Performance

• Consider student ratings of four professors on eight teaching attributes (10-point scale).
• Jones and Wu have identical means but different standard deviations.
• Smith and Gopal have different means but identical standard deviations.
• A high mean (better rating) and a low standard deviation (more consistency) are preferred. Which professor do you think is best?

Happy studying!