Statistics, Fall 2003
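Description: Statistics, Fall 2003. Instructor: 余清祥, Department of Statistics. Date: September 16, 2003. Week 1: What Is Statistics? Statistics is the science of defining problems and applying scientific methods such as data collection, organization, presentation, analysis, and inference to make reasonable decisions under uncertainty. Chapter 1: Data and Statistics. Applications in Business and Economics, Data, Data Sources, Descriptive Statistics.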
Slide 1
Statistics, Fall 2003
Instructor: 余清祥, Department of Statistics
Date: September 16, 2003
Week 1: What Is Statistics?
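Slide 2
What Is Statistics?
Statistics is the science of defining problems and applying scientific methods (data collection, organization, presentation, analysis, and inference) to make reasonable decisions under uncertainty.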
Slide 3
Slide 4
Slide 5
Chapter 1: Data and Statistics
• Applications in Business and Economics
• Data
• Data Sources
• Descriptive Statistics
• Statistical Inference
Slide 6
Applications in Business and Economics
Accounting
Public accounting firms use statistical sampling procedures when conducting audits for their clients.
Finance
Financial advisors use a variety of statistical information, including price-earnings ratios and dividend yields, to guide their investment recommendations.
Marketing
Electronic point-of-sale scanners at retail checkout counters are being used to collect data for a variety of marketing research applications.
Slide 7
Applications in Business and Economics
Production
A variety of statistical quality control charts are used to monitor the output of a production process.
Economics
Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
Slide 8
Data
• Elements, Variables, and Observations
• Scales of Measurement
• Qualitative and Quantitative Data
• Cross-Sectional and Time Series Data
Slide 9
Data and Data Sets
Data are the facts and figures that are collected, summarized, analyzed, and interpreted.
The data collected in a particular study are referred to as the data set.
Slide 10
Elements, Variables, and Observations
The elements are the entities on which data are collected.
A variable is a characteristic of interest for the elements.
The set of measurements collected for a particular element is called an observation.
The total number of data values in a data set is the number of elements multiplied by the number of variables.
Slide 11
Data, Data Sets, Elements, Variables, and Observations
Company        Stock Exchange   Annual Sales ($M)   Earn/Sh. ($)
Dataram        AMEX                         73.10           0.86
EnergySouth    OTC                          74.00           1.67
Keystone       NYSE                        365.70           0.86
LandCare       NYSE                        111.40           0.33
Psychemedics   AMEX                         17.60           0.13
Each company is an element; Stock Exchange, Annual Sales, and Earn/Sh. are the variables; each row is an observation; each single value is a datum; and the whole table is the data set.
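To make the counting rule from Slide 10 concrete, here is a minimal Python sketch that stores the table above as a dictionary keyed by element. The names `data_set`, `variables`, and `total_values` are illustrative choices, not part of the original slides.

```python
# Each key is an element (a company); each tuple is that element's observation:
# (stock exchange, annual sales in $M, earnings per share in $).
data_set = {
    "Dataram": ("AMEX", 73.10, 0.86),
    "EnergySouth": ("OTC", 74.00, 1.67),
    "Keystone": ("NYSE", 365.70, 0.86),
    "LandCare": ("NYSE", 111.40, 0.33),
    "Psychemedics": ("AMEX", 17.60, 0.13),
}
variables = ("Stock Exchange", "Annual Sales ($M)", "Earn/Sh. ($)")

n_elements = len(data_set)    # number of elements (rows)
n_variables = len(variables)  # number of variables (columns)

# Every observation records exactly one value per variable.
assert all(len(obs) == n_variables for obs in data_set.values())

# Total number of data values = elements x variables.
total_values = n_elements * n_variables
print(n_elements, n_variables, total_values)  # 5 3 15
```

The company name is used as the key rather than as a variable, since it identifies the element instead of describing it.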
Slide 12
Scales of Measurement
Scales of measurement include:
• Nominal data are merely labels or assigned numbers.
• Ordinal data can be arranged in order, such as worst to best or best to worst.
• Interval data can be arranged in order, and the difference between numbers has meaning.
• Ratio data differ from interval data in that there is a definite zero point.
The scale determines the amount of information contained in the data.
The scale indicates the data summarization and statistical analyses that are most appropriate.
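Because each scale keeps all the properties of the one below it, the set of meaningful operations grows from nominal to ratio. The sketch below encodes that hierarchy; the operation names and the pairing of summaries (mode, median, mean) with scales are a common textbook convention assumed here for illustration, not a list taken from these slides.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered from least to most information."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations each scale adds on top of the scales below it (illustrative).
ADDED_OPERATIONS = {
    Scale.NOMINAL: {"count", "mode"},        # labels only
    Scale.ORDINAL: {"rank", "median"},       # order is meaningful
    Scale.INTERVAL: {"difference", "mean"},  # fixed unit of measure
    Scale.RATIO: {"ratio"},                  # definite zero point
}


def meaningful_operations(scale: Scale) -> set[str]:
    """Return every operation that is meaningful on the given scale."""
    ops: set[str] = set()
    for level in Scale:
        if level <= scale:
            ops |= ADDED_OPERATIONS[level]
    return ops


for level in Scale:
    print(level.name, sorted(meaningful_operations(level)))
```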
Slide 13
Types of Data
Qualitative data: nominal or ordinal levels of measurement; discrete.
Quantitative (numerical) data: interval or ratio levels of measurement; discrete or continuous.
Slide 14
Scales of Measurement
Nominal
• Data are labels or names used to identify an attribute of the element.
• A nonnumeric label or a numeric code may be used.
Slide 15
Scales of Measurement
Nominal
• Example:
Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on.
Alternatively, a numeric code could be used for the school variable (e.g., 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).
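With a numeric code, the numbers are still only labels. This sketch uses the slide's codes with a hypothetical list of six coded students (made-up values for illustration) to show that counting and finding the most common category make sense, while averaging the codes does not.

```python
from collections import Counter

# Numeric codes from the slide: the numbers are labels, not quantities.
SCHOOL_CODES = {1: "Business", 2: "Humanities", 3: "Education"}

# Hypothetical coded responses for six students (illustration only).
coded = [1, 3, 1, 2, 1, 3]

labels = [SCHOOL_CODES[code] for code in coded]
counts = Counter(labels)                          # counting is meaningful
most_common_school = counts.most_common(1)[0][0]  # so is the mode

print(counts)
print("Mode:", most_common_school)  # Mode: Business
# sum(coded) / len(coded) would run and give about 1.83,
# but that number is not a school and has no meaning.
```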
Slide 16
Scales of Measurement
Ordinal
• The data have the properties of nominal data, and the order or rank of the data is meaningful.
• A nonnumeric label or a numeric code may be used.
Slide 17
Scales of Measurement
Ordinal
• Example:
Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior.
Alternatively, a numeric code could be used for the class standing variable (e.g., 1 denotes Freshman, 2 denotes Sophomore, and so on).
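For ordinal data the order of the codes carries information, so ranking and taking a middle value are valid, even though the gap between two codes is not a fixed unit. A minimal sketch, using the slide's codes (extended to Junior = 3 and Senior = 4 following the slide's "and so on") and a hypothetical list of five students:

```python
# Class-standing codes: the order is meaningful, the spacing is not.
STANDING = {"Freshman": 1, "Sophomore": 2, "Junior": 3, "Senior": 4}

# Hypothetical class standings for five students (illustration only).
students = ["Junior", "Freshman", "Senior", "Sophomore", "Freshman"]

# Ranking is meaningful for ordinal data, so sorting by code is valid.
ranked = sorted(students, key=STANDING.__getitem__)

# With an odd number of students, the middle of the ranked list is the median.
median_standing = ranked[len(ranked) // 2]

print(ranked)
print("Median standing:", median_standing)  # Median standing: Sophomore
```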
Slide 18
Scales of Measurement
Interval
• The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.
• Interval data are always numeric.
Slide 19
Scales of Measurement
Interval
• Example:
Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.
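The arithmetic behind the interval example is a single subtraction; the point is that the result is stated in the fixed unit of the scale (SAT points). Variable names here are illustrative.

```python
melissa_sat = 1205
kevin_sat = 1090

# Interval data: differences are measured in a fixed unit (SAT points),
# so subtracting two scores gives a meaningful answer.
difference = melissa_sat - kevin_sat
print(f"Melissa scored {difference} points more than Kevin.")  # 115

# The scores have no zero point meaning "none of the attribute", so
# melissa_sat / kevin_sat should not be read as "how many times as good".
```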
Slide 20
Scales of Measurement
Ratio
• The data have all the properties of interval data, and the ratio of two values is meaningful.
• Variables such as distance, height, weight, and time use the ratio scale.
• This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.
Slide 21
Scales of Measurement
Ratio
• Example:
Melissa’s college record shows 36 credit hours earned, while Kevin’s record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.
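On a ratio scale both differences and ratios are meaningful, because zero credit hours means no credits at all. A short sketch with the slide's numbers (variable names are illustrative):

```python
melissa_credits = 36
kevin_credits = 72

# Ratio data: the zero point means "nothing", so a ratio is meaningful.
ratio = kevin_credits / melissa_credits
# Differences, inherited from the interval scale, remain meaningful too.
difference = kevin_credits - melissa_credits

print(f"Kevin has {ratio:g} times as many credit hours as Melissa.")  # 2
print(f"Kevin has {difference} more credit hours than Melissa.")      # 36
```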
Slide 22
Qualitative and Quantitative Data
Data can be further classified as being qualitative or quantitative.
The statistical analysis that is appropriate depends on whether the data for the variable are qualitative or quantitative.
In general, there are more alternatives for statistical analysis when the data are quantitative.
Slide 23
Qualitative Data
Qualitative data are labels or names used to identify an attribute of each element.
Qualitative data use either the nominal or ordinal scale of measurement.
Qualitative data can be either numeric or nonnumeric.
The statistical analyses available for qualitative data are rather limited.
Slide 24
Quantitative Data
Quantitative data indicate either how many or how much.
• Quantitative data that measure how many are discrete.
• Quantitative data that measure how much are continuous because there is no separation between the possible values for the data.
Quantitative data are always numeric.
Ordinary arithmetic operations are meaningful only with quantitative data.
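The company table from Slide 11 contains both kinds of variables. In this sketch the qualitative variable (stock exchange) is summarized by counting labels, while the quantitative variable (annual sales) supports ordinary arithmetic such as an average. The list names are illustrative.

```python
from collections import Counter
from statistics import mean

# Two variables from the company data set on Slide 11.
exchanges = ["AMEX", "OTC", "NYSE", "NYSE", "AMEX"]    # qualitative (nominal)
annual_sales = [73.10, 74.00, 365.70, 111.40, 17.60]  # quantitative ($M)

# Qualitative data: analysis is limited to counts of each label.
exchange_counts = Counter(exchanges)

# Quantitative data: ordinary arithmetic, such as an average, is meaningful.
mean_sales = mean(annual_sales)

print(exchange_counts)
print(f"Mean annual sales: ${mean_sales:.2f}M")  # $128.36M
```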
Slide 25
Cross-Sectional and Time Series Data
Cross-sectional data are collected at the same or approximately the same point in time.
• Example: data detailing the number of building permits issued in June 2000 in each of the counties of Texas
Time series data are collected over several time periods.
• Example: data detailing the number of building permits issued in Travis County, Texas, in each of the last 36 months
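The difference between the two examples is which dimension varies: a cross-section fixes the time and varies the element, while a time series fixes the element and varies the time. The sketch below pulls both views out of one list of records. The permit counts and the second month are made up purely to show the structure; they are not real building-permit data, and the helper names are hypothetical.

```python
# Hypothetical records: (county, month, permits issued). Values are made up.
records = [
    ("Travis", "2000-05", 10),
    ("Travis", "2000-06", 12),
    ("Harris", "2000-06", 30),
    ("Dallas", "2000-06", 25),
]


def cross_section(records, month):
    """All elements (counties) observed at one point in time."""
    return {county: n for county, m, n in records if m == month}


def time_series(records, county):
    """One element (county) observed over several periods, in time order."""
    return sorted((m, n) for c, m, n in records if c == county)


print(cross_section(records, "2000-06"))
print(time_series(records, "Travis"))
```

Months are stored as "YYYY-MM" strings so that plain string sorting also puts them in time order.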
Slide 26
Data Sources
Existing Sources
• Data needed for a particular application might already exist within a firm. Detailed information is often kept on customers, suppliers, and employees, for example.
• Substantial amounts of business and economic data are available from organizations that specialize in collecting and maintaining data.
Slide 27
Data Sources
Existing Sources
• Government agencies are another important source of data, and the data types include census data and sample survey data.
• Data are also available from a variety of industry associations and special-interest organizations.
Slide 28
Data Sources
Internet
• The Internet has become an important source of data.
• Most government agencies, like the Bureau of the Census (www.census.gov), make their data available through a web site.
• More and more companies are creating web sites and providing public access to them.
• A number of companies now specialize in making information available over the Internet.
Slide 29
Data Sources
Statistical Studies
• Statistical studies can be classified as either experimental or observational.
• In experimental studies, the variables of interest are first identified. Then one or more factors are controlled so that data can be obtained about how the factors influence the variables.
• In observational (nonexperimental) studies, no attempt is made to control or influence the variables of interest.
• A survey is perhaps the most common type of observational study.
Slide 30
Data Acquisition Considerations
Time Requirement
• Searching for information can be time-consuming.
• Information might no longer be useful by the time it is available.
Cost of Acquisition
• Organizations often charge for information even when it is not their primary business activity.
Data Errors
• Using any data that happen to be available, or that were acquired with little care, can lead to poor and misleading information.
Slide 31
Descriptive Statistics
Descriptive statistics are the tabular, graphical, and numerical methods used to summarize data.
Slide 32
Example: Hudson Auto Repair
The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Slide 33
Example: Hudson Auto Repair
Tabular Summary (Frequencies and Percent Frequencies)
Parts Cost ($)   Frequency   Percent Frequency
50-59                    2                   4
60-69                   13                  26
70-79                   16                  32
80-89                    7                  14
90-99                    7                  14
100-109                  5                  10
Total                   50                 100
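The tabular summary can be reproduced directly from the 50 listed costs. This sketch uses the same $10 classes as the table (50-59 through 100-109); `frequency_table` is an illustrative helper name, not something from the slides.

```python
# Parts costs ($) for the 50 tune-ups, as listed on Slide 32.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Class limits used in the table: 50-59, 60-69, ..., 100-109.
classes = [(low, low + 9) for low in range(50, 110, 10)]


def frequency_table(values, classes):
    """Return (class label, frequency, percent frequency) for each class."""
    n = len(values)
    rows = []
    for low, high in classes:
        freq = sum(1 for v in values if low <= v <= high)
        rows.append((f"{low}-{high}", freq, 100 * freq / n))
    return rows


table = frequency_table(costs, classes)
for label, freq, pct in table:
    print(f"{label:>8} {freq:>4} {pct:>6.0f}")
print(f"{'Total':>8} {sum(f for _, f, _ in table):>4} {sum(p for _, _, p in table):>6.0f}")
```

Percent frequency is each class frequency divided by the total count (50), times 100, which is why the last column sums to 100.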
Slide 34
Example: Hudson Auto Repair
Graphical Summary (Histogram)
[Figure: histogram of parts cost; horizontal axis: Parts Cost ($), 50 to 110 in steps of 10; vertical axis: Frequency, 2 to 18]
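A histogram draws one bar per class, with height equal to the class frequency. As a rough text-only stand-in for the chart (not a reproduction of the original figure), this sketch bins each cost by the lower limit of its $10 class and prints one `*` per tune-up.

```python
from collections import Counter

# Parts costs ($) for the 50 tune-ups, as listed on Slide 32.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

# Bin each cost by the lower limit of its $10 class (50, 60, ..., 100).
bins = Counter((cost // 10) * 10 for cost in costs)

# One row per class; the bar length equals the class frequency.
for low in range(50, 110, 10):
    print(f"{low:>3}-{low + 9:<3} | {'*' * bins[low]}")
```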
Slide 35
Example: Hudson Auto Repair
Numerical Descriptive Statistics
• The most common numerical descriptive statistic is the average (or mean).
• Hudson’s average cost of parts, based on the 50 tune-ups studied, is about $79 (found by summing the 50 cost values, $3,949, and then dividing by 50, which gives $78.98).
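The mean is the sum of the values divided by how many there are. A direct Python version of that calculation on the 50 listed costs:

```python
# Parts costs ($) for the 50 tune-ups, as listed on Slide 32.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

total_cost = sum(costs)              # sum of the 50 cost values
mean_cost = total_cost / len(costs)  # divide by the number of values

print(f"Total: ${total_cost}, mean: ${mean_cost:.2f}")  # Total: $3949, mean: $78.98
```

Rounded to the nearest dollar, $78.98 is the $79 reported on the slide.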
Slide 36
Statistical Inference
Statistical inference is the process of using data obtained from a small group of elements (the sample) to make estimates and test hypotheses about the characteristics of a larger group of elements (the population).
Slide 37
Example: Hudson Auto Repair
Process of Statistical Inference
1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average cost of $79 per tune-up.
4. The value of the sample average is used to make an estimate of the population average.
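In a real study the population average is unknown, so it cannot be checked. The simulation below is only a sketch of steps 1 to 4 under an assumed, entirely hypothetical population of 5,000 tune-up costs drawn from a normal distribution (mean 80, standard deviation 12); it draws a simple random sample of 50 and uses the sample mean as the estimate. The helper name and all population parameters are illustrative assumptions.

```python
import random
import statistics


def estimate_population_mean(population, n, seed=None):
    """Draw a simple random sample of size n (without replacement) and
    return its mean as a point estimate of the population mean."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    return statistics.mean(sample)


# Step 1 (hypothetical): a made-up population of parts costs. In practice
# these values are not all observed, which is why we estimate.
pop_rng = random.Random(2003)
population = [round(pop_rng.gauss(80, 12)) for _ in range(5000)]

# Steps 2-4: examine a sample of 50 and use its mean as the estimate.
estimate = estimate_population_mean(population, 50, seed=1)
true_mean = statistics.mean(population)  # known only because this is a simulation

print(f"sample estimate: {estimate:.2f}, population mean: {true_mean:.2f}")
```

Re-running with a different `seed` gives a different sample and a slightly different estimate, which is the sampling variability that inference has to account for.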
Slide 38
Population Versus a Sample
Population: all votes cast
Sample: selected votes for observation
Slide 39
Basic Definitions
Descriptive statistics: the collection and description of data
Inferential statistics: analyzing, decision making, or estimation based on the data
Population: the set of all possible measurements that is of interest
Sample: the portion of the population from which information is gathered
Slide 40
End of Chapter 1