week 1 introduction to statistics


Post on 24-Feb-2022


Page 1: Week 1 Introduction to Statistics

Week 1 – Introduction to Statistics

Course Learning Outcomes Covered

1. Understand operational definitions as they pertain to the study of statistics;

2. Distinguish between sample and population data;

Class Agenda

This week we will cover:

Course Introduction

Introduction to Statistics

o Defining Statistics
o Sampling Theory
o Key Terms
o Types of Data
o Levels of Measurement

Introduction to Statistics

What is Statistics?


Types of Statistics

Descriptive Statistics: The methods used to summarize the important characteristics of an available set of data. Descriptive statistics deals with methods of organizing, summarizing, and presenting data in a convenient and informative way. One form of descriptive statistics uses graphical techniques to present data. Another form uses numerical techniques (mean, median, standard deviation, …) to summarize data.

Inferential Statistics: The methods that use sample data to make inferences (or generalizations) about a population.

Example: A professor of statistics calculated the mean age of a sample of 60 students in his statistics class, STAT 1123. Using this information, he concluded that the mean age of all students at the Lakeshore Campus was 19.5 years old.
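As a rough illustration of the numerical techniques named under descriptive statistics, the sketch below computes a mean, median, and standard deviation with Python's standard `statistics` module. The list `ages` is made-up data chosen for illustration; it is not the professor's actual sample.

```python
import statistics

# Hypothetical ages (in years) of ten students; illustrative values only.
ages = [18, 19, 19, 20, 21, 19, 22, 18, 20, 19]

mean_age = statistics.mean(ages)      # arithmetic average
median_age = statistics.median(ages)  # middle value of the sorted data
sd_age = statistics.stdev(ages)       # sample standard deviation (n - 1 divisor)

print(f"mean={mean_age}, median={median_age}, sd={sd_age:.2f}")
```

These three numbers describe only the ten values at hand. Claiming anything about a wider group of students from them would be a step into inferential statistics.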

General Overview

Retrieved from https://www.sigmamagic.com/blogs/online-sample-size-calculators/


Sampling Theory

In an ideal world, we would gather information about the entire population of interest and have perfect information. However, due to time and budget constraints (and probably other factors as well), this is virtually impossible. Statistical theory shows that highly accurate results can be obtained from proper sampling techniques.

The most important feature of a good sample is that it should have the same characteristics as the population it is being used to represent. This is what we call a "representative sample". To achieve this goal, various methods of random sampling can be employed:

Simple Random Sampling: A method of sampling in which each sample of size n is as likely to be drawn as any other sample of size n.

Stratified Sampling: A method of sampling in which the population is divided into groups called strata (often based upon some characteristic(s) of the population), and then a proportionate sample is randomly drawn from each stratum.

Cluster Sampling: A method of sampling in which the population is divided into clusters (again, often based upon some characteristic(s) of the population), and then some of the clusters are randomly selected. All members of the randomly selected clusters are sampled.

Systematic Sampling: A method of sampling in which a random starting point is chosen and every kth member of the population is sampled. For improved randomness, the population can be randomly ordered before beginning.

Note: Fully independent random draws require sampling with replacement, but in practice sampling is usually done without replacement. The difference matters little when the sample is small relative to the population.

There is also one non-random method of sampling that is often used in practice.

Convenience Sampling: A method of sampling in which data is collected from readily available sources/participants.
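The four random sampling methods described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a survey-grade implementation: the function names, the population of 100 student IDs, the two strata, and the five sections used as clusters are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n members without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, fraction, rng):
    """Randomly draw the same fraction from each stratum (proportionate allocation)."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(clusters, num_clusters, rng):
    """Randomly pick whole clusters and keep every member of the chosen ones."""
    chosen = rng.sample(list(clusters), num_clusters)
    return [member for name in chosen for member in clusters[name]]

def systematic_sample(population, step, rng):
    """Pick a random start within the first `step` positions, then take every step-th member."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(1)            # fixed seed so the sketch is repeatable
students = list(range(1, 101))    # hypothetical student IDs 1..100

# Hypothetical strata: IDs 1-60 are first-year students, 61-100 second-year.
def year_of(student_id):
    return "first_year" if student_id <= 60 else "second_year"

# Hypothetical clusters: five class sections of 20 students each.
sections = {f"section_{i + 1}": students[i * 20:(i + 1) * 20] for i in range(5)}

srs = simple_random_sample(students, 10, rng)
strat = stratified_sample(students, year_of, 0.10, rng)
clust = cluster_sample(sections, 2, rng)
syst = systematic_sample(students, 10, rng)
```

Note how the stratified sample always contains 6 first-year and 4 second-year students, mirroring the 60/40 split in the population, while the cluster sample contains every student from two whole sections.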


When sampling from a population, it is extremely rare for a sample statistic to perfectly match the true population parameter. The difference between the two is known as sampling error. Sampling error is a result of the randomness of the sample.

Non-sampling error is an error caused by something other than the natural sampling process, for example a lack of responses or faulty data collection equipment.

Sampling bias is sometimes grouped into non-sampling error. More specifically, however, it refers to a result that is skewed towards certain members or groups within the population because the sample taken is not truly random.
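The difference between sampling error and sampling bias can be seen in a small simulation. The sketch below is only illustrative: the population of 500 grades is randomly generated, and taking the 30 highest grades is a deliberately extreme stand-in for a non-random sample that over-represents one group.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: final grades for 500 students (illustrative values only).
population = [rng.gauss(70, 10) for _ in range(500)]
mu = statistics.mean(population)  # the population parameter

# Random sample: its error comes only from the luck of the draw.
random_sample = rng.sample(population, 30)
sampling_error = statistics.mean(random_sample) - mu

# Non-random sample: only the 30 highest grades are included, so the
# estimate is pushed upward no matter how the random numbers fall.
biased_sample = sorted(population)[-30:]
bias_error = statistics.mean(biased_sample) - mu

print(f"sampling error: {sampling_error:+.2f}")
print(f"error from biased sample: {bias_error:+.2f}")
```

Repeating the random draw with different seeds would give sampling errors that scatter on both sides of zero, whereas the biased sample overshoots every time.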

Key Terms

Populations vs. Samples

Population: the complete collection of all elements (scores, people, measurements, and so on) to be studied.

Sample: a set of data drawn from the studied population.

Example: All of the students in our class form a (small) population. If I select five students from our class, they would form a sample.

Parameters vs. Statistics

Parameter: a descriptive measure of a population. This is a set number (constant) but is generally unknown and difficult to find.

Statistic: a descriptive measure of a sample. This will vary between different samples.

Example: If I take the average final grade of all of the students in our class, I will have a parameter of the (small) population that is our class. If, however, I take the average of the five students I selected for the sample, I would have a statistic.

Variables vs. Data

Variable: a random variable is a characteristic or measurement that can be determined for each member of a population. Variables are often denoted by capital letters such as X or Y.

Data: data, collected during the sampling process, are the actual observed values of the variable. Data are often denoted by lower case letters and will usually include a subscript, since there will be numerous data points observed for each variable.

Example: Final grade is a random variable that could differ for each of you by the end of the term. At the end of the term, your actual final grades would form a set of data. The single random variable (X) would have 30+ observed values x1, x2, x3, …, x29, x30, …
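The class example above can be mirrored in code to show that a parameter is a single fixed value while a statistic changes from sample to sample. The grades below are randomly generated placeholders for a hypothetical class of 30, not real data.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical final grades x1, ..., x30 for a class of 30 (the whole population).
grades = [rng.randint(50, 100) for _ in range(30)]

# Parameter: the mean of every grade in the population; one fixed value.
parameter = statistics.mean(grades)

# Statistics: the mean of each of three different samples of five students.
sample_means = [statistics.mean(rng.sample(grades, 5)) for _ in range(3)]

print("population mean (parameter):", parameter)
print("sample means (statistics):  ", sample_means)
```

The three sample means will typically differ from one another and from the population mean, which is the sense in which a statistic "will vary between different samples".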


Types of Data/Variables

Qualitative Data (or categorical): data that can be separated into different categories that are distinguished by some nonnumeric characteristic.

Example: gender, country of birth, hair colour

Quantitative Data: data consisting of numbers representing counts or measurements.

Example: height, age, counts

Quantitative data can be further broken down into:

Discrete Data: data that has a finite number of possible values or a countable number of possible values. Commonly, discrete variables cannot take on decimal values, but there are some exceptions to this rule.

Example: counts, shoe size (not foot length)

Continuous Data: data that has infinitely many possible values that can be associated with points on a continuous scale in such a way that there are no gaps or interruptions.

Example: height, length of time, crime rate (as a percent)

Levels of Measurement

1. The nominal level: (Data cannot be arranged in an ordering scheme. → no ranking) The process of placing cases into categories and counting their frequency of occurrence.

Example: Skin colours, gender (male, female), marital status (single, married, divorced, widowed)
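At the nominal level, the basic operation is counting how many cases fall into each category. A minimal sketch using Python's `collections.Counter` follows; the list of responses is invented for illustration.

```python
from collections import Counter

# Hypothetical marital-status responses (nominal categories, no ranking).
responses = ["single", "married", "single", "divorced",
             "married", "single", "widowed"]

# Frequency of occurrence for each category.
frequencies = Counter(responses)

# The mode (most frequent category) is a summary that makes sense for nominal data.
mode_category, mode_count = frequencies.most_common(1)[0]

print(dict(frequencies))
print("mode:", mode_category, "with", mode_count, "cases")
```

A mean or median of these responses would have no meaning, because the categories have no order or numeric value.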


2. The ordinal level: (Data appear to be nominal, but their values are in order.) The process of ordering or ranking cases in terms of the degree to which they have any given characteristic.

Example: Level of pain, shoe size (US).

3. The interval (and ratio) level: data are real numbers, such as heights, weights, incomes, time, temperature and distance. By contrast to the ordinal level, the interval and ratio levels of measurement indicate not only the ordering of categories but also a consistent distance between them. Most truly numeric data will be ratio level data. However, when there is no 'true zero' the data is limited to being interval.

Ratio Example: Speed of vehicle, the length of a prison sentence, a defendant’s number of prior convictions.

Interval Example: temperature (°C), year of graduation.
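The role of a 'true zero' can be checked with a short calculation. The sketch below compares two hypothetical temperatures in degrees Celsius (interval) and in kelvins (ratio, since 0 K is a true zero); the helper `celsius_to_kelvin` is defined here only for the illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert to kelvins, a scale whose zero point is a true zero."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences are meaningful on an interval scale and survive the unit change.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: 20/10 is 2 in Celsius, yet the same two temperatures
# expressed in kelvins give a ratio of roughly 1.035.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio:      {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (kelvin)")
```

So 20 °C is 10 degrees warmer than 10 °C, but it is not "twice as hot". Statements about ratios only make sense for ratio-level data.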

Why are all these different types and levels important? The data's type and level of measurement determine (and limit) which graphical and numerical methods of descriptive statistics can be performed. (You may notice this when working in statistical software such as SPSS. If you try to create a bar graph with scale-level data, which is SPSS's term for ratio and interval, it will revert to a histogram.)

Treating Some Ordinal Data as Interval

o The distinction between ordinal and interval is not always clear-cut.
o Ordinal measures may be treated at the interval level when the ordered categories are fairly evenly spaced.
o This rests on an assumption of equidistance.
o Doing so increases the statistical options available in analysis.
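To make the equidistance assumption concrete, the sketch below summarizes a set of hypothetical pain ratings in two ways. The five-point coding and the labels are assumptions made for this illustration.

```python
import statistics

# Hypothetical pain ratings coded 1 (none) to 5 (severe). The codes are ordered
# labels; equal spacing between them is an assumption, not a given.
labels = {1: "none", 2: "mild", 3: "moderate", 4: "strong", 5: "severe"}
ratings = [1, 2, 2, 3, 3, 3, 4, 5, 5]

# Valid for ordinal data: relies only on the ordering of the categories.
median_rating = statistics.median(ratings)

# Valid only under the equidistance assumption (treating the data as interval).
mean_rating = statistics.mean(ratings)

print("median:", median_rating, f"({labels[median_rating]})")
print(f"mean:   {mean_rating:.2f} (meaningful only if the categories are evenly spaced)")
```

The median is safe at the ordinal level. Reporting the mean is the "increased statistical option" gained by treating the ratings as interval data.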


Retrieved from https://medium.com/@rndayala/data-levels-of-measurement-4af33d9ab51a