Post on 17-Jan-2018

Data in context

Chapter 1 of Data Basics

Frameworks

Today, we will be presenting two frameworks for thinking about the content of data services.

A. Statistics and Data
• What are statistics? What are data?
• A chart of statistical information

B. Continuum of Access
• Understanding dissemination channels

We’ll begin by organizing concepts that underlie statistical information.

Statistics are ubiquitous

“Statistics are generated today about nearly every activity on the planet. Never before have we had so much statistical information about the world in which we live. Why is this type of information so abundant? For one thing, statistics have become a form of currency in today’s information society. Through computing technology, society has become very proficient in calculating statistics from the vast quantities of data that are collected. As a result, our lives involve daily transactions revolving around some use of statistical information.”

Data Basics, page 1.1

What are we talking about?

Statistics and Data

Statistics
• numeric facts/figures in the form of summaries
• created from data, i.e., already processed
• presentation-ready

Data
• numeric files created and organized for analysis
• require processing
• not ready for display
• methodology-driven

Stories are told through statistics

The National Population Survey used in this example had over 80,000 respondents in its 1996-97 sample, and the Canadian Community Health Survey had over 130,000 cases in 2005. How do we tell the stories of each of these respondents?

We create summaries of these life experiences using statistics.
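As a minimal sketch of what "creating a summary" means, the Python below reduces a handful of individual records to one statistic. The records, field names and the `summarize_smokers` helper are hypothetical and invented for illustration; they are not taken from either survey, and a real survey estimate would also apply sampling weights, which this sketch ignores.

```python
# Hypothetical microdata: one record per respondent (made-up values).
respondents = [
    {"id": 1, "sex": "Female", "smoker": True},
    {"id": 2, "sex": "Male", "smoker": False},
    {"id": 3, "sex": "Female", "smoker": False},
    {"id": 4, "sex": "Male", "smoker": True},
    {"id": 5, "sex": "Male", "smoker": True},
]


def summarize_smokers(records):
    """Reduce individual records to a summary: count and share of smokers."""
    total = len(records)
    smokers = sum(1 for r in records if r["smoker"])
    # Guard against an empty file so the share is always defined.
    share = smokers / total if total else 0.0
    return {"respondents": total, "smokers": smokers, "share": share}


summary = summarize_smokers(respondents)
print(summary)
```

The individual records are the data; the returned dictionary is the statistic, already processed and ready to present.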

Dimensions of statistics

There are six dimensions, or variables, in this table. The cells in the table are the estimated number of smokers.

Geography
• Region

Time
• Periods

Unit of Observation Attributes
• Smokers
• Education
• Age
• Sex
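One way to picture how these dimensions define the cells of a table is the sketch below. It assumes a small set of made-up records and a hypothetical `tabulate_smokers` helper; each cell is identified by a combination of dimension values and holds a count of smokers. The counts are unweighted, so they only illustrate the structure, not a real estimate.

```python
from collections import defaultdict

# Hypothetical records; the values are illustrative only.
records = [
    {"region": "Ontario", "period": "1994-1995", "sex": "Male", "smoker": True},
    {"region": "Ontario", "period": "1994-1995", "sex": "Female", "smoker": True},
    {"region": "Ontario", "period": "1996-1997", "sex": "Male", "smoker": False},
    {"region": "Quebec", "period": "1996-1997", "sex": "Female", "smoker": True},
    {"region": "Quebec", "period": "1996-1997", "sex": "Female", "smoker": True},
]


def tabulate_smokers(rows, dimensions):
    """Count smokers in each cell defined by the chosen dimensions."""
    table = defaultdict(int)
    for row in rows:
        if row["smoker"]:
            # The cell key is the tuple of this row's values on each dimension.
            cell = tuple(row[d] for d in dimensions)
            table[cell] += 1
    return dict(table)


# Two different views (tables) of the same data.
by_region_period = tabulate_smokers(records, ["region", "period"])
by_sex = tabulate_smokers(records, ["sex"])
print(by_region_period)
print(by_sex)
```

Choosing a different list of dimensions produces a different table from the same records, which is why a table is only one view of the data.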

Definitions use classifications

The definitions for concepts and variables use classification systems to assign categories or values to the properties of the concepts. For example, Region in this table consists of Canada and the ten provinces.

Some classifications are based on standards while others are based on convention or practice.

For example, Standard Geography classifications

Classifications involve categories

Categories

Sex
• Total
• Male
• Female

Periods
• 1994-1995
• 1996-1997
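A classification can be thought of as a fixed mapping from coded values to category labels. The sketch below assumes hypothetical numeric codes (1 and 2) for the Sex categories shown above; the codes and the `decode` and `counts_with_total` helpers are invented for illustration and do not reflect any official coding scheme.

```python
# Hypothetical numeric codes; only the category labels come from the table.
SEX_CLASSIFICATION = {1: "Male", 2: "Female"}


def decode(value, classification):
    """Return the category label for a code, rejecting unknown codes."""
    if value not in classification:
        raise ValueError(f"code {value!r} is not in the classification")
    return classification[value]


def counts_with_total(codes, classification):
    """Count cases per category and add the aggregate 'Total' category."""
    counts = {label: 0 for label in classification.values()}
    for code in codes:
        counts[decode(code, classification)] += 1
    # 'Total' is derived from the other categories rather than coded directly.
    counts["Total"] = sum(counts.values())
    return counts


sex_counts = counts_with_total([1, 2, 2, 1, 2], SEX_CLASSIFICATION)
print(sex_counts)
```

Rejecting codes that fall outside the classification is a design choice in this sketch: every value must belong to a defined category before it is counted.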

Statistics are about definitions

Each characteristic or variable that is measured or recorded about the unit of observation must be clearly defined. Statistics Canada has definitions for some of the more frequently used concepts and variables on its website under “Definitions, data sources and methods.”

The Census Dictionary is an important source for definitions of the concepts and variables in each Census.

Definitions and metadata

All of the definitions and information that describe the unit of observation, the universe, the sampling method, the concepts and the variables are critical to understanding both the data and the statistics derived from the data.

We used to talk about codebooks, the User’s Guide and the Data Dictionary when speaking of data documentation. Now we refer to this documentation as metadata, which has been expanded to include documentation throughout the life cycle of a survey. The Data Documentation Initiative 3.0 standard is being used to organize this information.

Methods producing data

Observational Methods
• Focus is on developing observational instruments to collect data
• Correlation
• Replicate the analysis (same data or similar)
• Statistics summarize observations

Experimental Methods
• Focus is on manipulating causal agents to measure change in a response agent
• Causation
• Replicate the experiment
• Statistics summarize experiment results

Computational Methods
• Focus is on modeling phenomena through mathematical equations
• Prediction
• Replicate the simulation
• Statistics summarize simulation results

A particular discipline or field will tend to be dominated by one of these three methods, although outputs may also exist from the other two methods.

Consequently, the knowledge disseminated within a field is often fairly homogeneous in how statistical information is used and reported.

Knowing this and the life cycle in which statistics are produced can help in the search for statistics.

Summary

Statistics are derived from observational, experimental or simulated data.

A table is a format for displaying statistics and presents a summary or one view of the data.

Tables are structured around geography, time and attributes of the unit of observation.

Statistics are dependent on definitions.

Statistics summarize individual stories into common or general stories.
