STATISTICS: Introduction
Professor Ke-Sheng Cheng
Department of Bioenvironmental Systems Engineering
National Taiwan University



• Lecture notes will be posted on the class website.
  – www.rslabntu.net
  – Textbook: A modern introduction to probability and statistics / Dekking et al. [Electronic book]

• Grades
  – Homework (40%) [No homework copying.]
  – Midterm (30%), Final (30%)

• The R language will be used for data analysis.
• A tutorial session is held on Thursdays (6:00-7:30 pm). Attendance at the tutorial session is voluntary.

• Class attendance rule
  – Class attendance is not recorded; however,
  – if you are more than 15 minutes late for class, please do NOT enter the classroom until the next class session.

04/19/23 Lab for Remote Sensing Hydrology and Spatial Modeling Department of Bioenvironmental Systems Engineering, National Taiwan University


What is “statistics”?

• Statistics is a science of “reasoning” from data.

• A body of principles and methods for extracting useful information from data, for assessing the reliability of that information, for measuring and managing risk, and for making decisions in the face of uncertainty.


• The major difference between statistics and mathematics is that statistics always needs “observed” data, while mathematics does not.

• An important feature of statistical methods is the “uncertainty” involved in analysis.


• Statistics is the discipline concerned with the study of variability, with the study of uncertainty and with the study of decision-making in the face of uncertainty. As these are issues that are crucial throughout the sciences and engineering, statistics is an inherently interdisciplinary science.


• One of the objectives of this course is to equip students with a critical way of thinking about problems such as:
  – Weather forecasting
  – Real-time flood forecasting
  – Projection of rainfall extremes under certain climate change scenarios


Sources of uncertainties

• Data (sampling) uncertainty
• Parameter uncertainty
• Model structure uncertainty
  – An illustrative example


[Figure: scatter plot of the (x, y) data; x axis 0 to 80, y axis -100 to 800]

You are given a set of (x, y) data. Apparently, y depends on x.


Observed data with uncertainties (linear model)

[Figure: observed data with a fitted linear trend line; x axis 0 to 80, y axis -100 to 800]

Fitted linear model: $y = 9.9507x - 48.343$, $R^2 = 0.9534$
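A trend line like this one is usually obtained by ordinary least squares. The sketch below shows one way to compute such a line and its $R^2$. It is written in Python for illustration only (the course itself uses R). The function name `fit_linear` and the toy data are hypothetical; the slide's own data points are not listed in these notes.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    # R^2 = 1 - SS_res / SS_tot, measured on the observed data only
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

# Hypothetical toy data (not the data shown on the slide)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, r2 = fit_linear(xs, ys)
print(f"y = {a:.4f}x {b:+.4f}, R^2 = {r2:.4f}")
```

Note that $R^2$ only measures agreement with the observed data, not with the true underlying relationship.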


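Observed data with uncertainties (power model)

[Figure: observed data with a fitted power trend line; x axis 0 to 80, y axis -100 to 800]

Fitted power model: $y = 3.5218x^{1.2335}$, $R^2 = 0.8852$

Judged by $R^2$ on the observed data (0.9534 versus 0.8852), the linear model fits the data better than the power model.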
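The notes do not say how the power model was fitted. One common approach, assumed here, is to take logarithms, because $\ln y = \ln a + b \ln x$ is linear in $\ln x$, and then apply least squares on the log scale. This is not equivalent to least squares on the original scale. It also requires all x and y to be positive, so any data points with $y \le 0$ would have to be dropped. The function name `fit_power` is hypothetical.

```python
import math

def fit_power(xs, ys):
    """Fit y = a * x**b by least squares on (ln x, ln y).

    Assumes every x and y is strictly positive; points with y <= 0
    (possible when the noise is additive) must be removed beforehand.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)  # back-transform the intercept ln(a)
    return a, b

# Noise-free check: data generated exactly from y = 2.8 * x**1.3
xs = [5, 10, 20, 40, 80]
ys = [2.8 * x ** 1.3 for x in xs]
a, b = fit_power(xs, ys)
print(f"y = {a:.4f} x^{b:.4f}")  # y = 2.8000 x^1.3000
```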


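Theoretical model:

[Figure: observed data, the fitted power model $y = 3.5218x^{1.2335}$ ($R^2 = 0.8852$), and the theoretical model; x axis 0 to 80, y axis -100 to 800]

$Y = 2.8X^{1.3} + \varepsilon, \quad \varepsilon \sim \text{iid } N(0, 60)$

The sums of squared errors (SSE) of the linear and power model estimates, measured with respect to the theoretical model, are 12011.7 and 8950.08, respectively.

Therefore, the power model performs better than the linear model: a higher $R^2$ on the observed data does not guarantee a model closer to the true relationship.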
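To make "SSE with respect to the theoretical model" concrete, here is a sketch comparing the two fitted curves from the slides with the noise-free part of the theoretical model, $2.8x^{1.3}$.

The evaluation grid (x = 1 to 80) is an assumption, because the x values of the original sample are not listed. The totals will therefore differ from 12011.7 and 8950.08. On this grid the power model still comes out closer to the theoretical curve.

```python
def theoretical(x):
    # Noise-free part of the theoretical model Y = 2.8 X^1.3 + eps
    return 2.8 * x ** 1.3

def linear_fit(x):
    # Linear model fitted on the slide
    return 9.9507 * x - 48.343

def power_fit(x):
    # Power model fitted on the slide
    return 3.5218 * x ** 1.2335

def sse_vs_theory(model, xs):
    """Sum of squared differences between a fitted model and the theoretical curve."""
    return sum((model(x) - theoretical(x)) ** 2 for x in xs)

# Hypothetical evaluation grid; the sample's actual x values are unknown
grid = range(1, 81)
sse_linear = sse_vs_theory(linear_fit, grid)
sse_power = sse_vs_theory(power_fit, grid)
print(f"SSE linear = {sse_linear:.1f}, SSE power = {sse_power:.1f}")
```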


Key topics in statistics

• Probability
• Estimation
• Test of hypotheses
• Regression
• Forecasting
• Quality control
• Simulation
• …


Deterministic vs Stochastic Models

• An abstract model is a description of the essential properties of a phenomenon, formulated in mathematical terms.
  – An abstract model is used as a theoretical approximation of reality to help us understand the world around us.


• Essentially, all models are wrong, but some are useful.

• Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful. (George E. P. Box)
  – Examples: a normal distribution for men's height, grades in a statistics class, etc.


Types of abstract models

• Deterministic model
  – A deterministic model describes a phenomenon whose outcome is fixed.

• Stochastic model
  – A random/stochastic model describes the unpredictable variation of the outcomes of a random experiment.


Examples

• Deterministic model
  – Suppose we wish to measure the area covered by a lake that, for all practical purposes, appears to have a circular shoreline. Since we know that the area is $A = \pi r^2$, where $r$ is the radius, we would attempt to measure the radius and substitute it into the formula.


• Stochastic model
  – Consider the experiment of tossing a balanced coin and observing the upper face. It is not possible to predict with certainty what the upper face will be, no matter how many times we repeat the experiment. However, it is possible to predict what will happen in the long run: the probability of heads on a single toss is ½.
  – P(more than 60 heads in 100 trials)
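The last question can be answered exactly if we assume the 100 tosses are independent and the coin is balanced. Under those assumptions, the number of heads X follows a Binomial(100, ½) distribution, and P(X > 60) is the sum of the binomial probabilities for 61 through 100 heads. The sketch below (function name hypothetical) computes that sum.

```python
from math import comb

def prob_more_than(k, n, p=0.5):
    """P(X > k) for X ~ Binomial(n, p): sum of P(X = j) for j = k+1, ..., n."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1, n + 1))

p = prob_more_than(60, 100)
print(f"P(more than 60 heads in 100 tosses) = {p:.4f}")
```

Even though no single toss can be predicted, this long-run statement can be made precisely.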


Variability and Uncertainty

• Determinism versus stochasticism
• Is the real world deterministic or stochastic?
  – Determinism
    • We could perfectly predict future weather and climate if we knew all the physics of the weather system and were given the initial conditions at some point in the distant past.
    • In reality, we do not know all the physics of the weather system. Many models (numerical weather prediction models and general circulation models) have been developed, and no model is perfect.
    • Variability exists whether the real world is deterministic or stochastic.

    • Errors vary because of imperfect models and/or incomplete initial conditions.
    • Examples of variability in a deterministic process:
      – Deterministic variability (perfectly predictable under complete initial conditions).
      – Prediction errors under incomplete initial conditions.
  – Stochasticism
    • Variation due to randomness (probability) in one or more components of a system.
    • Models may consist of both deterministic and stochastic components.
    • Under stochasticism, a perfect stochastic model (which would then no longer be a model) is one that perfectly describes the deterministic and stochastic behaviors of the system.


• Even if we had a perfect stochastic model, we would not be able to make perfect predictions. However, we could make perfect statistical inferences about our predictions.
  – Prediction errors are unpredictable (uncertain), but their properties can be perfectly described.

• In practice, we can never have a perfect model. Prediction errors combine errors due to an imperfect model (in both its deterministic and stochastic components) with the inherent randomness of the system.


• An example of seemingly random deterministic variability.


• An example of deterministic variability that looks random.
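The figures for these slides are not part of this transcript. As a stand-in (not necessarily the example used in the lecture), the logistic map $x_{k+1} = 4x_k(1 - x_k)$ is a classic deterministic process whose output looks random. The sketch also shows how a tiny error in the initial condition grows into a large prediction error, which is the "incomplete initial condition" case described earlier. The starting value 0.2 and the perturbation $10^{-9}$ are arbitrary choices.

```python
def logistic_map(x0, n, r=4.0):
    """Return n values of x_{k+1} = r * x_k * (1 - x_k); fully deterministic."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_map(0.2, 50)
b = logistic_map(0.2 + 1e-9, 50)  # almost identical initial condition

# Irregular-looking values from a rule with no randomness at all
print([round(v, 3) for v in a[:8]])
# After a few dozen steps the two trajectories no longer agree
print(max(abs(u - v) for u, v in zip(a[40:], b[40:])))
```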


Practical Applications of Statistics

• Iris recognition
  – An iris code consists of 2048 bits.
  – The iris code of the same person may change at different times and places. Thus, one has to allow for a certain percentage of mismatching bits when identifying a person.
  – Of the 2048 bits, 266 may be considered uncorrelated.


Hamming distance is defined as the fraction of mismatches between two iris codes.
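The Hamming distance defined above is straightforward to compute. To see why a mismatch allowance is safe, assume the 266 uncorrelated bits behave like independent fair coin flips when the two codes come from different people. Under that assumption, the number of mismatches between them follows a Binomial(266, ½) distribution. The sketch below computes both quantities. The function names and the threshold 0.32 are hypothetical, chosen only for illustration.

```python
from math import comb

def hamming_fraction(code_a, code_b):
    """Fraction of positions at which two equal-length bit sequences differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    mismatches = sum(a != b for a, b in zip(code_a, code_b))
    return mismatches / len(code_a)

def prob_unrelated_match(threshold, n_independent=266):
    """P(HD <= threshold) for two unrelated codes, modelled as n independent fair bits."""
    k_max = int(threshold * n_independent)  # largest allowed number of mismatches
    return sum(comb(n_independent, k) for k in range(k_max + 1)) / 2 ** n_independent

print(hamming_fraction([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
print(prob_unrelated_match(0.32))  # hypothetical threshold
```

Under this model, two unrelated codes almost never fall below such a threshold. The same person's codes, which differ in only a few bits, do.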


A modern introduction to probability and statistics : understanding why and how / Dekking et al.


• Economic Warfare Analysis during World War II
  – In order to obtain more reliable estimates of German war production, experts from the Economic Warfare Division of the American Embassy and the British Ministry of Economic Warfare started to analyze markings and serial numbers obtained from captured German equipment.

– Each piece of enemy equipment was labeled with markings, which included all or some portion of the following information: (a) the name and location of the maker; (b) the date of manufacture; (c) a serial number; and (d) miscellaneous markings such as trademarks, mold numbers, casting numbers, etc.


A modern introduction to probability and statistics : understanding why and how / Dekking et al.


– The first products to be analyzed were tires taken from German aircraft shot down over Britain and from supply dumps of aircraft and motor vehicle tires captured in North Africa. The marking on each tire contained the maker’s name, a serial number, and a two-letter code for the date of manufacture.

– The first step in analyzing the tire markings involved breaking the two-letter date code.
  • It was conjectured that one letter represented the month and the other the year of manufacture, and that there should be 12 letter variations for the month code and 3 to 6 for the year code. This, indeed, turned out to be true. The following table presents examples of the 12 letter variations used by four different manufacturers.


– For each month, the serial numbers could be recoded to numbers running from 1 to some unknown largest number N.

– The observed (recoded) serial numbers could be seen as a subset of these numbers.

– The objective was to estimate N for each month and each manufacturer separately by means of the observed (recoded) serial numbers.
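The notes do not say which estimator the analysts used. One standard choice, assumed here, starts from the largest observed serial number $m$ in a sample of size $k$. If the sample was drawn uniformly at random without replacement from $1, \dots, N$, the estimator $\hat N = m + m/k - 1$ is unbiased. The function name `estimate_total` and the sample values are hypothetical.

```python
def estimate_total(serials):
    """Estimate the largest serial number N from a random sample drawn without
    replacement from 1..N, using N_hat = m + m/k - 1, where m is the sample
    maximum and k the sample size."""
    k = len(serials)
    m = max(serials)
    return m + m / k - 1

print(estimate_total([19, 40, 42, 60]))  # 74.0
```

The term $m/k - 1$ corrects for the gap that is expected above the largest observed number.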


– With a sample of about 1400 tires from five producers, individual monthly output figures were obtained for almost all months from 1939 to mid-1943.

– The following table compares the estimates of the average monthly production of all manufacturers in the first quarter of 1943 with the statistics of the Speer Ministry that became available after the war.


– The accuracy of the estimates can be appreciated even more if we compare them with the figures obtained by Allied intelligence agencies, which, using other methods, estimated production at between 900 000 and 1 200 000 per month!


A modern introduction to probability and statistics : understanding why and how / Dekking et al.


Practical Applications of Statistics

• Ebola outbreak in West Africa


(as of Aug. 26, 2014)


2014 West Africa Ebola

• Total cases since the beginning of the 2014 outbreak


2014 West Africa Ebola

• Total death counts since the beginning of the 2014 outbreak


2014 West Africa Ebola

• Death rate since the beginning of the 2014 outbreak


Random Experiment and Sample Space

• An experiment that can be repeated under the same (or uniform) conditions, but whose outcome cannot be predicted in advance, even when the same experiment has been performed many times, is called a random experiment.


• Examples of random experiments
  – The tossing of a coin.
  – The roll of a die.
  – The selection of a numbered ball (1-50) from an urn (selection with replacement).
  – The time interval between the occurrences of two earthquakes higher than scale 6.
  – The amount of rainfall produced by typhoons in one year (yearly typhoon rainfall).


• The following items are always associated with a random experiment:
  – Sample space. The set of all possible outcomes, denoted by $\Omega$.
  – Outcomes. Elements of the sample space, denoted by $\omega$. These are also referred to as sample points or realizations.
  – Events. Subsets of $\Omega$ for which the probability is defined. Events are denoted by capital Latin letters (e.g., A, B, C).


Definition of Probability

• Classical probability
• Frequency probability
• Probability model


Classical (or a priori) probability

• If a random experiment can result in $n$ mutually exclusive and equally likely outcomes, and if $n_A$ of these outcomes have an attribute A, then the probability of A is the fraction $n_A/n$.


• Example 1.

Compute the probability of getting two heads if a fair coin is tossed twice. (1/4)

• Example 2.

Compute the probability that a card drawn from an ordinary well-shuffled deck will be an ace or a spade. (16/52)
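Both examples apply the classical definition $n_A/n$ directly. The sketch below enumerates each sample space explicitly and counts the outcomes that have the attribute, using exact fractions. It assumes the equally-likely conditions stated in the examples: a fair coin and a standard 52-card deck. The helper name `classical_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def classical_probability(outcomes, has_attribute):
    """n_A / n for a finite list of equally likely outcomes."""
    outcomes = list(outcomes)
    n_a = sum(1 for o in outcomes if has_attribute(o))
    return Fraction(n_a, len(outcomes))

# Example 1: a fair coin tossed twice -> {HH, HT, TH, TT}
tosses = list(product("HT", repeat=2))
p1 = classical_probability(tosses, lambda o: o == ("H", "H"))
print(p1)  # 1/4

# Example 2: one card from a well-shuffled 52-card deck
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))
p2 = classical_probability(deck, lambda c: c[0] == "A" or c[1] == "spades")
print(p2)  # 4/13, i.e. 16/52
```

In Example 2, the ace of spades is counted once, which is why the answer is 16/52 rather than 17/52.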


Remarks

• The probabilities determined by the classical definition are called “a priori” probabilities since they can be derived purely by deductive reasoning.


• The “equally likely” assumption requires the experiment to be carried out in such a way that the assumption is realistic, for example, by using a balanced coin, a die that is not loaded, a well-shuffled deck of cards, random sampling, and so forth. This assumption also requires that the sample space be appropriately defined.


• Troublesome limitations of the classical definition of probability:
  – The number of possible outcomes may be infinite;
  – The possible outcomes may not be equally likely.


Relative frequency (or a posteriori) probability

• We observe outcomes of a random experiment which is repeated many times. We postulate a number p which is the probability of an event, and approximate p by the relative frequency f with which the repeated observations satisfy the event.


• Suppose a random experiment is repeated $n$ times under uniform conditions, and event A occurs $n_A$ times. Then the relative frequency with which A occurs is $f_n(A) = n_A/n$. If the limit of $f_n(A)$ as $n$ approaches infinity exists, one can assign the probability of A by

$P(A) = \lim_{n \to \infty} f_n(A).$
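The limit cannot be observed directly, but a simulation gives a feel for statistical regularity. The sketch below uses Python's seeded pseudo-random generator as a stand-in for a fair coin. This is an assumption: the generator only imitates independent tosses. The code records $f_n(A)$ for A = "heads" after each toss. A finite run like this illustrates the behaviour of the limit; it does not prove it.

```python
import random

def relative_frequencies(n_max, p=0.5, seed=1):
    """Simulate n_max coin tosses and return f_n(heads) for n = 1, ..., n_max."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    freqs = []
    for n in range(1, n_max + 1):
        heads += rng.random() < p  # True counts as 1 head
        freqs.append(heads / n)
    return freqs

f = relative_frequencies(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(f[n - 1], 4))
```

The early values can be far from 0.5, while the later ones settle close to it.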


• This method requires the existence of the limit of the relative frequencies. This property is known as statistical regularity. This property will be satisfied if the trials are independent and are performed under uniform conditions.


• Example 3

A fair coin was tossed 100 times, and heads occurred 54 times. The probability of heads on each toss is therefore estimated to be 54/100 = 0.54.


• The chain of probability definition


Random experiment → Sample space → Event space → Probability space


Probability Model


Event and event space

An event is a subset of the sample space. The class of all events associated with a given random experiment is defined to be the event space.


Remarks


• Probability is a mapping of sets to numbers.
• Probability is not a mapping of sample points (elements of the sample space) to numbers.
  – The expression $P(\omega)$ for $\omega \in \Omega$ is not defined. However, for a singleton event $\{\omega\}$, $P(\{\omega\})$ is defined.


Probability space

• A probability space is the triplet $(\Omega, \mathcal{A}, P[\cdot])$, where $\Omega$ is a sample space, $\mathcal{A}$ is an event space, and $P[\cdot]$ is a probability function with domain $\mathcal{A}$.

• A probability space constitutes a complete probabilistic description of a random experiment.
  – The sample space defines all of the possible outcomes, the event space defines all possible things that could be observed as a result of an experiment, and the probability P defines the degree of belief or evidential support associated with the experiment.
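For a finite experiment, all three parts of the triplet can be written down explicitly. The sketch below builds a probability space for one roll of a die. It assumes the six outcomes are equally likely and takes the event space to be every subset of $\Omega$, which is a valid choice when $\Omega$ is finite. The function `P` is defined on events (sets), not on outcomes, matching the remark that $P(\{\omega\})$ is defined while $P(\omega)$ is not. All names are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for one roll of a fair die (hypothetical illustration)
omega = frozenset(range(1, 7))

# Event space: every subset of omega (the power set), 2**6 = 64 events
event_space = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def P(event):
    """Probability function: takes an event (a subset of omega), not an outcome."""
    if not event <= omega:
        raise ValueError("not an event of this probability space")
    return Fraction(len(event), len(omega))

print(len(event_space))         # 64
print(P(frozenset({2, 4, 6})))  # 1/2
print(P(frozenset({3})))        # 1/6, the singleton event {3}
```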


Conditional probability


Bayes’ theorem


Multiplication rule


Independent events


• The property of independence of two events A and B and the property that A and B are mutually exclusive are distinct, though related, properties.

• If A and B are mutually exclusive events, then $AB = \emptyset$; therefore, $P(AB) = 0$. Whereas, if A and B are independent events, then $P(AB) = P(A)P(B)$. Events A and B can be both mutually exclusive and independent only if $P(AB) = P(A)P(B) = 0$, that is, only if at least one of A and B has zero probability.


• But if A and B are mutually exclusive events and both have nonzero probabilities then it is impossible for them to be independent events.

• Likewise, if A and B are independent events and both have nonzero probabilities then it is impossible for them to be mutually exclusive.
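The distinction can be checked mechanically. Independence is the condition P(AB) = P(A)P(B), while mutual exclusivity is the condition AB = ∅. The sketch below assumes the 36 equally likely outcomes of two fair dice, and the three event pairs are hypothetical choices. Each pair illustrates one case from the bullets above.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for a predicate on outcomes, assuming equally likely outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def independent(a, b):
    """P(AB) == P(A) P(B)."""
    return prob(lambda w: a(w) and b(w)) == prob(a) * prob(b)

def mutually_exclusive(a, b):
    """AB is empty: no outcome belongs to both events."""
    return not any(a(w) and b(w) for w in omega)

def first_even(w): return w[0] % 2 == 0          # P = 1/2
def sum_is_7(w): return w[0] + w[1] == 7         # P = 1/6
def first_is_1(w): return w[0] == 1              # P = 1/6
def sum_is_13(w): return w[0] + w[1] == 13       # impossible, P = 0

cases = [
    ("first die even", first_even, "sum is 7", sum_is_7),      # independent, not exclusive
    ("first die even", first_even, "first die is 1", first_is_1),  # exclusive, not independent
    ("first die is 1", first_is_1, "sum is 13", sum_is_13),    # both: one event has P = 0
]
for name_a, a, name_b, b in cases:
    print(f"{name_a} / {name_b}: independent={independent(a, b)}, "
          f"exclusive={mutually_exclusive(a, b)}")
```

Only the last pair is both independent and mutually exclusive, and that is possible only because one of its events has zero probability.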


Summarizing data

• Qualitative data
  – Frequency table

    Color    Freq   Relative freq
    Red        14   0.156
    Green      16   0.178
    Blue       21   0.233
    Yellow      9   0.100
    White      25   0.278
    Pink        5   0.056
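Each relative frequency is the category's count divided by the total number of observations (here 14 + 16 + 21 + 9 + 25 + 5 = 90). The slides do not include the raw observations. The sketch below therefore rebuilds a list with the same counts and tabulates it with `collections.Counter`. The function name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Return (category, frequency, relative frequency) rows for qualitative data."""
    counts = Counter(observations)  # keeps categories in first-seen order
    n = sum(counts.values())
    return [(cat, f, f / n) for cat, f in counts.items()]

# Raw data are not in the slides; rebuild a list with the slide's counts.
slide_counts = {"Red": 14, "Green": 16, "Blue": 21, "Yellow": 9, "White": 25, "Pink": 5}
observations = [c for c, f in slide_counts.items() for _ in range(f)]

table = frequency_table(observations)
for cat, f, rel in table:
    print(f"{cat:<8}{f:>4}{rel:>8.3f}")
```

The printed rows match the table above. Because of rounding, the rounded relative frequencies need not sum to exactly 1.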


– Bar chart

[Figure: bar chart of the frequencies of Red, Green, Blue, Yellow, White and Pink; vertical axis 0 to 30]


• Quantitative data

– Histogram

35 57 43 78 77 77
75 88 86 78 79 41
33 88 75 72 75 77
73 50 50 24 60 40
60 87 59 73 83 90
85 88 33 65 82 31
78 95
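A histogram groups quantitative data into class intervals and shows the count in each interval. The sketch below tallies the 38 values above into bins of width 10 starting at 20. The bin width and the left-closed convention [a, b) are assumptions, since the slide does not state its binning. R's `hist()` uses right-closed intervals (a, b] by default, so counts for values that fall on bin edges (such as 40, 50, 60 or 90) can differ. The function name `histogram_counts` is hypothetical.

```python
scores = [35, 57, 43, 78, 77, 77, 75, 88, 86, 78, 79, 41, 33, 88, 75, 72, 75, 77,
          73, 50, 50, 24, 60, 40, 60, 87, 59, 73, 83, 90, 85, 88, 33, 65, 82, 31, 78, 95]

def histogram_counts(data, start, width, nbins):
    """Count observations in bins [start + k*width, start + (k+1)*width), k = 0..nbins-1."""
    counts = [0] * nbins
    for x in data:
        k = int((x - start) // width)
        if not 0 <= k < nbins:
            raise ValueError(f"{x} lies outside the binned range")
        counts[k] += 1
    return counts

counts = histogram_counts(scores, start=20, width=10, nbins=8)
for k, c in enumerate(counts):
    lo = 20 + 10 * k
    print(f"[{lo:3d}, {lo + 10:3d})  {'*' * c}")
```

With these bins, the 70s bin holds 13 of the 38 values, which makes the distribution visibly skewed toward the low scores (left-skewed).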


• Boxplot
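A boxplot is built from the five-number summary (minimum, first quartile, median, third quartile, maximum). Points lying more than 1.5 × IQR beyond the box are usually drawn individually as potential outliers. The sketch below computes both for the histogram data using `statistics.quantiles(..., method="inclusive")`, which interpolates linearly as R's default `quantile()` (type 7) does. Note that R's `boxplot()` draws the box from Tukey's hinges (`fivenum()`), which can differ slightly from these quartiles. The helper names are hypothetical.

```python
import statistics

scores = [35, 57, 43, 78, 77, 77, 75, 88, 86, 78, 79, 41, 33, 88, 75, 72, 75, 77,
          73, 50, 50, 24, 60, 40, 60, 87, 59, 73, 83, 90, 85, 88, 33, 65, 82, 31, 78, 95]

def five_number_summary(data):
    """(min, Q1, median, Q3, max) with linearly interpolated quartiles."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

def tukey_outliers(data, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot whisker rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lo or x > hi]

print(five_number_summary(scores))  # (24, 51.75, 75.0, 81.25, 95)
print(tukey_outliers(scores))       # []
```

For these data the fences are 7.5 and 125.5, so no point is flagged.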


• Dealing with outliers
  – Should the outliers be discarded, or should they be retained?
  – An example of outlier presence
    • Typhoon Morakot


Typhoon Morakot

• Cumulative rainfall (Aug 7, 0:00 – 24:00)


• Cumulative rainfall (Aug 8, 0:00 – 24:00)


• Cumulative rainfall (Aug 9, 0:00 – 24:00)


Cumulative rainfall in mm

• 2009/08/07 00:00 ~ 2009/08/09 17:00


Measures of Central Tendency

• Mean
  – Sum of the measurements divided by the number of measurements.
• Median
  – Middle value when the data are sorted; with an even number of measurements, the average of the two middle values.
• Mode
  – Value or category that occurs most frequently.
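The three measures can be computed with Python's `statistics` module. The sketch below applies them to the 38 quantitative values from the histogram slide and to the color counts from the frequency table. It also appends one hypothetical extreme value (500, not part of the slide data) to show how differently the mean and the median react to an outlier.

```python
import statistics
from collections import Counter

scores = [35, 57, 43, 78, 77, 77, 75, 88, 86, 78, 79, 41, 33, 88, 75, 72, 75, 77,
          73, 50, 50, 24, 60, 40, 60, 87, 59, 73, 83, 90, 85, 88, 33, 65, 82, 31, 78, 95]

mean = statistics.mean(scores)                # 2540 / 38, about 66.84
median = statistics.median(scores)            # average of 19th and 20th sorted values: 75.0
modes = sorted(statistics.multimode(scores))  # [75, 77, 78, 88], each appears 3 times

# Mode of qualitative data: the category with the highest frequency.
colors = Counter({"Red": 14, "Green": 16, "Blue": 21, "Yellow": 9, "White": 25, "Pink": 5})
color_mode = colors.most_common(1)[0][0]      # "White"

# Hypothetical outlier appended to the data.
with_outlier = scores + [500]
print(mean, median, modes, color_mode)
print(statistics.mean(with_outlier))    # about 77.95
print(statistics.median(with_outlier))  # 75
```

A single extreme value shifts the mean by about 11 points but leaves the median unchanged. This sensitivity is one reason the decision to keep or discard outliers matters. The example also shows that a data set can have several modes.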


Measures of Variation

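• Standard deviation – summarizes how far the data values typically are from the mean:

  $$ s_n = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}_n\right)^2} $$

  where \bar{x}_n is the sample mean of x_1, …, x_n.

• Range – the difference between the largest and smallest values:

  $$ R = x_{\max} - x_{\min} $$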
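Both formulas translate directly into code. The sketch below applies them to the 38 values from the histogram slide and cross-checks the standard deviation against `statistics.stdev`, which also uses the n − 1 divisor. The helper names `sample_sd` and `data_range` are hypothetical.

```python
import math
import statistics

scores = [35, 57, 43, 78, 77, 77, 75, 88, 86, 78, 79, 41, 33, 88, 75, 72, 75, 77,
          73, 50, 50, 24, 60, 40, 60, 87, 59, 73, 83, 90, 85, 88, 33, 65, 82, 31, 78, 95]

def sample_sd(xs):
    """s_n = sqrt(sum((x_i - xbar)^2) / (n - 1)); needs at least two observations."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def data_range(xs):
    """R = x_max - x_min."""
    return max(xs) - min(xs)

s = sample_sd(scores)
print(round(s, 2))                 # about 19.87
print(statistics.stdev(scores))    # same value, computed by the standard library
print(data_range(scores))          # 71
```

The range depends only on the two most extreme values. The standard deviation uses every observation, although it too is inflated by outliers because deviations are squared.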


Reading assignment

• IPSUR
  – Chapter 2
  – Chapter 3
    • 3.1.1, 3.1.3, 3.1.4
    • 3.3
    • 3.4.3, 3.4.4, 3.4.5, 3.4.6, 3.4.7

• AMIPS
  – Chapter 2
  – Chapter 3
