Statistical Inference - Lecture 1: Probability Theory
TRANSCRIPT
MING GAO
DaSE @ ECNU
Mar. 2, 2021
Outline
1 Introduction
2 Set Theory
3 Basics of Probability Theory
  The Calculus of Probabilities
  Counting
4 Random Variable and Distributions
  Random Variable
  Distribution Functions
  Density and Mass Functions
5 Take-aways
Introduction
Probability as a mathematical framework for:
reasoning about uncertainty;
deriving approaches to inference problems.
Set Theory
Experiment and sample space
Experiment
An experiment is a procedure that yields one of a given set of possible outcomes.
Sample space
The sample space of the experiment, denoted as Ω, is the set of possible outcomes.
Example
Roll a die one time: Ω = {1, 2, 3, 4, 5, 6}. Toss a coin twice (Head = H, Tail = T): Ω = {HH, HT, TH, TT}.
The "list" (set) of possible outcomes must be:
1 Mutually exclusive
2 Collectively exhaustive
Art: choosing the "right" granularity.
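As a quick sketch (not part of the original slides), these two sample spaces can be enumerated in Python:

```python
# Sketch: enumerating the sample spaces of the two example experiments.
from itertools import product

omega_die = {1, 2, 3, 4, 5, 6}                               # roll a die once
omega_coins = {"".join(p) for p in product("HT", repeat=2)}  # toss a coin twice

print(sorted(omega_die))    # [1, 2, 3, 4, 5, 6]
print(sorted(omega_coins))  # ['HH', 'HT', 'TH', 'TT']
```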
Continuous sample space
For this case, the sample space is Ω = {(x, y) | 0 ≤ x, y ≤ 1}. Note that this sample space is infinite and uncountable.
In this course we consider both countable and uncountable sample spaces; accordingly, we call the corresponding material discrete probability and continuous probability, respectively.
Event and set operators
An event, represented as a set, is a subset of the sample space.
Example
Toss at least one head: B = {HH, HT, TH} ⊂ Ω;
Toss at least three heads: C = ∅ ⊂ Ω.
There are 2^|Ω| events for an experiment;
Events therefore admit all set operations.
Operators
Containment: A ⊂ B ⇔ (x ∈ A ⇒ x ∈ B);
Union: A ∪ B = {x | x ∈ A or x ∈ B};
Intersection: A ∩ B = {x | x ∈ A and x ∈ B};
Difference: A − B = {x | x ∈ A ∧ x ∉ B};
Complement: Aᶜ = {x | x ∉ A}.
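Since events are just sets, these operators map directly onto Python's set type; a minimal sketch using the coin-toss events above:

```python
# Sketch: the event operators above, on the two-coin sample space.
omega = {"HH", "HT", "TH", "TT"}
B = {"HH", "HT", "TH"}         # toss at least one head
C = set()                      # toss at least three heads: the empty event

print(B <= omega)              # containment: B ⊂ Ω -> True
print(B | C)                   # union -> B itself
print(B & {"HH", "TT"})        # intersection -> {'HH'}
print(B - {"HH"})              # difference -> {'HT', 'TH'}
print(omega - B)               # complement Bᶜ within Ω -> {'TT'}
```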
Set identities
Table of set identities
equivalence                                               name
A ∩ U = A;  A ∪ ∅ = A                                     Identity laws
A ∩ A = A;  A ∪ A = A                                     Idempotent laws
(A ∩ B) ∩ C = A ∩ (B ∩ C);  (A ∪ B) ∪ C = A ∪ (B ∪ C)     Associative laws
A ∪ (A ∩ B) = A;  A ∩ (A ∪ B) = A                         Absorption laws
A ∩ Aᶜ = ∅;  A ∪ Aᶜ = U                                   Complement laws
Logical equivalence Cont'd
Table of set identities
equivalence                                               name
A ∪ U = U;  A ∩ ∅ = ∅                                     Domination laws
A ∩ B = B ∩ A;  A ∪ B = B ∪ A                             Commutative laws
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C);  (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)   Distributive laws
(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ;  (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ                   De Morgan's laws
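The identities in both tables can be spot-checked mechanically; a small sketch verifying De Morgan's laws on concrete sets:

```python
# Sketch: spot-checking De Morgan's laws on small concrete sets.
U = set(range(10))             # a small universe
A, B = {1, 2, 3}, {3, 4, 5}

assert U - (A & B) == (U - A) | (U - B)   # (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
assert U - (A | B) == (U - A) & (U - B)   # (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
print("De Morgan's laws hold on this example")
```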
Extensions of set operators
If A1, A2, · · · , An, · · · is a collection of events, all defined on a sample space Ω, then
Union: ⋃_{i=1}^∞ A_i = {x ∈ Ω | x ∈ A_i for some i};
Intersection: ⋂_{i=1}^∞ A_i = {x ∈ Ω | x ∈ A_i for all i}.
Example
Let Ω = (0, 1] and define A_i = [1/i, 1]. Then
⋃_{i=1}^∞ A_i = lim_{n→+∞} ⋃_{i=1}^n [1/i, 1] = lim_{n→+∞} [1/n, 1] = (0, 1];
⋂_{i=1}^∞ A_i = lim_{n→+∞} ⋂_{i=1}^n [1/i, 1] = lim_{n→+∞} [1, 1] = {1}.
Extensions of set operators Cont'd
If Γ is an uncountable index set, with events A_a (a ∈ Γ) all defined on a sample space Ω, then
Union: ⋃_{a∈Γ} A_a = {x ∈ Ω | x ∈ A_a for some a};
Intersection: ⋂_{a∈Γ} A_a = {x ∈ Ω | x ∈ A_a for all a}.
Example
Let Γ = ℝ⁺ and define A_a = (0, a]. Then
⋃_{a∈Γ} A_a = (0, +∞);
⋂_{a∈Γ} A_a = ∅.
Disjoint and partition
Two events A and B are disjoint (or mutually exclusive) if A ∩ B = ∅. The events A1, A2, · · · , An, · · · are pairwise disjoint (or mutually exclusive) if A_i ∩ A_j = ∅ for all i ≠ j.
If A1, A2, · · · , An, · · · are pairwise disjoint and ⋃_{i=1}^∞ A_i = Ω, then the collection A1, A2, · · · , An, · · · forms a partition of Ω.
The collection A_i = [i, i + 1) for i ∈ ℕ consists of pairwise disjoint sets. Furthermore, we have ⋃_{i∈ℕ} A_i = [0, +∞).
Hence the collection of sets A_i = [i, i + 1) for i ∈ ℕ forms a partition of [0, +∞).
Basics of Probability Theory
Sigma Algebra
A collection of subsets of Ω is called a sigma algebra (or Borel field), denoted as B, if it satisfies the following three properties:
∅ ∈ B;
If A ∈ B, then Aᶜ ∈ B;
If A1, A2, · · · ∈ B, then ⋃_{i=1}^∞ A_i ∈ B (B is closed under countable unions).
In addition, from De Morgan's Laws it follows that B is closed under countable intersections, i.e.,
(⋃_{i=1}^∞ A_i)ᶜ = ⋂_{i=1}^∞ A_iᶜ ∈ B.
Example
If Ω is finite or countable, the sigma algebra of Ω can be defined as
B = {A | A ⊆ Ω}.
If Ω has n elements, there are 2^n sets in B;
If Ω is countably infinite, B is uncountable: its cardinality is 2^ℵ0;
Let Ω = (−∞, +∞); then B is chosen to contain all sets of the form
[a, b], (a, b], (a, b) and [a, b)
for all real numbers a and b.
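For the finite case, this power-set sigma algebra can be built explicitly; a sketch:

```python
# Sketch: for finite Ω, the power set B = {A | A ⊆ Ω} is a sigma algebra with 2^n members.
from itertools import combinations

def power_set(omega):
    s = list(omega)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

B = power_set({1, 2, 3})
print(len(B))  # 2^3 = 8 sets, from ∅ up to Ω itself
```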
Probability function
Definition
Given a sample space Ω and an associated sigma algebra B, a probability function is a function P with domain B that satisfies
Nonnegativity: P(A) ≥ 0 for all A ∈ B;
Normalization: P(Ω) = 1 and P(∅) = 0;
Additivity: If A1, A2, · · · ∈ B are pairwise disjoint, then P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
Note that
The three properties given in the definition are usually referred to as the Axioms of Probability (or the Kolmogorov Axioms).
Any function P that satisfies the Axioms of Probability is called a probability function.
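A sketch (not from the slides) of a probability function on a finite Ω, with the axioms checked directly; the fair-die weights are just an example:

```python
# Sketch: a probability function on a finite Ω as a dict of outcome weights.
import math

weights = {o: 1/6 for o in range(1, 7)}   # a fair die as an example

def prob(event):
    # P(A) is the sum of outcome weights, so additivity holds by construction.
    return sum(weights[o] for o in event)

assert all(w >= 0 for w in weights.values())               # nonnegativity
assert math.isclose(prob(weights.keys()), 1.0)             # normalization: P(Ω) = 1
assert math.isclose(prob({1, 2}), prob({1}) + prob({2}))   # additivity on disjoint events
```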
Example of defining probability
Consider the simple experiment of tossing a fair coin, so Ω = {H, T}; hence a reasonable probability function satisfies
P({H}) = P({T}).
P({H, T}) = P({H}) + P({T}) = 1;
P({H}) = P({T}) = 1/2;
We can also define a probability function for an unfair coin, e.g., P({H}) = 1/3 and P({T}) = 2/3.
The Calculus of Probabilities
Probability operators
Operators
Let Ω be the sample space, and let A and B be two events:
1 If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B);
2 P(Aᶜ) = P(Ω) − P(A) = 1 − P(A);
3 If A and B are independent, then P(A ∩ B) = P(A) · P(B);
4 The conditional probability of A given B, denoted by P(A|B), is computed as P(A|B) = P(A ∩ B)/P(B).
Probability properties I
If P is a probability function and A and B are any two sets in B, then
P(∅) = 0, where ∅ is the empty set;
P(A) ≤ 1;
P(B ∩ Aᶜ) = P(B) − P(A ∩ B);
P(A ∪ B) = P(A) + P(B) − P(A ∩ B);
If A ⊂ B, then P(A) ≤ P(B).
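These properties are easy to sanity-check numerically; a sketch on an equally-likely die model:

```python
# Sketch: checking the listed properties on Ω = {1, ..., 6} with equally likely outcomes.
import math

P = lambda event: len(event) / 6
A, B = {1, 2, 3}, {2, 3, 4, 5}

assert math.isclose(P(B - A), P(B) - P(A & B))         # P(B ∩ Aᶜ) = P(B) − P(A ∩ B)
assert math.isclose(P(A | B), P(A) + P(B) - P(A & B))  # inclusion-exclusion
assert P(A) <= P(A | B) <= 1                           # A ⊂ A ∪ B and P ≤ 1
```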
Probability properties II
If P is a probability function and the collection C1, C2, · · · is a partition of Ω, then
P(A) = Σ_{i=1}^∞ P(A ∩ C_i);
P(⋃_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i) for any collection A1, A2, · · · .
To prove the second statement, we first construct a disjoint collection A*_1, A*_2, · · · , where
A*_1 = A1 and A*_i = A_i \ (⋃_{j=1}^{i−1} A_j) for i = 2, 3, · · ·
Running example of independence
Tossing coins
We toss a coin twice (Head = H, Tail = T), so Ω = {HH, HT, TH, TT}. We define three events:
1 A: the first toss is H;
2 B: the second toss is H;
3 C: the first and second tosses give the same result.
Hence, we have
P(A) = P(B) = P(C) = 1/2;
P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = |{HH}|/|Ω| = 1/4;
P(A ∩ B ∩ C) = |{HH}|/|Ω| = 1/4.
Thus A, B, and C are pairwise independent, but P(A ∩ B ∩ C) = 1/4 ≠ 1/8 = P(A)P(B)P(C), so they are not mutually independent.
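A sketch verifying this pairwise-but-not-mutual independence by enumeration:

```python
# Sketch: pairwise independence without mutual independence, checked by enumeration.
omega = {"HH", "HT", "TH", "TT"}
A = {w for w in omega if w[0] == "H"}      # first toss is H
B = {w for w in omega if w[1] == "H"}      # second toss is H
C = {w for w in omega if w[0] == w[1]}     # both tosses give the same result
P = lambda e: len(e) / len(omega)

print(P(A & B) == P(A) * P(B))             # True  (pairwise independent)
print(P(A & C) == P(A) * P(C))             # True
print(P(A & B & C) == P(A) * P(B) * P(C))  # False (1/4 != 1/8)
```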
Independence
Definition
Events A1 and A2 are pairwise independent (statistically independent) if and only if
P(A1 ∩ A2) = P(A1)P(A2);
Events A1, A2, · · · , An are mutually independent if
P(A_{i1} ∩ A_{i2} ∩ · · · ∩ A_{im}) = P(A_{i1})P(A_{i2}) · · · P(A_{im}),
where the i_j, j = 1, 2, · · · , m, are integers with 1 ≤ i1 < i2 < · · · < im ≤ n and m ≥ 2.
Note that mutual independence implies pairwise independence, but pairwise independence need not imply mutual independence (as shown in the previous example).
Independence Cont'd
Theorem
If events A and B are independent, then
A and Bᶜ are independent;
Aᶜ and B are independent;
Aᶜ and Bᶜ are independent.
Proof.
P(A) = P(A ∩ (B ∪ Bᶜ)) = P((A ∩ B) ∪ (A ∩ Bᶜ)) = P(A ∩ B) + P(A ∩ Bᶜ), so
P(A ∩ Bᶜ) = P(A) − P(A ∩ B) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)P(Bᶜ).
Hence, A and Bᶜ are independent; the other cases are analogous.
Conditional probability
Definition
Let E and F be events with P(F) > 0. The conditional probability of E given F, denoted by P(E|F), is defined as
P(E|F) = P(E ∩ F)/P(F).
P(E|F) is the probability of E, given that F occurred;
F is our new sample space;
P(E|F) is undefined if P(F) = 0.
Example
Question: What is the conditional probability that a family with two children has two boys, given that they have at least one boy? Assume that each of the possibilities BB, BG, GB, and GG is equally likely.
Solution: Let A be the event that a family with two children has two boys, and let B be the event that a family with two children has at least one boy.
Thus, A = {BB}, B = {BB, BG, GB}, and A ∩ B = {BB}.
Because the four possibilities are equally likely, it follows that P(B) = 3/4 and P(A ∩ B) = 1/4.
We conclude that
P(A|B) = P(A ∩ B)/P(B) = (1/4)/(3/4) = 1/3.
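The same answer falls out of direct enumeration; a sketch:

```python
# Sketch: the two-children conditional probability by direct enumeration.
omega = ["BB", "BG", "GB", "GG"]          # equally likely families
B = [f for f in omega if "B" in f]        # at least one boy
A_and_B = [f for f in B if f == "BB"]     # two boys, restricted to B

print(len(A_and_B) / len(B))              # 0.333... = 1/3
```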
Remarks for conditional probability
P(E|F) = P(E) if events E and F are independent;
P(E ∩ F) = P(E) · P(F|E) = P(F) · P(E|F);
P(E) = P(F1) · P(E|F1) + P(F2) · P(E|F2) + P(F3) · P(E|F3) if F1, F2 and F3 form a partition of Ω (total probability theorem);
P(F_i|E) = P(F_i) · P(E|F_i) / P(E) = P(F_i) · P(E|F_i) / Σ_i P(F_i) · P(E|F_i) (Bayes' rule).
Proof of the total probability theorem
Theorem
Let F_i (for i = 1, 2, · · · , n) be a partition of the sample space Ω. Then, for any event E,
P(E) = Σ_{i=1}^n P(F_i) · P(E|F_i).
Proof:
P(E) = P(E ∩ Ω) = P(E ∩ (F1 ∪ F2 ∪ · · · ∪ Fn))
     = P((E ∩ F1) ∪ (E ∩ F2) ∪ · · · ∪ (E ∩ Fn))
     = P(E ∩ F1) + P(E ∩ F2) + · · · + P(E ∩ Fn)
     = Σ_{i=1}^n P(F_i) · P(E|F_i).
Bayes' Theorem
Theorem
Suppose that E is an event from a sample space Ω and F1, F2, · · · , Fn is a partition of the sample space. Let P(E) ≠ 0 and P(F_i) ≠ 0 for all i. Then
P(F_i|E) = P(E|F_i)P(F_i) / Σ_{k=1}^n P(E|F_k)P(F_k).
Proof: By the definition of conditional probability, P(F_i|E) = P(F_i ∩ E)/P(E) = P(E|F_i)P(F_i)/P(E); expanding P(E) by the total probability theorem gives the result.
Diagnostic test for a rare disease
Suppose that one in 100,000 persons has a particular rare disease for which there is a fairly accurate diagnostic test. This test is correct 99.0% of the time when given to a person selected at random who has the disease; it is correct 99.5% of the time when given to a person selected at random who does not have the disease. Please find
(a) the probability that a person who tests positive for the disease has the disease;
(b) the probability that a person who tests negative for the disease does not have the disease.
Should a person who tests positive be very concerned that he or she has the disease?
Solution: Let B be the event that a person selected at random has the disease, and let A be the event that a person selected at random tests positive for the disease.
Diagnostic test for a rare disease Cont'd
Hence, we have P(B) = 1/100,000 = 10⁻⁵. Then we also have P(A|B) = 0.99, P(Aᶜ|B) = 0.01, P(Aᶜ|Bᶜ) = 0.995, and P(A|Bᶜ) = 0.005.
Case a: By Bayes' theorem, we have
P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ))
       = 0.99 · 10⁻⁵ / (0.99 · 10⁻⁵ + 0.005 · 0.99999) ≈ 0.002
Case b: Similarly, we have
P(Bᶜ|Aᶜ) = P(Aᶜ|Bᶜ)P(Bᶜ) / (P(Aᶜ|Bᶜ)P(Bᶜ) + P(Aᶜ|B)P(B)) ≈ 0.9999999
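A sketch reproducing these numbers with Bayes' theorem:

```python
# Sketch: the rare-disease posterior via Bayes' theorem.
p_disease = 1e-5                  # P(B)
p_pos_given_disease = 0.99        # P(A|B)
p_pos_given_healthy = 0.005       # P(A|Bᶜ) = 1 − 0.995

posterior = (p_pos_given_disease * p_disease) / (
    p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease))
print(posterior)                  # ≈ 0.00198, i.e., about 0.002
```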
Bayesian spam filters
Most electronic mailboxes receive a flood of unwanted and unsolicited messages, known as spam. On the Internet, an Internet Water Army is a group of Internet ghostwriters paid to post online comments with particular content.
Question: How can we detect spam email?
Solution: Bayesian spam filters look for occurrences of particular words in messages. For a particular word w, the probability that w appears in a spam e-mail message is estimated by counting the number of times w appears in a large set of messages known to be spam and the number of times it appears in a large set of messages known not to be spam.
Step 1: Collect ground truth. Suppose we have a set B of messages known to be spam and a set G of messages known not to be spam.
Bayesian spam filters Cont'd
Step 2: Learn parameters. We next identify the words that occur in B and in G. Let n_B(w) and n_G(w) be the number of messages containing word w in sets B and G, respectively.
Let p(w) = n_B(w)/|B| and q(w) = n_G(w)/|G| be the empirical probabilities that a spam message and a non-spam message contain word w, respectively.
Step 3: Make a decision. Now suppose we receive a new e-mail message containing word w. Let B be the event that the message is spam, and let A be the event that the message contains word w.
By Bayes' theorem, the probability that the message is spam, given that it contains word w, is
P(B|A) = P(A|B)P(B) / (P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)).
Bayesian spam filters Cont'd
To apply the above formula, we first estimate P(B) (probability of spam) and P(Bᶜ) (probability of not spam).
Without prior knowledge, for simplicity we assume that a message is equally likely to be spam as not to be spam, i.e., P(B) = P(Bᶜ) = 1/2.
Using this assumption, we find that the probability that a message is spam, given that it contains word w, is
P(B|A) = P(A|B) / (P(A|B) + P(A|Bᶜ)).
With P(A|B) and P(A|Bᶜ) estimated as above, P(B|A) can be estimated by
r(w) = p(w) / (p(w) + q(w)).
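A sketch of this estimate with hypothetical counts (the numbers are illustrative, not from the lecture):

```python
# Sketch: estimating r(w) from hypothetical ground-truth counts.
n_spam_with_w, n_spam = 250, 1000   # w appears in 250 of 1000 known spam messages
n_good_with_w, n_good = 5, 1000     # w appears in 5 of 1000 known good messages

p_w = n_spam_with_w / n_spam        # p(w)
q_w = n_good_with_w / n_good        # q(w)
print(p_w / (p_w + q_w))            # r(w) ≈ 0.98 under these counts
```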
Extended Bayesian spam filters
The more words we use to estimate the probability that an incoming mail message is spam, the better our chance of correctly determining whether it is spam.
In general, if A_i is the event that the message contains word w_i, and assuming that P(S) = P(Sᶜ) and that the events A_i are conditionally independent given the class, then by Bayes' theorem the probability that a message containing all of the words w1, w2, · · · , wk is spam is
P(S | ⋂_{i=1}^k A_i) = P(⋂_{i=1}^k A_i | S) P(S) / (P(⋂_{i=1}^k A_i | S) P(S) + P(⋂_{i=1}^k A_i | Sᶜ) P(Sᶜ))
                     = Π_{i=1}^k P(A_i|S) / (Π_{i=1}^k P(A_i|S) + Π_{i=1}^k P(A_i|Sᶜ))
                     ≈ Π_{i=1}^k p(w_i) / (Π_{i=1}^k p(w_i) + Π_{i=1}^k q(w_i)) = r(w1, w2, · · · , wk).
Naive Bayes
Why is this called Naive Bayes?
The model repeatedly applies the definition of conditional probability via the chain rule, together with the "naive" assumption above that the words are conditionally independent given the class.
To handle underflow, we calculate Π_{i=1}^n P(X_i|S) = exp(Σ_{i=1}^n log P(X_i|S)).
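A sketch of why the log-space trick is needed: the direct product of many small factors underflows, while the sum of logs stays representable (in practice one then compares log-probabilities across classes rather than exponentiating back):

```python
# Sketch: products of many small probabilities underflow; sums of logs do not.
import math

probs = [0.01] * 300                      # 300 tiny per-word factors

print(math.prod(probs))                   # 0.0 — underflows (true value is 1e-600)
print(sum(math.log(p) for p in probs))    # ≈ -1381.55 — safe to compare across classes
```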
Counting
Counting principle I
Product rule
Suppose that a procedure consists of a sequence of two tasks. If there are n1 ways to do the first task and n2 ways to do the second task, then there are n1·n2 ways to do the procedure.
Extended version: A procedure consists of tasks T1, T2, · · · , Tm performed in sequence. If each task T_i can be done in n_i ways, independently of the others, then there are n1 · n2 · · · · · nm ways to carry out the procedure.
Sum rule
If a task can be done either in one of n1 ways or in one of n2 ways, where none of the n1 ways is the same as any of the n2 ways, then there are n1 + n2 ways to do the task.
Extended version: A procedure can be done in one of m ways, where way W_i has n_i possibilities; then there are Σ_{i=1}^m n_i ways to do the procedure.
Counting principle II
Division rule
There are n/d ways to do a task if it can be done using a procedure that can be carried out in n ways, and for every way w, exactly d of the n ways correspond to way w.
Inclusion-exclusion rule
If a task can be done in either n1 ways or n2 ways, then the number of ways to do the task is n1 + n2 minus the number of ways common to the two. This rule is also called the principle of inclusion-exclusion, i.e.,
|A ∪ B| = |A| + |B| − |A ∩ B|.
Permutations and combinations
Permutations
An ordered arrangement of m elements of a set is called an m-permutation. The number of m-permutations of a set with n distinct elements is
P(n, m) = n(n − 1)(n − 2) · · · · · (n − m + 1) = n!/(n − m)!.
Combinations
An m-combination of elements of a set is an unordered selection of m elements from the set. The number of m-combinations of a set with n elements equals
C(n, m) = n!/(m!(n − m)!),
where C(n, m) is also denoted as (n choose m).
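Both counts are available directly in Python's standard library; a quick sketch:

```python
# Sketch: permutation and combination counts from the standard library.
import math

print(math.perm(5, 3))   # P(5, 3) = 5!/2! = 60
print(math.comb(5, 3))   # C(5, 3) = 5!/(3!·2!) = 10
# An m-permutation is an m-combination followed by an ordering of the m elements:
print(math.perm(5, 3) == math.comb(5, 3) * math.factorial(3))  # True
```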
Finite probability
If S is a finite nonempty sample space of equally likely outcomes, and E is an event, that is, a subset of S, then the probability of E is
p(E) = |E|/|S|.
Let all outcomes be equally likely;
Computing probabilities ≡ two counting problems:
counting the successful ways of the event;
counting the size of the sample space.
Examples
Example I
Question: An urn contains four blue balls and five red balls. What is the probability that a ball chosen at random from the urn is blue?
Solution: Let S be the sample space of the nine (distinguishable) balls, say
S = {b1, b2, b3, b4, r1, r2, r3, r4, r5}.
Let E be the event of choosing a blue ball, i.e.,
E = {b1, b2, b3, b4}.
In terms of the definition, we can compute the probability as
P(E) = |E|/|S| = 4/9.
Examples Cont'd
Example II
Question: What is the probability that when two dice are rolled, the sum of the numbers on the two dice is 7?
Solution:
There are a total of 36 possible outcomes when two dice are rolled.
There are six successful outcomes, namely, (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), and (6, 1).
Hence, the probability that a seven comes up when two fair dice are rolled is 6/36 = 1/6.
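A sketch confirming the count by enumerating all 36 outcomes:

```python
# Sketch: the dice-sum probability by brute-force enumeration.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
sevens = [o for o in outcomes if sum(o) == 7]
print(len(sevens), len(outcomes), len(sevens) / len(outcomes))  # 6 36 0.1666...
```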
Examples Cont'd
Example III
Question: In a lottery, players win a large prize when they pick four random digits that match, in the correct order. A smaller prize is won if only three digits are matched. What is the probability that a player wins the large prize? What is the probability that a player wins the small prize?
Solution: By the product rule, there are 10^4 = 10,000 ways to choose four digits.
Large prize case: There is only one way to choose all four digits correctly. Thus, the probability is 1/10,000 = 0.0001.
Small prize case: Exactly one digit must be wrong to get three digits correct, but not all four. Hence, there are C(4, 1) × 9 = 36 ways to choose four digits with exactly three of the four digits correct. Thus, the probability that a player wins the smaller prize is 36/10,000 = 9/2500 = 0.0036.
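A sketch confirming the small-prize count by brute force (the winning digits are arbitrary):

```python
# Sketch: counting tickets matching exactly three of four digits.
winning = "1234"                               # any fixed winning string works
tickets = (f"{i:04d}" for i in range(10_000))
exactly_three = [t for t in tickets
                 if sum(a == b for a, b in zip(t, winning)) == 3]
print(len(exactly_three))                      # 36, so probability 36/10000 = 0.0036
```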
Basics of Probability Theory Counting
Examples Cont’d
Example IV
Question: Find the probabilities that a poker hand contains four cards of one kind, or a full house (i.e., three of one kind and two of another kind).
Solution: There are C(52, 5) different hands of five cards.
Case I: # hands of five cards with four cards of one kind is

C(13, 1)C(4, 4)C(48, 1).

Case II: # hands with three of one kind and two of another kind is

P(13, 2)C(4, 3)C(4, 2).

Hence, the probabilities are

C(13, 1)C(4, 4)C(48, 1) / C(52, 5) ≈ 0.00024,
P(13, 2)C(4, 3)C(4, 2) / C(52, 5) ≈ 0.0014.
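These quantities are easy to evaluate with Python's math.comb and math.perm (available since Python 3.8):

```python
from math import comb, perm

hands = comb(52, 5)                                   # C(52, 5) five-card hands
four_kind = comb(13, 1) * comb(4, 4) * comb(48, 1)    # C(13,1)C(4,4)C(48,1)
full_house = perm(13, 2) * comb(4, 3) * comb(4, 2)    # P(13,2)C(4,3)C(4,2)

print(four_kind / hands)    # ≈ 0.00024
print(full_house / hands)   # ≈ 0.0014
```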
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 43 / 61
Random Variable and Distributions Random Variable
Outline
1 Introduction
2 Set Theory
3 Basics of Probability Theory
The Calculus of Probabilities
Counting
4 Random Variable and Distributions
Random Variable
Distribution Functions
Density and Mass Functions
5 Take-aways
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 44 / 61
Random Variable and Distributions Random Variable
Random variables
Definition: A random variable (r.v.) X is a function from the sample space Ω of an experiment to the set of real numbers R, i.e.,

∀ω ∈ Ω, X(ω) = x ∈ R.
Remarks
Note that a random variable is a function. It is not a variable,and it is not random!
We usually use notation X, Y, etc. to represent a r.v., and x, y to represent the numerical values. For example, X = x means that r.v. X has value x.
The domain of the function can be countable or uncountable. If it is countable, the random variable is a discrete r.v.; otherwise it is a continuous r.v.
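A minimal sketch of the definition for the two-toss experiment below (the names omega and X are illustrative):

```python
omega = ["HH", "HT", "TH", "TT"]        # sample space for two coin tosses

def X(outcome: str) -> int:
    """The r.v.: number of heads in the outcome."""
    return outcome.count("H")

print({w: X(w) for w in omega})          # {'HH': 2, 'HT': 1, 'TH': 1, 'TT': 0}
```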
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 45 / 61
Random Variable and Distributions Random Variable
Examples of r.v.
A coin is tossed. If X is the r.v. whose value is the number of heads obtained, then X(H) = 1, X(T) = 0.
And then tossed again. We define sample space Ω = {HH, HT, TH, TT}. If Y is the r.v. whose value is the number of heads obtained, then

Y(HH) = 2, Y(HT) = Y(TH) = 1, Y(TT) = 0.

When a player rolls a die, he will win $1 if the outcome is 1, 2 or 3, and otherwise lose $1. Let Ω = {1, 2, 3, 4, 5, 6} and define X as follows:

X(1) = X(2) = X(3) = 1, X(4) = X(5) = X(6) = −1.
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 46 / 61
Random Variable and Distributions Random Variable
Random variables VS. events
Suppose now that a sample space Ω = {ω1, ω2, · · · , ωn} is given, and the r.v. X on Ω is defined as the number of heads obtained when we toss a coin twice.
Event E1 represents exactly one head obtained. Hence,

E1 = {ω : X(ω) = 1};

Event E2 represents an even number of heads obtained. Hence,

E2 = {ω : X(ω) mod 2 = 0};

Event E3 represents at least one head obtained. Hence,

E3 = {ω : X(ω) > 0}.
These indicate that we can also define probabilities in terms of r.v.s, as in the sketch below.
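Each of these events is just a preimage under X (a self-contained sketch of the two-toss example; names are illustrative):

```python
omega = ["HH", "HT", "TH", "TT"]              # two coin tosses
X = lambda w: w.count("H")                     # r.v.: number of heads

E1 = {w for w in omega if X(w) == 1}           # exactly one head
E2 = {w for w in omega if X(w) % 2 == 0}       # an even number of heads
E3 = {w for w in omega if X(w) > 0}            # at least one head
print(E1, E2, E3)
```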
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 47 / 61
Random Variable and Distributions Distribution Functions
Outline
1 Introduction
2 Set Theory
3 Basics of Probability Theory
The Calculus of Probabilities
Counting
4 Random Variable and Distributions
Random Variable
Distribution Functions
Density and Mass Functions
5 Take-aways
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 48 / 61
Random Variable and Distributions Distribution Functions
Cumulative distribution function
The cumulative distribution function or cdf of a r.v. X, denoted by FX(x), is defined by

FX(x) = PX(X ≤ x), for all x.
Consider the experiment of tossing three fair coins, and let X = # heads observed. The cdf of X is

FX(x) =
  0,   if −∞ < x < 0;
  1/8, if 0 ≤ x < 1;
  1/2, if 1 ≤ x < 2;
  7/8, if 2 ≤ x < 3;
  1,   if 3 ≤ x < +∞.
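A sketch computing these cdf values by enumerating the 8 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))     # 8 equally likely outcomes

def F(x):
    """FX(x) = P(X <= x), where X = number of heads."""
    hits = [w for w in omega if w.count("H") <= x]
    return Fraction(len(hits), len(omega))

print([F(x) for x in [-1, 0, 1, 2, 3]])   # [0, 1/8, 1/2, 7/8, 1]
```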
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 49 / 61
Random Variable and Distributions Distribution Functions
Theorem
The function F(x) is a cdf if and only if the following three conditions hold:
lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1;
F(x) is a nondecreasing function of x;
F(x) is right-continuous; that is, for every number x0, lim_{x↓x0} F(x) = F(x0).
Example
Suppose we do an experiment that consists of tossing a coin until a head appears. Let p = probability of a head on any given toss, and define a r.v. X = # tosses required to get a head. Then for any x = 1, 2, · · · ,

P(X ≤ x) = ∑_{i=1}^{x} (1 − p)^{i−1} p = 1 − (1 − p)^x.
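A quick exact check of this closed form (a sketch; the value p = 1/3 is an arbitrary illustration):

```python
from fractions import Fraction

p = Fraction(1, 3)                         # illustrative head probability
for x in range(1, 6):
    lhs = sum((1 - p) ** (i - 1) * p for i in range(1, x + 1))   # the sum
    rhs = 1 - (1 - p) ** x                                        # the closed form
    assert lhs == rhs
print("identity holds for x = 1..5")
```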
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 50 / 61
Random Variable and Distributions Distribution Functions
Example Cont’d
Given 0 < p < 1, we have
lim_{x→−∞} FX(x) = 0 (indeed FX(x) = 0 for x < 1) and lim_{x→+∞} FX(x) = lim_{x→+∞} 1 − (1 − p)^x = 1;
1 − (1 − p)^x is a nondecreasing function of x;
For any x, FX(x + ε) = FX(x) if ε > 0 is sufficiently small, since the cdf jumps only at the integers. Hence,

lim_{ε↓0} FX(x + ε) = FX(x).
FX (x) is the cdf of a distribution called the Geometric distribution.
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 51 / 61
Random Variable and Distributions Distribution Functions
Example Cont’d
An example of a continuous cdf is the function

FX(x) = 1 / (1 + e^{−x}).

lim_{x→−∞} FX(x) = 0 since lim_{x→−∞} e^{−x} = ∞;
lim_{x→+∞} FX(x) = 1 since lim_{x→+∞} e^{−x} = 0;
Differentiating FX(x) gives

(d/dx) FX(x) = e^{−x} / (1 + e^{−x})^2 > 0;
FX (x) is not only right-continuous, but also continuous.
This is a special case of the logistic distribution.
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 52 / 61
Random Variable and Distributions Distribution Functions
Continuous and discrete r.v.s
A r.v. X is continuous if FX(x) is a continuous function of x, and a r.v. X is discrete if FX(x) is a step function of x.
The r.v.s X and Y are identically distributed if, for every set A ∈ B,

P(X ∈ A) = P(Y ∈ A).
Note that two r.v.s that are identically distributed are not necessarilyequal.
Theorem
The following two statements are equivalent:
The r.v.s X and Y are identically distributed;
FX (x) = FY (x) for all x .
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 53 / 61
Random Variable and Distributions Distribution Functions
Example
Note that two r.v.s that are identically distributed are not necessarilyequal.
For example, consider an example of tossing a fair coin three times. Define r.v.s

X = # heads observed and Y = # tails observed.

We have P(X = k) = P(Y = k) for k = 0, 1, 2, 3, i.e., X and Y are identically distributed. However, since X(ω) + Y(ω) = 3 for every outcome, we do not have X(ω) = Y(ω) for any ω ∈ Ω.
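A sketch verifying both claims for three tosses:

```python
from collections import Counter
from itertools import product

omega = list(product("HT", repeat=3))          # three fair-coin tosses
X = lambda w: w.count("H")                      # number of heads
Y = lambda w: w.count("T")                      # number of tails

print(Counter(map(X, omega)) == Counter(map(Y, omega)))  # True: same distribution
print(any(X(w) == Y(w) for w in omega))                   # False: never equal pointwise
```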
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 54 / 61
Random Variable and Distributions Density and Mass Functions
Outline
1 Introduction
2 Set Theory
3 Basics of Probability Theory
The Calculus of Probabilities
Counting
4 Random Variable and Distributions
Random Variable
Distribution Functions
Density and Mass Functions
5 Take-aways
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 55 / 61
Random Variable and Distributions Density and Mass Functions
Probability mass function
For all x, the probability mass function (pmf) of a discrete r.v. X on Ω is given by fX(x) = P(X = x).
For the geometric distribution, we have the pmf

fX(x) = P(X = x) =
  (1 − p)^{x−1} p, for x = 1, 2, · · · ;
  0, otherwise.

Recall that P(X = x) or, equivalently, fX(x) is the size of the jump in the cdf at x.
We can use the pmf to calculate probabilities: for positive integers a and b ≥ a, we have

P(a ≤ X ≤ b) = ∑_{k=a}^{b} fX(k) = ∑_{k=a}^{b} (1 − p)^{k−1} p.
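A sketch evaluating this interval probability two ways: by the pmf sum and by the cdf difference FX(b) − FX(a − 1) (the parameter p = 1/4 and the interval [2, 5] are arbitrary illustrations):

```python
from fractions import Fraction

p = Fraction(1, 4)                              # illustrative parameter
f = lambda k: (1 - p) ** (k - 1) * p            # geometric pmf
F = lambda x: 1 - (1 - p) ** x                  # geometric cdf at integers

a, b = 2, 5
print(sum(f(k) for k in range(a, b + 1)))       # pmf sum over a..b
print(F(b) - F(a - 1))                          # same value via the cdf
```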
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 56 / 61
Random Variable and Distributions Density and Mass Functions
Examples of pmf
Question: Let X be the sum of the numbers that appear when a pair of dice is rolled. What are the values and probabilities of this random variable over the 36 possible outcomes (i, j), when the two dice are rolled?
Solution:

value  prob.    value  prob.
2      1/36     8      5/36
3      1/18     9      1/9
4      1/12     10     1/12
5      1/9      11     1/18
6      5/36     12     1/36
7      1/6
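The whole table can be recovered by counting outcomes (a minimal sketch):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))        # the 36 outcomes (i, j)
counts = Counter(i + j for i, j in rolls)           # how often each sum occurs

pmf = {v: Fraction(c, 36) for v, c in sorted(counts.items())}
print(pmf)   # {2: 1/36, 3: 1/18, ..., 7: 1/6, ..., 12: 1/36}
```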
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 57 / 61
Random Variable and Distributions Density and Mass Functions
Probability density function
If we naively try to calculate P(X = x) for a continuous r.v. and any ε > 0, we have

P(X = x) ≤ P(x − ε < X ≤ x) = FX(x) − FX(x − ε).

Therefore,

0 ≤ P(X = x) ≤ lim_{ε↓0} [FX(x) − FX(x − ε)] = 0.

Definition

The probability density function or pdf, fX(x), of a continuous r.v. X is the function that satisfies

FX(x) = ∫_{−∞}^{x} fX(t) dt for all x.
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 58 / 61
Random Variable and Distributions Density and Mass Functions
Remarks of pdf
“X has a distribution given by FX(x)” is abbreviated symbolically by “X ∼ FX(x)” (or “X ∼ fX(x)”).
Given a continuous r.v. X, since P(X = x) = 0, we have

P(a < X < b) = P(a ≤ X < b) = P(a ≤ X ≤ b) = P(a < X ≤ b).

For the logistic distribution, FX(x) = 1 / (1 + e^{−x}). We have

fX(x) = (d/dx) FX(x) = e^{−x} / (1 + e^{−x})^2.

P(a < X < b) = FX(b) − FX(a) = ∫_{−∞}^{b} fX(t) dt − ∫_{−∞}^{a} fX(t) dt = ∫_{a}^{b} fX(t) dt.
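A numeric sketch of this identity for the logistic distribution, comparing the cdf difference with a crude midpoint Riemann sum of the pdf (the endpoints a, b are arbitrary illustrations):

```python
import math

F = lambda x: 1 / (1 + math.exp(-x))                    # logistic cdf
f = lambda x: math.exp(-x) / (1 + math.exp(-x)) ** 2    # its pdf

a, b = -1.0, 2.0
closed = F(b) - F(a)                                    # P(a < X < b) via the cdf

n = 100_000                                             # midpoint rule over (a, b)
h = (b - a) / n
riemann = sum(f(a + (i + 0.5) * h) for i in range(n)) * h

print(closed, riemann)                                  # agree to many decimals
```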
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 59 / 61
Random Variable and Distributions Density and Mass Functions
Theorem
A function fX(x) is a pdf (or pmf) of a r.v. X if and only if
fX(x) ≥ 0 for all x;
∑_x fX(x) = 1 (pmf) or ∫_{−∞}^{+∞} fX(x) dx = 1 (pdf).

Actually, any nonnegative function with a finite positive integral (or sum) can be turned into a pdf or pmf. For example, if h(x) is any nonnegative function that is positive on a set A, 0 otherwise, and

∫_{x∈A} h(x) dx = K < ∞

for some constant K > 0, then the function fX(x) = h(x)/K is a pdf of a r.v. X taking values in A.
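A sketch of this normalization under the assumed choice h(x) = e^{−2x} on A = [0, ∞), for which K = ∫_0^∞ e^{−2x} dx = 1/2:

```python
import math

h = lambda x: math.exp(-2 * x)     # assumed nonnegative function on A = [0, ∞)
K = 0.5                            # its integral over A
f = lambda x: h(x) / K             # the resulting pdf, 2 e^{-2x}

# numeric check that f integrates to 1 over [0, 50] (the tail beyond is negligible)
n, b = 200_000, 50.0
print(sum(f((i + 0.5) * b / n) for i in range(n)) * b / n)  # ≈ 1.0
```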
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 60 / 61
Take-aways
Take-aways
Conclusions
Introduction
Set Theory
Basics of Probability Theory
The Calculus of Probabilities
Counting
Random Variable and Distributions
Random Variable
Distribution Functions
Density and Mass Functions
MING GAO (DaSE@ECNU) Statistical Inference Mar. 2, 2021 61 / 61