
  • Lecture #3: Probability Theory

    Website: http://ncbs.knu.ac.kr

    Email: [email protected] of IT Engineering

    Kyungpook National University

    Kalyana C. Veluvolu (#IT1-817), OPTIMAL CONTROL - ELEC732001

    http://ncbs.knu.ac.kr/Teaching/index.htm


  • Why Probability Theory?

    • In this course, our objective is to estimate a system's outcome (signal); to do so, we will try to extract meaningful information about the outcome and its uncertainties (noise characteristics).

    • In order to accomplish this, we need to know about the uncertainty, its characteristics, and how it can be handled.

    • Various real-world situations that involve uncertainty are:
      • throwing of dice;
      • measuring a physical parameter such as length, current, temperature, etc.;
      • sampling a batch of manufactured items.


  • Why Probability Theory?

    • Probability theory is a universally accepted tool for expressing degrees of confidence or doubt about some proposition (outcome) in the presence of uncertainty or randomness.

    • This lecture reviews probability theory.


  • Outline

    1 Basic concepts of Probability

    2 Random Variables

    3 Transformations of random variables

    4 Multiple Random Variables
      4.1 Statistical Independence

    5 Stochastic Process

    6 White noise


  • Definitions: Sample space

    Probability theory starts with the consideration of a sample space. The sample space is the set of all possible outcomes in any physical experiment.

    For example, if a coin is tossed twice and after each toss the face that shows is recorded, then the possible outcomes of this particular experiment are

    HH, HT, TH, TT

    with H denoting the occurrence of heads and T of tails. Then

    S = {HH, HT, TH, TT}    (1)

    is called the sample space of this coin-tossing experiment.
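    A minimal sketch of enumerating this sample space programmatically (Python, standard library only):

```python
from itertools import product

# Sample space for two tosses of a coin: all ordered pairs of H and T.
sample_space = set(product("HT", repeat=2))

print(sample_space)       # {('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')}
print(len(sample_space))  # 4
```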


  • Definitions

    In general, a sample space is

    • finite or countably infinite: for example, toss a coin until the first time heads shows up and record the trial number;
    • uncountably infinite: for example, consider the experiment of choosing a number at random from the interval [0, 1].

    Let Ω be the sample space of an experiment E. Then any subset A of Ω, including the empty set ∅ and the entire sample space Ω, is called an event. An event may contain even a single sample point ω, in which case the event is a singleton set {ω}.


  • Events, sets and words

    Experiment: toss a coin 3 times. Which of the following describes the event {THH, HTH, HHT}?

    (1) Exactly one head: {TTH, HTT, THT}
    (2) Exactly one tail: {THH, HTH, HHT}
    (3) At most one tail: {THH, HTH, HHT, HHH}
    (4) None of the above

    Answer: (2) exactly one tail. Notice that the same event E ⊂ Ω may be described in words in multiple ways: exactly 2 heads, or exactly 1 tail.


  • Events, sets and words

    Experiment: toss a coin 3 times. Q: Which of the following equals the event "exactly two heads"?

    A = {THH, HTH, HHT, HHH}
    B = {THH, HTH, HHT}
    C = {HTH, THH}

    (1) A    (2) B    (3) C    (4) B or C


  • Probability measure

    Given a sample space Ω, a probability or probability measure on Ω is a function P on subsets of Ω such that:

    a) P(A) ≥ 0 for any A ⊆ Ω;
    b) P(Ω) = 1;
    c) given disjoint subsets A1, A2, ... of Ω, P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai); this property is known as countable additivity.

    Let Ω be a finite sample space consisting of N sample points. We say that the sample points are equally likely if P(ω) = 1/N for each sample point ω.


  • Probability

    • Let Ω be a finite sample space consisting of N equally likely sample points. Let A be any event and suppose A contains n distinct sample points. Then

      P(A) = n/N = (number of sample points favourable to A) / (total number of sample points)

    • For instance, our experiment may be rolling a fair six-sided die. Event A may be defined as the number 4 showing up on the top surface of the die after we roll. Then the probability of event A is 1/6, and similarly for any other number on the die; see the sketch below.
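    A minimal sketch of this counting rule in code (Python; the event is the illustrative "roll a 4" from above):

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]

# Event A: the roll shows a 4.
favourable = [w for w in omega if w == 4]

# P(A) = (favourable sample points) / (total sample points)
p_A = Fraction(len(favourable), len(omega))
print(p_A)  # 1/6
```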


  • Probability and Induction

    • By convention, calibrated probabilities lie between 0 and 1, and all propositions (outcomes) fall somewhere in between. If the probability is zero, the occurrence of that event is impossible. If the probability is one, the occurrence of that event is certain.
    • Probability statements that we make can be based on our past experience or on our personal judgments.
    • Whether our probability statements are based on past experience or on subjective personal judgments, they obey a common set of rules, which we can use to treat probabilities in a mathematical framework, and also for making decisions on predictions, for understanding complex systems, or as intellectual experiments and for entertainment.


  • Probability and Induction

    • Suppose that we know somehow, from past observations, the probability P(A) of an event A in a given experiment. What conclusion can we draw about the occurrence of this event in a single future performance of this experiment?
    • Suppose that P(A) = 0.6. In this case, the number 0.6 gives us only a "certain degree of confidence" that the event A will occur. The known probability is thus used as a "measure of our belief" about the occurrence of A in a single trial. In a single trial, the event A will either occur or not occur. If it does not, this is not a reason for questioning the validity of the assumption that P(A) = 0.6.


  • Probability

    Example: What is the probability of being dealt four of a kind in a five-card poker hand?

    Solution:
    Event: being dealt four of a kind in a poker game.
    Total number of outcomes: the number of subsets of size five that can be picked from a deck of 52 cards, i.e. the total number of possible poker hands, is C(52, 5) = 2,598,960.
    Number of times the event occurs: out of all those hands, there are 48 possible hands containing four aces, 48 possible hands containing four kings, and so on. So there are a total of 13 × 48 = 624 hands containing four of a kind.

    P(event) = (number of times the event occurs) / (total number of outcomes) = (13)(48)/2,598,960 ≈ 0.00024
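    A quick verification of these counts with the standard library (Python):

```python
from math import comb

total_hands = comb(52, 5)           # 2,598,960 possible five-card hands
four_of_a_kind = 13 * comb(48, 1)   # pick the rank (13 ways), then the fifth card (48 ways)

print(total_hands)                   # 2598960
print(four_of_a_kind)                # 624
print(four_of_a_kind / total_hands)  # ~0.00024
```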


  • Probability and set operations on events

    Events are A, L, R.
    Rule 1: Complements. P(Aᶜ) = 1 − P(A)
    Rule 2: Disjoint events. If L and R are disjoint, then P(L ∪ R) = P(L) + P(R)
    Rule 3: Inclusion-exclusion principle. For any L and R: P(L ∪ R) = P(L) + P(R) − P(L ∩ R)
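    A small sanity check of Rule 3 by direct enumeration on one die roll (Python; the events L and R below are illustrative choices, not from the slides):

```python
from fractions import Fraction

omega = set(range(1, 7))   # one roll of a fair die
L = {1, 2, 3}              # "roll is at most 3" (illustrative)
R = {2, 4, 6}              # "roll is even" (illustrative)

def P(event):
    return Fraction(len(event), len(omega))

# Inclusion-exclusion: P(L ∪ R) = P(L) + P(R) - P(L ∩ R)
assert P(L | R) == P(L) + P(R) - P(L & R)
print(P(L | R))  # 5/6
```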


  • Definitions: Event

    Example: Lucky Larry has a coin that you're quite sure is not fair.

    • He will flip the coin twice (independently).
    • It's your job to bet whether the outcomes will be the same (HH, TT) or different (HT, TH).

    Solution: Let the probability of heads be P(H) = p and the probability of tails be P(T) = 1 − P(H) = 1 − p = q. Since the flips are independent, the probabilities are:

    Event   Probability
    HH      p²
    TT      q²
    HT      pq
    TH      pq


    So, the probability of the same outcome is

    P(same) = P(HH) + P(TT) = p² + q²

    The probability of different outcomes is

    P(diff) = P(HT) + P(TH) = 2pq

    Arithmetic: if a ≠ b, then (a − b)² > 0 ⇔ a² + b² > 2ab. Since the coin is unfair, we know p ≠ q. Thus

    p² + q² > 2pq  ⇒  P(same) > P(different)
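    A numeric check of this inequality for an assumed bias (Python; p = 0.7 is an arbitrary illustration):

```python
p = 0.7            # assumed probability of heads (illustrative)
q = 1.0 - p

p_same = p**2 + q**2   # P(HH) + P(TT)
p_diff = 2 * p * q     # P(HT) + P(TH)

print(p_same, p_diff)   # 0.58 0.42
assert p_same > p_diff  # holds for any p != 0.5
```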


  • Conditional Probability

    P(A|B) is the conditional probability of A given B, that is, the probability that A occurs given the fact that B occurred, defined as:

    P(A|B) = P(A, B) / P(B)

    • P(A, B) is the joint probability of A and B, that is, the probability that events A and B both occur.
    • The probability P(A) or P(B) is called an a priori probability because it applies to the probability of an event apart from any previously known information.
    • A conditional probability is called an a posteriori probability because it applies to a probability given the fact that some information about a possibly related event is already known.


  • Conditional Probability

    The conditional probability of event A given event B can be defined if the probability of B is nonzero.

    Example: Suppose that A is the appearance of a 4 on a die, and B is the appearance of an even number on a die. How much is P(A|B)?

    Solution:
    Event A: appearance of a 4 on a die, so P(A) = 1/6.
    Event B: appearance of an even number on a die, so P(B) = 3/6 = 1/2.
    Joint probability: appearance of a 4 and an even number, so P(A, B) = 1/6.

    Then, the conditional probability is

    P(A|B) = P(A, B) / P(B) = (1/6) / (1/2) = 1/3
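    The same computation by enumeration (Python):

```python
from fractions import Fraction

omega = set(range(1, 7))   # one roll of a fair die
A = {4}                    # "roll is a 4"
B = {2, 4, 6}              # "roll is even"

def P(event):
    return Fraction(len(event), len(omega))

# P(A|B) = P(A ∩ B) / P(B)
print(P(A & B) / P(B))  # 1/3
```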


  • Conditional Probability

    [Figure not reproduced: eight shapes, of which five are gray, three are circles, and one is a gray circle.]

    Consider the eight shapes in the figure above. Find P(gray|circle).


  • Bayes' Rule

    For two events A and B, we can write the conditional probability as:

    P(A|B) = P(A, B) / P(B);    P(B|A) = P(A, B) / P(A)

    From the above two equations, we obtain

    P(A|B) P(B) = P(B|A) P(A)

    Bayes' rule is often written by re-arranging the above equation to obtain:

    P(A|B) = P(B|A) P(A) / P(B)


  • Bayes' rule

    Consider the eight shapes in the figure above (five gray, three circles, one gray circle). Find P(gray|circle).

    By Bayes' rule,

    P(gray|circle) = P(circle|gray) P(gray) / P(circle)

    with P(circle|gray) = 1/5, P(circle) = 3/8, and P(gray) = 5/8. The conditional probability is therefore

    P(gray|circle) = (1/5)(5/8) / (3/8) = 1/3
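    A quick numeric check of this Bayes computation (Python; the shape counts are those inferred above):

```python
from fractions import Fraction

# Eight shapes: 5 gray, 3 circles, 1 gray circle.
p_circle_given_gray = Fraction(1, 5)
p_gray = Fraction(5, 8)
p_circle = Fraction(3, 8)

# Bayes' rule: P(gray|circle) = P(circle|gray) P(gray) / P(circle)
print(p_circle_given_gray * p_gray / p_circle)  # 1/3
```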


  • Random Variables

    A random variable (RV) is defined as a functional mapping from a set of experimental outcomes (the domain) to a set of real numbers (the range).

    • For example, the roll of a die can be viewed as an RV if we map the appearance of one dot on the die to the output one, the appearance of two dots on the die to the output two, and so on.
    • If we define X as an RV that represents the roll of a die, then the probability that X will be a four is equal to 1/6. If we then roll a four, the four is a realization of the RV X. If we then roll the die again and get a three, the three is another realization of the RV X. However, the RV X exists independently of any of its realizations.


  • Random Variables

    • This distinction between an RV and its realizations is important for understanding the concept of probability. Realizations of an RV are not equal to the RV itself.
    • The probability of X = 4 is equal to 1/6; that means that there is a 1 out of 6 chance that each realization of X will be equal to 4. However, the RV X will always be random and will never be equal to a specific value.
    • An RV can be either continuous or discrete. The throw of a die is a discrete random variable because its realizations belong to a discrete set of values. The high temperature tomorrow is a continuous random variable because its realizations belong to a continuous set of values.


  • Probability distribution function

    The fundamental property of an RV X is its probability distribution function (PDF) F_X(x), defined as

    F_X(x) = P(X ≤ x)

    where x is a nonrandom independent variable or constant.

    Properties:
    1. F_X(x) ∈ [0, 1]
    2. F_X(−∞) = 0; F_X(∞) = 1
    3. F_X(a) ≤ F_X(b) if a ≤ b
    4. P(a < X ≤ b) = F_X(b) − F_X(a)


  • Probability density function

    The probability density function (pdf) f_X(x) is defined as the derivative of the probability distribution function:

    f_X(x) = dF_X(x)/dx

    Properties:
    1. F_X(x) = ∫_{−∞}^{x} f_X(z) dz
    2. f_X(x) ≥ 0
    3. ∫_{−∞}^{∞} f_X(x) dx = 1
    4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx

    Note: the probability distribution function is a probability of the random variable. The probability density function contains information about probability, but it is not itself a probability, since it can take any positive value, even one larger than one.
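    A minimal numeric sanity check of properties 3 and 4 for a standard Gaussian pdf (Python; the grid spacing and limits are arbitrary choices):

```python
import math

# Standard Gaussian pdf, integrated with a simple Riemann sum.
dx = 0.001
def pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

total = sum(pdf(i * dx) for i in range(-8000, 8001)) * dx      # property 3: ~1
p_interval = sum(pdf(i * dx) for i in range(-999, 1001)) * dx  # property 4: P(-1 < X <= 1)

print(round(total, 4))       # ~1.0
print(round(p_interval, 4))  # ~0.6827
```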


  • Moments

    First moment: the expected value of an RV X is defined as its average value over a large number of experiments. This can also be called the expectation, the mean, or the average of the RV.

    Suppose we run the experiment N times and observe m different outcomes, where outcome A1 occurs n1 times, A2 occurs n2 times, and so on, and Am occurs nm times. Then the expected value of X is computed as:

    E(X) = (1/N) Σ_{i=1}^{m} Ai ni

    E(X) is also often written as E(x), X̄, or x̄.


  • Mean

    Suppose that we roll a die an infinite number of times. We would expect to see each possible number 1/6 of the time. We can compute the expected value of the roll of the die as

    E(X) = lim_{N→∞} (1/N) [1(N/6) + 2(N/6) + · · · + 6(N/6)] = 3.5

    Note that the expected value of an RV is not necessarily what we would expect to see when we run a particular experiment. For example, even though the expected value of X above is 3.5, we will never see a 3.5 when we roll a die.
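    A small simulation showing the sample mean approaching 3.5 (Python; the seed and sample size are arbitrary):

```python
import random

random.seed(0)          # arbitrary seed, for reproducibility
N = 100_000             # number of simulated rolls

rolls = [random.randint(1, 6) for _ in range(N)]
print(sum(rolls) / N)   # close to 3.5
```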


  • Mean and Variance

    If a function, say g(X), acts upon an RV, then the output of the function is also an RV. For example, if X is the roll of a die, then P(X = 4) = 1/6; if g(X) = X², then P[g(X) = 16] = 1/6. We can compute the expected value of any function g(X) as

    E[g(X)] = ∫_{−∞}^{∞} g(x) f_X(x) dx

    where f_X(x) is the pdf of X. If g(X) = X, then we compute the expected value of X as

    E[X] = ∫_{−∞}^{∞} x f_X(x) dx


  • Variance

    The variance of a random variable is a measure of how much we expect the random variable to vary from its mean.

    Case 1: If the RV X is always equal to one value (for example, the die is loaded and we always get a 4 when we roll it), then the variance of X is equal to 0.
    Case 2: If X can take on any value between ±∞ with equal probability, then the variance of X is equal to ∞.

    The variance of a random variable is defined as

    σ²_X = E[(X − x̄)²] = ∫_{−∞}^{∞} (x − x̄)² f_X(x) dx

    The notation to indicate a random variable X with mean x̄ and variance σ² is X ~ (x̄, σ²).


  • Skewness and Kurtosis

    The skew of an RV is a measure of the asymmetry of the pdf around its mean. Skew is defined as

    skew = E[(X − x̄)³]

    Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails.

    kurtosis = E[(X − x̄)⁴]


  • Uniform Distribution

    An RV is called uniform if its pdf is a constant value between two limits. This indicates that the RV has an equally likely probability of obtaining any value between its limits, but zero probability of obtaining a value outside its limits:

    f_X(x) = 1/(b − a) for x ∈ [a, b], and 0 otherwise


  • Uniform Distribution

    In this example we will find the mean and variance of an RV that is uniformly distributed between 1 and 3. The pdf of the RV is given as

    f_X(x) = 1/2 for x ∈ [1, 3], and 0 otherwise

    The mean is computed as:

    x̄ = ∫_{−∞}^{∞} x f_X(x) dx = ∫_{1}^{3} (1/2) x dx = 2

    The variance is computed as:

    σ²_X = ∫_{−∞}^{∞} (x − x̄)² f_X(x) dx = ∫_{1}^{3} (1/2)(x − 2)² dx = 1/3
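    A Monte Carlo check of these two values (Python; seed and sample size are arbitrary):

```python
import random

random.seed(1)
samples = [random.uniform(1, 3) for _ in range(200_000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)

print(round(mean, 3))  # ~2.0
print(round(var, 3))   # ~0.333, i.e. 1/3
```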


  • Uniform Distribution

    Exercise: find the mean and variance of an RV that is uniformly distributed between 1 and 6. The pdf of the RV is given as

    f_X(x) = 1/5 for x ∈ [1, 6], and 0 otherwise


  • Normal distribution

    An RV is called Gaussian or normal if its pdf is given by

    f_X(x) = (1/(σ√(2π))) exp[−(x − x̄)²/(2σ²)]

    where x̄ and σ² are the mean and variance of the Gaussian RV. We use the notation X ~ N(x̄, σ²).

    If the mean changes, the pdf will shift to the left or right. If the variance increases, the pdf will spread out. If the variance decreases, the pdf will be squeezed in.


  • Transformations of random variables

    What happens to the pdf of an RV when we pass the RV through some function? Suppose that we have two RVs, X and Y, related to one another by the monotonic functions g(·) and h(·):

    Y = g(X);    X = g⁻¹(Y) = h(Y)

    If we know the pdf of X, f_X(x), then we can compute the pdf of Y, f_Y(y). For a monotonic transformation this is the standard change-of-variables result:

    f_Y(y) = f_X(h(y)) |dh(y)/dy|


  • RV transformations

    In this example, we will find the pdf of a linear function of a Gaussian RV. Suppose that X ~ N(x̄, σ²_X) and Y = g(X) = aX + b, where a ≠ 0 and b are any real constants. Then the RV Y is Gaussian with mean ȳ = ax̄ + b and variance σ²_Y = a²σ²_X. This example shows that a linear transformation of a Gaussian RV results in a new Gaussian RV.
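    A Monte Carlo illustration of this fact (Python; the particular x̄, σ, a, b values are arbitrary choices):

```python
import random
import statistics

random.seed(2)
x_bar, sigma = 1.0, 2.0   # X ~ N(1, 4), for illustration
a, b = 3.0, -5.0          # linear map Y = 3X - 5

xs = [random.gauss(x_bar, sigma) for _ in range(100_000)]
ys = [a * x + b for x in xs]

print(round(statistics.mean(ys), 2))       # ~ a*x_bar + b = -2.0
print(round(statistics.pvariance(ys), 1))  # ~ a²·σ² = 36.0
```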


  • RV transformations

    Suppose that we pass a Gaussian RV X ~ N(0, σ²_X) through the nonlinear function Y = g(X) = X³. Applying the change-of-variables formula with h(y) = y^(1/3) gives

    f_Y(y) = f_X(y^(1/3)) / (3|y|^(2/3))

    which is no longer a Gaussian pdf: a nonlinear transformation of a Gaussian RV is, in general, not Gaussian.


  • Multiple Random Variables

    If X and Y are RVs, then their distribution functions are defined as:

    F_X(x) = P(X ≤ x);    F_Y(y) = P(Y ≤ y)

    Now we define the probability that both X ≤ x and Y ≤ y as the joint probability distribution function of X and Y:

    F_XY(x, y) = P(X ≤ x, Y ≤ y)

    Standard properties include:
    1. F_XY(−∞, y) = F_XY(x, −∞) = 0; F_XY(∞, ∞) = 1
    2. F_XY(x, ∞) = F_X(x); F_XY(∞, y) = F_Y(y)


  • Statistical Independence

    Two events are independent if the occurrence of one event has no effect on the probability of the occurrence of the other event. RVs X and Y are independent if they satisfy the following relation:

    P(X ≤ x, Y ≤ y) = P(X ≤ x) P(Y ≤ y)

    • The central limit theorem says that the sum of independent RVs tends toward a Gaussian RV, regardless of the pdfs of the individual RVs that contribute to the sum. This is why so many RVs in nature seem to have a Gaussian distribution.
    • For example, the temperature on any given day in any given location follows a Gaussian distribution. This is because temperature is affected by clouds, wind, air pressure, and other factors; each of these factors is determined by other random factors.


  • Covariance

    The covariance between two scalar RVs X and Y can be defined as:

    C_XY = E[(X − X̄)(Y − Ȳ)] = E[XY] − X̄ Ȳ

    The correlation coefficient of two scalar RVs X and Y is:

    ρ = C_XY / (σ_X σ_Y)

    The correlation coefficient is a normalized measurement of the dependence between two RVs X and Y. If X and Y are independent, then ρ = 0 (although the converse is not necessarily true). If Y is a linear function of X, then ρ = ±1, with the sign given by the slope.


  • Correlation

    The correlation between two scalar RVs X and Y can be defined as R_XY = E(XY).

    • Two RVs are said to be uncorrelated if R_XY = E(X) E(Y).
    • From the definition of independence, if two RVs are independent, then they are also uncorrelated. Independence implies uncorrelatedness, but uncorrelatedness does not necessarily imply independence.
    • Two RVs are said to be orthogonal if R_XY = 0. If two RVs are uncorrelated, then they are orthogonal only if at least one of them is zero-mean. If two RVs are orthogonal, then they may or may not be uncorrelated.


  • Uncorrelated

    Example: Show that the outcomes of two rolls of a die are uncorrelated and not orthogonal.

    Solution: Let the two rolls of the die be represented by the RVs X and Y. Uncorrelated means E(XY) = E(X) E(Y), and not orthogonal means R_XY ≠ 0. The two RVs are independent because one roll of the die does not have any effect on a second roll of the die. Each roll of the die has an equally likely probability (1/6) of being a 1, 2, 3, 4, 5, or 6. Therefore,

    E(X) = E(Y) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5


    There are 36 possible combinations of the two rolls of the die: we could get the combination (1,1), (1,2), and so on. Each of these 36 combinations has an equally likely probability (1/36). Therefore, the correlation between X and Y is

    R_XY = E(XY) = (1/36) Σ_{i=1}^{6} Σ_{j=1}^{6} ij = 12.25 = E(X) E(Y)

    Since E(XY) = E(X) E(Y), we see that X and Y are uncorrelated. However, R_XY ≠ 0, so X and Y are not orthogonal.
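    Verifying this by exact enumeration (Python):

```python
from fractions import Fraction

# All 36 equally likely outcomes of two die rolls.
pairs = [(i, j) for i in range(1, 7) for j in range(1, 7)]

E_XY = Fraction(sum(i * j for i, j in pairs), 36)
E_X = Fraction(sum(range(1, 7)), 6)

print(E_XY)       # 49/4 = 12.25
print(E_X * E_X)  # 49/4, so E(XY) = E(X)E(Y): uncorrelated
print(E_XY != 0)  # True: not orthogonal
```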


  • Correlated and Not Orthogonal

    A slot machine is rigged so you get 1 or -1 with equal probability on the first spin X, and the opposite number on the second spin Y. We have equal probabilities of obtaining (X, Y) outcomes of (1, -1) and (-1, 1).

    Solution:
    The two RVs are dependent because the realization of Y depends on the realization of X. We also see that

    E(X) = E(Y) = 0
    E(XY) = [(1)(−1) + (−1)(1)]/2 = −1

    We see that X and Y are correlated because E(XY) ≠ E(X) E(Y). We also see that X and Y are not orthogonal because R_XY ≠ 0.


  • Uncorrelated and Orthogonal

    A slot machine is rigged so you get -1, 0, or +1 with equal probability on the first spin X. On the second spin Y you get 1 if X = 0, and 0 if X ≠ 0.

    Solution:
    The two RVs are dependent because the realization of Y is determined by the realization of X. Here E(X) = 0 and E(Y) = 1/3. In every outcome either X = 0 or Y = 0, so XY = 0 always, and

    R_XY = E(XY) = 0

    so X and Y are orthogonal. Since E(XY) = 0 = E(X) E(Y), they are also uncorrelated.


  • Suppose that x and y are independent RVs, and the RV z is computed as z = g(x) + h(y). In this example, we will calculate the mean of z. By linearity of the expectation operator,

    E(z) = E[g(x) + h(y)] = E[g(x)] + E[h(y)]

    As a special case of this example, we see that the mean of the sum of two independent RVs is equal to the sum of their means. That is, E(x + y) = E(x) + E(y) if x and y are independent.


  • Suppose we roll a die twice. What is the expected value of the sum of the two outcomes?

    Solution: We use X and Y to refer to the two rolls of the die, and we use Z to refer to the sum of the two outcomes, so Z = X + Y. Since X and Y are independent, E(Z) = E(X) + E(Y). Each roll of the die has an equally likely probability (1/6) of being a 1, 2, 3, 4, 5, or 6. Therefore,

    E(X) = E(Y) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5
    E(Z) = E(X) + E(Y) = 3.5 + 3.5 = 7


  • Stochastic Process

    A stochastic process, also called a random process, is a very simple generalization of the concept of an RV: a stochastic process X(t) is an RV X that changes with time. A stochastic process can be one of four types.

    • If the RV at each time is continuous and time is continuous, then X(t) is a continuous random process. For example, the temperature at each moment of the day is a continuous random process because both temperature and time are continuous.
    • If the RV at each time is discrete and time is continuous, then X(t) is a discrete random process. For example, the number of people in a given building at each moment of the day is a discrete random process because the number of people is a discrete variable and time is continuous.


  • Stochastic Process

    • If the RV at each time is continuous and time is discrete, then X(t) is a continuous random sequence. For example, the high temperature each day is a continuous random sequence because temperature is continuous but time is discrete (day one, day two, etc.).
    • If the RV at each time is discrete and time is discrete, then X(t) is a discrete random sequence. For example, the highest number of people in a given building each day is a discrete random sequence because the number of people is a discrete variable and time is also discrete.


  • Stochastic Process

    Since a stochastic process is an RV that changes with time, it has distribution and density functions that are functions of time.

    The PDF of X(t) is F_X(x, t) = P(X(t) ≤ x).
    The pdf of X(t) is f_X(x, t) = ∂F_X(x, t)/∂x.

    If X(t) is a random vector, then the derivative above is taken once with respect to each element of x. The mean and covariance of X(t) are also functions of time:

    • Mean: x̄(t) = ∫_{−∞}^{∞} x f(x, t) dx
    • Covariance: C_X(t) = E{[X(t) − x̄(t)][X(t) − x̄(t)]ᵀ} = ∫_{−∞}^{∞} [x − x̄(t)][x − x̄(t)]ᵀ f(x, t) dx


  • Stochastic Process: Examples

    • The high temperature each day can be considered a stochastic process. However, this process is not stationary. The high temperature on a day in July might be an RV with a mean of 100 degrees Fahrenheit, but the high temperature on a day in December might have a mean of 30 degrees. This is a stochastic process whose statistics change with time, so the process is not stationary.
    • The closing price of the stock market is an RV whose mean generally increases with time. Therefore, the stock market price is a non-stationary stochastic process.


  • Stochastic Process

    Suppose we have a stochastic process X(t), and further suppose that the process has a realization x(t). The time average of X(t) is denoted A[X(t)], and the time autocorrelation of X(t) is denoted R[X(t), τ]. For continuous-time random processes these quantities take their standard forms:

    A[X(t)] = lim_{T→∞} (1/2T) ∫_{−T}^{T} x(t) dt

    R[X(t), τ] = lim_{T→∞} (1/2T) ∫_{−T}^{T} x(t) x(t + τ) dt


  • Ergodic process

    An ergodic process is a stationary random process for which

    A[X(t)] = E(x);    R[X(t), τ] = R_X(τ)

    • In the real world, we are often limited to only a few realizations of a stochastic process.
    • For example, if we measure the fluctuation of a voltmeter reading, we are actually measuring only one realization of a stochastic process. We can compute the time average, time autocorrelation, and other time-based statistics of the realization.
    • If the random process is ergodic, then we can use those time averages to estimate the statistics of the stochastic process.


  • White noise and colored noise

    If the RV X(t₁) is independent of the RV X(t₂) for all t₁ ≠ t₂, then X(t) is called white noise. Otherwise, X(t) is called colored noise.

    The whiteness or color content of a stochastic process can be characterized by its power spectrum. The power spectrum S_X(ω) of a wide-sense stationary stochastic process X(t) is defined as the Fourier transform of the autocorrelation. The autocorrelation is the inverse Fourier transform of the power spectrum.


  • White and colored noise

    The power spectrum is sometimes referred to as the power density spectrum, the power spectral density, or the power density. The power of a wide-sense stationary stochastic process is defined as

    P_X = (1/2π) ∫_{−∞}^{∞} S_X(ω) dω

    A discrete-time stochastic process X(t) is called white noise if its autocorrelation satisfies (in the standard formulation)

    R_X(k) = E[X(n) X(n + k)] = R_X(0) δ_k

    where δ_k is the delta function.


  • White noise

    • Discrete-time white noise has no correlation with itself except at the present time.
    • If X(k) is a discrete-time white noise process, then the RV X(n) is uncorrelated with X(m) unless n = m. This means that the power of a discrete-time white noise process is equal at all frequencies: S_X(ω) = R_X(0) for all ω ∈ [−π, π]; see the simulation sketch below.
    • For a continuous-time random process, white noise is defined similarly. White noise has equal power at all frequencies (like white light): S_X(ω) = R_X(0) for all ω.
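    A small simulation of a discrete-time white noise sequence, checking that the sample autocorrelation is near zero away from lag 0 (Python; seed and length are arbitrary):

```python
import random

random.seed(3)
N = 50_000
x = [random.gauss(0.0, 1.0) for _ in range(N)]  # discrete-time white noise

def sample_autocorr(x, k):
    """Sample estimate of R_X(k) = E[X(n) X(n+k)]."""
    n = len(x) - k
    return sum(x[i] * x[i + k] for i in range(n)) / n

print(round(sample_autocorr(x, 0), 3))  # ~1.0 (the variance)
print(round(sample_autocorr(x, 1), 3))  # ~0.0
print(round(sample_autocorr(x, 5), 3))  # ~0.0
```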


  • Power Spectrum

    Suppose that a zero-mean stationary stochastic process has the autocorrelation function

    R_X(τ) = σ² e^(−β|τ|)

    where β is a positive real number. The power spectrum is computed as the Fourier transform of the autocorrelation; the calculation is sketched below.
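    A sketch of that computation, using the standard Fourier-transform pair for a two-sided exponential (LaTeX; this reconstructs the result the slide computes):

```latex
S_X(\omega) = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j\omega\tau}\, d\tau
            = \sigma^2 \int_{-\infty}^{0} e^{(\beta - j\omega)\tau}\, d\tau
              + \sigma^2 \int_{0}^{\infty} e^{-(\beta + j\omega)\tau}\, d\tau
            = \frac{\sigma^2}{\beta - j\omega} + \frac{\sigma^2}{\beta + j\omega}
            = \frac{2\sigma^2\beta}{\beta^2 + \omega^2}
```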


  • Power Spectrum

    The variance of this zero-mean stochastic process is computed as

    σ²_X = R_X(0) = (1/2π) ∫_{−∞}^{∞} S_X(ω) dω = (1/2π) ∫_{−∞}^{∞} 2σ²β/(β² + ω²) dω = σ²
