Random walk Presented by Changqing Li Mathematics Probability Statistics

Post on 17-Dec-2015

Page 1: Random walk Presented by Changqing Li Mathematics Probability Statistics

Random walk

Presented by Changqing Li

Mathematics•Probability•Statistics

Page 2

What is a Random Walk?

Intuitive understanding: a series of movements whose direction and size are randomly decided (e.g., the path left behind by a drunk person).

Formal definition: Let $X_0$ be a fixed vector in the d-dimensional Euclidean space $\mathbb{R}^d$ and $\{X_n, n \ge 1\}$ a sequence of independent, identically distributed (i.i.d.) random variables taking values in $\mathbb{R}^d$. The discrete-time stochastic process $S = \{S_n : n \ge 1\}$ defined by

\[ S_n = X_0 + X_1 + \cdots + X_n \]

is called a d-dimensional random walk.
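The definition can be sketched in a few lines of Python. This is only an illustration: the function names `walk_from_steps` and `gaussian_steps` are made up for this example, and the choice of standard normal steps is an assumption (the definition only requires the steps to be i.i.d.).

```python
import random

def walk_from_steps(x0, steps):
    """Return [S_1, ..., S_n], where S_k = x0 + steps[0] + ... + steps[k-1]."""
    position = list(x0)
    path = []
    for step in steps:
        # Add the step to the current position, coordinate by coordinate.
        position = [p + s for p, s in zip(position, step)]
        path.append(tuple(position))
    return path

def gaussian_steps(n, d, rng):
    """n i.i.d. steps in R^d with standard normal coordinates (an assumed step law)."""
    return [tuple(rng.gauss(0.0, 1.0) for _ in range(d)) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
path = walk_from_steps((0.0, 0.0), gaussian_steps(5, 2, rng))
print(path[-1])  # position S_5 of a 2-dimensional walk started at the origin
```

Any other step distribution can be substituted by passing a different list of steps to `walk_from_steps`.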

Page 3

Why Random Walks?

A random walk (RW) is a useful model for understanding stochastic processes across a variety of scientific disciplines.

Random walk theory supplies the basic probability theory behind BLAST (the most widely used sequence alignment method).

Page 4

Definitions (cont.)

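If $X_0$ and the random variables $X_n$ take values in the integer lattice $I^d$, then $\{S_n, n \ge 1\}$ is called a d-dimensional lattice random walk.

In the lattice walk case, if we only allow jumps from $X = (x_1, \dots, x_d)$ to $Y = (x_1 + \varepsilon_1, \dots, x_d + \varepsilon_d)$, where $\varepsilon_k = -1$ or $\varepsilon_k = 1$, then the process is called a d-dimensional simple random walk.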

Page 5

Definitions (cont.)

A random walk is called a restricted walk if it is confined to the interval [a, b].

The endpoints a and b are called absorbing barriers if the walk, once it reaches one of them, stays there forever;

or reflecting barriers if the walk bounces back whenever it reaches an endpoint.
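The two barrier types can be illustrated with a small sketch for a one-dimensional walk with steps of +1 or -1. The function name `restricted_walk` and the mirror rule used for reflection are assumptions made for this example; other reflection conventions exist.

```python
def restricted_walk(start, steps, a, b, barrier="absorbing"):
    """Follow +1/-1 steps from `start` while keeping the walk inside [a, b]."""
    if barrier not in ("absorbing", "reflecting"):
        raise ValueError("barrier must be 'absorbing' or 'reflecting'")
    position = start
    path = [position]
    for step in steps:
        if barrier == "absorbing" and position in (a, b):
            pass  # absorbed: the walk stays at the endpoint forever
        else:
            position += step
            # Reflecting case: a step that would leave [a, b] is mirrored back.
            if position < a:
                position = 2 * a - position
            elif position > b:
                position = 2 * b - position
        path.append(position)
    return path

steps = [-1, -1, 1, 1]
print(restricted_walk(1, steps, 0, 3, "absorbing"))
print(restricted_walk(1, steps, 0, 3, "reflecting"))
```

With the same steps, the absorbing walk gets stuck at 0 after its first step, while the reflecting walk bounces off 0 and continues.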

Page 6

Example: DNA sequence alignment modeled as RW

ggagactgtagacagctaatgctata
| |  |  ||| ||        |||
gaacgccctagccacgagcccttatc

Simple scoring scheme: at each position, +1 if the two nucleotides are the same, -1 if they differ.
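The scoring scheme turns the alignment into a walk: the accumulated score moves up by 1 at each match and down by 1 at each mismatch. A minimal sketch using the two sequences from this slide (the function names are chosen for this example only):

```python
from itertools import accumulate

SEQ1 = "ggagactgtagacagctaatgctata"
SEQ2 = "gaacgccctagccacgagcccttatc"

def match_scores(s1, s2):
    """+1 where the aligned nucleotides agree, -1 where they differ."""
    return [1 if x == y else -1 for x, y in zip(s1, s2)]

def accumulated_scores(s1, s2):
    """Running sum of the per-position scores, i.e. the random walk."""
    return list(accumulate(match_scores(s1, s2)))

walk = accumulated_scores(SEQ1, SEQ2)
print(walk)
```

The alignment has 11 matches and 15 mismatches, so the walk ends at 11 - 15 = -4.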

Page 7

Example: simple RW

Ladder point (LP): a point in the walk lower than any previously reached point.

Excursion: the part of the walk from an LP until the highest point attained before the next LP.

Excursion heights in the figure: 1, 1, 4, 0, 0, 0, 3.

BLAST theory focuses on the maximum heights achieved by these excursions.

[Figure: the accumulated-score walk with its ladder points marked]
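The excursion heights can be recomputed from the DNA alignment of the previous slide. The sketch below assumes that the walk starts at 0, that the starting point counts as the first ladder point, and that the final, unfinished excursion is included; the function name `excursion_heights` is made up for this example.

```python
from itertools import accumulate

SEQ1 = "ggagactgtagacagctaatgctata"
SEQ2 = "gaacgccctagccacgagcccttatc"

def excursion_heights(walk):
    """Height of each excursion, measured from the ladder point where it starts.

    `walk` is the list of accumulated scores, including the starting value.
    """
    heights = []
    ladder = walk[0]  # current ladder point (lowest value so far)
    peak = walk[0]    # highest value since that ladder point
    for value in walk[1:]:
        if value < ladder:
            # New ladder point: close the current excursion and start a new one.
            heights.append(peak - ladder)
            ladder = value
            peak = value
        else:
            peak = max(peak, value)
    heights.append(peak - ladder)  # last excursion, with no ladder point after it
    return heights

scores = [1 if x == y else -1 for x, y in zip(SEQ1, SEQ2)]
walk = [0] + list(accumulate(scores))
print(excursion_heights(walk))  # [1, 1, 4, 0, 0, 0, 3]
```

The result reproduces the excursion heights listed on this slide.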

Page 8

Example: General RW

Consider an arbitrary scoring scheme (e.g., a substitution matrix).

Page 9

General Walk

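Suppose that, in general, the possible step sizes are

\[ -c, -c+1, \dots, 0, \dots, d-1, d \]

with respective probabilities

\[ p_{-c}, p_{-c+1}, \dots, p_d. \]

The mean step size is negative, i.e.,

\[ E(S) = \sum_{j=-c}^{d} j\, p_j < 0. \]

The moment generating function (mgf) of the step size S is

\[ m(\theta) = \sum_{j=-c}^{d} p_j e^{j\theta}. \]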

Page 10

General Walk

There exists a unique positive $\theta^*$ such that

\[ \sum_{j=-c}^{d} p_j e^{j\theta^*} = 1. \]

To consider the walk that starts at 0, with a stopping boundary at -1 and no upper boundary, impose an artificial barrier at $y > 0$.

The possible stopping points are then

\[ -c, -c+1, \dots, -1, y, y+1, \dots, y+d-1. \]

Wald's identity states that

\[ E\left(e^{\theta^* T_N}\right) = 1, \]

where $T_N$ is the total displacement when the walk stops.
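In practice $\theta^*$ usually has to be found numerically. The following is a minimal sketch using bisection, assuming the step distribution is given as a dictionary mapping step size to probability, with a negative mean and at least one positive step; the function names are made up for this example.

```python
import math

def mgf(theta, probs):
    """m(theta) = sum over j of p_j * exp(j * theta)."""
    return sum(p * math.exp(j * theta) for j, p in probs.items())

def solve_theta_star(probs):
    """Find the positive root of m(theta) = 1 by bisection."""
    mean = sum(j * p for j, p in probs.items())
    has_positive_step = any(j > 0 and p > 0 for j, p in probs.items())
    if mean >= 0 or not has_positive_step:
        raise ValueError("need a negative mean step and at least one positive step")
    # m(0) = 1 and m'(0) = E(S) < 0, so m dips below 1 just right of 0 and
    # then grows without bound. Bracket the root with m(lo) < 1 < m(hi).
    hi = 1.0
    while mgf(hi, probs) <= 1.0:
        hi *= 2.0
    lo = hi
    while mgf(lo, probs) >= 1.0:
        lo /= 2.0
        if lo < 1e-12:
            raise ValueError("theta* is too close to 0 to bracket")
    for _ in range(200):  # far more halvings than double precision needs
        mid = (lo + hi) / 2.0
        if mgf(mid, probs) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Simple walk: step -1 with probability 0.75, step +1 with probability 0.25.
theta_star = solve_theta_star({-1: 0.75, 1: 0.25})
print(theta_star, math.log(3))
```

For a walk with steps +1 and -1 taken with probabilities p and q, the equation can be solved by hand and gives $\theta^* = \ln(q/p)$, which is $\ln 3$ in this example.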

Page 11

General Walk

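Thus,

\[ \sum_{k=-c}^{-1} P_k e^{k\theta^*} + \sum_{k=y}^{y+d-1} P_k e^{k\theta^*} = 1, \]

where $P_k$ is the probability that the walk finishes at the point k.

The mean number of steps until the walk stops, $A$ (or $m_0$), is

\[ A = \frac{E(T_N)}{E(S)} = \frac{-\sum_{j=1}^{c} j R_{-j}}{\sum_{j=-c}^{d} j\, p_j}, \]

where $R_{-j}$ is the probability that the walk, with the artificial barrier removed ($y \to \infty$), stops at $-j$.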
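The relation A = E(T_N) / E(S) can be checked by simulation in the simplest case. The sketch below assumes a walk with steps +1 and -1 that starts at 0 and stops at -1, so the total displacement at stopping is always -1; the function names, seed, and number of trials are choices made for this example.

```python
import random

def steps_until_negative(p_up, rng):
    """Number of steps a +1/-1 walk started at 0 takes to first reach -1."""
    position, n = 0, 0
    while position >= 0:
        position += 1 if rng.random() < p_up else -1
        n += 1
    return n

def mean_steps_formula(p_up):
    """A = E(T_N) / E(S), with E(T_N) = -1 and E(S) = p_up - (1 - p_up)."""
    return -1.0 / (2.0 * p_up - 1.0)

rng = random.Random(1)  # fixed seed so the run is repeatable
p_up = 0.25             # probability of a +1 step; the mean step is -0.5
trials = 20000
estimate = sum(steps_until_negative(p_up, rng) for _ in range(trials)) / trials
print(mean_steps_formula(p_up), estimate)
```

With these step probabilities the formula gives A = (-1) / (-0.5) = 2, and the simulated average should be close to that value.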

Page 12

Random Walks in real life!

In Supernova stars – how “star stuff” gets to be inside us (eventually!)

Page 13

Random Walks (in your body)

Cells inside your body

How two liquids (and air!) mix together! (Osmosis)

Page 14

Random Walks and $$ (Wall Street)

Stock market – predicting the price/cost of a stock in the future

Page 15

Application: BLAST

BLAST is the most frequently used method for assessing which DNA or protein sequences in a large database have significant similarity to a given query sequence.

It is a procedure that searches for high-scoring local alignments between sequences and then tests the significance of the scores found via a P-value.

The null hypothesis to be tested is that, for each aligned pair of amino acids, the two amino acids were generated by independent mechanisms.

Page 16

BLAST: modeling

The positions in the alignment are numbered from left to right as 1, 2, …, N. A score S(j, k) is allocated to each position where the aligned amino acid pair (j, k) is observed, where S(j, k) is the (j, k) element of the chosen substitution matrix.

The accumulated score at position i is the sum of the scores for the amino acid comparisons at positions 1, 2, …, i. As i increases, the accumulated score undergoes a random walk.
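The accumulated score can be sketched as a running sum of substitution-matrix lookups. The scores in `TOY_SCORES` below are hypothetical values invented for this example, not entries of a real substitution matrix such as BLOSUM or PAM, and the three-letter alphabet is only for illustration.

```python
from itertools import accumulate

# Hypothetical toy scores for the amino acids A, R and N (NOT a real matrix).
TOY_SCORES = {
    ("A", "A"): 4, ("R", "R"): 5, ("N", "N"): 6,
    ("A", "R"): -1, ("A", "N"): -2, ("R", "N"): 0,
}

def score(j, k):
    """S(j, k), treating the matrix as symmetric."""
    return TOY_SCORES[(j, k)] if (j, k) in TOY_SCORES else TOY_SCORES[(k, j)]

def accumulated_scores(seq1, seq2):
    """Accumulated score at positions 1, 2, ..., N of the alignment."""
    return list(accumulate(score(j, k) for j, k in zip(seq1, seq2)))

print(accumulated_scores("ARNA", "ANNR"))  # [4, 4, 10, 9]
```

The last entry of the list is the total score of the alignment; the whole list is the path of the walk.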

Page 17

BLAST: calculating parameters

Let Y1, Y2, … be the respective maximum heights of the excursions of this walk after leaving one ladder point and before arriving at the next, and let Ymax be the maximum of these maxima. Ymax is in effect the test statistic used in BLAST, so it is necessary to find its distribution under the null hypothesis.

The asymptotic probability distribution of any Yi is shown to be the geometric-like distribution. The values of C and $\theta^*$ in this distribution depend on the substitution matrix used and the amino acid frequencies {pj} and {pj’}. The probability distribution of Ymax also depends on n, the mean number of ladder points in the walk.
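For orientation, the geometric-like tail and the resulting approximation for Ymax can be written as follows. This is a sketch that assumes the standard form of the result from BLAST theory rather than a formula quoted from these slides, and the second line additionally treats the n excursion heights as independent.

```latex
\[ P(Y_i \ge y) \sim C e^{-\theta^* y} \quad (y \to \infty), \]
\[ P(Y_{\max} \ge y) \approx 1 - \left(1 - C e^{-\theta^* y}\right)^{n}
   \approx 1 - \exp\left(-n C e^{-\theta^* y}\right). \]
```

Evaluated at the observed maximum score, the second expression plays the role of the P-value for the test.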
