Markov Chains
By: Han Wang • Essay • 5,254 Words • August 24, 2014 • 670 Views
1 Discrete-time Markov chains
Example: A drunk is walking home from the pub. There are n lampposts between the pub and his home, at each of which he stops to steady himself. After every such stop, he may change his mind about whether to walk home or turn back towards the pub, independently of all his previous decisions. He moves homeward with probability p and pubward with probability 1 − p, stopping when he reaches either the pub or his home. How do you describe his trajectory?
Let us look at the times at which he reaches a lamppost, the pub or his home. The lampposts are numbered 1 through n, the pub is at location 0 and his home is at location n + 1. At each time t = 0, 1, 2, ..., he may be at any of these locations, and we’ll let X_t denote his location at time t. We want to know P(X_{t+1} = x | X_0, ..., X_t) for all x and t.
From the description above, it should be clear that, conditional on the drunk’s trajectory up to time t, his position at the next time depends only on X_t, i.e.,
    P(X_{t+1} = x | X_0, ..., X_t) = P(X_{t+1} = x | X_t).    (1)
This is called the Markov property, and a process having this property is called a Markov process or Markov chain. By repeatedly using (1), we get
    P(X_{t+1} = x_1, ..., X_{t+n} = x_n | X_0, ..., X_t) = P(X_{t+1} = x_1, ..., X_{t+n} = x_n | X_t),    (2)
for all t and n, and all possible values of X_0, ..., X_{t+n}. In words, the future evolution of the process is conditionally independent of the past given the present.
Formally, a discrete-time Markov chain on a state space S is a process X_t, t = 0, 1, 2, ..., taking values in S which satisfies (1) or, equivalently, (2).
In all the examples we see in this course the state space S will be discrete (usually finite, occasionally countably infinite). There is a theory of Markov chains for general state spaces as well, but it is outside our scope.
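To make the drunkard's walk concrete, here is a minimal simulation sketch. The function name simulate_walk, the starting lamppost and the values n = 5 and p = 0.5 are illustrative assumptions; the text does not fix any of them.

```python
import random

def simulate_walk(n, p, start, rng):
    """Simulate the walk on locations 0 (pub), 1..n (lampposts), n+1 (home)."""
    path = [start]
    while path[-1] not in (0, n + 1):
        # The next position depends only on the current one (Markov property):
        # homeward with probability p, pubward with probability 1 - p.
        step = 1 if rng.random() < p else -1
        path.append(path[-1] + step)
    return path

rng = random.Random(0)  # fixed seed so the run is reproducible
# Assumed values: 5 lampposts, fair coin, starting at lamppost 3.
path = simulate_walk(n=5, p=0.5, start=3, rng=rng)
print(path)
```

Each entry of path is one value of X_t, and the loop never looks at anything but the last entry, which is exactly what property (1) allows.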
Other examples
- A toy model of the weather with, e.g., 3 states, Sunny, Cloudy, Rainy, and transition probabilities between them.
- A model of language with transition probabilities for, say, successive letters in a word, or successive words in a sentence. A 1-step Markov model may be simplistic for this. Suppose that, instead of having memory 1, a process has some fixed, finite memory. Can it still be modelled as a Markov chain?
- The Ethernet protocol: A computer that has a packet to transmit over the local area network starts in back-off stage 0 and attempts to transmit it. Every time it fails (because the packet collides with a packet from another computer that is trying to transmit at the same time), it increments its back-off counter. The back-off counter tells it how long to wait before attempting again. Once the packet is successfully transmitted, the back-off counter is reset to 0. In this example, the choice of state is important. If you choose the state as the value of the back-off counter in each time step, then the process is not Markovian (because its future evolution depends on how long the back-off counter has had its present value, not just on what this value is). However, if you consider the “embedded chain” of the counter values just after the back-off counter changes, then this chain is Markovian.
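One standard way to approach the question raised in the language example is to enlarge the state: a process with memory 2 becomes a one-step Markov chain if the state is taken to be the pair of the last two symbols. The sketch below illustrates this under loudly hypothetical assumptions: a two-letter alphabet and made-up memory-2 probabilities in p_a_given.

```python
from itertools import product

states = ["a", "b"]  # hypothetical two-letter alphabet

# Hypothetical memory-2 rule: P(next = "a" | previous two letters).
p_a_given = {("a", "a"): 0.1, ("a", "b"): 0.6, ("b", "a"): 0.5, ("b", "b"): 0.9}

def second_order(prev2, prev1, nxt):
    """Probability of nxt given the two preceding letters."""
    pa = p_a_given[(prev2, prev1)]
    return pa if nxt == "a" else 1.0 - pa

# Augmented state: the pair (previous letter, current letter).
pairs = list(product(states, repeat=2))

# One-step transition matrix on pairs: (u, v) -> (v, w) gets the memory-2
# probability of w after u, v; every other move has probability 0.
P = {
    (u, v): {
        (v2, w): (second_order(u, v, w) if v2 == v else 0.0)
        for (v2, w) in pairs
    }
    for (u, v) in pairs
}

for pair, row in P.items():
    print(pair, row)
```

The price of this construction is a larger state space: with memory m over an alphabet of size k, the augmented chain has k^m states.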
What does it take to fully describe a Markov chain? Clearly, it suffices to describe all the conditional probabilities in (2), for all t and n and all possible combinations of states. In fact, it suffices to just specify the one-step transition probabilities P(X_{t+1} = y | X_t = x) for all t, and all x, y ∈ S. Why is this sufficient?
First note that we can represent all the one-step transition probabilities in the form of a matrix P(t) with entries P_{xy}(t) = P(X_{t+1} = y | X_t = x). From this, we can compute
    P(X_{t+2} = z, X_{t+1} = y | X_t = x)
        = P(X_{t+1} = y | X_t = x) P(X_{t+2} = z | X_{t+1} = y, X_t = x)
        = P(X_{t+1} = y | X_t = x) P(X_{t+2} = z | X_{t+1} = y)
        = P_{xy}(t) P_{yz}(t + 1),
and so on.
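The two-step computation can be checked numerically. The following sketch uses two made-up 2-state matrices, P_t and P_t1, standing in for P(t) and P(t + 1); the numbers are assumptions chosen only for illustration. Summing the two-step path probability over the intermediate state y gives the (x, z) entry of the matrix product P(t)P(t + 1).

```python
# Hypothetical one-step matrices for a 2-state chain at times t and t+1.
P_t = [[0.9, 0.1],
       [0.4, 0.6]]
P_t1 = [[0.5, 0.5],
        [0.2, 0.8]]

def two_step_path(x, y, z):
    """P(X_{t+2} = z, X_{t+1} = y | X_t = x) = P_xy(t) * P_yz(t+1)."""
    return P_t[x][y] * P_t1[y][z]

def two_step(x, z):
    """P(X_{t+2} = z | X_t = x), summing over the intermediate state y."""
    return sum(two_step_path(x, y, z) for y in range(2))

print(two_step_path(0, 1, 1))
print([[two_step(x, z) for z in range(2)] for x in range(2)])
```

Each row of the resulting two-step matrix again sums to 1, as it must for a matrix of conditional probabilities.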
Thus, to describe a Markov process, it suffices to specify its initial distribution μ on S (which may be unit mass on a single state of S), and all the one-step transition probability matrices P(t), t = 0, 1, 2, .... We will typically be interested in the case in which P(t) = P for all t. In this case, P(X_{t+s} = y | X_s = x) is the same as P(X_t = y | X_0 = x) for any s, t, x and y. Such a Markov chain is called time-homogeneous.
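As an illustration, the drunkard's walk is time-homogeneous, so an initial distribution μ and a single matrix P determine the distribution of X_t for every t. The sketch below assumes n = 3 lampposts, p = 0.5 and a start at lamppost 1, and it models "stopping" at the pub or at home by making states 0 and n + 1 absorbing (a modelling assumption, since the text only says that he stops there). The helper names are hypothetical.

```python
def walk_matrix(n, p):
    """Transition matrix on states 0..n+1; 0 (pub) and n+1 (home) are absorbing."""
    size = n + 2
    P = [[0.0] * size for _ in range(size)]
    P[0][0] = 1.0          # assumption: once at the pub, stay there
    P[n + 1][n + 1] = 1.0  # assumption: once at home, stay there
    for x in range(1, n + 1):
        P[x][x + 1] = p        # homeward
        P[x][x - 1] = 1.0 - p  # pubward
    return P

def step(mu, P):
    """One step of the distribution: (mu P)_y = sum over x of mu_x * P_xy."""
    return [sum(mu[x] * P[x][y] for x in range(len(mu))) for y in range(len(mu))]

def distribution_at(mu, P, t):
    """Distribution of X_t given that X_0 has distribution mu."""
    for _ in range(t):
        mu = step(mu, P)
    return mu

n, p = 3, 0.5
P = walk_matrix(n, p)
mu0 = [0.0, 1.0, 0.0, 0.0, 0.0]  # unit mass on lamppost 1
print(distribution_at(mu0, P, 2))  # [0.5, 0.25, 0.0, 0.25, 0.0]
```

Because the same P is applied at every step, the loop in distribution_at amounts to multiplying μ by the t-th power of P.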