Logistics Management Courseware, Cases and Readings: MDP


Brief Introduction to Markov Decision Processes

Reference books:
Markov Decision Processes, Martin Puterman, 1994.
Introduction to Stochastic Dynamic Programming, Sheldon Ross, 1983.

CONTENT
Five elements of MDP.
Finite-horizon Markov decision processes.
Infinite-horizon models: discounted Markov decision problems; the expected reward criterion; the average reward criterion.
Two research papers.

Model formulation

Decision epochs:
Either a discrete set or a continuum; either a finite or an infinite set.
The corresponding problems: discrete-time or continuous-time problems; finite-horizon or infinite-horizon problems.

State and action sets:
At each decision epoch, the system occupies a state s. We denote the set of possible system states by S, s ∈ S. When observing the system in state s, the decision maker may choose an action a from the set A_s of allowable actions in state s, a ∈ A_s.

The sets S and A_s can be either: arbitrary finite sets, arbitrary countably infinite sets, compact subsets of finite-dimensional Euclidean space, or non-empty Borel subsets of complete, separable metric spaces.

Rewards and transition probabilities:
The decision maker receives a reward r_t(s,a) as a result of choosing action a in state s at decision epoch t. The system state at the next decision epoch is determined by the probability distribution p_t(·|s,a). When the reward also depends on the state j at the next decision epoch, its expected value at decision epoch t is expressed as
r_t(s,a) = Σ_{j ∈ S} r_t(s,a,j) p_t(j|s,a).

We refer to the collection of objects {T, S, A_s, p_t(·|s,a), r_t(s,a)} as a Markov decision process. "Markov" is used because the transition probability and reward functions depend on the past only through the current state of the system and the action selected by the decision maker in that state.
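The five elements {T, S, A_s, p_t(·|s,a), r_t(s,a)} map directly onto a data structure. Below is a minimal Python sketch of a container for a finite MDP; every name in it is illustrative rather than from the slides, and it assumes finite state and action sets with time-homogeneous rewards and transition probabilities.

    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    State = int
    Action = int

    @dataclass
    class FiniteMDP:
        horizon: int                               # T: number of decision epochs N
        states: List[State]                        # S
        actions: Dict[State, List[Action]]         # A_s for each s
        reward: Dict[Tuple[State, Action], float]  # r(s, a)
        trans: Dict[Tuple[State, Action], Dict[State, float]]  # p(j | s, a)

        def check(self) -> None:
            # Each p(.|s, a) must be a probability distribution over S.
            for (s, a), dist in self.trans.items():
                assert abs(sum(dist.values()) - 1.0) < 1e-9, (s, a)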

Decision rules

A decision rule prescribes a procedure for action selection in each state at a specified decision epoch. Four classes:
Markovian and deterministic decision rules, d_t : S → A_s;
deterministic and history-dependent decision rules;
Markovian and randomized decision rules;
history-dependent and randomized decision rules.

Under what conditions is it optimal to use a deterministic Markovian decision rule at each stage?
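The four classes differ only in what a rule may consult (the current state alone, or the whole history) and what it returns (a single action, or a distribution over actions). The following Python type signatures, with illustrative names, spell the taxonomy out:

    from typing import Callable, Dict, List, Tuple

    State, Action = int, int
    History = List[Tuple[State, Action]]  # (s_1, a_1, ..., s_{t-1}, a_{t-1})

    # Markovian, deterministic:  d_t : S -> A_s
    MarkovDet = Callable[[State], Action]
    # Markovian, randomized:     d_t : S -> distribution over A_s
    MarkovRand = Callable[[State], Dict[Action, float]]
    # History-dependent, deterministic: (history, current state) -> action
    HistDet = Callable[[History, State], Action]
    # History-dependent, randomized:    (history, current state) -> distribution
    HistRand = Callable[[History, State], Dict[Action, float]]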

Finite-horizon Markov Decision Processes

(Existence) Assume S is finite or countable, and that either
a. A_s is finite for each s ∈ S; or
b. A_s is compact, r_t(s,a) is continuous in a for each s ∈ S, there exists an M for which |r_t(s,a)| ≤ M for all a ∈ A_s and s ∈ S, and p_t(j|s,a) is continuous in a for each j ∈ S and s ∈ S and t = 1, 2, ..., N; or
c. A_s is compact, r_t(s,a) is upper semi-continuous in a for each s ∈ S, there exists an M for which |r_t(s,a)| ≤ M for all a ∈ A_s and s ∈ S, and p_t(j|s,a) is lower semi-continuous in a for each j ∈ S and s ∈ S and t = 1, 2, ..., N.
Then there exists a deterministic Markovian policy which is optimal.

Optimality equations:
u_t(s) = max over a ∈ A_s of { r_t(s,a) + Σ_{j ∈ S} p_t(j|s,a) u_{t+1}(j) },  t = 1, ..., N,
with the boundary condition
u_{N+1}(s) = r_{N+1}(s) for all s ∈ S.
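The optimality equations are solved by backward induction: compute u_N from the boundary condition, then u_{N-1}, and so on down to u_1. A minimal sketch, reusing the illustrative FiniteMDP container above and assuming a zero terminal reward r_{N+1} ≡ 0 for brevity:

    def backward_induction(mdp):
        # u[t][s] = optimal expected total reward from epoch t onward;
        # policy[t][s] = a maximizing action, i.e. a deterministic
        # Markovian decision rule d_t.
        N = mdp.horizon
        u = {N + 1: {s: 0.0 for s in mdp.states}}  # boundary: u_{N+1}(s) = 0 here
        policy = {}
        for t in range(N, 0, -1):
            u[t], policy[t] = {}, {}
            for s in mdp.states:
                best_a, best_v = None, float("-inf")
                for a in mdp.actions[s]:
                    v = mdp.reward[(s, a)] + sum(
                        p * u[t + 1][j] for j, p in mdp.trans[(s, a)].items())
                    if v > best_v:
                        best_a, best_v = a, v
                u[t][s] = best_v
                policy[t][s] = best_a
        return u, policy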

An example

A stock-option model:
1. Let S_k denote the price of a given stock on the k-th day (k ≥ 1), and suppose that S_k = S_{k-1} + X_k, where the X_i are i.i.d. with distribution F (having finite mean) and are also independent of S_0, the initial price.
2. Suppose you own an option to buy one share of the stock at a fixed price, say c, and you have N days in which to exercise the option. You need never exercise it, but if you do so at a time when the stock's price is s, then your profit is s - c. What strategy maximizes your expected profit?

What are the decision epochs? The states and actions? The rewards and transition probabilities?

The optimality equation

Let V_n(s) denote the maximal expected profit when the stock's price is s and the option has n additional days to run. Then V_n satisfies the optimality equation
V_n(s) = max{ s - c, ∫ V_{n-1}(s + x) dF(x) },
with the boundary condition
V_0(s) = max{ s - c, 0 }.

Structural properties of the optimal policy

Lemma: V_n(s) - s is decreasing in s, and V_n(s) is increasing in s and n.
Theorem: The optimal policy has the following form: there are increasing numbers s_1 ≤ s_2 ≤ ... ≤ s_N such that if there are n days to go and the present price is s, then one should exercise the option if and only if s ≥ s_n.
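The threshold structure can be checked numerically by running the optimality equation on a discretized price grid. The sketch below uses a two-point approximation to F and clips prices at the grid edges; both choices, and all parameter values, are illustrative assumptions rather than part of the model.

    def option_values(smin, smax, steps, c, N):
        # Backward induction for the stock-option model on the integer price
        # grid [smin, smax]. steps is a list of (x, prob) pairs approximating F.
        grid = range(smin, smax + 1)
        V = {s: max(s - c, 0.0) for s in grid}            # V_0(s) = max(s - c, 0)
        thresholds = []
        for n in range(1, N + 1):
            cont = {s: sum(p * V[min(max(s + x, smin), smax)] for x, p in steps)
                    for s in grid}                        # E[ V_{n-1}(s + X) ]
            V = {s: max(s - c, cont[s]) for s in grid}    # optimality equation
            s_n = next((s for s in grid if s - c >= cont[s]), None)
            thresholds.append(s_n)                        # exercise iff s >= s_n
        return V, thresholds

    # Example: symmetric one-step moves, strike c = 50, N = 10 days to expiry.
    V, s_n = option_values(0, 100, [(-1, 0.5), (1, 0.5)], c=50, N=10)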

Another example

The optimality of (s,S) policies: Scarf's paper.

Infinite-horizon models

Discounted Markov decision problems.
(Existence) Suppose that for each v ∈ V and s ∈ S there exists an a_s^v ∈ A_s such that
r(s, a_s^v) + λ Σ_{j ∈ S} p(j|s, a_s^v) v(j) = max over a ∈ A_s of { r(s,a) + λ Σ_{j ∈ S} p(j|s,a) v(j) },
i.e. the maximum is attained. Then there exists a deterministic stationary optimal policy. Further, if d*(s) = a_s^{v*}, where v* is the solution of the optimality equation v(s) = max over a ∈ A_s of { r(s,a) + λ Σ_{j ∈ S} p(j|s,a) v(j) }, then this policy is optimal.

Algorithms to obtain the value function

Value iteration;
Policy iteration;
Linear programming.

Example

Qing Li, Shaohui Zheng. Joint inventory replenishment and pricing control for systems with uncertain yield and demand. Operations Research, Vol. 54, No. 4, pp. 696-705, 2006.
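Of the three, value iteration is the shortest to sketch: iterate the Bellman operator until the residual is small, then extract the greedy decision rule d*(s) = a_s^v, as in the existence result above. A minimal version for a finite discounted model, again reusing the illustrative FiniteMDP container; lam is the discount factor λ, and stopping when the residual falls below eps(1 - lam)/(2 lam) is the standard eps-optimality test.

    def value_iteration(mdp, lam, eps=1e-6):
        # Iterate v <- Lv, where (Lv)(s) = max_a { r(s,a) + lam * sum_j p(j|s,a) v(j) }.
        def q(s, a, v):
            return mdp.reward[(s, a)] + lam * sum(
                p * v[j] for j, p in mdp.trans[(s, a)].items())

        v = {s: 0.0 for s in mdp.states}
        while True:
            v_new = {s: max(q(s, a, v) for a in mdp.actions[s]) for s in mdp.states}
            resid = max(abs(v_new[s] - v[s]) for s in mdp.states)
            v = v_new
            if resid < eps * (1 - lam) / (2 * lam):
                break
        # Greedy (conserving) rule: the stationary policy it defines is eps-optimal.
        d = {s: max(mdp.actions[s], key=lambda a: q(s, a, v)) for s in mdp.states}
        return v, d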

The expected reward criterion

Positive bounded models.
(Assumptions) a. v₊(s) < ∞ for all s ∈ S.
(Existence) Suppose S is countable and r(s,a) ≥ 0 for all a ∈ A_s and s ∈ S. Then there exists a deterministic stationary optimal policy.

Negative models.
We call an MDP negative if r(s,a) ≤ 0 for all a ∈ A_s and s ∈ S.
(Existence) Assume S is discrete (finite or countable), and that either A_s is finite for each s ∈ S, or A_s is compact, r(s,a) is continuous in a for each s ∈ S, and p(j|s,a) is continuous in a for each j ∈ S and s ∈ S. Then there exists a deterministic stationary optimal policy.
