Asked by Vince
We define an infinite-horizon discounted MDP in the following manner. There are three states x, y1, y2 and one action a. The MDP dynamics are independent of the action a, as shown below:
At state x, the state transitions to y1 with probability 1, i.e.,
P(y1|x)=1.
Then at state y1 , we have
P(y1|y1)=p,P(y2|y1)=1−p,
which says that with probability p we stay in y1 and with probability 1−p the state transitions to y2. Finally, y2 is an absorbing state, so that
P(y2|y2)=1.
The instantaneous reward is 1 for any transition starting in state y1 and 0 elsewhere:
R(y1,a,y1)=1, R(y1,a,y2)=1, R(s,a,s′)=0 otherwise.
The discount factor is denoted by γ (0<γ<1).
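A minimal value-iteration sketch of this chain, assuming Python; the state indexing, the convergence tolerance tol, and the sample values of p and gamma in the example call are illustrative choices, not part of the posted question:

# States are indexed 0 = x, 1 = y1, 2 = y2; the single action is implicit
# because the dynamics do not depend on it.
def value_iteration(p, gamma, tol=1e-10):
    # P[s][t] = transition probability s -> t, R[s][t] = reward on that transition
    P = [
        [0.0, 1.0, 0.0],    # x  -> y1 with probability 1
        [0.0, p,   1 - p],  # y1 -> y1 w.p. p, y1 -> y2 w.p. 1 - p
        [0.0, 0.0, 1.0],    # y2 is absorbing
    ]
    R = [
        [0.0, 0.0, 0.0],
        [0.0, 1.0, 1.0],    # reward 1 for any transition leaving y1
        [0.0, 0.0, 0.0],
    ]
    V = [0.0, 0.0, 0.0]
    while True:
        # Bellman backup (no max needed: there is only one action)
        V_new = [
            sum(P[s][t] * (R[s][t] + gamma * V[t]) for t in range(3))
            for s in range(3)
        ]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new

# Example: numerically approximate V*(y1) and compare it against your formula.
print(value_iteration(p=0.5, gamma=0.9)[1])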
Question 1 (2.0 points possible)
Define V∗(y1) as the optimal value function evaluated at state y1. Compute V∗(y1) via Bellman's equation. (The answer is a formula in terms of γ and p.)
(Enter gamma for γ.)
V∗(y1)=
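For reference, the general Bellman optimality equation in its standard form (specializing it to the transition and reward model above gives the requested formula):

V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') + \gamma\, V^*(s') \,\bigr]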
Answers
Answered by
Writeacher
Is this a homework dump? Or a test dump?
Answered by
Vince
Review for final questions
Answered by
Writeacher
OK, be sure to have patience and wait for a math tutor to come online.