We define an infinite-horizon discounted MDP as follows. There are three states x, y1, y2 and a single action a. The MDP dynamics are independent of the action a, as shown below:

At state x, with probability 1 the state transits to y1, i.e.,

P(y1|x)=1.

Then at state y1, we have

P(y1|y1)=p, P(y2|y1)=1−p,

which says that with probability p we stay in y1 and with probability 1−p the state transits to y2. Finally, state y2 is an absorbing state, so that

P(y2|y2)=1.

The immediate reward is 1 for any transition starting in state y1 and 0 otherwise:

R(y1,a,y1)=1, R(y1,a,y2)=1, R(s,a,s′)=0 otherwise.

The discount factor is denoted by γ (0<γ<1).
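Because the chain is so small, the setup can be sanity-checked numerically. Below is a minimal Python sketch that encodes the transitions and rewards above and runs value iteration; the values p = 0.5 and gamma = 0.9 are illustrative assumptions (the problem leaves them symbolic), and with a single action the max over actions in the Bellman update is trivial.

    # Minimal sketch of the chain above; the values of p and gamma are
    # illustrative assumptions, not part of the problem statement.
    p, gamma = 0.5, 0.9

    states = ["x", "y1", "y2"]

    # Action-independent transition probabilities P[s][s'].
    P = {
        "x":  {"y1": 1.0},
        "y1": {"y1": p, "y2": 1.0 - p},
        "y2": {"y2": 1.0},
    }

    # Reward R(s, a, s'): 1 for any transition starting in y1, 0 otherwise.
    def reward(s, s_next):
        return 1.0 if s == "y1" else 0.0

    # Value iteration; with one action the max over actions is trivial.
    V = {s: 0.0 for s in states}
    for _ in range(1000):
        V = {s: sum(prob * (reward(s, s_next) + gamma * V[s_next])
                    for s_next, prob in P[s].items())
             for s in states}

    print(V)  # numerical estimates of V*(x), V*(y1), V*(y2)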

1
Define V∗(y1) as the optimal value function at state y1. Compute V∗(y1) via the Bellman equation. (The answer is a formula in terms of γ and p.)

(Enter gamma for γ.)

V∗(y1)=
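For reference, here is one way the Bellman equation specializes to this chain, using only the definitions above (a sketch to check your own derivation against; the single action makes the max over actions trivial):

V∗(y1) = p[R(y1,a,y1) + γV∗(y1)] + (1−p)[R(y1,a,y2) + γV∗(y2)]
       = 1 + γpV∗(y1) + γ(1−p)V∗(y2).

Since y2 is absorbing with zero reward, V∗(y2) = 0, and solving the resulting linear equation gives V∗(y1) = 1/(1−γp).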

3 answers

Is this a homework dump? Or a test dump?
They're review questions for the final.
OK, be sure to have patience and wait for a math tutor to come online.