# An Introduction to Artificial Intelligence Week 10

Course Name: An Introduction to Artificial Intelligence

#### These are An Introduction to Artificial Intelligence Answers Week 10

For Questions 1 – 3:
Ram has the opportunity to make one of two bets (say A and B), invest equally in both bets, or make no bet; each bet is based on the outcome of a cricket match. The payoffs to Ram on winning/losing each bet are as described in the table below:

Q1. If Ram employs minimax regret to decide in this situation, what action does he take?
a. Makes bet A
b. Makes bet B
c. Invests equally in A and B
d. Makes no bet
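Since the question's payoff table is not reproduced here, the sketch below applies minimax regret to a hypothetical table (all numbers are illustrative): compute each action's regret in every outcome (the best payoff achievable in that outcome minus the action's payoff), then pick the action whose worst-case regret is smallest.

```python
# Minimax regret on a HYPOTHETICAL payoff table -- the question's
# actual numbers are not reproduced in this post.
payoffs = {
    "bet A":  {"win": 100, "lose": -60},
    "bet B":  {"win": 80,  "lose": -30},
    "no bet": {"win": 0,   "lose": 0},
}
outcomes = ["win", "lose"]
# Best payoff achievable in each outcome.
best = {o: max(p[o] for p in payoffs.values()) for o in outcomes}
# Regret of each action in each outcome.
regret = {a: {o: best[o] - p[o] for o in outcomes} for a, p in payoffs.items()}
# Worst-case regret per action; choose the action that minimises it.
worst_regret = {a: max(r.values()) for a, r in regret.items()}
choice = min(worst_regret, key=worst_regret.get)
```

With these illustrative numbers, bet B's worst-case regret (30) beats bet A's (60) and "no bet" (100), so minimax regret picks bet B; the actual answer depends on the real table.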

Q2. If Ram employs the Hurwicz criterion to decide, for which of the following values of the coefficient of realism does Ram choose to not make a bet?
a. 0.2
b. 0.5
c. 0.7
d. 0.4
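The Hurwicz criterion scores each action as alpha times its best payoff plus (1 − alpha) times its worst payoff, where alpha is the coefficient of realism. A sketch with the same hypothetical payoffs as above (the question's real table is not reproduced here):

```python
# Hurwicz criterion: score(a) = alpha * best(a) + (1 - alpha) * worst(a).
# Payoffs are HYPOTHETICAL (best, worst) pairs per action.
payoffs = {"bet A": (100, -60), "bet B": (80, -30), "no bet": (0, 0)}

def hurwicz_choice(alpha):
    scores = {a: alpha * best + (1 - alpha) * worst
              for a, (best, worst) in payoffs.items()}
    return max(scores, key=scores.get)
```

With these numbers, a pessimistic alpha = 0.2 yields "no bet" while an optimistic alpha = 0.7 yields bet A; the crossover alpha for the real question follows from its actual table.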

Q3. Assume an insider offers to tell Ram beforehand whether he will win or lose a bet. Also assume that all bets have an equal likelihood of success and failure. What is the maximum amount of money Ram should be willing to pay the insider for this information?
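This is the expected value of perfect information (EVPI): the expected payoff when the best action can be chosen per outcome, minus the best expected payoff when one action must be committed to in advance. A sketch with the same hypothetical payoffs as above and 50/50 win/lose:

```python
# EVPI with HYPOTHETICAL payoffs (the question's table is not
# reproduced here) and equally likely win/lose outcomes.
payoffs = {"bet A":  {"win": 100, "lose": -60},
           "bet B":  {"win": 80,  "lose": -30},
           "no bet": {"win": 0,   "lose": 0}}
p = {"win": 0.5, "lose": 0.5}
# Without the tip: commit to the single action with the best expectation.
ev_no_info = max(sum(p[o] * v[o] for o in p) for v in payoffs.values())
# With the tip: pick the best action separately for each outcome.
ev_with_info = sum(p[o] * max(v[o] for v in payoffs.values()) for o in p)
evpi = ev_with_info - ev_no_info  # the most the tip is worth paying for
```

Here EV without information is 25 (bet B) and EV with information is 50, so the tip is worth at most 25 under these illustrative numbers.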

Q4. For an MDP with a discrete finite state space S and a discrete finite action space A, what is the memory size of the transition function in the most general case?
a. O(|S|^2)
b. O(|S||A|)
c. O(|S|^2|A|)
d. O(|S||A|^2)
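The reasoning: a fully general tabular transition model must store P(s' | s, a) for every (state, action, next-state) triple, which is an |S| × |A| × |S| table, i.e. O(|S|²|A|) memory:

```python
# A general tabular transition model stores one probability per
# (s, a, s') triple -> an |S| x |A| x |S| table, O(|S|^2 |A|) memory.
n_states, n_actions = 10, 4
T = [[[0.0] * n_states            # one row of P(s' | s, a) per (s, a) pair
      for _ in range(n_actions)]
     for _ in range(n_states)]
entries = n_states * n_actions * n_states  # 10 * 4 * 10 = 400
```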

For Questions 5 – 7:
Consider the MDP given below for a robot trying to walk.

The MDP has three states, S = {Standing, Moving, Fallen}, and two actions: moving the robot legs slowly (a) and moving the robot legs aggressively (b), denoted by the colours black and green respectively. The task is to perform policy iteration for the above MDP with discount factor 1.

Q5. We start with a policy 𝜋(s) = a for all s in S and V^𝜋(s) = 0 for all s. What is the value of the Fallen state after one iteration of the Bellman update during policy evaluation?

Q6. Suppose we perform the policy improvement step just after the one iteration of Bellman update in Q5. What is the updated policy? Write it as the sequence of actions for Standing, Moving and Fallen.
For example, if the policy is 𝜋(Standing) = b, 𝜋(Moving) = b, 𝜋(Fallen) = a, write the answer as bba.

Q7. After one iteration of policy evaluation as in Q5, what is the value of Q(state,action) where state = Moving and action = b?
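The MDP diagram from the question is not reproduced here, so the sketch below runs the Q5–Q7 steps on a hypothetical three-state walking MDP (all transitions and rewards are illustrative): one Bellman update of policy evaluation under the initial policy, then greedy policy improvement.

```python
# Policy evaluation (one sweep) + greedy improvement, gamma = 1, on a
# HYPOTHETICAL MDP -- the question's actual diagram is not reproduced.
# T[s][action] is a list of (probability, next_state, reward) triples.
T = {
    "Standing": {"a": [(1.0, "Moving", 1)],
                 "b": [(0.6, "Moving", 2), (0.4, "Fallen", -1)]},
    "Moving":   {"a": [(1.0, "Moving", 1)],
                 "b": [(0.8, "Moving", 2), (0.2, "Fallen", -1)]},
    "Fallen":   {"a": [(1.0, "Standing", 1)],
                 "b": [(0.4, "Standing", 2), (0.6, "Fallen", -1)]},
}
gamma = 1.0
states, actions = list(T), ["a", "b"]
V = {s: 0.0 for s in states}      # V(s) = 0 initially, as in Q5
pi = {s: "a" for s in states}     # pi(s) = a initially, as in Q5

def q(s, a, V):
    """Q(s, a) under the current value estimates."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])

# Q5-style step: one Bellman update of policy evaluation under pi.
V = {s: q(s, pi[s], V) for s in states}
# Q6-style step: greedy policy improvement against the updated V.
pi = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
```

Q7's quantity is then just `q("Moving", "b", V)` evaluated after the evaluation sweep; the actual answers depend on the real transition probabilities and rewards.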

Q8. If the utility curve of an agent varies as m^2 for money m, then the agent is:
a. Risk-prone
b. Risk-averse
c. Risk-neutral
d. Can be any of these
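Since U(m) = m² is convex, Jensen's inequality gives E[U(m)] ≥ U(E[m]): the agent prefers a fair gamble to its certain expected value, the defining property of a risk-prone agent. A quick numeric check:

```python
# Convex utility U(m) = m^2: a fair 50/50 gamble over {0, 10} has
# higher expected utility than its certain expectation (5),
# so the agent is risk-prone.
U = lambda m: m ** 2
eu_gamble = 0.5 * U(0) + 0.5 * U(10)  # expected utility of the gamble
u_certain = U(5)                      # utility of the sure expected amount
assert eu_gamble > u_certain          # 50 > 25: the gamble is preferred
```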

Q9. Which of the following statements are true regarding Markov Decision Processes (MDPs)?
a. Discount factor is not useful for finite horizon MDPs.
b. We assume that the reward and cost models are independent of the previous state transition history, given the current state.
c. MDPs assume full observability of the environment
d. Goal states may have transitions to other states in the MDP

Q10. Which of the following are true regarding value and policy iteration?
a. Value iteration is guaranteed to converge in a finite number of steps for any value of epsilon and any MDP, if the MDP has a fixed point.
b. The convergence of policy iteration is dependent on the initial policy.
c. Value iteration is generally expected to converge in a lesser number of iterations as compared to policy iteration.
d. In each iteration of policy iteration, value iteration is run as a subroutine, using a fixed policy
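As background for Q10, here is a minimal value-iteration loop on a tiny hypothetical two-state MDP (all transitions illustrative). With gamma < 1 the Bellman optimality backup is a contraction, so the sup-norm change between sweeps shrinks geometrically and the loop reaches any epsilon threshold in finitely many steps:

```python
# Minimal value iteration on a HYPOTHETICAL two-state MDP.
# T[s][action] = list of (probability, next_state, reward) triples.
T = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma, eps = 0.9, 1e-6
V = {s: 0.0 for s in T}
while True:
    # Bellman optimality backup: max over actions of expected return.
    newV = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in outs)
                   for outs in T[s].values())
            for s in T}
    delta = max(abs(newV[s] - V[s]) for s in T)
    V = newV
    if delta < eps:       # sup-norm change below epsilon -> stop
        break
# Optimal values here: V(1) = 2 / (1 - 0.9) = 20, V(0) = 1 + 0.9 * 20 = 19.
```

Replacing the `max` over actions with the single action prescribed by a fixed policy turns this same sweep into the policy-evaluation subroutine that policy iteration runs in each of its iterations.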