An Introduction to Artificial Intelligence Week 11

Course Name: An Introduction to Artificial Intelligence

Course Link: Click Here

These are An Introduction to Artificial Intelligence Answers Week 11


Q1. Which of the following statements are true?
a. A model-based learner learns the optimal policy given a model of the state space
b. A passive learner requires a policy to be fed to it
c. A strong simulator can jump to any part of the state space to begin a simulation
d. An active learner learns the optimal policy and also decides which action to take next

Answer: a, c, d


Q2. Suppose you are performing model-based passive learning according to a given policy. Following this policy, you have reached state A a total of 100 times. State A has 4 possible transitions to next states: A, B, C, and D. The policy stipulates that you take action a at this state. Taking action a, you end up in state A 61 times, state B 22 times, and state C 17 times. Assuming add-one smoothing, what is the value of T(A, a, B)?

Answer: 0.221
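
The answer follows from add-one (Laplace) smoothing over the 4 possible next states. Below is a minimal Python sketch of the calculation (the counts come from the question; the variable names are only illustrative):

```python
# Add-one (Laplace) smoothing of the transition estimate T(A, a, B).
counts = {"A": 61, "B": 22, "C": 17, "D": 0}   # observed next states after taking a in A
total = sum(counts.values())                    # 100 observed transitions
num_next_states = len(counts)                   # 4 possible next states

# Add 1 to every count, so the denominator grows by the number of next states.
T_AaB = (counts["B"] + 1) / (total + num_next_states)
print(round(T_AaB, 3))  # (22 + 1) / (100 + 4) = 23 / 104 ≈ 0.221
```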


Q3. For the next three questions, consider the following trajectories obtained by running some simulations in an unknown environment following a given policy. The state space is {A, B, C} and the action space is {a, b}. Assume the discount factor is 0.5. Each sample is represented as (State, Action, Reward, Next state).
Run 1: (A, a, 0, B)
Run 2: (C, b, -1, A), (A, a, 0, B)
Run 3: (C, b, -1, B)
Run 4: (A, a, 0, B)
Run 5: (A, a, 0, C), (C, b, -1, B)
Using model-free passive learning, give an empirical estimate of Vπ(A).

Answer: -0.125
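
Under direct evaluation (model-free passive learning), Vπ(A) is the average of the discounted returns observed from A: (0 + 0 + 0 + (0 + 0.5·(-1))) / 4 = -0.125. A minimal Python sketch of that computation, using the runs listed in the question:

```python
# Direct evaluation: average the observed discounted returns from each state.
gamma = 0.5
runs = [
    [("A", "a", 0, "B")],
    [("C", "b", -1, "A"), ("A", "a", 0, "B")],
    [("C", "b", -1, "B")],
    [("A", "a", 0, "B")],
    [("A", "a", 0, "C"), ("C", "b", -1, "B")],
]

returns = {}  # state -> list of discounted returns observed from that state
for run in runs:
    for t, (s, a, r, s2) in enumerate(run):
        # Return from step t: r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
        G = sum(gamma ** k * run[t + k][2] for k in range(len(run) - t))
        returns.setdefault(s, []).append(G)

print(sum(returns["A"]) / len(returns["A"]))  # -0.125
```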


Q4. Assume that the above samples are fed sequentially to a Temporal Difference learner. Assume all state values are initialised to 0 and alpha is kept constant at 0.5. What will be the learned value of Vπ(A)?

Answer: -0.19
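
A minimal sketch of the TD(0) updates, feeding the samples from the runs above in order, with alpha = gamma = 0.5 and all values initialised to 0:

```python
# TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)), applied sample by sample.
alpha, gamma = 0.5, 0.5
V = {"A": 0.0, "B": 0.0, "C": 0.0}

samples = [
    ("A", "a", 0, "B"),                       # Run 1
    ("C", "b", -1, "A"), ("A", "a", 0, "B"),  # Run 2
    ("C", "b", -1, "B"),                      # Run 3
    ("A", "a", 0, "B"),                       # Run 4
    ("A", "a", 0, "C"), ("C", "b", -1, "B"),  # Run 5
]

for s, a, r, s2 in samples:
    V[s] += alpha * (r + gamma * V[s2] - V[s])

print(V["A"])  # -0.1875, i.e. about -0.19
```

The only sample that changes V(A) is (A, a, 0, C) in Run 5, by which point V(C) has already been driven down to -0.75, so V(A) = 0.5 · (0 + 0.5 · (-0.75)) = -0.1875.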


Q5. Assume that the above samples are fed to a Q-learner. What is the value of Q(A, a)? Assume that all Q-values are initialised to 0. The discount factor is 0.5 and the learning rate is also 0.5.

Answer: 0
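
A minimal sketch of the corresponding Q-learning updates on the same samples (alpha = gamma = 0.5, Q-values initialised to 0):

```python
# Q-learning: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
alpha, gamma = 0.5, 0.5
states, actions = ["A", "B", "C"], ["a", "b"]
Q = {(s, act): 0.0 for s in states for act in actions}

samples = [
    ("A", "a", 0, "B"),                       # Run 1
    ("C", "b", -1, "A"), ("A", "a", 0, "B"),  # Run 2
    ("C", "b", -1, "B"),                      # Run 3
    ("A", "a", 0, "B"),                       # Run 4
    ("A", "a", 0, "C"), ("C", "b", -1, "B"),  # Run 5
]

for s, a, r, s2 in samples:
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

print(Q[("A", "a")])  # 0.0: every reward from A is 0 and max_a' Q(s', a') never exceeds 0
print(Q[("C", "b")])  # -0.875
```

Because Q(C, a) stays at 0 while Q(C, b) becomes negative, the greedy action at state C is a, which is what Q6 below asks for.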


Q6. Suppose we compute the optimal policy given the current Q-values. What is the action under the optimal policy at state C?
Type a or b.

Answer: a


Q7. Which of the following is correct regarding Boltzmann exploration?
a. It focuses on exploration initially, and more on exploitation as time passes
b. It is guaranteed to discover all reachable states from the start state, given infinite time
c. It leans more towards exploitation as temperature is increased
d. The probability of an action being chosen at a particular state varies exponentially with its Q-value at that point in time

Answer: a, b, d
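
Under Boltzmann (softmax) exploration, an action's selection probability is proportional to exp(Q(s, a) / T), where T is the temperature. A small illustrative Python sketch (the Q-values and temperatures below are made up, not part of the question):

```python
import math
import random

def boltzmann_action(q_values, temperature):
    # P(a) is proportional to exp(Q(s, a) / temperature), so the probability of
    # choosing an action varies exponentially with its current Q-value.
    actions = list(q_values.keys())
    weights = [math.exp(q_values[a] / temperature) for a in actions]
    return random.choices(actions, weights=weights)[0]

q = {"a": 0.0, "b": -0.875}
print(boltzmann_action(q, temperature=5.0))   # high T: nearly uniform choice (exploration)
print(boltzmann_action(q, temperature=0.1))   # low T: almost always the greedy action
```

Annealing the temperature from high to low is what shifts the agent from exploration early on to exploitation later (option a); a lower temperature, not a higher one, favours exploitation, which is why option c is not correct.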


Q8. Which of the following is required for the convergence of Q-learning to the optimal Q-values?
a. Policy used to generate episodes for learning should be optimal.
b. All states are visited infinitely often over infinitely many samples.
c. Suitable initialisation of Q-values before learning updates.
d. Very large (>>1) learning rate.

Answer: b. All states are visited infinitely often over infinitely many samples.


Q9. Which of the following statements are correct?
a. If an agent does not perform sufficient exploration in the choice of actions in the environment, it runs the risk of never getting large rewards.
b. If the agent has perfect knowledge of the transition and reward model of the environment, exploration is not needed.
c. Degree of exploration should be increased as the learning algorithm performs more and more updates.
d. Exploration is not required in model-based RL algorithms.

Answer: a, b


Q10. Which of the following statement(s) is/are correct for model-based and model-free reinforcement learning methods?
a. Model-based learning usually requires more parameters to be learnt.
b. Model-free learning can simulate new episodes from past experience.
c. Model-based methods are more sample efficient.
d. None of the above.

Answer: a. Model-based learning usually requires more parameters to be learnt.


More Solutions of An Introduction to Artificial Intelligence: Click Here

More NPTEL Solutions: https://progiez.com/nptel-assignment-answers/


The content uploaded on this website is for reference purposes only. Please attempt the assignment yourself first.