# Introduction to Machine Learning | Week 12

Session: JAN-APR 2024

Course name: Introduction to Machine Learning

Q1. Statement 1: Empirical error is always greater than generalisation error.
Statement 2: Training data and test data have different underlying (true) distributions.
Choose the correct option:

Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.
Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement 1.
Statement 1 is true. Statement 2 is false.
Both statements are false.

Q2. Let $P(A_i) = 2^{-i}$. Calculate the upper bound for $P\left(\bigcup_{i=1}^{5} A_i\right)$ using the union bound (rounded to 3 decimal places).
0.937
0.984
0.969
1
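
As a quick check of the arithmetic, the union bound says the probability of a union is at most the sum of the individual probabilities. The sketch below is only an illustration; the helper name `union_bound` is made up for this example.

```python
def union_bound(probabilities):
    """Upper bound on P(A_1 or ... or A_n): the sum of the
    individual probabilities, capped at 1."""
    return min(1.0, sum(probabilities))

# P(A_i) = 2^(-i) for i = 1..5, as stated in the question
probs = [2 ** -i for i in range(1, 6)]
bound = union_bound(probs)
print(bound)  # 0.96875
```

The sum is 1/2 + 1/4 + 1/8 + 1/16 + 1/32 = 0.96875, which is 0.969 when rounded half up to three decimal places.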

These are Introduction to Machine Learning Week 12 Assignment 12 Answers

Q3. Which of the following is/are the shortcomings of TD Learning that Q-learning resolves?
TD learning cannot provide values for (state, action) pairs, limiting the ability to extract an optimal policy directly
TD learning requires knowledge of the reward and transition functions, which is not always available
TD learning is computationally expensive and slow compared to Q-learning
TD learning often suffers from high variance in value estimation, leading to unstable learning
TD learning cannot handle environments with continuous state and action spaces effectively

Q4. Given 100 hypothesis functions, each trained with 10^6 samples, what is the lower bound on the probability that there does not exist a hypothesis function with error greater than 0.1?
1 − 200e^−2⋅10^4
1 − 100e^10^4
1 − 200e^10^2
1 − 200e^−2⋅10^2
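
One way to read this question is through the Hoeffding bound combined with the union bound over k hypotheses: P(no hypothesis has |empirical error − generalisation error| > γ) ≥ 1 − 2k·e^(−2γ²m). The sketch below assumes that reading, with k = 100, γ = 0.1 and m = 10^6; the function name is hypothetical.

```python
import math

def uniform_bound_terms(num_hypotheses, gamma, num_samples):
    """Return (coefficient, exponent) such that the lower bound is
    1 - coefficient * exp(exponent), under the Hoeffding + union bound reading."""
    coefficient = 2 * num_hypotheses
    exponent = -2 * gamma ** 2 * num_samples
    return coefficient, exponent

coeff, expo = uniform_bound_terms(100, 0.1, 10 ** 6)
print(coeff, expo)  # coefficient 200, exponent approximately -2 * 10^4

# exp(-20000) underflows to 0.0 in double precision, so the bound prints as 1.0
lower_bound = 1 - coeff * math.exp(expo)
print(lower_bound)
```

The exponent is kept separate from the final value because the bound is numerically indistinguishable from 1 in floating point.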


Q5. The VC dimension of a pair of squares is:
3
4
5
6

Q6. What is V(X4) after one application of the given formula?
1
0.9
0.81
0


Q7. What is V(X1) after one application of the given formula?
-1
-0.9
-0.81
0

Q8. What is V(X1) after V converges?
0.54
-0.9
0.63
0


Q9. The behavior of an agent is called a policy. Formally, a policy is a mapping from states to actions. In our case, we have two actions: left and right. We will denote the action for our policy as A.
Clearly, the optimal policy would be to choose action right in every state. Which of the following can we use to mathematically describe our optimal policy using the learnt V?
For options (c) and (d), T is the transition function defined as: T(state, action) = next state. (More than one option may apply.)

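$A = \begin{cases} \text{Left} & \text{if } V(S_L) > V(S_R) \\ \text{Right} & \text{otherwise} \end{cases}$
$A = \begin{cases} \text{Left} & \text{if } V(S_R) > V(S_L) \\ \text{Right} & \text{otherwise} \end{cases}$
$A = \arg\max_a V(T(S, a))$
$A = \arg\min_a V(T(S, a))$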
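
To make the role of T concrete, here is a minimal sketch of a greedy policy that picks the action whose next state has the highest value. The five-state chain, the value table and the transition function are all hypothetical stand-ins (the original figure is not reproduced here), and it assumes that a higher V marks a better state.

```python
ACTIONS = ["left", "right"]

# Hypothetical learnt values for a 5-state chain; higher is better
VALUES = [0.0, 0.25, 0.5, 0.75, 1.0]

def transition(state, action):
    """Known transition function T(state, action) = next state (hypothetical chain)."""
    if action == "left":
        return max(state - 1, 0)
    return min(state + 1, len(VALUES) - 1)

def greedy_action(state):
    """A = argmax over actions of V(T(S, a))."""
    return max(ACTIONS, key=lambda a: VALUES[transition(state, a)])

policy = [greedy_action(s) for s in range(len(VALUES))]
print(policy)  # ['right', 'right', 'right', 'right', 'right']
```

With only two actions, comparing V of the left neighbour against V of the right neighbour is the same computation written as a case split.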

Q10. In games like Chess or Ludo, the transition function is known to us. But what about Counter Strike, Mortal Kombat or Super Mario? In games where we do not know T, we can only query the game simulator with the current state and action, and it returns the next state. This means we cannot directly take the argmax or argmin of V(T(S,a)). Therefore, learning the value function V is not sufficient to construct a policy. Which of these could we do to overcome this? (More than one may apply.)

Assume there exists a method to do each option. You have to judge whether doing it solves the stated problem.
Directly learn the policy
Learn a different function which stores value for state-action pairs (instead of only state like V does)
Learn T along with V
Run a random agent repeatedly till it wins. Use this as the winning policy
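
As an illustration of the state-action value idea, the sketch below runs tabular Q-learning against a simulator that is only ever queried with (state, action). Everything about the environment is hypothetical: a five-state chain with a terminal goal at the right end, reward +1 on reaching it, and hand-picked values for the step size and discount factor.

```python
import random

ACTIONS = ["left", "right"]
GOAL = 4  # hypothetical terminal state at the right end of the chain

def simulator(state, action):
    """Black-box step: returns (next_state, reward, done). T is never exposed."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Random behaviour policy; Q-learning is off-policy, so this is allowed
            action = rng.choice(ACTIONS)
            next_state, reward, done = simulator(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

q = q_learning()
# The policy is read off Q directly, without consulting any transition function
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # ['right', 'right', 'right', 'right']
```

The policy extraction step uses only the Q table, which is the point of storing values per (state, action) pair instead of per state.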


Session: JULY-DEC 2023

Course Name: Introduction to Machine Learning


Q1. You want to make an RL agent for a game where 2 players compete to win (like Chess and Go). Which among the given would be the best approach for this?
Play against best human players
Iteratively play against the best (fixed) version of itself
Play against a supervised agent trained on demonstrations of best human players
Watch thousands of games being played and learn the patterns in an unsupervised manner

Answer: Iteratively play against the best (fixed) version of itself

Q2. Statement 1: Empirical error is always greater than generalisation error.
Statement 2: Training data and test data have different underlying (true) distributions.
Choose the correct option:

Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.
Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement 1.
Statement 1 is true. Statement 2 is false.
Both statements are false.


Q3. The Chernoff-Hoeffding bound for a classifier h indicates how close the empirical error is to the generalisation error as a function of the number of samples m in the data set.
$P(|\varepsilon(h) - \hat{\varepsilon}(h)| > \gamma) \le 2e^{-2\gamma^2 m}$
You test it and find that increasing the number of samples does not give a more accurate estimate. What could be the problem?

Choice of γ is unsuitable
The mean is not from Bernoulli distribution.
Choice of hypothesis function is wrong
Samples are not i.i.d


Q4. Let $P(A_i) = 2^{-i}$. Calculate the upper bound for $P\left(\bigcup_{i=1}^{5} A_i\right)$ using the union bound (rounded to 3 decimal places).
0.937
0.984
0.969
1


Q5. Statement A: Reinforcement learning is a type of unsupervised learning.
Statement B: Reinforcement learning does not have labels.

Both statements are true. Statement B is the correct explanation for statement A.
Both statements are true. Statement B is NOT the correct explanation for statement A.
Statement A is true. Statement B is false.
Statement A is false. Statement B is true.
Both statements are false.

Answer: Statement A is false. Statement B is true.


Q6. What is a policy in reinforcement learning?
A mapping from states to actions
A mapping from states to rewards
A mapping from actions to rewards
A mapping from actions to next state

Answer: A mapping from states to actions


Q7. Given 100 hypothesis functions, each trained with 10^6 samples, what is the lower bound on the probability that there does not exist a hypothesis function with error greater than 0.1?
1 − 200e^−2⋅10^4
1 − 100e^10^4
1 − 200e^10^2
1 − 200e^−2⋅10^2


Session: JAN-APR 2023

Course Name: Introduction to Machine Learning

Q1. Which of the following measures best analyzes the performance of a classifier?
a. Precision
b. Recall
c. Accuracy
d. Time complexity
e. Depends on the application

Answer: e. Depends on the application

Q2. As discussed in the lecture, most of the classifiers minimize the empirical risk. Which among the following is an exceptional case?
a. Perceptron learning algorithm
b. Artificial Neural Network
c. Support Vector Machines
d. both (a) and (b)
e. None of the above


Q3. What do you expect to happen to the variance component of the generalisation error of your model as the size of the training data set increases?
a. Increase in variance
b. Decrease in variance
c. No change in variance error

Q4. After completing Introduction to Machine Learning on NPTEL, you have landed a job as a Data Scientist at YumEll Solutions Inc. Your first assignment as a trainee is to learn a classifier given some data and present insights on it to your manager, who apparently doesn’t seem to have any knowledge on Machine Learning. Which of the following classification models would you pick to best explain the nature of the data and the underlying distribution to your manager?
a. Linear Models
b. Support Vector Machines
c. Decision Trees
d. Artificial Neural Networks


Q5. What happens when your model complexity (such as interaction terms in linear regression, order of polynomial in SVM, etc.) increases?
a. Model Bias increases
b. Model Bias decreases
c. Variance of the model increases
d. Variance of the model decreases


Q6. Suppose we want an RL agent to learn to play the game of golf. For training purposes, we make use of a golf simulator program. Assume that the original reward distribution gives a reward of +10 when the golf ball is hit into the hole and -1 for all other transitions. To aid the agent’s learning process, we propose to give an additional reward of +3 whenever the ball is within a 1 metre radius of the hole. Is this additional reward a good idea or not? Why?

a. Yes. The additional reward will help speed-up learning.
b. Yes. Getting the ball to within a metre of the hole is like a sub-goal and hence, should be rewarded.
c. No. The additional reward may actually hinder learning.
d. No. It violates the idea that a goal must be outside the agent’s direct control.


Q7. You want to toss a fair coin a number of times and obtain the probability of getting heads by taking a simple average. What is the estimated number of times you’ll have to toss the coin to make sure that your estimated probability is within 10% of the actual probability, at least 90% of the time?
a. 400*ln(20)
b. 800*ln(20)
c. 200*ln(20)
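
The options can be checked by inverting the Hoeffding bound 2e^(−2γ²m) ≤ δ for m. The sketch below assumes "within 10% of the actual probability" means a relative tolerance, so γ = 0.1 × 0.5 = 0.05 for a fair coin, and δ = 0.1 for "at least 90% of the time". Under an absolute reading (γ = 0.1) the same formula gives a different number, so the interpretation is an assumption, not something the question states outright.

```python
import math

def required_samples(gamma, delta):
    """Real-valued m at which 2 * exp(-2 * gamma^2 * m) equals delta."""
    return math.log(2 / delta) / (2 * gamma ** 2)

p_heads = 0.5
gamma = 0.10 * p_heads  # assumption: tolerance is 10% of the true probability
delta = 0.10            # allowed failure probability (1 - 0.90)

m = required_samples(gamma, delta)
print(m, 200 * math.log(20))  # the two values agree up to floating-point error
```

Here ln(2/δ) = ln(20) and 1/(2γ²) = 200, so m = 200·ln(20), roughly 599 tosses.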


Q8. A new phone, E-Corp X1 has been announced and it is what you’ve been waiting for, all along. You decide to read the reviews before buying it. From past experiences, you’ve figured out that good reviews mean that the product is good 90% of the time and bad reviews mean that it is bad 70% of the time. Upon glancing through the reviews section, you find out that the X1 has been reviewed 1269 times and only 127 of them were bad reviews. What is the probability that, if you order the X1, it is a bad phone?
a. 0.1362
b. 0.160
c. 0.840
d. 0.773
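
The calculation here is an application of the law of total probability over the two review types. The sketch below assumes that a randomly chosen review reflects the phone you would receive, so P(bad review) = 127/1269; the function and parameter names are made up for this example.

```python
def prob_bad_product(total_reviews, bad_reviews,
                     p_good_given_good_review, p_bad_given_bad_review):
    """P(bad) = P(bad | good review) * P(good review)
              + P(bad | bad review) * P(bad review)."""
    p_bad_review = bad_reviews / total_reviews
    p_good_review = 1 - p_bad_review
    p_bad_given_good_review = 1 - p_good_given_good_review
    return (p_bad_given_good_review * p_good_review
            + p_bad_given_bad_review * p_bad_review)

p_bad = prob_bad_product(1269, 127, 0.90, 0.70)
print(p_bad)  # approximately 0.160
```

In numbers: 0.1 × (1142/1269) + 0.7 × (127/1269) = 203.1/1269, which is about 0.160.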


Q9. You face a particularly challenging RL problem, where the reward distribution keeps changing with time. In order to gain maximum reward in this scenario, does it make sense to stop exploration or continue exploration?
a. Stop exploration
b. Continue exploration
