# Introduction to Machine Learning | Week 12

**Session: JAN-APR 2024**

**Course name: Introduction to Machine Learning**

**Course Link: Click Here**

**For answers or latest updates join our telegram channel: Click here to join**

#### These are Introduction to Machine Learning Week 12 Assignment 12 Answers

**Q1. Statement 1: Empirical error is always greater than generalisation error.**

**Statement 2: Training data and test data have different underlying (true) distributions.**

**Choose the correct option:**

Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.

Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement 1.

Statement 1 is true. Statement 2 is false.

Both statements are false.

**Answer: Both statements are false.**

**Q2. Let $P(A_i) = 2^{-i}$. Calculate the upper bound for $P\left(\bigcup_{i=1}^{5} A_i\right)$ using the union bound (rounded to 3 decimal places).**

0.937

0.984

0.969

1

**Answer: 0.969**
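
The union bound says that the probability of a union is at most the sum of the individual probabilities. A minimal sketch of the arithmetic follows; the helper name `union_bound` is only an illustrative choice, not something from the course material:

```python
def union_bound(probabilities):
    """Upper bound on P(A_1 or ... or A_n): the sum of P(A_i), capped at 1."""
    return min(1.0, sum(probabilities))

# P(A_i) = 2^(-i) for i = 1..5
probs = [2.0 ** -i for i in range(1, 6)]
bound = union_bound(probs)
print(round(bound, 3))  # 0.969 (the exact sum is 31/32 = 0.96875)
```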

**Q3. Which of the following is/are the shortcomings of TD Learning that Q-learning resolves?**

a. TD learning cannot provide values for (state, action) pairs, limiting the ability to extract an optimal policy directly

b. TD learning requires knowledge of the reward and transition functions, which is not always available

c. TD learning is computationally expensive and slow compared to Q-learning

d. TD learning often suffers from high variance in value estimation, leading to unstable learning

e. TD learning cannot handle environments with continuous state and action spaces effectively

**Answer: a, d**

**Q4. Given 100 hypothesis functions, each trained with $10^6$ samples, what is the lower bound on the probability that there does not exist a hypothesis function with error greater than 0.1?**

a. $1 - 200e^{-2 \cdot 10^4}$

b. $1 - 100e^{10^4}$

c. $1 - 200e^{10^2}$

d. $1 - 200e^{-2 \cdot 10^2}$

**Answer: a. $1 - 200e^{-2 \cdot 10^4}$**
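
This answer can be checked by combining the Chernoff-Hoeffding inequality with the union bound over the $k = 100$ hypotheses, with $\gamma = 0.1$ and $m = 10^6$. The derivation below is a sketch that assumes the standard finite-hypothesis-class form of the bound. It reads "error greater than 0.1" as a gap of more than $\gamma$ between empirical and generalisation error:

```latex
\begin{align*}
P\big(\exists\, h_i : |\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma\big)
  &\le \sum_{i=1}^{k} P\big(|\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma\big) \\
  &\le 2k\, e^{-2\gamma^2 m} \\
  &= 2 \cdot 100 \cdot e^{-2 (0.1)^2 \cdot 10^6}
   = 200\, e^{-2 \cdot 10^4}, \\
P\big(\nexists\, h_i : |\epsilon(h_i) - \hat{\epsilon}(h_i)| > \gamma\big)
  &\ge 1 - 200\, e^{-2 \cdot 10^4}.
\end{align*}
```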

**Q5. The VC dimension of a pair of squares is:**

3

4

5

6

**Answer: 3**

**Q6. What is V(X4) after one application of the given formula?**

1

0.9

0.81

0

**Answer: 0.9**

**Q7. What is V(X1) after one application of the given formula?**

-1

-0.9

-0.81

0

**Answer: 0**

**Q8. What is V(X1) after V converges?**

0.54

-0.9

0.63

0

**Answer: 0**

**Q9. The behavior of an agent is called a policy. Formally, a policy is a mapping from states to actions. In our case, we have two actions: left and right. We will denote the action for our policy as A. Clearly, the optimal policy would be to choose action right in every state. Which of the following can we use to mathematically describe our optimal policy using the learnt V? For options (c) and (d), T is the transition function defined as: T(state, action) = next state. (more than one option may apply)**

a. $A = \begin{cases} \text{Left} & \text{if } V(S_L) > V(S_R) \\ \text{Right} & \text{otherwise} \end{cases}$

b. $A = \begin{cases} \text{Left} & \text{if } V(S_R) > V(S_L) \\ \text{Right} & \text{otherwise} \end{cases}$

c. $A = \arg\max_a \{V(T(S, a))\}$

d. $A = \arg\min_a \{V(T(S, a))\}$

**Answer: c. $A = \arg\max_a \{V(T(S, a))\}$**
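
The argmax rule can be sketched in a few lines of Python. The original question's figure is not reproduced here, so the 5-state chain, the values in `V`, and the transition function `T` below are all hypothetical and serve only to illustrate the rule:

```python
def greedy_action(state, V, T, actions=("left", "right")):
    """Return the action whose successor state has the highest value."""
    return max(actions, key=lambda a: V[T(state, a)])

# Hypothetical chain of states 0..4 with values increasing to the right.
V = {0: 0.0, 1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0}

def T(state, action):
    """Deterministic transition: move one step, staying inside the chain."""
    step = -1 if action == "left" else 1
    return min(max(state + step, 0), 4)

policy = {s: greedy_action(s, V, T) for s in range(4)}
print(policy)  # every state 0..3 maps to "right"
```

Note that the rule needs `T` to look one step ahead, which is why it only works when the transition function is known.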

**Q10. In games like Chess or Ludo, the transition function is known to us. But what about Counter Strike or Mortal Kombat or Super Mario? In games where we do not know T, we can only query the game simulator with the current state and action, and it returns the next state. This means we cannot directly argmax or argmin for V(T(S,a)). Therefore, learning the value function V is not sufficient to construct a policy. Which of these could we do to overcome this? (more than 1 may apply)**

**Assume there exists a method to do each option. You have to judge whether doing it solves the stated problem.**

a. Directly learn the policy

b. Learn a different function which stores value for state-action pairs (instead of only state like V does)

c. Learn T along with V

d. Run a random agent repeatedly till it wins. Use this as the winning policy

**Answer: a, b**
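
Learning a value for each state-action pair is what tabular Q-learning does. The sketch below treats the simulator as a black box that is only queried with a state and an action. The chain environment, the `step` function, and all hyperparameters are hypothetical choices made for illustration:

```python
import random

ACTIONS = ("left", "right")
GOAL = 4  # hypothetical chain of states 0..4; reaching state 4 ends an episode

def step(state, action):
    """Black-box simulator (assumed): returns (next_state, reward, done)."""
    nxt = min(max(state + (1 if action == "right" else -1), 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit: act greedily on Q, breaking ties at random.
                best = max(Q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if Q[(state, a)] == best])
            nxt, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best action in the next state.
            best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = nxt
    return Q

Q = q_learning()
# The policy is read directly off Q, with no call to a transition function.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)  # "right" in every non-terminal state
```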

More Nptel Courses: https://progiez.com/nptel-assignment-answers

**Session: JULY-DEC 2023**

**Course Name: Introduction to Machine Learning**

**Q1. You want to make an RL agent for a game where 2 players compete to win (like Chess and Go). Which among the given would be the best approach for this?**

Play against best human players

Iteratively play against the best (fixed) version of itself

Play against a supervised agent trained on demonstrations of best human players

Watch thousands of games being played and learn the patterns in an unsupervised manner

**Answer: Iteratively play against the best (fixed) version of itself**

**Q2. Statement 1: Empirical error is always greater than generalisation error.**

**Statement 2: Training data and test data have different underlying (true) distributions.**

**Choose the correct option:**

Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statement 1.

Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statement 1.

Statement 1 is true. Statement 2 is false.

Both statements are false.

**Answer: Both statements are false.**

**Q3. The Chernoff-Hoeffding bound for a classifier h indicates how close the empirical error is to the generalized error as a function of the number of samples in the data set:**

$P(|\epsilon(h) - \hat{\epsilon}(h)| > \gamma) \le 2e^{-2\gamma^2 m}$

**You test it and find that increasing the number of samples does not give a more accurate estimate. What could be the problem?**

a. Choice of γ is unsuitable

b. The mean is not from Bernoulli distribution.

c. Choice of hypothesis function is wrong

d. Samples are not i.i.d

**Answer: c, d**
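
A small simulation makes the i.i.d. assumption behind the bound concrete. In the sketch below each sample is an independent Bernoulli mistake with a fixed true error rate. The values 0.3 and 0.05, the trial count, and the function names are illustrative assumptions:

```python
import math
import random

def hoeffding_bound(gamma, m):
    """Right-hand side of the bound: 2 * exp(-2 * gamma^2 * m)."""
    return 2.0 * math.exp(-2.0 * gamma ** 2 * m)

def deviation_frequency(true_error, gamma, m, trials=500, seed=0):
    """Fraction of trials in which |empirical error - true error| > gamma,
    with each sample drawn i.i.d. as a Bernoulli(true_error) mistake."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        mistakes = sum(rng.random() < true_error for _ in range(m))
        if abs(mistakes / m - true_error) > gamma:
            exceed += 1
    return exceed / trials

# Columns: m, observed deviation frequency, Hoeffding bound.
for m in (10, 100, 1000):
    print(m, deviation_frequency(0.3, 0.05, m), round(hoeffding_bound(0.05, m), 4))
```

With i.i.d. samples the bound shrinks exponentially in m. For small m it exceeds 1 and says nothing. If the samples were correlated, the guarantee would no longer apply.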

**Q4. Let $P(A_i) = 2^{-i}$. Calculate the upper bound for $P\left(\bigcup_{i=1}^{5} A_i\right)$ using the union bound (rounded to 3 decimal places).**

0.937

0.984

0.969

1

**Answer: 0.969**

**Q5. Statement A: Reinforcement learning is a type of unsupervised learning.**

**Statement B: Reinforcement learning does not have labels.**

Both statements are true. Statement B is the correct explanation for statement A.

Both statements are true. Statement B is NOT the correct explanation for statement A.

Statement A is true. Statement B is false.

Statement A is false. Statement B is true.

Both statements are false.

**Answer: Statement A is false. Statement B is true.**

**Q6. What is a policy in reinforcement learning?**

A mapping from states to actions

A mapping from states to rewards

A mapping from actions to rewards

A mapping from actions to next state

**Answer: A mapping from states to actions**

**Q7. Given 100 hypothesis functions, each trained with $10^6$ samples, what is the lower bound on the probability that there does not exist a hypothesis function with error greater than 0.1?**

a. $1 - 200e^{-2 \cdot 10^4}$

b. $1 - 100e^{10^4}$

c. $1 - 200e^{10^2}$

d. $1 - 200e^{-2 \cdot 10^2}$

**Answer: a. $1 - 200e^{-2 \cdot 10^4}$**

**Session: JAN-APR 2023**

**Course Name: Introduction to Machine Learning**

**Q1. Which of the following measures best analyzes the performance of a classifier?**

a. Precision

b. Recall

c. Accuracy

d. Time complexity

e. Depends on the application

**Answer: e. Depends on the application**

**Q2. As discussed in the lecture, most of the classifiers minimize the empirical risk. Which among the following is an exceptional case?**

a. Perceptron learning algorithm

b. Artificial Neural Network

c. Support Vector Machines

d. both (a) and (b)

e. None of the above

**Answer: c. Support Vector Machines**

**Q3. What do you expect to happen to the variance component of the generalisation error of your model as the size of the training data set increases?**

a. Increase in variance

b. Decrease in variance

c. No change in variance error

**Answer: b. Decrease in variance**

**Q4. After completing Introduction to Machine Learning on NPTEL, you have landed a job as a Data Scientist at YumEll Solutions Inc. Your first assignment as a trainee is to learn a classifier given some data and present insights on it to your manager, who apparently doesn’t seem to have any knowledge on Machine Learning. Which of the following classification models would you pick to best explain the nature of the data and the underlying distribution to your manager?**

a. Linear Models

b. Support Vector Machines

c. Decision Trees

d. Artificial Neural Networks

**Answer: c. Decision Trees**

**Q5. What happens when your model complexity (such as interaction terms in linear regression, order of polynomial in SVM, etc.) increases?**

a. Model Bias increases

b. Model Bias decreases

c. Variance of the model increases

d. Variance of the model decreases

**Answer: b, c**

**Q6. Suppose we want an RL agent to learn to play the game of golf. For training purposes, we make use of a golf simulator program. Assume that the original reward distribution gives a reward of +10 when the golf ball is hit into the hole and -1 for all other transitions. To aid the agent’s learning process, we propose to give an additional reward of +3 whenever the ball is within a 1 metre radius of the hole. Is this additional reward a good idea or not? Why?**

a. Yes. The additional reward will help speed-up learning.

b. Yes. Getting the ball to within a metre of the hole is like a sub-goal and hence, should be rewarded.

c. No. The additional reward may actually hinder learning.

d. No. It violates the idea that a goal must be outside the agent’s direct control.

**Answer: c. No. The additional reward may actually hinder learning.**

**Q7. You want to toss a fair coin a number of times and obtain the probability of getting heads by taking a simple average. What is the estimated number of times you’ll have to toss the coin to make sure that your estimated probability is within 10% of the actual probability, at least 90% of the time?**

a. 400*ln(20)

b. 800*ln(20)

c. 200*ln(20)

**Answer: c. 200*ln(20)**
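
The figure 200*ln(20) follows from the Chernoff-Hoeffding bound if "within 10% of the actual probability" is read as a relative tolerance. That gives γ = 0.1 × 0.5 = 0.05, with failure probability δ = 0.1. The sketch below solves 2·exp(−2γ²m) ≤ δ for m. The relative reading of "10%" and the helper name `samples_needed` are assumptions:

```python
import math

def samples_needed(gamma, delta):
    """Smallest m (as a real number) with 2 * exp(-2 * gamma^2 * m) <= delta."""
    return math.log(2.0 / delta) / (2.0 * gamma ** 2)

p_heads = 0.5
gamma = 0.10 * p_heads  # tolerance of 10% of the true probability, i.e. 0.05
delta = 0.10            # the estimate may miss at most 10% of the time

m = samples_needed(gamma, delta)
print(m / math.log(20))  # approximately 200, so m = 200 * ln(20)
print(math.ceil(m))      # 600 tosses
```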

**Q8. A new phone, E-Corp X1 has been announced and it is what you’ve been waiting for, all along. You decide to read the reviews before buying it. From past experiences, you’ve figured out that good reviews mean that the product is good 90% of the time and bad reviews mean that it is bad 70% of the time. Upon glancing through the reviews section, you find out that the X1 has been reviewed 1269 times and only 127 of them were bad reviews. What is the probability that, if you order the X1, it is a bad phone?**

a. 0.1362

b. 0.160

c. 0.840

d. 0.773

**Answer: b. 0.160**
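
The value 0.160 comes from the law of total probability, conditioning on whether a review is good or bad. The sketch below assumes that the observed fraction of good and bad reviews is used as the probability of each review type:

```python
total_reviews = 1269
bad_reviews = 127
good_reviews = total_reviews - bad_reviews  # 1142

p_good_review = good_reviews / total_reviews
p_bad_review = bad_reviews / total_reviews

p_bad_given_good_review = 1 - 0.90  # good reviews are right 90% of the time
p_bad_given_bad_review = 0.70       # bad reviews are right 70% of the time

# P(bad phone) = sum over review types of P(bad phone | review) * P(review)
p_bad_phone = (p_bad_given_good_review * p_good_review
               + p_bad_given_bad_review * p_bad_review)
print(round(p_bad_phone, 3))  # 0.16
```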

**Q9. You face a particularly challenging RL problem, where the reward distribution keeps changing with time. In order to gain maximum reward in this scenario, does it make sense to stop exploration or continue exploration?**

a. Stop exploration

b. Continue exploration

**Answer: b. Continue exploration**

More Nptel courses: https://progiez.com/nptel

The content uploaded on this website is for reference purposes only. Please do it yourself first.