INTRODUCTION TO MACHINE LEARNING Week 5
Session: JULY-DEC 2023
Course Name: Introduction to Machine Learning
These are Introduction to Machine Learning Week 5 Assignment 5 Answers
Q1. The perceptron learning algorithm is primarily designed for:
Regression tasks
Unsupervised learning
Clustering tasks
Linearly separable classification tasks
Non-linear classification tasks
Answer: Linearly separable classification tasks
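As a rough illustration of why linear separability matters, the perceptron update rule can be run on two tiny datasets. This is a minimal sketch, not the course's reference implementation: the function name `train_perceptron`, the learning rate, the epoch cap, and the choice of the AND and XOR truth tables as data are all assumptions made for the example.

```python
def train_perceptron(samples, lr=1.0, max_epochs=100):
    """Perceptron learning rule on 2-D inputs.

    Returns (weights, bias, epochs_used); epochs_used is None when no
    error-free pass over the data occurred within max_epochs.
    """
    w = [0.0, 0.0]
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred
            if err != 0:
                # Move the boundary toward the misclassified point.
                w[0] += lr * err * x1
                w[1] += lr * err * x2
                b += lr * err
                mistakes += 1
        if mistakes == 0:
            return w, b, epoch
    return w, b, None


# Illustrative data (assumed): AND is linearly separable, XOR is not.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, b, epochs = train_perceptron(and_data)
_, _, xor_epochs = train_perceptron(xor_data)
print("AND converged after epochs:", epochs)
print("XOR converged after epochs:", xor_epochs)
```

On the separable data the loop reaches an error-free pass, while on XOR it exhausts the epoch cap, since no single linear boundary classifies all four points.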
Q2. The last layer of ANN is linear for ________ and softmax for ________.
Regression, Regression
Classification, Classification
Regression, Classification
Classification, Regression
Answer: Regression, Classification
Q3. Consider the following statement and answer True/False with corresponding reason:
The class outputs of a classification problem with an ANN cannot be treated independently.
True. Due to cross-entropy loss function
True. Due to softmax activation
False. This is the case for regression with single output
False. This is the case for regression with multiple outputs
Answer: True. Due to softmax activation
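The coupling introduced by softmax can be seen in a small sketch: changing a single logit changes every output, because all outputs share the same normalising sum. The logit values below are arbitrary illustrative numbers, not taken from the assignment.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


before = softmax([2.0, 1.0, 0.1])
after = softmax([2.0, 1.0, 3.0])  # only the third logit was changed

print("before:", before)
print("after: ", after)
```

Although only the third logit was raised, the first two probabilities shrink as well, which is why the class outputs cannot be treated independently.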
Q4. Given below is a simple ANN with 2 inputs X1, X2 ∈ {0,1}, edge weights −3, +2, +2, and the activation function
h(x) = 1 if x ≥ 0, 0 otherwise.
Which of the following logical functions does it compute?
XOR
NOR
NAND
AND
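Answer: AND (taking −3 as the bias weight and +2, +2 as the weights on X1 and X2, the weighted sum −3 + 2·X1 + 2·X2 is non-negative only when X1 = X2 = 1)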
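Since the figure is not reproduced here, the truth table of the unit can only be enumerated under an assumption about which edge carries which weight. The sketch below assumes that −3 is the bias weight and that +2 and +2 are the weights on X1 and X2; if the figure assigns the weights differently, the table changes accordingly.

```python
from itertools import product


def threshold(z):
    """Activation from the question: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


# Assumed assignment of the three edge weights (the figure is not available).
BIAS, W1, W2 = -3, 2, 2

truth_table = {
    (x1, x2): threshold(BIAS + W1 * x1 + W2 * x2)
    for x1, x2 in product((0, 1), repeat=2)
}

for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Under this assumed assignment the unit outputs 1 only for the input (1, 1).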
Q5. Using the notations used in class, evaluate the value of the neural network with a 3-3-1 architecture (2-dimensional input with 1 node for the bias term in both the layers). The parameters are as follows
α = [ 1    1
      0.4  0.6
      0.3  0.5 ]
β = [ 0.4  0.6  0.9 ]
Using sigmoid function as the activation functions at both the layers, the output of the network for an input of (0.8, 0.7) will be (up to 4 decimal places)
0.7275
0.0217
0.2958
0.8213
0.7291
0.8414
0.1760
0.7552
0.9442
None of these
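Answer: 0.8414 (reading α as a 3×2 matrix with one row per input node, bias first, the hidden activations are σ(1.53) ≈ 0.8220 and σ(1.83) ≈ 0.8618, giving σ(0.4 + 0.6·0.8220 + 0.9·0.8618) = σ(1.6688) ≈ 0.8414)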
Q6. If the step size in gradient descent is too large, what can happen?
Overfitting
The model will not converge
We can reach maxima instead of minima
None of the above
Answer: The model will not converge
Q7. On different initializations of your neural network, you get significantly different values of loss. What could be the reason for this?
Overfitting
Some problem in the architecture
Incorrect activation function
Multiple local minima
Answer: Multiple local minima
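The effect of multiple local minima can be sketched on a one-dimensional non-convex function. The function x^4 − 3x^2 + x, the learning rate, and the two starting points are assumptions chosen only for illustration; they stand in for a neural network loss surface.

```python
def loss(x):
    """Illustrative non-convex loss with two distinct local minima."""
    return x**4 - 3 * x**2 + x


def grad(x):
    """Derivative of the illustrative loss."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Two different initialisations (assumed values).
x_left = gradient_descent(-2.0)
x_right = gradient_descent(2.0)

print("start -2.0 -> x =", x_left, "loss =", loss(x_left))
print("start +2.0 -> x =", x_right, "loss =", loss(x_right))
```

Both runs converge, but to different stationary points with clearly different loss values, which mirrors what happens when a network is trained from different random initialisations.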
Q8. The likelihood L(θ|X) is given by:
P(θ|X)
P(X|θ)
P(X).P(θ)
P(θ)/P(X)
Answer: P(X|θ)
Q9. Why is proper initialization of neural network weights important?
To ensure faster convergence during training
To prevent overfitting
To increase the model’s capacity
Initialization doesn’t significantly affect network performance
To minimize the number of layers in the network
Answer: To ensure faster convergence during training
Q10. Which of these are limitations of the backpropagation algorithm?
It requires error function to be differentiable
It requires activation function to be differentiable
The ith layer cannot be updated before the update of layer i+1 is complete
All of the above
(a) and (b) only
None of these
Answer: All of the above
Session: JAN-APR 2023
Course Name: Introduction to Machine Learning
Q1. You are given the N samples of input (x) and output (y) as shown in the figure below. What will be the most appropriate model y=f(x)?

a. y = w·x with w > 0
b. y = w·x with w < 0
c. y = x^w with w > 0
d. y = x^w with w < 0
Answer: c. y = x^w with w > 0
Q2. For training a binary classification model with five independent variables, you choose to use neural networks. You apply one hidden layer with three neurons. What is the number of parameters to be estimated? (Consider the bias term as a parameter)
a. 16
b. 21
c. 3^4 = 81
d. 4^3 = 64
e. 12
f. 22
g. 25
h. 26
i. 4
j. None of these
Answer: f. 22
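The count can be checked with a short sketch. It assumes a fully connected 5-3-1 network with a single output unit for the binary label and one bias per non-input unit; the helper name `count_parameters` is hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus one bias per unit for a fully connected network."""
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# 5 inputs, 3 hidden units, 1 output unit (assumed for binary classification).
n_params = count_parameters([5, 3, 1])
print(n_params)  # (5 + 1) * 3 + (3 + 1) * 1
```

The hidden layer contributes (5 + 1) × 3 = 18 parameters and the output layer (3 + 1) × 1 = 4.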
Q3. Suppose the marks obtained by randomly sampled students follow a normal distribution with unknown μ. A random sample of 5 marks is 25, 55, 64, 7 and 99. Using the given samples, find the maximum likelihood estimate for the mean.
a. 54.2
b. 67.75
c. 50
d. Information not sufficient for estimation
Answer: c. 50
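For a normal distribution, the maximum likelihood estimate of the mean is the sample mean, because maximising the likelihood in μ amounts to minimising the sum of squared deviations. A minimal sketch of the check follows; the helper `sum_squared_deviation` is a hypothetical name used only here.

```python
marks = [25, 55, 64, 7, 99]

# MLE of the mean of a normal distribution: the sample mean.
mle_mean = sum(marks) / len(marks)


def sum_squared_deviation(mu):
    """Quantity minimised by the MLE of mu (variance held fixed)."""
    return sum((m - mu) ** 2 for m in marks)


print(mle_mean)
print(sum_squared_deviation(mle_mean) < sum_squared_deviation(mle_mean + 1))
```

Any other candidate value of μ gives a larger sum of squared deviations and therefore a lower likelihood.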
Q4. You are given the following neural network, which takes two binary valued inputs x1, x2 ∈ {0,1}, and the activation function is the threshold function (h(x) = 1 if x > 0; 0 otherwise). Which of the following logical functions does it compute?

a. OR
b. AND
c. NAND
d. None of the above.
Answer: a. OR
Q5. Using the notations used in class, evaluate the value of the neural network with a 3-3-1 architecture (2-dimensional input with 1 node for the bias term in both the layers). The parameters are as follows
α = [ 1    −1
      0.2  0.8
      0.4  0.5 ]
β = [ 0.8  0.4  0.5 ]
Using sigmoid function as the activation functions at both the layers, the output of the network for an input of (0.8, 0.7) will be
a. 0.6710
b. 0.9617
c. 0.6948
d. 0.7052
e. 0.2023
f. 0.7977
g. 0.2446
h. None of these
Answer: f. 0.7977
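The forward pass can be sketched as below. The layout of α is an assumption: it is read as a 3×2 matrix whose rows correspond to the bias node, x1, and x2, and whose columns correspond to the two hidden units, with the first entry of β acting as the output bias. Under that reading the computation reproduces the listed answer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, alpha, beta):
    """Forward pass of a 3-3-1 network with a bias node in both layers."""
    inputs = [1.0] + list(x)  # bias node first
    hidden = [
        sigmoid(sum(inputs[i] * alpha[i][j] for i in range(len(inputs))))
        for j in range(len(alpha[0]))
    ]
    hidden_with_bias = [1.0] + hidden  # bias node for the output layer
    return sigmoid(sum(h * b for h, b in zip(hidden_with_bias, beta)))


# Assumed layout: rows = (bias, x1, x2), columns = hidden units.
alpha = [[1.0, -1.0],
         [0.2, 0.8],
         [0.4, 0.5]]
beta = [0.8, 0.4, 0.5]

output = forward((0.8, 0.7), alpha, beta)
print(round(output, 4))
```

The hidden pre-activations are 1.44 and −0.01, and the output pre-activation is about 1.3721.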
Q6. Which of the following statements are true:
a. The chance of overfitting decreases with an increasing number of hidden nodes and an increasing number of hidden layers.
b. A neural network with one hidden layer can represent any Boolean function given sufficient number of hidden units and appropriate activation functions.
c. Two hidden layer neural networks can represent any continuous functions (within a tolerance) as long as the number of hidden units is sufficient and appropriate activation functions used.
Answer: b, c
Q7. We have a function which takes a two-dimensional input x = (x1, x2) and has two parameters w = (w1, w2), given by f(x, w) = σ(σ(x1·w1)·w2 + x2), where σ(x) = 1/(1 + e^(−x)). We use backpropagation to estimate the right parameter values. We start by setting both the parameters to 1. Assume that we are given a training point x2 = 1, x1 = 0, y = 5. Given this information, answer the next two questions. What is the value of ∂f/∂w2?
a. 0.150
b. -0.25
c. 0.125
d. 0.098
e. 0.0746
f. 0.1604
g. None of these
Answer: e. 0.0746
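The derivative follows from the chain rule: with h = σ(x1·w1), we have ∂f/∂w2 = f·(1 − f)·h. A minimal sketch of the computation, using the values given in the question (the variable names are chosen for this example):

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


x1, x2 = 0.0, 1.0
w1, w2 = 1.0, 1.0

h = sigmoid(x1 * w1)        # inner sigmoid, equals 0.5 here
f = sigmoid(h * w2 + x2)    # network output, sigmoid(1.5)
df_dw2 = f * (1.0 - f) * h  # chain rule: sigmoid'(z) = f * (1 - f)

print(round(df_dw2, 4))
```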
Q8. If the learning rate is 0.5, what will be the value of w2 after one update using backpropagation algorithm?
a. 0.4197
b. -0.4197
c. 0.6881
d. -0.6881
e. 1.3119
f. -1.3119
g. 0.5625
h. -0.5625
i. None of these
Answer: e. 1.3119
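The question does not state the loss function. The sketch below assumes the squared error (y − f)^2 without a factor of 1/2, which is the choice that reproduces the listed answer, and it estimates the gradient with a central finite difference rather than an analytic derivative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def f(w1, w2, x1=0.0, x2=1.0):
    """Function from the previous question, evaluated at the training point."""
    return sigmoid(sigmoid(x1 * w1) * w2 + x2)


def loss(w1, w2, y=5.0):
    """Assumed loss: squared error without the 1/2 factor."""
    return (y - f(w1, w2)) ** 2


eps = 1e-6
learning_rate = 0.5

# Central finite difference of the loss with respect to w2 at w1 = w2 = 1.
dloss_dw2 = (loss(1.0, 1.0 + eps) - loss(1.0, 1.0 - eps)) / (2 * eps)
w2_new = 1.0 - learning_rate * dloss_dw2

print(round(w2_new, 4))
```

Because the target y = 5 lies above the sigmoid's range, the gradient of the loss is negative and the update increases w2.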
Q9. Which of the following are true when comparing ANNs and SVMs?
a. ANN error surface has multiple local minima while SVM error surface has only one minimum
b. After training, an ANN might land on a different minimum each time, when initialized with random weights during each run.
c. As shown for Perceptron, there are some classes of functions that cannot be learnt by an ANN. An SVM can learn a hyperplane for any kind of distribution.
d. In training, ANN’s error surface is navigated using a gradient descent technique while SVM’s error surface is navigated using convex optimization solvers.
Answer: a, b, d
Q10. Which of the following are correct?
a. A perceptron will learn the underlying linearly separable boundary in a finite number of training steps.
b. XOR function can be modelled by a single perceptron.
c. Backpropagation algorithm used while estimating parameters of neural networks actually uses gradient descent algorithm.
d. The backpropagation algorithm will always converge to global optimum, which is one of the reasons for impressive performance of neural networks.
Answer: a, c
More Nptel courses: https://progiez.com/nptel
Session: JUL-DEC 2022
Course Name: INTRODUCTION TO MACHINE LEARNING
Q1. If the step size in gradient descent is too large, what can happen?
a. Overfitting
b. The model will not converge
c. We can reach maxima instead of minima
d. None of the above
Answer: b. The model will not converge
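A small sketch of the effect, using gradient descent on f(x) = x^2 as a stand-in objective. The objective, the two step sizes, and the function name `run_gd` are assumptions made for illustration.

```python
def run_gd(lr, x0=1.0, steps=50):
    """Gradient descent on f(x) = x^2, whose gradient is 2x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x


small_step = run_gd(0.1)  # each step multiplies x by 0.8
large_step = run_gd(1.1)  # each step multiplies x by -1.2

print("lr = 0.1:", small_step)
print("lr = 1.1:", large_step)
```

With the small step the iterate shrinks toward the minimum at 0. With the large step every update overshoots the minimum by more than the previous distance, so the iterate oscillates with growing magnitude and never converges.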
Q2. Recall the XOR (tabulated below) example from class where we did a transformation of features to make it linearly separable. Which of the following transformations can also work?

a. X′1 = X1^2, X′2 = X2^2
b. X′1 = 1 + X1, X′2 = 1 − X2
c. X′1 = X1·X2, X′2 = −X1·X2
d. X′1 = (X1 − X2)^2, X′2 = (X1 + X2)^2
Answer: d. X′1 = (X1 − X2)^2, X′2 = (X1 + X2)^2
Q3. What is the effect of using activation function f(x) = x for hidden layers in an ANN?
a. No effect. It’s as good as any other activation function (sigmoid, tanh etc).
b. The ANN is equivalent to doing multi-output linear regression.
c. Backpropagation will not work.
d. We can model highly complex non-linear functions.
Answer: b. The ANN is equivalent to doing multi-output linear regression.
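The collapse to a linear model can be checked numerically: with identity activations, applying two weight matrices in sequence is the same as applying their product once. The matrices and the input below are arbitrary illustrative values, and biases are omitted for brevity (with biases the composition is still a single affine map).

```python
def matvec(m, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Matrix-matrix product for nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Arbitrary illustrative weights: 2 inputs -> 3 hidden units -> 2 outputs.
W1 = [[1.0, -2.0], [0.5, 3.0], [2.0, 0.0]]
W2 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.5]]
x = [0.3, -0.7]

two_layer = matvec(W2, matvec(W1, x))   # hidden layer with f(x) = x
collapsed = matvec(matmul(W2, W1), x)   # single linear map W2 * W1

print(two_layer)
print(collapsed)
```

Both computations give the same outputs, so the hidden layer adds no representational power beyond multi-output linear regression.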
Q4. Which of the following functions can be used on the last layer of an ANN for classification?
a. Softmax
b. Sigmoid
c. Tanh
d. Linear
Answer: a, b
Q5. Statement: Threshold function cannot be used as activation function for hidden layers.
Reason: Threshold functions do not introduce non-linearity.
a. Statement is true and reason is false.
b. Statement is false and reason is true.
c. Both are true and the reason explains the statement.
d. Both are true and the reason does not explain the statement.
Answer: a. Statement is true and reason is false.
Q6. We use several techniques to ensure the weights of the neural network are small (such as random initialization around 0 or regularisation). What conclusions can we draw if weights of our ANN are high?
a. Model has overfitted.
b. It was initialized incorrectly.
c. At least one of (a) or (b).
d. None of the above.
Answer: c. At least one of (a) or (b).
Q7. On different initializations of your neural network, you get significantly different values of loss. What could be the reason for this?
a. Overfitting
b. Some problem in the architecture
c. Incorrect activation function
d. Multiple local minima
Answer: d. Multiple local minima
Q8. The likelihood L(θ|X) is given by:
a. P(θ|X)
b. P(X|θ)
c. P(X).P(θ)
d. P(θ)/P(X)
Answer: b. P(X|θ)
Q9. You are trying to estimate the probability of it raining today using maximum likelihood estimation. Given that in n days, it rained n_r times, what is the probability of it raining today?
a. n_r / n
b. n_r / (n_r + n)
c. n / (n_r + n)
d. None of the above.
Answer: a. n_r / n
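The estimate can be checked by maximising the Bernoulli log-likelihood over a grid of candidate probabilities. The counts used below (30 days, 12 rainy) are hypothetical values chosen for the example, and the grid search is only a sketch standing in for the analytic derivation.

```python
import math


def log_likelihood(p, n, n_r):
    """Log-likelihood of rain probability p given n_r rainy days out of n."""
    return n_r * math.log(p) + (n - n_r) * math.log(1.0 - p)


n, n_r = 30, 12  # hypothetical counts

# Candidate probabilities 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, n, n_r))

print(p_hat, n_r / n)
```

The grid maximiser coincides with the ratio of rainy days to total days.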
Q10. Choose the correct statement (multiple may be correct):
a. MLE is a special case of MAP when prior is a uniform distribution.
b. MLE acts as regularisation for MAP.
c. MLE is a special case of MAP when prior is a beta distribution.
d. MAP acts as regularisation for MLE.
Answer: a, d
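Both statements can be illustrated with a Bernoulli likelihood and a Beta(a, b) prior, a setting assumed here for the example rather than taken from the question. The posterior is Beta(n_r + a, n − n_r + b), and its mode gives the MAP estimate; Beta(1, 1) is the uniform prior. The counts and prior parameters are hypothetical.

```python
def mle(n_r, n):
    """Maximum likelihood estimate of a Bernoulli parameter."""
    return n_r / n


def map_beta_prior(n_r, n, a, b):
    """MAP estimate under a Beta(a, b) prior.

    This is the mode of the Beta(n_r + a, n - n_r + b) posterior,
    valid when both posterior parameters exceed 1.
    """
    return (n_r + a - 1) / (n + a + b - 2)


n, n_r = 10, 9  # hypothetical counts

uniform_map = map_beta_prior(n_r, n, 1, 1)      # uniform prior
regularised_map = map_beta_prior(n_r, n, 5, 5)  # prior centred on 0.5

print(mle(n_r, n), uniform_map, regularised_map)
```

With the uniform prior the MAP estimate equals the MLE, while an informative prior pulls the estimate toward the prior's centre, which is the sense in which MAP regularises MLE.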
More NPTEL Solutions: https://progiez.com/nptel

This content is uploaded for study, general information, and reference purpose only.