# Deep Learning | Week 9

Course Name: Deep Learning

#### Q1. Suppose Elina has trained a neural network 3 times (Experiment A, B and C) with someunknown optimizer. Each time she has kept all other hyper-parameters same, but changed onlyone hyper-parameter. From the three given loss curves, can you identify what is that hyper-parameter?

a. Batch size
b. Learning rate
c. Number of hidden layers
d. Loss function

Q2. This question has Statement 1 and Statement 2. Of the four choices given after the statements, choose the one that best describes the two statements.
Statement 1: Mini-batch gradient descent will always overshoot the optimum point even with a lower learning rate value.
Statement 2: Mini-batch gradient might oscillate in its path towards convergence and oscillation can reduced by momentum optimizer.

a. Statement 1 is True and Statement 2 is False
b. Statement 1 is False and Statement 2 is True
c. Statement 1 is True and Statement 2 is True
d. Statement 1 is False and Statement 2 is False

Answer: b. Statement 1 is False and Statement 2 is True

These are NPTEL Deep Learning Week 9 Assignment 9 Answers

Q3. This question has Statement 1 and Statement 2. Of the four choices given after the statements, choose the one that best describes the two statements.
Statement 1: Apart from the learning rate, Momentum optimizer has two hyper parameters whereas Adam has just one hyper parameter in its weight update equation
Statement 2: Adam optimizer and stochastic gradient descent have the same weight update rule

a. Statement 1 is True and Statement 2 is False
b. Statement 1 is False and Statement 2 is True
c. Statement 1 is True and Statement 2 is True
d. Statement 1 is False and Statement 2 is False

Answer: d. Statement 1 is False and Statement 2 is False

Q4. Which of the following options is true?
b. In Stochastic Gradient Descent, a small batch of sample is selected randomly instead of the whole data set for each iteration. Too large update of weight values leading to faster convergence
c. In big data applications Stochastic Gradient Descent increases the computational burden
d. Stochastic Gradient Descent is a non-iterative process

or
c. In big data applications Stochastic Gradient Descent increases the computational burden

These are NPTEL Deep Learning Week 9 Assignment 9 Answers

Q5. Which of the following is a possible edge of momentum optimizer over of mini-batch gradient descent?
a. Mini-batch gradient descent performs better than momentum optimizer when the surface of the loss function has a much more elongated curvature along X-axis than along Y-axis
b. Mini-batch gradient descent always performs better than momentum optimizer
c. Mini-batch gradient descent will always overshoot the optimum point even with a lower learning rate value
d. Mini-batch gradient might oscillate in its path towards convergence which can reduced by momentum optimizer

Answer: d. Mini-batch gradient might oscillate in its path towards convergence which can reduced by momentum optimizer

These are NPTEL Deep Learning Week 9 Assignment 9 Answers

Q6. Which of the following is true?
a. Adam is a replacement optimization algorithm for stochastic gradient descent for training deep learning models in local minima
b. Apart from the learning rate, Momentum optimizer has two hyper parameters whereas Adam has just one hyper parameter in its weight update equation
c. Adam optimizer and stochastic gradient descent have the same weight update rule
d. None of the above

Answer: a. Adam is a replacement optimization algorithm for stochastic gradient descent for training deep learning models in local minima

These are NPTEL Deep Learning Week 9 Assignment 9 Answers

Q7. Which of the following is the correct property of RMSProp optimizer?
a. RMSProp divides the learning rate by an exponentially decaying average of squared gradients
b. RMSProp has a constant learning rate
c. RMSProp divides the learning rate by an exponentially increasing average of squared gradients
d. RMSProp decays the learning rate by a constant value

Answer: a. RMSProp divides the learning rate by an exponentially decaying average of squared gradients

These are NPTEL Deep Learning Week 9 Assignment 9 Answers

Q8. Why it is at all required to choose different learning rates for different weights?
a. To avoid the problem of diminishing learning rate
b. To avoid overshooting the optimum point
c. To reduce vertical oscillations while navigating the optimum point
d. This would aid to reach the optimum point faster

Answer: d. This would aid to reach the optimum point faster

These are NPTEL Deep Learning Week 9 Assignment 9 Answers

Q9. This question has Statement 1 and Statement 2. Of the four choices given after the statements, choose the one that best describes the two statements.
Statement 1: The stochastic gradient computes the gradient using a single sample
Statement 2: It converges much faster than the batch gradient

a. Statement 1 is True and Statement 2 is False
b. Statement 1 is False and Statement 2 is True
c. Statement 1 is True and Statement 2 is True
d. Statement 1 is False and Statement 2 is False

Answer: c. Statement 1 is True and Statement 2 is True

These are NPTEL Deep Learning Week 9 Assignment 9 Answers

Q10. What is the main purpose of auxiliary classifier in GoogleNet?
a. To increase the number of parameters
b. To avoid vanishing gradient problem
c. To increase the inference speed
d. None of the above