Deep Learning IIT Ropar Week 4 Nptel Answers

Are you looking for the Deep Learning IIT Ropar Week 4 NPTEL Assignment Answers? You’ve come to the right place!

Deep Learning IIT Ropar Week 4 Nptel Assignment Answers (JAN – APR 2025)

Q1. Using the Adam optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$, what would be the bias-corrected first moment estimate after the first update if the initial gradient is 4?

a) 0.4
b) 4.0
c) 3.6
d) 0.44
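
The arithmetic behind this question can be checked with a minimal sketch. It assumes the standard Adam definitions (first moment initialised to 0, correction factor 1 − β1^t); the function name `adam_first_moment` is hypothetical and only for illustration.

```python
def adam_first_moment(grad, m_prev=0.0, beta1=0.9, t=1):
    """Return (m_t, m_hat_t) for one Adam step on a scalar gradient.

    Assumes the standard Adam recurrences; m_prev = 0 at the first step.
    """
    m_t = beta1 * m_prev + (1.0 - beta1) * grad  # biased first moment
    m_hat = m_t / (1.0 - beta1 ** t)             # bias-corrected estimate
    return m_t, m_hat


m1, m1_hat = adam_first_moment(grad=4.0)
print(m1, m1_hat)  # roughly 0.4 (biased) and 4.0 (bias-corrected)
```

Under these assumptions the raw moment is about 0.4, and dividing by 1 − 0.9 restores it to about 4.0, the value of the gradient itself.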

Q2. In a mini-batch gradient descent algorithm, if the total number of training samples is 50,000 and the batch size is 100, how many iterations are required to complete 10 epochs?

a) 5,000
b) 50,000
c) 500
d) 5
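
As a quick check of the counting, here is a small sketch that assumes one weight update per mini-batch and a final partial batch that still counts as an update. The helper name `total_iterations` is hypothetical.

```python
import math


def total_iterations(num_samples, batch_size, epochs):
    """Number of weight updates: one per mini-batch, per epoch."""
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * epochs


print(total_iterations(50_000, 100, 10))  # 500 batches per epoch * 10 epochs
print(total_iterations(1_000, 1, 5))      # batch size 1 is plain SGD
```

With a batch size of 1 the same function covers stochastic gradient descent, where every sample triggers an update.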

Q3. In a stochastic gradient descent algorithm, the learning rate starts at 0.1 and decays exponentially with a decay rate of 0.1 per epoch. What will be the learning rate after 5 epochs?
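
The question does not state the decay formula. The sketch below assumes the common form lr_t = lr_0 · exp(−k · t), with k the decay rate and t the epoch; other conventions (for example multiplying by a fixed factor each epoch) give different numbers. The function name `exp_decay_lr` is hypothetical.

```python
import math


def exp_decay_lr(lr0, decay_rate, epoch):
    """Exponentially decayed learning rate, assuming lr0 * exp(-k * t)."""
    return lr0 * math.exp(-decay_rate * epoch)


lr5 = exp_decay_lr(lr0=0.1, decay_rate=0.1, epoch=5)
print(round(lr5, 4))  # about 0.0607 under this assumed formula
```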

Q4. In the context of the Adam optimizer, what is the purpose of bias correction?

a) To prevent overfitting
b) To speed up convergence
c) To correct for the bias in the estimates of first and second moments
d) To adjust the learning rate

Q5. The figure below shows the contours of a surface. Suppose that a man walks from -1 to +1 on both the horizontal (x) axis and the vertical (y) axis. The statement that the man would have seen the slope change more rapidly along the x-axis than along the y-axis is,

a) True
b) False
c) Cannot say

Q6. What is the primary benefit of using Adagrad compared to other optimization algorithms?

Deep Learning IIT Ropar Week 4 Nptel Assignment Answers (JULY – DEC 2024)

Q1. A team has a dataset that contains 1000 samples for training a feed-forward neural network. Suppose they decide to use the stochastic gradient descent algorithm to update the weights. How many times do the weights get updated after training the network for 5 epochs?

a) 1000
b) 5000
c) 100
d) 5

Answer: b) 5000

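Q2. What is the primary benefit of using Adagrad compared to other optimization algorithms?
It converges faster than other optimization algorithms.
It is more memory-efficient than other optimization algorithms.
It is less sensitive to the choice of hyperparameters (learning rate).
It is less likely to get stuck in local optima than other optimization algorithms.

Answer: It is less sensitive to the choice of hyperparameters (learning rate). Adagrad scales each parameter's step by its accumulated squared gradients, so the outcome depends less on the initial learning rate. It is not more memory-efficient, since it stores one accumulator per parameter.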

For answers or latest updates join our telegram channel: Click here to join

These are Deep Learning IIT Ropar Week 4 Nptel Assignment Answers

Q3. What are the benefits of using stochastic gradient descent compared to vanilla gradient descent?
SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
SGD theoretically guarantees that the descent direction is optimal.
SGD experiences less oscillation compared to vanilla gradient descent.

Answer:

Q4. Select the behaviour of the gradient descent algorithm that uses the following update rule,
$w_{t+1} = w_t - \eta \nabla w_t$
where $w$ is a weight and $\eta$ is the learning rate.
The weight update is tiny at a steep loss surface
The weight update is tiny at a gentle loss surface
The weight update is large at a steep loss surface
The weight update is large at a gentle loss surface

Answer: The weight update is large at a steep loss surface
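
The update rule makes the step size proportional to the gradient, which a small sketch can illustrate. The loss f(w) = w^2 and the helper names `grad_f` and `update_size` are assumptions chosen for illustration, not part of the question.

```python
def grad_f(w):
    """Gradient of the illustrative loss f(w) = w^2."""
    return 2.0 * w


def update_size(w, lr=0.1):
    """Magnitude of the vanilla gradient descent step lr * grad at w."""
    return abs(lr * grad_f(w))


steep = update_size(10.0)   # far from the minimum, gradient is 20
gentle = update_size(0.1)   # near the minimum, gradient is 0.2
print(steep, gentle)        # the steep region gets the much larger step
```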

Q5. Given data where one column predominantly contains zero values, which algorithm should be used to achieve faster convergence and optimize the loss function?
Adam
NAG
Momentum-based gradient descent
Stochastic gradient descent

Answer: Adam

Q6. In Nesterov accelerated gradient descent, what step is performed before determining the update size?
Increase the momentum
Adjust the learning rate
Decrease the step size
Estimate the next position of the parameters

Answer:
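
For reference, one common formulation of Nesterov accelerated gradient first moves to a look-ahead point using the previous update, then evaluates the gradient there. The sketch below assumes that formulation and an illustrative loss f(w) = w^2; the names `grad` and `nag_step` are hypothetical.

```python
def grad(w):
    """Gradient of the illustrative loss f(w) = w^2."""
    return 2.0 * w


def nag_step(w, velocity, lr=0.1, momentum=0.9):
    """One NAG update: look ahead first, then compute the step."""
    lookahead = w - momentum * velocity           # estimated next position
    g = grad(lookahead)                           # gradient at that estimate
    velocity = momentum * velocity + lr * g       # new update size
    return w - velocity, velocity


w, v = 1.0, 0.0
for _ in range(2):
    w, v = nag_step(w, v)
    print(w, v)
```

The only difference from plain momentum in this sketch is where the gradient is evaluated: at `lookahead` rather than at `w`.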

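Q7. We have the following functions: x^3, ln(x), e^x, x, and 4. Which of the following functions has the steepest slope at x = 1?
x^3
ln(x)
e^x
4

Answer: x^3 (the slopes at x = 1 are 3 for x^3, 1 for ln(x), e ≈ 2.718 for e^x, 1 for x, and 0 for the constant 4)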
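
The slopes can be compared directly by evaluating each derivative at x = 1. This is a minimal sketch using the analytic derivatives; the dictionary name `derivatives` is only for illustration.

```python
import math

# Analytic derivatives of the five functions named in the question.
derivatives = {
    "x^3": lambda x: 3 * x ** 2,
    "ln(x)": lambda x: 1 / x,
    "e^x": lambda x: math.exp(x),
    "x": lambda x: 1.0,
    "4": lambda x: 0.0,
}

slopes = {name: d(1.0) for name, d in derivatives.items()}
steepest = max(slopes, key=lambda name: abs(slopes[name]))

for name, slope in slopes.items():
    print(f"{name}: {slope:.3f}")
print("steepest:", steepest)
```

At x = 1 the derivative of x^3 is 3, which exceeds e ≈ 2.718 for e^x and 1 for both ln(x) and x.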

Q8. Which of the following represents the contour plot of the function $f(x, y) = x^2 - y^2$?

Answer: C option

Q9. Which of the following algorithms will result in more oscillations of the parameter during the training process of the neural network?
Stochastic gradient descent
Mini batch gradient descent
Batch gradient descent
Batch NAG

Answer:

Q10. Consider a gradient profile $\nabla W = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]$. Assume $v_{-1} = 0$, $\epsilon = 0$, $\beta = 0.9$, and the learning rate is $\eta_{-1} = 0.1$. Suppose that we use the Adagrad algorithm. What is the value of $\eta_6 = \eta / \sqrt{v_t + \epsilon}$?
0.03
0.06
0.08
0.006

Answer:
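
The effective learning rate can be worked out with a short sketch. It assumes the usual Adagrad accumulator v_t = v_{t-1} + g_t^2 (so β is not used) and zero-based indexing of the gradient profile; the question does not state the indexing convention, so both neighbouring steps are printed. The function name `adagrad_effective_lrs` is hypothetical.

```python
import math

grads = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]


def adagrad_effective_lrs(grads, lr=0.1, eps=0.0, v_init=0.0):
    """Return lr / sqrt(v_t + eps) after each gradient, with v_t the running sum of squares."""
    v = v_init
    lrs = []
    for g in grads:
        v += g * g                       # Adagrad accumulates squared gradients
        lrs.append(lr / math.sqrt(v + eps))
    return lrs


lrs = adagrad_effective_lrs(grads)
print(round(lrs[6], 4))  # t = 6 with zero-based indexing, about 0.0636
print(round(lrs[5], 4))  # sixth gradient with one-based counting, about 0.0671
```

Under either indexing assumption the result is closest to the 0.06 option.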
