Deep Learning IIT Ropar Week 4 Nptel Answers

Are you looking for the Deep Learning IIT Ropar Week 4 NPTEL Assignment Answers? You’ve come to the right place!



Deep Learning IIT Ropar Week 4 Nptel Assignment Answers (Jan-Apr 2025)


Q1. Using the Adam optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$, what would be the bias-corrected first moment estimate after the first update if the initial gradient is 4?

a) 0.4
b) 4.0
c) 3.6
d) 0.44
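The bias-corrected first moment can be worked out by hand, and the sketch below does the same arithmetic in code. It assumes the usual Adam initialisation $m_0 = 0$ and step count $t = 1$; the function name `adam_first_moment` is made up for this illustration.

```python
def adam_first_moment(grad, m_prev=0.0, beta1=0.9, t=1):
    """Return (m_t, m_hat_t) for one Adam step on a scalar gradient."""
    # First moment: exponential moving average of the gradient.
    m_t = beta1 * m_prev + (1.0 - beta1) * grad
    # Bias correction undoes the shrinkage caused by starting from m_0 = 0.
    m_hat = m_t / (1.0 - beta1 ** t)
    return m_t, m_hat


m1, m1_hat = adam_first_moment(grad=4.0)
print(m1, m1_hat)  # roughly 0.4 and 4.0, up to floating-point rounding
```

Under these assumptions the raw estimate is about 0.4, and dividing by $1 - \beta_1^1 = 0.1$ brings it back to about 4.0, the value of the gradient itself.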


Q2. In a mini-batch gradient descent algorithm, if the total number of training samples is 50,000 and the batch size is 100, how many iterations are required to complete 10 epochs?

a) 5,000
b) 50,000
c) 500
d) 5
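The iteration count follows from one parameter update per mini-batch. A minimal sketch of that arithmetic is shown here; `total_updates` is a hypothetical helper, and counting a partial final batch as one update is an assumption.

```python
import math


def total_updates(num_samples, batch_size, epochs):
    # One parameter update per mini-batch; a partial final batch still counts.
    updates_per_epoch = math.ceil(num_samples / batch_size)
    return updates_per_epoch * epochs


print(total_updates(50_000, 100, 10))  # 500 updates per epoch * 10 epochs = 5000
```

The same helper covers pure stochastic gradient descent by setting `batch_size=1`, for example `total_updates(1000, 1, 5)` gives 5000.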


Q3. In a stochastic gradient descent algorithm, the learning rate starts at 0.1 and decays exponentially with a decay rate of 0.1 per epoch. What will be the learning rate after 5 epochs?
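The question does not spell out which decay formula is meant, so the sketch below evaluates two common readings of "exponential decay with rate 0.1". Both function names are made up for this illustration, and which formula the assignment intends is an assumption.

```python
import math


def exp_decay(lr0, k, epochs):
    # Continuous form: lr_t = lr0 * exp(-k * t)
    return lr0 * math.exp(-k * epochs)


def geometric_decay(lr0, k, epochs):
    # Discrete form: lr_t = lr0 * (1 - k) ** t
    return lr0 * (1.0 - k) ** epochs


print(round(exp_decay(0.1, 0.1, 5), 4))        # about 0.0607
print(round(geometric_decay(0.1, 0.1, 5), 4))  # about 0.059
```

Both readings land close to 0.06 after 5 epochs, so the choice of formula only matters if the answer options are finer than that.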


Q4. In the context of the Adam optimizer, what is the purpose of bias correction?

a) To prevent overfitting
b) To speed up convergence
c) To correct for the bias in the estimates of first and second moments
d) To adjust the learning rate


Q5. The figure below shows the contours of a surface. Suppose that a man walks from -1 to +1 along both the horizontal (x) axis and the vertical (y) axis. The statement that the man would have seen the slope change more rapidly along the x-axis than along the y-axis is:

a) True
b) False
c) Cannot say


Q6. What is the primary benefit of using Adagrad compared to other optimization algorithms?

a) It converges faster than other optimization algorithms.
b) It is more memory-efficient than other optimization algorithms.
c) It is less sensitive to the choice of hyperparameters (learning rate).
d) It is less likely to get stuck in local optima than other optimization algorithms.


Q7. What are the benefits of using stochastic gradient descent compared to vanilla gradient descent?

a) SGD converges more quickly than vanilla gradient descent.
b) SGD is computationally efficient for large datasets.
c) SGD theoretically guarantees that the descent direction is optimal.
d) SGD experiences less oscillation compared to vanilla gradient descent.


Q8. What is the role of activation functions in deep learning?

a) Activation functions transform the output of a neuron into a non-linear function, allowing the network to learn complex patterns.
b) Activation functions make the network faster by reducing the number of iterations needed for training.
c) Activation functions are used to normalize the input data.
d) Activation functions are used to compute the loss function.


Q9. What is the advantage of using mini-batch gradient descent over batch gradient descent?

a) Mini-batch gradient descent is more computationally efficient than batch gradient descent.
b) Mini-batch gradient descent leads to a more accurate estimate of the gradient than batch gradient descent.
c) Mini-batch gradient descent gives us a better solution.
d) Mini-batch gradient descent can converge faster than batch gradient descent.


Q10. In the Nesterov Accelerated Gradient (NAG) algorithm, the gradient is computed at:

a) The current position
b) A “look-ahead” position
c) The previous position
d) The average of current and previous positions
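To make the "look-ahead" idea concrete, here is a minimal sketch of one common formulation of the NAG update on a scalar parameter. The function name `nag_step`, the toy loss $f(w) = w^2$, and the hyperparameter values are assumptions chosen for illustration only.

```python
def nag_step(w, v, grad_fn, lr=0.1, beta=0.9):
    """One Nesterov accelerated gradient step on a scalar parameter."""
    # Look ahead: where the momentum term alone would carry the parameter.
    w_lookahead = w - beta * v
    # Evaluate the gradient at the look-ahead point, not at w itself.
    g = grad_fn(w_lookahead)
    # Update the velocity, then the parameter.
    v_new = beta * v + lr * g
    return w - v_new, v_new


# Toy loss f(w) = w**2 with gradient 2*w (an assumption for illustration).
w, v = 5.0, 0.0
for _ in range(50):
    w, v = nag_step(w, v, lambda x: 2.0 * x)
print(w)  # close to the minimum at 0
```

The only difference from plain momentum is the point at which `grad_fn` is evaluated: momentum uses `w`, while NAG uses `w_lookahead`.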


Deep Learning IIT Ropar Week 4 Nptel Assignment Answers (Jul-Dec 2024)


Q1. A team has a data set that contains 1000 samples for training a feed-forward neural network. Suppose they decide to use the stochastic gradient descent algorithm to update the weights. How many times do the weights get updated after training the network for 5 epochs?
1000
5000
100
5

Answer: 5000


Q2. What is the primary benefit of using Adagrad compared to other optimization algorithms?
It converges faster than other optimization algorithms.
It is more memory-efficient than other optimization algorithms.
It is less sensitive to the choice of hyperparameters (learning rate).
It is less likely to get stuck in local optima than other optimization algorithms.

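Answer: It is less sensitive to the choice of hyperparameters (learning rate). Adagrad stores an accumulator of squared gradients for every parameter, so it needs more memory than plain gradient descent, not less.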


For answers or latest updates join our telegram channel: Click here to join

These are Deep Learning IIT Ropar Week 4 Nptel Assignment Answers


Q3. What are the benefits of using stochastic gradient descent compared to vanilla gradient descent?
SGD converges more quickly than vanilla gradient descent.
SGD is computationally efficient for large datasets.
SGD theoretically guarantees that the descent direction is optimal.
SGD experiences less oscillation compared to vanilla gradient descent.

Answer:


Q4. Select the behaviour of the gradient descent algorithm that uses the following update rule,
$w_{t+1} = w_t - \eta \nabla w_t$

where $w$ is a weight and $\eta$ is the learning rate.
The weight update is tiny at a steep loss surface
The weight update is tiny at a gentle loss surface
The weight update is large at a steep loss surface
The weight update is large at a gentle loss surface

Answer: The weight update is large at a steep loss surface
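Because the step is the learning rate times the gradient, its size scales directly with the slope. The sketch below shows this with two made-up gradient values (10 for a steep region, 0.01 for a gentle one); `gd_update` is a hypothetical helper.

```python
def gd_update(w, grad, lr=0.1):
    # w_{t+1} = w_t - lr * grad, so the step size is lr * |grad|.
    return w - lr * grad


steep_step = abs(gd_update(1.0, grad=10.0) - 1.0)    # steep loss surface
gentle_step = abs(gd_update(1.0, grad=0.01) - 1.0)   # gentle loss surface
print(steep_step, gentle_step)  # about 1.0 versus about 0.001
```

With the same learning rate, the steep region moves the weight roughly a thousand times further than the gentle one in this example.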



Q5. Given data where one column predominantly contains zero values, which algorithm should be used to achieve faster convergence and optimize the loss function?
Adam
NAG
Momentum-based gradient descent
Stochastic gradient descent

Answer: Adam


Q6. In Nesterov accelerated gradient descent, what step is performed before determining the update size?
Increase the momentum
Adjust the learning rate
Decrease the step size
Estimate the next position of the parameters

Answer:



Q7. We have the following functions: $x^3$, $\ln(x)$, $e^x$, $x$, and $4$. Which of the following functions has the steepest slope at $x = 1$?

$x^3$
$\ln(x)$
$e^x$
$4$

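Answer: $x^3$. Its derivative $3x^2$ equals 3 at $x = 1$, compared with $e \approx 2.718$ for $e^x$, 1 for $\ln(x)$ and for $x$, and 0 for the constant 4.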
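The slopes at $x = 1$ can be checked by evaluating each derivative directly. This is a small sketch with the derivatives written out by hand; the dictionary keys are just labels chosen for this illustration.

```python
import math

# Analytic derivative of each candidate function, evaluated at x.
derivatives = {
    "x^3": lambda x: 3.0 * x ** 2,
    "ln(x)": lambda x: 1.0 / x,
    "e^x": lambda x: math.exp(x),
    "x": lambda x: 1.0,
    "4": lambda x: 0.0,
}

slopes = {name: d(1.0) for name, d in derivatives.items()}
steepest = max(slopes, key=slopes.get)
print(slopes)
print(steepest)  # x^3, with slope 3 at x = 1
```

At $x = 1$ the slope of $x^3$ is 3, which is larger than $e \approx 2.718$ for $e^x$ and larger than 1 for $\ln(x)$.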


Q8. Which of the following represents the contour plot of the function $f(x, y) = x^2 - y^2$?

Answer: C option



Q9. Which of the following algorithms will result in more oscillations of the parameter during the training process of the neural network?
Stochastic gradient descent
Mini batch gradient descent
Batch gradient descent
Batch NAG

Answer:


Q10. Consider a gradient profile $\nabla W = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]$. Assume $v_{-1} = 0$, $\epsilon = 0$, $\beta = 0.9$, and the learning rate is $\eta_{-1} = 0.1$. Suppose that we use the Adagrad algorithm; what is the value of $\eta_6 = \eta / \sqrt{v_t + \epsilon}$?
0.03
0.06
0.08
0.006

Answer:
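The effective learning rate can be computed by accumulating squared gradients, as in the sketch below. It assumes the gradients are indexed from $t = 0$ (consistent with $v_{-1} = 0$), so $v_6$ sums the first seven entries, and it ignores $\beta$, which the Adagrad accumulator does not use. The helper name `adagrad_effective_lr` is made up for this illustration.

```python
import math


def adagrad_effective_lr(grads, t, lr=0.1, eps=0.0):
    # Adagrad accumulates squared gradients: v_t = v_{t-1} + g_t**2, with v_{-1} = 0.
    v = sum(g ** 2 for g in grads[: t + 1])
    return lr / math.sqrt(v + eps)


grads = [1, 0.9, 0.6, 0.01, 0.1, 0.2, 0.5, 0.55, 0.56]
eta_6 = adagrad_effective_lr(grads, t=6)
print(round(eta_6, 2))  # about 0.06
```

Under that indexing assumption, $v_6 \approx 2.47$ and $\eta_6 \approx 0.1 / 1.57 \approx 0.064$, which is closest to the 0.06 option.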


Check here for all Deep Learning IIT Ropar Nptel Assignment Answers: Click here

For answers to additional Nptel courses, please refer to this link: NPTEL Assignment Answers