Introduction to Large Language Models Week 3 Answers
Are you looking for Introduction to Large Language Models Week 3 Answers? All weeks of Introduction to Large Language Models are available here.
Introduction to Large Language Models Week 3 Answers (July-Dec 2025)
Question 1. In backpropagation, which method is used to compute the gradients?
a) Gradient descent
b) Chain rule of derivatives
c) Matrix factorization
d) Linear regression
Answer: b) Chain rule of derivatives
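Backpropagation applies the chain rule of derivatives layer by layer. As a minimal NumPy sketch (the values of w, x, and the target t below are made up for illustration), here is the chain rule computing the gradient of a one-neuron network's squared loss:

```python
import numpy as np

# Minimal sketch: backprop through loss = (sigmoid(w*x) - t)^2
# using the chain rule dL/dw = dL/dy * dy/dz * dz/dw.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, x, t = 0.5, 2.0, 1.0   # illustrative weight, input, and target
z = w * x                 # linear pre-activation
y = sigmoid(z)            # activation
loss = (y - t) ** 2

# Chain rule, factor by factor:
dL_dy = 2 * (y - t)       # d(y - t)^2 / dy
dy_dz = y * (1 - y)       # sigmoid derivative
dz_dw = x                 # d(w*x) / dw
dL_dw = dL_dy * dy_dz * dz_dw
print(dL_dw)
```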
Question 2. Which of the following functions is not differentiable at zero?
a) Sigmoid
b) Tanh
c) ReLU
d) Linear
Answer: c) ReLU
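Sigmoid, tanh, and the linear function are smooth everywhere, but ReLU has a kink at zero: the slope approaching from the left is 0 and from the right is 1. A quick numerical check (the step size h is arbitrary):

```python
import numpy as np

# Sketch: the one-sided difference quotients of ReLU disagree at 0,
# so no single derivative exists there.
relu = lambda x: np.maximum(0.0, x)
h = 1e-6
left = (relu(0.0) - relu(-h)) / h    # slope from the left  -> 0.0
right = (relu(h) - relu(0.0)) / h    # slope from the right -> 1.0
print(left, right)  # 0.0 1.0: the limits differ, so ReLU is not differentiable at 0
```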
Question 3. In the context of regularization, which of the following statements is true?
a) L2 regularization tends to produce sparse weights
b) Dropout is applied during inference to improve accuracy
c) L1 regularization adds the squared weight penalties to the loss function
d) Dropout prevents overfitting by randomly disabling neurons during training
Answer: d) Dropout prevents overfitting by randomly disabling neurons during training
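A short sketch of inverted dropout, assuming a keep probability of p = 0.8 (the value is illustrative): units are randomly zeroed only during training, and the layer passes activations through unchanged at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, p=0.8, training=True):
    if not training:
        return a                      # inference: no units are dropped
    mask = rng.random(a.shape) < p    # keep each unit with probability p
    return a * mask / p               # rescale so the expected value is unchanged

activations = np.ones(8)
print(dropout(activations, training=True))   # some entries zeroed, the rest scaled to 1.25
print(dropout(activations, training=False))  # unchanged at inference
```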
Question 4. Which activation function is least likely to suffer from vanishing gradients?
a) Tanh
b) Sigmoid
c) ReLU
Answer: c) ReLU
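The sigmoid's gradient σ(x)(1 − σ(x)) peaks at 0.25 and decays toward zero for large |x|, while ReLU's gradient is exactly 1 for any positive input. A quick comparison at an arbitrary point x = 5:

```python
import numpy as np

# Sketch: sigmoid's gradient nearly vanishes for a moderately large input,
# while ReLU's gradient stays at 1 on the positive side.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
x = 5.0
sig_grad = sigmoid(x) * (1 - sigmoid(x))   # ~0.0066, nearly vanished
relu_grad = 1.0 if x > 0 else 0.0          # exactly 1 for any positive input
print(sig_grad, relu_grad)
```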
Question 5. Which of the following equations correctly represents the derivative of the sigmoid function?
a) σ(x) · (1 + σ(x))
b) σ(x)²
c) σ(x) · (1 − σ(x))
d) 1 / (1 + e^x)
Answer: c) σ(x) · (1 − σ(x))
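The identity σ′(x) = σ(x)(1 − σ(x)) follows from differentiating σ(x) = 1/(1 + e^−x), and it can be checked numerically against a central finite difference (the point x = 0.7 is arbitrary):

```python
import numpy as np

# Sketch: verify sigma'(x) = sigma(x) * (1 - sigma(x)) numerically.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
x, h = 0.7, 1e-5
analytic = sigmoid(x) * (1 - sigmoid(x))
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
print(analytic, numeric)  # the two values agree to many decimal places
```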
Question 6. What condition must be met for the Perceptron learning algorithm to converge?
a) Learning rate must be zero
b) Data must be non-linearly separable
c) Data must be linearly separable
d) Activation function must be sigmoid
Answer: c) Data must be linearly separable
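On linearly separable data, the Perceptron algorithm is guaranteed to stop making mistakes after finitely many updates. A small sketch on the AND function (which is linearly separable); the learning rate and epoch cap are arbitrary choices:

```python
import numpy as np

# Sketch: the Perceptron learning rule on the linearly separable AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])          # AND labels
w, b, lr = np.zeros(2), 0.0, 1.0    # weights, bias, learning rate

for epoch in range(20):
    mistakes = 0
    for xi, ti in zip(X, y):
        pred = int(w @ xi + b > 0)  # step activation
        if pred != ti:              # update only on mistakes
            w += lr * (ti - pred) * xi
            b += lr * (ti - pred)
            mistakes += 1
    if mistakes == 0:               # converged: every point classified correctly
        print(f"converged after epoch {epoch}: w={w}, b={b}")
        break
```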
Question 7. Which of the following logic functions requires a network with at least one hidden layer to model?
a) AND
b) OR
c) NOT
d) XOR
Answer: d) XOR
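AND, OR, and NOT are linearly separable, so a single neuron handles each, but XOR is not. One hidden layer suffices, since XOR(x, y) = OR(x, y) AND NOT AND(x, y). A sketch with hand-picked weights (one of many possible choices):

```python
import numpy as np

# Sketch: XOR via a 2-unit hidden layer computing [OR, AND],
# combined as OR minus AND at the output.
step = lambda z: (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
W1 = np.array([[1, 1],     # column 0: weights into the OR unit
               [1, 1]])    # column 1: weights into the AND unit
b1 = np.array([-0.5, -1.5])
W2 = np.array([1, -1])     # output: OR minus AND
b2 = -0.5

h = step(X @ W1 + b1)      # hidden layer: [OR, AND] for each input
out = step(h @ W2 + b2)
print(out)                 # [0 1 1 0], the XOR truth table
```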
Question 8. Why is it necessary to include non-linear activation functions between layers in an MLP?
a) Without them, the network is just a linear function
b) They prevent overfitting
c) They allow backpropagation to work
Answer: a) Without them, the network is just a linear function
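The key point is (a): composing linear maps gives another linear map, so without non-linear activations any depth collapses to a single layer. A quick NumPy check with arbitrary layer sizes:

```python
import numpy as np

# Sketch: two stacked linear layers (no activation) equal one linear layer.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # first layer: 3 -> 4
W2 = rng.standard_normal((2, 4))   # second layer: 4 -> 2
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x)         # forward pass with no nonlinearity
one_layer = (W2 @ W1) @ x          # single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True
```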
Question 9. What is typically the output activation function for an MLP solving a binary classification task?
a) Tanh
b) ReLU
c) Sigmoid
d) Softmax
Answer: c) Sigmoid
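Sigmoid squashes the network's final logit into (0, 1), which is read as P(class = 1) and thresholded at 0.5. A tiny sketch (the logits are made-up values):

```python
import numpy as np

# Sketch: sigmoid turns raw logits into probabilities for binary classification.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
logits = np.array([-2.0, 0.0, 3.0])
probs = sigmoid(logits)              # ~[0.119, 0.5, 0.953]
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 gives class labels
print(probs, labels)
```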
Question 10. Which type of regularization encourages sparsity in the weights?
a) L1 regularization
b) L2 regularization
c) Dropout
d) Early stopping
Answer: a) L1 regularization
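L1's absolute-value penalty pushes small weights exactly to zero, while L2 only shrinks them. One way to see this is a single proximal (soft-thresholding) step for L1 versus an L2 shrinkage step, with an illustrative penalty strength λ = 0.5:

```python
import numpy as np

# Sketch: L1 snaps small weights exactly to zero (sparsity);
# L2 scales every weight down but never to zero.
w = np.array([0.3, -0.2, 1.5, -0.05])
lam = 0.5

l1_step = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)  # exact zeros appear
l2_step = w / (1.0 + lam)                                # uniform shrinkage
print(l1_step)  # [ 0.  -0.   1.  -0. ]          -> sparse
print(l2_step)  # [ 0.2  -0.133  1.  -0.033 ]    -> small but nonzero
```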
These are the Introduction to Large Language Models Week 3 Answers.