Introduction to Large Language Models Week 3 Answers

Are you looking for Introduction to Large Language Models Week 3 Answers? Answers for all weeks of Introduction to Large Language Models are available here.



Introduction to Large Language Models Week 3 Answers (July-Dec 2025)

Course link: Click here


Question 1. In backpropagation, which method is used to compute the gradients?
a) Gradient descent
b) Chain rule of derivatives
c) Matrix factorization
d) Linear regression

Answer: b) Chain rule of derivatives
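Backpropagation computes each parameter's gradient by applying the chain rule through the composed layers. As a quick sanity check (a minimal sketch, not from the course materials; the function `f` is just an illustrative composite), the analytic chain-rule gradient of f(x) = σ(x²) can be compared against a central finite difference:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def f(x):
    # A small composite function: f(x) = sigmoid(x^2)
    return sigmoid(x * x)

def grad_f(x):
    # Chain rule: df/dx = sigma'(u) * du/dx with u = x^2,
    # where sigma'(u) = sigma(u) * (1 - sigma(u))
    s = sigmoid(x * x)
    return s * (1.0 - s) * 2.0 * x

x = 0.7
eps = 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2.0 * eps)  # central difference
print(abs(numeric - grad_f(x)) < 1e-6)
```

The numeric and analytic gradients agree to high precision, which is exactly what backpropagation exploits at every layer.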


Question 2. Which of the following functions is not differentiable at zero?
a) Sigmoid
b) Tanh
c) ReLU
d) Linear

Answer: c) ReLU
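ReLU has a kink at zero: its left derivative is 0 and its right derivative is 1, so no single derivative exists there. A short numerical illustration (sketch) using one-sided difference quotients:

```python
def relu(x):
    return max(0.0, x)

eps = 1e-6
# One-sided difference quotients at x = 0
left_slope = (relu(0.0) - relu(-eps)) / eps    # approaches 0
right_slope = (relu(eps) - relu(0.0)) / eps    # approaches 1
print(left_slope, right_slope)  # 0.0 1.0 -- they disagree, so no derivative at 0
```

In practice frameworks simply pick a subgradient (usually 0 or 1) at that single point.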


Question 3. In the context of regularization, which of the following statements is true?
a) L2 regularization tends to produce sparse weights
b) Dropout is applied during inference to improve accuracy
c) L1 regularization adds the squared weight penalties to the loss function
d) Dropout prevents overfitting by randomly disabling neurons during training

Answer: d) Dropout prevents overfitting by randomly disabling neurons during training
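Dropout zeroes random activations during training only; at inference it is a no-op. A minimal sketch of "inverted" dropout (function and parameter names here are illustrative), where surviving units are scaled by 1/(1−p) so the expected activation matches inference:

```python
import random

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    scaling survivors by 1/(1-p); identity function at inference."""
    if not training:
        return list(activations)  # dropout is disabled at inference
    return [0.0 if random.random() < p else a / (1.0 - p)
            for a in activations]

acts = [0.2, 0.8, 1.5, 0.1]
print(dropout(acts, p=0.5, training=False))  # [0.2, 0.8, 1.5, 0.1] -- unchanged
```

This is why option b) is wrong: applying dropout at inference would just add noise, not improve accuracy.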


Question 4. Which activation function is least likely to suffer from vanishing gradients?
a) Tanh
b) Sigmoid
c) ReLU

Answer: c) ReLU
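The sigmoid's derivative never exceeds 0.25, so a gradient passed through many sigmoid layers shrinks geometrically, while ReLU's derivative is exactly 1 on positive inputs. A back-of-the-envelope comparison (a sketch of the best case for each activation):

```python
depth = 20  # number of stacked layers

# Best-case per-layer gradient factors:
# sigmoid'(x) peaks at 0.25 (at x = 0); relu'(x) = 1 for x > 0
sigmoid_gradient = 0.25 ** depth
relu_gradient = 1.0 ** depth

print(sigmoid_gradient)  # ~9.1e-13: effectively vanished after 20 layers
print(relu_gradient)     # 1.0: fully preserved
```

Even in the sigmoid's best case the gradient collapses by twelve orders of magnitude over 20 layers, which is the vanishing-gradient problem in a nutshell.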


Question 5. Which of the following equations correctly represents the derivative of the sigmoid function?
a) σ(x) · (1 + σ(x))
b) σ(x)²
c) σ(x) · (1 − σ(x))
d) 1 / (1 + e^x)

Answer: c) σ(x) · (1 − σ(x))
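The identity σ'(x) = σ(x)(1 − σ(x)) can be verified numerically at a few points (a minimal sketch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

eps = 1e-6
for x in (-2.0, 0.0, 1.5):
    analytic = sigmoid(x) * (1.0 - sigmoid(x))             # option c)
    numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
    print(x, abs(analytic - numeric) < 1e-8)  # True at every point
```

This identity is also why the sigmoid gradient peaks at 0.25 (at x = 0), which feeds directly into the vanishing-gradient discussion in Question 4.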


Question 6. What condition must be met for the Perceptron learning algorithm to converge?
a) Learning rate must be zero
b) Data must be non-linearly separable
c) Data must be linearly separable
d) Activation function must be sigmoid

Answer: c) Data must be linearly separable
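The perceptron convergence theorem guarantees convergence only on linearly separable data. A small demonstration (a sketch; the training loop and names are illustrative): AND is linearly separable and converges, XOR is not and never does:

```python
def perceptron_train(X, y, lr=1.0, max_epochs=100):
    """Train a perceptron; returns (weights, bias, converged)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
            if pred != yi:  # update weights only on mistakes
                mistakes += 1
                w = [wj + lr * (yi - pred) * xj for wj, xj in zip(w, xi)]
                b += lr * (yi - pred)
        if mistakes == 0:  # a full error-free pass means convergence
            return w, b, True
    return w, b, False

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(perceptron_train(X, [0, 0, 0, 1])[2])  # AND: linearly separable -> True
print(perceptron_train(X, [0, 1, 1, 0])[2])  # XOR: not separable -> False
```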


Question 7. Which of the following logic functions requires a network with at least one hidden layer to model?
a) AND
b) OR
c) NOT
d) XOR

Answer: d) XOR
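XOR is not linearly separable, so no single-layer perceptron can compute it, but one hidden layer suffices. A hand-wired sketch (the particular weights and thresholds are just one illustrative choice) using XOR = AND(OR, NAND):

```python
def step(z):
    # Heaviside step activation
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """XOR with one hidden layer: XOR = AND(OR, NAND)."""
    h1 = step(x1 + x2 - 0.5)      # hidden unit 1 computes OR
    h2 = step(-x1 - x2 + 1.5)     # hidden unit 2 computes NAND
    return step(h1 + h2 - 1.5)    # output unit ANDs the two

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))  # outputs 0, 1, 1, 0
```

AND, OR, and NOT each have a separating hyperplane, which is why they need no hidden layer.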


Question 8. Why is it necessary to include non-linear activation functions between layers in an MLP?
a) Without them, the network is just a linear function
b) They prevent overfitting
c) They allow backpropagation to work

Answer: a) Without them, the network is just a linear function
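Without a non-linearity, stacking layers buys nothing: two linear maps compose into a single linear map. A small sketch (matrices chosen arbitrarily for illustration) showing W₂(W₁x) = (W₂W₁)x:

```python
def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W1 = [[1.0, 2.0], [3.0, 4.0]]    # first "layer" (no activation)
W2 = [[0.5, -1.0], [2.0, 0.0]]   # second "layer" (no activation)
x = [1.0, -2.0]

# Two stacked linear layers...
two_layers = matvec(W2, matvec(W1, x))
# ...equal one linear layer whose weights are the product W2 @ W1
one_layer = matvec(matmul(W2, W1), x)
print(two_layers, one_layer, two_layers == one_layer)
```

Inserting a non-linear activation between the layers breaks this collapse and lets the MLP represent functions like XOR.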


Question 9. What is typically the output activation function for an MLP solving a binary classification task?
a) Tanh
b) ReLU
c) Sigmoid
d) Softmax

Answer: c) Sigmoid
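The sigmoid squashes the output logit into (0, 1), which can be read directly as the probability of the positive class and thresholded at 0.5. A minimal sketch (the `predict` helper is illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(logit, threshold=0.5):
    p = sigmoid(logit)  # probability of the positive class, always in (0, 1)
    return (1 if p >= threshold else 0), p

label, prob = predict(2.0)
print(label, round(prob, 3))  # class 1 with probability ~0.881
```

Softmax plays the analogous role for multi-class outputs; for a single binary output, sigmoid is the standard choice.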


Question 10. Which type of regularization encourages sparsity in the weights?
a) L1 regularization
b) L2 regularization
c) Dropout
d) Early stopping

Answer: a) L1 regularization
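The mechanism behind the sparsity is visible in the per-weight update each penalty induces: L1's proximal step (soft-thresholding) snaps small weights exactly to zero, while L2's shrinkage only rescales them. A sketch (function names are illustrative; `lam` is the regularization strength):

```python
def l1_step(w, lam):
    # Soft-thresholding: the proximal operator of lam * |w|
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # weights smaller than lam become exactly zero

def l2_step(w, lam):
    # L2 shrinkage only scales the weight; it never reaches zero
    return w / (1.0 + lam)

w = 0.05
print(l1_step(w, lam=0.1))  # 0.0    -> sparse
print(l2_step(w, lam=0.1))  # ~0.045 -> small but nonzero
```

This is also why option a) in Question 3 is false: L2 keeps all weights small and dense rather than sparse.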



Click here for all NPTEL assignment answers
