Introduction to Large Language Models Week 5 Answers

Are you looking for the Introduction to Large Language Models Week 5 Answers? Answers for all weeks of Introduction to Large Language Models are available here.


Introduction to Large Language Models Week 5 Answers (July-Dec 2025)

Course link: Click here


Question 1. Which of the following best explains the vanishing gradient problem in RNNs?
a) RNNs lack memory mechanisms for long-term dependencies.
b) Gradients grow too large during backpropagation.
c) Gradients shrink exponentially over long sequences.
d) RNNs cannot process variable-length sequences.

Answer: c) Gradients shrink exponentially over long sequences.
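
Why (c): backpropagation through time multiplies one Jacobian factor per time step, and when those factors are typically smaller than 1 the product shrinks exponentially with sequence length. A minimal NumPy sketch of that decay (the 0.9 factor and the sequence lengths are purely illustrative, not from the course):

```python
import numpy as np

# Backprop through time multiplies one Jacobian factor per time step.
# If the typical factor magnitude is below 1, the product decays
# exponentially with sequence length; if above 1, it explodes.
factor = 0.9  # illustrative magnitude of d(h_t)/d(h_{t-1})

for seq_len in [10, 50, 100]:
    gradient_scale = factor ** seq_len
    print(f"sequence length {seq_len:4d}: gradient scale ~ {gradient_scale:.2e}")

# Expected trend: 0.9**10 ~ 3.5e-01, 0.9**50 ~ 5.2e-03, 0.9**100 ~ 2.7e-05,
# i.e. the learning signal from early time steps all but vanishes.
```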


Question 2. In an attention mechanism, what does the softmax function ensure?
a) Normalization of decoder outputs
b) Stability of gradients during backpropagation
c) Values lie between -1 and 1
d) Attention weights sum to 1

Answer: d) Attention weights sum to 1
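
Why (d): softmax exponentiates the raw scores and divides by their total, so the resulting attention weights are non-negative and sum to exactly 1 (they are not squashed into [-1, 1]). A quick NumPy check with made-up scores:

```python
import numpy as np

def softmax(scores):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the result.
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

scores = np.array([2.0, 0.5, -1.0])   # illustrative raw attention scores
weights = softmax(scores)

print(weights)          # ~[0.79 0.18 0.04] -- all non-negative
print(weights.sum())    # 1.0 -- the property the question asks about
```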


Question 3. Which of the following is true about the difference between a standard RNN and an LSTM?
a) LSTM does not use any non-linear activation.
b) LSTM has a gating mechanism to control information flow.
c) RNNs have fewer parameters than LSTMs because they use convolution.
d) LSTMs cannot learn long-term dependencies.

Answer: b) LSTM has a gating mechanism to control information flow.
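
Why (b): one LSTM step computes sigmoid-valued gates that scale how much information is written to, kept in, and read from the cell state; a vanilla RNN has no such gates. A compact single-step sketch in NumPy (the weights are random placeholders, not trained values):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

d_in, d_hid = 4, 3
x_t = rng.normal(size=d_in)           # current input
h_prev = np.zeros(d_hid)              # previous hidden state
c_prev = np.zeros(d_hid)              # previous cell state

# One weight matrix per gate, acting on [h_prev, x_t] (biases omitted for brevity).
W_f, W_i, W_o, W_c = (rng.normal(size=(d_hid, d_hid + d_in)) for _ in range(4))
z = np.concatenate([h_prev, x_t])

f_t = sigmoid(W_f @ z)                # forget gate: how much old cell state to keep
i_t = sigmoid(W_i @ z)                # input gate: how much new candidate to write
o_t = sigmoid(W_o @ z)                # output gate: how much cell state to expose
c_tilde = np.tanh(W_c @ z)            # candidate cell content

c_t = f_t * c_prev + i_t * c_tilde    # gated update of the cell state
h_t = o_t * np.tanh(c_t)              # gated hidden state

print("gates lie in (0, 1):", f_t, i_t, o_t, sep="\n")
```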


Question 4. Which gate in an LSTM is responsible for deciding how much of the cell state to keep?
a) Forget gate
b) Input gate
c) Output gate
d) Cell candidate gate

Answer: a) Forget gate
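
The cell-state update is c_t = f_t * c_{t-1} + i_t * c̃_t, so the forget gate f_t directly scales how much of the previous cell state is kept. A tiny numeric illustration with made-up values:

```python
import numpy as np

c_prev = np.array([2.0, -1.0, 0.5])       # previous cell state (illustrative)
i_t = np.array([0.3, 0.3, 0.3])           # input gate (illustrative)
c_tilde = np.array([1.0, 1.0, 1.0])       # candidate content (illustrative)

for f_t in (np.full(3, 0.05), np.full(3, 0.95)):
    c_t = f_t * c_prev + i_t * c_tilde    # LSTM cell-state update
    print(f"forget gate ~ {f_t[0]:.2f} -> new cell state {c_t}")

# With f_t near 0 the old cell state is almost erased;
# with f_t near 1 it is carried forward nearly intact.
```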


Question 5. What improvement does attention bring to the basic Seq2Seq model?
a) Reduces training time
b) Removes the need for an encoder
c) Allows access to all encoder states during decoding
d) Reduces the number of model parameters

Answer: c) Allows access to all encoder states during decoding
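
Why (c): a basic Seq2Seq model squeezes the whole input into one final encoder vector, while attention lets every decoding step form a weighted combination of all encoder hidden states. A minimal sketch with made-up vectors:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# All encoder hidden states are kept, not just the last one (illustrative values).
encoder_states = np.array([[0.1, 0.9],
                           [0.8, 0.2],
                           [0.4, 0.4]])
decoder_state = np.array([0.5, 1.0])       # current decoder hidden state

scores = encoder_states @ decoder_state    # dot-product score for each encoder state
weights = softmax(scores)                  # attention weights over ALL encoder states
context = weights @ encoder_states         # context vector fed to the decoder

print("weights:", weights)   # one weight per input position
print("context:", context)   # mixes information from the whole input sequence
```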


Question 6. Which of the following is a correct statement about the encoder-decoder architecture?
a) The encoder generates tokens one at a time.
b) The decoder summarizes the input sequence.
c) The decoder generates outputs based on encoder representations and its own prior outputs.
d) The encoder stores only the first token of the sequence.

Answer: c) The decoder generates outputs based on encoder representations and its own prior outputs.
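
Why (c): the encoder summarizes the input, and the decoder then emits one token at a time, conditioning on that summary and on the tokens it has already produced. A toy greedy-decoding loop (the vocabulary, embeddings, and projection are entirely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "down"]        # toy vocabulary (illustrative)
d = 4

encoder_summary = rng.normal(size=d)          # stands in for the encoder's representation
E = rng.normal(size=(len(vocab), d))          # toy token embeddings
W_out = rng.normal(size=(len(vocab), d))      # toy output projection

prev = np.zeros(d)                            # start-of-sequence embedding
generated = []
for _ in range(4):                            # generate a few tokens greedily
    # Each step conditions on the encoder representation AND the previous output.
    hidden = np.tanh(encoder_summary + prev)
    token_id = int(np.argmax(W_out @ hidden))
    generated.append(vocab[token_id])
    prev = E[token_id]                        # feed the chosen token back in

print(generated)   # each token was chosen using encoder info + prior outputs
```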


Question 7. What is self-attention in Transformers used for?
a) To enable sequential computation
b) To attend to the previous layer’s output
c) To relate different positions in the same sequence
d) To enforce fixed-length output

Answer: c) To relate different positions in the same sequence
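
Why (c): in self-attention, every position produces a query that is scored against the keys of every position in the same sequence, so each output vector mixes information from all positions in one step rather than through a sequential scan. A minimal scaled dot-product self-attention in NumPy (random projections, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 5, 8
X = rng.normal(size=(seq_len, d_model))          # one sequence of token vectors

# Query / key / value projections (random here; learned in a real model).
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Row i scores position i's query against the keys of ALL positions.
attn = softmax(Q @ K.T / np.sqrt(d_model), axis=-1)   # shape (seq_len, seq_len)
output = attn @ V                                      # every output mixes all positions

print(attn.shape, output.shape)      # (5, 5) (5, 8)
print(attn.sum(axis=-1))             # each row of attention weights sums to 1
```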


Question 8. Why are RNNs preferred over fixed-window neural models?
a) They have a smaller parameter size.
b) They can process sequences of arbitrary length.
c) They eliminate the need for embedding layers.
d) None of the above.

Answer: b) They can process sequences of arbitrary length.
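
Why (b): an RNN reuses the same cell (and the same parameters) at every time step, so it can unroll over 3 steps or 300, whereas a fixed-window model only ever sees an input of preset size. A small sketch running one shared RNN cell over sequences of different lengths (random weights, illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 3, 4

# One shared set of parameters, reused at every time step.
W_xh = rng.normal(size=(d_hid, d_in))
W_hh = rng.normal(size=(d_hid, d_hid))

def run_rnn(sequence):
    h = np.zeros(d_hid)
    for x_t in sequence:                      # loop length = sequence length
        h = np.tanh(W_xh @ x_t + W_hh @ h)    # same cell, applied repeatedly
    return h

short_seq = rng.normal(size=(3, d_in))        # 3 time steps
long_seq = rng.normal(size=(30, d_in))        # 30 time steps -- same parameters

print(run_rnn(short_seq).shape, run_rnn(long_seq).shape)   # (4,) (4,)
```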


Question 9. Given the following encoder and decoder hidden states, compute the attention scores. (Use dot product as the scoring function)

Encoder hidden states: h1 = [7, 3], h2 = [0, 2], h3 = [1, 4]
Decoder hidden state: s = [0.2, 1.5]

a) 0.42, 0.02, 0.56
b) 0.15, 0.53, 0.32
c) 0.64, 0.18, 0.18
d) 0.08, 0.91, 0.01

Answer: a) 0.42, 0.02, 0.56 (worked calculation below)
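
Working it out: the dot products are s·h1 = 0.2·7 + 1.5·3 = 5.9, s·h2 = 0.2·0 + 1.5·2 = 3.0, and s·h3 = 0.2·1 + 1.5·4 = 6.2; applying softmax to (5.9, 3.0, 6.2) gives roughly (0.42, 0.02, 0.56), which matches option (a). The same computation in NumPy:

```python
import numpy as np

encoder_states = np.array([[7.0, 3.0],    # h1
                           [0.0, 2.0],    # h2
                           [1.0, 4.0]])   # h3
decoder_state = np.array([0.2, 1.5])      # s

scores = encoder_states @ decoder_state           # dot-product scores
weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the scores

print(scores)                  # [5.9 3.  6.2]
print(np.round(weights, 2))    # [0.42 0.02 0.56] -> option (a)
```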


Click here for all NPTEL assignment answers

These are the Introduction to Large Language Models Week 5 Answers.