Introduction to Large Language Models Week 2 Answers

Are you looking for Introduction to Large Language Models Week 2 Answers? All weeks of Introduction to Large Language Models are available here.


Introduction to Large Language Models Week 2 Answers (Jan-Apr 2026)

Que1. A language model that conditions each word on the previous three words is equivalent to which Markov model?
a) First-order
b) Second-order
c) Third-order
d) Fourth-order

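Conditioning each word on the previous three words means estimating P(wi | wi-3, wi-2, wi-1), i.e. a 4-gram model, which is a third-order Markov model. The Python sketch below shows the counting involved; the toy corpus is invented for illustration and is not from the assignment.

```python
from collections import defaultdict

# Third-order Markov / 4-gram model: each word is conditioned on the
# previous three words. Toy corpus, purely for illustration.
corpus = "the dragon casts fire the dragon casts ice".split()

four_gram = defaultdict(int)
tri_gram = defaultdict(int)
for i in range(3, len(corpus)):
    ctx = tuple(corpus[i - 3:i])          # previous three words
    four_gram[ctx + (corpus[i],)] += 1
    tri_gram[ctx] += 1

def p(word, ctx):
    """MLE estimate of P(word | previous three words)."""
    return four_gram[ctx + (word,)] / tri_gram[ctx] if tri_gram[ctx] else 0.0

print(p("fire", ("the", "dragon", "casts")))  # 0.5 on this toy corpus
```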

Que2. Which smoothing technique leverages the number of unique contexts a word appears in?
a) Good-Turing
b) Add-k
c) Kneser-Ney
d) Absolute Discounting

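Kneser-Ney smoothing is the technique built on the number of unique contexts a word appears in, through its continuation probability. A minimal Python sketch of that quantity, using an invented toy bigram list:

```python
from collections import defaultdict

# Kneser-Ney continuation probability: how likely a word is to appear as a
# novel continuation, measured by the number of distinct contexts it follows.
#   P_cont(w) = |{w' : C(w', w) > 0}| / |{(w', w'') : C(w', w'') > 0}|
# Toy bigram list, invented for illustration.
bigrams = {("san", "francisco"), ("new", "york"), ("old", "york"),
           ("the", "glasses"), ("my", "glasses"), ("your", "glasses")}

contexts = defaultdict(set)
for prev, word in bigrams:
    contexts[word].add(prev)

def p_continuation(word):
    return len(contexts[word]) / len(bigrams)

# "glasses" follows three distinct words, "francisco" only one, so it gets a
# higher continuation probability even if "francisco" were frequent overall.
print(p_continuation("glasses"), p_continuation("francisco"))  # 0.5 0.1666...
```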

Que3. Using a language model inside a machine translation system to measure BLEU score is an example of:
a) Intrinsic evaluation
b) Extrinsic evaluation
c) Perplexity evaluation
d) Likelihood estimation


Que4. Assuming a bi-gram language model, calculate the probability of the sentence:
<s> the dragon casts fire </s>
Ignore the unigram probability of P(<s>) in your calculation.
a) 2/37
b) 1/27
c) 1/36
d) None of the above

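Under a bigram model, the probability of a sentence is the product of P(wi | wi-1) along the sentence. The count table from the assignment is not reproduced on this page, so the counts below are hypothetical placeholders that only illustrate the shape of the calculation:

```python
# Bigram chain:
#   P(<s> the dragon casts fire </s>)
#     = P(the|<s>) * P(dragon|the) * P(casts|dragon) * P(fire|casts) * P(</s>|fire)
# Hypothetical counts -- substitute the counts given in the assignment.
bigram_counts = {("<s>", "the"): 2, ("the", "dragon"): 1, ("dragon", "casts"): 1,
                 ("casts", "fire"): 1, ("fire", "</s>"): 1}
unigram_counts = {"<s>": 3, "the": 3, "dragon": 2, "casts": 1, "fire": 1}

def p_bigram(prev, word):
    return bigram_counts.get((prev, word), 0) / unigram_counts[prev]

sentence = ["<s>", "the", "dragon", "casts", "fire", "</s>"]
prob = 1.0
for prev, word in zip(sentence, sentence[1:]):
    prob *= p_bigram(prev, word)
print(prob)   # product of the five bigram probabilities
```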

Que5. Using add-one smoothing, compute the probability of the sentence:
<s> the dragon casts fire </s>
Assume vocabulary size V includes all unique words (excluding <s>, </s>).
a) 1/7150
b) 1/35
c) 1/67
d) 1/9900

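Add-one (Laplace) smoothing adds 1 to every bigram count and the vocabulary size V to every context count, so P(w | h) = (C(h, w) + 1) / (C(h) + V). A sketch of the adjusted calculation, again with hypothetical counts and a hypothetical V:

```python
# Add-one smoothed bigram: P(w | h) = (C(h, w) + 1) / (C(h) + V)
# Counts and V are hypothetical placeholders -- use the assignment's corpus.
bigram_counts = {("<s>", "the"): 2, ("the", "dragon"): 1, ("dragon", "casts"): 1,
                 ("casts", "fire"): 1, ("fire", "</s>"): 1}
unigram_counts = {"<s>": 3, "the": 3, "dragon": 2, "casts": 1, "fire": 1}
V = 8   # vocabulary size, excluding <s> and </s> as the question specifies

def p_add_one(prev, word):
    return (bigram_counts.get((prev, word), 0) + 1) / (unigram_counts[prev] + V)

sentence = ["<s>", "the", "dragon", "casts", "fire", "</s>"]
prob = 1.0
for prev, word in zip(sentence, sentence[1:]):
    prob *= p_add_one(prev, word)
print(prob)
```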

Que6. Using the smoothed probability, compute the perplexity of:
<s> the dragon casts fire </s>
(Exclude <s> and </s> from word count.)
a) 9000^(1/5)
b) 7150^(1/4)
c) 35^(1/5)
d) 9900^(1/4)

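Perplexity is the inverse sentence probability normalized by the number of word tokens, PP = P(sentence)^(-1/N), which is why the options take the form X^(1/N). A short sketch with a placeholder probability (plug in the smoothed value from Que5):

```python
# Perplexity: PP = P(sentence) ** (-1 / N), with N the number of word tokens.
# Here N = 4 ("the dragon casts fire"), excluding <s> and </s> as instructed.
# p_sentence is a placeholder -- substitute the smoothed probability from Que5.
p_sentence = 1 / 1000
N = 4
perplexity = p_sentence ** (-1 / N)
print(perplexity)   # 1000 ** (1/4) ≈ 5.62
```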


Introduction to Large Language Models Week 2 Answers (July-Dec 2025)

Course link: Click here

Question 1. Which of the following does not directly affect perplexity?
a) Vocabulary size
b) Sentence probability
c) Number of tokens
d) Sentence length



Question 2. Which equation expresses the chain rule for a 4-word sentence?
a) P(w1, w2, w3, w4) = P(w1) + P(w2|w1) + P(w3|w2) + P(w4|w3)
b) P(w1, w2, w3, w4) = P(w1) × P(w2|w1) × P(w3|w1, w2) × P(w4|w1, w2, w3)
c) P(w1, w2, w3, w4) = P(w1) × P(w2|w1) × P(w3|w2) × P(w4|w3)
d) P(w1, w2, w3, w4) = P(w4|w3) × P(w3|w2) × P(w2|w1) × P(w1)

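The exact chain rule conditions each word on its entire preceding history: P(w1, w2, w3, w4) = P(w1) P(w2|w1) P(w3|w1, w2) P(w4|w1, w2, w3). A tiny sketch of the product, with hypothetical conditional probabilities:

```python
# Chain rule for a 4-word sentence:
#   P(w1, w2, w3, w4) = P(w1) * P(w2|w1) * P(w3|w1, w2) * P(w4|w1, w2, w3)
# The numbers below are hypothetical, just to show how the factors multiply.
p_w1           = 0.20   # P(w1)
p_w2_given_1   = 0.50   # P(w2 | w1)
p_w3_given_12  = 0.40   # P(w3 | w1, w2)
p_w4_given_123 = 0.10   # P(w4 | w1, w2, w3)

p_sentence = p_w1 * p_w2_given_1 * p_w3_given_12 * p_w4_given_123
print(p_sentence)       # 0.004
```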


Question 3. Which assumption allows n-gram models to reduce computation?
a) Bayes Assumption
b) Chain Rule
c) Independence Assumption
d) Markov Assumption

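The Markov assumption truncates the history to the last n-1 words, which is what makes n-gram estimation tractable. A rough sketch of the parameter saving; the vocabulary size and history length below are arbitrary example numbers:

```python
# Markov assumption: P(w_i | w_1 .. w_{i-1}) ≈ P(w_i | w_{i-1}) for a bigram model.
# With vocabulary size V, a full table over length-n histories needs ~V**n
# entries, while a bigram table needs only ~V**2. Example numbers are arbitrary.
V, n = 10_000, 10
print(f"full-history table: ~{V ** n:.2e} parameters")
print(f"bigram (Markov)   : ~{V ** 2:.2e} parameters")
```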


Question 4. In a trigram language model, which of the following is a correct example of linear interpolation?
a) P(wi | wi-2, wi-1) = λ1 P(wi | wi-2, wi-1)
b) P(wi | wi-2, wi-1) = λ1 P(wi | wi-2, wi-1) + λ2 P(wi | wi-1) + λ3 P(wi)
c) P(wi | wi-2, wi-1) = max(P(wi | wi-2, wi-1), P(wi | wi-1))
d) P(wi | wi-2, wi-1) = P(wi) P(wi-1) / P(wi-2)

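Linear interpolation mixes the trigram, bigram, and unigram estimates with non-negative weights that sum to 1. A small sketch; the component probabilities and weights are hypothetical:

```python
# Linear interpolation for a trigram model:
#   P_hat(wi | wi-2, wi-1) = l1*P(wi|wi-2,wi-1) + l2*P(wi|wi-1) + l3*P(wi)
# with l1 + l2 + l3 = 1. All values below are hypothetical.
def interpolate(p_tri, p_bi, p_uni, l1=0.6, l2=0.3, l3=0.1):
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9, "weights must sum to 1"
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

# Even when the trigram estimate is 0 (unseen trigram), the mixture
# still gives a non-zero probability by backing off to lower orders.
print(interpolate(p_tri=0.0, p_bi=0.2, p_uni=0.05))  # 0.065
```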


Question 5. A trigram model is equivalent to which order Markov model?
a) 3
b) 2
c) 1
d) 4



Question 6. Which smoothing technique leverages the number of unique contexts a word appears in?
a) Good-Turing
b) Add-k
c) Kneser-Ney
d) Absolute Discounting



Question 7. Assuming a bi-gram language model, calculate the probability of the sentence: <s> birds fly in the blue sky </s>. Ignore the unigram probability of P(<s>) in your calculation.
a) 2/37
b) 1/27
c) 0
d) 1/36



Question 8. Assuming a bi-gram language model, calculate the perplexity of the sentence: <s> birds fly in the blue sky </s>. Please do not consider <s> and </s> as words of the sentence.
a) 27^(1/4)
b) 27^(1/5)
c) 9^(1/6)
d) None of these



Click here for all NPTEL assignment answers

These are Introduction to Large Language Models Week 2 Answers