Introduction to Large Language Models Week 2 Answers
Are you looking for Introduction to Large Language Models Week 2 Answers? Answers for all weeks of Introduction to Large Language Models are available here.
Introduction to Large Language Models Week 2 Answers (Jan-Apr 2026)
Que1. A language model that conditions each word on the previous three words is equivalent to which Markov model?
a) First-order
b) Second-order
c) Third-order
d) Fourth-order
Que2. Which smoothing technique leverages the number of unique contexts a word appears in?
a) Good-Turing
b) Add-k
c) Kneser-Ney
d) Absolute Discounting
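Note: the distinguishing ingredient of Kneser-Ney is the continuation probability, which scores a word by how many distinct contexts it follows rather than by its raw frequency: P_continuation(w) = |{w′ : count(w′ w) > 0}| / |{(u, v) : count(u v) > 0}|.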
Que3. Using a language model inside a machine translation system to measure BLEU score is an example of:
a) Intrinsic evaluation
b) Extrinsic evaluation
c) Perplexity evaluation
d) Likelihood estimation
Que4. Assuming a bi-gram language model, calculate the probability of the sentence: <s> the dragon casts fire </s>
Ignore the unigram probability of P(<s>) in your calculation.
a) 2/37
b) 1/27
c) 1/36
d) None of the above
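The bi-gram and unigram counts come from the corpus given in the assignment, which is not reproduced on this page, so no numbers are worked out here. The sketch below only shows how such a sentence probability is assembled from counts; the counts in the example are placeholders, not the assignment's data.

```python
from collections import Counter

def bigram_sentence_prob(sentence, bigram_counts, unigram_counts):
    """P(sentence) = product of count(w_prev, w) / count(w_prev) over adjacent pairs.

    `sentence` includes the <s> and </s> markers; the unigram probability
    of <s> itself is ignored, as the question instructs.
    """
    prob = 1.0
    for prev, cur in zip(sentence, sentence[1:]):
        prob *= bigram_counts[(prev, cur)] / unigram_counts[prev]
    return prob

# Placeholder counts (NOT the assignment's corpus), just to show the call:
unigram_counts = Counter({"<s>": 3, "the": 3, "dragon": 2, "casts": 1, "fire": 1})
bigram_counts = Counter({("<s>", "the"): 2, ("the", "dragon"): 2,
                         ("dragon", "casts"): 1, ("casts", "fire"): 1,
                         ("fire", "</s>"): 1})
sentence = ["<s>", "the", "dragon", "casts", "fire", "</s>"]
print(bigram_sentence_prob(sentence, bigram_counts, unigram_counts))
```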
Que5. Using add-one smoothing, compute the probability of the sentence: <s> the dragon casts fire </s>
Assume vocabulary size V includes all unique words (excluding <s>, </s>).
a) 1/7150
b) 1/35
c) 1/67
d) 1/9900
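Note: add-one (Laplace) smoothing replaces each bi-gram estimate with P(wi | wi−1) = (count(wi−1, wi) + 1) / (count(wi−1) + V), where V is the vocabulary size defined in the question; the product of these smoothed factors over adjacent word pairs then gives the sentence probability.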
Que6. Using the smoothed probability, compute the perplexity of: <s> the dragon casts fire </s>
(Exclude <s> and </s> from word count.)
a) 9000^(1/5)
b) 7150^(1/4)
c) 35^(1/5)
d) 9900^(1/4)
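Note: perplexity is the inverse probability of the test sentence normalised by the number of words N, counted as the question specifies: PP(W) = P(w1 w2 … wN)^(−1/N). That is why each option above is the reciprocal of a sentence probability raised to the power 1/N.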
Introduction to Large Language Models Week 2 Answers (July-Dec 2025)
Course link: Click here
Question 1. Which of the following does not directly affect perplexity?
a) Vocabulary size
b) Sentence probability
c) Number of tokens
d) Sentence length
Question 2. Which equation expresses the chain rule for a 4-word sentence?
a) P(w1, w2, w3, w4) = P(w1) + P(w2|w1) + P(w3|w2) + P(w4|w3)
b) P(w1, w2, w3, w4) = P(w1) × P(w2|w1) × P(w3|w1, w2) × P(w4|w1, w2, w3)
c) P(w1, w2, w3, w4) = P(w1) × P(w2|w1) × P(w3|w2) × P(w4|w3)
d) P(w1, w2, w3, w4) = P(w4|w3) × P(w3|w2) × P(w2|w1) × P(w1)
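Note: the exact chain rule keeps the full history in every factor (e.g. P(w4 | w1, w2, w3)); options that condition each word only on the immediately preceding word are bi-gram (Markov) approximations rather than the chain rule itself.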
Question 3. Which assumption allows n-gram models to reduce computation?
a) Bayes Assumption
b) Chain Rule
c) Independence Assumption
d) Markov Assumption
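Note: the Markov assumption approximates the full history by a fixed-length window, P(wi | w1, …, wi−1) ≈ P(wi | wi−n+1, …, wi−1), which is what lets an n-gram model store only short contexts instead of every possible history.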
Question 4. In a trigram language model, which of the following is a correct example of linear interpolation?
a) P(wi | wi−2, wi−1) = λ1 P(wi | wi−2, wi−1)
b) P(wi | wi−2, wi−1) = λ1 P(wi | wi−2, wi−1) + λ2 P(wi | wi−1) + λ3 P(wi)
c) P(wi | wi−2, wi−1) = max(P(wi | wi−2, wi−1), P(wi | wi−1))
d) P(wi | wi−2, wi−1) = P(wi) P(wi−1) / P(wi−2)
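As a minimal sketch of linear interpolation, assuming illustrative probability tables and λ values (placeholders, not taken from the course):

```python
def interpolated_trigram_prob(w, prev1, prev2, p_tri, p_bi, p_uni,
                              lambdas=(0.6, 0.3, 0.1)):
    """Mix trigram, bigram and unigram estimates with weights that sum to 1.

    prev2 = w_{i-2}, prev1 = w_{i-1}, w = w_i.
    """
    l1, l2, l3 = lambdas
    return (l1 * p_tri.get((prev2, prev1, w), 0.0)
            + l2 * p_bi.get((prev1, w), 0.0)
            + l3 * p_uni.get(w, 0.0))

# Illustrative probability tables (placeholders):
p_tri = {("the", "blue", "sky"): 0.5}
p_bi = {("blue", "sky"): 0.4}
p_uni = {"sky": 0.01}
print(interpolated_trigram_prob("sky", "blue", "the", p_tri, p_bi, p_uni))
```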
Question 5. A trigram model is equivalent to which order Markov model?
a) 3
b) 2
c) 1
d) 4
Question 6. Which smoothing technique leverages the number of unique contexts a word appears in?
a) Good-Turing
b) Add-k
c) Kneser-Ney
d) Absolute Discounting
Question 7. Assuming a bi-gram language model, calculate the probability of the sentence: <s> birds fly in the blue sky </s>. Ignore the unigram probability of P(<s>) in your calculation.
a) 2/37
b) 1/27
c) 0
d) 1/36
Question 8. Assuming a bi-gram language model, calculate the perplexity of the sentence: <s> birds fly in the blue sky </s>. Please do not consider <s> and </s> as words of the sentence.
a) 27^(1/4)
b) 27^(1/5)
c) 9^(1/6)
d) None of these
Click here for all NPTEL assignment answers.
These are the Introduction to Large Language Models Week 2 Answers.