# Introduction to Machine Learning | Week 2

**Session: JAN-APR 2024**

**Course name: Introduction to Machine Learning**

#### These are Introduction to Machine Learning Week 2 Assignment 2 Answers

**Q1. The parameters obtained in linear regression**

can take any value in the real space

are strictly integers

always lie in the range [0,1]

can take only non-zero values

**Answer: can take any value in the real space**

**Q2. Suppose that we have N independent variables (X1,X2,…Xn) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using the least square error on this data. You found that the correlation of X1 with Y is -0.005.**

Regressing Y on X1 mostly does not explain away Y

Regressing Y on X1 explains away Y

The given data is insufficient to determine if regressing Y on X1 explains away Y or not

None of the above

**Answer: Regressing Y on X1 mostly does not explain away Y**
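One way to see why such a small correlation points to this answer: for a simple least-squares line with an intercept, the fraction of variance explained ($R^2$) equals the squared correlation coefficient. The sketch below computes that value for the reported correlation and checks the identity on synthetic data; the synthetic `x` and `y` are made up for illustration and are not from the assignment.

```python
import numpy as np

# Correlation reported in the question.
r = -0.005
r_squared = r ** 2  # share of the variance in Y explained by X1 alone
print(f"R^2 from r = {r}: {r_squared:.6f}")

# Sanity check of R^2 == r^2 on synthetic (made-up) data.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.3 * x + rng.normal(size=200)

slope, intercept = np.polyfit(x, y, 1)        # least-squares line
residuals = y - (slope * x + intercept)
r2_from_fit = 1.0 - residuals.var() / y.var()
r_xy = np.corrcoef(x, y)[0, 1]
print(np.isclose(r2_from_fit, r_xy ** 2))
```

This only covers regressing Y on X1 by itself, which is what the options describe; it says nothing about how useful X1 might be alongside the other variables.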

**Q3. The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning Class was found to be: Grade = 30.5 + 15.2(h). How will a student’s grade be affected if she studies for four hours, compared to not studying?**

It will go down by 30.4 points

It will go up by 60.8 points

The grade will remain unchanged

It cannot be determined from the information given

**Answer: It will go up by 60.8 points**
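The arithmetic behind this answer can be checked in a few lines. The function name `predicted_grade` is just an illustrative label for the fitted line given in the question.

```python
def predicted_grade(hours):
    """Fitted line from the question: Grade = 30.5 + 15.2 * h."""
    return 30.5 + 15.2 * hours

# Change in predicted grade between studying 4 hours and not studying.
change = predicted_grade(4) - predicted_grade(0)
print(round(change, 1))
```

The intercept (30.5) cancels in the difference, so the change is just the slope times the number of hours: 15.2 × 4.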

**Q4. Consider the following 4 training examples: We want to learn a function $f(x) = ax + b$ which is parametrized by $(a, b)$. Using squared error as the loss function, which of the following parameters would you use to model this function?**

(1,1)

(1,2)

(2,1)

(2,2)

**Answer: (1,1)**

**Q5. Consider a modified k−NN method in which once the k nearest neighbours to the query point are identified, you do a linear regression fit on them and output the fitted value for the query point. Which of the following is/are true regarding this method.**

This method makes an assumption that the data is locally linear

In order to perform well, this method would need dense distributed training data

This method has higher bias compared to k−NN

This method has higher variance compared to k−NN

**Answer: a, b, d**

This method makes an assumption that the data is locally linear

In order to perform well, this method would need dense distributed training data

This method has higher variance compared to k−NN
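A minimal sketch of the modified method next to plain k-NN averaging may help here. It assumes one-dimensional inputs and uses made-up, exactly linear data; the function names are illustrative, not from the course.

```python
import numpy as np

def knn_mean(x_train, y_train, x_query, k):
    """Plain k-NN regression: average the targets of the k nearest points."""
    idx = np.argsort(np.abs(x_train - x_query))[:k]
    return y_train[idx].mean()

def knn_local_linear(x_train, y_train, x_query, k):
    """Modified k-NN: fit a least-squares line to the k nearest points."""
    idx = np.argsort(np.abs(x_train - x_query))[:k]
    A = np.column_stack([np.ones(k), x_train[idx]])   # intercept + slope
    coef, *_ = np.linalg.lstsq(A, y_train[idx], rcond=None)
    return coef[0] + coef[1] * x_query

# Made-up data lying exactly on y = 2x + 1.
x_train = np.linspace(0.0, 10.0, 101)
y_train = 2.0 * x_train + 1.0

# Query at the left boundary, where all neighbours lie on one side.
pred_mean = knn_mean(x_train, y_train, 0.0, k=5)
pred_linear = knn_local_linear(x_train, y_train, 0.0, k=5)
print(pred_mean, pred_linear)
```

On this data the local line recovers the true value at the boundary while the plain average is pulled toward its one-sided neighbours. The local fit estimates two parameters from only k points, which is why it needs dense data and tends to vary more from sample to sample.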

**Q6. Which of the statements is/are True?**

Ridge has sparsity constraint, and it will drive coefficients with low values to 0

Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge

Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it

If there are two or more highly collinear variables, Lasso will select one of them randomly

**Answer: c, d**

Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.

If there are two or more highly collinear variables, Lasso will select one of them randomly
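To make the Ridge/Lasso contrast concrete, here is a small NumPy sketch under stated assumptions: no intercept, synthetic data, and penalty strengths chosen only for illustration. Ridge is solved with its closed form $(X^TX + \lambda I)^{-1}X^Ty$; Lasso has no general closed form, so the sketch uses cyclic coordinate descent with soft-thresholding on the objective $\tfrac{1}{2}\lVert y - X\beta\rVert^2 + \lambda\lVert\beta\rVert_1$.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Ridge solution: (X^T X + lam * I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0 when |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Cyclic coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            # Residual with feature j's current contribution added back.
            partial_resid = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ partial_resid
            beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j])
    return beta

# Synthetic data: only features 0 and 3 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_beta + 0.1 * rng.normal(size=100)

ridge_beta = ridge_closed_form(X, y, lam=20.0)
lasso_beta = lasso_coordinate_descent(X, y, lam=20.0)
print("ridge:", np.round(ridge_beta, 3))
print("lasso:", np.round(lasso_beta, 3))
```

Ridge shrinks every coefficient but leaves them non-zero, while the soft-threshold step lets Lasso set the irrelevant coefficients to exactly zero. The sketch does not try to demonstrate the collinearity statement.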

**Q7. Choose the correct option(s) from the following:**

When working with a small dataset, one should prefer low bias/high variance classifiers over high bias/low variance classifiers

When working with a small dataset, one should prefer high bias/low variance classifiers over low bias/high variance classifiers

When working with a large dataset, one should prefer high bias/low variance classifiers over low bias/high variance classifiers

When working with a large dataset, one should prefer low bias/high variance classifiers over high bias/low variance classifiers

**Answer: b, d**

When working with a small dataset, one should prefer high bias/low variance classifiers over low bias/high variance classifiers

When working with a large dataset, one should prefer low bias/high variance classifiers over high bias/low variance classifiers

**Q8. Consider the following statements:**

Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the maximum correlation with the residual, then the residual is regressed on that variable, and it is added to the predictor.

Statement B: In Forward stagewise selection, the variables are added one by one to the previously selected variables to produce the best fit till then

Both the statements are True

Statement A is True, and Statement B is False

Statement A is False and Statement B is True

Both the statements are False

**Answer: Both the statements are False**
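Both statements are false because the two descriptions are swapped: picking the variable most correlated with the residual and regressing the residual on it describes forward stagewise, while adding variables one at a time and refitting for the best fit describes forward stepwise. The sketch below implements both under simplifying assumptions (standardized columns, centered target, synthetic data); the function names are illustrative.

```python
import numpy as np

def forward_stepwise(X, y, n_steps):
    """Add the variable that most reduces RSS, refitting ALL selected ones."""
    selected = []
    for _ in range(n_steps):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ coef) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    return selected, coef

def forward_stagewise(X, y, n_steps):
    """Regress the residual on its most correlated variable, one at a time."""
    beta = np.zeros(X.shape[1])
    resid = y.copy()
    for _ in range(n_steps):
        inner = X.T @ resid              # ranks correlations for standardized columns
        j = int(np.argmax(np.abs(inner)))
        step = inner[j] / (X[:, j] @ X[:, j])   # univariate regression coefficient
        beta[j] += step                  # other coefficients are NOT refitted
        resid = resid - step * X[:, j]
    return beta

# Synthetic data: y depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=200)
y = y - y.mean()

selected, stepwise_coef = forward_stepwise(X, y, n_steps=2)
beta_stagewise = forward_stagewise(X, y, n_steps=50)
print(selected, np.round(stepwise_coef, 2), np.round(beta_stagewise, 2))
```

Because stagewise never refits earlier coefficients, it can take many more steps than there are variables to approach the least-squares fit.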

**Q9. The linear regression model $y = a_0 + a_1x_1 + a_2x_2 + \dots + a_px_p$ is to be fitted to a set of $N$ training data points having $p$ attributes each. Let $X$ be $N \times (p+1)$ vectors of input values (augmented by 1’s), $Y$ be the $N \times 1$ vector of target values, and $\theta$ be the $(p+1) \times 1$ vector of parameter values $(a_0, a_1, a_2, \dots, a_p)$. If the sum of squared errors is minimized to obtain the optimal regression model, which of the following equations holds?**

$X^TX = XY$

$X\theta = X^TY$

$X^TX\theta = Y$

$X^TX\theta = X^TY$

**Answer: $X^TX\theta = X^TY$**
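This is the normal equation: setting the gradient of $\lVert Y - X\theta\rVert^2$ with respect to $\theta$, namely $-2X^T(Y - X\theta)$, to zero gives $X^TX\theta = X^TY$. A quick numerical check on synthetic data (all values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 3

# Design matrix augmented with a column of ones: shape N x (p + 1).
features = rng.normal(size=(N, p))
X = np.column_stack([np.ones(N), features])

true_theta = np.array([1.0, 2.0, -1.0, 0.5])
Y = X @ true_theta + 0.1 * rng.normal(size=N)

# Solve the normal equation X^T X theta = X^T Y.
theta = np.linalg.solve(X.T @ X, X.T @ Y)

# Compare with NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta, theta_lstsq))
```

In practice a least-squares solver is usually preferred over forming $X^TX$ explicitly, since it behaves better when the columns of $X$ are nearly collinear.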

More Nptel Courses: https://progiez.com/nptel-assignment-answers

**Session: JULY-DEC 2023**

**Course Name: Introduction to Machine Learning**

**Q1. The parameters obtained in linear regression**

can take any value in the real space

are strictly integers

always lie in the range [0,1]

can take only non-zero values

**Answer: can take any value in the real space**

**Q2. Suppose that we have N independent variables (X1,X2,…Xn) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using the least square error on this data. You found that the correlation coefficient for one of its variables (Say X1) with Y is -0.005.**

Regressing Y on X1 mostly does not explain away Y.

Regressing Y on X1 explains away Y.

The given data is insufficient to determine if regressing Y on X1 explains away Y or not.

**Answer: Regressing Y on X1 mostly does not explain away Y.**

**Q3. Which of the following is a limitation of subset selection methods in regression?**

They tend to produce biased estimates of the regression coefficients.

They cannot handle datasets with missing values.

They are computationally expensive for large datasets.

They assume a linear relationship between the independent and dependent variables.

They are not suitable for datasets with categorical predictors.

**Answer: They are computationally expensive for large datasets.**

**Q4. The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning Class was found to be: Grade = 30.5 + 15.2(h). How will a student’s grade be affected if she studies for four hours?**

It will go down by 30.4 points.

It will go down by 30.4 points.

It will go up by 60.8 points.

The grade will remain unchanged.

It cannot be determined from the information given

**Answer: It will go up by 60.8 points.**

**Q5. Which of the statements is/are True?**

Ridge has sparsity constraint, and it will drive coefficients with low values to 0.

Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge.

Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.

If there are two or more highly collinear variables, Lasso will select one of them randomly

**Answer: c, d**

**Q6. Find the mean of squared error for the given predictions:**

**Hint: Find the squared error for each prediction and take the mean of that.**

1

2

1.5

0

**Answer: 1**
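The hint translates directly into code. The table of predictions is not reproduced on this page, so the values below are hypothetical and do not correspond to the question's data.

```python
def mean_squared_error(y_true, y_pred):
    """Square each prediction error, then average the squares."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical targets and predictions, for illustration only.
y_true = [1, 2, 3, 4]
y_pred = [2, 2, 3, 6]

mse = mean_squared_error(y_true, y_pred)
print(mse)
```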

**Q7. Consider the following statements:**

Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the maximum correlation with the residual, then the residual is regressed on that variable, and it is added to the predictor.

Statement B: In Forward stagewise selection, the variables are added one by one to the previously selected variables to produce the best fit till then

Both the statements are True.

Statement A is True, and Statement B is False

Statement A is False and Statement B is True

Both the statements are False.

**Answer: Both the statements are False.**

**Q8. The linear regression model $y = a_0 + a_1x_1 + a_2x_2 + \dots + a_px_p$ is to be fitted to a set of $N$ training data points having $p$ attributes each. Let $X$ be $N \times (p+1)$ vectors of input values (augmented by 1’s), $Y$ be the $N \times 1$ vector of target values, and $\theta$ be the $(p+1) \times 1$ vector of parameter values $(a_0, a_1, a_2, \dots, a_p)$. If the sum of squared errors is minimized to obtain the optimal regression model, which of the following equations holds?**

$X^TX = XY$

$X\theta = X^TY$

$X^TX\theta = Y$

$X^TX\theta = X^TY$

**Answer: D. $X^TX\theta = X^TY$**

**Q9. Which of the following statements is true regarding Partial Least Squares (PLS) regression?**

PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.

PLS is only applicable when there is no multicollinearity among the independent variables.

PLS can handle situations where the number of predictors is larger than the number of observations.

PLS estimates the regression coefficients by minimizing the residual sum of squares.

PLS is based on the assumption of normally distributed residuals.

All of the above.

None of the above.

**Answer: PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.**

**Q10. Which of the following statements about principal components in Principal Component Regression (PCR) is true?**

Principal components are calculated based on the correlation matrix of the original predictors.

The first principal component explains the largest proportion of the variation in the dependent variable.

Principal components are linear combinations of the original predictors that are uncorrelated with each other.

PCR selects the principal components with the highest p-values for inclusion in the regression model.

PCR always results in a lower model complexity compared to ordinary least squares regression.

**Answer: Principal components are linear combinations of the original predictors that are uncorrelated with each other.**

**Session: JAN-APR 2023**

**Course Name: Introduction to Machine Learning**

**Q1. Given a training data set of 10,000 instances, with each input instance having 17 dimensions and each output instance having 2 dimensions, the dimensions of the design matrix used in applying linear regression to this data is**

a. 10000 × 17

b. 10002 × 17

c. 10000 × 18

d. 10000 × 19

**Answer: c. 10000 × 18**
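The shape follows from adding one column of ones (for the intercept) to the 17 input dimensions; the 2 output dimensions affect the coefficient matrix, not the design matrix. A small shape check with placeholder zeros standing in for real data:

```python
import numpy as np

n_samples, n_features, n_outputs = 10_000, 17, 2

# Placeholder inputs; only the shapes matter here.
inputs = np.zeros((n_samples, n_features))

# Design matrix: one row per instance, a leading 1 for the intercept.
design = np.column_stack([np.ones(n_samples), inputs])

# One column of coefficients per output dimension.
coefficients = np.zeros((design.shape[1], n_outputs))
predictions = design @ coefficients

print(design.shape, coefficients.shape, predictions.shape)
```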

**Q2. Suppose we want to add a regularizer to the linear regression loss function, to control the magnitudes of the weights $\beta$. We have a choice between $\Omega_1(\beta) = \sum_{i=1}^{p} |\beta_i|$ and $\Omega_2(\beta) = \sum_{i=1}^{p} \beta_i^2$. Which one is more likely to result in sparse weights?**

a. Ω1

b. Ω2

c. Both Ω1 and Ω2 will result in sparse weights

d. Neither of Ω1 or Ω2 can result in sparse weights

**Answer: a. Ω1**
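A one-dimensional version of the problem shows where the sparsity comes from. Here $z$ stands for an unregularized estimate of a single weight and $\lambda > 0$ is the penalty strength; both symbols are introduced only for this illustration.

$$
\hat\beta_{\Omega_1} = \arg\min_{\beta}\ \tfrac{1}{2}(\beta - z)^2 + \lambda|\beta| = \operatorname{sign}(z)\,\max(|z| - \lambda,\ 0)
$$

$$
\hat\beta_{\Omega_2} = \arg\min_{\beta}\ \tfrac{1}{2}(\beta - z)^2 + \lambda\beta^2 = \frac{z}{1 + 2\lambda}
$$

With the absolute-value penalty the solution is exactly zero whenever $|z| \le \lambda$, whereas the squared penalty only rescales $z$ and reaches zero only if $z$ itself is zero.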

**Q3. The model obtained by applying linear regression on the identified subset of features may differ from the model obtained at the end of the process of identifying the subset during**

a. Forward stepwise selection

b. Backward stepwise selection

c. Forward stagewise selection

d. All of the above

**Answer: c. Forward stagewise selection**

**Q4. Consider forward selection, backward selection and best subset selection with respect to the same data set. Which of the following is true?**

a. Best subset selection can be computationally more expensive than forward selection

b. Forward selection and backward selection always lead to the same result

c. Best subset selection can be computationally less expensive than backward selection

d. Best subset selection and forward selection are computationally equally expensive

e. Both (b) and (d)

**Answer: a. Best subset selection can be computationally more expensive than forward selection**
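A rough count of how many candidate models each procedure fits makes the cost gap visible. The counts assume best subset selection evaluates every subset of the p predictors and forward selection evaluates the null model plus every remaining candidate at each step; p = 17 is an arbitrary illustrative choice.

```python
def best_subset_fits(p):
    """Every subset of p predictors, including the empty one."""
    return 2 ** p

def forward_selection_fits(p):
    """Null model, then p candidates, then p - 1, ..., then 1."""
    return 1 + p * (p + 1) // 2

p = 17
print(best_subset_fits(p), forward_selection_fits(p))
```

The exponential growth of the first count is why best subset selection becomes impractical as p grows, while forward selection grows only quadratically.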

**Q5. In the lecture on Multivariate Regression, you learn about using orthogonalization iteratively to obtain regression coefficients. This method is generally referred to as Multiple Regression using Successive Orthogonalization. In the formulation of the method, we observe that in iteration $k$, we regress the entire dataset on $z_0, z_1, \dots, z_{k-1}$. It seems like a waste of computation to recompute the coefficients for $z_0$ a total of $p$ times, $z_1$ a total of $p-1$ times and so on. Can we re-use the coefficients computed in iteration $j$ for iteration $j+1$ for $z_{j-1}$?**

a. No. Doing so will result in the wrong $\gamma$ matrix, and hence the wrong $\beta_i$’s.

b. Yes. Since $z_{j-1}$ is orthogonal to $z_{j-l}\ \forall\, l \le j-1$, the multiple regression in each iteration is essentially a univariate regression on each of the previous residuals. Since the regression coefficients for the previous residuals don’t change over iterations, we can re-use the coefficients for further iterations.

**Answer: b. Yes. Since $z_{j-1}$ is orthogonal to $z_{j-l}\ \forall\, l \le j-1$, the multiple regression in each iteration is essentially a univariate regression on each of the previous residuals. Since the regression coefficients for the previous residuals don’t change over iterations, we can re-use the coefficients for further iterations.**
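The reuse argument can be checked numerically. The sketch below is one possible implementation of successive orthogonalization (Gram-Schmidt on the columns of the design matrix) on synthetic data; the function name and the data are assumptions for illustration, and the lecture's exact formulation may differ in detail.

```python
import numpy as np

def successive_orthogonalization(X):
    """Return residual columns Z and coefficients gamma with X = Z @ gamma."""
    n_cols = X.shape[1]
    Z = np.zeros_like(X, dtype=float)
    gamma = np.eye(n_cols)
    for j in range(n_cols):
        z = X[:, j].astype(float)
        for l in range(j):
            # Univariate regression of x_j on the earlier residual z_l.
            gamma[l, j] = (Z[:, l] @ X[:, j]) / (Z[:, l] @ Z[:, l])
            z = z - gamma[l, j] * Z[:, l]
        Z[:, j] = z
    return Z, gamma

# Synthetic data with an intercept column.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

Z, gamma = successive_orthogonalization(X)

# Coefficient of y on each z_l, computed one column at a time.
coef_on_z = (Z.T @ y) / np.sum(Z * Z, axis=0)

# Joint regression on the first two residuals gives the same numbers.
coef_joint, *_ = np.linalg.lstsq(Z[:, :2], y, rcond=None)

# The coefficient on the last residual equals the last OLS coefficient.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(coef_joint, coef_on_z[:2]), np.isclose(coef_on_z[-1], beta_ols[-1]))
```

Because the residual columns are mutually orthogonal, the coefficient on each one is the same whether it is fitted alone or together with the others, which is what makes caching them safe.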

**Q6. Principal Component Regression (PCR) is an approach to find an orthogonal set of basis vectors which can then be used to reduce the dimension of the input. Which of the following matrices contains the principal component directions as its columns (follow notation from the lecture video)**

a. X

b. S

c. Xc

d. V

e. U

**Answer: d. V**
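Assuming the lecture's notation is the singular value decomposition of the centered inputs, $X_c = USV^T$, the columns of $V$ are the principal component directions. A short NumPy check on synthetic data (the mixing matrix is arbitrary and only there to make the columns correlated):

```python
import numpy as np

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.3, 0.2]])
X = rng.normal(size=(100, 3)) @ mixing

# Center the inputs, then decompose: Xc = U @ diag(s) @ Vt.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T                      # columns are the principal directions

# Project the data onto the directions to get the derived inputs.
scores = Xc @ V

print(np.allclose(V.T @ V, np.eye(3)))
print(np.allclose(scores.T @ scores, np.diag(s ** 2)))
```

The second check shows that the projected columns are mutually uncorrelated, with variances proportional to the squared singular values.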

**Q7. Consider the following five training examples**

**We want to learn a function f(x) of the form f(x)=ax+b which is parameterised by (a,b). Using squared error as the loss function, which of the following parameters would you use to model this function to get a solution with the minimum loss.**

a. (4,3)

b. (1,4)

c. (4,1)

d. (3,4)

**Answer: d. (3,4)**

**Q8. Here is a data set of words in two languages.**

**Let us build a nearest neighbours classifier that will predict which language a word belongs to. Say we represent each word using the following features.**

- Length of the word
- Number of consonants in the word
- Whether it ends with the letter ’o’ (1 if it does, 0 if it doesn’t)

**For example, the representation of the word ‘waffle’ would be [6, 2, 0]. For a distance function, use the Manhattan distance $d(a, b) = \sum_{i=1}^{n} |a_i - b_i|$, where $a, b \in \mathbb{R}^n$. Take the input word ‘keto’. With k = 1, the predicted language for the word is?**

a. English

b. Vinglish

c. None of the above

**Answer: a. English**
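The mechanics of the 1-nearest-neighbour prediction can be sketched as follows. The word table from the question is not reproduced on this page, so the training vectors and the labels `lang_A` and `lang_B` below are hypothetical; only the query vector [4, 2, 1] comes from applying the stated features to 'keto' (length 4, consonants k and t, ends with 'o'). The output therefore illustrates the procedure and does not verify the answer.

```python
def manhattan(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

def predict_1nn(train, query):
    """Return the label of the training point closest to the query."""
    _, label = min(train, key=lambda item: manhattan(item[0], query))
    return label

# Hypothetical feature vectors and labels, for illustration only.
train = [
    ([5, 3, 0], "lang_A"),
    ([6, 4, 1], "lang_B"),
    ([3, 2, 0], "lang_A"),
]

query = [4, 2, 1]  # features of 'keto' under the stated definitions
prediction = predict_1nn(train, query)
print(prediction)
```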

**Session: JUL-DEC 2022**

**1. The parameters obtained in linear regression**

a. can take any value in the real space

b. are strictly integers

c. always lie in the range [0,1]

d. can take only non-zero values

**Answer: a. can take any value in the real space**

**2.** Suppose that we have $N$ independent variables ($X_1, X_2, \dots, X_N$) and the dependent variable is $Y$. Now imagine that you are applying linear regression by fitting the best fit line using the least square error on this data. You found that the correlation coefficient for one of its variables (say $X_1$) with $Y$ is -0.005.

a. Regressing $Y$ on $X_1$ mostly does not explain away $Y$.

b. Regressing $Y$ on $X_1$ explains away $Y$.

c. The given data is insufficient to determine if regressing $Y$ on $X_1$ explains away $Y$ or not.

**Answer: a. Regressing Y on X1 mostly does not explain away Y.**

**3.** Consider the following five training examples

We want to learn a function $f(x)$ of the form $f(x) = ax + b$ which is parameterised by $(a, b)$. Using mean squared error as the loss function, which of the following parameters would you use to model this function to get a solution with the minimum loss?

a. (4, 3)

b. (1, 4)

c. (4, 1)

d. (3, 4)

**Answer: d. (3, 4)**

**4.** The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning Class was found to be: Grade = 30.5 + 15.2(h)

How will a student’s grade be affected if she studies for four hours?

a. It will go down by 30.4 points.

b. It will go down by 30.4 points.

c. It will go up by 60.8 points.

d. The grade will remain unchanged.

e. It cannot be determined from the information given

**Answer: c. It will go up by 60.8 points.**

**5.** Which of the statements is/are True?

a. Ridge has sparsity constraint, and it will drive coefficients with low values to 0.

b. Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge.

c. Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.

d. If there are two or more highly collinear variables, Lasso will select one of them randomly.

**Answer: c, d. Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it. If there are two or more highly collinear variables, Lasso will select one of them randomly.**

**6.** Consider the following statements:

Assertion(A): Orthogonalization is applied to the dimensions in linear regression.

Reason(R): Orthogonalization makes univariate regression possible in each orthogonal dimension separately to produce the coefficients.

a. Both A and R are true, and R is the correct explanation of A.

b. Both A and R are true, but R is not the correct explanation of A.

c. A is true, but R is false.

d. A is false, but R is true

e. Both A and R are false.

**Answer: a. Both A and R are true, and R is the correct explanation of A.**

**7.** Consider the following statements:

Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the maximum correlation with the residual, then the residual is regressed on that variable, and it is added to the predictor.

Statement B: In Forward stagewise selection, the variables are added one by one to the previously selected variables to produce the best fit till then

a. Both the statements are True.

b. Statement A is True, and Statement B is False

c. Statement A is False and Statement B is True

d. Both the statements are False.

**Answer: d. Both the statements are False.**

**8.** The linear regression model $y = a_0 + a_1x_1 + a_2x_2 + \dots + a_px_p$ is to be fitted to a set of $N$ training data points having $p$ attributes each. Let $X$ be $N \times (p+1)$ vectors of input values (augmented by 1’s), $Y$ be the $N \times 1$ vector of target values, and $\theta$ be the $(p+1) \times 1$ vector of parameter values $(a_0, a_1, a_2, \dots, a_p)$. If the sum of squared errors is minimized to obtain the optimal regression model, which of the following equations holds?

**Answer: $X^TX\theta = X^TY$**

The content uploaded on this website is for reference purposes only. Please do it yourself first.