# Python for Data Science NPTEL Week 4 Assignment Answers

Are you looking for the Python for Data Science NPTEL Week 4 Assignment Answers 2024? You’ve come to the right place! This guide collects the Week 4 assignment questions from several runs of the course (July-Dec 2022 to July-Dec 2024), with answers where available, to help you check your understanding of regression, classification, and model evaluation in Python.

## Python for Data Science NPTEL Week 4 Assignment Answers (July-Dec 2024)

1. Which of the following are regression problems? Assume that appropriate data is given.

A) Predicting the house price.
B) Predicting whether it will rain or not on a given day.
C) Predicting the maximum temperature on a given day.
D) Predicting the sales of the ice-creams.

Answer: A) Predicting the house price.
C) Predicting the maximum temperature on a given day.
D) Predicting the sales of the ice-creams.

2. Which of the following are multiclass classification problems?

A) Classifying emails as spam or not spam.
B) Classifying a person’s blood type as A, B, AB, or O.
C) Predicting the price of a second-hand car.
D) Classifying a movie genre into Drama, Comedy, Action, or Thriller.

Answer: B) Classifying a person’s blood type as A, B, AB, or O.
D) Classifying a movie genre into Drama, Comedy, Action, or Thriller.

3. If a linear regression model achieves zero training error, can we say that all the data points lie on a straight line in the feature space?

A) Yes.
B) No.

4. Which of the following machine learning techniques would NOT be appropriate to solve the problem given in the problem statement?

A) kNN
B) Random Forest
C) Logistic Regression
D) Linear regression

Answer: D) Linear regression

5. After applying logistic regression, what is/are the correct observations from the resultant confusion matrix?

A) True Positive = 29, True Negative = 94
B) True Positive = 94, True Negative = 29
C) False Positive = 5, True Negative = 94
D) None of the above

Answer: A) True Positive = 29, True Negative = 94
C) False Positive = 5, True Negative = 94

6. The logistic regression model built between the input and output variables is checked for its prediction accuracy of the test data. What is the accuracy range (in %) of the predictions made over test data?

A) 60 – 79
B) 90 – 95
C) 30 – 59
D) 80 – 89

Answer: B) 90 – 95

7. How are categorical variables preprocessed before model building?

A) Standardization
B) Dummy variables
C) Correlation
D) None of the above

Answer: B) Dummy variables
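
Dummy variables replace one categorical column with one 0/1 column per category, so a model only ever sees numbers. A minimal pure-Python sketch (the colour values are invented for illustration); on a pandas DataFrame, `pd.get_dummies(df, drop_first=True)` does the equivalent job, with its own column-naming scheme.

```python
def dummy_encode(values, drop_first=False):
    """Return one 0/1 indicator column per category, keyed by category name (sorted)."""
    categories = sorted(set(values))
    if drop_first:
        # Dropping one level avoids a redundant column (it is implied when all others are 0).
        categories = categories[1:]
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


colours = ["red", "green", "red", "blue"]
print(dummy_encode(colours))
# {'blue': [0, 0, 0, 1], 'green': [0, 1, 0, 0], 'red': [1, 0, 1, 0]}
print(dummy_encode(colours, drop_first=True))
# {'green': [0, 1, 0, 0], 'red': [1, 0, 1, 0]}
```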

8. A regression model with the function y = 80 + 4.5x was built to understand the impact of temperature (x) on ice cream sales (y). The temperature this month is 10 degrees more than the previous month. What is the predicted difference in ice cream sales?

A) 56 units
B) 45 units
C) 80 units
D) None of the above
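
Because y = a + bx, changing the input by Δx changes the prediction by b·Δx; the intercept a cancels out. For this question that is 4.5 × 10 = 45 units. A minimal helper, applied here and to the rainfall variant of the same question (y = 60 + 5.2x, humidity up by 30) that appears in other years:

```python
def predicted_change(slope, delta_x):
    """Change in the prediction of y = a + b*x when x changes by delta_x (a cancels out)."""
    return slope * delta_x


print(predicted_change(4.5, 10))  # 45.0 -> ice cream sales, y = 80 + 4.5x
print(predicted_change(5.2, 30))  # 156 (up to float rounding) -> rainfall in mm, y = 60 + 5.2x
```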

9. X and Y are two variables that have a strong linear relationship. Which of the following statements are incorrect?

A) There cannot be a negative relationship between the two variables.
B) The relationship between the two variables is purely causal.
C) One variable may or may not cause a change in the other variable.
D) The variables can be positively or negatively correlated with each other.

Answer: A) There cannot be a negative relationship between the two variables.
B) The relationship between the two variables is purely causal.

10. A multiple linear regression model is built on the Global Happiness Index dataset ‘GHI Report.csv’. What is the RMSE of the baseline model?

A) 2.00
B) 0.50
C) 1.06
D) 0.75
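
A baseline model ignores the features and predicts the same value for every row; its RMSE is the bar a real model should beat. This sketch assumes the baseline predicts the mean of the test targets (a common convention; some workflows use the training mean instead, which gives a slightly different number). The y values are hypothetical, not taken from the GHI dataset.

```python
from math import sqrt
from statistics import mean


def baseline_rmse(y_true, baseline_value=None):
    """RMSE of a constant baseline that predicts the same value for every row.

    Assumption: by default the constant is the mean of y_true."""
    if baseline_value is None:
        baseline_value = mean(y_true)
    return sqrt(mean((y - baseline_value) ** 2 for y in y_true))


y_test = [4.0, 5.0, 6.0, 7.0]  # hypothetical happiness scores
print(baseline_rmse(y_test))   # sqrt(1.25), about 1.118
```

With the mean as the constant, this RMSE equals the population standard deviation of `y_true`. In scikit-learn terms it corresponds to `sqrt(mean_squared_error(y_test, base_pred))`, where `base_pred` repeats the constant `len(y_test)` times.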

More Nptel Courses: https://progiez.com/nptel-assignment-answers

## Python for Data Science NPTEL Week 4 Assignment Answers (Jan-Apr 2024)

Course name: Python For Data Science

These are Python for Data Science Nptel Week 4 Assignment Answers

Q1. Which of the following are regression problems? Assume that appropriate data is given.
a. Predicting the house price.
b. Predicting whether it will rain or not on a given day.
c. Predicting the maximum temperature on a given day.
d. Predicting the sales of the ice-creams.

Answer: a, c, d

Q2. Which of the following are binary classification problems?
a. Predicting whether a patient is diagnosed with cancer or not.
b. Predicting whether a team will win a tournament or not.
c. Predicting the price of a second-hand car.
d. Classify web text into one of the following categories: Sports, Entertainment, or Technology.

Q3. If a linear regression model achieves zero training error, can we say that all the data points lie on a hyperplane in the (d+1)-dimensional space? Here, d is the number of features.
a. Yes
b. No

Q4. Which of the following machine learning techniques would NOT be appropriate to solve the problem given in the problem statement?
a. kNN
b. Random Forest
c. Logistic Regression
d. Linear regression

Q5. After applying logistic regression, what is/are the correct observations from the resultant confusion matrix?
a. True Positive = 29, True Negative = 94
b. True Positive = 94, True Negative = 29
c. False Positive = 5, True Negative = 94
d. None of the above

Q6. The logistic regression model built between the input and output variables is checked for its prediction accuracy of the test data. What is the accuracy range (in %) of the predictions made over test data?
a. 60 – 79
b. 90 – 95
c. 30 – 59
d. 80 – 89

Answer: 90 – 95

Q7. How are categorical variables preprocessed before model building?
a. Standardization
b. Dummy variables
c. Correlation
d. None of the above

Q8. A multiple linear regression model is built on the Global Happiness Index dataset ‘GHI_Report.csv’. What is the RMSE of the baseline model?
a. 2.00
b. 0.50
c. 1.06
d. 0.75

Q9. A regression model with the following function y = 60 + 5.2x was built to understand the impact of humidity (x) on rainfall (y). The humidity this week is 30 more than the previous week. What is the predicted difference in rainfall?
a. 156 mm
b. 15.6 mm
c. -156 mm
d. None of the above

Q10. X and Y are two variables that have a strong linear relationship. Which of the following statements are incorrect?
a. There cannot be a negative relationship between the two variables.
b. The relationship between the two variables is purely causal.
c. One variable may or may not cause a change in the other variable.
d. The variables can be positively or negatively correlated with each other.

## Python for Data Science NPTEL Week 4 Assignment Answers (Jan-Apr 2023)

Q1. Which of the following are regression problems? Assume that appropriate data is given.
a. Predicting the house price.
b. Predicting whether it will rain or not on a given day.
c. Predicting the maximum temperature on a given day.
d. Predicting the sales of the ice-creams.

Answer: a, c, d

Q2. Which of the following are binary classification problems?
a. Predicting whether a patient is diagnosed with cancer or not.
b. Predicting whether a team will win a tournament or not.
c. Predicting the price of a second-hand car.
d. Classify web text into one of the following categories: Sports, Entertainment, or Technology.

Q3. If a linear regression model achieves zero training error, can we say that all the data points lie on a hyperplane in the (d+1)-dimensional space? Here, d is the number of features.
a. Yes
b. No

Read the information given below and answer the questions from 4 to 6:

Data Description:
An automotive service chain is launching its new grand service station this weekend. They offer to service a wide variety of cars. The station can currently check 315 cars thoroughly per day. As an inaugural offer, they promise to check every car that arrives on launch day free of charge and report whether it needs servicing or not!
Unexpectedly, 450 cars arrive. The servicemen will not work beyond their working hours, but the data analysts have to!

How can a data scientist save the day for the new service station?
The data scientist has been given a dataset, ‘ServiceTrain.csv’, that contains easily measured attributes of each car and whether it needed service.
For the cars that cannot be checked in detail, the same attributes are measured and stored in ‘ServiceTest.csv’.
Problem Statement: Use machine learning techniques to identify whether each car requires service.
Read the datasets ‘ServiceTrain.csv’ and ‘ServiceTest.csv’ as the train and test data respectively, and import all the packages required for the analysis.

Q4. Which of the following machine learning techniques would NOT be appropriate to solve the problem given in the problem statement?
a. kNN
b. Random Forest
c. Logistic Regression
d. Linear regression

Answer: d. Linear regression
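
Why linear regression is the odd one out: the target (needs service: Yes/No) is a class label, and kNN, random forests, and logistic regression all predict classes, while linear regression predicts an unbounded number. The sketch below shows how logistic regression turns a linear score into a probability with the logistic (sigmoid) function and then into a label with a 0.5 threshold. The scores are illustrative, not fitted weights, and `predict_service` is a hypothetical helper name.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real score into the range [0, 1].

    Two branches keep exp() from overflowing for large negative or positive z."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict_service(score, threshold=0.5):
    """Hypothetical helper: turn a linear score w.x + b into 1 (needs service) or 0 (does not)."""
    return 1 if sigmoid(score) > threshold else 0


# Illustrative scores; a raw linear output such as -3.2 or 4.7 is not itself a class label.
for score in (-3.2, 0.0, 4.7):
    print(score, round(sigmoid(score), 3), predict_service(score))
```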

Prepare the data by following the steps given below, and answer questions 5 and 6.

• Encode the categorical variable Service as 1 for Yes and 0 for No, in both the train and test datasets.
• Separate the independent features from the dependent feature in both the train and test datasets.
• Set random_state=0 when creating the logistic regression instance.
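
The first two preparation steps can be sketched in plain Python, treating each row as a dict. The feature names `Mileage` and `OilQuality` are hypothetical; the real columns of ‘ServiceTrain.csv’ may differ. With pandas, the same steps would typically be `df["Service"] = df["Service"].map({"Yes": 1, "No": 0})`, then `X = df.drop(columns=["Service"])` and `y = df["Service"]`, followed by `LogisticRegression(random_state=0)` from scikit-learn for the third step.

```python
SERVICE_CODES = {"Yes": 1, "No": 0}  # encoding required by the assignment


def encode_and_split(rows, target="Service"):
    """Encode the Yes/No target as 1/0 and separate it from the feature columns.

    rows: list of dicts, one per car (a stand-in for DataFrame rows).
    Returns (X, y): X is a list of feature dicts, y is a list of 0/1 labels."""
    X, y = [], []
    for row in rows:
        X.append({k: v for k, v in row.items() if k != target})
        y.append(SERVICE_CODES[row[target]])
    return X, y


# Hypothetical rows for illustration only.
train_rows = [
    {"Mileage": 42000, "OilQuality": 0.31, "Service": "Yes"},
    {"Mileage": 8000, "OilQuality": 0.92, "Service": "No"},
]
X_train, y_train = encode_and_split(train_rows)
print(X_train)
print(y_train)  # [1, 0]
```

The same function would be applied unchanged to the test rows, so both datasets share one encoding.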

Q5. After applying logistic regression, what is/are the correct observations from the resultant confusion matrix?
a. True Positive = 29, True Negative = 94
b. True Positive = 94, True Negative = 29
c. False Positive = 5, True Negative = 94
d. None of the above

Q6. The logistic regression model built between the input and output variables is checked for its prediction accuracy of the test data. What is the accuracy range (in %) of the predictions made over test data?
a. 60 – 79
b. 90 – 95
c. 30 – 59
d. 80 – 89

Answer: b. 90 – 95
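
Reading a confusion matrix and accuracy (Q5 and Q6) comes down to counting four outcomes, with "needs service" (1) as the positive class. A pure-Python sketch on toy labels, not the assignment's data. Note that scikit-learn's `confusion_matrix(y_true, y_pred)` with labels 0 and 1 lays these out as `[[TN, FP], [FN, TP]]`: rows are actual classes, columns are predicted classes.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary problem."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted service, and it was needed
            else:
                fp += 1  # predicted service, but it was not needed
        else:
            if actual == positive:
                fn += 1  # missed a car that needed service
            else:
                tn += 1  # correctly said no service needed
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


def accuracy(counts):
    """Share of correct predictions: (TP + TN) / total."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # toy labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)                             # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
print(f"accuracy = {accuracy(counts):.1%}")  # accuracy = 75.0%
```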

Q7. How are categorical variables preprocessed before model building?
a. Standardization
b. Dummy variables
c. Correlation
d. None of the above

Answer: b. Dummy variables

The Global Happiness Index report contains Happiness Score data along with several features (Economy, Family, Health, and Freedom) that may affect the target variable.

Prepare the data by following the steps given below, and answer question 8.

• Separate the independent features from the dependent feature in the given dataset.
• Split the data into training and test sets in a 3:1 ratio, setting random_state=1 in the train/test split function.
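
A 3:1 split keeps three quarters of the rows for training and one quarter for testing, and fixing the seed makes the split repeatable. The sketch below shows the idea in plain Python, rounding the test size up. It will not select the same rows as scikit-learn's `train_test_split(X, y, test_size=0.25, random_state=1)`, which uses NumPy's random generator, so use that call if you need to reproduce the assignment's RMSE. `split_3_to_1` is a hypothetical helper name.

```python
import random
from math import ceil


def split_3_to_1(rows, seed=1):
    """Shuffle row positions with a fixed seed, then keep 3/4 for training and 1/4 for testing."""
    positions = list(range(len(rows)))
    random.Random(seed).shuffle(positions)  # same seed -> same split on every run
    n_test = ceil(len(rows) * 0.25)         # test share rounded up
    n_train = len(rows) - n_test
    train = [rows[i] for i in positions[:n_train]]
    test = [rows[i] for i in positions[n_train:]]
    return train, test


rows = list(range(10))  # stand-in for ten dataset rows
train, test = split_3_to_1(rows, seed=1)
print(len(train), len(test))  # 7 3
```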

Q8. A multiple linear regression model is built on the Global Happiness Index dataset “GHI Report.csv”. What is the RMSE of the baseline model?
a. 2.00
b. 0.50
c. 1.06
d. 0.75

Q9. A regression model with the following function y = 60 + 5.2x was built to understand the impact of humidity (x) on rainfall (y). The humidity this week is 30 more than the previous week. What is the predicted difference in rainfall?
a. 156 mm
b. 15.6 mm
c. -156 mm
d. None of the above

Answer: a. 156 mm

Q10. X and Y are two variables that have a strong linear relationship. Which of the following statements are incorrect?
a. There cannot be a negative relationship between the two variables.
b. The relationship between the two variables is purely causal.
c. One variable may or may not cause a change in the other variable.
d. The variables can be positively or negatively correlated with each other.

More NPTEL courses: https://progiez.com/nptel

## Python for Data Science NPTEL Week 4 Assignment Answers (July-Dec 2022)

Q1. The power consumption of an individual house in a residential complex has been recorded for the previous year. This data is analysed to predict the power consumption for the next year. Which type of machine learning problem is this?
a. Classification
b. Regression
c. Reinforcement Learning
d. None of the above

Q2. A dataset contains data collected by the Tamil Nadu Pollution Control Board on environmental conditions (154 variables) from one of their monitoring stations. This data is further analyzed to understand the most significant factors that affect the Air Quality Index. The predictive algorithm that can be used in this situation is __________.
a. Logistic Regression
b. Simple Linear Regression
c. Multiple Linear Regression
d. None of the above

Answer: c. Multiple Linear Regression

Q3. A regression model with the following function y = 60 + 5.2x was built to understand the impact of humidity (x) on rainfall (y). The humidity this week is 30 more than the previous week. What is the predicted difference in rainfall?
a. 156 mm
b. 15.6 mm
c. -156 mm
d. None of the above

Answer: a. 156 mm

Q4. Which of the following machine learning techniques would NOT be appropriate to solve the problem given in the problem statement?
a. kNN
b. Random Forest
c. Logistic Regression
d. Linear regression

Answer: d. Linear regression