# INTRODUCTION TO MACHINE LEARNING Week 10

**Session: JULY-DEC 2023**

**Course Name: Introduction to Machine Learning**

**Course Link: Click Here**

**These are Introduction to Machine Learning Week 10 Assignment 10 Answers**

**Q1. The pairwise distance between 6 points is given below. Which of the option shows the hierarchy of clusters created by single link clustering algorithm?**

Answer: (b)

**Q2. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm?**

Answer: (b)


**Q3. In BIRCH, using the number of points N, the sum of points SUM and the sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster (in terms of N, SUM and SS of the two clusters A and B)? The radius of a cluster is given by: Radius = √(SS/N − (SUM/N)²). Note: We use the following definition of radius from the BIRCH paper: "Radius is the average distance from the member points to the centroid."**

a. Radius = √(SS_A/N_A − (SUM_A/N_A)²) + √(SS_B/N_B − (SUM_B/N_B)²)

b. Radius = √(SS_A/N_A − (SUM_A/N_A)² + SS_B/N_B − (SUM_B/N_B)²)

c. Radius = √((SS_A + SS_B)/(N_A + N_B) − ((SUM_A + SUM_B)/(N_A + N_B))²)

d. Radius = √(SS_A/N_A + SS_B/N_B − ((SUM_A + SUM_B)/(N_A + N_B))²)

**Answer: c. Radius = √((SS_A + SS_B)/(N_A + N_B) − ((SUM_A + SUM_B)/(N_A + N_B))²)**
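The point of the CF triple (N, SUM, SS) is that the merged cluster's radius can be computed without revisiting the member points. A minimal sketch, assuming SS is the scalar sum of squared norms as in the BIRCH clustering-feature definition, checked against the radius computed directly from the pooled points:

```python
import numpy as np

def cluster_features(points):
    """CF triple (N, SUM, SS) for a set of d-dimensional points."""
    points = np.asarray(points, dtype=float)
    return len(points), points.sum(axis=0), (points ** 2).sum()

def combined_radius(cf_a, cf_b):
    """Radius of the merged cluster A u B, using only the two CF triples."""
    n = cf_a[0] + cf_b[0]
    s = cf_a[1] + cf_b[1]
    ss = cf_a[2] + cf_b[2]
    centroid = s / n
    # sqrt(SS/N - ||SUM/N||^2): root-mean-square distance to the centroid
    return np.sqrt(ss / n - centroid @ centroid)

# Toy clusters (coordinates are illustrative, not from the assignment)
a = [[0.0, 0.0], [2.0, 0.0]]
b = [[1.0, 3.0]]
r = combined_radius(cluster_features(a), cluster_features(b))

# Cross-check against the definition applied to the pooled points
pts = np.asarray(a + b)
c = pts.mean(axis=0)
direct = np.sqrt((((pts - c) ** 2).sum(axis=1)).mean())
```

The identity behind option c is SS/N − ‖SUM/N‖² = E[‖x‖²] − ‖E[x]‖², the mean squared distance of the members to their centroid.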


**Q4. Run K-means on the input features of the MNIST dataset using the following initialization: KMeans(n_clusters=10, random_state=seed). Usually, for clustering tasks, we are not given labels, but since we do have labels for our dataset, we can use accuracy to determine how good our clusters are. Label the prediction class for all the points in a cluster as the majority true label, e.g. {a, a, b} would be labeled as {a, a, a}. What is the accuracy of the resulting labels?**

0.790

0.893

0.702

0.933

**Answer: 0.790**
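A sketch of the majority-vote relabeling, assuming the assignment uses scikit-learn's digits dataset (1797 8×8 images, which the course refers to as MNIST); the `seed` value here is a placeholder, so the exact accuracy depends on the seed the course fixed:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
seed = 42  # placeholder -- use the seed specified by the assignment
pred = KMeans(n_clusters=10, random_state=seed, n_init=10).fit_predict(X)

# Relabel every cluster with the majority true label of its members,
# e.g. a cluster containing {a, a, b} is labeled entirely as a.
labels = np.empty_like(pred)
for k in range(10):
    mask = pred == k
    labels[mask] = np.bincount(y[mask]).argmax()

acc = (labels == y).mean()
```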


**Q5. For the same clusters obtained in the previous question, calculate the rand-index. The formula for rand-index: R = (a + b) / C(n, 2), where a = the number of times a pair of elements occurs in the same cluster in both sequences, and b = the number of times a pair of elements occurs in different clusters in both sequences. Note: The two clusterings are given by: (1) ground truth labels, (2) prediction labels using clustering as directed in Q4.**

0.879

0.893

0.919

0.933

**Answer: 0.933**
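The formula above can be computed directly by iterating over all C(n, 2) pairs. A minimal pure-Python sketch with a tiny worked example (4 points give 6 pairs; 3 of them agree, so R = 0.5):

```python
from itertools import combinations

def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree."""
    a = b = 0
    n = len(true_labels)
    for i, j in combinations(range(n), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            a += 1  # pair grouped together in both labelings
        elif not same_true and not same_pred:
            b += 1  # pair separated in both labelings
    return (a + b) / (n * (n - 1) / 2)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 0]))  # -> 0.5
```

Note that the rand-index is invariant to how the clusters are numbered, which is why no majority-vote relabeling is needed here.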


**Q6. a in rand-index can be viewed as true positives (pairs of points belonging to the same cluster) and b as true negatives (pairs of points belonging to different clusters). How, then, are the rand-index and the accuracy from the previous two questions related?**

rand-index = accuracy

rand-index = 1.18×accuracy

rand-index = accuracy/2

None of the above

**Answer: None of the above**


**Q7. Run BIRCH on the input features of the MNIST dataset using Birch(n_clusters=10, threshold=1). What is the rand-index obtained?**

0.91

0.96

0.88

0.98

**Answer: 0.96**
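A sketch of the BIRCH run, again assuming the scikit-learn digits dataset stands in for MNIST; `sklearn.metrics.rand_score` (available in scikit-learn ≥ 0.24) computes the same rand-index as the pairwise formula in Q5:

```python
from sklearn.cluster import Birch
from sklearn.datasets import load_digits
from sklearn.metrics import rand_score

X, y = load_digits(return_X_y=True)

# threshold=1 controls the maximum radius of a CF subcluster;
# n_clusters=10 merges the subclusters down to 10 final clusters.
pred = Birch(n_clusters=10, threshold=1).fit_predict(X)
ri = rand_score(y, pred)
```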

**Q8. Run PCA on the MNIST dataset input features with n_components=2. Now run DBSCAN using DBSCAN(eps=0.5, min_samples=5) on both the original features and the PCA features. What are their respective numbers of outliers/noisy points detected by DBSCAN? As an extra, you can plot the PCA features on a 2D plot using matplotlib.pyplot.scatter with parameter c=y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.**

1600, 1522

1500, 1482

1000, 1000

1797, 1742

**Answer: 1797, 1742**
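DBSCAN marks noise points with the label −1, so counting outliers is just counting that label. A sketch, assuming the scikit-learn digits dataset (note that 1797 is exactly its sample count, i.e. at eps=0.5 every point in the raw 64-dimensional space is flagged as noise):

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

# DBSCAN on the raw 64-dimensional features
raw_noise = (DBSCAN(eps=0.5, min_samples=5).fit_predict(X) == -1).sum()

# DBSCAN on the 2-dimensional PCA projection
X2 = PCA(n_components=2).fit_transform(X)
pca_noise = (DBSCAN(eps=0.5, min_samples=5).fit_predict(X2) == -1).sum()
```

Reducing to 2 dimensions shrinks pairwise distances, so more points fall within eps of each other and fewer are labeled noise.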


More Weeks of INTRODUCTION TO MACHINE LEARNING: Click here

More Nptel Courses: Click here

**Session: JAN-APR 2023**

**Course Name: Introduction to Machine Learning**

**Course Link: Click Here**


**Q1.Consider the following one dimensional data set: 12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8 and 18. Given k=3 and initial cluster centers to be 5, 6 and 31, what are the final cluster centres obtained on applying the k-means algorithm?**

a. 5, 18, 30

b. 5, 18, 32

c. 6, 19, 32

d. 4.8, 17.6, 32

e. None of the above

**Answer: d. 4.8, 17.6, 32**

**Q2. For the previous question, in how many iterations will the k-means algorithm converge?**

a. 2

b. 3

c. 4

d. 6

e. 7

**Answer: c. 4**
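Q1 and Q2 can both be checked with a minimal one-dimensional k-means, counting an iteration as one assign-then-update pass and stopping when the assignment no longer changes (ties here are broken toward the lower-index center; the data has no exact ties, so this does not affect the result):

```python
def kmeans_1d(data, centers, max_iter=100):
    """Plain k-means on scalars; returns final centers and iteration count."""
    centers = list(centers)
    assign = None
    for it in range(1, max_iter + 1):
        # Assign each point to its nearest center
        new_assign = [min(range(len(centers)),
                          key=lambda k: abs(x - centers[k]))
                      for x in data]
        if new_assign == assign:  # assignments unchanged -> converged
            return centers, it
        assign = new_assign
        # Update each center to the mean of its points
        # (assumes no cluster goes empty, true for this data)
        centers = [sum(x for x, a in zip(data, assign) if a == k) /
                   assign.count(k) for k in range(len(centers))]
    return centers, max_iter

data = [12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8, 18]
centers, iters = kmeans_1d(data, [5, 6, 31])
```

Tracing the runs: the centers move 5, 6, 31 → 3.33, 12, 28.33 → 4, 14.8, 30 → 4.8, 17.6, 32, and the fourth pass confirms the assignment is stable.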


**Q3. In the lecture on the BIRCH algorithm, it is stated that using the number of points N, the sum of points SUM and the sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the centroid of the combined cluster (in terms of N, SUM and SS of both clusters)?**

a. SUM_A + SUM_B

b. SUM_A/N_A + SUM_B/N_B

c. (SUM_A + SUM_B)/(N_A + N_B)

d. (SS_A + SS_B)/(N_A + N_B)

**Answer: c. (SUM_A + SUM_B)/(N_A + N_B)**
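Option c follows because the merged centroid is the mean of the pooled points, and the pooled sum and count are just the sums and counts of the two clusters added together. A tiny check with illustrative coordinates (not from the assignment):

```python
import numpy as np

a = np.array([[0.0, 0.0], [2.0, 0.0]])  # cluster A
b = np.array([[1.0, 3.0]])              # cluster B

# Merge using only the summary statistics N and SUM
n_a, sum_a = len(a), a.sum(axis=0)
n_b, sum_b = len(b), b.sum(axis=0)
centroid = (sum_a + sum_b) / (n_a + n_b)

# Same result as averaging the pooled points directly
assert np.allclose(centroid, np.vstack([a, b]).mean(axis=0))
```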

**Q4. What assumption does the CURE clustering algorithm make with regards to the shape of the clusters?**

a. No assumption

b. Spherical

c. Elliptical

**Answer: a. No assumption**


**Q5. What would be the effect of increasing MinPts in DBSCAN while retaining the same Eps parameter? (Note that more than one statement may be correct)**

a. Increase in the sizes of individual clusters

b. Decrease in the sizes of individual clusters

c. Increase in the number of clusters

d. Decrease in the number of clusters

**Answer: b, c**

For the next question, kindly download the dataset – DS1. The first two columns in the dataset correspond to the coordinates of each data point. The third column corresponds to the actual cluster label. **DS1: Click here**

**Q6. Visualize the dataset DS1. Which of the following algorithms will be able to recover the true clusters? (First check by visual inspection, then write code to see if the result matches what you expected.)**

a. K-means clustering

b. Single link hierarchical clustering

c. Complete link hierarchical clustering

d. Average link hierarchical clustering

**Answer: b. Single link hierarchical clustering**


**Q7. Consider the similarity matrix given below: Which of the following shows the hierarchy of clusters created by the single link clustering algorithm?**

a.

b.

c.

d.

**Answer: b**

**Q8. For the similarity matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm?**

a.

b.

c.

d.

**Answer: d**


More Weeks of Introduction to Machine Learning: Click Here

More Nptel courses: https://progiez.com/nptel

The content uploaded on this website is for reference purposes only. Please attempt the assignment yourself first.