INTRODUCTION TO MACHINE LEARNING Week 10

Session: JULY-DEC 2023

Course Name: Introduction to Machine Learning

Course Link: Click Here

These are Introduction to Machine Learning Week 10 Assignment 10 Answers


Q1. The pairwise distances between 6 points are given below. Which of the options shows the hierarchy of clusters created by the single link clustering algorithm?

Answer: (b)


Q2. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm?

Answer: (b)




Q3. In BIRCH, using the number of points N, sum of points SUM and sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster? (In terms of N, SUM and SS of both clusters A and B)
The radius of a cluster is given by:
Radius = √(SS/N − (SUM/N)²)
Note: We use the following definition of radius from the BIRCH paper:
“Radius is the average distance from the member points to the centroid.”

a. Radius = √(SS_A/N_A − (SUM_A/N_A)²) + √(SS_B/N_B − (SUM_B/N_B)²)
b. Radius = √(SS_A/N_A − (SUM_A/N_A)² + SS_B/N_B − (SUM_B/N_B)²)
c. Radius = √((SS_A + SS_B)/(N_A + N_B) − ((SUM_A + SUM_B)/(N_A + N_B))²)
d. Radius = √(SS_A/N_A + SS_B/N_B − ((SUM_A + SUM_B)/(N_A + N_B))²)

Answer: c. Radius = √((SS_A + SS_B)/(N_A + N_B) − ((SUM_A + SUM_B)/(N_A + N_B))²)
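
As a quick check of option (c), here is a small NumPy sketch (not the BIRCH implementation itself) that compares the radius computed from the clustering features (N, SUM, SS) of two clusters with the radius computed directly from the merged points:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=20)   # cluster A (1-D points for simplicity)
B = rng.normal(5.0, 1.0, size=30)   # cluster B

# Clustering features maintained by BIRCH for each cluster: N, SUM, SS
NA, SUMA, SSA = len(A), A.sum(), (A ** 2).sum()
NB, SUMB, SSB = len(B), B.sum(), (B ** 2).sum()

# Option (c): radius of the merged cluster from the two CF entries
radius_cf = np.sqrt((SSA + SSB) / (NA + NB) - ((SUMA + SUMB) / (NA + NB)) ** 2)

# Direct computation: root-mean-square distance of the merged points to the
# merged centroid, i.e. sqrt(SS/N - (SUM/N)^2) evaluated on the union
merged = np.concatenate([A, B])
centroid = merged.mean()
radius_direct = np.sqrt(((merged - centroid) ** 2).mean())

print(radius_cf, radius_direct)   # the two values coincide
```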




Q4. Run K-means on the input features of the MNIST dataset using the following initialization:
KMeans(n_clusters=10, random_state=seed)
Usually, for clustering tasks, we are not given labels, but since we do have labels for our dataset, we can use accuracy to determine how good our clusters are.
Label the prediction class for all the points in a cluster as the majority true label.
E.g. {a,a,b} would be labeled as {a,a,a}
What is the accuracy of the resulting labels?

0.790
0.893
0.702
0.933

Answer: 0.790
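
A minimal sketch of one way to reproduce this, assuming the “MNIST” used in the assignment is scikit-learn's 8x8 digits dataset (its 1797 samples match the counts in Q8) and using seed = 42 as a placeholder for whatever seed the assignment specifies:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score

seed = 42                                   # placeholder seed
X, y = load_digits(return_X_y=True)         # assumed dataset

clusters = KMeans(n_clusters=10, random_state=seed).fit_predict(X)

# Relabel every point in a cluster with the majority true label of that cluster
y_pred = np.zeros_like(y)
for c in range(10):
    mask = clusters == c
    y_pred[mask] = np.bincount(y[mask]).argmax()

print(accuracy_score(y, y_pred))
```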




Q5. For the same clusters obtained in the previous question, calculate the rand-index. The formula for the rand-index is:
R = (a + b) / C(n, 2)
where,
a = the number of pairs of elements that occur in the same cluster in both labelings,
b = the number of pairs of elements that occur in different clusters in both labelings,
and C(n, 2) = n(n − 1)/2 is the total number of pairs of elements.
Note: The two labelings are given by: (1) the ground truth labels, (2) the prediction labels obtained from clustering as directed in Q4.

0.879
0.893
0.919
0.933

Answer: 0.933
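
Under the same dataset and seed assumptions as the Q4 sketch, the rand-index can be computed either with scikit-learn's rand_score (available from version 0.24) or by counting pairs directly, mirroring the formula above:

```python
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import rand_score   # scikit-learn >= 0.24

X, y = load_digits(return_X_y=True)      # assumed dataset
clusters = KMeans(n_clusters=10, random_state=42).fit_predict(X)

# Library computation
print(rand_score(y, clusters))

# Pair-counting computation: R = (a + b) / C(n, 2)
n = len(y)
a = b = 0
for i, j in combinations(range(n), 2):
    same_true = y[i] == y[j]
    same_pred = clusters[i] == clusters[j]
    a += same_true and same_pred                  # together in both labelings
    b += (not same_true) and (not same_pred)      # apart in both labelings
print((a + b) / (n * (n - 1) // 2))
```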




Q6. a in the rand-index can be viewed as true positives (pairs of points belonging to the same cluster) and b as true negatives (pairs of points belonging to different clusters). How, then, are the rand-index and accuracy from the previous two questions related?

rand-index = accuracy
rand-index = 1.18×accuracy
rand-index = accuracy/2
None of the above

Answer: None of the above




Q7. Run BIRCH on the input features of the MNIST dataset using Birch(n_clusters=10, threshold=1).
What is the rand-index obtained?

0.91
0.96
0.88
0.98

Answer: 0.96
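
A short sketch for this question, again assuming the scikit-learn digits dataset:

```python
from sklearn.cluster import Birch
from sklearn.datasets import load_digits
from sklearn.metrics import rand_score

X, y = load_digits(return_X_y=True)                         # assumed dataset
clusters = Birch(n_clusters=10, threshold=1).fit_predict(X)
print(rand_score(y, clusters))                               # rand-index vs. true labels
```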


Q8. Run PCA on the MNIST dataset input features with n_components=2. Now run DBSCAN using DBSCAN(eps=0.5, min_samples=5)
on both the original features and the PCA features. What are their respective numbers of outliers/noisy points detected by DBSCAN?
As an extra, you can plot the PCA features on a 2D plot using matplotlib.pyplot.scatter with parameter c=y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.

1600, 1522
1500, 1482
1000, 1000
1797, 1742

Answer: 1797, 1742
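
One possible way to count the noisy points (DBSCAN labels them −1), under the same digits-dataset assumption as above:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)                  # assumed dataset
X_pca = PCA(n_components=2).fit_transform(X)

for name, features in [("original", X), ("PCA", X_pca)]:
    y_pred = DBSCAN(eps=0.5, min_samples=5).fit_predict(features)
    print(name, (y_pred == -1).sum())                # -1 marks outliers/noise

# Optional: visualise the PCA features coloured by cluster prediction
y_pred = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_pca)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_pred)
plt.show()
```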


These are Introduction to Machine Learning Week 10 Assignment 10 Answers

More Weeks of INTRODUCTION TO MACHINE LEARNING: Click here

More Nptel Courses: Click here


Session: JAN-APR 2023

Course Name: Introduction to Machine Learning

Course Link: Click Here

These are Introduction to Machine Learning Week 10 Assignment 10 Answers


Q1. Consider the following one-dimensional data set: 12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8 and 18. Given k=3 and initial cluster centres 5, 6 and 31, what are the final cluster centres obtained on applying the k-means algorithm?
a. 5, 18, 30
b. 5, 18, 32
c. 6, 19, 32
d. 4.8, 17.6, 32
e. None of the above

Answer: d. 4.8, 17.6, 32


Q2. For the previous question, in how many iterations will the k-means algorithm converge?
a. 2
b. 3
c. 4
d. 6
e. 7

Answer: c. 4
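
A small NumPy sketch of k-means on this 1-D data, starting from the given centres, can be used to check both the final centres (Q1) and the iteration count (Q2); note that the count here includes the final pass in which the centres stop changing:

```python
import numpy as np

X = np.array([12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8, 18], dtype=float)
centres = np.array([5.0, 6.0, 31.0])     # given initial cluster centres

iterations = 0
while True:
    iterations += 1
    # Assignment step: nearest centre for each point
    labels = np.abs(X[:, None] - centres[None, :]).argmin(axis=1)
    # Update step: each centre becomes the mean of its (non-empty) cluster
    new_centres = np.array([X[labels == k].mean() for k in range(len(centres))])
    if np.allclose(new_centres, centres):
        break
    centres = new_centres

print(centres, iterations)
```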




Q3. In the lecture on the BIRCH algorithm, it is stated that using the number of points N, sum of points SUM and sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the centroid of the combined cluster? (In terms of N, SUM and SS of both clusters)
a. SUM_A + SUM_B
b. SUM_A/N_A + SUM_B/N_B
c. (SUM_A + SUM_B)/(N_A + N_B)
d. (SS_A + SS_B)/(N_A + N_B)

Answer: c. (SUM_A + SUM_B)/(N_A + N_B)


Q4. What assumption does the CURE clustering algorithm make with regards to the shape of the clusters?
a. No assumption
b. Spherical
c. Elliptical

Answer: a. No assumption




Q5. What would be the effect of increasing MinPts in DBSCAN while retaining the same Eps parameter? (Note that more than one statement may be correct)
a. Increase in the sizes of individual clusters
b. Decrease in the sizes of individual clusters
c. Increase in the number of clusters
d. Decrease in the number of clusters

Answer: b, c


For the next question, kindly download the dataset – DS1. The first two columns in the dataset correspond to the coordinates of each data point. The third column corresponds to the actual cluster label.
DS1: Click here

Q6. Visualize the dataset DS1. Which of the following algorithms will be able to recover the true clusters? (First check by visual inspection and then write code to see if the result matches what you expected.)
a. K-means clustering
b. Single link hierarchical clustering
c. Complete link hierarchical clustering
d. Average link hierarchical clustering

Answer: b. Single link hierarchical clustering
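
A sketch of one way to check this, assuming DS1 has been saved as a comma-separated file named DS1.csv with columns x, y, true label (adjust the path and delimiter to the actual download):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Assumed file name and format for the downloaded dataset
data = np.loadtxt("DS1.csv", delimiter=",")
X, labels_true = data[:, :2], data[:, 2].astype(int)

# Visual inspection of the true clusters
plt.scatter(X[:, 0], X[:, 1], c=labels_true)
plt.show()

# Single link hierarchical clustering with as many clusters as true labels
n_clusters = len(np.unique(labels_true))
labels_pred = AgglomerativeClustering(
    n_clusters=n_clusters, linkage="single").fit_predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels_pred)
plt.show()
```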




Q7. Consider the similarity matrix given below. Which of the following shows the hierarchy of clusters created by the single link clustering algorithm?

image 2

a.

image 3

b.

image 4

c.

image 5

d.

image 6

Answer: b


Q8. For the similarity matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm?

a.

image 7

b.

image 8

c.

image 9

d.

image 10

Answer: d


These are Introduction to Machine Learning Week 10 Assignment 10 Answers

More Weeks of Introduction to Machine Learning: Click Here

More Nptel courses: https://progiez.com/nptel


The content uploaded on this website is for reference purposes only. Please attempt the assignment yourself first.