# Introduction to Machine Learning | Week 10

Session: JAN-APR 2024

Course name: Introduction to Machine Learning

#### Q1. K-means algorithm is not a particularly sophisticated approach for Image Segmentation tasks. Choose the best possible explanation from below which supports the claim:a) It takes no account of the spatial proximity of different pixels.b) The curse of dimensionality does not affect the performance of K-means algorithm, as it effectively handles high-dimensional data with minimal loss of accuracy.c) The algorithm requires the number of clusters (K) to be specified beforehand.d) Initialization does not affect K-means.

Q2. The pairwise distance between 6 points is given below. Which of the option shows the hierarchy of clusters created by single link clustering algorithm?
a)
b)
c)
d)

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q3. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm.
a)
b)
c)
d)

Q4. In BIRCH, using number of points N, sum of points SUM and sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster? (In terms of N,SUM and SS of both two clusters A and B) Radius of a cluster is given by:
Radius of a cluster is given by:
Note: We use the following definition of radius from the BIRCH paper: ”Radius is the average distance from the member points to the centroid.”

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q5. Statement 1: CURE is robust to outliers.
Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.

Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statemnet 1.
Statement 1 is true. Statement 2 is true. Statement 2 is not the correct reason for statemnet 1.
Statement 1 is true. Statement 2 is false.
Both statements are false.

Answer: Statement 1 is true. Statement 2 is true. Statement 2 is the correct reason for statemnet 1.

Q6. Which of the following statements about the Rand Index is true?
It is insensitive to the permutations of cluster labels
It is biased towards larger clusters
It cannot handle overlapping clusters
It is unaffected by outliers in the data

Answer: It is insensitive to the permutations of cluster labels

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q7. For Rand-index: RI = a+b(n2). ‘a’ in rand-index can be viewed as true positives (pair of points belonging to the same cluster) and ‘b’ as true negatives (pair of points belonging to different clusters). How are rand-index and accuracy related?
rand-index = accuracy
rand-index = 1.01×accuracy
rand-index = accuracy/2
None of the above

Q8. Run BIRCH on the input features of iris dataset using Birch(n_clusters=5, threshold=2). What is the rand-index obtained?
0.68
0.71
0.88
0.98

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q9. Run PCA on Iris dataset input features with n components = 2. Now run DBSCAN using DBSCAN(eps=0.5, min samples=5) on both the original features and the PCA features. What are their respective number of outliers/noisy points detected by DBSCAN?
As an extra, you can plot the PCA features on a 2D plot using matplotlib.pyplot.scatter with parameter c = y pred (where y pred is the cluster prediction) to visualise the clusters and outliers.

10, 10
17, 7
21, 11
5, 10

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Session: JULY-DEC 2023

Course Name: Introduction to Machine Learning

#### These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q1. The pairwise distance between 6 points is given below. Which of the option shows the hierarchy of clusters created by single link clustering algorithm?

Q2. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm.

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q3. In BIRCH, using number of points N, sum of points SUM and sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster? (In terms of N, SUM and SS of both two clusters A and B)
Radius of a cluster is given by:
Note: We use the following definition of radius from the BIRCH paper:
“Radius is the average distance from the member points to the centroid.”

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q4. Run K-means on the input features of the MNIST dataset using the following initialization:
KMeans(nclusters=10,randomstate=seed)
Usually, for clustering tasks, we are not given labels, but since we do have labels for our dataset, we can use accuracy to determine how good our clusters are.
Label the prediction class for all the points in a cluster as the majority true label.
E.g. {a,a,b} would be labeled as {a,a,a}
What is the accuracy of the resulting labels?

0.790
0.893
0.702
0.933

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q5. For the same clusters obtained in the previous question, calculate the rand-index. The formula for rand-index:
R=a+bCn2
where,
a = the number of times a pair of elements occur in the same cluster in both sequences.
b = the number of times a pair of elements occur in the different clusters in both sequences.
Note: The two clusters are given by: (1) Ground truth labels, (2) Prediction labels using clustering as directed in Q4.

0.879
0.893
0.919
0.933

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q6. a in rand-index can be viewed as true positives(pair of points belonging to the same cluster) and
b as true negatives(pair of points belonging to different clusters). How, then, are rand-index and accuracy from the previous two questions related?

rand-index = accuracy
rand-index = 1.18×accuracy
rand-index = accuracy/2
None of the above

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q7. Run BIRCH on the input features of MNIST dataset using Birch(nclusters=10,threshold=1).
What is the rand-index obtained?

0.91
0.96
0.88
0.98

Q8. Run PCA on MNIST dataset input features with n components = 2. Now run DBSCAN using DBSCAN(eps=0.5,minsamples=5)
on both the original features and the PCA features. What are their respective number of outliers/noisy points detected by DBSCAN?
As an extra, you can plot the PCA features on a 2D plot using matplotlib.pyplot.scatter with parameter c=y−pred (where y−pred is the cluster prediction) to visualise the clusters and outliers.

1600, 1522
1500, 1482
1000, 1000
1797, 1742

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Session: JAN-APR 2023

Course Name: Introduction to Machine Learning

#### Q1.Consider the following one dimensional data set: 12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8 and 18. Given k=3 and initial cluster centers to be 5, 6 and 31, what are the final cluster centres obtained on applying the k-means algorithm?a. 5, 18, 30b. 5, 18, 32c. 6, 19, 32d. 4.8, 17.6, 32e. None of the above

Q2. For the previous question, in how many iterations will the k-means algorithm converge?
a. 2
b. 3
c. 4
d. 6
e. 7

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q3. In the lecture on the BIRCH algorithm, it is stated that using the number of points N, sum of points SUM and sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the centroid of the combined cluster? (In terms of N,SUM and SS of both the clusters)
a. SUMA+SUMB
b. SUMA/NA+SUMB/NB
c. SUMA+SUMB/NA+NB
d. SSA+SSB/NA+NB

Q4. What assumption does the CURE clustering algorithm make with regards to the shape of the clusters?
a. No assumption
b. Spherical
c. Elliptical

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q5. What would be the effect of increasing MinPts in DBSCAN while retaining the same Eps parameter? (Note that more than one statement may be correct)
a. Increase in the sizes of individual clusters
b. Decrease in the sizes of individual clusters
c. Increase in the number of clusters
d. Decrease in the number of clusters

For the next question, kindly download the dataset – DS1. The first two columns in the dataset correspond to the co-ordinates of each data point. The third column corresponds two the actual cluster label.

Q6. Visualize the dataset DS1. Which of the following algorithms will be able to recover the true clusters (first check by visual inspection and then write code to see if the result matches to what you expected).
a. K-means clustering

These are Introduction to Machine Learning Week 10 Assignment 10 Answers

Q7. Consider the similarity matrix given below: Which of the following shows the hierarchy of clusters created by the single link clustering algorithm.

a.

b.

c.

d.

Q8. For the similarity matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm.

a.

b.

c.

d.