Introduction to Machine Learning Week 10 Answers

Q1. The pairwise distances between 6 points are given below. Which of the options shows the hierarchy of clusters created
by the single-link clustering algorithm?


Answer: (b)
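The distance matrix for this question is not reproduced above, so the sketch below runs a naive single-link agglomerative clustering on a hypothetical 6-point matrix `D` (illustrative values, not the quiz's). Single link measures the distance between two clusters as the closest pair of members and always merges the closest pair of clusters; the merge order is the hierarchy.

```python
from itertools import combinations

# HYPOTHETICAL 6-point distance matrix (illustrative values, NOT the quiz's matrix).
D = [
    [0.0, 1.0, 2.2, 3.5, 10.0, 10.8],
    [1.0, 0.0, 1.2, 2.5, 9.0, 9.8],
    [2.2, 1.2, 0.0, 1.3, 7.8, 8.6],
    [3.5, 2.5, 1.3, 0.0, 6.5, 7.3],
    [10.0, 9.0, 7.8, 6.5, 0.0, 0.8],
    [10.8, 9.8, 8.6, 7.3, 0.8, 0.0],
]

def single_link(D):
    """Naive single-link agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    in the order the merges happen (bottom of the dendrogram first).
    """
    clusters = [frozenset([i]) for i in range(len(D))]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single link: cluster distance = distance of the closest member pair.
            d = min(D[i][j] for i in a for j in b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        history.append((sorted(a), sorted(b), d))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history

for a, b, d in single_link(D):
    print(f"merge {a} + {b} at distance {d}")
```

On this made-up matrix, points 2 and 3 are chained onto {0, 1} one at a time (at 1.2, then 1.3), the chaining behaviour typical of single link.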

Q2. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created
by the complete-link clustering algorithm?


Answer: (b) 
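As a sketch of how complete link differs from single link, the snippet below uses SciPy's `linkage` on a hypothetical 6-point matrix (illustrative values, not the quiz's). Complete link measures the distance between two clusters as the farthest member pair, which tends to produce compact groups instead of chains.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# HYPOTHETICAL 6-point distance matrix (illustrative values, NOT the quiz's matrix).
D = np.array([
    [0.0, 1.0, 2.2, 3.5, 10.0, 10.8],
    [1.0, 0.0, 1.2, 2.5, 9.0, 9.8],
    [2.2, 1.2, 0.0, 1.3, 7.8, 8.6],
    [3.5, 2.5, 1.3, 0.0, 6.5, 7.3],
    [10.0, 9.0, 7.8, 6.5, 0.0, 0.8],
    [10.8, 9.8, 8.6, 7.3, 0.8, 0.0],
])

# linkage() expects a condensed (upper-triangle) distance vector.
condensed = squareform(D)
Z_single = linkage(condensed, method="single")
Z_complete = linkage(condensed, method="complete")

# Each row of Z is [cluster_i, cluster_j, merge_distance, size]; the cluster
# created by row k receives the id len(D) + k.
for name, Z in [("single", Z_single), ("complete", Z_complete)]:
    print(name, Z[:, 2].tolist())
```

On this matrix, complete link pairs points 2 and 3 first (at 1.3) and only then joins them to {0, 1} at 3.5, whereas single link attaches 2 and then 3 to {0, 1} one at a time.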

Q3. In BIRCH, using the number of points N, the sum of points SUM and the sum of squared points SS, we can determine the centroid
and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster?
(In terms of N, SUM and SS of clusters A and B.)

Answer: c
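The answer options are not reproduced above, so this is only a sketch of the standard identity behind the question. Clustering features are additive, so the combined cluster has N = N_A + N_B, SUM = SUM_A + SUM_B and SS = SS_A + SS_B, and its radius (root-mean-square distance to the centroid) is R = sqrt(SS/N - ||SUM/N||^2). The sketch assumes SS is the scalar sum of squared norms of the points.

```python
import numpy as np

def cf(points):
    """Clustering feature (N, SUM, SS) of a set of points."""
    X = np.asarray(points, dtype=float)
    return len(X), X.sum(axis=0), float((X ** 2).sum())

def merge_cf(cf_a, cf_b):
    # CFs are additive: the merged CF is the component-wise sum.
    return cf_a[0] + cf_b[0], cf_a[1] + cf_b[1], cf_a[2] + cf_b[2]

def radius(cf_):
    n, s, ss = cf_
    centroid = s / n
    # R^2 = SS/N - ||SUM/N||^2, i.e. mean squared distance to the centroid.
    # max(..., 0.0) guards against tiny negative values from rounding.
    return float(np.sqrt(max(ss / n - centroid @ centroid, 0.0)))

A = [[0.0, 0.0], [2.0, 0.0]]
B = [[0.0, 2.0], [2.0, 2.0]]
r = radius(merge_cf(cf(A), cf(B)))

# Cross-check against the radius computed directly from the raw points.
X_ab = np.array(A + B)
direct = np.sqrt(((X_ab - X_ab.mean(axis=0)) ** 2).sum(axis=1).mean())
print(r, direct)
```

Only the three summary values per cluster are needed; the raw points of A and B are used here just for the cross-check.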

Q4. Statement 1: CURE is robust to outliers.

Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.

Answer: a
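CURE represents each cluster by a few well-scattered points and shrinks each one toward the centroid by a fraction α: p' = p + α(centroid - p). The sketch below is a simplified illustration, not the full CURE algorithm: representatives are picked with a farthest-point heuristic, and α = 0.3 and the sample points are arbitrary choices. It shows that every representative's distance to the centroid is multiplied by (1 - α), so a distant outlier is pulled in by the largest absolute amount.

```python
import numpy as np

def cure_representatives(X, n_rep=4, alpha=0.3):
    """Pick well-scattered points and shrink them toward the centroid."""
    X = np.asarray(X, dtype=float)
    centroid = X.mean(axis=0)
    # Start with the point farthest from the centroid, then repeatedly add
    # the point farthest from the representatives chosen so far.
    chosen = [int(np.argmax(np.linalg.norm(X - centroid, axis=1)))]
    while len(chosen) < min(n_rep, len(X)):
        d = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2).min(axis=1)
        chosen.append(int(np.argmax(d)))
    reps = X[chosen]
    # Multiplicative shrinkage toward the centroid.
    return reps + alpha * (centroid - reps), centroid

# Four nearby points plus one far-away outlier at (10, 10).
points = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10]]
reps, centroid = cure_representatives(points, n_rep=2, alpha=0.3)
print(centroid, reps)
```

The outlier still becomes a representative here; shrinkage dampens its influence rather than removing it.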

Q5. Run K-means on the input features of the Iris dataset using the following initialization:

Answer: b
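The initialization this question specifies is not reproduced above, so the sketch takes the initial centres as an argument; `X[[0, 50, 100]]` (one sample per class) is a HYPOTHETICAL placeholder only. Because cluster ids are arbitrary, an accuracy score first needs each cluster matched to a class; the `cluster_accuracy` helper does that with the Hungarian method (SciPy's `linear_sum_assignment`), which is an assumption about how accuracy is measured here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def run_kmeans(X, init_centers):
    """K-means from a fixed initialization (single run, no random restarts)."""
    init_centers = np.asarray(init_centers, dtype=float)
    km = KMeans(n_clusters=len(init_centers), init=init_centers, n_init=1)
    return km.fit_predict(X)

def cluster_accuracy(y_true, y_pred):
    """Accuracy after the best one-to-one mapping of cluster ids to classes."""
    k = int(max(y_true.max(), y_pred.max())) + 1
    C = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[p, t] += 1  # contingency table: rows = clusters, columns = classes
    rows, cols = linear_sum_assignment(-C)  # maximise matched counts
    return C[rows, cols].sum() / len(y_true)

# HYPOTHETICAL placeholder initialization, not the one given in the quiz.
labels = run_kmeans(X, X[[0, 50, 100]])
acc = cluster_accuracy(y, labels)
print(acc)
```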

Q6. For the clusters obtained in the previous question, calculate the Rand index. Formula for the Rand index: RI = (a + b) / C(n, 2), where n is the number of points.

Answer: a
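A minimal pair-counting sketch of the Rand index: over all C(n, 2) pairs of points, a counts pairs placed together in both the true and the predicted partition, b counts pairs placed apart in both, and the index is their share of all pairs. The example labels are made up for illustration.

```python
from itertools import combinations
from math import comb

def rand_index(labels_true, labels_pred):
    """Rand index = (a + b) / C(n, 2), counting agreeing pairs."""
    a = b = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1  # pair together in both partitions
        elif not same_true and not same_pred:
            b += 1  # pair apart in both partitions
    return (a + b) / comb(len(labels_true), 2)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 0.8333333333333334 (= 5/6)
```

The index only compares groupings, so renaming cluster ids does not change it. scikit-learn 0.24 and later also provide `sklearn.metrics.rand_score` for the same quantity.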

Q7. In the Rand index, a can be viewed as the number of true positives (pairs of points in the same cluster in both the ground truth and the prediction) and b as the number of true negatives (pairs of points
in different clusters in both). How, then, are the Rand index and accuracy from the previous two questions related?

Answer: d
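To make the analogy concrete, the sketch below treats every pair of points as a binary "same cluster?" prediction and fills in a pair-level confusion matrix. The Rand index is then (TP + TN) / (TP + TN + FP + FN), i.e. accuracy measured over pairs, while ordinary accuracy is measured over individual points after matching cluster ids to classes. The labels are made-up examples.

```python
from itertools import combinations

def pair_confusion(labels_true, labels_pred):
    """Count pairs as a binary 'same cluster?' classification problem."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_pred and same_true:
            tp += 1
        elif not same_pred and not same_true:
            tn += 1
        elif same_pred:
            fp += 1  # predicted together, actually apart
        else:
            fn += 1  # predicted apart, actually together
    return tp, tn, fp, fn

tp, tn, fp, fn = pair_confusion([0, 0, 1, 1], [0, 0, 1, 2])
pair_accuracy = (tp + tn) / (tp + tn + fp + fn)  # equals the Rand index
print(tp, tn, fp, fn, pair_accuracy)  # 1 4 0 1 0.8333333333333334
```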

Q8. Run BIRCH on the input features of the Iris dataset using Birch(n_clusters=3, threshold=1). What Rand index is obtained?

Answer: c
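A minimal sketch of this run with scikit-learn, assuming version 0.24 or later (needed for `sklearn.metrics.rand_score`). The resulting score is deliberately not quoted here.

```python
from sklearn.cluster import Birch
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score

X, y = load_iris(return_X_y=True)

# Cluster the four input features; the class labels y are used only for scoring.
labels = Birch(n_clusters=3, threshold=1).fit_predict(X)
score = rand_score(y, labels)
print(score)
```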

Q9. Run BIRCH with each of the following values of the threshold parameter: [0.01, 0.02, 0.03, …, 0.99, 1.00], using the same command as
in the previous question. Which threshold value achieves the best Rand index?

Answer: b
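The sweep can be sketched as below, again assuming scikit-learn 0.24 or later for `rand_score`. Thresholds are rounded to two decimals so the dictionary keys match the listed values exactly.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score

X, y = load_iris(return_X_y=True)
thresholds = np.round(np.arange(1, 101) / 100, 2)  # 0.01, 0.02, ..., 1.00

scores = {}
for t in thresholds:
    labels = Birch(n_clusters=3, threshold=float(t)).fit_predict(X)
    scores[float(t)] = rand_score(y, labels)

# max() keeps the first key it sees among equal scores, so ties go to the
# smallest threshold.
best_t = max(scores, key=scores.get)
print(best_t, scores[best_t])
```

If several thresholds tie, this picks the smallest; check `scores` if the quiz expects a different tie-breaking rule.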

Q10. Run PCA on the Iris dataset input features with n_components=2. Then run DBSCAN using DBSCAN(eps=0.5, min_samples=5) on both the original features and the PCA features. How many outliers/noisy points does DBSCAN detect in each case?

Optionally, you can plot the PCA features in 2D using matplotlib.pyplot.scatter with c=y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.

Answer: b
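A sketch of both DBSCAN runs with scikit-learn. DBSCAN labels noise points -1, so counting that label gives the number of outliers. The optional plotting helper imports matplotlib lazily so the counting part runs without it; the counts are not quoted here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)

def dbscan_noise(features):
    """Run DBSCAN and return (labels, number of noise points)."""
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(features)
    return labels, int(np.sum(labels == -1))  # noise points get label -1

_, noise_orig = dbscan_noise(X)
y_pred, noise_pca = dbscan_noise(X_pca)
print(noise_orig, noise_pca)

def plot_clusters(points, labels):
    """Optional: scatter 2-D points coloured by cluster label (noise = -1)."""
    import matplotlib.pyplot as plt
    plt.scatter(points[:, 0], points[:, 1], c=labels)
    plt.show()

# plot_clusters(X_pca, y_pred)
```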