Introduction to Machine Learning Week 10 Answers
Q1. The pairwise distances between 6 points are given below. Which of the options shows the hierarchy of clusters created
by the single link clustering algorithm?
(a)
(b)
(c)
(d)
Answer: (b)
Q2. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created
by the complete link clustering algorithm?
(a)
(b)
(c)
(d)
Answer: (b)
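The difference between the two linkage rules in Q1 and Q2 can be checked with a small sketch. The distance matrix for the question is not reproduced in this text, so the matrix `D` below is a hypothetical stand-in and the result says nothing about which option is correct. The sketch assumes SciPy is available.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix for 6 points
# (NOT the matrix from the question, which is missing here).
D = np.array([
    [0.0, 1.0, 4.0, 5.0, 9.0, 10.0],
    [1.0, 0.0, 3.0, 6.0, 8.0, 11.0],
    [4.0, 3.0, 0.0, 2.0, 7.0, 12.0],
    [5.0, 6.0, 2.0, 0.0, 6.5, 13.0],
    [9.0, 8.0, 7.0, 6.5, 0.0, 2.5],
    [10.0, 11.0, 12.0, 13.0, 2.5, 0.0],
])

# linkage() expects the condensed (upper-triangular) form of the matrix.
condensed = squareform(D)

# Single link merges on the minimum inter-cluster distance,
# complete link on the maximum.
single_merges = linkage(condensed, method="single")
complete_merges = linkage(condensed, method="complete")

# Each row is [cluster_i, cluster_j, merge_distance, size_of_new_cluster].
print(single_merges)
print(complete_merges)
```

Reading the merge rows from top to bottom gives the hierarchy. Passing either result to `scipy.cluster.hierarchy.dendrogram` draws it.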
Q3. In BIRCH, using the number of points N, the sum of points SUM, and the sum of squared points SS, we can determine the centroid
and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster?
(In terms of N, SUM, and SS of both clusters A and B)
Answer: c
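The option text for Q3 is not reproduced here, so the following is only a sketch of the standard clustering-feature identity, not a restatement of option (c). It assumes SS is the scalar sum of squared norms and that the radius is the root mean squared distance to the centroid, so that radius = sqrt(SS/N - ||SUM/N||^2) with N, SUM, and SS each summed over A and B. The helper names `cf` and `merged_centroid_radius` are made up for illustration.

```python
import numpy as np

def cf(points):
    """Clustering feature (N, SUM, SS); SS is the scalar sum of squared norms."""
    pts = np.asarray(points, dtype=float)
    return len(pts), pts.sum(axis=0), float((pts ** 2).sum())

def merged_centroid_radius(cf_a, cf_b):
    """Centroid and radius of A union B using only the two clustering features."""
    n = cf_a[0] + cf_b[0]        # N_A + N_B
    s = cf_a[1] + cf_b[1]        # SUM_A + SUM_B
    ss = cf_a[2] + cf_b[2]       # SS_A + SS_B
    centroid = s / n
    # max(..., 0) guards against tiny negative values caused by rounding.
    radius = np.sqrt(max(ss / n - float(centroid @ centroid), 0.0))
    return centroid, radius

A = [[0, 0], [2, 0]]
B = [[0, 2], [2, 2]]
centroid, radius = merged_centroid_radius(cf(A), cf(B))
print(centroid, radius)
```

The point of the identity is that the raw points are never revisited: the three additive statistics are enough to merge clusters.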
Q4. Statement 1: CURE is robust to outliers.
Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.
Answer: a
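The shrinkage in Statement 2 can be illustrated with a short sketch. It assumes the usual CURE step in which each representative point p is moved toward the cluster centroid c by a fraction alpha, giving p + alpha * (c - p). The function name and the sample values are hypothetical.

```python
import numpy as np

def shrink_representatives(reps, centroid, alpha):
    """Move each representative point a fraction alpha of the way to the centroid."""
    reps = np.asarray(reps, dtype=float)
    centroid = np.asarray(centroid, dtype=float)
    return reps + alpha * (centroid - reps)

centroid = [0.0, 0.0]
reps = [[1.0, 0.0], [10.0, 0.0]]   # the second point plays the role of an outlier
shrunk = shrink_representatives(reps, centroid, alpha=0.5)

# Displacement is proportional to the distance from the centroid,
# so the far point moves much more than the near one.
moved = np.linalg.norm(np.asarray(reps) - shrunk, axis=1)
print(shrunk, moved)
```

Because the displacement scales with distance from the centroid, points far from the centroid are pulled in the most, which is why their influence is dampened.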
Q5. Run K-means on the input features of the iris dataset using the following initialization:
Answer: b
Q6. For the same clusters obtained in the previous question, calculate the rand-index. Formula for rand-index:
Answer: a
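The formula itself is missing from this text. As a sketch, assuming the usual pair-counting definition R = (a + b) / C(n, 2), where a counts pairs placed together in both labelings and b counts pairs placed apart in both, the rand-index can be computed as follows. The function name `rand_index` is chosen here for illustration.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Pair-counting rand-index: (a + b) / (number of pairs)."""
    a = b = pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1          # together in both labelings
        elif not same_true and not same_pred:
            b += 1          # apart in both labelings
        pairs += 1
    return (a + b) / pairs

# Cluster ids are arbitrary, so a relabeled but identical partition scores 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

This loop is quadratic in the number of points, which is fine for the 150 samples of iris.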
Q7. In the rand-index, a can be viewed as true positives (pairs of points belonging to the same cluster) and b as true negatives (pairs of points
belonging to different clusters). How, then, are the rand-index and accuracy from the previous two questions related?
Answer: d
Q8. Run BIRCH on the input features of the iris dataset using Birch(n_clusters=3, threshold=1). What is the rand-index obtained?
Answer: c
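A minimal sketch of the run described in Q8, assuming scikit-learn 0.24 or later (for `sklearn.metrics.rand_score`) and the copy of iris bundled with scikit-learn. The value printed may vary with library version, so it is not matched against the options here.

```python
from sklearn.cluster import Birch
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score

# X holds the four input features; y holds the true species labels.
X, y = load_iris(return_X_y=True)

# Cluster the input features only; y is used solely for evaluation.
y_pred = Birch(n_clusters=3, threshold=1).fit_predict(X)

ri = rand_score(y, y_pred)
print(f"rand-index: {ri:.4f}")
```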
Q9. Run BIRCH with each of the following values of the threshold parameter: [0.01, 0.02, 0.03, …, 0.99, 1.00], using the same command as given
in the previous question. Which value of threshold achieves the best rand-index?
Answer: b
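The sweep in Q9 can be sketched as a loop over the 100 thresholds. This assumes scikit-learn 0.24 or later and the bundled iris data. Ties are resolved by taking the first maximum, which is a choice made here rather than something the question specifies.

```python
import warnings

import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import load_iris
from sklearn.metrics import rand_score

X, y = load_iris(return_X_y=True)

# 0.01, 0.02, ..., 1.00, built from integers to avoid floating-point drift.
thresholds = np.arange(1, 101) / 100

scores = []
with warnings.catch_warnings():
    # Birch can warn when it finds fewer subclusters than n_clusters.
    warnings.simplefilter("ignore")
    for t in thresholds:
        y_pred = Birch(n_clusters=3, threshold=float(t)).fit_predict(X)
        scores.append(rand_score(y, y_pred))

# np.argmax returns the first maximum if several thresholds tie.
best = thresholds[int(np.argmax(scores))]
print(f"best threshold: {best:.2f}, rand-index: {max(scores):.4f}")
```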
Q10. Run PCA on the iris dataset input features with n_components=2. Now run DBSCAN using DBSCAN(eps=0.5, min_samples=5) on both the original features and the PCA features. How many outliers/noisy points does DBSCAN detect in each case?
As an extra, you can plot the PCA features on a 2D plot using matplotlib.pyplot.scatter with parameter c=y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.
Answer: b
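A sketch of the comparison in Q10, assuming scikit-learn and its bundled iris data. DBSCAN marks noise points with the label -1, so counting those labels gives the number of outliers in each feature space. The counts are printed rather than stated, since they are not verified against the options here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the four input features onto the first two principal components.
X_pca = PCA(n_components=2).fit_transform(X)

labels_orig = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
labels_pca = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_pca)

# Noise points carry the label -1.
n_noise_orig = int(np.sum(labels_orig == -1))
n_noise_pca = int(np.sum(labels_pca == -1))
print(n_noise_orig, n_noise_pca)
```

For the optional plot, `matplotlib.pyplot.scatter(X_pca[:, 0], X_pca[:, 1], c=labels_pca)` colours each point by its cluster label, with the noise points sharing one colour.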