Introduction to Machine Learning Week 10 Answers
Q1. The pairwise distances between 6 points are given below. Which of the options shows the hierarchy of clusters created by the single-link clustering algorithm?
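A minimal sketch of single-link agglomerative clustering on a precomputed distance matrix. The matrix D here is hypothetical (the matrix from Q1 is not reproduced in this text), so substitute the real one; the function name single_link_merges is also illustrative. At each step the two clusters whose closest members are nearest are merged, and the merge order gives the hierarchy.

```python
# Single-link agglomerative clustering on a precomputed distance matrix.
# D is a HYPOTHETICAL 6x6 symmetric matrix, not the one from Q1.

def single_link_merges(D):
    """Return the merge sequence as (cluster_a, cluster_b, distance) tuples."""
    clusters = [frozenset([i]) for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: cluster distance = distance of the closest pair of members.
                d = min(D[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        # Drop the two merged clusters and append their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

D = [
    [0, 1, 5, 6, 8, 9],
    [1, 0, 4, 7, 8.5, 9.5],
    [5, 4, 0, 2, 6.5, 7],
    [6, 7, 2, 0, 3, 7.5],
    [8, 8.5, 6.5, 3, 0, 10],
    [9, 9.5, 7, 7.5, 10, 0],
]

merges = single_link_merges(D)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d}")
```

Reading the printed merges from top to bottom gives the dendrogram from the leaves up.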
Q2. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete-link clustering algorithm?
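For complete link, the distance between two clusters is the distance between their farthest members rather than their closest. The sketch below uses SciPy's scipy.cluster.hierarchy.linkage with method="complete" on a hypothetical 6-point matrix (not the one from Q1); squareform converts the square matrix into the condensed vector that linkage expects.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# HYPOTHETICAL symmetric distance matrix with a zero diagonal; replace with the real one.
D = np.array([
    [0, 1, 5, 6, 8, 9],
    [1, 0, 4, 7, 8.5, 9.5],
    [5, 4, 0, 2, 6.5, 7],
    [6, 7, 2, 0, 3, 7.5],
    [8, 8.5, 6.5, 3, 0, 10],
    [9, 9.5, 7, 7.5, 10, 0],
], dtype=float)

# Complete link: cluster distance = distance of the farthest pair of members.
Z = linkage(squareform(D), method="complete")

# Each row of Z is [cluster_i, cluster_j, merge_distance, size_of_new_cluster];
# indices >= 6 refer to clusters created in earlier rows.
for i, j, dist, size in Z:
    print(int(i), int(j), dist, int(size))
```

On this matrix, point 4 joins {2, 3} only at distance 6.5 under complete link (its farthest member, point 2, is 6.5 away), whereas single link would join it at 3, which is why the two hierarchies can differ.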
Q3. In BIRCH, using the number of points N, the linear sum of the points SUM and the sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster? (In terms of N, SUM and SS of both clusters A and B.)
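Because clustering features are additive, the CF of the combined cluster is (N_A + N_B, SUM_A + SUM_B, SS_A + SS_B), and the radius (root-mean-square distance of the points to the centroid) follows from R^2 = SS/N - ||SUM/N||^2, i.e. R_AB = sqrt( (SS_A + SS_B)/(N_A + N_B) - ||(SUM_A + SUM_B)/(N_A + N_B)||^2 ). The sketch below assumes SS is a scalar (the sum of squared norms of the points) and checks the CF-based radius against a direct computation on hypothetical points; the helper names are illustrative.

```python
import numpy as np

def cf(points):
    """Clustering feature (N, SUM, SS); SS is the scalar sum of squared norms."""
    X = np.asarray(points, dtype=float)
    return len(X), X.sum(axis=0), float((X ** 2).sum())

def merge_cf(a, b):
    # CFs are additive: the CF of A ∪ B is the component-wise sum.
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def radius(cf_):
    n, s, ss = cf_
    centroid = s / n
    # R^2 = SS/N - ||SUM/N||^2; clamp tiny negative values caused by rounding.
    return float(np.sqrt(max(ss / n - centroid @ centroid, 0.0)))

# HYPOTHETICAL clusters A and B.
A = [[1.0, 2.0], [2.0, 2.0], [1.0, 3.0]]
B = [[6.0, 5.0], [7.0, 7.0]]

r_cf = radius(merge_cf(cf(A), cf(B)))

# Direct computation on the pooled points, for comparison.
X = np.vstack([A, B])
r_direct = float(np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=1).mean()))
print(r_cf, r_direct)
```

The point of the CF representation is that r_cf needs only the two summaries, never the raw points.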
Q4. Statement 1: CURE is robust to outliers.
Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.
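The shrinkage in Statement 2 can be illustrated with a short sketch: CURE picks a few well-scattered representative points per cluster and moves each one a fraction alpha of the way toward the centroid, so every representative's offset from the centroid is multiplied by (1 - alpha). The farthest-point selection heuristic, alpha = 0.5, and the data (a Gaussian blob plus one far-away point) are assumptions for illustration; full CURE also samples the data and merges clusters by the distances between shrunk representatives, which is omitted here.

```python
import numpy as np

def scattered_points(X, c):
    """Pick c well-scattered points with a farthest-point heuristic."""
    centroid = X.mean(axis=0)
    # Start from the point farthest from the centroid.
    chosen = [int(np.argmax(np.linalg.norm(X - centroid, axis=1)))]
    while len(chosen) < c:
        # Distance from every point to its nearest already-chosen point.
        dists = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2)
        chosen.append(int(np.argmax(dists.min(axis=1))))
    return X[chosen]

def shrink(reps, centroid, alpha):
    # Move each representative a fraction alpha toward the centroid:
    # new offset = (1 - alpha) * old offset.
    return reps + alpha * (centroid - reps)

# HYPOTHETICAL data: a tight blob plus one outlier at (10, 10).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), [[10.0, 10.0]]])
centroid = X.mean(axis=0)

reps = scattered_points(X, 4)
shrunk = shrink(reps, centroid, alpha=0.5)
print(np.linalg.norm(reps - centroid, axis=1))
print(np.linalg.norm(shrunk - centroid, axis=1))
```

The outlier is selected as a representative, but after shrinking it sits half as far from the centroid, so it reaches out less toward other clusters when distances between representatives are compared.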
Q5. Run K-means on the input features of the Iris dataset using the following initialization:
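The initialization from the original question is not reproduced in this text, so the sketch below uses a hypothetical stand-in (rows 0, 50 and 100 of the data, one from each species block). It passes the array to scikit-learn's KMeans through init, with n_init=1 because a fixed initialization leaves nothing to restart.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# HYPOTHETICAL initial centroids; replace with the initialization given in the question.
init_centers = X[[0, 50, 100]]

km = KMeans(n_clusters=3, init=init_centers, n_init=1)
y_pred = km.fit_predict(X)

# Cluster sizes under this initialization.
print(np.bincount(y_pred))
```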
Q6. For the same clusters obtained in the previous question, calculate the Rand index. Formula for the Rand index:
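The formula itself is missing from this text. Assuming it is the standard Rand index, RI = (a + b) / C(n, 2), where a counts pairs of points grouped together in both the true labels and the clustering and b counts pairs separated in both, a direct pair-counting sketch looks like this (rand_index is an illustrative name):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Rand index = (a + b) / C(n, 2), counted over all unordered pairs of points."""
    a = b = 0
    pairs = list(combinations(range(len(labels_true)), 2))
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1  # pair grouped together in both labelings
        elif not same_true and not same_pred:
            b += 1  # pair separated in both labelings
    return (a + b) / len(pairs)

print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

For that toy example, 5 of the 6 pairs agree, so the result is 5/6. Recent scikit-learn versions (0.24 and later) also provide sklearn.metrics.rand_score, which can be used as a cross-check.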
Q7. In the Rand index, a can be viewed as true positives (pairs of points placed in the same cluster by both the ground truth and the clustering) and b as true negatives (pairs of points placed in different clusters by both). How, then, are the Rand index and the accuracy from the previous two questions related?
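Treating each pair of points as a binary "same cluster or not" decision, the Rand index is (TP + TN) / (TP + TN + FP + FN), i.e. accuracy measured over pairs rather than over individual points. The sketch below (function names are illustrative) contrasts the two on a toy labeling whose cluster ids are simply swapped: per-point accuracy that compares raw ids gives 0, while the pair-level score gives 1.

```python
from itertools import combinations

def pair_accuracy(labels_true, labels_pred):
    """Accuracy of the implied 'same cluster?' decision over all pairs (equals the Rand index)."""
    pairs = list(combinations(range(len(labels_true)), 2))
    correct = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return correct / len(pairs)

def point_accuracy(labels_true, labels_pred):
    """Naive per-point accuracy that compares raw cluster ids with class ids."""
    return sum(t == p for t, p in zip(labels_true, labels_pred)) / len(labels_true)

y_true = [0, 0, 1, 1]
y_pred = [1, 1, 0, 0]  # identical grouping, cluster ids swapped

print(point_accuracy(y_true, y_pred))  # 0.0
print(pair_accuracy(y_true, y_pred))   # 1.0
```

Per-point accuracy on cluster output depends on how cluster ids are matched to classes; the pair-based score does not.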
Q8. Run BIRCH on the input features of the Iris dataset using Birch(n_clusters=3, threshold=1). What Rand index is obtained?
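A sketch of Q8 with scikit-learn, assuming the features are used unscaled and using sklearn.metrics.rand_score (scikit-learn 0.24+) for the Rand index; a hand-written pair-counting function gives the same value.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import Birch
from sklearn.metrics import rand_score

X, y = load_iris(return_X_y=True)

# Cluster the four input features; labels y are used only for scoring.
y_pred = Birch(n_clusters=3, threshold=1).fit_predict(X)
ri = rand_score(y, y_pred)
print(f"Rand index: {ri:.4f}")
```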
Q9. Run BIRCH for each of the following values of the threshold parameter: [0.01, 0.02, 0.03, …, 0.99, 1.00], using the same command as in the previous question. Which threshold value achieves the best Rand index?
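A sketch of the threshold sweep under the same assumptions (unscaled features, sklearn.metrics.rand_score). The grid is built from integers to avoid floating-point drift, and np.argmax returns the first index that attains the maximum, so ties resolve to the smallest threshold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.cluster import Birch
from sklearn.metrics import rand_score

X, y = load_iris(return_X_y=True)

# 0.01, 0.02, ..., 1.00 built from integers rather than np.arange(0.01, 1.01, 0.01).
thresholds = np.arange(1, 101) / 100

scores = [
    rand_score(y, Birch(n_clusters=3, threshold=t).fit_predict(X))
    for t in thresholds
]
best = int(np.argmax(scores))  # first threshold reaching the maximum score
print(f"best threshold = {thresholds[best]:.2f}, Rand index = {scores[best]:.4f}")
```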
Q10. Run PCA on the Iris dataset input features with n_components=2. Then run DBSCAN using DBSCAN(eps=0.5, min_samples=5) on both the original features and the PCA features. How many outliers (noise points) does DBSCAN detect in each case?
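A sketch of Q10 with scikit-learn. DBSCAN labels noise points -1, so counting those labels gives the number of outliers; the features are assumed to be used without scaling, as the question does not mention any.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

X, _ = load_iris(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)

# Same DBSCAN settings on both feature sets.
labels_orig = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
labels_pca = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_pca)

# Noise points carry the label -1.
n_noise_orig = int(np.sum(labels_orig == -1))
n_noise_pca = int(np.sum(labels_pca == -1))
print(n_noise_orig, n_noise_pca)
```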
As an extra step, you can plot the PCA features in 2D using matplotlib.pyplot.scatter with the parameter c=y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.
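A sketch of the optional plot, assuming y_pred is the DBSCAN prediction on the PCA features. The Agg backend and the output filename iris_pca_dbscan.png are assumptions so the script also runs without a display; in an interactive session, plt.show() can replace savefig.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

X, _ = load_iris(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)
y_pred = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_pca)

fig, ax = plt.subplots()
# Colour each point by its cluster label; noise points (label -1) get their own colour.
sc = ax.scatter(X_pca[:, 0], X_pca[:, 1], c=y_pred, cmap="viridis")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("DBSCAN on PCA features of Iris (label -1 = noise)")
fig.colorbar(sc, ax=ax, label="cluster label")
fig.savefig("iris_pca_dbscan.png")
```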