Q1. ________ is used for calculating distance measures in clustering using Python.
Q2. The formula for computing the dissimilarity between two objects i and j described by categorical variables is:
Here p is the total number of categorical variables and m is the number of variables on which the two objects match.
- D(i,j) = (p - m) / p
- D(i,j) = (p - m) / m
- D(i,j) = (m - p) / p
- D(i,j) = (m - p) / m
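For reference, the quantities in Q2 can be computed by hand. The sketch below is a minimal illustration of the simple matching approach, in which the dissimilarity is the fraction of categorical variables on which two objects disagree. The function name `categorical_dissimilarity` and the sample objects are hypothetical and chosen only for this example.

```python
def categorical_dissimilarity(obj_i, obj_j):
    """Simple matching dissimilarity for two objects with categorical values.

    p is the total number of categorical variables, and m is the number
    of variables on which the two objects have the same value.
    """
    if len(obj_i) != len(obj_j):
        raise ValueError("both objects must have the same number of variables")
    p = len(obj_i)
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)
    return (p - m) / p


# Hypothetical objects described by three categorical variables.
item_a = ("red", "small", "round")
item_b = ("red", "large", "round")

# The objects match on 2 of 3 variables, so p = 3 and m = 2.
d_ab = categorical_dissimilarity(item_a, item_b)
print(d_ab)
```

Identical objects give a dissimilarity of 0, and objects that differ on every variable give 1.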
Q3. A data set with 7 objects has the following measurements for an interval-scaled variable f: f = (1, 2, 3, 4, 5, 8, 50), which contains one outlying value. Select the correct option.
- Std deviation (std_f) and mean absolute deviation (s_f) are equally affected
- Mean absolute deviation (s_f) is more affected by the outlier
- Std deviation (std_f) is more affected by the outlier
- None of these
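The two statistics in Q3 can be checked numerically. The sketch below computes both for f with and without the value 50 and compares how much each one grows. It assumes the population form of the standard deviation (`statistics.pstdev`) and defines the mean absolute deviation as the average absolute distance from the mean; the helper name and the ratio variables are introduced only for this illustration.

```python
from statistics import mean, pstdev


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


f = [1, 2, 3, 4, 5, 8, 50]
f_without_outlier = [x for x in f if x != 50]

# Statistics on the full data, including the outlying value.
std_f = pstdev(f)
s_f = mean_absolute_deviation(f)

# Growth factor of each statistic caused by including the outlier.
std_ratio = std_f / pstdev(f_without_outlier)
s_ratio = s_f / mean_absolute_deviation(f_without_outlier)

print(f"std_f = {std_f:.3f}, grows by a factor of {std_ratio:.2f}")
print(f"s_f   = {s_f:.3f}, grows by a factor of {s_ratio:.2f}")
```

Squaring the deviations gives the large deviation of the outlier more weight than taking absolute values does, which is why the two growth factors differ.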
Q4. Which of the following is true for K-means clustering?
- It comes under the partitioning method
- The number of clusters is predefined for this method
- Cluster similarity is measured with regard to the mean value of the objects in a cluster
- All of the above
Q5. Which of the following can act as possible termination conditions in K-Means?
1. A fixed number of iterations has been reached.
2. The assignment of observations to clusters does not change between iterations, except for cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. The Residual Sum of Squares (RSS) falls below a threshold.
- 1, 3 and 4
- 1, 2, 3 and 4
- 2 and 3
- None of these
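To make the candidate stopping rules in Q5 concrete, here is a toy one-dimensional K-means loop that checks each of them and reports which one fired. It is a minimal sketch, not a reference implementation: the function name `kmeans_1d`, its parameters, and the sample points are assumptions made for this example, and the RSS check is disabled by default (threshold 0.0).

```python
def kmeans_1d(points, centroids, max_iter=100, rss_threshold=0.0):
    """Toy 1-D K-means that reports which stopping rule ended the run."""
    centroids = list(centroids)
    assignments = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
            for x in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [x for x, a in zip(points, new_assignments) if a == k]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(sum(members) / len(members) if members else centroids[k])
        # Residual sum of squares around the updated centroids.
        rss = sum((x - new_centroids[a]) ** 2 for x, a in zip(points, new_assignments))

        if new_assignments == assignments:
            reason = "assignments unchanged"
        elif new_centroids == centroids:
            reason = "centroids unchanged"
        elif rss < rss_threshold:
            reason = "RSS below threshold"
        else:
            reason = None

        assignments, centroids = new_assignments, new_centroids
        if reason is not None:
            return centroids, assignments, reason, iteration
    return centroids, assignments, "max iterations reached", max_iter


# Hypothetical data: two well-separated groups and two starting centroids.
points = [1, 2, 3, 10, 11, 12]
final_centroids, labels, reason, n_iter = kmeans_1d(points, [0, 5])
print(final_centroids, labels, reason, n_iter)
```

Passing a small `max_iter` or a large `rss_threshold` makes the other stopping rules fire instead, which is a quick way to see that each one is a usable termination condition on its own.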
Q6. In the figure below, if you draw a horizontal line at y = 2, how many clusters will be formed?
Q7. Which of the following clustering methods requires a merging approach?
Q8. State True or False: Hierarchical clustering should primarily be used for exploration
Q9. State True or False: For finding dissimilarity between two clusters in hierarchical clustering, average-link is the only metric used
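As background for Q9, the sketch below computes the dissimilarity between two small one-dimensional clusters under three commonly described linkage criteria: single-link (closest pair), complete-link (farthest pair), and average-link (mean over all pairs). The helper names and the sample clusters are hypothetical, and absolute difference is assumed as the point-to-point distance.

```python
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """Absolute differences between every cross-cluster pair of points."""
    return [abs(a - b) for a, b in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b):
    # Distance between the closest pair of points.
    return min(pairwise_distances(cluster_a, cluster_b))


def complete_link(cluster_a, cluster_b):
    # Distance between the farthest pair of points.
    return max(pairwise_distances(cluster_a, cluster_b))


def average_link(cluster_a, cluster_b):
    # Mean distance over all cross-cluster pairs.
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)


# Hypothetical clusters.
cluster_a = [1, 2]
cluster_b = [5, 7]

print(single_link(cluster_a, cluster_b))
print(complete_link(cluster_a, cluster_b))
print(average_link(cluster_a, cluster_b))
```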
Q10. Hierarchical clustering can be either an agglomerative or a divisive algorithm