# Data Analytics With Python Week 11 Answers

Q1. ________ is used for calculating distance measures in clustering using Python.

a. distance_matrix
b. spatial_matrix
c. scipy_matrix
d. distance.matrix
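
As a quick check of what such a function does, here is a minimal sketch using `distance_matrix` from `scipy.spatial`, which computes pairwise Minkowski distances (Euclidean by default). The three sample points are made up for illustration and are not part of the quiz.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Hypothetical 2-D points, chosen so the distances are easy to verify by hand.
points = np.array([[0.0, 0.0],
                   [3.0, 4.0],
                   [6.0, 8.0]])

# Pairwise Euclidean distances (p=2 is the default Minkowski order).
dist = distance_matrix(points, points)
print(dist)
```

Entry `dist[i, j]` holds the distance between point `i` and point `j`, so the matrix is symmetric with zeros on the diagonal.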

Q2. The formula for computing the dissimilarity between two objects described by categorical variables is:
Here p is the total number of categorical variables and m is the number of matches (variables on which objects i and j take the same value).

• D(i,j) = (p - m) / p
• D(i,j) = (p - m) / m
• D(i,j) = (m - p) / p
• D(i,j) = (m - p) / m
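
The simple matching approach, where dissimilarity is the fraction of variables on which two objects disagree, can be sketched as follows. The function name and the sample objects are hypothetical, chosen only for illustration.

```python
def categorical_dissimilarity(obj_i, obj_j):
    """Simple matching dissimilarity: share of variables that do not match."""
    if len(obj_i) != len(obj_j):
        raise ValueError("both objects must have the same number of variables")
    p = len(obj_i)                                      # total number of variables
    m = sum(1 for a, b in zip(obj_i, obj_j) if a == b)  # number of matches
    return (p - m) / p


# Hypothetical objects described by three categorical variables.
item_a = ("red", "small", "round")
item_b = ("red", "large", "round")
print(categorical_dissimilarity(item_a, item_b))  # 2 of 3 variables match
```

Identical objects give 0 and objects that differ on every variable give 1.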

Q3. Select the correct option. For a data set with 7 objects and an interval-scaled variable ‘f’, we have the following measurements: f = (1, 2, 3, 4, 5, 8, 50), which contain one outlying value.

• Std deviation (std_f) and mean absolute deviation (s_f) are equally affected
• Mean absolute deviation (s_f) is more affected by the outlier
• Std deviation (std_f) is more affected by the outlier
• None of these
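
One way to compare the two measures is to compute each with and without the outlying value and see how much it grows. This is a minimal sketch that assumes the population standard deviation (`statistics.pstdev`); the helper `mean_absolute_deviation` is a hypothetical name, not a library function.

```python
import statistics


def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    mean = statistics.fmean(values)
    return sum(abs(v - mean) for v in values) / len(values)


f = [1, 2, 3, 4, 5, 8, 50]
f_clean = f[:-1]  # the same measurements without the outlying value 50

std_f = statistics.pstdev(f)       # assumption: population standard deviation
s_f = mean_absolute_deviation(f)

# Growth factor of each measure once the outlier is included.
std_growth = std_f / statistics.pstdev(f_clean)
s_growth = s_f / mean_absolute_deviation(f_clean)

print(f"std_f = {std_f:.2f}, grows by a factor of {std_growth:.2f}")
print(f"s_f   = {s_f:.2f}, grows by a factor of {s_growth:.2f}")
```

Squaring the deviations gives the outlier more weight in the standard deviation than in the mean absolute deviation, which uses the deviations unsquared.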

Q4. Which of the following is true for K-means clustering?

• It comes under the partitioning method
• The number of clusters is predefined for this method
• Cluster similarity is measured with respect to the mean value of the objects in a cluster
• All of the above

Q5. Which of the following can act as possible termination conditions in K-Means?

1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations, except for cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. Terminate when Residual Sum of Squares (RSS) falls below a threshold.

• 1, 3 and 4
• 1, 2, 3 and 4
• 2 and 3
• None of these
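
To make the four stopping rules concrete, here is a minimal one-dimensional K-means sketch in plain Python that checks each of them and reports which one fired. The function name `kmeans_1d`, its parameters, and the sample data are all assumptions made for illustration; this is not a reference implementation.

```python
def kmeans_1d(values, centroids, max_iter=100, rss_threshold=0.0):
    """Tiny 1-D K-means that returns (centroids, labels, stop_reason)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):                      # rule 1: fixed number of iterations
        # Assignment step: each value goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: (v - centroids[k]) ** 2)
            for v in values
        ]
        if new_labels == labels:                   # rule 2: assignments unchanged
            return centroids, labels, "assignments unchanged"
        labels = new_labels

        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(sum(members) / len(members) if members else centroids[k])

        if new_centroids == centroids:             # rule 3: centroids unchanged
            return centroids, labels, "centroids unchanged"
        centroids = new_centroids

        # Residual sum of squares of values around their assigned centroid.
        rss = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
        if rss < rss_threshold:                    # rule 4: RSS below a threshold
            return centroids, labels, "RSS below threshold"
    return centroids, labels, "max iterations reached"


# Hypothetical data with two obvious groups and deliberately poor starting centroids.
data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centroids, final_labels, reason = kmeans_1d(data, [1.0, 2.0])
print(final_centroids, final_labels, reason)
```

In practice rules 2 and 3 tend to fire together, since unchanged assignments imply unchanged centroids; the iteration cap and the RSS threshold trade accuracy for a bounded running time.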

Q6. In the figure below, if you draw a horizontal line at y = 2, how many clusters will be formed?
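
Cutting a dendrogram with a horizontal line at height h amounts to stopping the merges once the two closest clusters are farther apart than h; the clusters that remain are the branches the line crosses. A minimal sketch of that idea, assuming single linkage and made-up 1-D points (not the data behind the figure):

```python
def single_linkage_cut(points, height):
    """Merge the closest clusters until the smallest gap exceeds `height`."""
    clusters = [[p] for p in points]   # start with every point in its own cluster
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        gap, a, b = best
        if gap > height:               # the next merge would happen above the cut line
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                # safe because b > a
    return clusters


# Hypothetical points; the cut height of 2 mirrors the line described in the question.
sample_points = [1.0, 1.5, 5.0, 5.8, 12.0]
result = single_linkage_cut(sample_points, height=2)
print(len(result), result)
```

For these sample points the merges at distances 0.5 and 0.8 happen below the cut, while the next candidate merge (distance 3.5) lies above it, so three clusters remain.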

Q7. Which of the following clustering methods requires a merging approach?