Data Clustering is a form of unsupervised learning used to segment data. The algorithm outputs a new variable, [Factor_i], whose states correspond to the created segments.
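To make the idea of "a new variable whose states are segments" concrete, here is a minimal sketch in Python. It uses plain k-means, an assumption chosen for brevity and not necessarily the algorithm BayesiaLab applies. The function names `kmeans` and `add_factor` and the state labels `Cluster 1`, `Cluster 2` are hypothetical; the added column `Factor_0` plays the role of [Factor_i].

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on lists of floats; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance)
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])))
            for pt in points
        ]
        # Update step: move each centroid to the mean of its current members
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def add_factor(records, features, k, name="Factor_0", seed=0):
    """Return copies of the records with a new discrete variable holding the segment."""
    points = [[r[f] for f in features] for r in records]
    labels = kmeans(points, k, seed=seed)
    # Each state of the new variable names one segment
    return [dict(r, **{name: f"Cluster {lab + 1}"}) for r, lab in zip(records, labels)]


records = [
    {"x": 1.0, "y": 1.0},
    {"x": 1.2, "y": 0.9},
    {"x": 8.0, "y": 8.1},
    {"x": 7.9, "y": 8.3},
]
segmented = add_factor(records, ["x", "y"], k=2)
```

Here the first two records end up in one state of `Factor_0` and the last two in the other; which of them is called `Cluster 1` depends on the initialization.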
There are various reasons to use Data Clustering:
- To find observations that look similar
- To find observations that behave similarly
- To represent an unobserved (latent) dimension
- To represent the joint probability distribution compactly
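The last point can be made precise under one common assumption: a latent class (naive Bayes) model in which the observed variables $X_1, \dots, X_n$ are conditionally independent given the cluster variable $F$ with $K$ states. This factorization is offered as an illustration, not as a statement of how BayesiaLab parameterizes [Factor_i].

```latex
P(X_1, \dots, X_n) = \sum_{k=1}^{K} P(F = k) \prod_{i=1}^{n} P(X_i \mid F = k)
```

For $n$ binary variables, a full joint table needs $2^n - 1$ free parameters, whereas this model needs $(K - 1) + nK$. With $n = 20$ and $K = 5$, that is $1{,}048{,}575$ versus $104$.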
Although some metrics are available for judging the technical quality of the created variable, its practical quality is usually quite subjective, since a good segmentation should be easy to interpret. The stability of the solution is another important quality criterion.
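One common way to quantify stability, offered here as an assumption rather than as BayesiaLab's own metric, is to rerun the clustering with different random seeds and measure how similar the resulting partitions are, for example with the Rand index. In this sketch, `rand_index`, `stability`, and the `cluster_fn(points, seed)` callback are hypothetical names; `cluster_fn` can wrap any seeded clustering routine.

```python
from itertools import combinations
from statistics import mean


def rand_index(a, b):
    """Share of point pairs on which two labelings agree (same vs. different cluster).

    The score ignores how clusters are named, so [0, 0, 1] and [1, 1, 0] agree fully.
    Requires at least two points.
    """
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(cluster_fn, points, seeds):
    """Mean pairwise Rand index over clustering runs with different seeds.

    cluster_fn(points, seed) must return one label per point. A value of 1.0 means
    every run produced the same partition (up to renaming of the clusters).
    Requires at least two seeds.
    """
    runs = [cluster_fn(points, s) for s in seeds]
    return mean(rand_index(x, y) for x, y in combinations(runs, 2))
```

Values close to 1.0 suggest a stable segmentation. In practice the adjusted Rand index, which corrects for agreement expected by chance, is often preferred over the raw score.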