Learning | Clustering | Data Clustering
Data Clustering is a form of unsupervised learning used to segment the data. The algorithm outputs a new variable, [Factor_i], whose states correspond to the segments it creates.
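As a rough illustration of what "a new variable whose states are the segments" means, the sketch below segments a handful of made-up observations and stores the result as one extra column. It uses plain k-means purely as a stand-in; it is not claimed to be the algorithm behind Data Clustering. The names `kmeans`, `observations`, and `factor_0`, and the state labels `C1` and `C2`, are assumptions chosen for the example.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic start (the first k points)."""
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each observation to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of the observations assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

# Hypothetical observations forming two visibly separate groups.
observations = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0),
                (5.1, 4.9), (1.2, 0.8), (4.8, 5.0)]

labels = kmeans(observations, k=2)

# The new variable: one value per observation, one state per segment.
factor_0 = [f"C{label + 1}" for label in labels]
print(factor_0)
```

Each observation now carries a segment state alongside its original values, which is the role [Factor_i] plays in the description above.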
There are various reasons to use Data Clustering:
- For finding observations that look the same;
- For finding observations that behave the same;
- For representing an unobserved dimension;
- For compactly representing the joint probability distribution.
From a technical point of view, each segment should:
- Differ clearly from the other segments;
- Be stable.
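The text does not say how stability is measured. One possible check, sketched here under that assumption, is to run the segmentation several times and compare the resulting assignments with the Rand index, which is the fraction of observation pairs that two runs treat the same way (together in both, or apart in both). The function name `rand_index` and the three example runs are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of observation pairs on which two segmentations agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical segment assignments from three separate runs.
run_1 = ["C1", "C2", "C1", "C2", "C1", "C2"]
run_2 = ["C2", "C1", "C2", "C1", "C2", "C1"]  # same segments, names swapped
run_3 = ["C1", "C2", "C1", "C2", "C2", "C2"]  # one observation changed segment

stability_same = rand_index(run_1, run_2)
stability_moved = rand_index(run_1, run_3)
print(stability_same, stability_moved)
```

Because the index compares pairs rather than state names, two runs that find the same segments under different labels still score 1.0, while a run that moves observations between segments scores lower.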
From a functional point of view, the segments should:
- Be easy to understand;
- Be a fair representation of the data.