
Learning | Clustering | Data Clustering

Data Clustering is a form of unsupervised learning used to segment the data. The output of the algorithm is a new variable, [Factor_i], whose states correspond to the created segments.
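
To make the idea of "a new variable whose states are the segments" concrete, here is a minimal sketch in Python. It is not BayesiaLab's algorithm: a tiny one-dimensional k-means stands in for the clustering step, and the data, the column name `spend`, the variable name `Factor_0`, and the state names `C1`/`C2` are all hypothetical.

```python
# Illustrative only: a tiny k-means on one numeric variable.
# This is a stand-in technique, NOT the algorithm BayesiaLab uses.

def kmeans_1d(values, k, iterations=20):
    """Return a cluster index (0..k-1) for each value."""
    # Deterministic start: spread the initial centers over the sorted values.
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


# Hypothetical data set: one row per observation.
rows = [{"spend": 10.0}, {"spend": 12.0}, {"spend": 11.0},
        {"spend": 50.0}, {"spend": 52.0}, {"spend": 49.0}]

labels = kmeans_1d([r["spend"] for r in rows], k=2)

# The clustering output is stored as a new variable; its states are the segments.
for row, label in zip(rows, labels):
    row["Factor_0"] = f"C{label + 1}"
```

After this runs, every row carries an extra `Factor_0` entry, which plays the role of the [Factor_i] variable described above.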

There are various reasons to use Data Clustering:

  • For finding observations that look the same;
  • For finding observations that behave the same;
  • For representing an unobserved dimension;
  • For compactly representing the joint probability distribution.

From a technical point of view, the segments should be:

  • Homogeneous/pure;
  • Clearly different from the other segments;
  • Stable.
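
As a rough illustration of the first two criteria, the sketch below computes one possible homogeneity measure (within-segment variance) and one possible separation measure (distance between segment means). These measures, the data, and the segment names are assumptions chosen for illustration; they are not the metrics BayesiaLab reports, and stability is not covered here.

```python
from statistics import mean, pvariance

# Hypothetical segmented data: segment state -> values of one numeric variable.
segments = {
    "C1": [10.0, 12.0, 11.0],
    "C2": [50.0, 52.0, 49.0],
}

# Homogeneity: a low within-segment variance means the members look alike.
within = {state: pvariance(vals) for state, vals in segments.items()}

# Separation: the distance between segment means, to be compared
# against the within-segment spread.
centers = {state: mean(vals) for state, vals in segments.items()}
gap = abs(centers["C1"] - centers["C2"])
```

A segmentation looks technically sound under these assumed measures when every value in `within` is small relative to `gap`.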

From a functional point of view, the segments should be:

  • Easy to understand;
  • Operational;
  • A fair representation of the data.


Data Clustering has been updated in versions 5.1 and 5.2.