Context

Learning | Clustering | Data Clustering

Data Clustering is a form of unsupervised learning used to segment the data. The output of the algorithm is a new variable, [Factor_i], whose states correspond to the segments created.

There are various reasons to use Data Clustering:

• For finding observations that look the same;
• For finding observations that behave the same;
• For representing an unobserved dimension;
• For compactly representing the joint probability distribution.

From a technical point of view, the segments should be:

1. Homogeneous (pure);
2. Clearly different from the other segments;
3. Stable.

From a functional point of view, the segments should be:

1. Easy to understand;
2. Operational;
3. A fair representation of the data.

History

Data Clustering has been updated in versions 5.1 and 5.2.

New Feature: Meta-Clustering

This feature was added to improve the stability of the solution (the third technical quality listed above). It consists of applying Data Clustering to a data set made up of a subset of the Factors that were created during learning. The final solution is thus a summary of the best solutions found.

This graph illustrates Meta-Clustering. The five variables at the very bottom are called Manifest variables. They are the dimensions used to describe the observations contained in the data set. They are used for the initial segmentations, i.e., the creation of the Factor variables [Factor_1], [Factor_2], and [Factor_3]. In this example, three Factor variables are thus used to create the final solution, [Factor_4].
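
The two-stage idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BayesiaLab's own algorithm: a toy k-means stands in for the initial segmentations, a toy k-modes stands in for the second clustering pass, and the data set uses two manifest variables instead of five to stay short. The function names (`spaced_seeds`, `kmeans`, `kmodes`) and the data are hypothetical.

```python
from collections import Counter

def spaced_seeds(rows, k, start):
    """Deterministic initialisation: pick k rows at evenly spaced positions."""
    n = len(rows)
    return [rows[(start + i * n // k) % n] for i in range(k)]

def kmeans(rows, k, start=0, iters=10):
    """Toy k-means (assumption): stands in for one initial segmentation."""
    centers = spaced_seeds(rows, k, start)
    labels = []
    for _ in range(iters):
        # Assign each row to the nearest centre (squared Euclidean distance).
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(r, centers[c])))
                  for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def kmodes(rows, k, start=0, iters=10):
    """Toy k-modes (assumption): clusters categorical rows by Hamming distance."""
    modes = spaced_seeds(rows, k, start)
    labels = []
    for _ in range(iters):
        # Assign each row to the mode it disagrees with on the fewest columns.
        labels = [min(range(k),
                      key=lambda c: sum(a != b for a, b in zip(r, modes[c])))
                  for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                # New mode: the most frequent state in each column.
                modes[c] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return labels

# Hypothetical manifest data: three well-separated groups of four observations.
manifest = [(0, 0), (1, 0), (0, 1), (1, 1),
            (10, 10), (11, 10), (10, 11), (11, 11),
            (20, 0), (21, 0), (20, 1), (21, 1)]

# Initial segmentations: three Factor variables from different settings.
factor_1 = kmeans(manifest, k=3, start=0)
factor_2 = kmeans(manifest, k=3, start=5)   # same partition, different state order
factor_3 = kmeans(manifest, k=2, start=0)   # coarser: merges two of the groups

# Meta-Clustering: cluster the data set made of the Factor states themselves.
factor_data = list(zip(factor_1, factor_2, factor_3))
factor_4 = kmodes(factor_data, k=3)
print(factor_4)  # → [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

In this toy run, [Factor_2] numbers its states differently from [Factor_1], and [Factor_3] merges two groups, yet the second pass still recovers the three groups. Because the Factor states are treated as plain categories, the order in which each initial segmentation numbers its states does not matter.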

Example

The graph below illustrate