  • Sample Size: searches for the optimal number of classes on subsets of the data in order to speed up convergence, drawing one sample per step (trial). The partition with the best score is then used as the initial partition for the search on the entire data set. The sample size can be given either as a percentage or as an exact number of rows.
  • Steps Number: the number of steps in the random walk. Because the search can be stopped at any time by clicking the red light in the status bar, while keeping the best clustering found so far, this number can be set very high.
  • Maximum Drift: indicates the maximum difference between the cluster probabilities during learning and those obtained after missing value completion, i.e., between the theoretical distribution during learning and the effective distribution after imputation over the learning data set.
  • Minimum Cluster Purity in Percentage: defines the minimum allowed purity for a cluster to be kept.
  • Minimum Cluster Size in Percentage: defines the minimum allowed size for a cluster to be kept.
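
To make the numeric options above more concrete, here is a minimal Python sketch of how such settings could be interpreted. It is an illustration only, not the tool's implementation: the function names are hypothetical, the drift is assumed to be the largest absolute difference between two cluster probability vectors, and the cluster size is assumed to be the share of rows assigned to each cluster.

```python
from collections import Counter


def resolve_sample_size(n_rows, percentage=None, count=None):
    """Return the number of rows to sample, given a percentage OR an exact count.

    Hypothetical helper: the result is clamped to the range [1, n_rows].
    """
    if (percentage is None) == (count is None):
        raise ValueError("specify exactly one of 'percentage' or 'count'")
    if percentage is not None:
        if not 0 < percentage <= 100:
            raise ValueError("percentage must be in (0, 100]")
        size = round(n_rows * percentage / 100)
    else:
        size = count
    return max(1, min(n_rows, size))


def max_drift(learned_probs, imputed_probs):
    """Largest absolute gap between two cluster distributions.

    Assumption: drift is measured per cluster as |p - q| and the maximum is kept.
    """
    if len(learned_probs) != len(imputed_probs):
        raise ValueError("both distributions must cover the same clusters")
    return max(abs(p - q) for p, q in zip(learned_probs, imputed_probs))


def clusters_to_keep(labels, min_size_pct):
    """Return the cluster ids whose share of rows reaches the minimum size (in %)."""
    counts = Counter(labels)
    total = len(labels)
    return sorted(c for c, n in counts.items() if 100 * n / total >= min_size_pct)
```

For instance, under these assumptions a 10% sample of 1,000 rows resolves to 100 rows, and a cluster holding 2% of the rows is dropped when the minimum cluster size is set to 5%.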

Node Weights

A button opens a dialog box for editing the weight associated with each variable.

These weights, which default to 1, are associated with the variables and make it possible to guide the clustering. A weight greater than 1 makes the variable count more heavily during clustering. A weight of zero makes the variable purely illustrative.
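
The effect of the weights can be pictured with a small Python sketch. It assumes, purely for illustration, that the clustering score is a sum of per-variable contributions and that each weight multiplies its variable's contribution; the actual scoring used by the tool is not described here, and the function and variable names are hypothetical.

```python
def weighted_score(contributions, weights=None):
    """Combine per-variable score contributions using node weights.

    Assumption: the score is a weighted sum. Variables without an explicit
    weight use the default value 1; a weight of 0 removes the variable from
    the score, which makes it purely illustrative.
    """
    weights = weights or {}
    total = 0.0
    for variable, value in contributions.items():
        weight = weights.get(variable, 1.0)
        if weight < 0:
            raise ValueError(f"negative weight for variable {variable!r}")
        total += weight * value
    return total


# Hypothetical contributions of three variables to a clustering score.
contributions = {"age": 0.4, "income": 0.2, "region": 0.1}

# "income" counts double, "region" is illustrative only, "age" keeps weight 1.
score = weighted_score(contributions, {"income": 2.0, "region": 0.0})
```

In this sketch, setting the weight of "region" to 0 means its contribution no longer influences which partition scores best, while the variable can still be inspected against the resulting clusters.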