Child pages
  • Discretization Wizard (8.0)

Context

BayesiaLab requires continuous variables to be discretized. Discretization consists of creating a clone of the hidden continuous variable whose states are discrete; these states are usually called bins in this context. Discretization has a major impact on the model because it defines how the domain is perceived.

One of the most important discretization parameters is the number of bins, which directly affects model complexity: the more bins there are, the larger the (conditional) probability tables become.
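To give a sense of how quickly the tables grow, here is a minimal Python sketch. It is not BayesiaLab code; the function name `cpt_cells` is hypothetical. It relies only on the standard fact that a node's conditional probability table holds one probability for each combination of its own state and its parents' states.

```python
from typing import Sequence


def cpt_cells(child_bins: int, parent_bins: Sequence[int] = ()) -> int:
    """Return the number of probabilities in a node's (conditional) probability table.

    child_bins:  number of bins of the node itself
    parent_bins: number of bins of each parent (empty for a root node)
    """
    cells = child_bins
    for bins in parent_bins:
        cells *= bins  # each parent multiplies the number of table rows
    return cells


if __name__ == "__main__":
    # Same number of bins for the node and for each of its parents.
    for k in (3, 5, 7):
        print(
            f"{k} bins: no parent={cpt_cells(k)}, "
            f"one parent={cpt_cells(k, [k])}, "
            f"two parents={cpt_cells(k, [k, k])}"
        )
```

With two parents, going from 3 to 7 bins per variable raises the table size from 27 to 343 probabilities.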

In the context of machine learning, this means that enough data must be available to estimate all these probabilities. The size of the data set must therefore be taken into account when choosing the number of bins.
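The trade-off between data set size and number of bins can be illustrated with a rough calculation. The sketch below is an assumption-laden illustration, not the computation BayesiaLab performs: the function name `observations_per_cell` is hypothetical, every variable is assumed to have the same number of bins, and the observations are assumed to be spread evenly over all state combinations, which real data rarely is.

```python
def observations_per_cell(n_observations: int, bins: int, n_parents: int = 2) -> float:
    """Average number of observations per combination of node and parent states.

    Assumes (for illustration only) that the node and each parent share the
    same number of bins and that observations are evenly distributed.
    """
    if n_observations < 0 or bins < 1 or n_parents < 0:
        raise ValueError("n_observations and n_parents must be >= 0, bins >= 1")
    cells = bins ** (n_parents + 1)  # node states times parent state combinations
    return n_observations / cells


if __name__ == "__main__":
    n = 500  # example data set size, chosen arbitrarily
    for bins in range(3, 8):
        avg = observations_per_cell(n, bins)
        print(f"{bins} bins, two parents: about {avg:.1f} observations per cell")
```

For a fixed data set, each additional bin sharply reduces the amount of data available to estimate each probability, which is why small data sets call for few bins.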

New Feature: Intervals

As of version 8.0, the default number of bins proposed in the discretization wizard is calculated from the number of observations and always lies between 3 and 7. The choice is rather conservative, which means that BayesiaLab should be able to discover two-parent structures when the number of observations exceeds a few hundred.

The number of bins proposed in Learning | Discretization tools is not based on this new heuristic. It is either your previous choice or the value defined in Window | Preferences | Discretization.