Context

BayesiaLab requires continuous variables to be discretized. Discretization creates a clone of the hidden continuous variable that has discrete states (usually called bins in this context). It has a major impact on the model because it defines how the domain is perceived.
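As an illustration of what mapping a continuous variable to bins means, here is a minimal sketch in Python. It assumes equal-width bins, which is only one possible scheme and is not claimed to be what BayesiaLab does internally; the function name equal_width_bins and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1.

    Assumption: bins have equal width over the observed range.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of opening a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical sample of a continuous variable.
ages = [18.0, 22.5, 31.0, 47.2, 58.9, 64.0]
print(equal_width_bins(ages, 3))  # prints [0, 0, 0, 1, 2, 2]
```

The discrete clone only sees the bin indices, so two values in the same bin become indistinguishable to the model.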

One of the most important discretization parameters is the number of bins, which has a direct impact on model complexity. A node's (conditional) probability table has one entry for each combination of its own states and its parents' states, so the more bins there are, the larger these tables become.
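The growth in table size can be made concrete with a short Python sketch. It assumes, for simplicity, that a node and its two parents all use the same number of bins; the function name cpt_cells is hypothetical.

```python
from math import prod


def cpt_cells(child_bins, parent_bins):
    """Number of entries in a conditional probability table:
    one per combination of child state and parent states."""
    return child_bins * prod(parent_bins)


# Assumption: a node with two parents, all discretized into k bins.
for k in (2, 3, 5, 10):
    print(k, cpt_cells(k, [k, k]))
# prints:
# 2 8
# 3 27
# 5 125
# 10 1000
```

Going from 3 to 10 bins multiplies the table size by about 37 in this setting, even though the network structure is unchanged.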

In the context of machine learning, this means that we need enough samples to estimate all of these probabilities. The size of the data set must therefore be taken into account when choosing the number of bins.
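One rough way to relate data set size to the number of bins is to look at the average number of samples available per table entry. The Python sketch below is a crude heuristic under stated assumptions, not BayesiaLab's own rule: it assumes every node and parent uses the same number of bins, that samples spread evenly over the table (real data rarely do), and a hypothetical threshold of 5 samples per entry. The names samples_per_cell and max_bins are illustrative.

```python
from math import prod


def samples_per_cell(n_samples, child_bins, parent_bins):
    """Average number of samples per table entry, assuming an even spread."""
    return n_samples / (child_bins * prod(parent_bins))


def max_bins(n_samples, n_parents, min_per_cell=5):
    """Largest common bin count that keeps the average at or above min_per_cell.

    min_per_cell=5 is an arbitrary illustrative threshold, not a recommendation.
    """
    k = 1
    while samples_per_cell(n_samples, k + 1, [k + 1] * n_parents) >= min_per_cell:
        k += 1
    return k


print(max_bins(1_000, n_parents=2))   # prints 5
print(max_bins(10_000, n_parents=2))  # prints 12
```

In this setting, ten times more data supports only a little more than twice as many bins, because the table size grows with the cube of the bin count.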