Child pages
  • Score-Based Learning Algorithms

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Localtab Group
Localtab
activetrue
titleMinimum Description Length

The Minimum Description Length (MDL) score is a two-component score that has been traditionally used in the Artificial Intelligence community for estimating the number of bits required to represent a model and the data given this model. For structural learning of Bayesian networks, the model is the Bayesian network (graph plus probability tables), whereas the number of bits for representing the data given the Bayesian network is inversely proportional to the probability of the observations returned by the model.

Image Added

where:

  •  

    Image Added represents the BayesiaLab Structural Coefficient (the default value is 1), a parameter that allows changing the weight of the MDL structural part (the lower its value, the greater the complexity of the resulting networks),

  • DL(B) the number of bits to represent the Bayesian network B (graph and probabilities), and
  • DL(D|B) the number of bits to represent the dataset D given the Bayesian network B.
Localtab
titleBayesian Network Part
Localtab Group
Localtab
activetrue
titleDL(B)

The number of bits to represent a Bayesian network is equal to the number of bits to represent the structure plus the number of bits to represent the probability distributions.

Image Added

where G refers to the Graphical structure, and P to the set of Probability tables.

Localtab
titleDL(G)

The coding of the structure implies the identification of each node plus its parents.

Image Added

where 

  • n is the number of random variables (nodes):

     

    Image Added

  • Image Added is the set of the random variables that are parents of Image Added in the graph G

  • and  is

    and Image Added is the number of parents of random variable Image Added.

Localtab
titleDL(P|G)

The number of bits to represent the probability distributions is proportional to the number of cells of the conditional probability tables.

Image Added

where

  • val(X) represents Image Added is the number of states of random variable XImage Added,
  • p is the probability recorded in associated with the cell.

As this probability is not known prior to learning the network, we are using the following classical heuristic:

Image Added

where N is the size number of observations in the datasetdata set.

Localtab
titleData Part

The number of bits for representing the data given the Bayesian network is inversely proportional to the probability of the observations returned by the model.

Image Added

Image Added

Image Added

where

  • Image Added is the n-dimensional observation described in row j, and 
  • Image Added is the joint probability of this observation returned by the Bayesian network B.

The chain rule allows rewriting this equation with:

Image Added

Image Added

For each candidate Bayesian network, which is generated during the search and evaluated with the MDL score, the corresponding parameters are computed using Maximum Likelihood Estimation.

...