Question
What types of algorithms does BayesiaLab use for learning network structures?
Answer
We have developed our proprietary scorebased learning algorithms. As opposed to the constraintbased algorithms that use independence tests to add or remove arcs between nodes, we utilize the MDL score (Minimum Description Length) to measure the quality of candidate networks with respect to the available data.
This score, which is derived from Information Theory, allows to automatically take into account the data likelihood with respect to the network and the structural complexity of the network.
The Minimum Description Length (MDL) score is a twocomponent score that has been traditionally used in the Artificial Intelligence community for estimating the number of bits required to represent a model and the data given this model. For structural learning of Bayesian networks, the model is the Bayesian network (graph plus probability tables), whereas the number of bits for representing the data given the Bayesian network is inversely proportional to the probability of the observations returned by the model.
where: represents the BayesiaLab Structural Coefficient (the default value is 1), a parameter that allows changing the weight of the MDL structural part (the lower its value, the greater the complexity of the resulting networks),  DL(B) the number of bits to represent the Bayesian network B (graph and probabilities), and
 DL(DB) the number of bits to represent the dataset D given the Bayesian network B.

The number of bits to represent a Bayesian network is equal to the number of bits to represent the structure plus the number of bits to represent the probability distributions. where G refers to the Graphical structure, and P to the set of Probability tables. 
The coding of the structure implies the identification of each node plus its parents. where 
The number of bits to represent the probability distributions is proportional to the number of cells of the conditional probability tables. where  is the number of states of random variable ,
 p is the probability associated with the cell.
As this probability is not known prior to learning the network, we are using the following classical heuristic: where N is the number of observations in the data set. 


The number of bits for representing the data given the Bayesian network is inversely proportional to the probability of the observations returned by the model.
where  is the ndimensional observation described in row j, and
 is the joint probability of this observation returned by the Bayesian network B.
The chain rule allows rewriting this equation with: 

For each candidate Bayesian network, which is generated during the search and evaluated with the MDL score, the corresponding parameters are computed using Maximum Likelihood Estimation.