
Question

What algorithms does BayesiaLab use for learning network parameters?

Answer

The parameters of a network are computed using Maximum Likelihood Estimation, i.e., the probability of each state corresponds to its observed frequency in the dataset.
Consider a simple network with two nodes, in which PA is the parent of X.

The marginal probability distribution of PA is estimated as:

$$P(PA = pa) = \frac{N(pa)}{N}$$

where N(.) represents the number of occurrences of the specified configuration in the dataset, and N is the total number of observations.
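As a minimal sketch of this estimate, the Python snippet below counts how often each state of PA occurs and divides by the total number of rows. The function name `estimate_marginal` and the sample data are hypothetical, for illustration only; this is not BayesiaLab's implementation.

```python
from collections import Counter

def estimate_marginal(samples):
    """Maximum likelihood estimate of P(PA = pa) = N(pa) / N."""
    counts = Counter(samples)   # N(pa) for each observed state
    total = len(samples)        # N, the total number of observations
    return {state: n / total for state, n in counts.items()}

# Hypothetical observed values of PA
pa_values = ["a", "b", "a", "a", "b", "c", "a", "b"]
print(estimate_marginal(pa_values))  # {'a': 0.5, 'b': 0.375, 'c': 0.125}
```

Note that a state that never appears in the data gets no entry at all, i.e., an estimated probability of zero.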

The conditional probability distribution of X|PA is estimated as:

$$P(X = x \mid PA = pa) = \frac{N(x, pa)}{N(pa)}$$
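The conditional estimate divides the count of each joint configuration N(x, pa) by the count of its parent configuration N(pa). The sketch below assumes the data is given as hypothetical (pa, x) pairs; the function name `estimate_conditional` is illustrative only.

```python
from collections import Counter

def estimate_conditional(rows):
    """Maximum likelihood estimate of P(X = x | PA = pa) = N(x, pa) / N(pa)."""
    joint = Counter(rows)                     # N(x, pa), keyed by (pa, x)
    parent = Counter(pa for pa, _ in rows)    # N(pa)
    cpt = {}
    for (pa, x), n in joint.items():
        cpt.setdefault(pa, {})[x] = n / parent[pa]
    return cpt

# Hypothetical (PA, X) observations
rows = [("a", "yes"), ("a", "no"), ("a", "yes"), ("b", "no"), ("b", "no"), ("a", "yes")]
print(estimate_conditional(rows))  # {'a': {'yes': 0.75, 'no': 0.25}, 'b': {'no': 1.0}}
```

With pure counting, unobserved combinations such as (b, yes) get probability zero, and a parent state that never occurs has no distribution at all. This is one practical motivation for the priors discussed next.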

Priors can also be taken into account when estimating the parameters. Priors reflect the analyst's a priori knowledge of the domain, i.e., expert knowledge. See also Prior Knowledge for Structural Learning.

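These priors are expressed with an analyst-specified initial Bayesian network (structure and parameters), plus an analyst-specified number of Prior Samples. The number of Prior Samples represents the analyst's own degree of confidence in the priors. The conditional probability distribution of X|PA is then estimated as:

$$P(X = x \mid PA = pa) = \frac{N(x, pa) + M_0 \, P_0(x, pa)}{N(pa) + M_0 \, P_0(pa)}$$

where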

  • M0 is the degree of confidence in the prior.

  • P0 is the joint probability returned by the prior Bayesian network.

These two terms are used to generate virtual samples that are subsequently combined with the observed samples from the dataset.
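A minimal sketch of this combination, assuming each configuration receives M0 × P0(configuration) virtual samples that are simply added to the observed counts. The function `estimate_with_prior`, the data, and the prior values are hypothetical; the prior is given directly as a joint table P0(pa, x) rather than computed from a Bayesian network.

```python
from collections import Counter

def estimate_with_prior(rows, prior_joint, m0):
    """Estimate P(X | PA) from observed counts plus M0 * P0 virtual samples.

    rows        -- observed (pa, x) pairs
    prior_joint -- dict mapping (pa, x) to P0(pa, x); assumed to cover every
                   configuration and to sum to 1
    m0          -- number of Prior Samples (confidence in the prior)
    """
    joint = Counter(rows)                     # N(x, pa)
    parent = Counter(pa for pa, _ in rows)    # N(pa)
    prior_parent = Counter()                  # P0(pa), summed over the states of X
    for (pa, _), p in prior_joint.items():
        prior_parent[pa] += p
    cpt = {}
    for (pa, x), p0 in prior_joint.items():
        numerator = joint[(pa, x)] + m0 * p0                 # real + virtual samples
        denominator = parent[pa] + m0 * prior_parent[pa]
        cpt.setdefault(pa, {})[x] = numerator / denominator
    return cpt

# Hypothetical data and prior, for illustration only
rows = [("a", "yes"), ("a", "no"), ("a", "yes"), ("b", "no"), ("b", "no"), ("a", "yes")]
prior = {("a", "yes"): 0.3, ("a", "no"): 0.2, ("b", "yes"): 0.1, ("b", "no"): 0.4}
cpt = estimate_with_prior(rows, prior, m0=10)
print(cpt)
```

Here (b, yes) is never observed, yet it receives 1/7 instead of 0 because of the virtual samples. Setting `m0=0` recovers the plain maximum likelihood estimate, and a larger `m0` pulls the result closer to P0.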

Priors are defined by selecting Learning | Generate Prior Samples.

The current Bayesian network is used to compute P0.

A text field lets you set M0.

The existence of a Virtual Database is indicated by an icon in the lower right corner of the graph window, next to the "real dataset" icon.

Right-clicking on the Virtual Database icon displays the structure of the prior knowledge that was used for generating the virtual samples. The virtual samples will be combined with the observed ("real") samples during the learning process.

 

Smoothed Probability Estimation lets you define prior knowledge such that all variables are marginally independent (a fully unconnected network) and the marginal probability distribution of every node is uniform. For instance, if the number of Prior Samples is set to 1, one observation ("occurrence") is spread across the states of each node, assigning a fraction of an observation (1/k for a node with k states) to each of its states.

A text field specifies the number of Prior Samples.
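As a sketch of what this smoothing does to X|PA, assume the uniform, fully unconnected prior is combined with the data through virtual samples as described in this section. Then P0(pa, x) = 1/(k_PA × k_X) and P0(pa) = 1/k_PA, where k_PA and k_X are the numbers of states. The function `smoothed_conditional` and its arguments are hypothetical names for illustration, not a BayesiaLab API.

```python
from collections import Counter

def smoothed_conditional(rows, pa_states, x_states, m0=1.0):
    """Estimate P(X | PA) with a uniform, fully unconnected prior worth m0 samples.

    Under that prior, P0(pa, x) = 1 / (k_pa * k_x) and P0(pa) = 1 / k_pa.
    """
    joint = Counter(rows)                     # N(x, pa)
    parent = Counter(pa for pa, _ in rows)    # N(pa)
    k_pa, k_x = len(pa_states), len(x_states)
    cpt = {}
    for pa in pa_states:
        denominator = parent[pa] + m0 / k_pa
        cpt[pa] = {
            x: (joint[(pa, x)] + m0 / (k_pa * k_x)) / denominator
            for x in x_states
        }
    return cpt

rows = [("a", "yes"), ("a", "no"), ("a", "yes"), ("b", "no"), ("b", "no"), ("a", "yes")]
smoothed = smoothed_conditional(rows, ["a", "b", "c"], ["yes", "no"], m0=1.0)
print(smoothed["c"])  # PA = c is never observed, so X stays uniform
```

Every configuration now has a non-zero probability, and an unobserved parent state falls back to the uniform distribution. With a single Prior Sample, the observed counts dominate as soon as a parent state has been seen a few times.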