Edit | Edit Structural Priors
Graph Contextual Menu | Edit Structural Priors
When expert knowledge is available, incorporating it into the learning process can improve the quality of the resulting models.
BayesiaLab offers various solutions for specifying prior knowledge:
- Structural Constraints: these priors bias only the structure, not the estimation of the probabilities. There are three ways to define them:
  - Creation of Fixed Arcs: the unsupervised structural learning algorithms that can start with "Delete Unfixed Arcs" or "Keep Structure" (i.e. EQ, TabooEQ, and Taboo) keep the arcs that have been fixed, even when the data does not support the corresponding relationship.
  - Creation of Forbidden Arcs: forbidden arcs are never added by a structural learning algorithm.
  - Temporal Indices: a node whose temporal index is strictly greater than the temporal index of another node cannot be defined as its ancestor during structural learning.
- Dirichlet Priors: virtual particles sampled from a fully specified Bayesian network that represents the knowledge of the expert. There are four ways to define such priors:
  - Uniform Prior Samples: the Bayesian network used for generating the particles describes a joint probability distribution in which all nodes are independent and uniformly distributed. This so-called uninformative prior specifies that everything is possible, i.e. it prevents zeros in the probability tables. Uniform priors are mainly used to bias the estimation of the probabilities. However, they can also have an impact on the machine-learned structure.
  - Parameter Updating and Hyperparameter Updating: the current fully specified Bayesian network is used in conjunction with particles (partially) described via the monitors, data set, or evidence scenario file, to update the local probability distributions associated with some selected nodes. This feature cannot be used with a structural learning algorithm.
  - Prior Samples: the current fully specified Bayesian network is used to generate virtual particles that are mixed with the particles described in the data set associated with the network. The augmented set of particles is used by all learning algorithms and therefore has an impact on both the structure and the local probability distributions.
- Structural Coefficient: all BayesiaLab structural learning algorithms use the Minimum Description Length (MDL) score to evaluate the quality of the network given the associated data set. Minimizing this score is approximately equivalent to maximizing the posterior probability of the network given the data. The approximation comes from the prior over networks, which is replaced by a heuristic that assigns each network a probability inversely proportional to its complexity. By default, the prior and the likelihood terms have the same weight (which is quite conservative). There are two ways to change the weight of the heuristic used for approximating the priors:
  - Overall Structural Coefficient: this parameter, with values between 0 and 150, changes the number of particles that are perceived during structural learning. Choosing a value less than 1 is equivalent to increasing the number of particles and therefore the sensitivity of the learning algorithms. Choosing a value greater than 1 is equivalent to reducing the number of particles and thus makes the learning algorithms more conservative. In effect, this parameter modifies the significance threshold a relationship must reach to be represented with an arc.
  - Local Structural Coefficients: MDL is a decomposable score, which means that the MDL score of the network is equal to the sum of the MDL scores of its nodes. A Local Structural Coefficient of less than 1 decreases the cost of adding connections to the associated node, and therefore increases the priors of the corresponding structures. Values greater than 1 increase the cost of adding arcs, and therefore decrease the priors of the corresponding networks.
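The Temporal Indices rule described above can be sketched as a simple admissibility check on a candidate arc. This is an illustrative Python sketch, not BayesiaLab code: the function name, the dictionary of indices, and the treatment of nodes without an index (assumed to be unconstrained) are all assumptions.

```python
def arc_respects_temporal_indices(parent, child, temporal_index):
    """Return False when the parent's temporal index is strictly greater
    than the child's, i.e. when the arc would violate the rule."""
    if parent not in temporal_index or child not in temporal_index:
        # Assumption: a node without a temporal index is unconstrained.
        return True
    return temporal_index[parent] <= temporal_index[child]


# Hypothetical node names and indices, for illustration only.
indices = {"Age": 0, "Smoking": 1, "Cancer": 2}
allowed = arc_respects_temporal_indices("Age", "Cancer", indices)
rejected = arc_respects_temporal_indices("Cancer", "Age", indices)
```

The rule is stated in terms of ancestors, not only direct parents. Checking each arc individually is sufficient only when every node on a path carries an index; with unindexed nodes in between, a full ancestor check would be needed.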
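To see why Uniform Prior Samples prevent zeros in the probability tables, consider a single distribution estimated from counts. The sketch below is a minimal illustration in Python; the function name and the way the virtual particles are spread evenly over the states are assumptions, not a description of BayesiaLab's internals.

```python
def estimate_with_uniform_prior(counts, prior_samples=1.0):
    """Estimate a distribution from observed counts mixed with
    `prior_samples` virtual particles spread uniformly over the states."""
    virtual_per_state = prior_samples / len(counts)
    total = sum(counts) + prior_samples
    return [(c + virtual_per_state) / total for c in counts]


observed = [8, 2, 0]  # the third state was never observed
without_prior = estimate_with_uniform_prior(observed, prior_samples=0.0)
with_prior = estimate_with_uniform_prior(observed, prior_samples=3.0)
```

Without virtual particles, the unobserved state gets probability zero. With three virtual particles, every state receives some probability mass, and the estimate is pulled slightly toward the uniform distribution.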
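The effect of the Overall Structural Coefficient can be illustrated with a toy MDL comparison between two structures over binary variables X and Y: one with the arc X -> Y and one without. The sketch assumes a score of the form alpha * DL(structure) + DL(data | structure), with alpha as the structural coefficient, and uses the common textbook penalty of 0.5 * log2(N) bits per free parameter for the structure term. Both the penalty and all function names are assumptions made for illustration; BayesiaLab's exact encoding may differ.

```python
import math


def data_description_length(counts, with_arc):
    """Bits needed to encode the data under maximum-likelihood estimates."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    bits = 0.0
    for x, row in enumerate(counts):
        for y, c in enumerate(row):
            if c == 0:
                continue
            p_x = row_tot[x] / n
            # With the arc, Y depends on X; without it, Y is marginal.
            p_y = c / row_tot[x] if with_arc else col_tot[y] / n
            bits -= c * math.log2(p_x * p_y)
    return bits


def structure_description_length(counts, with_arc):
    """Assumed penalty: 0.5 * log2(N) bits per free parameter."""
    n = sum(sum(row) for row in counts)
    r_x, r_y = len(counts), len(counts[0])
    free_params = (r_x - 1) + (r_y - 1) * (r_x if with_arc else 1)
    return 0.5 * math.log2(n) * free_params


def mdl_score(counts, with_arc, alpha=1.0):
    """alpha weights the structure term, as the structural coefficient does."""
    return (alpha * structure_description_length(counts, with_arc)
            + data_description_length(counts, with_arc))


def arc_lowers_mdl(counts, alpha=1.0):
    return mdl_score(counts, True, alpha) < mdl_score(counts, False, alpha)


# counts[x][y]: a weak dependency between X and Y over 100 particles.
weak = [[30, 20], [20, 30]]
default_choice = arc_lowers_mdl(weak, alpha=1.0)
sensitive_choice = arc_lowers_mdl(weak, alpha=0.5)
```

With these counts, the arc saves about 2.9 bits on the data term but costs about 3.3 bits on the structure term when alpha is 1, so it is rejected. Lowering alpha to 0.5 halves the structure cost and the arc is accepted, which mirrors the increased sensitivity described above.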
New Feature: Structural Priors
As of version 9.0, prior knowledge can be defined at the arc level, i.e. even more locally than what is possible with Local Structural Coefficients, which operate at the node level.
The prior values can range between -1.0 and 1.0, with -1.0 strongly increasing the cost of adding the link, and 1.0 strongly reducing it.
-1.0 is not equivalent to forbidding the arc. If the relationship in the data is strong, the arc can still be added.
1.0 is not equivalent to forcing the arc. No arc will be added if the relationship in the data is too weak.
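The difference between a soft prior and a hard constraint can be illustrated with a toy model. The mapping below, which rescales the cost of an arc by a finite factor, is purely hypothetical and is not BayesiaLab's formula; the function names and the `strength` parameter are invented for illustration. It only shows why a finite rescaling never forbids or forces an arc.

```python
def adjusted_arc_cost(base_cost, prior, strength=4.0):
    """Hypothetical soft prior: scale the arc cost by strength ** (-prior).

    prior = -1.0 multiplies the cost by `strength`,
    prior = +1.0 divides it by `strength`, prior = 0.0 leaves it unchanged.
    """
    if not -1.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [-1.0, 1.0]")
    return base_cost * strength ** (-prior)


def arc_is_added_with_prior(data_gain, base_cost, prior=0.0):
    """The arc is added only when the data gain exceeds the adjusted cost."""
    return data_gain > adjusted_arc_cost(base_cost, prior)


# A strong relationship survives the most negative prior ...
strong_despite_negative = arc_is_added_with_prior(20.0, 4.0, prior=-1.0)
# ... and a very weak one is still rejected under the most positive prior.
weak_despite_positive = arc_is_added_with_prior(0.5, 4.0, prior=1.0)
```

Because the adjusted cost always stays finite and strictly positive, the data can still override the prior in both directions, unlike with Fixed or Forbidden Arcs.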
Clicking on a table header allows you to sort the table according to this dimension.
The color of the Orientation cell is identical for the arcs that have the same Prior for both directions.
These functions allow the use of Structural Priors Dictionaries.
Store Priors on Arcs
This function associates the Prior Values with the comments of the arcs represented in the current network.
This feature displays a preview of the priors that are defined. The thickness of a link is proportional to the absolute value of the prior. Positive priors are shown in blue, negative priors in red. When the priors differ between the two directions, each half of the link has its own thickness and color.
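The preview rules above can be sketched as a small mapping from a pair of directional priors to drawing styles. This is an illustrative Python sketch only: the function names, the `max_width` scale, and the choice of no color for a prior of 0 are assumptions.

```python
def half_link_style(prior, max_width=6.0):
    """Return (color, width) for one half of a link.

    Width is proportional to |prior|; blue for positive, red for negative.
    """
    if not -1.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [-1.0, 1.0]")
    if prior > 0:
        color = "blue"
    elif prior < 0:
        color = "red"
    else:
        color = None  # assumption: a zero prior is drawn without color
    return color, abs(prior) * max_width


def link_preview(prior_forward, prior_backward):
    """Style each half of the link from the prior of its own direction."""
    return half_link_style(prior_forward), half_link_style(prior_backward)


styles = link_preview(0.5, -1.0)
```

When both directions carry the same prior, the two halves get identical styles and the link appears uniform.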