Inference | Parameter Updating
A Bayesian network compactly represents the joint probability distribution of the domain defined by its variables. The qualitative and quantitative design of the network is based on the analysis of particles (observations) sampled from this domain. In expert-based modeling, the particles come from the experience of the experts. In the machine learning framework, the particles have been systematically stored in a structured data set.
The qualitative design consists of adding arcs between nodes to represent direct probabilistic relationships. In expert-based modeling, arcs can be defined via brainstorming sessions. When a structured data set is available, the particles are analyzed in order to automatically discover the probabilistic relationships.
The quantitative design consists of setting the probability distribution associated with each node. In BayesiaLab, the distributions are represented by default with tables. For root nodes, i.e. nodes that do not have any incoming arc, the table describes the marginal distribution, i.e. one probability per state of the node. For nodes with parents, the table describes a conditional probability distribution, i.e. one distribution per combination of the parents' states.
In expert-based modeling, these probabilities can be collectively estimated via brainstorming sessions by using BEKEE. In the machine learning framework, probabilities are estimated with the maximum likelihood estimation method, where the probability of each state is its observed frequency in the data set. This method is purely frequentist. However, it is possible to make this method Bayesian by mixing the observed particles (those described in the data set) with virtual particles defined by Dirichlet priors.
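The mixing of observed and virtual particles described above can be sketched as follows. This is a minimal illustration, not BayesiaLab's implementation: the function name `estimate_distribution` and its parameters are hypothetical, and the virtual particles are assumed to be spread over the states according to `prior_probs`.

```python
from collections import Counter

def estimate_distribution(observed, states, prior_probs=None, prior_weight=0.0):
    """Estimate P(state) from observed particles mixed with virtual particles.

    With prior_weight = 0 this is the purely frequentist maximum likelihood
    estimate (observed frequencies). With prior_weight > 0, that many virtual
    particles, distributed according to prior_probs, are added to the counts.
    """
    counts = Counter(observed)
    if prior_probs is None:
        # Assumption: default to an uninformative (uniform) prior.
        prior_probs = {s: 1.0 / len(states) for s in states}
    total = len(observed) + prior_weight
    return {s: (counts[s] + prior_weight * prior_probs[s]) / total for s in states}

data = ["Female"] * 7 + ["Male"] * 3
mle = estimate_distribution(data, ["Female", "Male"])
smoothed = estimate_distribution(data, ["Female", "Male"], prior_weight=10)
print(mle)
print(smoothed)
```

With 10 uniform virtual particles, the estimate for Female moves from the observed frequency 7/10 toward the prior, to 12/20.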
Prior to version 8.0, BayesiaLab offered two ways to define Dirichlet priors:
- Learning | Generate Prior Samples: creation of virtual particles (prior samples) by using the joint probability distribution encoded by the current Bayesian network;
- Edit | Edit Smoothed Probability Estimation: creation of virtual particles representing uninformative priors, i.e. particles generated by using the uniform joint distribution. This distribution is encoded with a fully unconnected Bayesian network, with uniform distributions for all the nodes. The particles are thus spread uniformly across the entire joint probability distribution, defining a prior knowledge that stipulates that everything is possible.
As these virtual particles are mixed with those described in the data set, Dirichlet priors not only impact the estimation of the probabilities, but also bias the structural search.
This new feature offers a third way to define Dirichlet priors. It allows taking into account the observations set via the monitors for updating the quantitative part of the current Bayesian network. The particles described by the observations are used to update the probability distributions of the observed nodes. Given that the particles do not need to be fully observed, the probability tables of the unobserved ancestors of the observed nodes can also be updated.
The joint probability distribution encoded by the current Bayesian network is used to define the Dirichlet priors. The confidence of the prior is set by using two new parameters that have been associated with each row of the probability tables:
- Prior Weight, which defines the number of virtual particles that will be mixed with the particle described via the monitors;
- Discount, whose values are less than or equal to 1, used to gradually reduce the number of virtual particles that will be mixed with the particle described via the monitors.
Let's start with a single node. As illustrated below, our prior for Sex is a uniform distribution.
The new tab Updating allows setting Prior Weight and Discount.
Prior Weight is set to 100. This means that we consider our prior equivalent to a population of 100 persons, 50 males and 50 females.
Discount is set to its default value 1. This indicates that no particles will be forgotten (monitor-defined particles and prior population).
Upon running Parameter Updating, you are prompted to choose whether to use the Prior Weights that have been defined in the Node Editor or to set the same value for all the nodes.
When there is a data set associated with the network, it is also possible to estimate the Prior Weights directly from the particles described in the data.
Upon validation of your choice, a new set of tools is added to the Toolbar:
- : triggers the updating with the current set of evidence;
- : resets the set of evidence and cancels the updating that has been done;
- : toggles whether the nodes defined as Not Observable are observed (only when evidence is read from a scenario or a data set);
- : saves the updated tables and exits from the Parameter Updating tool;
- : exits from the Parameter Updating tool without validating the updated tables.
The text field indicates how many particles have been taken into account for updating the tables.
Let's set evidence on Female. As indicated by Joint Probability, the probability of this observation is 50%, based on the current network.
Upon clicking , the particle is mixed with the 100 virtual particles, and we now have a population of 51 females and 50 males. The probability table is updated with the maximum likelihood estimation on this new population. The new probability of Female is thus 50.5% (51/101), as indicated by Joint Probability.
Upon clicking a second time on with the same evidence, the probability of Female goes to 50.98% (52/102):
Now suppose that we set evidence on Male and click on . The probability of Male increases from 49.02% to 49.51% (51/103).
Upon clicking on we are prompted to confirm our validation of the updating.
We thus got the following marginal distribution:
The Node Editor shows that the Prior Weight is now equal to 103:
- 100 coming from the defined virtual particles
- 3 coming from the manually observed particles
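The three updates above can be replayed with a few lines of Python. This is only a sketch of the arithmetic with Discount left at 1; the helper names `update_counts` and `to_probabilities` are hypothetical.

```python
def update_counts(counts, observed_state):
    """Add one fully observed particle to the population (Discount = 1)."""
    updated = dict(counts)
    updated[observed_state] += 1
    return updated

def to_probabilities(counts):
    """Maximum likelihood estimate: each count divided by the population size."""
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

# Prior Weight 100 on a uniform prior: 50 virtual females, 50 virtual males.
counts = {"Female": 50.0, "Male": 50.0}
history = []
for evidence in ["Female", "Female", "Male"]:
    counts = update_counts(counts, evidence)
    history.append(to_probabilities(counts))

for step, probs in enumerate(history, start=1):
    print(step, {s: round(100 * p, 2) for s, p in probs.items()})
print("Prior Weight:", sum(counts.values()))
```

The population size after the loop is the new Prior Weight, i.e. the 100 virtual particles plus the 3 observed ones.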
Had we used a discount of 0.75, the updated table would have been the following:
with a Prior Weight after these 3 manually observed particles equal to 44.5:
- Prior Weight #1: (100 * 0.75) + 1 = 76
- Prior Weight #2: (76 * 0.75) + 1 = 58
- Prior Weight #3: (58 * 0.75) + 1 = 44.5
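The Prior Weight sequence above can be reproduced with the sketch below. It assumes that the Discount multiplies every existing count before the new particle is added; the text only confirms this recursion for the total weight, so the per-state handling and the helper name `discounted_update` are assumptions.

```python
def discounted_update(counts, observed_state, discount):
    """Shrink every existing count by `discount`, then add the new particle."""
    updated = {state: n * discount for state, n in counts.items()}
    updated[observed_state] += 1.0
    return updated

# Same starting point as before: 50 virtual females, 50 virtual males.
counts = {"Female": 50.0, "Male": 50.0}
weights = []
for evidence in ["Female", "Female", "Male"]:
    counts = discounted_update(counts, evidence, discount=0.75)
    weights.append(sum(counts.values()))

print(weights)  # Prior Weight after each of the three particles
```

With a Discount below 1, older particles (virtual or observed) lose influence at every step, so recent observations weigh more in the updated table.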
Let's now take the following network.
Choosing an overall Prior Weight of 100 defines a Prior Weight of 100 for Sex, and spreads this weight across the conditional distributions of Treatment.
In Modeling Mode, Prior Weights can be locally edited by using the Node Editor. It is also possible to define them globally for the node by using the Node Contextual Menu | Properties | Prior Weights. In that case, the Prior Weight is spread uniformly across all the conditions.
When initialized in Validation Mode upon running Parameter Updating, the Prior Weights are spread based on the joint probability of each condition.
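As a rough illustration of this spreading, the sketch below gives each row of a conditional table a share of the node's Prior Weight proportional to the joint probability of its condition. The function name `spread_prior_weight` is hypothetical, and the proportional rule is our reading of the sentence above rather than a documented formula.

```python
def spread_prior_weight(total_weight, condition_probs):
    """Give each row of a conditional table a share of the node's Prior Weight."""
    return {cond: total_weight * p for cond, p in condition_probs.items()}

# Joint probability of each parent configuration of Treatment (here, just Sex).
sex_probs = {"Male": 0.5, "Female": 0.5}
row_weights = spread_prior_weight(100, sex_probs)
print(row_weights)
```

With a uniform Sex distribution, each of the two rows of Treatment receives half of the overall weight.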
Discounts can be locally edited by using the Node Editor. It is also possible to define them globally for the node by using the Node Contextual Menu | Properties | Discounts. In that case, the Discounts are the same for all the conditions.
Below is the result of the update after mixing our virtual particles with a particle corresponding to a male that did not take the treatment.
As we can see below, the table of Treatment has only been updated for the condition Male.
Now suppose that instead of having a complete description of the particle, we simply observed that the person has taken the treatment.
As we can see, even though we do not know the sex, knowing the person took the treatment changed the distribution of Sex. The new particle will thus be split according to the posterior probability of Sex.
Below is the result of the update after mixing our virtual particles with this particle. The entire table of Treatment has been updated.
The distribution of Sex has also been updated.
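The splitting of a partially observed particle can be sketched as follows. The counts are purely illustrative (the actual tables of the example are not reproduced here), the helper names are hypothetical, and Discount is assumed to be 1.

```python
def posterior_sex(sex_counts, treatment_counts, observed_treatment):
    """P(Sex | Treatment = observed_treatment) from the current count tables."""
    total_sex = sum(sex_counts.values())
    joint = {}
    for sex, n_sex in sex_counts.items():
        row = treatment_counts[sex]
        joint[sex] = (n_sex / total_sex) * (row[observed_treatment] / sum(row.values()))
    evidence_prob = sum(joint.values())
    return {sex: p / evidence_prob for sex, p in joint.items()}

def add_partial_particle(sex_counts, treatment_counts, observed_treatment):
    """Split one particle across the states of the unobserved Sex node."""
    split = posterior_sex(sex_counts, treatment_counts, observed_treatment)
    for sex, share in split.items():
        sex_counts[sex] += share                           # updates the table of Sex
        treatment_counts[sex][observed_treatment] += share  # updates every row of Treatment
    return split

# Hypothetical virtual population; the numbers are illustrative only.
sex_counts = {"Male": 50.0, "Female": 50.0}
treatment_counts = {
    "Male": {"Yes": 20.0, "No": 30.0},
    "Female": {"Yes": 30.0, "No": 20.0},
}
split = add_partial_particle(sex_counts, treatment_counts, "Yes")
print(split)
```

Because each sex receives a fraction of the particle, both rows of the Treatment table and the table of Sex change, which matches the behavior described above.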
Now suppose that instead of having hard evidence on the particle, we have an uncertain partial observation. For example, we just see a pill box next to the person, which increases our belief that he/she has taken the treatment, say 75%.
This changes the posterior distribution of Sex. The new particle will thus be split to take into account the uncertainty on both Sex and Treatment.
Below is the result of the update after mixing our virtual particles with this particle. The entire table of Treatment has been updated.
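One way to picture this double split is Jeffrey's rule: the particle is first divided between the states of Treatment according to the required belief (75% / 25%), and each part is then divided between the states of Sex according to the posterior of Sex given that state. The sketch below uses hypothetical probabilities and a hypothetical function name; it is not a description of BayesiaLab's internal algorithm.

```python
def split_soft_particle(p_sex, p_treatment_given_sex, target_treatment):
    """Weights of one particle over (Sex, Treatment) cells under soft evidence."""
    weights = {}
    for t, q in target_treatment.items():
        # Joint P(sex, t) under the current network, then P(sex | t).
        joint = {s: p_sex[s] * p_treatment_given_sex[s][t] for s in p_sex}
        p_t = sum(joint.values())
        for s, p in joint.items():
            weights[(s, t)] = q * p / p_t  # q(t) * P(sex | t)
    return weights

# Hypothetical current network; the numbers are illustrative only.
p_sex = {"Male": 0.5, "Female": 0.5}
p_treatment_given_sex = {
    "Male": {"Yes": 0.4, "No": 0.6},
    "Female": {"Yes": 0.6, "No": 0.4},
}
weights = split_soft_particle(p_sex, p_treatment_given_sex, {"Yes": 0.75, "No": 0.25})
posterior_male = weights[("Male", "Yes")] + weights[("Male", "No")]
print(weights, posterior_male)
```

The four weights sum to 1, so the particle still counts as a single observation, spread over all the cells of the Treatment table.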
Setting the Prior Weight to 0 indicates that you do not want to update the corresponding node.
At least one node must have a Prior Weight greater than 0 for the Parameter Updating feature to be usable.
Types of Evidence
All the BayesiaLab types of evidence can be used to describe the particles:
- Hard Positive Evidence: no uncertainty on the state of the variable, i.e. all the likelihoods of the other states are set to 0%;
- Hard Negative Evidence: the likelihood of one state is set to 0% while the other ones are set to 100%;
- Likelihood Evidence: a likelihood distribution is associated with the node. To be informative, the distribution should contain at least one likelihood that differs from the other ones;
- Probabilistic Evidence: a likelihood distribution is computed for getting the required probability distribution. It's possible to define a static likelihood distribution (computed just once, when the evidence is set), or a dynamic likelihood distribution (updated after each new piece of evidence for maintaining constant the required probability distribution);
- Numerical Evidence: a probability distribution is estimated to get the required expected value (with the MinXEnt, Binary, or Shift estimation methods). A likelihood distribution is then computed for getting the computed probability distribution. It's possible to define a static likelihood distribution (computed just once, when the evidence is set), or a dynamic likelihood distribution (updated after each new piece of evidence for maintaining constant the required probability distribution or required expected value).
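For a single node, the first four types of evidence can all be expressed as a likelihood vector applied to the node's current distribution. The sketch below illustrates this on a hypothetical three-state node; the function names are ours, and the ratio used for Probabilistic Evidence is the standard single-node construction, assumed here rather than taken from BayesiaLab's documentation.

```python
def apply_likelihood(prior, likelihood):
    """Posterior over a node's states: prior times likelihood, renormalized."""
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

def likelihood_for_target(prior, target):
    """Likelihoods that turn `prior` into `target` (requires prior[s] > 0)."""
    return {s: target[s] / prior[s] for s in prior}

prior = {"Low": 0.2, "Medium": 0.5, "High": 0.3}
hard_positive = {"Low": 0.0, "Medium": 1.0, "High": 0.0}  # the state is Medium
hard_negative = {"Low": 1.0, "Medium": 1.0, "High": 0.0}  # the state is not High
likelihood = {"Low": 1.0, "Medium": 0.5, "High": 0.5}     # Likelihood Evidence

after_positive = apply_likelihood(prior, hard_positive)
after_negative = apply_likelihood(prior, hard_negative)
after_likelihood = apply_likelihood(prior, likelihood)

# Probabilistic Evidence: find the likelihoods that yield a required distribution.
target = {"Low": 0.1, "Medium": 0.6, "High": 0.3}
after_probabilistic = apply_likelihood(prior, likelihood_for_target(prior, target))
print(after_positive, after_negative, after_likelihood, after_probabilistic)
```

A likelihood vector whose entries are all equal leaves the distribution unchanged, which is why an informative Likelihood Evidence needs at least one entry that differs from the others.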
Source of Observations
There are three evidence sources for describing the particles:
- Manual: utilization of the monitors in the Monitor Panel, or the new interactive rendering of the nodes (Monitors, Gauges or Bars) in the Graph Panel; all types of evidence can be used;
- Evidence Scenario File: the scenarios can use all types of evidence;
- Database: the observations are only based on Hard Positive Evidence.
When the chosen evidence source is the Evidence Scenario File or the Database, two additional tools are added in the toolbar compared to the manual mode:
- : carries out the update by using all the particles contained in the evidence source in a sequential way;
- : carries out the update by using all the particles contained in the evidence source in batch, until the entropy of the observations converges. Note that, once done, it returns to the particle that was described when the process was triggered.
While the text field is passive in the Manual Mode (i.e. it indicates how many particles have been taken into account for updating the tables), it can be utilized in the other two modes to indicate the index of the last particle to be processed.
Smooth Probability Estimation
The option available via Edit | Edit Smoothed Probability Estimation allows generating particles from the uniform joint distribution. Interestingly, the exact same result can be achieved by setting uniform distributions in all the tables of the network (marginal and conditional), and then using the Database source for updating the parameters.
Suppose we have the network below, and an associated data set containing 100 samples.
In order to implement the Smooth Probability Estimation, we first set the three tables to uniform distributions:
We leave Discount at its default value of 1, then select Database as the Evidence Source, and set our Prior Weight to 1.
Upon clicking , all the particles described in the data set are utilized for updating the probability distributions.
Saving the updated tables then produces tables that are exactly identical to those you would get using Smooth Probability Estimation.
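For a single root node, the equivalence can be checked with the sketch below. It compares a particle-by-particle update that starts from one uniform virtual particle (Prior Weight 1, Discount 1) with the closed-form smoothed estimate. The data and function names are hypothetical, and the sketch covers only fully observed particles on one node.

```python
def sequential_update(states, samples, prior_weight=1.0):
    """Start from a uniform virtual population, then add particles one by one."""
    counts = {s: prior_weight / len(states) for s in states}
    for sample in samples:
        counts[sample] += 1.0
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}

def closed_form(states, samples, prior_weight=1.0):
    """Smoothed estimate: (count + prior_weight / k) / (N + prior_weight)."""
    n = len(samples)
    k = len(states)
    return {s: (samples.count(s) + prior_weight / k) / (n + prior_weight) for s in states}

states = ["a", "b", "c"]
samples = ["a"] * 60 + ["b"] * 40  # 100 samples; state "c" is never observed
updated = sequential_update(states, samples)
smoothed = closed_form(states, samples)
print(updated)
print(smoothed)
```

Both estimates give the unobserved state "c" a small non-zero probability, which is the purpose of the smoothing.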
Evidence Instantiation, introduced in version 5.0.4, creates a network based on the current network and a set of evidence. While the original network and the evidence define a subspace in the original joint probability distribution, the new instantiated network is entirely dedicated to the representation of that subspace. This is very useful in at least two scenarios:
- You want to set evidence on a subset of nodes, while keeping these nodes in the set of “drivers” (e.g. Target Optimization) or analyzed nodes (e.g. Target Analysis),
- You have designed your network (usually by using expert knowledge), and some of its marginal distributions do not exactly match the knowledge you have about them. So you want to automatically adjust the conditional probability distributions to get the required marginal distributions.
In the case of evidence set on a common child of two marginally independent nodes, the exact instantiation was not possible because of the conditional dependency.
It is now possible to use Parameter Updating to obtain a much better instantiation for any set of evidence.
Suppose we have the network below
and for some reason, we want to get the following marginal distributions, set with Fixed Numerical Evidence:
Below is the instantiated network obtained with Evidence Instantiation:
As we can see, whereas the distributions are correct for Factor_2 and Factor_5, the distribution of the collider Factor_4 is not the requested one.
We first need to store this set of numerical evidence in the scenario file.
We then set a Discount of 0 for all the nodes so as not to mix our particle with any prior. We then select Evidence Scenario File as the source of evidence. Note that Prior Weight does not matter because Discount is zero.
Clicking on the batch update tool uses these three pieces of evidence repeatedly, until the entropy of the observations converges.
As we can see below, the updated model perfectly represents the requested marginal distributions.
When the data set contains missing values, or when the network has hidden nodes (i.e. with 100% missing values), the estimation of the probabilities is done by using an Expectation-Maximization (EM) algorithm. It basically consists of iterating through the following two steps until convergence:
- Expectation: utilization of the current Bayesian network for completing the not fully observed particles;
- Maximization: utilization of the Maximum Likelihood Estimation algorithm for updating the probabilities with the completed particles.
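The two steps can be sketched on a two-node network Sex → Treatment in which Sex is sometimes missing. This is a generic, minimal EM loop written for illustration; the function name, the data, and the stopping rule (largest change in any probability below `tol`) are assumptions, not BayesiaLab's implementation.

```python
def em_estimate(records, max_iter=200, tol=1e-9):
    """records: list of (sex, treatment) pairs; sex may be None (missing)."""
    sexes, treatments = ["Male", "Female"], ["Yes", "No"]
    # Start from uniform tables.
    p_sex = {s: 0.5 for s in sexes}
    p_treat = {s: {t: 0.5 for t in treatments} for s in sexes}
    for _ in range(max_iter):
        sex_counts = {s: 0.0 for s in sexes}
        treat_counts = {s: {t: 0.0 for t in treatments} for s in sexes}
        for sex, treatment in records:
            if sex is not None:
                shares = {s: 1.0 if s == sex else 0.0 for s in sexes}
            else:
                # Expectation: complete the particle with P(Sex | Treatment).
                joint = {s: p_sex[s] * p_treat[s][treatment] for s in sexes}
                norm = sum(joint.values())
                shares = {s: joint[s] / norm for s in sexes}
            for s, w in shares.items():
                sex_counts[s] += w
                treat_counts[s][treatment] += w
        # Maximization: maximum likelihood estimation on the completed particles.
        new_p_sex = {s: sex_counts[s] / len(records) for s in sexes}
        new_p_treat = {
            s: {t: treat_counts[s][t] / sex_counts[s] for t in treatments}
            for s in sexes
        }
        delta = max(
            [abs(new_p_sex[s] - p_sex[s]) for s in sexes]
            + [abs(new_p_treat[s][t] - p_treat[s][t]) for s in sexes for t in treatments]
        )
        p_sex, p_treat = new_p_sex, new_p_treat
        if delta < tol:
            break
    return p_sex, p_treat

# Hypothetical data set: two records have a missing value for Sex.
records = (
    [("Male", "Yes")] * 3 + [("Male", "No")]
    + [("Female", "No")] * 3 + [("Female", "Yes")]
    + [(None, "Yes")] * 2
)
p_sex, p_treat = em_estimate(records)
print(p_sex)
print(p_treat)
```

The two incomplete records are attributed mostly to Male, because the complete records suggest that males take the treatment more often.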
The Learning | Parameter Estimation method is used by all BayesiaLab machine learning algorithms, and as such, it is entirely data driven, i.e. it does not take into account the probability distributions that have been defined manually.
As of version 8.0, you can use Parameter Updating when you want EM to take your priors into account when estimating the probabilities of your network.
Suppose we have the network below for which we have data for all nodes, except for Hidden Cause.
The only expert knowledge we have is expressed in the conditional distribution of Measure below. The "a" measures can only come from Context1, whereas the "b" measures can only come from Context2.
Without any expert knowledge on the other nodes, we set uniform distributions for all the other nodes.
We then set a Discount of 0 for all nodes to use only the particles described in the data set for estimating the probabilities. We then select Database as the source of evidence. Note that Prior Weight does not matter because Discount is zero.
Clicking on the batch update tool uses all the particles described in the data set for updating the probabilities, until the entropy of the observations converges.
We get the following updated network.
As we can see below, the prior knowledge has been taken into account during the update of the probability tables.