Context

Analysis | Visual | Sensitivity | Confidence Intervals 

When the number of instances used to machine-learn the model is limited, it can be useful to analyze the confidence intervals associated with each estimated (conditional) distribution of the network.

The parameters of the model are estimated via Maximum Likelihood, by using the frequencies observed in the data set:

\hat{P}(x_i) = \frac{N(x_i)}{N(x)}

where:

  • \hat{P}(x_i) is the estimated probability
  • x_i is the state i of variable x
  • N(.) represents the number of occurrences of the argument in the data set.
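
To make this estimator concrete, here is a minimal sketch in Python, assuming the instances are available as a simple list of observed states (the function name and the sample data are illustrative and not part of the tool):

```python
from collections import Counter

def mle_distribution(observations):
    """Maximum-Likelihood estimate of P(x): the observed frequency of each state."""
    counts = Counter(observations)      # N(x_i) for every observed state x_i
    n_total = sum(counts.values())      # N(x), the total number of instances
    return {state: n / n_total for state, n in counts.items()}

# 1 occurrence of "x0" out of 10 instances -> P(x0) is estimated at 0.1
sample = ["x0"] + ["x1"] * 9
print(mle_distribution(sample))         # {'x0': 0.1, 'x1': 0.9}
```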

Suppose we have an estimate of 0.1 for x0. This estimate can be obtained with 1 instance out of 10, or with 1,000 instances out of 10,000. Clearly, these two scenarios are not equivalent: our intuition is that, in the second scenario, we should be more confident in our estimate.

Frequentist Estimation of Confidence Intervals

We estimate a confidence interval from the observed data for each parameter \hat{P}(x_i).

For a confidence level of 95%, the intervals are computed with the normal approximation of the binomial proportion:

\hat{P}(x_i) \pm z \sqrt{\frac{\hat{P}(x_i)\left(1 - \hat{P}(x_i)\right)}{N(x)}}

where z ≈ 1.96 is the 97.5% quantile of the standard normal distribution.

When there is no instance for a given state, the interval is defined with the Rule of Three as ±3/N.
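
The sketch below illustrates both the interval computation and the zero-count fallback, assuming the intervals follow the usual normal approximation of a binomial proportion (z ≈ 1.96 for 95%) and the Rule of Three described above; the software's exact formula may differ. Note how the same estimate of 0.1 yields a much narrower interval with 10,000 instances than with 10, matching the intuition given earlier.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution (assumption)

def confidence_interval(p_hat, n):
    """Approximate 95% interval for an estimated probability p_hat based on n instances."""
    if p_hat == 0.0:
        return (0.0, min(1.0, 3.0 / n))            # Rule of Three for unobserved states
    half_width = Z_95 * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))

# Same estimate, very different uncertainty:
print(confidence_interval(0.1, 10))      # 1 out of 10         -> roughly (0.000, 0.286)
print(confidence_interval(0.1, 10_000))  # 1,000 out of 10,000 -> roughly (0.094, 0.106)
```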

Note that using Smooth Probability Estimation removes the need for this heuristic, since no state is left with a zero count.
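
For illustration, here is one common way of spreading non-informative prior samples uniformly over the states so that no count is ever zero; Smooth Probability Estimation may use a different exact scheme, so treat this formula as an assumption.

```python
def smoothed_distribution(counts, prior_samples):
    """Estimate P(x) after adding `prior_samples` virtual instances spread uniformly
    over the states (illustrative formula; the tool's exact scheme may differ)."""
    n_states = len(counts)
    n_total = sum(counts.values()) + prior_samples
    return {state: (n + prior_samples / n_states) / n_total
            for state, n in counts.items()}

# 94 observed instances plus 5 non-informative prior samples, as in the example below:
counts = {"x0": 0, "x1": 30, "x2": 64}   # x0 is never observed, yet gets a non-zero estimate
print(smoothed_distribution(counts, 5))  # {'x0': ~0.017, 'x1': ~0.320, 'x2': ~0.663}
```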

Confidence Level

The confidence level is the frequency of confidence intervals that contain the actual parameter value. It can be set via Window | Preferences.

This confidence level applies to each parameter individually. For probability distributions, this means that the confidence level is only exact for binary variables, whose distributions are determined by a single free parameter.

For variables with more states, the confidence level associated with the estimation of the entire distribution decreases exponentially with the number of states, since all the parameter intervals have to cover their true values simultaneously.
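
A quick back-of-the-envelope computation makes this drop concrete, under the simplifying assumption that the per-parameter intervals behave independently (the actual joint coverage depends on the estimator):

```python
# Approximate probability that *all* intervals of a distribution cover their true
# parameters at once, assuming roughly independent 95% intervals per free parameter.
per_parameter = 0.95
for n_states in (2, 3, 5, 10):
    free_parameters = n_states - 1       # the last cell is determined by the others
    joint_coverage = per_parameter ** free_parameters
    print(f"{n_states:>2} states -> ~{joint_coverage:.1%} joint coverage")
# 2 states -> ~95.0%, 3 -> ~90.2%, 5 -> ~81.5%, 10 -> ~63.0%
```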

Monte Carlo Simulation

Once the confidence intervals have been estimated for every parameter in the Bayesian network, i.e. for every cell of the (conditional) probability tables, a Monte Carlo simulation is carried out to generate a set of networks. Each of these networks uses parameters sampled from within the confidence intervals.

These networks are then used to measure how the confidence intervals impact the represented joint probability distribution.
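
The following sketch illustrates the idea on a toy two-node network A → B: every cell of the probability tables is drawn from within its interval, each distribution is renormalized, and the spread of a quantity of interest is recorded. The interval values and the uniform sampling scheme are illustrative assumptions, not the tool's actual implementation.

```python
import random

# Toy network A -> B: each CPT cell is given as (estimate, lower bound, upper bound).
p_a   = {"a0": (0.30, 0.20, 0.40), "a1": (0.70, 0.60, 0.80)}
p_b_a = {"a0": {"b0": (0.10, 0.00, 0.30), "b1": (0.90, 0.70, 1.00)},
         "a1": {"b0": (0.55, 0.40, 0.70), "b1": (0.45, 0.30, 0.60)}}

def sample_distribution(cells):
    """Draw each cell uniformly within its interval, then renormalize to sum to 1."""
    draw = {state: random.uniform(lo, hi) for state, (est, lo, hi) in cells.items()}
    total = sum(draw.values())
    return {state: value / total for state, value in draw.items()}

def marginal_b0():
    """P(B=b0) in one sampled network."""
    dist_a = sample_distribution(p_a)
    return sum(dist_a[a] * sample_distribution(p_b_a[a])["b0"] for a in dist_a)

random.seed(0)
samples = sorted(marginal_b0() for _ in range(10_000))   # one value per generated network
print(f"P(B=b0): min={samples[0]:.3f}  "
      f"median={samples[len(samples) // 2]:.3f}  max={samples[-1]:.3f}")
```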

Example

Let's use a data set that contains house sale prices for King County, which includes Seattle. It describes homes sold between May 2014 and May 2015. More precisely, we have extracted the 94 houses that are more than 100 years old, have been renovated, and come with a basement.

Below is the network obtained with the unsupervised structural learning algorithm EQ, 5 non-informative prior samples for Smooth Probability Estimation, and a Structural Coefficient set to 0.5.

Let's focus on the variable grade, directly connected to the Target node Price (K$).

The Mutual Information between Price (K$) and grade is 0.337, the Normalized Mutual Information is 21.257%, and the Relative Mutual Information is 28.149%.
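
For reference, the Mutual Information itself can be computed directly from a (discretized) joint distribution, as in the sketch below; the joint table is purely illustrative, and dividing by the marginal entropy of the target is shown only as one possible normalization, not necessarily the exact definition of the Normalized or Relative Mutual Information reported above.

```python
import math

def entropy(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """Mutual Information in bits from a joint probability table joint[x][y]."""
    px = {x: sum(row.values()) for x, row in joint.items()}
    py = {}
    for row in joint.values():
        for y, p in row.items():
            py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for x, row in joint.items() for y, p in row.items() if p > 0)

# Hypothetical 2x2 joint distribution of two discretized variables (illustrative only)
joint = {"low":  {"low": 0.35, "high": 0.15},
         "high": {"low": 0.10, "high": 0.40}}
mi = mutual_information(joint)
h_target = entropy([0.45, 0.55])   # marginal entropy of the second variable
print(f"MI = {mi:.3f} bits, MI / H(target) = {mi / h_target:.1%}")
```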

The monitor below describes the marginal distribution of grade and its mean value, as inferred with the Bayesian network machine-learned from the 94 + 5 samples:

Here are the results of the Confidence Interval Analysis after generating 10,000 networks: