Tools | Resampling
Prior to version 7.0, this menu item was named Cross-Validation. However, the associated tools now belong to a broader class of methods usually called Resampling.
Let's assume the current Bayesian network B has an associated data set D made up of N observations. Resampling consists of generating K data sets Dk, which are then used to learn K networks Bk.
BayesiaLab now offers three different ways to create the data sets Dk:
- Jackknife and K-Fold: D is divided into K folds of N/K observations. K data sets are created by excluding each fold in turn, so each Dk contains N - N/K observations.
- Bootstrap: each data set Dk is created by sampling N observations with replacement from the original data set D.
- Data Perturbation: each data set Dk is created by multiplying the current weight of each of the N observations of D by a random perturbation. This is a smooth bootstrap, in which weights can take any continuous value from 0 to 2.
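The three schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BayesiaLab's implementation: the function names are hypothetical, the rows are shuffled before being split into folds, and the perturbation factor is drawn uniformly from [0, 2] (the text gives only the range, not the distribution).

```python
import random


def k_fold_datasets(n, k, rng):
    """Return K lists of row indices, each one excluding a different fold.

    Assumption: rows are shuffled before being assigned to folds.
    """
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # K disjoint folds of about n/k rows
    return [sorted(set(indices) - set(fold)) for fold in folds]


def bootstrap_dataset(n, rng):
    """Return n row indices sampled with replacement from range(n)."""
    return [rng.randrange(n) for _ in range(n)]


def perturbed_weights(weights, rng):
    """Multiply each weight by a random factor in [0, 2].

    Assumption: the factor is uniform on [0, 2]; the actual distribution
    used by the tool is not stated in the text.
    """
    return [w * rng.uniform(0.0, 2.0) for w in weights]


rng = random.Random(0)
n, k = 10, 5
kfold = k_fold_datasets(n, k, rng)        # 5 data sets of 8 rows each
loo = k_fold_datasets(n, n, rng)          # K = N: each data set drops one row
boot = bootstrap_dataset(n, rng)          # 10 rows, duplicates allowed
pert = perturbed_weights([1.0] * n, rng)  # 10 weights between 0 and 2
```

Note that K-Fold changes which rows are present, Bootstrap changes how many times each row appears, and Data Perturbation keeps every row but changes how much it counts.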
Resampling can be used for two types of analysis:
- Measuring the variability of estimations with Jackknife, Bootstrap or Data Perturbation;
- Estimating the quality of a learning configuration, i.e. learning algorithm and settings, with K-Fold.
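As a minimal sketch of the first type of analysis, the Python snippet below measures the variability of an estimate across K bootstrap data sets. The names are hypothetical, and a simple mean stands in for whatever quantity would be read off each learned network Bk; it is an assumption made for illustration, not a description of the tool's internals.

```python
import random
import statistics


def bootstrap_variability(data, estimator, k, seed=0):
    """Apply `estimator` to k bootstrap samples of `data`.

    Returns the mean and the standard deviation of the k estimates.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(k):
        # Draw n observations with replacement from the original data.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.mean(estimates), statistics.stdev(estimates)


data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
mean_est, spread = bootstrap_variability(data, statistics.mean, k=200)
print(f"estimate = {mean_est:.3f}, variability = {spread:.3f}")
```

The larger the spread across the K estimates, the less stable the estimate is with respect to the sample used for learning.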
The maximum number of data sets that can be generated with Jackknife and K-Fold is N. This configuration is known as Leave-One-Out Cross-Validation.
Bootstrap and Data Perturbation have no such limitation and can be used to generate an arbitrary number of data sets.
Since the data sets created with Jackknife and K-Fold contain fewer observations than D, the structural coefficient αk is updated (see Arc Confidence) to take into account that Nk < N. The number of prior samples, if any, is also updated using the same equation.
Once Dk is generated, all the Continuous Variables that have not been discretized manually are re-discretized. The discretization is thus another source of instability.
If you want to exclude this source of instability and only measure the variability associated with the learning method, right-click the data set icon in the lower right corner of the Graph Panel and select Remove Associated Discretization Type.
The Structural Coefficient Analysis is a tool that is not based on data sampling but rather on multiple runs on the same data set. It has thus been moved to the new Multi-Run menu.
The Multi-Target Evaluation has been added to the set of tools that can be used for analyzing the K learned networks Bk.
As of version 7.0, there are thus four available kinds of resampling analysis: