Tools | Resampling
Prior to version 7.0, this menu item was named Cross-Validation. However, the associated tools now belong to a broader class of methods usually called Resampling.
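Let's assume the current Bayesian network has an associated data set. Resampling creates a series of data sets from this original data set,
and these data sets are utilized for learning a corresponding series of networks.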
BayesiaLab now offers three different ways to create the data sets:
- Jackknife and K-Fold: the original data set is divided into folds of observations. The data sets are created by iteratively excluding one fold.
- Bootstrap: each data set is created by sampling observations with replacement from the original data set.
- Data Perturbation: each data set is created by perturbing the observations of the original data set, multiplying their current weight by a random perturbation. It is a smooth bootstrap, where weights can take any continuous value from 0 to 2.
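The three schemes can be sketched in a few lines of Python. This is only an illustration, not BayesiaLab's implementation: the function names are hypothetical, shuffling before splitting into folds is an assumption, and the uniform distribution used for the perturbation is an assumption as well (the text only states that weights range from 0 to 2).

```python
import random


def kfold_datasets(n_obs, n_folds, seed=0):
    """Yield index lists, each one excluding a single fold of observations."""
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)  # assumption: observations are shuffled before splitting
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        excluded = set(held_out)
        yield [i for i in range(n_obs) if i not in excluded]


def bootstrap_datasets(n_obs, n_sets, seed=0):
    """Yield index lists sampled with replacement, each of the original size."""
    rng = random.Random(seed)
    for _ in range(n_sets):
        yield [rng.randrange(n_obs) for _ in range(n_obs)]


def perturbed_weights(weights, n_sets, seed=0):
    """Yield weight vectors, each weight multiplied by a random factor."""
    rng = random.Random(seed)
    for _ in range(n_sets):
        # assumption: the factor is drawn uniformly from [0, 2]
        yield [w * rng.uniform(0.0, 2.0) for w in weights]
```

With 10 observations and 5 folds, `kfold_datasets(10, 5)` produces 5 data sets of 8 observations each, whereas the other two generators accept any number of data sets.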
Resampling can be used for two types of analysis:
- Measuring the variability of estimations with Jackknife, Bootstrap or Data Perturbation;
- Estimating the quality of a learning configuration, i.e. learning algorithm and settings, with K-Fold.
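As a rough illustration of the first type of analysis, the sketch below measures the variability of an estimate with Bootstrap. The sample values are made up for illustration, the function name is hypothetical, and the mean stands in for whatever quantity is estimated from each learned network.

```python
import random
import statistics


def bootstrap_std_error(data, estimator, n_sets=1000, seed=0):
    """Return the standard deviation of an estimator across bootstrap data sets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sets):
        # Sample with replacement, keeping the original number of observations.
        resample = rng.choices(data, k=len(data))
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Hypothetical observations, used only to exercise the function.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
spread = bootstrap_std_error(data, statistics.mean)
```

A small `spread` indicates that the estimate is stable across the generated data sets; a large one indicates that it depends strongly on which observations were drawn.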
The maximum number of data sets that can be generated with Jackknife and K-Fold is the number of observations in the original data set. This configuration is known as Leave-One-Out Cross-Validation.
Bootstrap and Data Perturbation have no such limitation and can generate an arbitrary number of data sets.
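The Leave-One-Out limit can be seen in a minimal sketch, where each generated data set omits exactly one observation. The function name and the sample observations are hypothetical.

```python
def leave_one_out(observations):
    """Yield one data set per observation, each omitting that observation."""
    for i in range(len(observations)):
        yield observations[:i] + observations[i + 1:]


obs = ["a", "b", "c", "d"]
loo_sets = list(leave_one_out(obs))
```

Four observations yield four data sets of three observations each, and no further distinct data set can be produced by excluding folds.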
The data sets created with Jackknife and K-Fold contain fewer observations than the original data set. The Structural Coefficient is thus updated (see Arc Confidence) to take the smaller number of observations into account. The number of prior samples, if any, is also updated by using the same equation.
Once a data set is generated, all the Continuous Variables that have not been discretized manually are re-discretized. The discretization is thus another source of instability.
If you want to exclude this source of instability and only measure the variability associated with the learning method, you can right-click the data set icon in the lower right corner of the Graph Panel and select Remove Associated Discretization Type.
The Structural Coefficient Analysis is a tool that is not based on data sampling but rather on multiple runs on the same data set. It has thus been moved to the new Multi-Run menu.
The Multi-Target Evaluation has been added to the set of tools that can be used for analyzing the learned networks.
As of version 7.0, there are thus four available kinds of resampling analysis: