Learning | Data Perturbation

Data Perturbation is an algorithm that adds random noise to the weight of each observation in the database. The additive noise is drawn from a normal distribution with a mean of 0 and a standard deviation set by the user. A Decay Factor can be set to progressively attenuate the standard deviation with each iteration.
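The following Python sketch illustrates the idea; it is not the product's implementation. It assumes that the Decay Factor is applied multiplicatively (the standard deviation at iteration k is the initial value times the Decay Factor raised to the power k) and that negative weights are clipped to 0. Neither detail is stated above. The function name perturb_weights and its parameters are hypothetical.

```python
import random


def perturb_weights(weights, std_dev, decay_factor, iteration, rng):
    """Return a noisy copy of `weights` for a 0-based iteration index."""
    # Assumption: multiplicative decay of the standard deviation.
    sigma = std_dev * decay_factor ** iteration
    # Add zero-mean Gaussian noise to each observation weight.
    # Assumption: a weight that would become negative is clipped to 0.
    return [max(0.0, w + rng.gauss(0.0, sigma)) for w in weights]


rng = random.Random(42)
weights = [1.0] * 5  # every observation starts with weight 1
for k in range(3):
    # With a Decay Factor of 0.5, the noise level halves at each iteration.
    print(k, perturb_weights(weights, 0.1, 0.5, k, rng))
```

With a Decay Factor of 1, the same call produces noise of constant magnitude at every iteration.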

Data Perturbation can be used in the context of:

  • Machine Learning: perturbation helps the learning process escape local minima. The Decay Factor is typically set to a value smaller than 1 in order to test different degrees of perturbation.
  • Cross Validation: perturbation introduces variability in the data. As such, it is a kind of bootstrap algorithm that uses continuous weights instead of integer counts. The Decay Factor is typically set to 1 to get the same degree of perturbation for all evaluations.
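The comparison with the bootstrap can be made concrete with a small sketch. A classical bootstrap sample is equivalent to giving each observation an integer weight (the number of times it was drawn), whereas perturbation gives each observation a continuous weight. The function names are hypothetical, and the starting weight of 1 and the clipping at 0 are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_weights(n, rng):
    """Integer weights: how often each of n rows is drawn with replacement."""
    counts = Counter(rng.choices(range(n), k=n))
    return [counts[i] for i in range(n)]


def perturbed_weights(n, std_dev, rng):
    """Continuous weights: 1 plus zero-mean Gaussian noise per row."""
    # Assumption: each row starts at weight 1 and is clipped at 0.
    return [max(0.0, 1.0 + rng.gauss(0.0, std_dev)) for _ in range(n)]


rng = random.Random(7)
print(bootstrap_weights(6, rng))       # integers that sum to 6
print(perturbed_weights(6, 0.3, rng))  # floats scattered around 1.0
```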

New Feature: Learning

Because Data Perturbation serves these two different purposes, we have added it as a separate learning algorithm that takes into account the specifics of its use in machine learning.

The parameters are essentially the same as those available under Tools | Cross Validation.

Here, however, the output of the algorithm is not a Cross Validation Report. Rather, the output shows the best network learned during the iterations of the trial, as evaluated against the original data.
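The selection logic described above can be sketched as a simple loop: learn a model on perturbed weights at each iteration, score it against the original (unperturbed) weights, and keep the best one. This is a minimal illustration, not the product's code. The names learn_with_perturbation, learn, and score are hypothetical, the score is assumed to be "higher is better", and the toy learn and score functions stand in for real structural learning and network evaluation.

```python
import random


def learn_with_perturbation(weights, learn, score, iterations,
                            std_dev, decay_factor, seed=0):
    """Keep the best model found across perturbed learning iterations."""
    rng = random.Random(seed)
    best_model, best_score = None, float("-inf")
    sigma = std_dev
    for _ in range(iterations):
        # Learn on noisy weights (assumption: clipped at 0).
        noisy = [max(0.0, w + rng.gauss(0.0, sigma)) for w in weights]
        model = learn(noisy)
        # Evaluate against the original weights, not the noisy ones.
        s = score(model, weights)
        if s > best_score:
            best_model, best_score = model, s
        # Assumption: the Decay Factor scales sigma after each iteration.
        sigma *= decay_factor
    return best_model, best_score


# Toy stand-ins: the "model" is a weighted mean of four values.
values = [1.0, 2.0, 3.0, 4.0]


def toy_learn(w):
    total = sum(w) or 1.0  # guard against an all-zero weight vector
    return sum(wi * v for wi, v in zip(w, values)) / total


def toy_score(model, w):
    # Negative squared distance to the mean under the given weights.
    return -(model - toy_learn(w)) ** 2


best, best_s = learn_with_perturbation(
    [1.0] * 4, toy_learn, toy_score,
    iterations=10, std_dev=0.3, decay_factor=0.8)
print(best, best_s)
```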