Context

Data Perturbation is an algorithm that perturbs each observation in the database by multiplying its current weight by a random factor between 0 and 2. The factor is drawn from a normal distribution with a mean of 1 and a standard deviation set by the user. A Decay Factor can be used to progressively reduce the standard deviation with each iteration.
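The description above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name perturb_weights and its parameters are hypothetical, and two details are assumptions that the text does not specify, namely that factors falling outside the range 0 to 2 are redrawn, and that each iteration perturbs the original weights with the standard deviation multiplied by the Decay Factor after every iteration.

```python
import random


def perturb_weights(weights, std_dev, decay_factor=1.0, iterations=1, seed=None):
    """Yield one perturbed copy of `weights` per iteration.

    Assumptions (not taken from the documentation):
    - factors outside [0, 2] are redrawn rather than clipped;
    - each iteration starts again from the original weights;
    - the standard deviation is multiplied by `decay_factor`
      after each iteration.
    """
    rng = random.Random(seed)
    sd = std_dev
    for _ in range(iterations):
        perturbed = []
        for w in weights:
            # Draw a factor from a normal distribution with mean 1.
            factor = rng.gauss(1.0, sd)
            while not 0.0 <= factor <= 2.0:
                factor = rng.gauss(1.0, sd)
            perturbed.append(w * factor)
        yield perturbed
        # Attenuate the standard deviation for the next iteration.
        sd *= decay_factor
```

With decay_factor=1.0 every iteration uses the same standard deviation; with a value below 1 the perturbed weights move progressively closer to the original ones.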

Data Perturbation can be used in the context of:

  • Machine Learning: Perturbation helps the learning process escape from local minima. The Decay Factor is typically set to a value smaller than 1 in order to test different degrees of perturbation.
  • Cross Validation: Perturbation introduces variability into the data. In this sense, it is a kind of bootstrap algorithm that uses continuous weights instead of integer counts. The Decay Factor is typically set to 1 so that all evaluations use the same degree of perturbation.
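To make the bootstrap analogy concrete, the following sketch contrasts the integer counts produced by a classical bootstrap with the continuous factors produced by perturbation. Both function names are hypothetical, and redrawing factors that fall outside the range 0 to 2 is an assumption rather than documented behavior.

```python
import random
from collections import Counter


def bootstrap_counts(n, rng):
    """Classical bootstrap: draw n rows with replacement, return integer counts."""
    drawn = Counter(rng.randrange(n) for _ in range(n))
    return [drawn[i] for i in range(n)]


def perturbation_factors(n, std_dev, rng):
    """Continuous analogue: one normal factor per row, with mean 1.

    Assumption: factors outside [0, 2] are redrawn.
    """
    factors = []
    while len(factors) < n:
        f = rng.gauss(1.0, std_dev)
        if 0.0 <= f <= 2.0:
            factors.append(f)
    return factors


rng = random.Random(0)
counts = bootstrap_counts(8, rng)            # integers that sum to 8
factors = perturbation_factors(8, 0.5, rng)  # real values between 0 and 2
```

In both cases each row ends up with a weight that varies from one evaluation to the next; the bootstrap restricts that weight to whole numbers, while perturbation allows any value in the range.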

Updated Feature: Final Standard Deviation

As of version 6.0, the Data Perturbation settings in the following menus ask for a Final Standard Deviation:

  • Learning | Data Perturbation
  • Tools | Cross Validation | Targeted Evaluation | Data Perturbation

The Initial Standard Deviation is internally set to 1, and the Decay Factor is automatically computed from the Final Standard Deviation set by the user.
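The documentation does not give the formula used to compute the Decay Factor. One plausible reading, shown here purely as an assumption, is a geometric schedule in which the standard deviation starts at 1 and is multiplied by a constant factor at each step so that it reaches the Final Standard Deviation on the last iteration. The function names and the iterations parameter are hypothetical.

```python
def decay_factor_for(final_std_dev, iterations, initial_std_dev=1.0):
    """Return the constant factor that takes the standard deviation from
    `initial_std_dev` to `final_std_dev` in `iterations - 1` steps.

    Assumption: the schedule is geometric, sd_k = initial_std_dev * decay**k.
    """
    if final_std_dev <= 0 or initial_std_dev <= 0:
        raise ValueError("standard deviations must be positive")
    if iterations < 2:
        # A single iteration leaves nothing to decay.
        return 1.0
    return (final_std_dev / initial_std_dev) ** (1.0 / (iterations - 1))


def std_dev_schedule(final_std_dev, iterations, initial_std_dev=1.0):
    """List the standard deviation used at each iteration."""
    decay = decay_factor_for(final_std_dev, iterations, initial_std_dev)
    return [initial_std_dev * decay ** k for k in range(iterations)]
```

Under this assumption, a Final Standard Deviation of 0.25 over three iterations corresponds to a Decay Factor of 0.5, giving standard deviations of 1, 0.5, and 0.25.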

The Data Perturbation settings for the other cross-validation tools (Tools | Cross Validation | Arc Confidence | Data Perturbation and Tools | Cross Validation | Variable Clustering | Data Perturbation) only allow users to set the Standard Deviation.