Contents

Context

Data Import Wizard

Data Associate Wizard

Learning | Missing Value Processing

Data | Imputation

Node Contextual Menu | Imputation

New Feature: Entropy-Based Imputation

The Entropy-Based Imputation scheme was originally designed for processing datasets with a large number of missing values.

Missing-value imputation is always based on the posterior probability distributions computed from the current Bayesian network and the available data (values that are not missing or have already been imputed). However, if a data row has more than one missing value, the imputation sequence has a direct impact on the imputation result.

With BayesiaLab's Standard Imputation, the imputation sequence is randomized for each row. With Entropy-Based Imputation, the order is defined dynamically: the nodes whose Markov Blanket is completely observed are imputed first, followed by the remaining nodes in order of lowest posterior entropy.
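The ordering rule can be illustrated with a minimal Python sketch. This is not BayesiaLab's implementation: the function names `entropy` and `next_node`, and the inputs `posteriors` and `blanket_observed`, are hypothetical, and the posterior distributions are assumed to have been computed elsewhere from the current network.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution {state: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def next_node(posteriors, blanket_observed):
    """Pick the next missing node to impute in a row (hypothetical helper).

    posteriors: {node: {state: probability}} for the nodes still missing.
    blanket_observed: set of nodes whose Markov Blanket is fully observed.
    """
    # Nodes with a fully observed Markov Blanket sort first (False < True),
    # then ties are broken by the lowest posterior entropy.
    return min(
        posteriors,
        key=lambda node: (node not in blanket_observed, entropy(posteriors[node])),
    )

# Assumed posteriors for two missing nodes in one row.
posteriors = {
    "A": {"x": 0.5, "y": 0.5},   # high entropy
    "B": {"x": 0.9, "y": 0.1},   # low entropy
}
print(next_node(posteriors, blanket_observed=set()))   # B
print(next_node(posteriors, blanket_observed={"A"}))   # A
```

In a full imputation loop, the posteriors would need to be recomputed after each imputed value, since every imputation changes the evidence for the remaining nodes.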

Entropy-Based Imputation is available for:

  • Static Imputation: missing values are virtually imputed on request. The initial (automatic) imputation is performed upon loading the data. Subsequent virtual imputations are performed on request whenever the parameters of the current structure are estimated by selecting Learning | Parameter Estimation.
  • Dynamic Imputation: missing values are virtually imputed during structural learning after each modification of the structure.
  • Data Imputation: all the missing values are imputed, and the resulting dataset is saved as a file.
  • Node Imputation: the missing values are imputed only for the selected nodes within the internal dataset.

New Feature: Most Probable Explanation Imputation

This new imputation mode is based on the Maximum a Posteriori (MAP) query: it identifies the set of states that jointly maximizes the probability given the observed values (the MAP assignment) and uses that set to impute the missing values.

  • Unlike Standard Imputation and Entropy-Based Imputation, Most Probable Explanation Imputation does not depend on the imputation sequence.
  • There is no option for choosing the states: each missing value is always imputed with the state that belongs to the MAP assignment.

The MAP assignment and the incremental choice of the most likely state, given the values that are not missing or have already been imputed, are usually different. The MAP assignment corresponds to the states that jointly maximize the probability given the observed values. The incremental choice selects one state at a time and, as such, depends on the sequence of the choices.

Example

Consider the following simple example with two nodes, Age and Smoker. We wish to impute the values in a row of data in which both values are missing.

 

Selecting Analysis | Visual | Most Probable Explanation returns the MAP assignment.

Consequently, with Most Probable Explanation Imputation, the imputed values will be

Age=Senior and Smoker=No

With an incremental imputation, suppose Age is selected to be imputed first. Its most likely state is Adult. The most likely state of Smoker given Age=Adult is Yes.

In that case, the imputed values will be

Age=Adult and Smoker=Yes

Now suppose Smoker is selected to be imputed first instead. Its most likely state is Yes. The most likely state of Age given Smoker=Yes is Young Adult.

As a result, the imputed values will be

Smoker=Yes and Age=Young Adult
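
The three outcomes above can be reproduced with a small joint table. The sketch below is written in Python, and the joint probabilities (in percent) are hypothetical: they were chosen only so that the qualitative results match this example and are not the values of the original network. The helper names `map_assignment`, `marginal`, and `incremental` are likewise illustrative.

```python
# Hypothetical joint distribution P(Age, Smoker), in percent (sums to 100).
JOINT = {
    ("Young Adult", "Yes"): 23, ("Young Adult", "No"): 5,
    ("Adult", "Yes"): 22,       ("Adult", "No"): 20,
    ("Senior", "Yes"): 6,       ("Senior", "No"): 24,
}

def map_assignment(joint):
    """Return the (Age, Smoker) pair with the highest joint probability."""
    return max(joint, key=joint.get)

def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` (0=Age, 1=Smoker)."""
    out = {}
    for states, weight in joint.items():
        out[states[axis]] = out.get(states[axis], 0) + weight
    return out

def incremental(joint, first):
    """Impute the variable at position `first` from its marginal,
    then impute the other variable given that first choice."""
    m = marginal(joint, first)
    first_state = max(m, key=m.get)
    # Keep only the joint entries consistent with the first imputed state.
    consistent = {s: w for s, w in joint.items() if s[first] == first_state}
    return max(consistent, key=consistent.get)

print(map_assignment(JOINT))        # ('Senior', 'No')
print(incremental(JOINT, first=0))  # Age first:    ('Adult', 'Yes')
print(incremental(JOINT, first=1))  # Smoker first: ('Young Adult', 'Yes')
```

With these assumed numbers, (Senior, No) is the single most probable pair, yet neither incremental sequence reaches it, because each one commits to the most likely state of a single variable before considering the other.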