Data Import Wizard
Data Associate Wizard
Learning | Missing Value Processing
Data | Imputation
Node Contextual Menu | Imputation
New Feature: Entropy-Based Imputation
The Entropy-Based Imputation scheme was originally designed for processing datasets with a large number of missing values.
Missing-value imputation is always based on the posterior probability distributions computed from the current Bayesian network and the available data (values that are not missing or have already been imputed). However, if a data row has more than one missing value, the imputation sequence has a direct impact on the imputation result.
With BayesiaLab's Standard Imputation, the imputation sequence is randomized for each row. With Entropy-Based Imputation, the order is defined dynamically: the nodes whose Markov Blanket is completely observed are imputed first, followed by the remaining nodes, lowest posterior entropy first.
Entropy-Based Imputation is available for:
- Static Imputation: missing values are virtually imputed on request. The initial (automatic) imputation is performed upon loading the data. Subsequent virtual imputations are performed on request whenever the parameters of the current structure are estimated by selecting Learning | Parameter Estimation.
- Dynamic Imputation: missing values are virtually imputed during structural learning after each modification of the structure.
- Data Imputation: all the missing values are imputed, and the resulting dataset is saved as a file.
- Node Imputation: the missing values are imputed only for the selected nodes within the internal dataset.
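The entropy-driven ordering described above can be illustrated with a minimal Python sketch. It is not BayesiaLab's implementation and uses no BayesiaLab API: the explicit joint probability table stands in for a Bayesian network, the function names are hypothetical, the Markov Blanket step is left out, and each imputed value is assumed to be the most likely state of its posterior.

```python
import math
from itertools import product

def posterior(joint, names, evidence, target):
    """P(target | evidence), read from an explicit joint table keyed by state tuples."""
    idx = names.index(target)
    scores = {}
    for states, p in joint.items():
        if all(states[names.index(k)] == v for k, v in evidence.items()):
            scores[states[idx]] = scores.get(states[idx], 0.0) + p
    total = sum(scores.values())
    return {s: p / total for s, p in scores.items()}

def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def entropy_ordered_imputation(joint, names, row):
    """Fill the None entries of a row, lowest posterior entropy first."""
    filled = {k: v for k, v in row.items() if v is not None}
    missing = [k for k, v in row.items() if v is None]
    order = []
    while missing:
        # Posteriors are recomputed after every imputation (dynamic ordering).
        posts = {m: posterior(joint, names, filled, m) for m in missing}
        nxt = min(missing, key=lambda m: entropy(posts[m]))
        # Assumption: impute the most likely state of the chosen node.
        filled[nxt] = max(posts[nxt], key=posts[nxt].get)
        order.append(nxt)
        missing.remove(nxt)
    return filled, order

# Toy joint over binary A, B, C with B and C conditionally independent given A.
names = ["A", "B", "C"]
joint = {}
for a, b, c in product([0, 1], repeat=3):
    pb = 0.9 if a == 1 else 0.5  # P(B=1 | A=a)
    pc = 0.6 if a == 1 else 0.5  # P(C=1 | A=a)
    joint[(a, b, c)] = 0.5 * (pb if b else 1 - pb) * (pc if c else 1 - pc)

filled, order = entropy_ordered_imputation(joint, names, {"A": 1, "C": None, "B": None})
print(order)   # ['B', 'C']
print(filled)  # {'A': 1, 'B': 1, 'C': 1}
```

Given A = 1, the posterior of B (0.9 / 0.1) has a lower entropy than that of C (0.6 / 0.4), so B is imputed first even though C comes first in the row.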
New Feature: Most Probable Explanation Imputation
This new imputation mode is based on the MAP query: the set of states that jointly maximizes the probability given the observed values (the MAP assignment) is identified and used for imputing the missing values.
- Unlike Standard Imputation and Entropy-Based Imputation, Most Probable Explanation Imputation does not depend on the imputation sequence.
- There is no option for choosing the states: the imputed state is always the one belonging to the MAP assignment.
The MAP assignment and the incremental choice of the most likely state, given the values that are not missing or have already been imputed, are usually different. The former is the set of states that jointly maximizes the probability given the observed values. The latter proceeds one value at a time and, as such, depends on the sequence of the choices.
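A small numeric illustration of this difference, as a sketch only: the two-variable joint table and the function names below are made up for the example and are not part of BayesiaLab. Both variables are missing and there is no other evidence.

```python
# Hypothetical joint distribution P(X, Y) over two binary variables.
names = ("X", "Y")
joint = {(0, 0): 0.40, (0, 1): 0.05, (1, 0): 0.25, (1, 1): 0.30}

def map_assignment(joint):
    """States that jointly maximize the probability."""
    return max(joint, key=joint.get)

def incremental_choice(joint, names, order):
    """Pick the most likely state one variable at a time, in the given order."""
    chosen = {}
    for var in order:
        idx = names.index(var)
        scores = {}
        for states, p in joint.items():
            # Keep only the entries consistent with the choices made so far.
            if all(states[names.index(k)] == v for k, v in chosen.items()):
                scores[states[idx]] = scores.get(states[idx], 0.0) + p
        # Unnormalized scores are enough to find the most likely state.
        chosen[var] = max(scores, key=scores.get)
    return tuple(chosen[n] for n in names)

print(map_assignment(joint))                       # (0, 0)
print(incremental_choice(joint, names, ["X", "Y"]))  # (1, 1)
print(incremental_choice(joint, names, ["Y", "X"]))  # (0, 0)
```

Choosing X first gives X = 1 (marginal 0.55) and then Y = 1, a pair with joint probability 0.30, whereas the MAP assignment is (0, 0) with probability 0.40. Choosing Y first happens to recover the MAP assignment here, which shows how the incremental result depends on the sequence.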