Question

What score should be used when learning a Maximum Weight Spanning Tree?

The Maximum Weight Spanning Tree learning algorithm in BayesiaLab can be run with either the Minimum Description Length (MDL) score or Pearson's Correlation. While I understand how the MDL score is calculated for a candidate Bayesian network, I am not sure how the bivariate correlation metrics are used to score a network. Are they summed up or averaged?

Answer

If Pearson's Correlation is used to learn a Maximum Weight Spanning Tree (MWST), the correlation coefficient is computed for every pair of nodes. These coefficients are then used as edge weights, and the algorithm builds the tree whose edges maximize the sum of the squared coefficients.
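A minimal sketch of this procedure in Python, assuming continuous variables stored as equal-length lists in a dict. The helpers `pearson` and `mwst_pearson` and the sample data are hypothetical illustrations, not BayesiaLab's API, and BayesiaLab's internal implementation may differ. Each pair of nodes is weighted by its squared Pearson coefficient, and Kruskal's algorithm (heaviest edge first, skipping edges that would close a cycle) selects the tree with the largest total weight.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant variable has no defined correlation; treat as 0
    return cov / (sx * sy)


def mwst_pearson(data):
    """Hypothetical MWST over the variables in `data` (dict: name -> values).

    Edge weight = squared Pearson correlation. Returns (u, v, weight) edges,
    heaviest first.
    """
    names = list(data)
    # Weight every pair of nodes by r^2.
    edges = [(pearson(data[u], data[v]) ** 2, u, v) for u, v in combinations(names, 2)]
    edges.sort(reverse=True)  # heaviest first -> maximum (not minimum) spanning tree

    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    tree = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v are not yet connected, so no cycle is created
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Hypothetical sample data: B is a linear function of A; C is negatively related to both.
data = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10],
    "C": [5, 3, 4, 1, 2],
}
for u, v, w in mwst_pearson(data):
    print(u, v, round(w, 3))
```

With n variables the result always has n - 1 edges, and the quantity being maximized is simply the sum of r² over those edges.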

The MDL score takes into account both the "correlation" and the structural complexity of the network, which establishes "automatic significance thresholds". Pearson's Correlation, however, is based on correlation alone, without any significance threshold. As a result, MWST learning with Pearson's Correlation always returns a tree in which all nodes are connected, even when the relationships are very weak.
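To illustrate the difference, the following hypothetical sketch uses a generic MDL-style penalty on discrete data; it is not BayesiaLab's actual MDL score. Each candidate edge earns N·I(X;Y) bits of fit and pays (log2 N)/2 bits per extra free parameter. Raw mutual information, which carries no penalty, stands in for an unpenalized weight such as a correlation. All function names and the sample data are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(x, y):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mdl_gain(x, y):
    """Illustrative MDL-style edge score in bits (hypothetical, not BayesiaLab's formula):
    fit gain N * I(X;Y) minus (log2 N) / 2 for each extra free parameter."""
    n = len(x)
    extra_params = (len(set(x)) - 1) * (len(set(y)) - 1)
    return n * mutual_information(x, y) - 0.5 * log2(n) * extra_params


def spanning_forest(data, weight, prune):
    """Kruskal's algorithm over all pairs, heaviest edge first.

    prune=False -> always a spanning tree (every node connected).
    prune=True  -> edges with weight <= 0 are rejected, so weak links are dropped.
    """
    names = list(data)
    edges = sorted(
        ((weight(data[u], data[v]), u, v) for u, v in combinations(names, 2)),
        reverse=True,
    )
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    forest = []
    for w, u, v in edges:
        if prune and w <= 0:
            break  # edges are sorted, so every remaining edge is also <= 0
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


# Hypothetical data: A and B are identical; C is exactly independent of both.
data = {
    "A": [0] * 8 + [1] * 8,
    "B": [0] * 8 + [1] * 8,
    "C": [0, 1] * 8,
}
print(spanning_forest(data, mutual_information, prune=False))  # → [('A', 'B'), ('B', 'C')]
print(spanning_forest(data, mdl_gain, prune=True))  # → [('A', 'B')]
```

Without a penalty, C is still attached to the tree through a zero-information edge; with the penalized score, that edge costs more than it explains, so C is left unconnected and the result is a forest rather than a spanning tree.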