Is multiple testing adjustment (MTA) necessary for establishing the significance of dependencies or associations in a learned BN?
I recently learned a BN (a single equivalence class, using the SopLEQ method) from a small dataset (47 instances) to associate viral DNA sequences with therapy outcomes. The BN was then validated on 2 independent datasets. However, a reviewer pointed out that MTA needs to be performed.
BayesiaLab’s structural learning algorithms are based on the Minimum Description Length (MDL) score:
MDL(B, D) = α · DL(B) + DL(D|B)
where:
- DL(B) is the Description Length of the Bayesian network B, i.e. the number of bits needed to represent the graph and the associated conditional probability distributions
- DL(D|B) is the Description Length of the data D given the Bayesian network B, i.e. the negative log-likelihood of the data given the network
- α is the Structural Coefficient, which adjusts the relative importance of structural complexity versus data likelihood. Changing α is equivalent to changing the size of the dataset by a factor of 1/α
Be careful not to set the Structural Coefficient too low!
Setting this coefficient to 0 leads to fully connected networks, which quickly become unmanageable when the number of variables exceeds 10.
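To make the trade-off concrete, here is a minimal Python sketch of a weighted two-part score of the form alpha * DL(B) + DL(D|B), where `alpha` plays the role of the Structural Coefficient. The encoding used for DL(B) (parent identifiers plus 0.5 * log2(N) bits per free parameter) is an assumption chosen for illustration and is not BayesiaLab's exact encoding; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def data_bits(data, parents):
    """DL(D|B): negative log2-likelihood of the data, using maximum-likelihood
    conditional probabilities estimated from the same data."""
    bits = 0.0
    for child, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[child]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        for (pa_val, _), n in joint.items():
            bits -= n * math.log2(n / marg[pa_val])
    return bits


def structure_bits(data, parents):
    """Illustrative stand-in for DL(B) (an assumption, not BayesiaLab's encoding):
    log2(#nodes) bits per parent link plus 0.5 * log2(N) bits per free parameter."""
    n_rows = len(data)
    card = {v: len({row[v] for row in data}) for v in parents}
    bits = 0.0
    for child, pa in parents.items():
        n_params = (card[child] - 1) * math.prod(card[p] for p in pa)
        bits += len(pa) * math.log2(len(parents))
        bits += 0.5 * math.log2(n_rows) * n_params
    return bits


def mdl(data, parents, alpha=1.0):
    """Weighted two-part score; lower is better."""
    return alpha * structure_bits(data, parents) + data_bits(data, parents)


# Hypothetical toy data: two binary variables with a weak association (40 rows).
counts = {(0, 0): 12, (0, 1): 8, (1, 0): 8, (1, 1): 12}
data = [{"X": x, "Y": y} for (x, y), n in counts.items() for _ in range(n)]

empty = {"X": (), "Y": ()}         # no arc
with_arc = {"X": (), "Y": ("X",)}  # X -> Y

for alpha in (1.0, 0.2, 0.0):
    better = "arc" if mdl(data, with_arc, alpha) < mdl(data, empty, alpha) else "empty"
    print(f"alpha={alpha}: preferred structure = {better}")
```

On this toy data the empty graph is preferred at alpha = 1, while the arc is preferred at alpha = 0.2 and at alpha = 0. With alpha = 0 only the likelihood term remains, and adding an arc can never make that term worse, which is why the search drifts toward fully connected graphs.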
With the Structural Coefficient at its default value of 1, this score is conservative and returns only highly significant relations (classical statistical tests will return p-values ≈ 0).
However, when data is scarce, we usually need to lower the Structural Coefficient. Choosing a value that is too low can lead to models containing relationships that are no longer significant, so this value should be chosen carefully.
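If the Structural Coefficient has been lowered and the reviewer still wants an explicit check, one option is to test each learned arc and adjust the resulting p-values for multiplicity. The sketch below is an assumption about how that could be done outside BayesiaLab, not a description of what the software does: it runs a G-test of independence on each pair of binary variables (1 degree of freedom, ignoring any other parents) and applies the Holm step-down adjustment. The arc names and counts are hypothetical.

```python
import math
from collections import Counter


def g_test_pvalue(pairs):
    """G-test of independence for two binary variables (1 degree of freedom).
    Only valid when each variable takes exactly two observed values."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    g = 2.0 * sum(o * math.log(o * n / (row[x] * col[y]))
                  for (x, y), o in joint.items())
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # Survival function of a chi-square variable with 1 dof.
    return math.erfc(math.sqrt(g / 2.0))


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def expand(counts):
    """Turn a {(x, y): count} table into a list of observed pairs."""
    return [pair for pair, n in counts.items() for _ in range(n)]


# Hypothetical arcs, each summarised as a 2x2 table over 47 instances.
arcs = {
    "mutation_A -> outcome": expand({(0, 0): 20, (0, 1): 3, (1, 0): 4, (1, 1): 20}),
    "mutation_B -> outcome": expand({(0, 0): 12, (0, 1): 11, (1, 0): 12, (1, 1): 12}),
}

raw = [g_test_pvalue(pairs) for pairs in arcs.values()]
adj = holm_adjust(raw)
for name, p, q in zip(arcs, raw, adj):
    print(f"{name}: raw p = {p:.3g}, Holm-adjusted p = {q:.3g}")
```

Holm is used here only because it is simple and controls the family-wise error rate; a false-discovery-rate procedure would be a reasonable alternative when many arcs are tested.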