


Q1: How can I use BayesiaLab to combine respondent-level data from two different surveys? These two surveys have different sample sizes and only a few variables in common.

Q2: What if I want to use the two data sets together to find relationships between variables from the different surveys? Is it possible?


When you have a Hub, i.e. a set of variables that are common to the different studies, you can merge the two data sets into a single file and define Filtered States for those values that do not exist. You can mark these fields with "*" or "FV" in your dataset to indicate that a question is not applicable for the study. In the first step of the Data Import Wizard, you can set the code for Filtered Values.
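The merge described above happens before the file is imported into BayesiaLab. As a minimal sketch (not part of BayesiaLab), the following Python function stacks two sets of respondent records and writes "*" wherever a question was not asked in a given survey. The function name `merge_surveys`, the columns `A_only` and `B_only`, and all values are hypothetical. Only the hub names Q1_1, Q2_1 and Q3_1 come from the example on this page.

```python
FILTERED = "*"  # must match the Filtered Value code set in the Data Import Wizard


def merge_surveys(survey_a, survey_b, filtered_code=FILTERED):
    """Stack two lists of respondent dicts into one table.

    A question that was not asked in a survey gets `filtered_code`.
    A question that was asked but left unanswered keeps its original
    (empty) value, so it can still be treated as a genuine missing value.
    """
    all_rows = survey_a + survey_b

    # Collect the union of the column names, preserving first-seen order.
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    # Fill only the columns that are absent from a respondent's survey.
    merged = [
        {name: row.get(name, filtered_code) for name in columns}
        for row in all_rows
    ]
    return columns, merged


# Hypothetical respondent-level data; the empty string is a real non-response.
survey_a = [
    {"Q1_1": 4, "Q2_1": 2, "Q3_1": 5, "A_only": 1},
    {"Q1_1": 3, "Q2_1": "", "Q3_1": 4, "A_only": 2},
]
survey_b = [
    {"Q1_1": 5, "Q2_1": 1, "Q3_1": 3, "B_only": 7},
]

columns, merged = merge_surveys(survey_a, survey_b)
for row in merged:
    print([row[name] for name in columns])
```

Keeping the "not asked" code distinct from an empty cell matters here: only the former should become a Filtered State on import.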


Let's consider two Bayesian networks, shown below, learned on two different surveys.

Three nodes are common in these two studies (Q1_1, Q2_1 and Q3_1). The merged dataset looks as follows:

Upon importing this dataset into BayesiaLab, those nodes that contain Filtered States are marked with a dedicated icon.

Using Unsupervised Structural Learning on this merged dataset returns the Bayesian network below.

It would be tempting to use Missing Values instead of Filtered Values. However, the values in this case are obviously not missing at random, which is the hypothesis on which BayesiaLab's Missing Values Processing methods rely (Structural EM, Static Imputation, and Dynamic Imputation). See our white paper "Missing Value Imputation" for more information.

Using Filtered Values adds a Filtered State to the variables.

Once the network has been learned on the merged dataset, it is possible to get rid of these Filtered States:

  1. Select all the monitors.
  2. Set Negative Evidence on all the Filtered States (right-click on a monitor to bring up the contextual menu, then choose Negative Evidence on Filtered State).
  3. Create a new Bayesian network that represents this new joint probability distribution: Tools | Evidence Instantiation. This new network still has Filtered States, but with zero probability.
  4. Delete the Filtered States by using the Node Editor.
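To see what the steps above do to a single node, here is a minimal sketch of the arithmetic, assuming a made-up marginal distribution: Negative Evidence on the Filtered State rules that state out, and the remaining probabilities are renormalized so they sum to 1. The function name `remove_filtered_state`, the state labels and the numbers are hypothetical. BayesiaLab performs this through inference over the whole network, so this only illustrates the effect on one monitor.

```python
def remove_filtered_state(distribution, filtered_state="*"):
    """Condition a marginal on 'not the Filtered State' and renormalize."""
    remaining = {
        state: prob
        for state, prob in distribution.items()
        if state != filtered_state
    }
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("All probability mass is on the Filtered State.")
    return {state: prob / total for state, prob in remaining.items()}


# Hypothetical monitor: half of the merged respondents were never asked.
marginal = {"Low": 0.2, "High": 0.3, "*": 0.5}
result = remove_filtered_state(marginal)
print(result)
```

After this step the Filtered State carries no probability, which is why it can be deleted from the node without changing the distribution over the other states.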

It is not possible to find relationships between variables that have not been measured together when you do not have a Hub, i.e. when the studies have no variable in common.

It is only with such a hub that you may find (indirect) probabilistic relations between variables coming from different studies.
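As an illustration of such an indirect relation, the sketch below assumes the simplest possible structure: a variable A measured only in the first study and a variable B measured only in the second are each linked to a single hub variable, and are conditionally independent given that hub. Under that assumption, the joint distribution of A and B is obtained by summing over the hub states. The function name `joint_via_hub`, the state labels and all probabilities are hypothetical, and a network learned by BayesiaLab may have a different structure.

```python
def joint_via_hub(p_hub, p_a_given_hub, p_b_given_hub):
    """Compute P(A, B) = sum over h of P(h) * P(A | h) * P(B | h).

    Assumes A and B are conditionally independent given the hub.
    """
    joint = {}
    for h, p_h in p_hub.items():
        for a, p_a in p_a_given_hub[h].items():
            for b, p_b in p_b_given_hub[h].items():
                joint[(a, b)] = joint.get((a, b), 0.0) + p_h * p_a * p_b
    return joint


# Hypothetical hub with two states, e.g. two respondent segments.
p_hub = {"s1": 0.5, "s2": 0.5}
# A was measured only in the first study, B only in the second.
p_a_given_hub = {"s1": {"yes": 0.9, "no": 0.1}, "s2": {"yes": 0.2, "no": 0.8}}
p_b_given_hub = {"s1": {"yes": 0.8, "no": 0.2}, "s2": {"yes": 0.1, "no": 0.9}}

joint = joint_via_hub(p_hub, p_a_given_hub, p_b_given_hub)

# Marginals of A and B, to compare with the joint probability.
p_a_yes = sum(p for (a, _), p in joint.items() if a == "yes")
p_b_yes = sum(p for (_, b), p in joint.items() if b == "yes")
print(joint[("yes", "yes")], p_a_yes * p_b_yes)
```

With these made-up numbers, the joint probability of ("yes", "yes") differs from the product of the two marginals, so A and B are related even though they were never observed together. Without a hub there is nothing to sum over, and no such relation can be derived.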

When you do not have a hub, you can build one by inducing a respondent segmentation. Based on some characteristics of the respondents, you can use BayesiaLab:

  • for creating a typology, and then
  • for asking the most informative questions to predict the cluster to which the respondent belongs (e.g. with the Adaptive Questionnaire).

Doing this pre-work for each survey allows you to have at least one variable in your hub.