This new option selects the segmentation that maximizes the Mutual Information between the Manifest variables and the Target Node.


Let's use the entire data set that describes houses in Seattle, with this subset of Manifest variables:

  • Renovated: indicates whether the house has been renovated
  • Age: age of the house
  • sqft_living15: living room area in 2015
  • long: longitude coordinate
  • lat: latitude coordinate
  • Price (K$): price of the house, in thousands of dollars

After setting Price (K$) as the Target Node and selecting all the other variables, we use the following settings for Data Clustering:

This returns a solution with 2 segments and a Heterogeneity Index of 60%. In other words, using [Factor_i] as a breakout variable would increase the sum of the Mutual Informations of the Manifest variables with the Target Node by 60%.
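Taken literally, the sentence above describes a relative increase, which can be written as shown below. Here X_i are the Manifest variables, T is the Target Node (Price (K$)) and F is the breakout variable [Factor_i]. The notation is ours. The assumption that the per-segment Mutual Informations are weighted by segment probabilities (i.e., a conditional Mutual Information) comes from our reading of the text and is not necessarily BayesiaLab's exact formula.

```latex
H = \frac{\sum_{i} I(X_i; T \mid F) - \sum_{i} I(X_i; T)}{\sum_{i} I(X_i; T)},
\qquad
I(X_i; T \mid F) = \sum_{f} P(F = f)\, I(X_i; T \mid F = f)
```

Under this reading, H = 0.60 means that \sum_i I(X_i; T \mid F) = 1.6 \sum_i I(X_i; T).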

The Multi-Quadrant below highlights the improvement of the Mutual Information. The points correspond to the Mutual Informations computed on the entire data set. The vertical scales show how these Mutual Informations vary when the data are split according to the values of [Factor_i].
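The quantities shown in the Multi-Quadrant can be approximated outside BayesiaLab with a short Python sketch. It assumes the variables are already discretized and estimates Mutual Information (in bits) from counts. It computes this Mutual Information both on the whole data set and within each segment. It then aggregates the values with an assumed Heterogeneity Index definition, in which per-segment Mutual Informations are weighted by segment size. The function names and the toy data are hypothetical. This is an illustration, not BayesiaLab's implementation.

```python
import math
from collections import Counter, defaultdict


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def mi_per_segment(xs, target, segments):
    """I(X;T) computed separately on the rows of each segment."""
    rows = defaultdict(lambda: ([], []))
    for x, t, s in zip(xs, target, segments):
        rows[s][0].append(x)
        rows[s][1].append(t)
    return {s: mutual_information(sx, st) for s, (sx, st) in rows.items()}


def conditional_mi(xs, target, segments):
    """I(X;T|F): per-segment MI weighted by the segment's share of the rows."""
    n = len(segments)
    sizes = Counter(segments)
    per_seg = mi_per_segment(xs, target, segments)
    return sum(sizes[s] / n * mi for s, mi in per_seg.items())


def heterogeneity_index(manifest, target, segments):
    """Relative change of the summed MI with the target when splitting on segments.

    Assumed definition (hypothetical, not taken from BayesiaLab):
        (sum_i I(X_i;T|F) - sum_i I(X_i;T)) / sum_i I(X_i;T)
    """
    base = sum(mutual_information(col, target) for col in manifest.values())
    split = sum(conditional_mi(col, target, segments) for col in manifest.values())
    if base == 0:
        raise ValueError("the manifest variables carry no information about the target")
    return (split - base) / base


# Hypothetical toy data: one discretized manifest variable "x",
# a binary target and a two-state segmentation variable.
target = [0, 0, 1, 1, 0, 1, 0, 1]
x = [0, 0, 1, 1, 0, 0, 1, 1]
segments = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(mutual_information(x, target))          # pooled MI, about 0.189 bit
print(mi_per_segment(x, target, segments))    # {'A': 1.0, 'B': 0.0}
print(heterogeneity_index({"x": x}, target, segments))  # about 1.65
```

In this toy data, "x" fully determines the target in segment A and carries no information in segment B. The pooled Mutual Information is therefore low, while the segment-weighted value is 0.5 bit. A negative index would mean that the split lowers the summed Mutual Information.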