Context

Tools | GIS Mapping

If the data set associated with the network contains latitude and longitude coordinates, it is now possible to display one graphical object per observation (row) on a Google Map.

The coordinates need to be loaded as continuous variables and thus have to be discretized. The choice of discretization can affect the machine-learned model (if the coordinates are useful for the model), but it has no impact on the mapping, which uses the continuous values directly.

The graphical objects have four dimensions: Shape, Color, Size and Opacity.

Each of these dimensions can be:

  • Fixed, i.e. identical for each object/observation,
  • Based on the value of a variable, i.e. specific to each observation:
    • Directly extracted from the observation described in the data set when the variable is:
      • an Observable Random Node and the value is Not Missing,
    • Inferred with the current Bayesian network when the variable is:
      • an Observable Random Node and the value is Missing,
      • the Target Node,
      • a Not Observable Random Node,
      • a Function Node with numerical values.

For inference, all the non-missing values of the Observable Random Nodes are set as hard evidence.
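The rules above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the node-kind constants and the functions `value_source` and `hard_evidence` are hypothetical names, and a missing value is assumed to be represented by `None`.

```python
from typing import Optional

# Hypothetical node kinds, named after the categories listed above.
OBSERVABLE = "observable"
NOT_OBSERVABLE = "not_observable"
TARGET = "target"
FUNCTION = "function"


def value_source(kind: str, observed: Optional[float]) -> str:
    """Return "data" when the value is read from the row, "inferred" otherwise."""
    # Only a non-missing Observable Random Node is read from the data set.
    if kind == OBSERVABLE and observed is not None:
        return "data"
    return "inferred"


def hard_evidence(row: dict, kinds: dict) -> dict:
    """Collect the non-missing Observable values of one row as hard evidence."""
    return {
        name: value
        for name, value in row.items()
        if kinds.get(name) == OBSERVABLE and value is not None
    }
```

For a row such as `{"grade": 8.0, "area": None, "price": 500.0}`, where `price` is the Target Node, only `grade` would be set as evidence in this sketch.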

The value that will be utilized for the mapping depends on the type of the variable:

  • Discrete: the state is chosen with the Maximum a posteriori criterion,
  • Continuous: the mean value is computed from the posterior probability distribution, then normalized to bring all values into the range [0,1],
  • Function Node: except when used to define the Shape, the value is normalized to bring all values into the range [0,1], by using the Minimum and Maximum Values set in the wizard.
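A minimal sketch of these three cases, under stated assumptions: posteriors are plain dictionaries mapping state names to probabilities, each state of a Continuous variable has a representative numeric value (`state_values`), and normalization is a min-max rescaling clamped to [0,1]. The function names are hypothetical, and the bounds used for Continuous variables are an assumption, since only the Function Node bounds are described as coming from the wizard.

```python
def map_state(posterior: dict) -> str:
    """Discrete: pick the state with the highest posterior probability."""
    return max(posterior, key=posterior.get)


def posterior_mean(posterior: dict, state_values: dict) -> float:
    """Continuous: probability-weighted mean of the states' numeric values."""
    return sum(p * state_values[state] for state, p in posterior.items())


def normalize(value: float, lo: float, hi: float) -> float:
    """Min-max rescaling into [0, 1]; clamping out-of-range values is an assumption."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))
```

For example, with the posterior `{"low": 0.2, "mid": 0.5, "high": 0.3}` and state values 100, 300 and 500, the Maximum a posteriori state is `"mid"` and the posterior mean is 320, which normalizes to 0.55 on the range 100 to 500.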

Three shapes are defined: Circle (1), Square (2) and Triangle (3).

When not Fixed, the shape is chosen based on its rank (the number given above) and the inferred value:

  • For Discrete nodes: the shape rank is derived from the state's rank with the modulo operation,
  • For Function nodes: the shape rank is derived from the value of the Function after converting it into an integer.
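One rule consistent with this description can be sketched as below. It is an assumption, not the tool's documented formula: the state rank is taken as zero-based and the shape number is computed as (rank mod 3) + 1, so that the first three states map to Circle, Square and Triangle in order. The function names are hypothetical.

```python
# Shape numbers as given above: Circle (1), Square (2), Triangle (3).
SHAPES = {1: "Circle", 2: "Square", 3: "Triangle"}


def shape_for_rank(rank: int) -> str:
    """Discrete node: assumed rule, shape number = (zero-based rank mod 3) + 1."""
    return SHAPES[rank % 3 + 1]


def shape_for_function_value(value: float) -> str:
    """Function node: convert the value to an integer first, then apply the same rule."""
    # int() truncates toward zero; this choice of conversion is an assumption.
    return SHAPES[int(value) % 3 + 1]
```

Under this assumption, a variable with more than three states reuses the shapes cyclically: the fourth state is drawn as a Circle again.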

When not Fixed to the user-defined Fixed Value, the size is chosen based on that Fixed Value and the inferred value:

  • For Discrete variables: the normalized state's rank is used;
  • For Continuous and Function nodes: the normalized value is used.
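A minimal sketch of the size rule, assuming the Fixed Value acts as the maximum and is simply multiplied by the normalized quantity (the normalized state's rank or the normalized value). The product form is an assumption; `scaled_size` is a hypothetical name.

```python
def scaled_size(fixed_value: float, normalized: float) -> float:
    """Scale the user-defined Fixed Value by a quantity expected in [0, 1]."""
    if not 0.0 <= normalized <= 1.0:
        raise ValueError("normalized must lie in [0, 1]")
    # Assumed rule: the Fixed Value is reached when the normalized quantity is 1.
    return fixed_value * normalized
```

With a Fixed Value of 25, a normalized value of 1.0 gives the maximum size of 25 and a normalized value of 0.5 gives 12.5.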

When not Fixed to the chosen color, the color is chosen as follows:

  • For Discrete variables: the color is chosen based on the state's rank and the Secondary Color Palette,
  • For Continuous and Function nodes: the normalized value is directly used to define a color on the user-defined scale Min, Mid (if checked), and Max.
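For the Continuous and Function case, a color scale of this kind can be sketched as a piecewise-linear interpolation. The details are assumptions: colors are RGB triples, interpolation is linear per channel, and the Mid color, when used, sits at the normalized value 0.5. The names `lerp` and `scale_color` are hypothetical.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]


def lerp(a: RGB, b: RGB, t: float) -> RGB:
    """Linear interpolation between two RGB colors, channel by channel."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))


def scale_color(n: float, c_min: RGB, c_max: RGB, c_mid: Optional[RGB] = None) -> RGB:
    """Map a normalized value n in [0, 1] onto the Min / Mid / Max scale."""
    n = min(1.0, max(0.0, n))
    if c_mid is None:
        # Mid unchecked: a single gradient from Min to Max.
        return lerp(c_min, c_max, n)
    if n <= 0.5:
        return lerp(c_min, c_mid, n / 0.5)
    return lerp(c_mid, c_max, (n - 0.5) / 0.5)
```

With Min blue, Mid white and Max red, a normalized value of 0.5 yields white, and the two halves of the scale shade toward blue and red respectively.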

When not Fixed to the user-defined Fixed Value, the opacity is chosen based on that Fixed Value and the inferred value:

  • For Discrete variables: the normalized state's rank is used;
  • For Continuous and Function nodes: the normalized value is used.

Example

Let's use a data set that contains house sale prices for King County, which includes Seattle. It describes homes sold between May 2014 and May 2015. More precisely, we have extracted the 94 houses that are more than 100 years old, that have been renovated, and come with a basement.

After having set Price (K$) as a Target Node, we've used the Augmented Markov Blanket algorithm for generating the following network:

The Function Node Certainty is defined as: 1-Entropy(?Price (K$)?, yes)
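To make the Certainty formula concrete, here is a minimal sketch that computes one minus the entropy of a posterior distribution. It assumes that the second argument (yes) requests a normalized entropy, i.e. the entropy divided by its maximum, log2 of the number of states; that reading and the function names are assumptions.

```python
import math


def normalized_entropy(probs: list) -> float:
    """Shannon entropy in bits, divided by log2(number of states)."""
    n = len(probs)
    if n < 2:
        return 0.0
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return h / math.log2(n)


def certainty(probs: list) -> float:
    """1 - normalized entropy: 1 for a certain posterior, 0 for a uniform one."""
    return 1.0 - normalized_entropy(probs)
```

A posterior concentrated on a single price bin gives a Certainty of 1, while a uniform posterior gives 0, which is why this node is a natural candidate for the opacity dimension.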

The first three parameters of this wizard are the general settings of the mapping:

  • Map Type: Roads, Terrain, Satellite or Hybrid,
  • Latitude: the continuous variable to use for the latitude coordinate,
  • Longitude: the continuous variable to use for the longitude coordinate.

This setting generates the following map, which takes into account four different variables:

  • The Observable variable Overall grade given to the housing unit (discretized into three bins) defines the shape. When not missing, the values are read directly from the data set to determine the corresponding discrete bin:
    • <= 7.5: CIRCLE
    • <= 8.5: SQUARE
    • > 8.5: TRIANGLE
  • The Observable variable Living room area in 2015 defines the size, with 25 as the maximum (set in the Fixed field). The continuous values are read directly from the data set, if not missing,
  • The Target Node Price (K$) defines the color. The values are the inferred posterior mean values,
  • The Function Node Certainty defines the opacity. The values are inferred.
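The shape assignment for Overall grade in this example can be reproduced with a short sketch. The thresholds 7.5 and 8.5 and the three shapes come from the list above; the function name `grade_shape` and the use of `bisect` are illustrative choices only.

```python
from bisect import bisect_left

# Upper bounds of the first two bins; the third bin is everything above 8.5.
THRESHOLDS = [7.5, 8.5]
SHAPES = ["CIRCLE", "SQUARE", "TRIANGLE"]


def grade_shape(grade: float) -> str:
    """Return the shape of the bin that contains the observed grade."""
    # bisect_left places a value equal to a threshold in the lower bin,
    # which matches the "<=" bounds of the discretization.
    return SHAPES[bisect_left(THRESHOLDS, grade)]
```

A house graded 7 is drawn as a circle, one graded 8 as a square, and one graded 9 or more as a triangle.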