# Contents

### Overview

Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such models are known as directed graphical models; within cognitive science and artificial intelligence, such models are known as Bayesian Belief Networks (BBNs), a term coined in 1985 by UCLA Professor Judea Pearl to honor the Rev. Thomas Bayes (1702-1761), whose rule for updating probabilities in the light of new evidence is the foundation of the approach.

BBNs provide an elegant and sound approach to representing uncertainty and to carrying out rigorous probabilistic inference: evidence gathered on a subset of variables is propagated to the remaining variables. Through an intuitive graphical representation, BBNs are effective for capturing experts' beliefs, uncertain knowledge and vague linguistic descriptions of knowledge. They are also a powerful knowledge discovery tool when combined with machine learning and data mining techniques.

In 2004, MIT Technology Review (Massachusetts Institute of Technology) listed Bayesian Machine Learning 4th among the “10 Emerging Technologies That Will Change Your World”. More recently, Judea Pearl, the father of BBNs, received the 2011 ACM Turing Award (announced in 2012), the most prestigious award in computer science and widely considered the “Nobel Prize in Computer Science”, in particular for developing the theoretical foundations for reasoning under uncertainty using BBNs.

Over the last 25 years, BBNs have emerged as a practically feasible form of knowledge representation and as a comprehensive data analysis framework. With ever-increasing computing power, their computational efficiency and inherently visual structure make them attractive for exploring and explaining complex problems. BBNs are now a powerful tool for gaining a deep understanding of very complex, high-dimensional problem domains. Deep understanding means knowing not merely how things behaved yesterday, but also how they will behave under new hypothetical circumstances tomorrow. More specifically, a BBN supports explicit, deliberate reasoning that anticipates the consequences of actions that have not yet been taken.

### Probabilistic Model

From a technical point of view, BBNs are made of two parts:

- Qualitative: a Directed Acyclic Graph (DAG), i.e. a directed graph that contains no cycles. The nodes represent the variables of the domain (e.g. the temperature of a device, a feature of an object, the occurrence of an event, the age of a patient), and the links represent statistical (informational) or causal dependencies among the variables. The DAG formally defines the factorization of the Joint Probability Distribution over the full set of variables;
- Quantitative: conditional probability distributions, which quantify the dependency of each node on its parents in the graph.
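As a brief sketch of what this factorization looks like in general, the joint distribution is the product of one conditional distribution per node. The symbols are notational assumptions introduced here, not taken from the text above: $X_1, \dots, X_n$ are the variables and $\mathrm{Pa}(X_i)$ is the set of parents of $X_i$ in the DAG (empty for a root node, in which case the factor is simply the marginal $P(X_i)$).

```latex
P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```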

**Example**

Let’s take two variables, Age and Gray Hair. The DAG therefore has two nodes, one for Age and one for Gray Hair. As there is a probabilistic (and causal) relationship between Age and Gray Hair, there is a link from the Age node to the Gray Hair node (Age → Gray Hair).

This graph defines the following factorization of the joint probability distribution P(Age, Gray Hair):

**P(Age, Gray Hair) = P(Age) P(Gray Hair | Age)**

The probability distributions are often represented with tables. The marginal distribution of Age is illustrated in the following table:

*16% of the population is less than 30 and 9.4% is more than 70 years old.*

The table below quantifies the relationship between Age and Gray Hair.

*In words, among those under 30 years of age, 66% do not have any gray hair, but 1.8% already have a lot of gray hair.*

*Among people older than 70, 30.8% have all gray hair, i.e. they are 100% gray.*
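The factorization P(Age, Gray Hair) = P(Age) P(Gray Hair | Age) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original tables are not reproduced here, so Age is collapsed into three hypothetical bins and Gray Hair into three hypothetical levels (`None`, `Some`, `All`). Only the values 0.16, 0.094, 0.66 and 0.308 are taken from the text; every other number is invented so that each distribution sums to 1.

```python
# Marginal P(Age). 0.16 and 0.094 are quoted in the text; the middle bin is
# an assumption that lumps all remaining ages together.
p_age = {"<30": 0.16, "30-70": 0.746, ">70": 0.094}

# Conditional P(Gray Hair | Age), one row per Age state.
# Only 0.66 and 0.308 come from the text; all other entries are made up.
p_gray_given_age = {
    "<30":   {"None": 0.66, "Some": 0.33,  "All": 0.01},
    "30-70": {"None": 0.30, "Some": 0.60,  "All": 0.10},
    ">70":   {"None": 0.05, "Some": 0.642, "All": 0.308},
}


def joint_distribution(p_parent, p_child_given_parent):
    """Chain rule: P(parent, child) = P(parent) * P(child | parent)."""
    return {
        (parent, child): p_parent[parent] * p_child
        for parent, row in p_child_given_parent.items()
        for child, p_child in row.items()
    }


joint = joint_distribution(p_age, p_gray_given_age)

# A valid joint distribution sums to 1 (up to floating-point error).
assert abs(sum(joint.values()) - 1.0) < 1e-9

print(f"P(Age <30, Gray Hair = None) = {joint[('<30', 'None')]:.4f}")
```

The two small tables (3 + 9 entries) are enough to recover every entry of the joint distribution, which is the sense in which the graph and its tables form a compact representation.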

### Probabilistic Inference

The DAG and the probability distributions associated with each node provide a compact representation of the joint probability distribution over all the variables. Inference algorithms exploit this representation, so BBNs can be used as probabilistic expert systems that compute the posterior probability distributions of *unobserved* variables given evidence on an arbitrary number of *observed* variables.

Also, observational inference in BBNs is *omnidirectional*: it is possible to perform inference from parents to children (simulation), from children to parents (diagnosis), and any combination of these two kinds of inference.

However, it is important to point out that *causal* inference can only be used in the context of simulation.

**Example**

**Marginal probability distributions of Age and Gray Hair**

**Simulation - Posterior probability distribution of Gray Hair given Age >70**

**Diagnosis - Posterior probability distribution of Age given Gray Hair is Salt & Pepper**
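For a two-node network, the two directions of observational inference can be sketched by hand. The code below is a minimal illustration, not the algorithm of any particular tool: the function names `simulate`, `marginal_gray` and `diagnose` are hypothetical, Age and Gray Hair are reduced to three assumed states each, and all probabilities except 0.16, 0.094, 0.66 and 0.308 are invented placeholders.

```python
# Assumed, simplified tables (see the caveats in the paragraph above).
p_age = {"<30": 0.16, "30-70": 0.746, ">70": 0.094}
p_gray_given_age = {
    "<30":   {"None": 0.66, "Some": 0.33,  "All": 0.01},
    "30-70": {"None": 0.30, "Some": 0.60,  "All": 0.10},
    ">70":   {"None": 0.05, "Some": 0.642, "All": 0.308},
}


def simulate(age):
    """Parent to child: P(Gray Hair | Age = age) is read from the table row."""
    return dict(p_gray_given_age[age])


def marginal_gray():
    """No evidence: P(Gray Hair) = sum over Age of P(Age) * P(Gray Hair | Age)."""
    gray_states = next(iter(p_gray_given_age.values())).keys()
    return {
        g: sum(p_age[a] * p_gray_given_age[a][g] for a in p_age)
        for g in gray_states
    }


def diagnose(gray):
    """Child to parent via Bayes' rule: P(Age | Gray Hair = gray)."""
    unnormalized = {a: p_age[a] * p_gray_given_age[a][gray] for a in p_age}
    evidence = sum(unnormalized.values())  # equals P(Gray Hair = gray)
    return {a: p / evidence for a, p in unnormalized.items()}


print("P(Gray Hair):           ", marginal_gray())
print("P(Gray Hair | Age >70): ", simulate(">70"))
print("P(Age | Gray Hair=Some):", diagnose("Some"))
```

With these placeholder numbers, observing fully gray hair raises the probability of the oldest age group above its prior, which is the diagnostic pattern described above. Larger networks need dedicated inference algorithms rather than this direct enumeration.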

### BBN Design

There are two ways of building BBNs:

- Using available expert knowledge to manually design the DAG and to define the corresponding probability distributions.
- Analyzing available data to machine-learn the DAG and to estimate the corresponding probability distributions.
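To make the second route concrete, here is a minimal sketch of the parameter-estimation half only: given a fixed DAG (Age → Gray Hair, used here as an assumed example), the probability tables can be estimated from data by relative frequencies. The records and the helper names `estimate_marginal` and `estimate_cpt` are hypothetical, and this is not a description of how any specific product learns a network; learning the DAG itself is a separate and harder problem that is not shown.

```python
from collections import Counter, defaultdict

# Hypothetical observations of (Age bin, Gray Hair level); not real data.
records = [
    ("<30", "None"), ("<30", "None"), ("<30", "Some"),
    ("30-70", "None"), ("30-70", "Some"), ("30-70", "Some"), ("30-70", "All"),
    (">70", "Some"), (">70", "All"), (">70", "All"),
]


def estimate_marginal(records):
    """Relative-frequency estimate of P(parent)."""
    counts = Counter(parent for parent, _ in records)
    n = sum(counts.values())
    return {state: c / n for state, c in counts.items()}


def estimate_cpt(records):
    """Relative-frequency estimate of P(child | parent), one row per parent state."""
    counts = defaultdict(Counter)
    for parent, child in records:
        counts[parent][child] += 1
    return {
        parent: {child: c / sum(row.values()) for child, c in row.items()}
        for parent, row in counts.items()
    }


p_age_hat = estimate_marginal(records)
cpt_hat = estimate_cpt(records)

print("Estimated P(Age):", p_age_hat)
for age, row in cpt_hat.items():
    print(f"Estimated P(Gray Hair | Age {age}):", row)
```

Plain frequency counts assign probability zero to combinations that never occur in the data, so practical estimators usually add some form of smoothing or prior; that refinement is left out here for brevity.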

Within the same theoretical framework, BayesiaLab offers a broad set of data mining algorithms:

- Structural Unsupervised Learning: induction of a BBN to compactly represent the joint probability distribution sampled by the data set; all the variables have the exact same importance.
- Supervised Learning: design of a BBN entirely dedicated to the characterization of a Target variable.
- Unsupervised Data Clustering: creation of a BBN with a hidden variable to represent uniform groups of individuals/observations.
- Unsupervised Variable Clustering: identification of strongly connected variables that can be clustered into factors.
- Probabilistic Structural Equation Models: hierarchical BBN where hidden variables are used to define the factors identified during variable clustering.