Probabilistic models based on directed acyclic graphs have a long and rich tradition, beginning with the work of geneticist Sewall Wright in the 1920s. Variants have appeared in many fields. Within statistics, such models are known as directed graphical models; within cognitive science and artificial intelligence, such models are known as Bayesian Belief Networks (BBNs), a term coined in 1985 by UCLA Professor Judea Pearl to honor the Rev. Thomas Bayes (1702-1761), whose rule for updating probabilities in the light of new evidence is the foundation of the approach.
BBNs provide an elegant and sound approach to representing uncertainty and to carrying out rigorous probabilistic inference, propagating the evidence gathered on a subset of variables to the remaining variables. BBNs are not only effective for representing experts' beliefs, uncertain knowledge, and vague linguistic representations of knowledge via an intuitive graphical representation, but are also a powerful knowledge discovery tool when combined with machine learning/data mining techniques.
In 2004, MIT's Technology Review ranked Bayesian Machine Learning 4th among the "10 Emerging Technologies That Will Change Your World". More recently, Judea Pearl, the father of BBNs, received the 2011 Turing Award, the most prestigious award in computer science and widely considered the "Nobel Prize of Computer Science", in large part for the development of the theoretical foundations of reasoning under uncertainty using BBNs.
Over the last 25 years, BBNs have emerged as a practically feasible form of knowledge representation and as a new, comprehensive framework for data analysis. With ever-increasing computing power, their computational efficiency and inherently visual structure make them attractive for exploring and explaining complex problems. BBNs are now a powerful tool for developing a deep understanding of very complex, high-dimensional problem domains. Deep understanding means knowing not merely how things behaved yesterday, but also how they will behave under new, hypothetical circumstances tomorrow. More specifically, a BBN supports explicit and deliberate reasoning, making it possible to anticipate the consequences of actions that have not yet been taken.
From a technical point of view, BBNs consist of two parts:
- Qualitative: a Directed Acyclic Graph (DAG), i.e. a directed graph that does not contain any cycles. Its nodes represent the variables of the domain (e.g. the temperature of a device, a feature of an object, the occurrence of an event, the age of a patient), and its links represent statistical (informational) or causal dependencies among these variables. The DAG formally defines the factorization of the Joint Probability Distribution over the entire set of variables;
- Quantitative: conditional probability distributions, which quantify the dependency of each node on its parents in the graph.
The DAG and the probability distributions associated with each node allow a compact representation of the joint probability distribution over all the variables. Inference algorithms exploit this representation, so BBNs can be used as probabilistic expert systems to compute the posterior probability distributions of unobserved variables given evidence on an arbitrary number of observed variables.
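The factorization described above can be made concrete with a small sketch in plain Python. The network and its probability values below are illustrative assumptions (the classic "sprinkler" example, not taken from the text): Cloudy influences Sprinkler and Rain, which both influence WetGrass, so the joint distribution factorizes as P(C) P(S|C) P(R|C) P(W|S,R).

```python
# Minimal sketch of how a DAG factorizes a joint distribution.
# Structure and numbers are illustrative assumptions (classic sprinkler example).
from itertools import product

# Conditional probability tables: P(node = True | parent values).
p_cloudy = 0.5
p_sprinkler = {True: 0.1, False: 0.5}               # P(S=T | C)
p_rain = {True: 0.8, False: 0.2}                    # P(R=T | C)
p_wet = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.9, (False, False): 0.0}   # P(W=T | S, R)

def bern(p, value):
    """Probability that a Boolean variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) * P(S|C) * P(R|C) * P(W|S,R) -- the DAG factorization."""
    return (bern(p_cloudy, c)
            * bern(p_sprinkler[c], s)
            * bern(p_rain[c], r)
            * bern(p_wet[(s, r)], w))

# Four local tables (2 + 4 + 4 + 8 entries) replace a full table of 2^4
# joint probabilities; the factorized joint still sums to 1.
total = sum(joint(c, s, r, w) for c, s, r, w in product([True, False], repeat=4))
print(round(total, 10))  # -> 1.0
```

The compactness grows with the number of variables: the local tables scale with the number of parents per node, not with the total number of variables.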
Also, observational inference in BBNs is omnidirectional: it is possible to perform inference from parents to children (simulation), from children to parents (diagnosis), and any combination of these two kinds of inference.
However, it is important to point out that causal inference can only be used in the context of simulation.
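The two directions of observational inference can be illustrated with a hedged, two-node sketch: Disease → Symptom, with invented probability values. Simulation follows the arrow and reads P(Symptom | Disease) directly from the conditional table; diagnosis runs against the arrow and recovers P(Disease | Symptom) with Bayes' rule.

```python
# Two-node illustration of omnidirectional inference: Disease -> Symptom.
# All numbers are invented for illustration.
p_disease = 0.01                      # prior P(D=T)
p_symptom = {True: 0.9, False: 0.05}  # P(S=T | D)

# Simulation (parent -> child): read the conditional table directly.
p_s_given_d = p_symptom[True]

# Diagnosis (child -> parent): Bayes' rule,
# P(D=T | S=T) = P(S=T | D=T) * P(D=T) / P(S=T)
p_s = p_symptom[True] * p_disease + p_symptom[False] * (1 - p_disease)
p_d_given_s = p_symptom[True] * p_disease / p_s

print(round(p_s_given_d, 4))  # -> 0.9
print(round(p_d_given_s, 4))  # -> 0.1538
```

Note how the low prior tempers the diagnosis: even with a strong symptom, the posterior probability of the disease remains modest, which is exactly the kind of normative reasoning BBNs automate.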
There are two ways of building BBNs:
- Using available expert knowledge to manually design the DAG and to define the corresponding probability distributions.
- Analyzing available data to machine-learn the DAG and to estimate the corresponding probability distributions.
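The second, data-driven route can be sketched for its simplest step: once a structure is fixed, the conditional probability tables are estimated from data by counting (maximum-likelihood estimation). The records below are hypothetical; learning the DAG itself would additionally require a search over candidate structures.

```python
# Minimal sketch of parameter learning from data: estimate P(child | parent)
# by counting co-occurrences. Records are hypothetical (parent, child) pairs.
from collections import Counter

data = [(True, True), (True, True), (True, False),
        (False, False), (False, False), (False, True),
        (True, True), (False, False)]

pair_counts = Counter(data)
parent_counts = Counter(p for p, _ in data)

# P(child=True | parent=v) = count(parent=v, child=True) / count(parent=v)
cpt = {v: pair_counts[(v, True)] / parent_counts[v] for v in (True, False)}
print(cpt)  # -> {True: 0.75, False: 0.25}
```

In practice, smoothed or Bayesian estimators are preferred over raw counts to avoid zero probabilities for configurations absent from the data.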
Within the same theoretical framework, BayesiaLab offers a broad set of data mining algorithms:
- Structural Unsupervised Learning: induction of a BBN that compactly represents the joint probability distribution underlying the data set; all variables are given the exact same importance.
- Supervised Learning: design of a BBN entirely dedicated to the characterization of a Target variable.
- Unsupervised Data Clustering: creation of a BBN with a hidden variable to represent uniform groups of individuals/observations.
- Unsupervised Variable Clustering: identification of strongly connected variables that can be clustered into factors.
- Probabilistic Structural Equation Models: a hierarchical BBN in which hidden variables define the factors identified during variable clustering.
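The idea behind score-based structure learning, which underlies structural unsupervised learning, can be illustrated with a generic sketch (not BayesiaLab's actual algorithm): candidate DAGs are scored by how well they fit the data, with a penalty for complexity. Here two candidate structures for invented Boolean data are compared with a BIC-style score.

```python
# Generic illustration of score-based structure comparison (not any specific
# product's algorithm): "A and B independent" vs. "A -> B", scored with BIC
# on invented, strongly correlated data.
import math
from collections import Counter

data = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 40
n = len(data)

def log_lik_independent(data):
    """Log-likelihood under the empty DAG: P(A) * P(B)."""
    ca = Counter(a for a, _ in data)
    cb = Counter(b for _, b in data)
    return sum(math.log(ca[a] / n) + math.log(cb[b] / n) for a, b in data)

def log_lik_edge(data):
    """Log-likelihood under A -> B: P(A) * P(B | A)."""
    ca = Counter(a for a, _ in data)
    cab = Counter(data)
    return sum(math.log(ca[a] / n) + math.log(cab[(a, b)] / ca[a])
               for a, b in data)

def bic(log_lik, n_params):
    """BIC score: fit minus a complexity penalty that grows with n."""
    return log_lik - 0.5 * n_params * math.log(n)

# Empty DAG: 2 free parameters; A -> B: 3 free parameters.
# On correlated data, the edge model wins despite the larger penalty.
print(bic(log_lik_independent(data), 2) < bic(log_lik_edge(data), 3))  # -> True
```

A structure-learning algorithm repeats this kind of comparison over many candidate DAGs, searching for the structure that best balances fit and parsimony.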