In this new study, we turn to the field of cancer classification by means of microarray analysis. Microarray analysis is a technique for gene expression profiling of cell samples. An expression profile indicates which of thousands of genes are currently active in a sample. The activation of certain genes can indicate the type and the current state of a cell.
In our case, we want to use the expression profiles of cell samples from cancer patients to distinguish between different types of leukemia. Leukemia is a cancer of the blood or bone marrow characterized by an abnormal increase in white blood cells. Clinically and pathologically, leukemia can be divided into a number of groups. We will examine two types of acute leukemia: acute lymphoblastic leukemia (ALL) and acute myelogenous leukemia (AML).
Correctly classifying the leukemia subgroup is critical for selecting the most effective therapy, which may include chemotherapy and radiation, and for minimizing side effects. More generally, recent progress in cancer classification has been crucial for improving overall treatment success.
One of the challenges in microarray analysis is the sheer number of genes, each of which could potentially be a predictor in a classification model. At the same time, the number of observations tends to be small. It is therefore not uncommon to have thousands of predictors but only a few dozen samples. This is precisely the opposite of what one would hope for in a traditional statistical analysis, which assumes far more observations than predictors.
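To make the problem of having more predictors than samples concrete, here is a minimal NumPy sketch on purely synthetic data. The sizes (30 samples, 1,000 "genes") and the random labels are illustrative assumptions, not the Golub dataset. Because the samples are fewer than the predictors, an ordinary least-squares linear classifier can fit even meaningless labels perfectly, yet it predicts new samples no better than chance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: few samples, many genes (predictors).
n_samples, n_genes = 30, 1000

# Random "expression" values with no real signal, and random class labels coded -1/+1.
X = rng.normal(size=(n_samples, n_genes))
y = rng.integers(0, 2, size=n_samples) * 2 - 1

# Minimum-norm least-squares fit of a linear classifier (no intercept).
# With n_samples < n_genes the system X @ w = y is underdetermined and can be solved exactly.
w, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

# Training accuracy: the model reproduces the random labels perfectly.
train_acc = np.mean(np.sign(X @ w) == y)

# Fresh random samples: the fitted model carries no real information, so accuracy is near chance.
X_new = rng.normal(size=(200, n_genes))
y_new = rng.integers(0, 2, size=200) * 2 - 1
test_acc = np.mean(np.sign(X_new @ w) == y_new)

print(f"rank={rank}, training accuracy={train_acc:.2f}, new-sample accuracy={test_acc:.2f}")
```

A perfect fit on the training samples therefore says little about real predictive power when predictors vastly outnumber observations, which is why specialized methods and careful validation are needed.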
To address this imbalance, many new statistical techniques have emerged in recent decades, and one of them is described in detail in Golub et al. (1999). This study demonstrates that cancer classification is feasible on the basis of gene expression data alone. Since its publication, it has been widely cited and further disseminated, e.g. in Slonim et al. (2000) and Dudoit et al. (2002). The underlying dataset has also been made publicly available by the Broad Institute. Given the seminal nature of the Golub study and its excellent pedagogical qualities, we have chosen it as the reference point for a new case study and BayesiaLab tutorial.
Our objective is to show that our modeling approach, with Bayesian networks as the framework and BayesiaLab as the software tool, can quickly and effectively generate models whose classification performance equals or exceeds that of models documented in the literature, while requiring only minimal specification effort from the research analyst.