"Discriminative Stochastic Models for Complex Networks Derived from Flow Cytometry Big Data",
Forty-Fourth Southeastern International Conference on Combinatorics, Graph Theory, and Computing, Boca Raton, March 4-8, 2013.
Flow cytometry measures multiple features of thousands of cells in a sample, producing a massive high-dimensional data set that must be analyzed to decide whether a given patient has cancer. Currently, human experts inspect two-dimensional visualizations of data sets of up to six dimensions and rely on their own expertise to discriminate between cancer patients and normal patients. Modern flow cytometry machines, however, can produce data sets with 20 or more dimensions, and manual analysis of such data quickly becomes infeasible. In this paper, we propose the use of complex network models and their topological properties for discriminating between cancer patients and normal patients. Each node in the complex network corresponds to the measurements obtained from a single cell, and an edge exists between two nodes if the Euclidean distance between their measurements is smaller than a threshold. The evolution of the network through time is derived by studying periodically acquired patient samples. By constructing such complex network models for multiple normal patients and studying topological properties such as the number of connected components, edge density, and number of clusters, we develop a stochastic generative model that describes the flow cytometry data of normal patients. The goal of the stochastic generative modeling is to capture the natural diversity that occurs in the normal patient population (age, race, gender, BMI), and thereby to compute the probability that a given flow cytometry sample does not arise from this model. Rare behavior identification algorithms will then be employed to compute the probability that a given flow cytometry sample indicates the presence of cancer in a patient.
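The network construction described in the abstract can be illustrated with a minimal sketch. It assumes each cell is a tuple of numeric measurements and uses a brute-force pairwise comparison, which is only practical for small samples. The function names (`threshold_graph`, `count_components`, `edge_density`), the toy data, and the threshold value are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from itertools import combinations


def threshold_graph(cells, threshold):
    """Return the edges (i, j) joining cells whose Euclidean distance is below `threshold`."""
    edges = []
    for i, j in combinations(range(len(cells)), 2):
        if math.dist(cells[i], cells[j]) < threshold:
            edges.append((i, j))
    return edges


def count_components(n_nodes, edges):
    """Count connected components with a simple union-find structure."""
    parent = list(range(n_nodes))

    def find(x):
        # Follow parent links to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
    return len({find(x) for x in range(n_nodes)})


def edge_density(n_nodes, edges):
    """Fraction of all possible undirected edges that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


# Toy sample (assumed data): five "cells" with three measurements each,
# forming two well-separated groups.
cells = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0),
    (5.1, 5.0, 5.0),
]
edges = threshold_graph(cells, threshold=0.5)
n_components = count_components(len(cells), edges)
density = edge_density(len(cells), edges)
print(len(edges), n_components, density)
```

Summaries of this kind, computed for many normal samples, would be the raw material for a model of normal variation. How the paper actually fits that model is not specified in the abstract, so it is not sketched here.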