Category Archives: Graph Theory

Extracting Biologically Meaningful Information from Gene Expression Data: Gene CoExpression Networks

Data generated from gene expression experiments hold a substantial amount of biological information (Eisen et al., 1998). The end point of any analysis of this sort is to gain a thorough view and understanding of the “inner life” of a cell, i.e. the biological processes ongoing in it. This can be considered a bottom-up approach, whereby we slowly build our way up from transcript levels to cellular processes and ultimately to an understanding of the biological process in question (of course, by combining other appropriate methods in order to be able to extract causal relationships). A natural assumption is that genes with similar expression patterns within a dataset may participate in common biological processes, or even be under the same regulatory mechanism(s) (Tavazoie et al., 1999). Clustering genes with similar expression patterns is a useful way to gain this sort of information and, putatively, to extend the evidence of common regulatory control to infer the participation of genes in various pathways. Paraphrasing/quoting from Eisen et al., 1998: “Statistical organization (clustering) and graphical display of a microarray dataset allows for researchers to assimilate and explore data in a biologically meaningful way. … Also, similarity in the gene expression pattern may be the easiest way to make -at least provisional- attribution of function on a genomic scale”.

Along the same lines, transcription factor (TF) binding sites are critical to our understanding of transcription and transcriptional regulation. A TF binding site lies close to or within a promoter region, so a TF bound there can regulate transcription either by recruiting RNA polymerase to the promoter or by blocking its docking on the DNA. The actions of TFs are transcript specific, i.e. each TF has a range of genes whose transcription it modulates. The “physical approach” of constructing gene networks seeks to determine the TFs and the respective DNA motifs to which they bind to regulate transcription. Another strategy, the “influence approach” of constructing gene networks, deals with gene expression data and describes the relationships between transcript levels and how they interact to regulate each other’s transcription. The transcript interactions are described with a graph, in which the nodes represent transcripts and the edges represent a relationship between the connected transcripts, according to the graph-construction method followed. The graph can be constructed as a system of differential equations, a Bayesian network, a Boolean network or an association network. The latter approach creates a gene coexpression network by assigning edges to pairs of genes with high statistical similarity. Different similarity metrics have been used, such as Euclidean distance, the Pearson correlation coefficient, mutual information (e.g. ARACNE, CLR) and the partial correlation coefficient (graphical Gaussian models, GGMs). Moreover, to analyze gene expression data from time-series experiments, appropriate algorithms extract correlations between transcript level changes at the different time points (Schmitt, Raab and Stephanopoulos, Genome Res 2004; Arkin, Shen and Ross, Science 1997).
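To make the association-network idea concrete, here is a minimal sketch in Python that connects two genes whenever the absolute Pearson correlation of their expression profiles reaches a cutoff. The gene names, expression values and the 0.9 threshold are made up for illustration; real analyses typically work on thousands of genes and choose the cutoff with more care (or use mutual information or partial correlation instead).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression profiles: one list of sample values per gene.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises together with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 3.0],  # no clear relationship
}

for g1, g2, r in coexpression_edges(expression):
    print(g1, g2, r)
```

Using the absolute value of the correlation means that strongly anti-correlated genes are also linked; whether that is desirable depends on the biological question.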

Genomic strategies are advancing at great speed nowadays, and the amounts of data generated are massive. The aforementioned network approaches, borrowed from graph theory and statistics, hold the promise of revealing critical biological information where the “data mining” ability of a bench researcher stops. This is especially, though not exclusively, important for cancer research. For example, breast cancer is the leading cause of cancer death in women. It is a heterogeneous disease, both in terms of histological origin (e.g. it can develop in the ducts or the lobules of the breast) and in terms of the mutational landscape of the cancer cells. The latter means that the tumor itself can be highly heterogeneous. Combining transcript level analysis by coexpression networks with the recent advancements in breast tumor whole-genome sequencing (see Gray and Druker, Nature 2012) may prove critical to our understanding of cancer initiation and evolution.

For more information on coexpression network construction, the interested reader is referred to Gardner and Faith, Phys Life Rev 2005.


Tavazoie et al., Nature Genetics 1999

Eisen et al., PNAS 1998

Gardner and Faith, Phys Life Rev 2005

Schmitt et al., Genome Res 2004

Arkin et al., Science 1997

Gray and Druker, Nature 2012


On Networks, Neurons and the Brain

When we think of our brain, what comes to mind? Typically, words such as memory and neurons are associated with the word brain. Other keywords might be schizophrenia, depression, Alzheimer’s or Parkinson’s. All of these characterize different functional aspects of the brain. But let’s take a more global look at this structure, which holds the most valuable process in humans: cognition. The brain can be thought of as the world, consisting of many countries and smaller communities. The countries are brain regions, such as the prefrontal cortex, the parietal cortex or the hippocampus. The communities are smaller neuronal networks within the individual brain areas. All these functional units work together to produce an output, be it a single thought (or rather an array of thoughts) or a complex grasping movement. At the heart of this output lie neuronal units. This post provides a short introduction to examining the functional and architectural hierarchy of the brain through the lens of Graph Theory. The information is taken from the excellent review by Bullmore and Sporns [2009]. I urge you to read this review for more details and references on the topic.

Graph Theory, which is the study of graphs, has been steadily gaining interest in brain research. Regardless of time scale, spatial scale and species [Bullmore and Sporns, 2009], the brain has been shown to have the characteristics of a small-world network. Before going into the specifics of brain networks, let’s go over some definitions.

What is a network? A network is characterized by a set of nodes (be they anatomical brain areas, functional brain areas as measured by electroencephalography, or neurons) and a set of edges that connect these nodes. What counts as a connection depends on the actual experimental question. For example, an edge can represent an anatomical connection between brain areas, such as the corpus callosum between the two cerebral hemispheres (a very simple two-node graph/network with one edge), or each of the individual 200–250 million (!!!) contralateral axonal projections of the corpus callosum (a more complex graph; the nodes in this case may be the individual neurons receiving these connections, arranged according to their topology). Key attributes of a network are the Node Degree, which is the number of connections of a specific node, and the Clustering Coefficient, which is the ratio of the number of connections that exist between the nearest neighbors of a node to the maximum possible number of such connections. Other important network attributes are the Path Length, which denotes the minimum number of edges one must cross to go from one node of a specific pair to the other, and Hubs, which are nodes with high degree or high centrality. Centrality is here simply the number of shortest paths between other nodes that pass through the node in question.
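As a small illustration of these definitions, the sketch below computes the degree, the clustering coefficient (in its usual form: links among a node's neighbors divided by the maximum possible number of such links) and the shortest path length on a made-up five-node undirected network. The node labels and connections are hypothetical and stand for nothing in particular.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected toy network: node -> set of neighbours.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}


def degree(graph, node):
    """Number of edges attached to the node."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of possible links among the node's neighbours that exist."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to connect
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return links / (k * (k - 1) / 2)


def shortest_path_length(graph, source, target):
    """Minimum number of edges between two nodes (breadth-first search).

    Returns None if the target cannot be reached from the source.
    """
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distances[current]
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return None


print(degree(graph, "A"))                     # A has three neighbours
print(clustering_coefficient(graph, "A"))     # only B-C of three pairs linked
print(shortest_path_length(graph, "B", "E"))  # B -> A -> D -> E
```

In this toy network node A would be the natural candidate for a hub: it has the highest degree and every path from B or C to D or E runs through it.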

What then is a Small-World Network? A small-world network is a type of graph/network characterized by a high clustering coefficient, meaning that the neighbors of a node tend to be neighbors of each other as well. A small-world network is also characterized by short path lengths: although most nodes are not directly connected, each node can be reached from any other within a small number of steps. This type of graph has received a lot of attention in the study of social networks. A famous example is the Six Degrees of Separation, the theory that any person is approximately 6 steps (edges of connection) away from any other person on Earth. The brain satisfies the criteria of a small-world network, and this holds true on various scales in time and space and along the evolutionary ladder.
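In the formulation of Watts and Strogatz (1998), a small-world network combines the high clustering of a regular lattice with the short path lengths of a random graph. The sketch below follows that idea loosely: it builds a ring lattice, randomly rewires a fraction of its edges, and prints the mean clustering coefficient and mean path length of both graphs. The sizes (100 nodes, 6 neighbors each), the rewiring probability of 0.1 and the random seed are arbitrary choices for illustration, and the rewiring routine is a simplified variant rather than a faithful reimplementation of the original procedure.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k // 2 + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph


def rewire(graph, p, rng):
    """With probability p, move one end of each edge to a random new node."""
    n = len(graph)
    new = {node: set(nbrs) for node, nbrs in graph.items()}
    edges = [(u, v) for u in graph for v in graph[u] if u < v]
    for u, v in edges:
        if rng.random() < p:
            candidates = [w for w in range(n) if w != u and w not in new[u]]
            if candidates:
                w = rng.choice(candidates)
                new[u].discard(v)
                new[v].discard(u)
                new[u].add(w)
                new[w].add(u)
    return new


def mean_clustering(graph):
    """Average clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
            total += links / (k * (k - 1) / 2)
    return total / len(graph)


def mean_path_length(graph):
    """Average shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nb in graph[current]:
                if nb not in dist:
                    dist[nb] = dist[current] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)  # fixed seed so that a run can be repeated
lattice = ring_lattice(100, 6)
small_world = rewire(lattice, 0.1, rng)
for name, g in (("lattice", lattice), ("rewired", small_world)):
    print(name, round(mean_clustering(g), 3), round(mean_path_length(g), 3))
```

The pattern to look for is a rewired graph whose path length is clearly shorter than that of the lattice while its clustering stays comparatively high; the exact numbers depend on the seed and the parameters chosen.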

Caenorhabditis elegans is the first organism to have its nervous system described at the cellular level and shown to be a small-world network in terms of connectivity [Watts and Strogatz, 1998]. The organism has merely 302 neurons, a fact that made mapping its connectome far more tractable than in higher organisms such as humans. In the human case, data produced by many brain imaging methods have been used to build structural brain networks. For example, application of graph theory to data from Diffusion Tensor Imaging revealed that a few areas, such as the superior frontal cortex, the superior parietal cortex, the precuneus and the insula, are highly correlated with each other in terms of centrality [Sporns and Kötter, 2004]. Moreover, functional brain networks have been constructed with the use of fMRI, MEG and EEG. Combining data from the BOLD signal of fMRI, which provides better spatial resolution, with data from electrophysiological methods such as EEG, which provide better temporal resolution, and applying interpolation methods has made it possible to generate more accurate networks of brain activation. Science has gone even further down the line, and scientists are able to record neuronal network activity in a dish (in vitro) using MEAs (multielectrode arrays). From these in vitro studies and from animal studies, graph theory has shown that neuronal nodes with similar connection patterns tend to have similar functions [White et al., 1986]. Graph theory has proved, and continues to prove, very useful in understanding the changes that take place in various neurological and neuropsychiatric syndromes. For example, network analysis of MEG and fMRI data from patients with Alzheimer’s disease and schizophrenia showed that small-world organization is lost, and this held true on both the functional and the structural scale.

Moreover, network analysis offers the ability to “visualize” changes that happen at the molecular level, and allows a better understanding of processes such as synaptic plasticity, which is fine-tuned on the scale of milliseconds and which, in the form of long-term potentiation (LTP), is considered to be the molecular underpinning of memory formation. Comparing functional networks on the molecular scale with functional networks recorded from humans performing a memory task may help us understand the functional and even the structural changes taking place in the brain during memory formation, and shed light on the pathophysiology of memory-related conditions.

This post provides only a brief introduction to the subject of Graph Analysis and brain function and structure. Nevertheless, it is clear from the little described here that Graph Analysis, along with other mathematical and statistical methods, may provide the key to better understanding this elegantly built structure called the brain.


Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10, 186–198 (2009).

Watts, D. J. & Strogatz, S. H. Collective dynamics of “small-world” networks. Nature 393, 440–442 (1998).

Sporns, O. & Kötter, R. Motifs in brain networks. PLoS Biol. 2, 1910–1918 (2004).

White, J. G., Southgate, E., Thomson, J. N. & Brenner, S. The structure of the nervous system of the nematode Caenorhabditis elegans. Philos. Trans. R. Soc. Lond. B Biol. Sci. 314, 1–340 (1986).
