Category Archives: Gene Expression

Early Experiences and the Developing Brain

The developing brain continuously absorbs sensory information and transforms it into “fuel” for fine-tuning the wiring and architecture of its circuits. The types of sensory information and early experiences the brain perceives during this malleable developmental period are critical for the evolution and structure of the brain's circuits throughout life.

Part 1 from the “Three Core Concepts in Early Development” Series by the Center on the Developing Child at Harvard University.

March 13, 2014 · 7:09 pm

Breast Cancer: New Driver Genes Identified

Breast cancer is the most common cancer type among women (though it can also affect men). Sadly, the cause of this disease has yet to be pinned down. The reason is the multifaceted nature of the cancer, i.e., each patient has a different cancer genetic profile. The genetic heterogeneity exists not only among patients but also within the tumor itself: researchers face the fact that a tumor is composed of cells that differ in genetic makeup and mutational repertoire, forming a heterogeneous intra-tumor profile. This makes it even harder to understand the genetic driving force behind tumor initiation, evolution and metastasis.

A recent publication by Stephens et al. (2012) sequenced 21,416 protein-coding genes and 1,664 microRNAs, and analyzed copy number, in breast cancer samples in order to understand the genetic roots and branches of the cancer. From this analysis, the authors identified 9 new candidate cancer driver genes: MAP3K1, MAP3K13, AKT2, NCOR1, SMARCD1, ARID1B, CDKN1B, CASP8 and TBX3. In 6% of the cancers, a somatic mutation in MAP3K1 (mitogen-activated protein kinase kinase kinase 1) was observed, and the authors state that this was found predominantly in ER+ breast cancers. Mutations were also observed in MAP2K4 and MAP3K13; along with MAP3K1, these genes/proteins are implicated in the JUN kinase pathway and in activation of the known tumor suppressor gene TP53. AKT2, another newly identified breast cancer driver gene, also participates in the JUN kinase pathway alongside MAP3K1 and MAP3K13.

A -complicated- illustration of JNK cascades and some of their effects on cellular physiology.

The genes NCOR1, ARID1B and SMARCD1, also on the list of newly recognized driver genes from this study, all participate in chromatin regulation. CDKN1B is another identified gene, which regulates cell cycle progression at the G1 phase, whereas CASP8 is implicated in apoptosis. The last member of the list of 9 breast cancer driver genes is TBX3, a transcription factor that regulates morphogenesis of the forelimb along the anterior/posterior axis. This gene also participates in the normal development of the mammary tissue (Howard & Ashworth, 2006), so it could perhaps be considered a breast tissue-specific gene (?).

Of the 40 driver mutations recognized, the authors state that 58% were attributed to 7 known breast cancer genes: TP53, PIK3CA, ERBB2, MYC, FGFR1/ZNF703, GATA3 and CCND1. This leaves 42% (!) of driver mutations attributed to genes less frequently associated with breast cancer, which include the 9 new breast cancer driver genes.

This study and others of its kind (see the same edition of Nature Letters for similar studies) highlight the importance of deciphering the basic genetic dictionary and how it is read in cancer cells compared with normal cells. Moreover, using whole-genome information, we can take a step forward in understanding and predicting response to cancer treatment, as Ellis et al. (2012) showed in the same edition of Nature Letters.


Stephens et al., Nature Letters (2012) The landscape of cancer genes and mutational processes in breast cancer

Howard, B. & Ashworth, A. PLoS Genet. (2006) Signalling pathways implicated in early mammary gland morphogenesis and breast cancer. 

Ellis et al., Nature Letters (2012) Whole-genome analysis informs breast cancer response to aromatase inhibition.

Filed under Cancer, Gene Expression, Science

On p300, enhancers and neurodevelopmental disorders.

P300, or adenovirus E1A-associated cellular p300 transcriptional co-activator protein, is a transcriptional regulator (1, 2). It harbours intrinsic acetyltransferase activity. It therefore affects gene transcription by inducing chromatin remodelling close to promoter sites and by making chromatin accessible to transcription factors and the transcriptional machinery (1). Moreover, P300 can interact with all four histone types of the nucleosome core, i.e., H2A, H2B, H3 and H4 (2). Depending on the context of its interactions with co-regulators, gene transcription can be either upregulated or downregulated, as in the case of p53 and ACTR respectively (3). By specifically interacting with the phosphorylated form of CREB, it also affects cAMP-responsive gene regulation. Furthermore, the P300 protein has a critical role in embryonic development and neural development. This is evident in humans in the case of p300 mutations that cause loss of function and/or reductions in copy number. These mutations cause an embryo to develop a condition called broad thumb-hallux syndrome. Phenotypic features of this syndrome include craniofacial and limb formation abnormalities and mental retardation, highlighting the importance of p300 function during morphogenesis and neural development in humans.

The massive number of genes under P300 control during these critical stages of embryonic development was revealed when Visel et al. (2009) (4) examined P300 binding sites by chromatin immunoprecipitation coupled to massively parallel sequencing (ChIP-seq) in mouse embryos at embryonic day 11.5 (E11.5). This is an especially important stage for neural development, since in the mouse embryo this is the stage where the neocortical epithelium begins to expand through increased proliferative divisions of radial glia cells (neural progenitors). Binding sites were examined in the forebrain (which includes both the neocortex and the ventral telencephalon), the limbs and the midbrain. The sample size was more than 150 (!!!) embryos per tissue. In this case P300 binding was used to predict enhancer regions, since P300 was shown to associate in vitro with enhancer regions. Enhancers are DNA regions which enhance the transcription of a gene. For the forebrain alone, 2,543 P300 binding sites were identified by ChIP-seq. Moreover, to examine the correspondence of binding sites to known genes, the authors performed tissue-specific microarrays. In the forebrain’s case, they found that for the 885 genes which are overexpressed in the forebrain, 14% of the identified P300 binding sites are within 101 kb of the promoter. Moreover, the enrichment for P300 binding sites was observed to increase with the level of overexpression of forebrain-specific genes. The authors concluded that mapping P300 binding is a very accurate way to detect enhancers.

In a relatively recent review, Williamson et al. (2011) (5) discuss enhancers and how knowing more about them might help us understand human disease. For example, quoting the authors and Noonan and McCallion (2010) (6): “almost half of single nucleotide polymorphisms (SNPs) that show statistical associations with common/complex human disease and quantitative traits in genome-wide association studies (GWAS) are within noncoding regions and gene deserts and thus, potentially involve enhancers”. For neurodevelopmental disorders such as autism and schizophrenia, gaining more insight into how early imbalances in brain structure arise through gene expression deregulation is critical. Such progress will help us understand how the disorder evolves and becomes established in the brain. This information can also, hopefully, lead us to ways to treat the disorder and perhaps prevent it from evolving.
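To make the promoter-proximity figure above more concrete, here is a minimal sketch of how one might compute the fraction of binding sites that fall within a fixed window of a promoter. This is not the pipeline used by Visel et al.; the function name, the data layout and all coordinates are made up for illustration, and only the 101 kb window is taken from the text.

```python
from bisect import bisect_left


def fraction_near_promoters(peaks, tss_by_chrom, window):
    """Fraction of peaks with a promoter (TSS) within `window` bp.

    peaks: list of (chromosome, position) tuples (hypothetical layout).
    tss_by_chrom: dict mapping chromosome -> sorted list of TSS positions.
    """
    if not peaks:
        return 0.0
    near = 0
    for chrom, pos in peaks:
        sites = tss_by_chrom.get(chrom, [])
        idx = bisect_left(sites, pos)
        # Only the TSSs on either side of the insertion point can be nearest.
        candidates = sites[max(idx - 1, 0):idx + 1]
        if any(abs(pos - tss) <= window for tss in candidates):
            near += 1
    return near / len(peaks)


# Toy coordinates, invented for this example.
toy_tss = {"chr1": [100_000, 500_000], "chr2": [250_000]}
toy_peaks = [
    ("chr1", 150_000),  # 50 kb from a TSS: counted
    ("chr1", 300_000),  # 200 kb from both TSSs: not counted
    ("chr2", 260_000),  # 10 kb from a TSS: counted
    ("chr3", 10_000),   # no annotated TSS on this chromosome: not counted
]

fraction = fraction_near_promoters(toy_peaks, toy_tss, window=101_000)
print(f"{fraction:.0%} of toy peaks lie within 101 kb of a promoter")
```

A real analysis would restrict the promoters to the genes of interest (here, the forebrain-overexpressed genes) and compare the result against a background expectation before calling it an enrichment.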

P300 binding sites in the mouse heart and (on the left) conservation in rat, human, dog and elephant, by Hardison R.C., Nature Genetics (2010)

We are still a long way from home. Nevertheless, the rapid advancement of next-generation sequencing (NGS) technology, coupled with parallel advances in the computational methods created to interpret sequencing and gene expression data, is providing significant insights and steady progress.

1. Wikipedia>p300-CBP
2. GeneCards>p300
3. Li,Q. et al.(2002), Mol.Endo., 16(12),1819-2827.
4. Visel, A., et al.(2009), Nature 457(12), 854-858.
5. Williamson et al. (2011), Dev.Cell., 21(1), 17-19.
6. Noonan, J.P., and McCallion, A.S. (2010), Annu.Rev. Genomics Hum. Genet., 11, 1–23.
7. Hardison, R.C. (2010), Nature Genetics, 42, 734–735.

Filed under Biology, Cell Cycle, Gene Expression, Microarray, Neurons, Neuroscience, Science, The Development Series

Extracting Biologically Meaningful Information from Gene Expression Data: Gene CoExpression Networks

Data generated from gene expression experiments hold an important amount of biological information (Eisen et al., 1998). The end point of any analysis of this sort is to gain a thorough view and understanding of the “inner life” of a cell, i.e. the ongoing biological processes in the cell. This can be considered a bottom-up approach, whereby we slowly build our way up from the transcript levels to the cellular process and ultimately to an understanding of the biological process in question (of course, by combining other appropriate methods in order to be able to extract causal relationships). A natural thought is that genes with similar expression patterns within a dataset may participate in common biological processes or even be under the same regulatory mechanism(s) (Tavazoie et al., 1999). Clustering genes with similar expression patterns is a useful approach for gaining this sort of information and, putatively, for extending the information about common regulatory control to infer the participation of the genes in various pathways. Paraphrasing/quoting from Eisen et al., 1998: “Statistical organization (clustering) and graphical display of a microarray dataset allows for researchers to assimilate and explore data in a biologically meaningful way. … Also, similarity in the gene expression pattern may be the easiest way to make -at least provisional- attribution of function on a genomic scale”.
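The idea of grouping genes by the similarity of their expression patterns can be sketched in a few lines. The example below is a toy average-linkage agglomerative clustering that uses Pearson correlation as the similarity measure; it is an illustration under simplifying assumptions, not a reproduction of any published pipeline. The gene names, expression values and the `min_similarity` cut-off are all invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage_clusters(profiles, min_similarity):
    """Repeatedly merge the two most similar clusters.

    profiles: dict mapping gene name -> list of expression values.
    Cluster similarity is the mean pairwise correlation of their members.
    Merging stops once the best pair falls below min_similarity.
    """
    clusters = [[gene] for gene in profiles]

    def similarity(c1, c2):
        pairs = [(g1, g2) for g1 in c1 for g2 in c2]
        total = sum(pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        if similarity(clusters[i], clusters[j]) < min_similarity:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]


# Made-up expression values across five conditions.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],
    "geneD": [9.7, 8.2, 6.1, 3.8, 2.0],
}

clusters = average_linkage_clusters(toy_profiles, min_similarity=0.8)
print(clusters)
```

On this toy input the two rising profiles end up in one cluster and the two falling profiles in another. For real datasets with thousands of genes, an optimized library implementation would be preferable to this quadratic loop.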

Along the same lines, transcription factor (TF) binding sites are critical to our understanding of transcription and transcriptional regulation. A TF binding site lies in or close to a promoter region; a TF bound there can regulate transcription either by recruiting RNA polymerase to the promoter or by blocking its docking on the DNA. The actions of TFs are transcript-specific, i.e. each TF has a range of genes whose transcription it modulates. The “physical approach” to constructing gene networks seeks to determine the TFs and the DNA motifs to which they bind to regulate transcription. Another strategy, the “influence approach” to constructing gene networks, deals with gene expression data and describes the relationships between transcript levels and how they interact to regulate each other’s transcription. The transcript interactions are described with a graph, in which the nodes represent transcripts and the edges represent a relationship between the connected transcripts, according to the graph-construction method followed. The graph can be constructed as a system of differential equations, a Bayesian network, a Boolean network or an association network. The latter approach creates a gene coexpression network by assigning edges to pairs of genes with high statistical similarity. Different similarity metrics have been used, such as Euclidean distance, Pearson correlation coefficient, mutual information (e.g. ARACNE, CLR) and partial correlation coefficient (graphical Gaussian models, GGMs). Moreover, to analyze gene expression data from time-series experiments, appropriate algorithms extract correlation relationships between transcript level changes at the different time points (Schmit et al., 2004; Arkin et al., 1997).
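As an illustration of the association-network idea described above, the following sketch builds a small coexpression graph by linking every pair of genes whose absolute Pearson correlation reaches a threshold. It is only one of many possible constructions: the choice of Pearson correlation, the use of the absolute value (so that strongly anti-correlated genes are also linked), the threshold and the toy data are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_network(profiles, threshold):
    """Return an adjacency dict: gene -> set of coexpressed genes.

    An undirected edge joins two genes when |r| >= threshold.
    """
    network = {gene: set() for gene in profiles}
    for g1, g2 in combinations(profiles, 2):
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold:
            network[g1].add(g2)
            network[g2].add(g1)
    return network


# Made-up expression values across five conditions.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 3.0],   # unrelated to the others
}

network = coexpression_network(toy_profiles, threshold=0.9)
for gene, neighbours in network.items():
    print(gene, sorted(neighbours))
```

In this toy network geneA, geneB and geneC are fully connected, while geneD remains an isolated node. Swapping `pearson` for another similarity measure (for example mutual information) changes which edges appear but leaves the graph construction unchanged.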

Genomic strategies are advancing at the speed of light and the amounts of data generated are massive. The aforementioned network approaches, borrowed from graph theory and statistics, hold the promise of revealing critical biological information where the “data mining” ability of a bench researcher stops. This is especially important for cancer research, though not only there. For example, breast cancer is the leading cause of cancer death in women. It is itself heterogeneous, both in terms of histological origin (e.g. it can develop in the ducts or lobules of the breast) and in terms of the mutational landscape of the cancer cells. The latter means that the tumor itself can be highly heterogeneous. Combining transcript-level analysis by coexpression networks with the recent advances in breast tumor whole-genome sequencing (see Gray and Druker, Nature 2012) may prove critical to our understanding of cancer initiation and evolution.

For more information on coexpression network construction, the interested reader is referred to Gardner and Faith (2005).


Tavazoie et al., Nature Genetics 1999 

Eisen et al .PNAS 1998

Gardner and Faith, Phys. Life Rev. 2005

Schmit et al., Genome Res 2004

Arkin et al., Science 1997

Gray and Druker Nature 2012

Filed under Biology, Coexpression, Gene Expression, Graph Theory, Microarray, Networks, Science, Similarity