Regulon Enrichment Analysis using GSEA
#Written by Hanhae Kim. Contact kimhanhae@gmail.com for more information

1. A Regulon
We define a Regulon as a gene with more than 15 neighbors (linkages) in the functional gene network. The Regulon size still needs discussion with Dr. Insuk Lee and may be adjusted. Normally, thousands of Regulons are generated under the 15-neighbor cutoff.
- If you use any other 'usual' gene sets, the analysis is ordinary gene set analysis. In Regulon Enrichment Analysis (REA), the Regulons are the gene sets.

2. Make file format
To use Gene Set Enrichment Analysis (GSEA, www.broadinstitute.org/gsea ), you need three kinds of data files. Simply put:
1) GCT file: expression data
2) GMT file: gene sets (in this case, Regulons)
3) CLS file: description of the GCT file, such as conditions and number of arrays
For more details, follow this link and prepare the files in these formats:
http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats

3. Run GSEA
> Important: Because the permutations are random, the output varies from run to run. This is a property of gene set analysis in general. Therefore, run GSEA several times and find the optimal results. The top-ranked Regulons should stay consistent across trials, though.

GSEA comes in two versions: an R version and a GUI version. Both use the same file formats described in '2. Make file format'. You need to know the input parameters to use the R version. I'm not familiar with the parameters of the R version, so I recommend the GUI version. One inconvenience of the GUI version is that Java must be installed on your machine.

The GSEA parameters are important. For example, running GSEA requires permutation, and there are two ways to permute: by Phenotype (array samples) or by Gene Set (here, Regulons). The default permutation type in the R version is 'Phenotype'.
Therefore, if your expression data does not have enough array samples, GSEA will not run on the R version unless you switch to Gene Set permutation. I'm sure there is a way to modify the parameters, but I'm not clear on how to do it in the R version.

1) R version (command-line version)
The R version of GSEA is available on Orion: netbio/R.GSEA/
It includes an example, GDS1012. The sub directory /GDS1012.C0085786/ contains example output files from a run of the R version of GSEA. You are welcome to test the R version with this sample. Before you test, please read the 'README' carefully. (Dr. Sohyun Hwang wrote the README. Thanks to her!)
You can also check what the file formats look like by opening GDS1012.C0085786.gct, DOLite.gmt and GDS1012.C0085786.cls in the sub directory /GDS1012.C0085786/.

2) GUI version (Graphical User Interface)
The GUI version provides a very intuitive interface, so you can easily check and modify many parameters. You can run GSEA even without enough array samples by using Gene Set (Regulon) permutation. I personally prefer Gene Set permutation to phenotype permutation because we can get thousands of Regulons from our functional gene network. It may be more robust than phenotype permutation. (But I'm not sure.) To make sure for your project, read this paper about permutation methods. (Thanks to Dr. Sohyun Hwang)
http://www.ncbi.nlm.nih.gov/pubmed/?term=discovering%20statistically%20significant%20pathways%20in%20expression%20profiling%20studies

>How to run
(1) Load data: load the gct, gmt and cls files.
(2) Run GSEA:
- Required fields
    Expression dataset: the input gct file will show up.
    Gene sets database: the gmt file; click 'Gene matrix (local gmx/gmt)' to make your input gmt file show up.
    Number of permutations: fixed at 1000.
    Phenotype labels: the phenotype labels will show up according to the information in your cls file.
    Collapse dataset to gene symbols: default 'true'; some datasets need 'false'. GSEA supports the array chip platforms of major organisms. For those, you can keep 'true' and select the platform under the Chip platform(s) option. However, if your array platform is not supported, choose 'false' to avoid trouble when running.
    Permutation type: default 'phenotype'; you can choose gene_set instead.
    Chip platform(s): if the fifth option, Collapse dataset to gene symbols, is 'true', you have to select the chip platform on which your array samples were run.
- Basic fields and Advanced fields: you can also modify or adjust these parameters depending on your input data types and project.
(3) Click Run: you can launch multiple runs with different options, as many as your CPU can handle.
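
Example: making the GMT and CLS files
A minimal Python sketch of how the Regulon GMT file and the CLS file from '2. Make file format' could be produced. This is only an illustration, not the script used in the lab. Several things are assumptions: the network is given as a list of gene pairs, each Regulon's gene set holds the hub gene's neighbors (without the hub itself), and the names build_regulons, to_gmt_lines, to_cls_text, the 'REGULON_' prefix and the toy gene names are all hypothetical. The GMT layout (name, description, then genes, tab-separated) and the three-line CLS layout follow the Broad data formats page.

```python
from collections import defaultdict

MIN_NEIGHBORS = 15  # a Regulon needs MORE than this many neighbors


def build_regulons(edges, min_neighbors=MIN_NEIGHBORS):
    """Map each hub gene with > min_neighbors links to its sorted neighbor list.

    Assumption: the gene set of a Regulon is the hub's neighbors only.
    """
    neighbors = defaultdict(set)
    for gene_a, gene_b in edges:
        if gene_a == gene_b:
            continue  # ignore self-links
        neighbors[gene_a].add(gene_b)
        neighbors[gene_b].add(gene_a)
    return {
        hub: sorted(linked)
        for hub, linked in neighbors.items()
        if len(linked) > min_neighbors
    }


def to_gmt_lines(regulons):
    """One GMT line per Regulon: name, description, then member genes."""
    lines = []
    for hub in sorted(regulons):
        # "REGULON_" prefix and "na" description are placeholders.
        fields = ["REGULON_" + hub, "na"] + regulons[hub]
        lines.append("\t".join(fields))
    return lines


def to_cls_text(sample_classes):
    """Categorical CLS text: counts line, class-name line, per-sample labels."""
    class_names = list(dict.fromkeys(sample_classes))  # keep first-seen order
    header = f"{len(sample_classes)} {len(class_names)} 1"
    names_line = "# " + " ".join(class_names)
    labels_line = " ".join(sample_classes)
    return "\n".join([header, names_line, labels_line]) + "\n"


# Toy network: HUB1 has 16 neighbors (kept), HUB2 has exactly 15 (dropped).
toy_edges = [("HUB1", f"G{i}") for i in range(16)]
toy_edges += [("HUB2", f"G{i}") for i in range(15)]

toy_regulons = build_regulons(toy_edges)
gmt_lines = to_gmt_lines(toy_regulons)
cls_text = to_cls_text(["control", "control", "treated", "treated"])

print(gmt_lines[0])
print(cls_text, end="")
```

To use it for real, write gmt_lines to a .gmt file (one line each) and cls_text to a .cls file, then load them in GSEA as described in 'How to run'. The sample labels in the CLS file must be in the same order as the sample columns of the GCT file.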