Identifying Differentially Expressed Genes Involved in Signal Transduction Pathways in a Breast Cancer Dataset Using K-Means Clustering and KEGG Pathway Analysis

B. Yaagasree, Dr. R. Porkodi


Data mining is one of the most dynamic and motivating areas of research, with the objective of finding meaningful information in huge data sets. The proposed research work uses the K-Means clustering data mining technique to analyze the differentially expressed genes in a breast cancer microarray dataset covering Benign Breast Abnormalities (BBA), Ectopic Cancer (EC) and Malignant Breast Cancer (MBC). KEGG pathways are then analyzed to identify which genes in the clusters of each disease category are involved in seven signal transduction pathways.
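The abstract does not give implementation details, so what follows is only a minimal sketch of the two steps it names, under stated assumptions. Each gene is represented by a vector of its expression values. K-means is the standard Lloyd iteration with squared Euclidean distance and random initial centroids, seeded for repeatability. "KEGG pathway analysis" is reduced to intersecting each cluster's genes with pathway gene sets. The functions `kmeans` and `pathway_hits`, the gene names and the pathway sets are hypothetical placeholders, not the authors' code and not real KEGG data.

```python
import random
from typing import Dict, List, Sequence, Set


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point (gene)."""
    rng = random.Random(seed)
    # Assumption: centroids start at k randomly chosen input points.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no gene changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pathway_hits(genes_by_cluster: Dict[int, Set[str]],
                 pathways: Dict[str, Set[str]]) -> Dict[int, Dict[str, Set[str]]]:
    """For each cluster, the genes it shares with each overlapping pathway gene set."""
    return {
        c: {name: genes & members
            for name, members in pathways.items() if genes & members}
        for c, genes in genes_by_cluster.items()
    }


if __name__ == "__main__":
    # Hypothetical toy data: gene -> expression values in three samples.
    expression = {
        "GENE_A": [1.0, 1.1, 0.9],
        "GENE_B": [1.2, 1.0, 1.1],
        "GENE_C": [5.0, 5.2, 4.9],
        "GENE_D": [5.1, 4.8, 5.0],
    }
    # Placeholder pathway gene sets (not real KEGG memberships).
    pathways = {"pathway_X": {"GENE_A", "GENE_C"}, "pathway_Y": {"GENE_D"}}

    genes = list(expression)
    labels = kmeans([expression[g] for g in genes], k=2)
    genes_by_cluster: Dict[int, Set[str]] = {}
    for gene, lab in zip(genes, labels):
        genes_by_cluster.setdefault(lab, set()).add(gene)
    for cluster, hits in pathway_hits(genes_by_cluster, pathways).items():
        print(cluster, sorted(genes_by_cluster[cluster]), hits)
```

In a real analysis the pathway gene sets would be taken from KEGG for the seven signal transduction pathways of interest, and k would be chosen by the analyst. Because k-means depends on its starting centroids, it is common to compare results across several seeds.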





