[Frontiers in Bioscience, Landmark, 22, 1697-1712, June 1, 2017]

Pathway-based classification of breast cancer subtypes

Alex Graudenzi1,2, Claudia Cava1, Gloria Bertoli1, Bastian Fromm3, Kjersti Flatmark3,4,5, Giancarlo Mauri2,6, Isabella Castiglioni1

1Institute of Molecular Bioimaging and Physiology of the Italian National Research Council (IBFM-CNR), Milan, Italy, 2Department of Informatics, Systems and Communication, University of Milan-Bicocca, Milan, Italy, 3Department of Tumor Biology, Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway, 4 Department of Gastroenterological Surgery, Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway, 5Institute of Clinical Medicine, University of Oslo, Oslo, Norway, 6 SYSBIO Centre of Systems Biology (SYSBIO), 20126 Milan, Italy


1. Abstract
2. Introduction
3. Biological background
4. Methods
4.1. Data sources
4.2. Multiclass classification of BC subtypes
4.2.1. Enrichment of relevant pathways for feature selection
4.2.2. SVM-based OvO classifier
4.2.3. Dataset preprocessing
4.2.4. Feature selection
5. Results
5.1. Relevant pathway enrichment
5.2. Classification performance evaluation
5.2.1. Comparison with other techniques
6. Discussion
7. Acknowledgments
8. References


Cancer heterogeneity is a major hurdle in the development of effective theranostic strategies, because it prevents the design of single, maximally effective diagnostic, prognostic and therapeutic procedures, even for patients affected by the same tumor type. Computational techniques can now leverage the huge and ever-increasing amount of (epi)genomic data to tackle this problem, thereby providing biologists and pathologists with new and valuable decision-support tools in the broad sphere of precision medicine. In this context, we introduce a novel classifier of cancer subtypes based on gene expression data and apply it to two different breast cancer (BC) datasets, from the TCGA and GEO repositories. The classifier is based on Support Vector Machines (SVMs) and uses information on the pathways relevant to breast cancer development to reduce the huge variable space. Among the main results, we show that the classifier maintains excellent accuracy even when the variable space is reduced 20-fold, which makes it a valuable tool for cancer patient profiling even when experimental resources are limited.
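The abstract names the two ingredients of the classifier only at a high level: a variable space restricted to genes in BC-relevant pathways, and SVMs combined in a one-versus-one (OvO) scheme (Section 4.2.2). The Python sketch below illustrates that combination under explicit assumptions; it is not the authors' implementation. The pathway dictionary format, all function names, the toy expression values and the parameters `lam`, `epochs` and `seed` are hypothetical, and a Pegasos-style stochastic sub-gradient solver for a linear SVM stands in for whatever SVM library and kernel the study actually used. Pairwise decisions are combined by simple majority voting, the usual OvO rule, which is likewise assumed here.

```python
import random
from collections import Counter
from itertools import combinations


def pathway_feature_indices(genes, pathways, relevant):
    """Column indices of genes that belong to at least one relevant pathway.

    `pathways` maps a (hypothetical) pathway name to its member gene symbols.
    """
    keep = set()
    for name in relevant:
        keep.update(pathways.get(name, ()))
    return [i for i, g in enumerate(genes) if g in keep]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fitted with Pegasos-style SGD.

    Labels in `y` must be +1 / -1. A constant 1.0 is appended to each sample,
    so the bias is learned (and regularised) like any other weight.
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 penalty step
            if margin < 1.0:  # hinge loss active: step towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def decision(w, x):
    """Signed score of sample x under weights w (last weight is the bias)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_ovo(X, labels, **svm_kwargs):
    """One-vs-one: train one binary SVM for every unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pairs = [(x, l) for x, l in zip(X, labels) if l in (a, b)]
        Xp = [x for x, _ in pairs]
        yp = [1 if l == a else -1 for _, l in pairs]
        models[(a, b)] = train_linear_svm(Xp, yp, **svm_kwargs)
    return models


def predict_ovo(models, x):
    """Each pairwise SVM votes for one of its two classes; majority wins."""
    votes = Counter(a if decision(w, x) >= 0 else b for (a, b), w in models.items())
    return votes.most_common(1)[0][0]


# Toy data (made up): 4 genes, only G1 and G2 lie in the "relevant" pathway.
genes = ["G1", "G2", "G3", "G4"]
pathways = {"P_relevant": ["G1", "G2"], "P_other": ["G3"]}
raw = [
    [3.0, 0.1, 5.0, -2.0], [3.2, -0.2, -4.0, 1.0],    # toy subtype A
    [0.1, 3.1, 2.0, 0.5], [-0.2, 2.9, -1.0, 3.0],     # toy subtype B
    [-3.0, -3.1, 0.0, -1.0], [-2.8, -2.9, 4.0, 2.0],  # toy subtype C
]
labels = ["A", "A", "B", "B", "C", "C"]

cols = pathway_feature_indices(genes, pathways, ["P_relevant"])
X = [[row[i] for i in cols] for row in raw]  # reduced variable space
models = train_ovo(X, labels)
print(cols, predict_ovo(models, [3.1, 0.0]))
```

Here the pathway filter halves each toy sample before any SVM is trained; on real expression profiles the same step is what shrinks the variable space. Folding the bias into the weight vector keeps the solver short, at the cost of also regularizing the bias.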


1. M Gerlinger, AJ Rowan, S Horswell, J Larkin, D Endesfelder, E Gronroos, P Martinez, N Matthews, A Stewart, P Tarpey, I Varela, B Phillimore, S Begum, NQ McDonald, A Butler, D Jones, K Raine, C Latimer, CR Santos, M Nohadani, AC Eklund, B Spencer-Dene, G Clark, L Pickering, G Stamp, M Gore, Z Szallasi, J Downward, PA Futreal, C Swanton: Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med, 366(10):883-92 (2012)

2. R Fisher, L Pusztai, C Swanton: Cancer heterogeneity: implications for targeted therapeutics. Br J Cancer, 108(3):479-85 (2013)

3. RA Burrell, C Swanton: Tumour heterogeneity and the evolution of polyclonal drug resistance. Mol Oncol, 8(6):1095-111 (2014)

4. R Mirnezami, J Nicholson, A Darzi: Preparing for precision medicine. N Engl J Med, 366(6):489-91 (2012)

5. National Cancer Institute; National Human Genome Research Institute (2015) The Cancer Genome Atlas (Natl Inst Health, Bethesda). Available at https://tcga-data.nci.nih.gov/tcga. Accessed Sept 30, 2016.

6. G Caravagna, A Graudenzi, D Ramazzotti, R Sanz-Pamplona, L De Sano, G Mauri, V Moreno, M Antoniotti, B Mishra: Algorithmic methods to infer the evolutionary trajectories in cancer progression. Proc Natl Acad Sci U S A, 113(28):E4025-34 (2016)

7. C Cava, G Bertoli, I Castiglioni: Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential. BMC Syst Biol, 9:62 (2015)

8. A Colaprico, TC Silva, C Olsen, L Garofano, C Cava, D Garolini, TS Sabedot, TM Malta, SM Pagnotta, I Castiglioni, M Ceccarelli, G Bontempi, H Noushmehr: TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res, 44(8):e71 (2016)

9. C Cava, I Zoppis, M Gariboldi, I Castiglioni, G Mauri, M Antoniotti: Copy-Number Alterations for Tumor Progression Inference. Lecture Notes in Computer Science, 7885:104-109 (2013)

10. T Sorlie, CM Perou, R Tibshirani, T Aas, S Geisler, H Johnsen, T Hastie, MB Eisen, M van de Rijn, SS Jeffrey, T Thorsen, H Quist, JC Matese, PO Brown, D Botstein, PE Lonning, AL Borresen-Dale: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A, 98(19):10869-10874 (2001)

11. J Khan, JS Wei, M Ringner, LH Saal, M Ladanyi, F Westermann, F Berthold, M Schwab, CR Antonescu, C Peterson, PS Meltzer: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med, 7(6):673-679 (2001)

12. D Singh, PG Febbo, K Ross, DG Jackson, J Manola, C Ladd, P Tamayo, AA Renshaw, AV D’Amico, JP Richie, ES Lander, M Loda, PW Kantoff, TR Golub, WR Sellers: Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203-209 (2002)

13. LJ van ’t Veer, H Dai, MJ van de Vijver, YD He, AA Hart, M Mao, HL Peterse, K van der Kooy, MJ Marton, AT Witteveen, GJ Schreiber, RM Kerkhoven, C Roberts, PS Linsley, R Bernards, SH Friend: Gene expression profiling predicts clinical outcome of breast cancer. Nature, 415(6871):530-536 (2002)

14. C Sotiriou, P Wirapati, S Loi, A Harris, S Fox, J Smeds, H Nordgren, P Farmer, V Praz, B Haibe-Kains, C Desmedt, D Larsimont, F Cardoso, H Peterse, D Nuyten, M Buyse, MJ Van de Vijver, J Bergh, M Piccart, M Delorenzi: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst, 98(4):262-272 (2006)

15. V Popovici, W Chen, BG Gallas, C Hatzis, W Shi, FW Samuelson, Y Nikolsky, M Tsyganova, A Ishkin, T Nikolskaya, KR Hess, V Valero, D Booser, M Delorenzi, GN Hortobagyi, L Shi, WF Symmans, L Pusztai: Effect of training-sample size and classification difficulty on the accuracy of genomic predictors. Breast Cancer Res, 12(1):R5 (2010)

16. AV Ivshina, J George, O Senko, B Mow, TC Putti, J Smeds, T Lindahl, Y Pawitan, P Hall, H Nordgren, JE Wong, ET Liu, J Bergh, VA Kuznetsov, LD Miller: Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res, 66(21):10292-301 (2006)

17. C Sotiriou, P Wirapati, S Loi, A Harris, S Fox, J Smeds, H Nordgren, P Farmer, V Praz, B Haibe-Kains, C Desmedt, D Larsimont, F Cardoso, H Peterse, D Nuyten, M Buyse, MJ Van de Vijver, J Bergh, M Piccart, M Delorenzi: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst, 98(4):262-72 (2006)

18. C Cava, G Bertoli, M Ripamonti, G Mauri, I Zoppis, PA Della Rosa, MC Gilardi, I Castiglioni: Integration of mRNA expression profile, copy number alterations, and microRNA expression levels in breast cancer to improve grade definition. PLoS One, 9(5):e97681 (2014)

19. PC Miller, J Clarke, T Koru-Sengul, J Brinkman, D El-Ashry: A novel MAPK-microRNA signature is predictive of hormone-therapy resistance and poor outcome in ER-positive breast cancer. Clin Cancer Res, 21(2):373-85 (2015)

20. TM Severson, J Peeters, I Majewski, M Michaut, A Bosma, PC Schouten, SF Chin, B Pereira, MA Goldgraben, T Bismeijer, RJ Kluin, JJ Muris, K Jirström, RM Kerkhoven, L Wessels, C Caldas, R Bernards, IM Simon, S Linn: BRCA1-like signature in triple negative breast cancer: Molecular and clinical characterization reveals subgroups with therapeutic potential. Mol Oncol, 9(8):1528-38 (2015)

21. SG Zhao, M Shilkrut, C Speers, M Liu, K Wilder-Romans, TS Lawrence, LJ Pierce, FY Feng: Development and validation of a novel platform-independent metastasis signature in human breast cancer. PLoS One, 10(5):e0126631 (2015)

22. C Cava, I Zoppis, G Mauri, M Ripamonti, F Gallivanone, C Salvatore, MC Gilardi, I Castiglioni: Combination of gene expression and genome copy number alteration has a prognostic value for breast cancer. Conf Proc IEEE Eng Med Biol Soc, 2013:608-11 (2013)

23. VD Haakensen, V Nygaard, L Greger, MR Aure, B Fromm, IR Bukholm, T Lüders, SF Chin, A Git, C Caldas, VN Kristensen, A Brazma, AL Børresen-Dale, E Hovig, Å Helland: Subtype-specific micro-RNA expression signatures in breast cancer progression. Int J Cancer, 139(5):1117-28 (2016)

24. A Colaprico, C Cava, G Bertoli, G Bontempi, I Castiglioni: Integrative Analysis with Monte Carlo Cross-Validation Reveals miRNAs Regulating Pathways Cross-Talk in Aggressive Breast Cancer. Biomed Res Int, 2015:831314 (2015)

25. J Tomfohr, J Lu, TB Kepler: Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics, 6:225 (2005)

26. F Rapaport, A Zinovyev, M Dutreix, E Barillot, JP Vert: Classification of microarray data using gene networks. BMC Bioinformatics, 8:35 (2007)

27. J Su, BJ Yoon, ER Dougherty: Accurate and reliable cancer classification based on probabilistic inference of pathway activity. PLoS One, 4(12):e8161 (2009)

28. E Lee, HY Chuang, JW Kim, T Ideker, D Lee: Inferring pathway activity toward precise disease classification. PLoS Comput Biol, 4(11):e1000217 (2008)

29. L Yang, C Ainali, S Tsoka, LG Papageorgiou: Pathway activity inference for multiclass disease classification through a mathematical programming optimisation framework. BMC Bioinformatics, 15:390 (2014)

30. A Zhavoronkov, AA Buzdin, AV Garazha, NM Borisov, AA Moskalev: Signaling pathway cloud regulation for in silico screening and ranking of the potential geroprotective drugs. Front Genet, 5:49 (2014)

31. E Senkus, S Kyriakides, F Penault-Llorca, P Poortmans, A Thompson, S Zackrisson, F Cardoso; ESMO Guidelines Working Group: Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol, 24 Suppl 6:vi7-23 (2013)

32. Cancer Genome Atlas Network: Comprehensive molecular portraits of human breast tumours. Nature, 490(7418):61-70 (2012)

33. R Edgar, M Domrachev, AE Lash: Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res, 30(1):207-10 (2002)

34. CM Perou, T Sørlie, MB Eisen, M van de Rijn, SS Jeffrey, CA Rees, JR Pollack, DT Ross, H Johnsen, LA Akslen, O Fluge, A Pergamenschikov, C Williams, SX Zhu, PE Lønning, AL Børresen-Dale, PO Brown, D Botstein: Molecular portraits of human breast tumours. Nature, 406(6797):747-52 (2000)

35. T Sorlie, R Tibshirani, J Parker, T Hastie, JS Marron, A Nobel, S Deng, H Johnsen, R Pesich, S Geisler, J Demeter, CM Perou, PE Lønning, PO Brown, AL Børresen-Dale, D Botstein: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A, 100(14):8418-23 (2003)

36. JS Parker, M Mullins, MC Cheang, S Leung, D Voduc, T Vickery, S Davies, C Fauron, X He, Z Hu, JF Quackenbush, IJ Stijleman, J Palazzo, JS Marron, AB Nobel, E Mardis, TO Nielsen, MJ Ellis, CM Perou, PS Bernard: Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol, 27(8):1160-7 (2009)

37. SG Wu, ZY He, Q Li, FY Li, Q Lin, HX Lin, XX Guan: Predictive value of breast cancer molecular subtypes in Chinese patients with four or more positive nodes after postmastectomy radiotherapy. Breast, 21(5):657-61 (2012)

38. M Kyndi, FB Sørensen, H Knudsen, M Overgaard, HM Nielsen, J Overgaard, Danish Breast Cancer Cooperative Group: Estrogen receptor, progesterone receptor, HER-2, and response to postmastectomy radiotherapy in high-risk breast cancer: the Danish Breast Cancer Cooperative Group. J Clin Oncol, 26(9):1419-26 (2008)

39. KD Voduc, MC Cheang, S Tyldesley, K Gelmon, TO Nielsen, H Kennecke: Breast cancer subtypes and the risk of local and regional relapse. J Clin Oncol, 28(10):1684-91 (2010)

40. TO Nielsen, FD Hsu, K Jensen, M Cheang, G Karaca, Z Hu, T Hernandez-Boussard, C Livasy, D Cowan, L Dressler, LA Akslen, J Ragaz, AM Gown, CB Gilks, M van de Rijn, CM Perou: Immunohistochemical and clinical characterization of the basal-like subtype of invasive breast carcinoma. Clin Cancer Res, 10(16):5367-74 (2004)

41. MC Cheang, SK Chia, D Voduc, D Gao, S Leung, J Snider, M Watson, S Davies, PS Bernard, JS Parker, CM Perou, MJ Ellis, TO Nielsen: Ki67 index, HER2 status, and prognosis of patients with luminal B breast cancer. J Natl Cancer Inst, 101(10):736-50 (2009)

42. Y Benjamini, Y Hochberg: Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1):289-300 (1995)

43. A Fabregat, K Sidiropoulos, P Garapati, M Gillespie, K Hausmann, R Haw, B Jassal, S Jupe, F Korninger, S McKay, L Matthews, B May, M Milacic, K Rothfels, V Shamovsky, M Webber, J Weiser, M Williams, G Wu, L Stein, H Hermjakob, P D'Eustachio: The Reactome pathway Knowledgebase. Nucleic Acids Res, 44(D1):D481-7 (2016)

44. D Nishimura: BioCarta. Biotech Software & Internet Report, 2(3):117-120 (2001)

45. M Kanehisa, S Goto: KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res, 28(1):27-30 (2000)

46. V Vapnik: The nature of statistical learning theory. Springer Science & Business Media (2013)

47. TS Furey, N Cristianini, N Duffy, DW Bednarski, M Schummer, D Haussler: Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics, 16(10):906-14 (2000)

48. XX Niu, CY Suen: A novel hybrid CNN–SVM classifier for recognizing handwritten digits. Pattern Recognition, 45(4):1318-1325 (2012)

49. A Falanga, MN Levine, R Consonni, G Gritti, F Delaini, E Oldani, JA Julian, T Barbui: The effect of very-low-dose warfarin on markers of hypercoagulation in metastatic breast cancer: results from a randomized trial. Thromb Haemost, 79(1):23-7 (1998)

50. P Marcato, CA Dean, CA Giacomantonio, PW Lee: Aldehyde dehydrogenase: its role as a cancer stem cell marker comes down to the specific isoform. Cell Cycle, 10(9):1378-84 (2011)

51. W Shih, S Yamada: N-cadherin-mediated cell-cell adhesion promotes cell migration in a three-dimensional matrix. J Cell Sci, 125(Pt 15):3661-70 (2012)

52. R Lamb, S Lehn, L Rogerson, RB Clarke, G Landberg: Cell cycle regulators cyclin D1 and CDK4/6 have estrogen receptor-dependent divergent functions in breast cancer migration and stem cell-like activity. Cell Cycle, 12(15):2384-94 (2013)

53. A Nagarajan, P Malvi, N Wajapeyee: Oncogene-Directed Alterations in Cancer Cell Metabolism. Trends Cancer, 2(7):365-377 (2016)

54. F Di Virgilio: Purines, purinergic receptors, and cancer. Cancer Res, 72(21):5441-7 (2012)

55. P Mehlen, C Delloye-Bourgeois, A Chédotal: Novel roles for Slits and netrins: axon guidance cues as anticancer targets? Nat Rev Cancer, 11(3):188-97 (2011)

56. S Ramaswamy, P Tamayo, R Rifkin, S Mukherjee, CH Yeang, M Angelo, C Ladd, M Reich, E Latulippe, JP Mesirov, T Poggio, W Gerald, M Loda, ES Lander, TR Golub: Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci U S A, 98(26):15149-54 (2001)

57. JC Ang, H Haron, HNA Hamed: Semi-supervised SVM-based feature selection for cancer classification using microarray gene expression data. In: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Springer International Publishing, 468-477 (2015)

58. Z Cai, D Xu, Q Zhang, J Zhang, SM Ngai, J Shao: Classification of lung cancer using ensemble-based feature selection and machine learning methods. Mol Biosyst, 11(3):791-800 (2015)

59. N Bandyopadhyay, T Kahveci, S Goodison, Y Sun, S Ranka: Pathway-Based Feature Selection Algorithm for Cancer Microarray Data. Adv Bioinformatics, 2009:532989 (2009)

60. W Engchuan, JH Chan: Pathway activity transformation for multi-class classification of lung cancer datasets. Neurocomputing, 165:81-89 (2015)

61. W Liu, X Bai, Y Liu, W Wang, J Han, Q Wang, Y Xu, C Zhang, S Zhang, X Li, Z Ren, J Zhang, C Li: Topologically inferring pathway activity toward precise cancer classification via integrating genomic and metabolomic data: prostate cancer as a case. Sci Rep, 5:13192 (2015)

62. H Wang, H Zhang, Z Dai, MS Chen, Z Yuan: TSG: a new algorithm for binary and multi-class cancer classification and informative genes selection. BMC Med Genomics, 6 Suppl 1:S3 (2013)

63. HS Eo, JY Heo, Y Choi, Y Hwang, HS Choi: A pathway-based classification of breast cancer integrating data on differentially expressed genes, copy number variations and microRNA target genes. Mol Cells, 34(4):393-8 (2012)

64. S Kim, M Kon, C DeLisi: Pathway-based classification of cancer subtypes. Biol Direct, 7:21 (2012)

65. M List, AC Hauschild, Q Tan, TA Kruse, J Mollenhauer, J Baumbach, R Batra: Classification of breast cancer subtypes by combining gene expression and DNA methylation data. J Integr Bioinform, 11(2):236 (2014)


1 Note that the complex interplay among the pathways that drive cancer development is sometimes referred to as the "pathway cloud" (30).

2 Website: https://gdc-portal.nci.nih.gov/projects/TCGA-BRCA.

3 Website: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE58212.

4 Because different classifiers may yield different sample-class assignments, we do not report the classification results of each individual classifier; instead, we evaluate the performance of the method using the average values of accuracy, precision and recall.
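Footnote 4 summarizes performance as averages of accuracy, precision and recall over several classifiers. The Python sketch below shows one plausible way to compute such averages. It assumes, as the text does not specify, that precision and recall are macro-averaged over the subtypes within each classifier and that the three scores are then averaged with equal weight across classifiers; the function names and toy labels are hypothetical.

```python
from statistics import mean


def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision and recall for one classifier."""
    classes = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precs, recs = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        pred_c = sum(p == c for p in y_pred)  # samples assigned to class c
        true_c = sum(t == c for t in y_true)  # samples actually in class c
        precs.append(tp / pred_c if pred_c else 0.0)
        recs.append(tp / true_c if true_c else 0.0)
    return acc, mean(precs), mean(recs)


def average_over_classifiers(runs):
    """`runs` is a list of (y_true, y_pred) pairs, one per classifier."""
    scores = [macro_scores(t, p) for t, p in runs]
    return tuple(mean(s[k] for s in scores) for k in range(3))


# Toy example: two classifiers evaluated on the same four samples.
runs = [
    (["A", "A", "B", "B"], ["A", "A", "B", "B"]),
    (["A", "A", "B", "B"], ["A", "B", "B", "B"]),
]
print(average_over_classifiers(runs))
```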

Key Words: Cancer Subtype Classification, Breast Cancer, BC, Pathway Enrichment, Differentially Expressed Genes, DEG, Review

Send correspondence to: Alex Graudenzi, Institute of Molecular Bioimaging and Physiology of the Italian National Research Council (IBFM-CNR), Milan, Italy, Tel: +39 02 21717552, Fax: +39 02 21717558, E-mail: alex.graudenzi@unimib.it