PhD thesis defense to be held on November 25, 2022, at 14:00 (Multimedia room, Central Library Building, NTUA)

Picture Credit: Michail Sarafidis

Thesis title: Application of bioinformatics techniques and machine learning algorithms for the identification of diagnostic, prognostic, and predictive biomarkers for urinary bladder cancer

Abstract: This PhD thesis applies bioinformatics techniques and machine learning algorithms to identify diagnostic, prognostic, and predictive (in terms of response to treatment) biomarkers for bladder cancer. Bladder cancer is a heterogeneous disease with high incidence and prevalence worldwide, and it is responsible for significant morbidity and mortality. In this study, a systematic review was performed, and all gene expression data from DNA microarrays registered in the Gene Expression Omnibus database of the National Center for Biotechnology Information (NCBI) were collected in order to study and compare healthy and cancerous tissues for this disease. The systematic review identified 18 datasets that fulfilled the inclusion criteria and were included in the integrative meta-analysis. For these datasets, the raw data were obtained, pre-processed according to the microarray platform and, after quality control and normalization, integrated into a merged meta-dataset. This merged meta-dataset was used to determine the differentially expressed genes between healthy and cancer samples. Protein-protein interaction network analysis was then performed and the hub genes were detected. Furthermore, weighted gene co-expression network analysis, an unsupervised technique, was applied, and the node genes that showed a high correlation with the phenotype of the samples were detected. The hub genes common to both methods were then identified; these are the key hub genes of the present study. These genes were first studied for their differential expression in urine and blood plasma samples from bladder cancer patients and healthy controls. Subsequently, the predictive value of these genes was analyzed using univariate, multivariate, and LASSO regression analysis.
Kaplan-Meier survival curves and receiver operating characteristic (ROC) analysis were also used to identify genes with prognostic value, and a prognostic model was constructed based on the expression of three genes. This model was tested on two independent datasets, where it showed high performance. Using the same methods, the ability of these genes to predict the response of patients with invasive bladder cancer to pre-operative chemotherapy was also analyzed. A prediction model based on the expression of six genes was thus created and tested on two independent datasets, where it showed good performance. From these analyses, a set of nine biomarker genes was identified. These genes were found to be differentially expressed in urine or blood plasma between bladder cancer patients and healthy controls and, at the same time, appeared to possess some predictive or prognostic ability. The expression of these biomarkers in bladder tissue of patients and healthy controls was confirmed using immunohistochemistry images and public bioinformatics platforms. Finally, these nine biomarkers were used as features in classification models, which showed particularly high performance in discriminating cancerous from healthy samples, highlighting the diagnostic value of these biomarkers.

Supervisor: Professor Emeritus Dimitrios-Dionysios Koutsouris

PhD Student: Michail Sarafidis