- Title
- A novel feature selection approach for data integration analysis: applications to transcriptomics study
- Creator
- Puthiyedth, Nisha
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2016
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
- Meta-analysis has become a popular method for identifying novel biomarkers in the field of medical research. Meta-analysis has been widely applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. Joint analysis of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers reported in smaller studies. The approach generally followed relies on the fact that as the total number of samples increases, greater power to detect associations of interest is anticipated. Integrating available information from different datasets to generate a combined result seems reasonable and promising. Consequently, there is a need for computationally based integration methods that evaluate multiple independent datasets investigating a common theme or disorder. This raises a variety of issues in the analysis of such data and leads to more complications than are seen with standard meta-analysis, including diverse experimental platforms and complex data structures. I illustrate these ideas using microarray datasets from multiple studies and propose an integrative methodology to combine datasets generated using different platforms. Having combined the data, the main challenge is to choose a subset of features that represent the combined dataset in a particular aspect. While the approach is well established in biostatistics, the introduction of new combinatorial optimisation models to address this issue has not been explored in depth. In 2004, a new feature selection approach based on a combinatorial optimisation method was proposed, entitled the (α,β)-k Feature Set problem approach. The main advantage of this approach over ranking methods for selecting individual features is that the features are evaluated as groups instead of on the basis of their individual performance. 
The (α,β)-k Feature Set problem approach was originally defined with a single uniform dataset in mind and, conceived in this way, is not readily applicable to integrated datasets. An extended version of this approach handles integrated datasets in a consistent manner and selects features that differentiate sample pairs across datasets. The application of an (α,β)-k Feature Set problem-based approach for meta-analysis thus helps to identify the best set of features from a combined dataset, allowing researchers to reveal the genetic pathways that contribute to the development of a disease. I propose an extended version of the (α,β)-k Feature Set problem approach that aims to find a set of genes whose expression levels may be used to identify a joint core subset of genes that putatively play an important role in two conditions: prostate cancer and Alzheimer's disease. The results of the current study suggest that the proposed method is an efficient meta-analysis method that is capable of identifying biologically relevant genes that other methods fail to identify. As the amount of data increases, this novel method can be applied to find additional genes and pathways that are significant in these diseases, which may provide new insights into the disease mechanism and contribute towards understanding, prevention and cures.
- Subject
- meta-analysis; biomarkers; combinatorial optimisation; prostate cancer; Alzheimer's disease
- Identifier
- http://hdl.handle.net/1959.13/1322449
- Identifier
- uon:24585
- Rights
- Copyright 2016 Nisha Puthiyedth
- Language
- eng
- Full Text
File | Description | Size | Format
---|---|---|---
ATTACHMENT01 | Thesis | 14 MB | Adobe Acrobat PDF
ATTACHMENT02 | Abstract | 256 KB | Adobe Acrobat PDF