- Title
- Genetic algorithm-based ensemble methods for large-scale biological data classification
- Creator
- Haque, Mohammad Nazmul
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2017
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
- We study the search for the best ensemble combinations from a wide variety of heterogeneous base classifiers. The number of possible ensembles grows exponentially with the size of the base classifier pool, which makes exhaustive search impractical. Hence, we employ a genetic algorithm to find the best ensemble combinations from a pool of heterogeneous base classifiers. The classification decisions of the base classifiers are combined using the popular majority vote approach. We use random sub-sampling to balance the class distributions in class-imbalanced datasets. In empirical results on benchmark and real-world datasets, the proposed method outperformed the base classifiers and other state-of-the-art ensemble methods. Next, we evaluate the ensemble combination search under a weighted voting approach, using the differential evolution (DE) algorithm to determine whether weights can improve the generalisation performance of the ensembles. The ensembles with DE-optimised weights also outperformed both the base classifiers and other ensembles on benchmark and real-world biological datasets. Finally, we extend the majority voting-based ensemble combination search to a multi-objective setting. The search space covers all possible ensemble combinations of 29 heterogeneous base classifiers together with the selection of a feature subset from six feature selection methods, used as a wrapper approach. Two objectives, maximisation of the training MCC score and maximisation of the diversity among base classifiers, are optimised with NSGA-II, a popular multi-objective genetic algorithm, to find the best feature set and ensemble combination simultaneously. We analyse the Pareto front of solutions obtained by NSGA-II for their generalisation performance. Datasets from the UCI Machine Learning Repository and the NIPS 2003 feature selection challenge are used to investigate the performance of the proposed method. The experimental outcomes suggest that the proposed multi-objective NSGA-II approach finds a better feature set and ensemble combination, yielding better generalisation performance than other ensemble-of-classifiers methods.
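The first stage described in the abstract (a genetic algorithm searching over subsets of base classifiers, scored by the quality of their majority vote) can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes binary 0/1 labels, precomputed predictions from each base classifier, MCC as the single fitness value, and commonly used GA operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism). The function names `majority_vote`, `mcc`, `fitness` and `ga_select`, and all parameter defaults, are hypothetical.

```python
import math
import random


def majority_vote(predictions, mask):
    """Majority vote over the 0/1 predictions of the classifiers selected by mask."""
    selected = [p for p, keep in zip(predictions, mask) if keep]
    n = len(selected)
    # Assumption: ties (possible with an even number of voters) go to class 0.
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*selected)]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def fitness(mask, predictions, y_true):
    """MCC of the majority-vote ensemble; an empty ensemble gets the worst score."""
    if not any(mask):
        return -1.0
    return mcc(y_true, majority_vote(predictions, mask))


def ga_select(predictions, y_true, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search for a good classifier subset, encoded as a 0/1 mask (needs >= 2 classifiers)."""
    rng = random.Random(seed)
    n = len(predictions)

    def score(mask):
        return fitness(mask, predictions, y_true)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Toy data (an assumption): 7 noisy classifiers, each wrong about 30% of the time.
    rng = random.Random(1)
    y = [rng.randint(0, 1) for _ in range(60)]
    preds = [[1 - t if rng.random() < 0.3 else t for t in y] for _ in range(7)]
    mask, score = ga_select(preds, y)
    print("selected classifiers:", mask, "MCC:", round(score, 3))
```

The bitmask encoding shows why exhaustive search does not scale: a pool of n classifiers has 2^n - 1 non-empty subsets, which for the 29 classifiers mentioned in the abstract is 536,870,911 candidate ensembles. Note that the sketch scores ensembles on the same data it searches over, so a held-out set would be needed to estimate generalisation performance.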
- Subject
- ensemble of classifiers; data classification; feature selection; wrapper feature selection; biomarkers; large-scale data; data analytics; imbalanced data classification; class imbalance; machine learning; weighted vote fusion; majority vote fusion; evolutionary computing; genetic algorithm; differential evolution; NSGA-II; multiobjective optimisation
- Identifier
- http://hdl.handle.net/1959.13/1335393
- Identifier
- uon:27427
- Rights
- Copyright 2017 Mohammad Nazmul Haque
- Language
- eng
File | Description | Size | Format
---|---|---|---
ATTACHMENT01 | Thesis | 15 MB | Adobe Acrobat PDF
ATTACHMENT02 | Abstract | 291 KB | Adobe Acrobat PDF