Using Permutation-Based Feature Importance for Improved Machine Learning Model Performance at Reduced Costs
| dc.contributor.author | Khan, Adam | |
| dc.contributor.author | Ali, Asad | |
| dc.contributor.author | Khan, Jahangir | |
| dc.contributor.author | Ullah, Fasee | |
| dc.contributor.author | Faheem, Muhammad | |
| dc.contributor.orcid | https://orcid.org/0000-0003-4628-4486 | |
| dc.date.accessioned | 2026-02-03T12:08:00Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | In Software Quality Assurance (SQA), predicting defect-prone software modules is essential for ensuring software reliability and consistency. This task is commonly performed with Machine Learning (ML) techniques, but improving model performance typically incurs significant computational cost. This high cost and its uncertain payoff make most software engineering researchers reluctant to optimize ML models. This creates a need for techniques that achieve performance close to that of tuned hyperparameter settings while retaining the computational efficiency of default settings. To address this, we employed five ML models (Decision Tree, Ranger, Random Forest, Support Vector Machine, and k-Nearest Neighbors) and optimized their hyperparameters using random search. Our experiments covered six diverse Software Fault Prediction (SFP) datasets, spanning various software features, application domains, and defect patterns, to evaluate the approach’s generalizability and effectiveness. A model-agnostic method based on Permutation Feature Importance (PFI) was then used to identify the ten features most critical for model accuracy and efficiency. These selected features were used to retrain the ML models with default hyperparameter settings, to determine whether similar performance could be achieved at low computational cost. The results show an average accuracy improvement of 77.39% and a 92.02% reduction in computational cost. The most notable case attained a 99.25% accuracy improvement and a 96.77% cost reduction. These results show that PFI-based feature selection can deliver high performance at a fraction of the computational cost, offering software engineers an efficient way to optimize ML models. | en |
| dc.description.notification | © 2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ | |
| dc.description.reviewstatus | fi=vertaisarvioitu|en=peerReviewed| | |
| dc.format.pagerange | 36421-36435 | |
| dc.identifier.uri | https://osuva.uwasa.fi/handle/11111/19737 | |
| dc.identifier.urn | URN:NBN:fi-fe2026020310977 | |
| dc.language.iso | en | |
| dc.publisher | IEEE | |
| dc.relation.doi | https://doi.org/10.1109/ACCESS.2025.3544625 | |
| dc.relation.ispartofjournal | IEEE Access | |
| dc.relation.issn | 2169-3536 | |
| dc.relation.url | https://doi.org/10.1109/ACCESS.2025.3544625 | |
| dc.relation.url | https://urn.fi/URN:NBN:fi-fe2026020310977 | |
| dc.relation.volume | 13 | |
| dc.rights | https://creativecommons.org/licenses/by/4.0/ | |
| dc.source.identifier | WOS:001435462200028 | |
| dc.source.identifier | 2-s2.0-85218874952 | |
| dc.source.identifier | 71e02d93-8810-499e-944a-b7cba1ba8253 | |
| dc.source.metadata | SoleCRIS | |
| dc.subject | Computational modeling | |
| dc.subject | Feature extraction | |
| dc.subject | Accuracy | |
| dc.subject | Computational efficiency | |
| dc.subject | Predictive models | |
| dc.subject | Optimization | |
| dc.subject | Support vector machines | |
| dc.subject | Random forests | |
| dc.subject | Radio frequency | |
| dc.subject | Decision trees | |
| dc.subject | Model-agnostic techniques | |
| dc.subject | permutation feature importance (PFI) | |
| dc.subject | software fault prediction (SFP) | |
| dc.subject | predictive accuracy | |
| dc.subject | machine learning (ML) | |
| dc.subject | computational cost | |
| dc.subject | default settings | |
| dc.subject | hyperparameter | |
| dc.subject.discipline | fi=Tietotekniikka tekn|en=Information Technology tech| | |
| dc.title | Using Permutation-Based Feature Importance for Improved Machine Learning Model Performance at Reduced Costs | |
| dc.type.okm | fi=A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä (vertaisarvioitu)|en=A1 Journal article (peer-reviewed)| | |
| dc.type.publication | article | |
| dc.type.version | publishedVersion |
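
The abstract describes ranking features by permutation importance, keeping the top-ranked ones, and retraining with default settings. A minimal sketch of that idea, using only the Python standard library, might look like the following. The nearest-centroid classifier, the synthetic three-feature data, the choice of k = 1, and all function names are illustrative assumptions; they are not the authors' models, datasets, or code (the paper uses five other models and keeps the top ten features).

```python
import random

def fit_nearest_centroid(X, y):
    """Fit a toy nearest-centroid classifier and return a predict(row) function."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        totals = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            totals[j] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [t / counts[c] for t in sums[c]] for c in sums}

    def predict(row):
        # Choose the class whose centroid is closest (squared Euclidean distance).
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
    return predict

def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

def keep_columns(X, columns):
    """Project every row onto the selected feature indices."""
    return [[row[j] for j in columns] for row in X]

# Synthetic data (assumption): feature 0 determines the label, features 1-2 are noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(300)]
y = [int(row[0] > 0.5) for row in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Step 1: fit on all features and score each feature on held-out data.
full_model = fit_nearest_centroid(X_train, y_train)
importances = permutation_importance(full_model, X_test, y_test)

# Step 2: keep the k highest-ranked features (the paper keeps ten; k = 1 here).
k = 1
selected = sorted(range(len(importances)),
                  key=lambda j: importances[j], reverse=True)[:k]

# Step 3: retrain on the reduced feature set and re-evaluate.
reduced_model = fit_nearest_centroid(keep_columns(X_train, selected), y_train)
reduced_acc = accuracy(reduced_model, keep_columns(X_test, selected), y_test)
print(selected, round(reduced_acc, 3))
```

Because the importance score only needs predictions, the same loop applies to any classifier, which is what makes the method model-agnostic. Scoring on held-out rows rather than training rows is a design choice in this sketch, not something the abstract specifies.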
