MACHINE LEARNING IN WATER TREATMENT PLANTS FOR WATER QUALITY PREDICTION AND PREDICTIVE MAINTENANCE

Parmar, Bhuminkumar

dc.contributor.author	Parmar, Bhuminkumar
dc.contributor.faculty	fi=Tekniikan ja innovaatiojohtamisen yksikkö|en=School of Technology and Innovations|
dc.contributor.organization	fi=Vaasan yliopisto|en=University of Vaasa|
dc.date.accessioned	2026-06-18T07:50:26Z
dc.date.issued	2026-05-22
dc.description.abstract	Water treatment plants are critical infrastructure for public health and sustainable resource management. Data centers have also recently become a major industry that consumes large volumes of ultrapure water for cooling. Most conventional water treatment facilities continue to operate using rule-based control systems and reactive monitoring practices, which limit their capacity for early detection of water quality deterioration and proactive maintenance planning. In Finland, a growing national emphasis on digital transformation and sustainable development creates a compelling opportunity to enhance existing water treatment infrastructure by introducing machine learning through digital retrofitting rather than costly replacement. This thesis develops and evaluates a machine learning-based framework that predicts water quality from chemical parameters and, on that basis, supports predictive maintenance in conventional water treatment plants. The research develops ML models to predict water potability, compares algorithms, classifies samples, creates a KNN-based Health Index, and enables retrofits without major infrastructure changes. The study uses the Water Potability dataset from Kaggle, comprising over three thousand water samples characterized by nine physico-chemical parameters, including pH, sulfate, hardness, turbidity, chloramines, and trihalomethanes. The data preprocessing pipeline incorporates class-wise median imputation for missing values, interquartile range-based outlier capping, StandardScaler feature scaling, and SMOTE oversampling to address class imbalance. Seven supervised classification algorithms are then implemented and tuned using GridSearchCV with five-fold cross-validation: Support Vector Machine, Decision Tree, Logistic Regression, Random Forest, XGBoost, K-Nearest Neighbors, and AdaBoost.
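The first two preprocessing steps named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the thesis code: the column names `ph` and `Potability`, the toy rows, the use of `None` for missing values, and the fence multiplier `k=1.5` (the conventional Tukey choice) are all assumptions made for illustration. Feature scaling, SMOTE oversampling, and grid search are left out here.

```python
from statistics import median, quantiles


def impute_classwise_median(rows, feature, label="Potability"):
    """Fill missing values (None) of `feature` with the median of the same class."""
    medians = {}
    for cls in {r[label] for r in rows}:
        observed = [
            r[feature] for r in rows
            if r[label] == cls and r[feature] is not None
        ]
        medians[cls] = median(observed)
    return [
        {**r, feature: medians[r[label]] if r[feature] is None else r[feature]}
        for r in rows
    ]


def cap_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed default."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Toy data (hypothetical): two classes, each with one missing pH reading.
rows = [
    {"ph": 7.0, "Potability": 1},
    {"ph": None, "Potability": 1},
    {"ph": 7.4, "Potability": 1},
    {"ph": 6.0, "Potability": 0},
    {"ph": None, "Potability": 0},
    {"ph": 6.4, "Potability": 0},
]
imputed = impute_classwise_median(rows, "ph")

# The extreme value is pulled back to the upper fence; the rest are unchanged.
capped = cap_outliers_iqr([1.0, 2.0, 3.0, 4.0, 100.0])
```

Imputing per class keeps the filled-in values from blurring the difference between potable and non-potable samples, and capping (rather than dropping) outliers preserves the sample count before oversampling.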
The results demonstrate that ensemble tree-based methods consistently outperform distance-based and linear classifiers for water potability classification. Random Forest achieves the highest performance, with an accuracy of approximately 79% and an area under the ROC curve of approximately 0.88. The algorithm performance ranking is consistent with published comparative literature, strengthening confidence in its reliability. Feature importance analysis identifies sulfate and pH as the two dominant predictive features, together accounting for nearly half of the model's predictive capacity. Turbidity, despite its operational prominence, ranks lowest in this dataset because it varies little across the samples. The KNN-based Health Index framework assigns each water sample a continuous health score from zero to one hundred based on its proximity to an ideal reference state defined by WHO guidelines. The framework classifies water samples into three operational maintenance zones: Healthy (HI ≥ 70), Warning (40–69), and Critical (HI < 40). A 365-day reverse osmosis (RO) membrane maintenance simulation demonstrates that Health Index-triggered clean-in-place (CIP) scheduling reduces cleaning cycles by approximately 20 percent compared to conventional fixed-schedule maintenance (20 versus 25 CIP events per year) while keeping the membrane continuously within the Healthy and Warning zones. Traditional fixed-schedule maintenance, by contrast, performs approximately 30 percent of CIP events unnecessarily in the Healthy zone, wasting chemical resources, and 10 percent reactively in the Critical zone, where increased energy consumption and membrane damage have already occurred. This finding is consistent with Gradiant's SmartOps AI deployment at the Bedok NEWater Factory, Singapore, where condition-based ML maintenance achieved 98.1 percent cleaning prediction accuracy (Gradiant, 2024).
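The zone thresholds and the reported CIP reduction can be checked with a small sketch. The zone boundaries and the 20 versus 25 event counts come from the abstract. The `health_index` function is a purely hypothetical distance-to-score mapping (linear decay from an ideal point), since the abstract does not give the thesis formula, and treating the Warning zone as 40 ≤ HI < 70 for a continuous score is likewise an assumption.

```python
import math


def health_index(sample, ideal, max_distance):
    """Hypothetical score: 100 at the ideal point, falling linearly to 0
    at `max_distance` (Euclidean). Not the formula used in the thesis."""
    d = math.dist(sample, ideal)
    return 100.0 * max(0.0, 1.0 - d / max_distance)


def maintenance_zone(hi):
    """Map a Health Index to the three zones named in the abstract."""
    if hi >= 70:
        return "Healthy"
    if hi >= 40:  # assumed: Warning covers 40 <= HI < 70
        return "Warning"
    return "Critical"


def cip_reduction(fixed_events, triggered_events):
    """Fractional reduction in CIP events relative to the fixed schedule."""
    return (fixed_events - triggered_events) / fixed_events


# Illustrative two-feature samples in an assumed, already-scaled feature space.
ideal = (0.0, 0.0)
hi_near = health_index((0.3, 0.4), ideal, max_distance=5.0)  # distance 0.5
hi_far = health_index((3.0, 4.0), ideal, max_distance=5.0)   # distance 5.0

print(maintenance_zone(hi_near), maintenance_zone(hi_far))
print(f"CIP reduction: {cip_reduction(25, 20):.0%}")
```

With the figures from the abstract, (25 - 20) / 25 gives the stated reduction of approximately 20 percent.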
The integrated framework operates exclusively on parameters available from existing monitoring infrastructure, confirming its technical feasibility as a digital retrofit for conventional water treatment plants. The findings have specific relevance for Finnish water utilities operating under the Health Protection Act and the EU Drinking Water Directive. The thesis contributes a validated, open-source, reproducible methodological framework that advances the state of the art in data-driven water treatment management and provides a practical foundation for the adoption of machine learning-based decision support in water utility operations.
dc.description.notification	fi=Opinnäytetyö kokotekstinä PDF-muodossa.|en=Thesis fulltext in PDF format.|sv=Lärdomsprov tillgängligt som fulltext i PDF-format|
dc.format.content	fi=kokoteksti|en=fulltext|
dc.format.extent	118
dc.identifier.uri	https://osuva.uwasa.fi/handle/11111/20979
dc.identifier.urn	URN:NBN:fi-fe2026052251732
dc.language.iso	eng
dc.rights	CC BY 4.0
dc.subject.degreeprogramme	Master’s Programme in Computing Sciences
dc.subject.discipline	Sustainable and Autonomous Systems
dc.subject.yso	machine learning
dc.subject.yso	water quality
dc.subject.yso	water treatment
dc.subject.yso	optimisation
dc.subject.yso	water management
dc.title	MACHINE LEARNING IN WATER TREATMENT PLANTS FOR WATER QUALITY PREDICTION AND PREDICTIVE MAINTENANCE
dc.type.ontasot	fi=Diplomityö|en=Master's thesis (M.Sc. (Tech.))|sv=Diplomarbete|

Files

Name: Uwasa_2026_Parmar_Bhuminkumar.pdf
Size: 1.62 MB
Format: Adobe Portable Document Format

Collections

Master's theses and diploma theses