Mikko Paaskoski

Utilizing Advanced Analytics for reducing equipment failures and planning maintenance windows
Case Predictive Maintenance with RUL (remaining useful life) estimation

Vaasa 2025
School of Technology and Innovations
Master’s thesis in Master of Science in Economics and Business Administration
Industrial Management

UNIVERSITY OF VAASA
School of Technology and Innovations
Author: Mikko Paaskoski
Title of the Thesis: Utilizing Advanced Analytics for reducing equipment failures and planning maintenance windows : Case Predictive Maintenance with RUL (remaining useful life) estimation
Degree: Master of Science in Economics and Business Administration
Programme: Industrial Management
Supervisor: Ville Tuomi
Year: 2025
Pages: 62

ABSTRACT:
Predictive maintenance offers an alternative to more traditional maintenance strategies. It uses data analytics, sensor technology and predictive algorithms to predict equipment or component failures before they occur, which makes it fundamentally different from reactive, traditional maintenance strategies. The Remaining Useful Life (RUL) of a piece of equipment or a component is a crucial element in predictive maintenance and in Prognostics and Health Management (PHM). The RUL is defined as the time remaining until the equipment reaches the end of its useful lifetime.

The objective of this thesis is to develop a RUL estimation approach using a data set that represents a degradation process relevant to PHM applications, in order to support the planning of optimal maintenance activities and to identify propitious time windows for maintenance. The data set used in this thesis was generated in a test bench simulation in which a filter is used to separate solid particles from gas.
This simulation process causes the filter to clog, which is considered to represent a degradation process and results in non-linear growth of the differential pressure measured over the filter. Filter failure occurs when the differential pressure exceeds a certain threshold. The data contain a total of 50 filter lifetime tests.

In the literature review, a framework for the RUL estimation process is described on the basis of related literature, and the RUL estimation approach developed in this thesis mainly follows that framework. The literature review also examines how the steps of the framework have been implemented in other studies. The RUL estimation approach itself was implemented by the author using the R programming language and the RStudio IDE. The approach was tested by simulating through each of the 50 filter lifetime tests and estimating the RUL at various stages of each lifetime.

Based on analysis of the results from the RUL estimation simulation and testing, the developed approach provides accurate RUL estimates at the later stages of the lifetime for the majority of the filter lifetime tests. However, the earlier stages of the filter lifetimes proved challenging for the approach, which did not produce accurate RUL estimates there.
KEYWORDS: predictive maintenance, advanced analytics, equipment failure, remaining useful life

UNIVERSITY OF VAASA (VAASAN YLIOPISTO)
School of Technology and Innovations
Author: Mikko Paaskoski
Title of the Thesis: Utilizing Advanced Analytics for reducing equipment failures and planning maintenance windows : Case Predictive Maintenance with RUL (remaining useful life) estimation
Degree: Master of Science in Economics and Business Administration
Programme: Industrial Management
Supervisor: Ville Tuomi
Year: 2025
Pages: 62

TIIVISTELMÄ (abstract in Finnish, translated):
Predictive maintenance offers an alternative approach compared with more traditional maintenance practices. Predictive maintenance uses data analytics, sensor technology and predictive models to predict the timing of failures before they actually occur, which gives a fundamentally different approach to maintenance compared with more traditional and more reactive maintenance strategies. The remaining useful life of an equipment or component is an essential element in predictive maintenance and in prognostics and health management, and it is defined as the time the equipment or component has left before the end of its service life.

The objective of this thesis is to develop a remaining useful life estimation method using a data set that describes a degradation process occurring in prognostics and health management applications. The aim of the estimation method is to support the planning of maintenance activities and to identify favourable points in time for them. The data set was produced in a test bench simulation in which filters were used to separate solid particles from gas. This simulation process causes the filter to clog, which is considered to represent a degradation process and causes non-linear growth of the differential pressure measured over the filter. As a result, the differential pressure exceeds a certain threshold value, which in turn leads to failure of the filter. The data set contains data from 50 different filter lifetime tests.
The literature review describes the framework of the remaining useful life estimation process, which was followed in broad outline during the development of the estimation method presented in this thesis. In addition, the literature review presents how the steps defined in the framework have been implemented in other studies. The estimation method itself was implemented by the author of the thesis using the R programming language and the RStudio development environment. The developed estimation method was tested by simulating through each of the 50 filter lifetime tests separately and estimating the remaining useful life of the filter at several points in time over its whole lifetime.

Analysis of the results obtained from the simulations and testing shows that the developed estimation method is able to produce accurate predictions in the later stages of the filter lifetimes for the majority of the lifetime tests. The earlier stages of the lifetime, however, proved challenging for the developed estimation method in terms of producing accurate predictions.

AVAINSANAT (keywords): predictive maintenance, advanced analytics, equipment failures, remaining useful life

Contents
1 Introduction
  1.1 Background of the study
  1.2 Research question, objectives and limitations
  1.3 Structure of the thesis
2 Literature Review
  2.1 Benefits, challenges and applications of predictive maintenance
  2.2 Framework for RUL estimation process
  2.3 Advanced analytics methods for RUL estimation
3 Methodology
  3.1 Data used in development of RUL estimation approach
  3.2 Data analysis and exploration
  3.3 RUL estimation approach development and testing
4 Results
  4.1 Example results
  4.2 Analysis of all tests
  4.3 Analysis of individual tests
5 Conclusions and discussion
  5.1 Research question and objective
  5.2 Research limitations
  5.3 Recommendations for future research
References

Figures
Figure 1. Framework for RUL estimation process (Ferreira & Goncalves 2022).
Figure 2. Test bench used for filter life testing (Hagmeyer et al. 2024).
Figure 3. Data sample.
Figure 4. Differential pressure as a function of time for Test no. 14.
Figure 5. Differential pressure as a function of time for all tests.
Figure 6. Histogram presenting measurement data coverage percentages.
Figure 7. Flow rates as a function of time for all tests.
Figure 8. Dust feeds as a function of time for all tests.
Figure 9. Combining existing variables into new grouping variable for each test.
Figure 10. Differential pressures as a function of time by each newly formed group.
Figure 11. Differential pressure and feature values for Tests no. 4 & 8.
Figure 12. Percentages of variance for each principal component for Test no. 36.
Figure 13. Correlation data frame for Test no. 36.
Figure 14. HIs and differential pressure as a function of time for Test no. 36.
Figure 15. Example of total lifetime prediction for Test no. 36.
Figure 16. Simulation results for each iteration for Test no. 36.
Figure 17. Final prediction of total lifetime for Test no. 49.
Figure 18. Simulation results for each iteration for Test no. 27.
Figure 19. Actual and predicted total lifetimes for Test no. 49.
Figure 20. Actual and predicted RUL for Test no. 49.
Figure 21. Calculated MAPE and data coverage for all tests.
Figure 22. Calculated MAPE and data coverage for tests with data coverage >= 60%.
Figure 23. Prediction error of last total lifetime prediction for all tests.
Figure 24. Actual and predicted RUL for tests with data coverage >= 60%.
Figure 25. Final prediction of total lifetime for Test no. 43.
Figure 26. Final prediction of total lifetime for Test no. 35.
Abbreviations
HI    Health Indicator
MAPE  Mean Absolute Percentage Error
PCA   Principal Component Analysis
PHM   Prognostics and Health Management
RUL   Remaining Useful Life

1 Introduction

Traditionally, the maintenance of equipment and machinery has been a reactive process that takes place only after a breakdown or failure and results in repair or replacement. This maintenance approach causes significant repair costs, reduced operational efficiency and extended downtime. Predictive maintenance provides an alternative approach that relies on data analytics, predictive algorithms and sensor technology to predict equipment failures before they occur, which makes it fundamentally different from traditional, reactive processes. Consequently, predictive maintenance makes it possible to plan and schedule maintenance windows and activities proactively (Patel et al. 2023). The remaining useful life (RUL) of an equipment or component is crucial in predictive maintenance and in Prognostics and Health Management (PHM), and it is defined as the time remaining until the equipment or component reaches the end of its useful life (Wang Q. et al. 2019).

The objective of this thesis is to develop and demonstrate a RUL estimation approach with an example data set representing a degradation process that is relevant in PHM applications, in order to support maintenance scheduling and planning by identifying propitious time windows for maintenance.

1.1 Background of the study

In the industrial sector, maintenance plays a crucial role because it may account for a considerable share of a company's production costs. By applying efficient maintenance practices and strategies, companies can reduce costs, prevent unplanned production stops and even extend the lifetime of industrial machines.
Over time, maintenance strategies have evolved, and approaches such as corrective maintenance and preventive maintenance have emerged. In corrective maintenance, the main idea is to replace or repair a part or component only once it is damaged to the point that the equipment or machinery no longer works. In preventive maintenance, periodic inspections and replacements are performed in order to avoid critical failure (Nunes et al. 2023).

Nunes et al. (2023) further argue that in preventive maintenance, periodic replacement can lead to replacing parts or components that are still in good condition and could continue to perform in the machinery, which causes additional maintenance costs. The opposite can also occur: parts or components fail before the periodic replacement, so corrective maintenance has to be performed.

Due to these issues, a more advanced maintenance strategy is required, in which maintenance does not occur too early, so that the majority of the component lifetime is utilized, but still occurs before a critical failure.

1.2 Research question, objectives and limitations

The research question of the thesis is: How can data measured from an equipment or component be utilized to identify optimal and propitious time windows for maintenance of that equipment or component?

In this thesis, the objective is to utilize data generated by a test bench simulation in which a single component, in this case a filter, is used to separate solid particles from gas. The measurement data collected from this process represent a degradation process that is relevant in PHM applications. The idea is to predict the total lifetime of the filter at various points of its lifetime, which also yields the RUL at each of these points as the predicted total lifetime minus the time already elapsed. The resulting RUL can then be used to support the planning and scheduling of maintenance windows.
Additionally, the presented RUL approach is intended to scale to all processes whose degradation, from a data perspective, resembles that of the tested filters.

This thesis is limited to RUL estimation of components that follow a degradation process similar, from a data perspective, to that of the component studied in this thesis. It is also assumed that failures of the process simulated in the test bench depend only on the single component whose RUL is predicted. In addition, the actual maintenance planning and the definition of maintenance windows based on the RUL prediction results are not included in this thesis. Finally, the developed RUL approach is applicable only when the operating conditions of the component remain constant throughout its complete lifetime.

1.3 Structure of the thesis

This thesis consists of four chapters in addition to this Introduction chapter:
- Chapter 2 consists of a literature review covering general topics related to predictive maintenance, the RUL estimation process framework and the methods used in RUL prediction solutions in related literature.
- Chapter 3 describes the methodology, the process steps of the thesis research and the tools used in this thesis. Additionally, the data set is described, explored and analyzed. Finally, the developed RUL estimation approach is presented step by step.
- Chapter 4 presents the results of the simulation and testing of the developed RUL estimation approach.
- Chapter 5 includes the conclusions and discussion of the thesis.
2 Literature Review

This literature review consists of three main sections. The first section describes general topics related to predictive maintenance. The second describes the RUL estimation process framework, which was mainly followed during the development of the RUL estimation approach presented in this thesis. The third describes Advanced Analytics methods used in RUL estimation solutions presented in related literature.

2.1 Benefits, challenges and applications of predictive maintenance

This subchapter covers general topics related to predictive maintenance: the steps of the predictive maintenance workflow and solution life cycle when the solution is implemented with data science or machine learning approaches, the challenges faced when adopting predictive maintenance solutions, the benefits of predictive maintenance and some real-life application fields that utilize predictive maintenance.

By applying predictive maintenance solutions in production environments, multiple benefits can be achieved (Dalzochio et al. 2020; Huang et al. 2024; Wen et al. 2022):
1. Minimization of unplanned downtime and maximization of runtime: downtime is minimized and runtime maximized by identifying optimal time windows for maintenance activities.
2. Optimization in planning maintenance activities and interventions: identifying optimal maintenance windows leads to efficient and optimal maintenance activities.
3. Reduction of system failures and risk mitigation: when potential moments of failure are identified and maintenance is performed accordingly, failures are avoided, which also contributes to safety.
4. Increased efficiency in the usage of financial and human resources: optimal maintenance activities lead to efficient use of human resources and minimize labor costs.
5.
Material cost savings and inventory management: knowledge of optimal maintenance windows supports the sourcing of spare parts used in maintenance, which also contributes to inventory management.
6. Improved customer satisfaction: customer satisfaction improves when downtime and maintenance costs are minimized.

The predictive maintenance workflow and solution life cycle, for a predictive maintenance project that requires a data science or machine learning approach, can be structured into five steps (Achouch et al. 2022):
1. Understanding project needs: The first step is to understand the business problems and the possible constraints to overcome within the project. This includes understanding how the system or equipment for which the solution is developed operates. This step also includes defining the physical quantities to be measured, selecting sensors for collecting the measurement data and installing those sensors if this has not already been done. Failure types also need to be defined in this step.
2. Data collection, understanding and preparation: The second step includes ensuring that the sensors are able to collect the measurement data and transfer them, for instance, into a database where the data can be stored and accessed later. It is also necessary to understand which portion of the collected data is relevant for further analysis and use, and to assess the quality of the available data. The final part of the second step is data preparation. Depending on the project, data preparation can be the most important and time-consuming single task within the predictive maintenance project. It might include, for instance, feature engineering, dimension reduction, processing the data into the desired structure, cleaning the data, handling outliers or removing erroneous data.
3. Data modeling: In the third step, the data model is defined and developed.
This data model takes the results of the second step as input and produces the required output for the problem it is defined to solve. The data model can be selected to solve, for instance, a regression, classification or clustering problem. The third step also includes testing and parametrizing one or more algorithms in order to develop the best possible data model.
4. Evaluation and deployment: The fourth step includes evaluation and deployment of the data model developed in the third step. In the evaluation phase, the data model is evaluated by measuring its accuracy (does the model accurately describe the input data used) and its relevance (does the model answer the original problem for which it was defined and developed). After the evaluation, the model is deployed in the required format and its outputs are used in the fifth, decision-making step.
5. Decision making: The fifth step consists of decision making, which includes defining multiple possible maintenance scenarios with their associated repair costs and times. After several scenarios have been defined, the best one is selected on the basis of the data model output and the available resources (spare parts and workforce) in order to minimize costs and delays. The final action is to review the effectiveness of the decision, since the predictive maintenance life cycle is repetitive and leaves room for improvement in future maintenance decisions.

Achouch et al. (2022) further argue that the adoption of predictive maintenance is unavoidable in an industrial context; however, there are challenges related to adopting this novel maintenance approach.
Even though existing predictive maintenance algorithms are available, companies interested in adopting and benefiting from predictive maintenance approaches still need to weigh the benefits of predictive maintenance against the capital investments required to acquire the tools, expertise and software needed to implement predictive maintenance solutions. This issue is more relevant in the early phases of developing and adopting predictive maintenance solutions, when the data representing normal and abnormal equipment behavior are not yet mature enough, and in cases where expertise in new systems and their operation is lacking.

Achouch et al. (2022) additionally distinguish four groups of predictive maintenance challenges: Financial and Organizational Limits, Data Source Limits, Machine Repair Activity Limits and Limits in the Deployment of Industrial Predictive Maintenance Models.

Challenges related to financial and organizational limits: Predictive maintenance approaches rely on sensors, data, artificial intelligence, cloud computing and programmable logic controllers. Together these form a complex business process that might require, for instance, installing measurement systems, setting up data processing infrastructure and ensuring its availability, and maintaining compatibility between information systems and business units (Wellsandt et al. 2022). All of these are direct financial costs for the company. Jin et al. (2016) state that the size of the company affects how much it can allocate to predictive maintenance development: larger companies are able to focus more on improving predictive maintenance technologies. Wellsandt et al.
(2022) further argue that, from an organizational perspective, the challenges include convincing managers to allocate budget to predictive maintenance initiatives and convincing other employees, for instance experts, to use and rely on the results and recommendations provided by predictive maintenance applications.

Challenges related to data source limits: The maturity and quality of the input data significantly affect the outputs of predictive maintenance solutions, and the majority of the input data consists of component and machine information together with historical sensor and maintenance data (Maktoubian et al. 2021). This implies that relevant data for a predictive maintenance solution for a given component are not available at the start of the component's life cycle: accumulating historical data requires the component to be operational for a certain period of time, which makes it difficult to develop an accurate predictive maintenance solution early in the life cycle. Keleko et al. (2022) state that noisy, missing, complex and high-volume data used in predictive models can lead to erroneous results and predictions. A further data challenge is the use of data collected from different systems that are diverse in nature, since combining these data sources for predictive models can prove challenging.

Challenges related to machine repair activity limits: Achouch et al. (2022) argue that even when predictive maintenance models are able to predict the remaining lifetime of a component, so that maintenance moments can be planned and determined, the actual maintenance activity may face challenges due to the lack of self-maintenance and human interaction. More precisely, the effectiveness of maintenance activities depends on the quality of human management, since components are controlled and maintained by human operators. Achouch et al.
(2022) further discuss that, since the planning and implementation of maintenance activities depend on human decision making based on data and expertise, which are inputs that a machine could also retrieve, it could be possible to develop an intelligent component. Such an intelligent component could propose and even initiate actions that are beneficial for the health of the component and the system.

Challenges related to limits in the deployment of industrial predictive maintenance models: In general, the deployment of prediction and machine learning models includes three steps that can cause challenges: integration, monitoring and updating. Integration is performed by deploying the developed model to existing software infrastructure. A deployed model requires maintenance over its life cycle, so the maintenance workload accumulates as additional models are deployed over time. Monitoring prediction quality, which is crucial for triggering alarms when predictions deviate, can be challenging to define since there are multiple factors to consider, including input data that vary over time, prediction bias and overall model performance. Updating the model is relevant after the initial deployment, since model developers need to ensure that the model reflects the most recent trends in the data. Challenges related to model updates are practical matters that depend on the phenomenon being predicted, for instance how frequently models are updated and how much weight the latest observations are given within the models (Peleyes et al. 2022).

Wen et al. (2022) divide predictive maintenance applications into four application fields: rotating machinery, power systems, electronic components and aircraft. In industry and machinery, the most common and widely used rotating parts include gears, shafts, bearings and motors.
Rotating machinery is crucial for the health of the machine, since any issues with rotating parts can result in significant failures and safety-related consequences (Wen et al. 2022). Das et al. (2023) mention that in industry, rotating machines have a critical role in many different applications. For instance, the machines are utilized in automated production solutions that provide products for various needs. Failures in these machines could cause severe supply chain disruptions and significant costs. Wen et al. (2022) further mention that predictive maintenance applications in rotating machinery include RUL estimation of gears and bearings, based on mechanical and vibration signals measured from those components.

When considering power systems, increasing operational and maintenance costs caused by component failures affect, for instance, the wind energy industry. Wind turbine faults are accelerated by multiple causes, including lubrication issues and temperature stress caused by temperature differences between components (Wen et al. 2022). A SCADA (Supervisory Control And Data Acquisition) system, which is built to control electricity generation within the wind turbine, can be used to collect data from sensors placed on the main components of the wind turbine. These sensors measure, for instance, bearing vibration, temperature and wind speed. The sensor measurements can then be processed further in order to predict the RUL of the wind turbine (Stetco et al. 2019).

Lithium-ion batteries are used as main energy sources in various electronic systems, including electric vehicles, renewable energy devices, consumer electronics and airplanes. The performance of the batteries decreases over the battery lifetime, which appears as resistance increase and capacity loss.
RUL and health state estimates are crucial tools for evaluating and monitoring battery performance in order to ensure the reliability and safety of the batteries (Wen et al. 2022). Factors affecting battery ageing include high and low temperatures, high currents and mechanical stress. Features can be derived from these measured factors and then used in RUL estimation (Li Y. et al. 2019).

In the aircraft propulsion system, the engine is the power component that produces the thrust that propels the aircraft. In the worst case, failure of an aircraft engine may lead to a serious accident. In addition to the engine, other crucial systems include the actuators and the auxiliary power unit. The main task of the auxiliary power unit is to supply power at a specific flying altitude and to produce bleed air for the cabin air system while on the ground. The role of the actuators is to convert electrical signals into mechanical movement or into another physical quantity, such as pressure or temperature, in the control system (Wen et al. 2022). Chehade et al. (2017) developed RUL estimation for aircraft turbine engines based on data from multiple sensors measuring, for instance, temperatures, pressures and speeds.

2.2 Framework for RUL estimation process

To support understanding of the RUL estimation approach presented in this thesis, a framework for the RUL estimation process is presented in this subchapter. The framework was mainly followed during the development of the RUL estimation approach. It overlaps to some extent with the predictive maintenance workflow presented earlier, but it provides more detail for the specific case of RUL estimation. Ferreira & Goncalves (2022) present a RUL estimation process consisting of a total of ten micro steps and four macro steps. Figure 1 below presents all micro and macro steps and the whole process.

Figure 1.
Framework for RUL estimation process (Ferreira & Goncalves 2022).

Data Extraction is the first macro step and consists of two micro steps: Raw Data Extraction and Data Pre-Processing (Ferreira & Goncalves 2022). Raw Data Extraction includes collecting and acquiring data from multiple sensors installed in the machine or system. These sensors have collected data from the machine or system during its lifetime and operation (Wang B. et al. 2019). Data Pre-Processing includes processing the collected raw data into a more representative form that can be used in later steps of the RUL estimation process (Zuo et al. 2021).

Feature Extraction and Classification is the second macro step and consists of four micro steps: Feature Extraction, Feature Classification, Health Indicator Construction and Fault Detection (Ferreira & Goncalves 2022). Feature Extraction includes transforming the preprocessed raw data into more relevant information that describes the operating status of the system or machine. For time-series data, possible features to extract include time and frequency domain features (Dong & Luo 2013). In the Feature Classification step, the goal is to determine which of the extracted features are the most relevant and the best at detecting the degradation of the inspected system or machine (Dong & Luo 2013). The Health Indicator Construction step includes constructing a single Health Indicator (HI) from the selected features. The features are fused with dimension reduction methods. The main idea of a single HI is to represent the health state of the given system or component (Cheng et al. 2021). The Fault Detection step includes activities such as determining fault locations, identifying fault types and estimating fault degrees (Wang R. et al. 2021).

Model Building and Training is the third macro step and consists of two micro steps: Model Building and Model Training (Ferreira & Goncalves 2022).
The Model Building step includes formulating the regression problem and deciding which algorithm or model is used to carry out the regression task of predicting the RUL of the given system or machine. Example algorithms include Random Forest and Deep Neural Network (Liu et al. 2020). Model Training includes training the RUL prediction model with, for instance, the previously constructed HI, other possible features and the selected algorithm or model according to the formulated regression problem (Li X. et al. 2020).

RUL Prediction and Evaluation is the fourth and final macro step and consists of two micro steps: RUL Prediction and Evaluation/Maintenance (Ferreira & Goncalves 2022). RUL Prediction includes predicting the future RUL with the trained RUL prediction model and new input feature values (Cheng et al. 2021). Evaluation of the RUL prediction model includes evaluating the model with performance metrics such as Mean Absolute Error and Root Mean Square Error between the predicted and actual RUL values (Wang Y. et al. 2021).

2.3 Advanced analytics methods for RUL estimation

This subchapter describes essential terminology related to data analytics, presents examples of methods used in RUL estimation solutions in related literature and describes the methods used in this thesis.

Advanced analytics refers to a set of sophisticated quantitative techniques and their use in data analysis in order to provide insights that traditional business intelligence approaches are not likely to discover. These techniques include, for instance, predictive (classification and regression), descriptive and statistical techniques. Predictive analytics can be considered a subset of advanced analytics, since not all advanced analytics techniques are predictive (Boobier, 2018, p. 20).
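As a small illustration of the evaluation step of the framework in subchapter 2.2, the two error metrics named there, Mean Absolute Error and Root Mean Square Error, can be sketched as follows. This is a minimal Python sketch rather than the implementation used in this thesis, which is written in R. The function names and the RUL values are hypothetical and chosen only for illustration.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted RUL values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference; penalizes large errors more."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical actual and predicted RUL values (same time unit) at five evaluation points.
actual_rul = [100.0, 80.0, 60.0, 40.0, 20.0]
predicted_rul = [90.0, 85.0, 60.0, 45.0, 20.0]

mae = mean_absolute_error(actual_rul, predicted_rul)
rmse = root_mean_square_error(actual_rul, predicted_rul)
print(f"MAE: {mae:.2f}, RMSE: {rmse:.2f}")
```

Because the errors are squared before averaging, the Root Mean Square Error is never smaller than the Mean Absolute Error on the same data, and the gap between the two grows when a few predictions are far off.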
Machine Learning is a subarea of Artificial Intelligence in which computers are programmed to optimize a selected performance criterion and the parameters of a selected model using historical data and observations. The model can be either predictive, when predictions are needed for the future, or descriptive, when insights or knowledge are required from the data. Machine learning uses theories of statistics and mathematics for its models, and machine learning algorithms can be divided into 2 categories: supervised and unsupervised learning algorithms. Regression and classification algorithms, for instance, are supervised learning algorithms: the algorithm is provided with input and output values, and the main idea is to train it to learn the mapping from input to output. In unsupervised learning only input values are used, and the goal is to find regularities and insights in the input data. Clustering algorithms are examples of unsupervised learning algorithms (Alpaydin, 2020, p. 25-33).

Since HI construction and RUL prediction are individual steps in the RUL estimation process, the methods used for each step in related literature are listed separately. Pei et al. (2024) propose an approach for bearings in which a sensitive health indicator is first built with dynamic grey relational analysis (DGRA) from sensor signals collected from the process, in order to detect incipient degradation of the bearings, which helps to divide machine conditions into healthy and other stages. After the initial degradation of a bearing is detected, an online cross-domain health indicator is constructed for RUL prediction by using a Convolutional Long Short-Term Memory neural network with a transfer quadratic function.

Sun et al. (2024) construct the HI in their approach for cutting tool RUL estimation by first extracting time and frequency domain features from sensor data collected from cutting operations.
After the feature extraction, monotonicity is calculated for each extracted feature, which gives each feature a fitness score. The feature with the best score is selected as the HI, which is then post-processed with a cubic first-order exponential smoothing filter in order to remove noise from the HI.

Yan et al. (2022) developed a RUL prediction approach for bearings in which the HI is constructed by extracting the Root Mean Square value from the vibration sensor signal with a sliding window. The Root Mean Square value is used because it correlates positively with vibration intensity and can efficiently reflect the growing degradation trajectories of the bearings. This HI is then normalized and used to identify the point when the degradation of the bearing starts. This is done with elbow point detection: a piecewise linear regression is fitted to the HI values as a function of time, and the point where the slope increases drastically indicates that degradation of the bearing has started.

Pei et al. (2024) propose a RUL prediction method with a double exponential regression model that uses time as the input variable and the online cross-domain health indicator as the output variable. The goal is to predict when the value of the online cross-domain health indicator exceeds the defined failure threshold. This also gives the moment of failure in time, so the RUL can be calculated for any given time during the bearing lifecycle.

Sun et al. (2024) present a novel exponential regression model as the RUL prediction method. After the HI is constructed, an initial exponential regression model is fitted with time as the input variable and the HI values as the output values. After the initial fit, the model parameters are iteratively updated with a Bayesian inference mechanism and the expectation maximization algorithm. The model is then used to find the RUL by finding the time when the predicted HI value exceeds the failure threshold.

Yan et al.
(2022) present a RUL prediction approach which uses a sliding window over the constructed HI values, where time is the input variable and the values of the constructed HI are the output variable. This dynamic degradation regression model and its parameters are updated iteratively according to the HI observations within the sliding window in order to produce the future trajectory of the HI values. Given the predicted HI values and the pre-defined threshold above which failure occurs, the RUL can be calculated by extracting the moment in time when the HI value is predicted to exceed the threshold.

In this thesis, the HI construction is implemented with Principal Component Analysis (PCA) and the RUL prediction with Local Regression. PCA is an unsupervised machine learning method which can be used for data fusion of multiple variables within a data set. PCA reduces the dimensions of the original input data into fewer dimensions called principal components, whose objective is to explain a significant proportion of the variance of the original input data set. The objective of these principal components is to standardize the original input data by capturing the correlations and covariances of the variables within the original input data set. PCA produces orthogonal principal components which are uncorrelated with each other (Mafata et al. 2022).

Local regression is a regression method which applies a sliding window to the observations of the data and estimates separate regression equations locally with weighted least squares, based on the observations that fall within the window. Local regression requires setting the parameters α (the proportion of observations from the complete data used in each local regression, a value between 0 and 1) and λ (the degree of the locally fitted equations: 1 for linear and 2 for quadratic) (Jacoby 2000).
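The role of the parameters α and λ can be illustrated with a minimal sketch of a single local fit. The sketch is written in Python rather than R and is not the implementation used in this thesis. The function names `solve_linear_system` and `local_regression`, the tricube weight function and the rule for the window width are assumptions made for illustration only.

```python
import math


def solve_linear_system(a, b):
    """Solve a small dense system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def local_regression(x, y, x0, alpha=0.8, degree=2):
    """Fitted value at x0 from one locally weighted polynomial fit."""
    n = len(x)
    # alpha sets the share of observations that fall inside the window.
    q = min(n, max(degree + 2, math.ceil(alpha * n)))
    distances = sorted(abs(xi - x0) for xi in x)
    h = distances[q - 1] or 1.0  # window half-width: distance to q-th nearest point
    # Tricube weights (assumption): 1 at x0, falling to 0 at the window edge.
    w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    # Weighted least squares for a polynomial in (x - x0) via normal equations.
    k = degree + 1  # degree plays the role of lambda (1 linear, 2 quadratic)
    a = [[sum(wi * (xi - x0) ** (r + c) for wi, xi in zip(w, x)) for c in range(k)]
         for r in range(k)]
    b = [sum(wi * yi * (xi - x0) ** r for wi, xi, yi in zip(w, x, y))
         for r in range(k)]
    return solve_linear_system(a, b)[0]  # the intercept is the fit at x0
```

Calling `local_regression` for a grid of `x0` values traces out the whole regression curve. A smaller `alpha` narrows the window, so the curve follows local changes in the data more closely.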
3 Methodology

This chapter discusses the methodology used in this thesis to conduct empirical and quantitative research. The thesis follows a quantitative approach in which an existing data set is used to develop a RUL estimation approach, which is mainly based on the framework presented in related literature. The developed RUL estimation approach is then tested and validated using simulation. The RUL estimation approach and the simulator are implemented by the thesis author with an open-source programming language.

The RUL estimation approach developed in this thesis mainly follows the framework of the RUL estimation process presented in subchapter 2.2. The main difference between that framework and the research process workflow followed in this thesis is that the data extraction macro step is left out: an existing data set is used, so no data collection or data extraction from sensors installed in machinery is required. Instead, the first step in the workflow was to analyze the existing data set and its variables in order to form a hypothesis and formulate the RUL estimation problem for which the proposed RUL estimation approach is developed.

The research process workflow consists of the 6 steps presented below:

1. Data exploration and analysis: the first step of the research was to analyze the existing data set and its variables in order to understand the phenomena, decide which variables to use in the development of the RUL approach and formulate the hypothesis and the RUL estimation problem to solve.
2. Feature extraction: the second step is feature extraction, which is implemented by extracting time domain features from the selected variable of the existing data set.
3.
Feature fusion and Health Indicator construction: after the feature extraction is done, the extracted features are fused using PCA, which produces a principal component that is considered the raw HI. This raw HI is then further processed into the final HI.
4. RUL estimation model development: the constructed HI is then used to develop the RUL estimation model. This is done with Local regression, where the HI is the output variable and time is the only input variable. This regression model predicts the total lifetime of the filter at any given moment in the filter lifetime, making it possible to calculate the RUL for any given time.
5. RUL estimation approach testing with simulation: the data set includes data for 50 different filters. The developed RUL estimation approach is tested by simulating through the lifetimes of all filters.
6. Results: the results received from the simulation step are analyzed and the strengths and weaknesses of the developed RUL estimation approach are identified.

Besides the existing data, the research is implemented with the R programming language and the RStudio IDE, which are used to develop the RUL estimation approach and the simulation and testing capabilities, and to produce all visualizations starting from Figure 3. The R programming language and environment are developed for statistical computing and graphical presentation. R provides many statistical techniques (linear and non-linear modeling, clustering, time-series analysis, classification and classical statistical tests) and graphical techniques (R Core Team 2024). RStudio is an integrated development environment for R and Python. The functionalities of RStudio include a console, a code editor with direct code execution, tools for plotting and debugging, and workspace and history management. RStudio is available in commercial and open-source editions and runs on the desktop on Mac, Windows and Linux operating systems (Posit Team 2024).
3.1 Data used in development of RUL estimation approach

This subchapter describes the data set used in this thesis for the RUL estimation approach. It covers general information about the data and an explanation of its columns.

The data set was generated to represent a degradation process that is relevant in PHM applications. The degradation process in this case is clogging, which occurs in filters when solid particles are separated from gas. The data was generated with a test bench, which performs life testing of filters by loading them with dust fed through the filters. Over time, as dust is fed through the filter, the differential pressure across the filter starts to rise, and filter failure occurs when the differential pressure exceeds 600 Pa. The data and the filter life testing were produced by Hochschule Esslingen – University of Applied Sciences (Hagmeyer et al. 2024). Figure 2 below illustrates the test bench setup where the data was generated.

Figure 2. Test bench used for filter life testing (Hagmeyer et al. 2024).

In this thesis, the data file ‘Test_Data_CSV.csv’ is used. It contains a total of 39414 rows and 7 columns. The columns are listed below (Hagmeyer et al. 2024) and further presented in table format in Figure 3 below:

- Data_No
  o Value between 1 and 50 identifying the tested filter, so life testing was done for a total of 50 filters.
- Differential_pressure
  o Differential pressure across the filter (Pa) measured during the tests.
- Flow_rate
  o Flow rate through the filter.
- Time
  o Measurement time in seconds (s) from the start of the specific filter test. The sampling frequency used in data collection was 10 Hz, so a new measurement is recorded every 0.1 seconds.
- Dust_feed
  o Volume of dust fed per time unit (mm³/s).
- Dust
  o In the filter life tests, standardized Arizona test dust (ISO 12103-1) was used with particle sizes A2, A3 and A4.
  o Type of dust used to load the filter, with 3 different options:
    ▪ ISO 12103-1, A2 Fine Test Dust
      • Density of dust: 0.900 g/cm³
    ▪ ISO 12103-1, A3 Medium Test Dust
      • Density of dust: 1.025 g/cm³
    ▪ ISO 12103-1, A4 Coarse Test Dust
      • Density of dust: 1.200 g/cm³
- RUL
  o Remaining useful life of the filter before failure occurs (differential pressure of 600 Pa exceeded) in seconds (s).

Figure 3. Data sample.

3.2 Data analysis and exploration

Before any RUL approach development is planned or done, the data should be explored and analyzed.

Figure 4. Differential pressure as a function of time for Test no. 14.

In Figure 4 above, it can be seen how the differential pressure starts growing non-linearly as time progresses. The differential pressure behaves this way as a function of time in all of the filter tests. It is also worth noting that the differential pressure measurements cover only approximately 61.03 percent of the total filter lifetime (the total lifetime is 195 seconds and the last measurement is from 119 seconds).

Figure 5. Differential pressure as a function of time for all tests.

Figure 5 above shows how the differential pressure develops over time for each of the 50 filter tests. It can be observed that the tests have different profiles for their differential pressure curves: in some tests the differential pressure grows faster than in others. The next matter to analyze is whether any other column in the data affects these profiles, mainly any of the three columns ‘Flow_rate’, ‘Dust_feed’ or ‘Dust’ described in subchapter 3.1.

From Figure 5 it can also be seen that the measurement data coverage of the total filter lifetime differs between tests. Figure 6 below shows a histogram of how the measurement data coverage percentages (of total filter lifetime) are distributed. The histogram includes the percentage values for each of the 50 tests. It can be seen from the histogram that the percentage values lie approximately between 30% and 100%.

Figure 6.
Histogram presenting measurement data coverage percentages.

Figures 7 & 8 below show how the values in columns ‘Flow_rate’ (Figure 7) and ‘Dust_feed’ (Figure 8) develop as a function of time. It can be seen immediately that the ‘Dust_feed’ values are constants and the ‘Flow_rate’ values behave almost as constants: the values are near either 80 or 60 in each filter test. As mentioned in subchapter 3.1, the column ‘Dust’ can only have 3 different values. In order to analyze whether these 3 variables have an effect on the differential pressure profile, the differential pressures of each test are analyzed from the perspective of a new variable, which is a combination of the 3 columns mentioned above.

Figure 7. Flow rates as a function of time for all tests.

Figure 8. Dust feeds as a function of time for all tests.

Figure 9 below presents how the new variable is combined from the 3 variables ‘Dust_feed’, ‘Dust’ and ‘Flow_rate’. First, the median of ‘Flow_rate’ is calculated over the lifetime of each test separately (column ‘med_fr’). Based on the median value, a ‘Flow_rate’ group (‘fr_group’) is assigned to each test: if the median value is below 70 (this threshold is selected based on Figure 7), the ‘Flow_rate’ group is 1 and otherwise 2. The new grouping variable is then formed by pasting together ‘Dust’ (A2, A3 or A4 based on one of the three options), the ‘Flow_rate’ group and the rounded value of ‘Dust_feed’, in that order. Column ‘group_id’ contains the new grouping variable.

Figure 9. Combining existing variables into new grouping variable for each test.

After the new grouping variable is formed, the differential pressure profiles are analyzed within the newly formed groups. Figure 10 shows how the differential pressure profiles of individual tests behave within the different groups. For instance, group ‘A2_2_236’ (‘Dust’: ‘ISO 12103-1, A2 Fine Test Dust’, median value of ‘Flow_rate’ over lifetime: over 70, ‘Dust_feed’: 236) contains 2 different tests with 2 different differential pressure profiles.
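The construction of the grouping variable can be sketched as follows. This is a Python illustration and not the R code used in the thesis. The function names and the assumption that the dust type is already available as a short code such as 'A2' are hypothetical.

```python
from statistics import median


def flow_rate_group(flow_rates, threshold=70.0):
    """Return 1 if the median flow rate of a test is below the threshold, else 2."""
    return 1 if median(flow_rates) < threshold else 2


def group_id(dust_code, dust_feed, flow_rates):
    """Paste dust code, flow rate group and rounded dust feed into one label.

    dust_code is assumed to be 'A2', 'A3' or 'A4' (hypothetical encoding).
    """
    return f"{dust_code}_{flow_rate_group(flow_rates)}_{round(dust_feed)}"


# Example with made-up measurements for one test.
print(group_id("A2", 236.4, [79.8, 80.1, 80.3]))
```

With these made-up inputs the label is `A2_2_236`, which matches the naming pattern of the groups shown in Figure 10.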
The number of samples differs between these two tests; however, it can be clearly seen that in one test the differential pressure grows more rapidly than in the other. The same phenomenon can be observed, for instance, in groups ‘A2_1_177’ and ‘A3_2_59’.

Figure 10. Differential pressures as a function of time by each newly formed group.

Based on the findings from analyzing how the three variables affect the differential pressures, it can be deemed that the differential pressure profile and growth rate are affected by latent variables which are not present in the given data, in addition to the existing variables in the data. Furthermore, based on these findings, the decision was made to develop the RUL estimation approach based only on the differential pressure measurements and their profile. Since the differential pressure growth is non-linear as a function of time, the hypothesis is that the RUL estimation problem is a non-linear regression problem and that time should be used as the only input variable. The reason for this is that time is the only input variable which can be used to predict the exact moment in the future when the failure might occur, in other words, when the threshold of 600 Pa is exceeded. In the regression, the output variable consists of the values of the HI, which is built from the differential pressure. The next subchapter describes step by step how this regression problem is solved.

3.3 RUL estimation approach development and testing

The RUL estimation approach developed in this thesis is based on the differential pressure measured in 50 different filter lifetime tests. The approach consists of 3 steps. In the first step, feature extraction was done using the differential pressure measurement data. In the second step, feature fusion was performed with PCA on the results of the feature extraction step in order to construct the HI.
In the third step, the total filter lifetime was predicted with Local Regression, with the constructed HI as the target variable and time as the input variable, making it possible to estimate the RUL at any given time of the lifetime. After the development, the RUL estimation was tested by simulating through each of the 50 tests and predicting the total filter lifetime in separate phases of the lifetime.

In the feature extraction step, the first phase is to calculate support features from the differential pressure measurements. In the second phase, the main features are calculated, mainly from those support features. The main features are the selected features which are used in the HI construction phase.

Feature extraction is implemented by applying the equations presented in this subchapter as window functions. This means that the feature value at a given time depends on multiple measurement points of the differential pressure. The width of the window is 20 measurements for each feature, and the measurements are taken from the current and past measurement points (the current measurement and the previous 19 measurements) for each moment in time.

Equations 1-7 below are presented in the article by Wang X. et al. (2015). First, the following support features are extracted. In equations 1, 2 and 3, n is the sample size and x_i refers to the indexed value within the sample:

- Root mean square, the square root of the average of the squared sample values:

$$T_{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2} \tag{1}$$

- Absolute mean value, the average of the absolute sample values:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} |x_i| \tag{2}$$

- Absolute max value, the maximum of the absolute sample values:

$$x_{max} = \max_{i} |x_i| \tag{3}$$

After extracting the 3 support features, the following 5 main features are then extracted.
- Standard deviation of the sample values:

$$T_{sd} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{4}$$

  where the summation operator sums the squared differences between the individual sample values and the absolute mean value of the sample (equation 2), n is the sample size and x_i refers to the indexed value within the sample.

- Shape factor of the sample values:

$$T_{sf} = \frac{T_{rms}}{\bar{x}} \tag{5}$$

  where the root mean square of the sample (equation 1) is divided by the absolute mean value of the sample (equation 2).

- Impulse factor of the sample values:

$$T_{if} = \frac{x_{max}}{\bar{x}} \tag{6}$$

  where the absolute max value of the sample (equation 3) is divided by the absolute mean value of the sample (equation 2).

- Crest factor of the sample values:

$$T_{cf} = \frac{x_{max}}{T_{rms}} \tag{7}$$

  where the absolute max value of the sample (equation 3) is divided by the root mean square of the sample (equation 1).

- Median of the sample: when the sample size is odd, the median is the middle value of the sample values arranged in ascending or descending order. When the sample size is even, the median is the average of the two middle values.

Figure 11 below presents the differential pressure measurements and the values of the extracted main features for two different tests, 4 and 8. It can be observed that the main features behave differently between these 2 tests, as do the profiles of the differential pressures. These 5 main features are then used to build the HI for RUL estimation separately for each test.

Figure 11. Differential pressure and feature values for Tests no. 4 & 8.

After the feature extraction, the second step is to build the HI by performing feature fusion with PCA. As a dimensionality reduction method applied to the original feature space, PCA provides orthogonal principal components as linear combinations of the original features, which are considered here as the results of the feature fusion.
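Before the fusion step is described in more detail, the window-based feature extraction of equations 1 to 7 can be illustrated with a short sketch. It is written in Python for illustration and is not the R implementation of the thesis. The function names, the dictionary keys and the choice to start producing features only from the 20th measurement are assumptions. The sketch also assumes that the window never consists only of zeros, since the ratios in equations 5 to 7 would then be undefined.

```python
import math
from statistics import median


def window_features(window):
    """Compute the 5 main features for one window of pressure values."""
    n = len(window)
    rms = math.sqrt(sum(v * v for v in window) / n)   # equation 1
    abs_mean = sum(abs(v) for v in window) / n        # equation 2
    abs_max = max(abs(v) for v in window)             # equation 3
    sd = math.sqrt(sum((v - abs_mean) ** 2 for v in window) / (n - 1))  # equation 4
    return {
        "sd": sd,
        "shape_factor": rms / abs_mean,        # equation 5
        "impulse_factor": abs_max / abs_mean,  # equation 6
        "crest_factor": abs_max / rms,         # equation 7
        "median": median(window),
    }


def rolling_features(values, width=20):
    """Apply window_features to the current and the previous width - 1 values."""
    return [
        window_features(values[i - width + 1 : i + 1])
        for i in range(width - 1, len(values))
    ]
```

For a series of 25 measurements and the default width of 20, `rolling_features` returns 6 feature sets, one for each moment at which a full window is available.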
From the principal components produced by PCA, the one explaining most of the variance of the original feature space is selected and further processed into an HI.

As analyzed in subchapter 3.2, all the tests follow a similar non-linear growth of the differential pressure as time progresses, even though the profiles and growth rates differ. Therefore, the hypothesis is that the feature fusion and HI construction presented here for an individual test is relevant for all other 49 tests within the data set. This hypothesis is tested and validated later in this thesis.

The first step is to extract the principal components from the feature space consisting of the 5 main features presented in the feature extraction section. After the extraction, the results are analyzed by examining what percentage of the total variance each principal component explains and how the components correlate with the main features of the original feature space. Finally, based on the results of the analysis, one principal component is selected and further processed into the HI.

Figure 12. Percentages of variance for each principal component for Test no. 36.

Figure 12 above presents how the total variance of the original feature space is distributed in percentages between the 5 extracted principal components. It can be clearly seen that the first principal component explains the majority of the variance; therefore, it is selected as the raw HI.

Figure 13 below presents the correlation data frame between the main features and the principal components for Test no. 36. The figure shows that the first principal component (PC1) has a higher correlation with the median (1.00) and the standard deviation (0.74) than with the shape, crest and impulse factors.

Figure 13. Correlation data frame for Test no. 36.

Construction of the final HI is done next.
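The extraction of the first principal component and its share of the total variance can be illustrated with the following sketch. It is a plain Python illustration based on power iteration on the covariance matrix and is not the R routine used in the thesis. The function name is hypothetical, and the choice to center the features without scaling them is an assumption. The sketch further assumes that the features are not all constant and that the starting vector is not orthogonal to the first component.

```python
import math


def first_principal_component(rows, iterations=500):
    """Return loadings, scores and explained variance share of the first PC.

    rows: list of observations, each a list of k feature values.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(k))
                     for a in range(k))
    explained = eigenvalue / sum(cov[a][a] for a in range(k))
    # Scores: projection of each centered observation on the loadings.
    scores = [sum(c[j] * v[j] for j in range(k)) for c in centered]
    return v, scores, explained
```

The returned scores correspond to what is called the raw HI in this chapter. The sign of a principal component is arbitrary, so in practice it may have to be flipped to make the raw HI grow with degradation.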
The first principal component extracted in the previous section can be considered a raw HI, since principal component extraction does not guarantee that the values of the principal component are on the same scale as the original differential pressure measurements. The final HI, which is used to predict the total lifetime of the filter, is obtained by scaling the raw HI (first principal component) to a given minimum and maximum. These minimum and maximum values are taken from the original differential pressure measurements. The scaling is mandatory since the original 600 Pa failure threshold is applied to the HI values.

Figure 14 below presents the differential pressure measurements, the HI and the raw HI as a function of time. It can be seen from the figure that the raw HI starts with a negative value but follows a growth pattern similar to that of the processed HI and the differential pressure. The HI follows a pattern similar to the differential pressure, but the curve is smoother and includes less noise.

Figure 14. HIs and differential pressure as a function of time for Test no. 36.

After the construction of the HI is finished, the RUL estimation model is developed. The HI is used as the target variable and time as the input variable with Local regression in order to predict the total filter lifetime. A 2-degree Local regression model is used in this RUL model with α-parameter 0.8, which is the share of the total data points (80% in this case) that defines the width of the moving window when fitting each local regression model with the HI as target variable.

Figure 15. Example of total lifetime prediction for Test no. 36.

Figure 15 above presents an example result of a total lifetime prediction with 95 percent prediction intervals. The measurement data of Test no. 36 covers only 47.04 percent of the total filter lifetime, and with that data the 2-degree Local regression model with α-parameter 0.8 predicts the total filter lifetime to be 68.3 seconds.
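The rescaling of the raw HI and the search for the moment when a fitted curve exceeds the failure threshold can be sketched as follows. The sketch is in Python and is only an illustration under stated assumptions: linear min-max scaling is assumed, `predict` stands for any fitted regression curve, and the function names, the search step of 0.1 seconds and the search horizon are hypothetical.

```python
def rescale_health_indicator(raw_hi, pressure):
    """Min-max scale the raw HI to the range of the measured pressure."""
    p_min, p_max = min(pressure), max(pressure)
    r_min, r_max = min(raw_hi), max(raw_hi)
    return [p_min + (v - r_min) / (r_max - r_min) * (p_max - p_min)
            for v in raw_hi]


def first_crossing(predict, start, threshold=600.0, step=0.1, horizon=1000.0):
    """Return the first time at or after start where predict(t) >= threshold.

    Returns None if the curve stays below the threshold up to the horizon,
    in which case no total lifetime prediction is available.
    """
    i = 0
    while start + i * step <= horizon:
        t = start + i * step
        if predict(t) >= threshold:
            return t
        i += 1
    return None


# Example with a made-up quadratic curve instead of a fitted regression.
total_lifetime = first_crossing(lambda t: 2.0 * t * t, start=10.0)
rul = total_lifetime - 10.0  # RUL at the current time of 10.0 seconds
```

The predicted total lifetime is the crossing time, and the RUL at the current moment is the crossing time minus the current time.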
The true total lifetime of the filter in Test no. 36 is 64.2 seconds, so the percentage prediction error of the 68.3-second prediction is 6.4 percent. The prediction error is calculated with the following equation, presented by Wu et al. (1995):

$$\text{percentage prediction error} = \frac{\text{predicted value} - \text{measured value}}{\text{measured value}} \times 100 \tag{8}$$

where the predicted value is the predicted total filter lifetime and the measured value is the true filter lifetime.

Figure 15 also contains a regression line for a prediction with Local regression using α-parameter 0.6 instead. This regression model is included in order to demonstrate how the prediction differs when the α-parameter is changed. A smaller value of α (0.6 instead of 0.8) results in a narrower moving window for the local regression models, making the regression line more flexible than with a greater α-parameter. In this case the smaller α-parameter gives a worse prediction of 121.5 seconds, which corresponds to a prediction error of 89.3 percent.

The presented RUL estimation approach is tested with each of the 50 filter lifetime tests by simulating and predicting the total filter lifetime at every 10th measurement point. The idea of the simulation is to investigate how the prediction changes during the filter lifetime and whether the prediction improves over time.

The simulation is performed only at every 10th measurement point because all the steps presented earlier in subchapter 3.3 are performed at each of these points. If these steps were performed at every measurement point for all 50 tests, the simulation could take a significant amount of time. In addition, for each test the first prediction is performed at 10 seconds (that is, when 100 measurement points are available). This is based on the hypothesis that fewer than 100 measurement points cannot provide decent predictions.

Figure 16 below shows how the results develop for each iteration (100 measurement points, 110 measurement points, 120 measurement points, …
, 302 measurement points) for Test no. 36. A total of 22 predictions are simulated through the measurement points, and the figure illustrates clearly how the predictions develop through the simulation. The vertical black line in every subplot marks the moment when the differential pressure threshold is exceeded (64.2 seconds for Test no. 36).

Figure 16. Simulation results for each iteration for Test no. 36.

4 Results

This chapter presents the results from the testing of the developed RUL estimation approach and analyzes those results in order to identify the strengths and weaknesses of the proposed approach.

Since the 50 filter lifetime tests in the data differ in their measurement data coverage of the total lifetime (%), as presented in Figure 6, it should be stated that a lower measurement data coverage could result in worse prediction accuracy. Another matter to mention is that the filter lifetimes in this data, which is generated on a test bench to represent a degradation process, are rather short: hundreds of seconds at maximum. Therefore, it is not reasonable to analyze in time units how long before the filter failure the RUL is estimated well. Instead, the analysis is made from the perspective of what percentage of its total lifetime a filter has covered.

In this chapter, one of the key evaluation metrics is the Mean Absolute Percentage Error (MAPE), which is defined with the following equation (Kim & Kim 2016):

$$MAPE = \frac{1}{N}\sum_{t=1}^{N} \left| \frac{A_t - F_t}{A_t} \right| \tag{9}$$

where N is the sample size, A_t refers to the actual indexed value within the sample and F_t refers to the forecasted indexed value within the sample.

4.1 Example results

From the testing and simulation, the following example results are obtained.

Figure 17. Final prediction of total lifetime for Test no. 49.

The simulation of the developed RUL estimation provides a final prediction of the total lifetime using all of the available measurement data for each filter test.
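The two error measures of equations 8 and 9 can be written out as a short Python sketch. The function names are hypothetical and the sketch is not taken from the thesis implementation. Equation 9 is implemented as written, so the result is a fraction that has to be multiplied by 100 to express it in percent.

```python
def percentage_prediction_error(predicted, measured):
    """Equation 8: signed prediction error in percent of the measured value."""
    return (predicted - measured) / measured * 100


def mape(actual, forecast):
    """Equation 9: mean absolute percentage error as a fraction."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Total lifetime predictions for Test no. 36 reported in subchapter 3.3.
error_alpha_08 = percentage_prediction_error(68.3, 64.2)   # about 6.4
error_alpha_06 = percentage_prediction_error(121.5, 64.2)  # about 89.3
```

In the simulation, the actual value in equation 9 is the constant true total lifetime of a filter and the forecasts are the total lifetime predictions of the individual iterations.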
Figure 17 above presents the predicted HI values (‘Prediction_loess_08’) for Test no. 49 after the Local regression model has used all available data (‘Health_indicator’) for model fitting. In this case, when the filter has approximately 12 percent of its total lifetime left, the regression line exceeds the 600 Pa failure threshold when the input variable time reaches 282.9 seconds. The actual total lifetime is 282 seconds, so the predicted total lifetime exceeds the actual value by 0.9 seconds, and the percentage prediction error, calculated according to equation 8, is 0.32%. Furthermore, the actual and predicted RUL can be calculated at the moment when 88 percent of the total filter lifetime has passed. Since the last measurement data point is from 248.2 seconds, the total lifetime is 282 seconds and the predicted total lifetime is 282.9 seconds, the actual RUL is 33.8 seconds and the predicted RUL is 34.7 seconds at the moment when the filter has 12% of its total lifetime left.

Also, as already presented in subchapter 3.3 and Figure 16, a total lifetime prediction is produced at multiple moments during the lifetime of each filter. Figure 18 below presents the predicted regression line for each iteration of the simulation for Test no. 27. The figure indicates that in the first iterations (100-250, with exceptions such as 170 and 180) the regression line does not start to grow non-linearly as expected. As a result, no total lifetime prediction is obtained for those iterations, since the regression line never exceeds the failure threshold. The reason for this phenomenon is that the measurement data has not yet started to increase non-linearly, which is reflected in the constructed HI and makes it challenging for Local regression to detect the non-linear growth and to predict when the failure threshold is exceeded. In later iterations, when more measurement data is used in the regression fitting, the non-linear growth is detected and a total lifetime prediction is obtained.
Figure 18. Simulation results for each iteration for Test no. 27.

MAPE, which is defined in equation 9, is calculated for each filter test separately in order to compare how the tests perform with the presented RUL estimation approach. MAPE is calculated using the actual total lifetime (a constant value for each filter) and the predictions from each iteration in which a prediction is obtained, in other words, in which the regression line exceeds the failure threshold.

Figure 19. Actual and predicted total lifetimes for Test no. 49.

Figure 19 above presents how the total lifetime prediction develops during the simulation for Test no. 49. The figure shows how the predicted total lifetime evolves as a function of elapsed lifetime (%), which indicates how much of the total lifetime of the filter has passed and how much measurement data from the total lifetime is used for regression model fitting. The figure clearly indicates that the prediction error is larger at the beginning of the lifecycle, since there is not enough data for the regression model to provide an accurate prediction. In later iterations, the prediction error decreases significantly as more measurement data becomes available. It is also worth noting that the MAPE for this test is approximately 33%. Since MAPE is the mean of the absolute percentage errors between actual and predicted values, it can be deemed that in this case the errors of the predictions calculated in the early phases of the lifetime contribute significantly to the relatively high MAPE value.

When the predicted and actual total lifetime values are known for various moments during the filter lifecycle, the RUL can also be calculated for these moments. Figure 20 below shows how the actual and predicted RUL values evolve as a function of elapsed lifetime (%).

Figure 20. Actual and predicted RUL for Test no. 49.

4.2 Analysis of all tests

Figure 21. Calculated MAPE and data coverage for all tests.
Figure 21 above shows how the calculated MAPEs and the measurement data coverage of the total lifetime depend on each other. A slight dependency can be seen: the larger the share of its total lifetime a test has measurement data for, the better (lower) its MAPE is. As analyzed in the previous subchapter based on Figure 19, the MAPE value of a single test (49) is affected by the worse predictions occurring in the early stages of its lifecycle. In Figure 22 below, MAPEs are calculated only for tests which have measurement data coverage of at least 60% of their total lifetime. Additionally, predicted and actual values are included in the MAPE calculations only from the moment when at least 60% of the total lifetime has passed, meaning that, for instance, the worse predictions which affected the MAPE of Test no. 49 are left out.

Figure 22 Calculated MAPE and data coverage for tests with data coverage >= 60%.

When comparing Figures 21 & 22, it can be seen that the MAPE values decrease for the tests with data coverage of at least 60% of total lifetime when the predictions from the early stages of the lifecycle are left out (only predictions made when at least 60% of the total lifetime has passed are included). This confirms the phenomenon seen in Figure 19: the developed RUL estimation approach provides worse predictions in the earlier stages of the filter lifecycle than in the later stages. This phenomenon is further confirmed in Figure 23 below, where the prediction error of each test’s final total lifetime prediction is plotted as a function of measurement data coverage of total lifetime. The plot clearly shows that larger measurement data coverage of total lifetime is associated with a smaller prediction error for the last prediction.

Figure 23 Prediction error of last total lifetime prediction for all tests.
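The MAPE comparison discussed in this subchapter can be sketched as follows. This is an illustrative Python sketch, not the R implementation used in the thesis. The function name `mape`, its parameters and the example values are hypothetical, and the standard MAPE definition is assumed to match equation 9. Iterations without a prediction are skipped, and the optional `min_elapsed` argument reproduces the restriction to predictions made after at least 60% of the total lifetime has passed.

```python
from typing import Optional, Sequence


def mape(actual_total_lifetime: float,
         predictions: Sequence[Optional[float]],
         elapsed_fractions: Sequence[float],
         min_elapsed: float = 0.0) -> Optional[float]:
    """Mean absolute percentage error of total lifetime predictions.

    predictions[i] is the predicted total lifetime at iteration i, or None
    when the regression line never exceeded the failure threshold.
    elapsed_fractions[i] is the share of the total lifetime (0..1) that had
    passed at iteration i. Only iterations with a prediction and with
    elapsed fraction >= min_elapsed are included.
    """
    if len(predictions) != len(elapsed_fractions):
        raise ValueError("predictions and elapsed_fractions must align")
    errors = [
        abs(pred - actual_total_lifetime) / actual_total_lifetime * 100.0
        for pred, frac in zip(predictions, elapsed_fractions)
        if pred is not None and frac >= min_elapsed
    ]
    if not errors:
        return None  # no usable prediction for this test
    return sum(errors) / len(errors)


# Hypothetical values for one filter test (not taken from the thesis data).
actual = 200.0
preds = [None, 300.0, 250.0, 200.0]
elapsed = [0.2, 0.4, 0.6, 0.8]

print(mape(actual, preds, elapsed))                    # all received predictions
print(mape(actual, preds, elapsed, min_elapsed=0.6))   # late predictions only
```

In this made-up example the early prediction of 300 seconds dominates the first value, and dropping it lowers the MAPE, which is the same effect that is described for Figures 21 and 22.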
Figure 24 below presents how the predicted RUL develops compared to the actual RUL over the filter lifetime for tests having at least 60% data coverage of their total lifetime. This figure again shows that the developed RUL estimation approach produces accurate predictions after the initial stages of lifetime for the majority of the tests having measurement data coverage of at least 60%.

Figure 24 Actual and predicted RUL for tests with data coverage >= 60%.

4.3 Analysis of individual tests

Since the analysis of the results has already shown that the developed RUL estimation approach does not predict the total lifetime accurately when the filter is in the early stages of its lifetime, this subchapter focuses on individual tests where the measurement data coverage is at least 60%. One poorly performing and one well performing test are selected based on Figures 22 & 24.

Test no. 43 is selected as the poorly performing test, since its MAPE is approximately 30,65% according to Figure 22 and, according to Figure 24, its predicted RUL deviates increasingly from the actual RUL as the filter lifetime passes. The reason why the RUL approach does not work properly with this test might lie in the behavior of the measurement data: despite the non-linear behavior at the start of the lifetime, the data seems to be quite linear afterwards, which causes a poor prediction. Later observations for this test, which are not present in the data, could show faster non-linear growth of the differential pressure, which would cause the failure earlier than the regression model now predicts. Test no. 43 is presented in Figure 25 below.

Figure 25 Final prediction of total lifetime for Test no. 43.

Test no. 35 is selected as the well performing test, since its MAPE is approximately 1,47% according to Figure 22 and, according to Figure 24, its predicted RUL approaches the actual RUL as the filter lifetime passes.
The reason why the RUL approach works properly with this test might be that the test follows the non-linear growth trend which was hypothesized for the differential pressure. Test no. 35 is presented in Figure 26 below.

Figure 26 Final prediction of total lifetime for Test no. 35.

5 Conclusions and discussion

This chapter consists of conclusions based on the previous chapters. Firstly, conclusions are drawn on how the research question and objective were met. Secondly, the limitations of the research are discussed, and thirdly, recommendations for future research are given.

5.1 Research question and objective

The research question of this thesis was ‘How data measured from equipment/component could be utilized to identify optimal and propitious time windows for maintenance for the given equipment/component?’ The objective was to develop and demonstrate a RUL estimation approach with data collected from a test bench simulation representing a degradation process which is relevant in PHM applications. The demonstrated application was intended to be scalable to all processes that follow a similar degradation from a data perspective. The RUL estimation approach was developed by following the RUL estimation process framework presented in the related literature. In addition, the related literature was studied further in order to see how the different RUL estimation process steps were implemented in various studies.

As the results in chapter 4 show, the RUL of the filters can be predicted accurately, with low prediction errors, in the majority of the cases when enough of the filter lifetime has passed. The earlier stages of the lifetime are challenging from the RUL prediction point of view, since there might not be enough measurement data for the prediction algorithm to detect the non-linear growth of the measurement data, which is crucial for accurate predictions.
Also, the data used contained filter tests which had insufficient measurement data coverage of their total lifetime, making it challenging for the developed RUL approach to predict the total lifetime accurately.

The developed RUL estimation approach showed that, since the RUL can be predicted accurately from measurement data when enough lifetime has passed, RUL estimation can be used to support the planning of optimal maintenance activities for the given equipment/component.

5.2 Research limitations

The RUL estimation approach developed and presented in this thesis is limited to RUL estimation of components which follow a similar degradation process as the filters whose data was used in this thesis. Also, the approach is only applicable to equipment whose failure depends on a single component, which in this case was the filter whose RUL was estimated.

This thesis focused on estimating the RUL of the given component, which produces valuable information to support the planning of optimal maintenance activities. However, the actual maintenance planning and decision making related to maintenance windows based on RUL predictions was not included in this thesis. Finally, the developed RUL estimation approach is limited to components which operate in constant conditions during their complete lifetime.

Regarding the validity and reliability of the thesis, the data set used in this thesis was generated in a test bench with sensors and proper equipment for controlling operating conditions. The data was generated to represent a degradation process which is relevant in PHM applications. Furthermore, the data includes multiple filter lifetime tests. These facts indicate that the data used was reliable for this research. However, the data coverage of total lifetime differs between lifetime tests, causing some unreliability in the research.
The selected model, Local regression, performed well when evaluated with metrics such as MAPE and prediction error, indicating that model performance was accurate and reliable in this research when enough filter lifetime had passed. Furthermore, the regression problem solved in this thesis involved only one input variable and one output variable, meaning that model performance and behavior were easy to understand and justify.

5.3 Recommendations for future research

As mentioned previously, the data set used contained data for 50 filters which have differing measurement data coverage of their total lifetime. For future work, it would be more consistent to use data where all the tests have measurement data from the whole total lifetime, in order to support further development and testing.

Regarding the proposed RUL estimation approach, there are many steps that could be improved. The feature extraction step now includes the extraction of only 5 main features, and additional features should be extracted in order to produce more information. Additionally, different HI techniques should be considered and tested in order to construct a better and more informative HI. Also, different regression models for the actual total lifetime prediction should be considered. Finally, the main issue of the presented RUL estimation approach is that it produces predictions with low accuracy in the early stages of component lifetime. This is a challenging issue to solve, but addressing it is crucial for future research.

References

Achouch, M., Dimitrova, M., Ziane, K., Sattarpanah Karganroudi, S., Dhouib, R., Ibrahim, H. & Adda, M. (2022). On Predictive Maintenance in Industry 4.0: Overview, Models and Challenges. Applied sciences, 12(16), Article 8081. https://doi.org/10.3390/app12168081
Alpaydin, E. (2020). Introduction to Machine Learning (4th edition). MIT Press. ISBN: 978-0-262-35806-4.
Boobier, T.
(2018). Advanced Analytics and AI: Impact, Implementation and the Future of Work (1st edition). John Wiley & Sons, Incorporated. ISBN: 978-1-119-39030-5.
Chehade, A., Bonk, S. & Liu, K. (2017). Sensory-Based Failure Threshold Estimation for Remaining Useful Life Prediction. IEEE Transactions on Reliability, 66(3), p. 939-949. https://doi.org/10.1109/TR.2017.2695119
Cheng, Y., Hu, K., Wu, J., Zhu, H. & Shao, X. (2021). A convolutional neural network based degradation indicator construction and health prognosis using bidirectional long short-term memory network for rolling bearings. Advanced Engineering Informatics, 48, Article 101247. https://doi.org/10.1016/j.aei.2021.101247
Dalzochio, J., Kunst, R., Pignaton, E., Binotto, A., Sanyal, S., Favilla, J. & Barbosa, J. (2020). Machine learning and reasoning for predictive maintenance in Industry 4.0: Current status and challenges. Computers in industry, 123, Article 103298. https://doi.org/10.1016/j.compind.2020.103298
Das, O., Bagci Das, D. & Birant, D. (2023). Machine learning for fault analysis in rotating machinery: A comprehensive review. Heliyon, 9(6), Article e17584. https://doi.org/10.1016/j.heliyon.2023.e17584
Dong, S. & Luo, T. (2013). Bearing degradation process prediction based on the PCA and optimized LS-SVM model. Measurement, 46(9), p. 3143-3152. https://doi.org/10.1016/j.measurement.2013.06.038
Ferreira, C. & Goncalves, G. (2022). Remaining Useful Life prediction and challenges: A literature review on the use of Machine Learning Methods. Journal of manufacturing systems, 63, p. 550-562. https://doi.org/10.1016/j.jmsy.2022.05.010
Hagmeyer, S., Mauthe, F. & Zeiler, P. (2021).
Creation of Publicly Available Data Sets for Prognostics and Diagnostics Addressing Data Scenarios Relevant to Industrial Applications. International Journal of Prognostics and Health Management, 12(2). https://doi.org/10.36001/IJPHM.2021.v12i2.3087
Hagmeyer, S., Mauthe, F. & Zeiler, P. (2024). Preventive to Predictive Maintenance (8), [Data set], Kaggle. https://doi.org/10.34740/KAGGLE/DSV/8684322
Huang, C., Bu, S., Lee, H.H., Chan, C.H., Kong, S.W. & Yung, W.K.C. (2024). Prognostics and health management for predictive maintenance: A review. Journal of manufacturing systems, 75, p. 78-101. https://doi.org/10.1016/j.jmsy.2024.05.021
Jacoby, W.G. (2000). Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies, 19(4), p. 577-613. https://doi.org/10.1016/S0261-3794(99)00028-1
Jin, X., Weiss, B.A., Siegel, D. & Lee, J. (2016). Present Status and Future Growth of Advanced Maintenance Technology and Strategy in US Manufacturing. International journal of prognostics and health management, 7(3), Article 12. https://doi.org/10.36001/ijphm.2016.v7i3.2409
Li, X., Jia, X., Wang, Y., Yang, S., Zhao, H. & Lee, J. (2020). Industrial Remaining Useful Life Prediction by Partial Observation Using Deep Learning With Supervised Attention. IEEE/ASME Transactions on Mechatronics, 25(5), p. 2241-2251. https://doi.org/10.1109/TMECH.2020.2992331
Li, Y., Liu, K., Foley, A.M., Zulke, A., Berecibar, M., Nanini-Maury, E., Van Mierlo, J. & Hoster, H.E. (2019). Data-driven estimation and lifetime prediction of lithium-ion batteries: A review. Renewable and Sustainable Energy Reviews, 113, Article 109254. https://doi.org/10.1016/j.rser.2019.109254
Liu, C., Zhang, L., Niu, J., Yao, R. & Wu, C. (2020). Intelligent prognostics of machining tools based on adaptive variational mode decomposition and deep learning method with attention mechanism. Neurocomputing, 417, p. 239-254.
https://doi.org/10.1016/j.neucom.2020.06.116
Keleko, A.T., Kamsu-Foguem, B., Ngouna, R.H. & Tongne, A. (2022). Artificial intelligence and real-time predictive maintenance in industry 4.0: A bibliometric analysis. AI Ethics, 2(8), p. 553-577. https://doi.org/10.1007/s43681-021-00132-6
Kim, S. & Kim, H. (2016). A new metric of absolute percentage error for intermittent demand forecasts. International Journal of Forecasting, 32(3), p. 669-679. https://doi.org/10.1016/j.ijforecast.2015.12.003
Mafata, M., Brand, J., Kidd, M., Medvedovici, A. & Buica, A. (2022). Exploration of Data Fusion Strategies Using Principal Component Analysis and Multiple Factor Analysis. Beverages (Basel), 8(4), Article 66. https://doi.org/10.3390/beverages8040066
Maktoubian, J., Taskhiri, M.S. & Turner, P. (2021). Intelligent Predictive Maintenance (IPdM) in Forestry: A Review of Challenges and Opportunities. Forests, 12(11), Article 1495. https://doi.org/10.3390/f12111495
Nunes, P., Santos, J. & Rocha, E. (2023). Challenges in predictive maintenance – A review. CIRP journal of manufacturing science and technology, 40, p. 53-67. https://doi.org/10.1016/j.cirpj.2022.11.004
Paleyes, A., Urma, R.G. & Lawrence, N.D. (2022). Challenges in Deploying Machine Learning: A Survey of Case Studies. ACM Computing Surveys, 55(6), p. 1-29. https://doi.org/10.1145/3533378
Patel, M., Vasa, J. & Patel, B. (2023). Predictive Maintenance: A Comprehensive Analysis and Future Outlook. 2023 2nd International Conference on Futuristic Technologies. https://doi.org/10.1109/INCOFT60753.2023.10425122
Pei, X., Gao, L. & Li, X. (2024).
Remaining useful life prediction of machinery based on performance evaluation and online cross-domain health indicator under unknown working conditions. Journal of Manufacturing Systems, 75, p. 213-227. https://doi.org/10.1016/j.jmsy.2024.06.005
Posit Team. (2024). RStudio: Integrated Development Environment for R. Posit Software, PBC. Retrieved April 19, 2025, from https://posit.co/products/open-source/rstudio/
R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Retrieved April 19, 2025, from https://www.r-project.org/about.html
Stetco, A., Dinmohammadi, F., Zhao, X., Robu, V., Flynn, D., Barnes, M., Keane, J. & Nenadic, G. (2019). Machine learning methods for wind turbine condition monitoring: A review. Renewable Energy, 133, p. 630-635. https://doi.org/10.1016/j.renene.2018.10.047
Sun, M., Guo, K., Zhang, D., Yang, B., Sun, J., Li, D. & Huang, T. (2024). A novel exponential model for tool remaining useful life prediction. Journal of Manufacturing Systems, 73, p. 223-240. https://doi.org/10.1016/j.jmsy.2024.01.009
Wang, B., Lei, Y., Li, N. & Yan, T. (2019). Deep separable convolutional network for remaining useful life prediction of machinery. Mechanical Systems and Signal Processing, 134, Article 106330. https://doi.org/10.1016/j.ymssp.2019.106330
Wang, Q., Zheng, S., Farahat, A., Serita, S. & Gupta, C. (2019). Remaining Useful Life Estimation Using Functional Data Analysis.
2019 IEEE International Conference of Prognostics and Health Management (ICPHM). https://doi.org/10.1109/ICPHM.2019.8819420
Wang, R., Chen, H. & Guan, C. (2021). A Bayesian inference-based approach for performance prognostics towards uncertainty quantification and its applications on the marine diesel engine. ISA Transactions, 118, p. 159-173. https://doi.org/10.1016/j.isatra.2021.02.024
Wang, X., Zheng, Y., Zhao, Z. & Wang, J. (2015). Bearing Fault Diagnosis Based on Statistical Locally Linear Embedding. Sensors, 15(7), p. 16225-16247. https://doi.org/10.3390/s150716225
Wang, Y., Deng, L., Zheng, L. & Gao, R.X. (2021). Temporal convolutional network with soft thresholding and attention mechanism for machinery prognostics. Journal of Manufacturing Systems, 60, p. 512-526. https://doi.org/10.1016/j.jmsy.2021.07.008
Wellsandt, S., Klein, K., Hribernik, K., Lewandowski, M., Bousdekis, A., Mentzas, G. & Thoben, K.D. (2022). Hybrid-augmented intelligence in predictive maintenance with digital intelligent assistants. Annual reviews in control, 53, p. 382-390. https://doi.org/10.1016/j.arcontrol.2022.04.001
Wen, Y., Fashiar Raman, Md., Xu, H. & Tseng, T-L.B. (2022). Recent advances and trends of predictive maintenance from data-driven machine prognostics perspective. Measurement: journal of the International Measurement Confederation, 187, Article 110276. https://doi.org/10.1016/j.measurement.2021.110276
Wu, G., Baraldo, M. & Furlanut, M. (1995). Calculating percentage prediction error: A user’s note. Pharmacological Research, 32(4), p. 241-248.
https://doi.org/10.1016/S1043-6618(05)80029-5
Yen, M., Xie, L., Muhammad, I., Yang, X. & Liu, Y. (2022). An effective method for remaining useful life estimation of bearings with elbow point detection and adaptive regression models. ISA Transactions, 128, p. 290-300. https://doi.org/10.1016/j.isatra.2021.10.031
Zuo, L., Zhang, L., Zhang, Z-H., Luo, X-L. & Liu, Y. (2021). A spiking neural network-based approach to bearing fault diagnosis. Journal of Manufacturing Systems, 61, p. 712-724. https://doi.org/10.1016/j.jmsy.2020.07.003