Machine learning NLP-based recommendation system on production issues
Bi, Xiaotian (2023-05-19)
The permanent address of this publication is
https://urn.fi/URN:NBN:fi-fe2023051945568
Abstract
Natural Language Processing (NLP) techniques for information extraction are increasingly popular in media, e-commerce, and online games. However, the application of such techniques to production quality control in the manufacturing industry is yet to be established.
The goal of this research is to build a recommendation system based on textual descriptions of production issues. The data was extracted from a manufacturing control system, where it has been collected in Finnish over several years at a reasonably large scale. Five different NLP methods (TF-IDF, Word2Vec, spaCy, Sentence Transformers and SBERT) are used for modelling, converting human-written digital text into numerical feature vectors. The most relevant issue cases are retrieved by calculating the cosine distance between the query sentence vector and the corpus embedding matrix, which represents the whole dataset. The Turku NLP-based Sentence Transformer achieves the best result, with a Mean Average Precision @10 of 0.67, suggesting that the initial dataset is large enough for deep learning algorithms to compete with traditional machine learning methods. Even though a categorical variable was chosen as the target variable for computing evaluation metrics, this research is not a classification problem in which a single variable is used for model training. Additionally, the metric selected for performance evaluation is computed for every issue case. Therefore, it is not necessary to balance or split the dataset.
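The retrieval and evaluation steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the toy English corpus, the category labels, the smoothed IDF formula, and all function names are assumptions made for illustration, and only the simplest of the five methods (TF-IDF) is shown. Average precision has several variants; this sketch normalises by the number of relevant cases found in the top k, which is also an assumption.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenised document."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed variant): terms found in every document keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors; cosine distance is 1 minus this value."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(query_index, vectors, k=10):
    """Return indices of the k cases most similar to the query case (query excluded)."""
    query = vectors[query_index]
    scored = [(cosine_similarity(query, v), i)
              for i, v in enumerate(vectors) if i != query_index]
    # Highest similarity first (lowest cosine distance); ties broken by index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def average_precision_at_k(relevance, k=10):
    """AP@k over a ranked list of booleans, normalised by the hits found in the top k."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision_at_k(vectors, labels, k=10):
    """MAP@k with every case used once as the query; no train/test split is involved."""
    scores = []
    for i in range(len(vectors)):
        ranked = recommend(i, vectors, k)
        scores.append(average_precision_at_k([labels[j] == labels[i] for j in ranked], k))
    return sum(scores) / len(scores)


# Hypothetical toy data standing in for the Finnish issue descriptions.
corpus = [
    "hydraulic pump leaking oil",
    "oil leaking from hydraulic pump seal",
    "paint surface scratched",
    "scratched paint on cover surface",
]
labels = ["hydraulics", "hydraulics", "paint", "paint"]  # assumed categorical target

vectors = tfidf_vectors([text.split() for text in corpus])
top_match = recommend(0, vectors, k=1)
map_score = mean_average_precision_at_k(vectors, labels, k=3)
print(top_match, map_score)
```

Because each case serves as a query against the rest of the corpus and the label is used only to judge relevance, the metric is computed per issue case, which is why the abstract argues that balancing and splitting the dataset are unnecessary. Replacing `tfidf_vectors` with dense sentence embeddings would leave the ranking and evaluation logic unchanged, apart from adapting `cosine_similarity` to dense vectors.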
This research achieves a relatively good result with less data than is typically used in other business domains. The recommendation system can be improved by feeding it more data and by implementing online testing. It could also be extended to collaborative filtering, which finds patterns among users instead of focusing only on items, provided that comprehensive user information is available.