Machine learning NLP-based recommendation system on production issues

annif.suggestionsmachine learning|NLP|recommender systems|deep learning|data systems|artificial intelligence|quality control|data mining|neural networks (information technology)|information retrieval|enen
annif.suggestions.linkshttp://www.yso.fi/onto/yso/p21846|http://www.yso.fi/onto/yso/p14701|http://www.yso.fi/onto/yso/p28483|http://www.yso.fi/onto/yso/p39324|http://www.yso.fi/onto/yso/p3927|http://www.yso.fi/onto/yso/p2616|http://www.yso.fi/onto/yso/p2720|http://www.yso.fi/onto/yso/p5520|http://www.yso.fi/onto/yso/p7292|http://www.yso.fi/onto/yso/p2964en
dc.contributor.authorBi, Xiaotian
dc.contributor.facultyfi=Tekniikan ja innovaatiojohtamisen yksikkö|en=School of Technology and Innovations|-
dc.contributor.organizationfi=Vaasan yliopisto|en=University of Vaasa|
dc.date.accessioned2023-06-06T12:15:45Z
dc.date.accessioned2025-06-25T16:48:46Z
dc.date.available2023-06-06T12:15:45Z
dc.date.issued2023-05-19
dc.description.abstractNatural Language Processing (NLP) techniques for information extraction are increasingly popular in media, e-commerce, and online games. However, their application to production quality control in the manufacturing industry is yet to be established. The goal of this research is to build a recommendation system based on textual descriptions of production issues. The data was extracted from a manufacturing control system, where it has been collected in Finnish at a relatively good scale for years. Five NLP methods (TF-IDF, Word2Vec, spaCy, Sentence Transformers and SBERT) are used for modelling, converting human-written digital texts into numerical feature vectors. The most relevant issue cases are retrieved by calculating the cosine distance between the query sentence vector and the corpus embedding matrix, which represents the whole dataset. The Turku NLP-based Sentence Transformer achieves the best result, with a Mean Average Precision @10 of 0.67, suggesting that the initial dataset is large enough for deep learning algorithms to compete with machine learning methods. Even though a categorical variable was chosen as the target variable to compute evaluation metrics, this research is not a classification problem with a single variable for model training. Additionally, the metric selected for performance evaluation is measured for every issue case. Therefore, it is not necessary to balance and split the dataset. This research achieves a relatively good result with less data than is used in other businesses. The recommendation system can be optimized by feeding it more data and implementing online testing. It could also be extended to collaborative filtering to find patterns among users instead of simply focusing on items, provided that comprehensive user information is included.-
dc.format.bitstreamtrue
dc.format.contentfi=kokoteksti|en=fulltext|-
dc.format.extent77-
dc.identifier.olddbid18564
dc.identifier.oldhandle10024/15926
dc.identifier.urihttps://osuva.uwasa.fi/handle/11111/10201
dc.identifier.urnURN:NBN:fi-fe2023051945568-
dc.language.isoeng-
dc.rightsCC BY-ND 4.0-
dc.source.identifierhttps://osuva.uwasa.fi/handle/10024/15926
dc.subject.degreeprogrammeMaster's Programme in Industrial Systems Analytics-
dc.subject.disciplinefi=Informaatiotekniikka|en=Information Technology-
dc.subject.ysomachine learning-
dc.subject.ysoNLP-
dc.subject.ysorecommender systems-
dc.subject.ysodeep learning-
dc.subject.ysoartificial intelligence-
dc.subject.ysoquality control-
dc.subject.ysodata mining-
dc.subject.ysoneural networks (information technology)-
dc.subject.ysoinformation retrieval-
dc.titleMachine learning NLP-based recommendation system on production issues-
dc.type.ontasotfi=Pro gradu -tutkielma|en=Master's thesis|sv=Pro gradu -avhandling|-
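
The retrieval step described in the abstract (turn issue texts into vectors, rank the corpus by cosine similarity to the query, then score the ranking with average precision at k) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the English toy issue descriptions, the category labels, the whitespace tokenizer, the smoothed IDF formula, and the helper names `build_tfidf`, `recommend` and `average_precision_at_k`. It covers only the TF-IDF variant, using the Python standard library; ranking by highest cosine similarity is equivalent to ranking by lowest cosine distance.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real Finnish text would need more care).
    return text.lower().split()


def build_tfidf(corpus):
    """Return (idf, vectors): smoothed IDF weights and one sparse TF-IDF dict per document."""
    docs = [Counter(tokenize(d)) for d in corpus]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(query, idf, vectors, k=10):
    """Return indices of the k corpus entries most similar to the query."""
    counts = Counter(tokenize(query))
    q = {t: tf * idf[t] for t, tf in counts.items() if t in idf}  # drop unseen terms
    scored = sorted(
        ((cosine(q, v), i) for i, v in enumerate(vectors)),
        key=lambda s: (-s[0], s[1]),
    )
    return [i for _, i in scored[:k]]


def average_precision_at_k(ranked_labels, target, k=10):
    """AP@k, normalized by the number of relevant items found in the top k
    (one common variant; other normalizations exist)."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label == target:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy data: issue descriptions and a categorical label used only for scoring.
corpus = [
    "paint scratch on cover panel",
    "bolt missing from motor housing",
    "scratch found on front panel",
    "wrong cable connected to motor",
]
labels = ["surface", "assembly", "surface", "assembly"]

idf, vectors = build_tfidf(corpus)
top = recommend("scratch on panel", idf, vectors, k=2)
ap = average_precision_at_k([labels[i] for i in top], "surface", k=2)
print(top, ap)
```

Averaging `average_precision_at_k` over a set of query cases would give a MAP@k figure of the kind reported in the abstract. Swapping `build_tfidf` for a sentence-embedding model would leave the ranking and scoring steps unchanged.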

Files

Name:
Thesis_Xiaotian_Bi.pdf
Size:
2.4 MB
Format:
Adobe Portable Document Format
Description:
Machine learning NLP-based recommendation system on production issues