UNIVERSITY OF VAASA
FACULTY OF TECHNOLOGY
COMMUNICATIONS AND SYSTEMS ENGINEERING

Alabi Rasheed Omobolaji

PREDICTION OF RECURRENCE AND MORTALITY OF ORAL TONGUE CANCER USING ARTIFICIAL NEURAL NETWORK
(A case study of 5 hospitals in Finland and 1 hospital from Sao Paulo, Brazil)

Master’s thesis for the degree of Master of Science in Technology submitted for inspection, Vaasa, 12.08.2017

Thesis Instructor: Dr Alhadi Almangush
Thesis Supervisor: Professor Mohammed Elmusrati

ACKNOWLEDGMENT

In the name of Allaah, the Most Beneficent, the Most Merciful. All thanks to Allaah, and I beseech His peace and benedictions on the noblest of mankind, Muhammad, peace and blessings of Allaah be upon him, and on the generality of Muslims till the day of accountability.

First and foremost, I would like to express my profound appreciation to my supervisor, Professor Mohammed Elmusrati, for his guidance throughout the development of this work and throughout my Master’s program. You are indeed a role model. I have gained from you both academic knowledge and a unique approach to life and situations. It is a rare opportunity to work with you and I really appreciate it. Without an iota of doubt, you have left a positive mark on my life and I will always remain grateful. I am very proud to be your student.

Similarly, I sincerely appreciate the efforts, contributions and continuous monitoring of the progress of the work by my instructor, Dr Alhadi Almangush. Working with you was a valuable learning experience. The professionalism and maturity shown during the course of this work were well appreciated. Most importantly, I appreciate the opportunity given to me to work with the dataset of the cancer patients. I thank you for the guidance throughout this work. I will forever be grateful to you.

Furthermore, my deepest appreciation goes to my beloved mother for her unconditional love and unending support right from my primary school days.
My mother is indeed the best teacher I have ever known. Talking to her alone gives me the joy and happiness needed to continue with my day-to-day activities.

The whole of my Master’s program (MSc) and this thesis are specially and lovingly dedicated to my wonderful wife, Ummu Mu'adh - Atunrase Mistura. She encouraged me to pursue this MSc program. Her emotional support was vital, given the long weekly separation and discomfort we both endured to accomplish this feat. It is not easy travelling Helsinki-Vaasa-Helsinki on a weekly basis. She believed in me more than I believed in myself, and that gave me the courage to persevere whenever I hit roadblocks. I love you so much.

To my lovely son, Mu'adh Adebayo, may Allaah bless you. Thinking about you alone is enough for me to be happy. I love you so much. It was not easy to leave you in Helsinki on a weekly basis while I was busy in Vaasa with my studies. I sincerely appreciate the understanding. I pray to Allaah to make you a scholar of high repute.

I am thankful to Professor Timo Mantere, Tobias Glocker, Ahmed Elgrgouri, Dr Ali Altowati and Shaima AbdulMageed for their immense contribution in deepening my knowledge through the various courses taught in this master's degree program. This acknowledgement would not be complete without showing appreciation to my brother and friend, AbdulRahman Olaobaju, for his understanding and the numerous tutorials that made sure I kept up with my academics.

To all the members of academic and non-academic staff of the Faculty of Technology, Communications and Systems Engineering Group, especially Marjukka Isaksen, I say a big thank you for their contribution to the success of this program. My classmates must also be acknowledged for their support and positive contribution during the course work.
Vaasa, Finland, August 2017
Alabi Rasheed Omobolaji

CONTENTS

ACKNOWLEDGEMENT
CONTENTS
SYMBOLS
ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1 INTRODUCTION
1.1 The Dataset
1.2 Motivation
1.3 Thesis Structure
2 FUNDAMENTALS OF ARTIFICIAL NEURAL NETWORKS
2.1 Artificial Neural Network
2.2 Classification of ANN
2.3 Training of ANN
2.4 Training Algorithm
2.5 Advantages and Disadvantages of ANN
3 APPLICATION OF ANN IN MEDICINE
3.1 Artificial Neural Networks in Medicine
3.2 Types of Neural Network used in the thesis
3.2.1 Feedforward Neural Network (feedforwardnet)
3.2.2 Elman Neural Network (ENN) (elmannet)
3.2.3 Timedelay Neural Network (timedelaynet)
3.3 Layer Recurrent Neural Network (LRNN)
3.3.1 Fully Recursive Neural Network
3.3.2 Hopfield Neural Network
3.3.3 Recursive Neural Network
3.4 Nonlinear Autoregressive Neural Network (NARNET)
3.5 Nonlinear Autoregressive Neural Network with External Input (NARXNET)
4 SIMULATION OF FIXED AND DYNAMIC DATASETS
4.1 Simulation exercise with Feedforward Neural Network
4.2 Simulation using Elman Neural Network
4.3 Timedelay Neural Network Exercise
4.4 Layer Recurrent Neural Network
4.5 Nonlinear Autoregressive Neural Network
4.6 Nonlinear Autoregressive Neural Network with External Input (NARXNET)
5 ANN CASE STUDY SIMULATION OF TONGUE CANCER
5.1 Definition of SCC related terms
5.1.1 Tumour budding
5.1.2 Tumour Size, Prognosis and Metastasis
5.1.3 Depth of invasion
5.1.4 Symptoms
5.1.5 Pathological Stage
5.2 ANN in clinical Prognostication
5.3 Data collection and training process
5.4 Neural Network for predictions
5.4.1 Prediction of recurrence from feedforward network
5.4.2 Prediction of statuslatest for feedforward network
5.5 Prediction of statuslatest for feedforward
5.5.1 Prediction of mortality from feedforward network
5.6 Elman neural network for the prediction of recurrence and mortality
5.6.1 ENN for recurrence prediction
5.7 Layer Recurrent Neural Network for the prediction
5.7.1 Prediction of Recurrence of Tongue cancer using Layer Recurrent Neural Network
5.7.2 Prediction of Mortality of Tongue cancer using Layer Recurrent Neural Network
5.8 Prediction of mortality using LRNN
5.9 Analyses of the dependency of variables
5.9.1 Analyses of variables dependencies for recurrence
5.9.2 Verification of the newly proposed dependent markers
5.9.3 Variables dependencies on the prediction of mortality
5.9.4 Using the new markers to predict mortality
5.10 Sigmoid function on the output layer
5.10.1 Prediction of Recurrence based on sigmoidal function output
5.10.2 Prediction of Mortality based on sigmoidal function output
6 CONCLUSION AND FUTURE WORK
REFERENCES
APPENDIXES

SYMBOLS

ε(t)    Error of approximation of the series
θ_i    Threshold of unit i
a_1, ..., a_n    Inputs
b_j    Thresholds
D_i1, ..., D_iN    Delays
f(x)    Sigmoid function
f'(x)    Sigmoid function for backpropagation
h()    Previous values
I_1(t), ..., I_M(t)    Inputs
N_1, ..., N_M    Number of delays
n    Number of data samples
net    Neural network
O_j    Output
S_j    State of unit j
T1    Tumour size one
T2    Tumour size two
trainFcn    Training function
u    Weights
v    Weights
W_nj    Weights
x(k)    Outputs of the hidden layer
x_c(k)    Outputs of the context layer
ŷ    Approximated data obtained by the network for value y_i
y_i    i-th data sample
y_j(t)    Output of hidden layer
y_k(t)    Final output
y(t)    Data series for prediction
y(t-1), ..., y(t-p)    Past values / feedback delays

ABBREVIATIONS

AI    Artificial Intelligence
ANN    Artificial Neural Network
BAM    Bidirectional Associative Memory
BPTS    Back Propagation Through Structure
BPTT    Back Propagation Through Time
CAFs    Cancer Associated Fibroblasts
cTNM    Clinical Tumour, Node, Metastasis staging (tumour size, lymph nodes, metastasis)
DNA    Deoxyribonucleic Acid
dividerand    Function that divides the dataset automatically
ENN    Elman Neural Network
elmannet    Elman Neural Network function
feedforwardnet    Feedforward Neural Network function
FS    Feature/Input Selection
GA    Genetic Algorithm
HNN    Hopfield Neural Network
ICT    Information and Communication Technology
layrecnet    Layer Recurrent Neural Network function
LHR    Lymphocytic Host Response
LMBP    Levenberg-Marquardt Backpropagation Procedure
LRNN    Layer Recurrent Neural Network
LVQ    Learning Vector Quantization
RL    Reinforcement Learning
RNN    Recurrent Neural Network
NAR    Nonlinear Autoregressive
NN    Neural Network
NARNET    Nonlinear Autoregressive Neural Network function
NARXNET    Nonlinear Autoregressive Neural Network with external input function
MATLAB    MathWorks simulation tool (release R2014b)
MLP    Multi-Layer Perceptron
MSE    Mean Square Error
PNI    Perineural Invasion
SCC    Squamous Cell Carcinoma
SGD    Stochastic Gradient Descent
SMA    Smooth Muscle Actin
SSE    Sum of Squares Error
SVM    Support Vector Machine
tr.trainInd    Training-record field holding the indices of the training set
tr.valInd    Training-record field holding the indices of the validation set
tr.testInd    Training-record field holding the indices of the testing set
trainFcn    Training Function
timedelaynet    Time Delay Network function
TDN    Time Delay Neuron
TDNN    Time Delay Neural Network
VALVIRA    National Supervisory Authority for Welfare and Health
WPOI    Worst Pattern Of Invasion

LIST OF FIGURES

Figure 1. Artificial neuron and its components (Hassan et al 2016).
Figure 2. Feed-forward network structure (Tahmasebi et al 2011).
Figure 3. Elman simple recurrent neural network (Elman 1990).
Figure 4. Single layer feed-forward network.
Figure 5. The structure of multi-layer perceptron (Hassan et al 2016).
Figure 6. Neural Network training Structure.
Figure 7. Single node anatomy of ANN (Hassan et al 2016).
Figure 8. Block Diagram of the Elman Neural Network (Kannathal 2006).
Figure 9. Structural representations of Elman (Yin and Chen 2016).
Figure 10. Single time delay neuron (TDN) with inputs and delays at time (Hongying et al 2016).
Figure 11. Architecture of TDNN neural network (Hongying et al 2016).
Figure 12. Layer recurrent network architecture (MATLAB 2017).
Figure 13. A four-node Hopfield Neural Network.
Figure 14. An Architecture of recursive neural network (Hammer et al 2004).
Figure 15. Nonlinear autoregressive network (NARNET) (Luiz Gonzaga et al 2016).
Figure 16. The architecture of nonlinear autoregressive network with external inputs (NARXNET) (Luiz Gonzaga et al 2016).
Figure 17. Feedforwardnet MATLAB code window.
Figure 18. Neural Network training output.
Figure 19. Performance error plot for feedforwardnet.
Figure 20. Target outputs and the neural outputs.
Figure 21. Error histogram of the targets and the neural outputs.
Figure 22. Training state of the network.
Figure 23. Regression plot of the network training.
Figure 24. Plot of Target and Neural Outputs of feedforward network.
Figure 25. Elman neural network command window.
Figure 26. Training window for Elman neural network.
Figure 27. Target and Neural outputs of the Elman neural network.
Figure 28. Training performance of Elman neural network.
Figure 29. Regression plot of Elman neural network training.
Figure 30. Elman neural network plot of target and neural outputs.
Figure 31. Command window for time delay neural network.
Figure 32. Training window of timedelay network.
Figure 33. Target and neural outputs of timedelay neural network.
Figure 34. Regression plot of timedelaynet of the relationship between the target and the neural outputs.
Figure 35. Variation in target and neural output plot.
Figure 36. Command window for layer recurrent neural network.
Figure 37. Target and neural outputs of layer recurrent neural network.
Figure 38. Learning window of layer recurrent neural network.
Figure 39. Target and neural outputs of layer recurrent neural network.
Figure 40. Command window for nonlinear autoregressive neural network.
Figure 41. Training window of nonlinear autoregressive training.
Figure 42. Narnet training results showing both target and neural outputs.
Figure 43. Narnet regression plot of the learning process.
Figure 44. A graph of target and neural values after training with Narnet.
Figure 45. NARNET MATLAB command window.
Figure 46. Training window for narxnet.
Figure 47. Narxnet target and neural outputs.
Figure 48. Regression plot for narxnet.
Figure 49. Narxnet target and neural outputs.
Figure 50. Data collection and training process.
Figure 51. The design of the network with desired inputs and outputs.
Figure 52. Schematic diagram of the output.
Figure 53. The command window showing codes for feedforward network.
Figure 54. The expected and trained output after training with neural network.
Figure 55. A plot of the extent of variation between target and neural output.
Figure 56. Training summary for feedforward neural network of the real data.
Figure 57. Regression plot from feedforward neural network of the real dataset.
Figure 58. Prediction of recurrence as an output of given new inputs.
Figure 59. Prediction of recurrence based on newly formed inputs.
Figure 60. The training summary of statuslatest as output.
Figure 61. The command window code for statuslatest as output.
Figure 62. Expected and neural output where mortality was the output variable.
Figure 63. The representation of the target and neural output where statuslatest was the output.
Figure 64. The feedforward network performance when mortality was the output.
Figure 65. Regression plot for mortality as output using feedforward network.
Figure 66. Controlled prediction using one of the known input rows.
Figure 67. Uncontrolled predictions of arbitrary inputs.
Figure 68. Output from resilient backpropagation training algorithm.
Figure 69. The performance measurement of 20 inputs with a changed training algorithm.
Figure 70. The training summary of ENN on the real data.
Figure 71. The default training algorithm for ENN.
Figure 72. Performance measure of ENN on the real data.
Figure 73. Expected and neural output of ENN on the real data.
Figure 74. Layer recurrent neural network for prediction of recurrence and mortality.
Figure 75. Training algorithm for layer recurrent neural network.
Figure 76. The command window showing the performance of LRNN.
Figure 77. The expected and neural output for LRNN in recurrence prediction.
Figure 78. The expected and neural output for prediction of recurrence in layer recurrent neural network.
Figure 79. The regression plot of the training phase of the layer recurrent network for recurrence prediction.
Figure 80. Prediction of recurrence using layer recurrent network.
Figure 81. Arbitrary input prediction for recurrence using layer recurrent network.
Figure 82. The performance output for the prediction of mortality.
Figure 83. Output of the performance when the algorithm was changed.
Figure 84. Layer recurrent neural network with increased number of neurons for mortality prediction.
Figure 85. Training summary of layer recurrent with increased hidden neurons.
Figure 86. Mean Square Error performance of layer recurrent neural network.
Figure 87. Expected and trained outputs after increased hidden neurons for layer recurrent.
Figure 88. Error histogram plot of the difference between the expected and the trained value.
Figure 89. The prediction of mortality using layer recurrent network.
Figure 90. Prediction of mortality using randomly predicted inputs.
Figure 91. The network performance when all the inputs are used.
Figure 92. Training network for the new markers.
Figure 93. The performance of the neural network with the new inputs.
Figure 94. Expected and neural output using the new markers.
Figure 95. The controlled prediction using the new markers.
Figure 96. Randomly generated input for the prediction of recurrence.
Figure 97. Input and output summary for the dependencies analysis.
Figure 98. The performance of the network for prediction of mortality.
Figure 99. The training performance of the network with the new markers for mortality.
Figure 100. Controlled prediction of mortality using the new markers.
Figure 101. Random prediction of inputs for mortality.
Figure 102. Neural output with activation function.
Figure 103. Prediction of recurrence from sigmoidal neural output.
Figure 104. Arbitrary input prediction on sigmoidal function layer for recurrence.
Figure 105. Sigmoidal output for mortality.

LIST OF TABLES

Table 1. Training Algorithms for ANN.
Table 2. Feedforwardnet parameters.
Table 3. Syntax parameters for Elman Neural Network.
Table 4. Layer recurrent parameters.
Table 5. Parameters for NARNET.
Table 6. NARXNET parameters.
Table 7. Explanation of inputs and output variables.
Table 8. Important markers for the prediction of recurrence of tongue cancer.
Table 9. Order of significance of identified markers by ANN for recurrence prediction.
Table 10. Important markers for the prediction of mortality.
Table 11. Order of significance of identified markers by ANN for mortality prediction.
Table 12. Combined markers found to be important for ANN.
Table 13. Sigmoidal function analysis on the neural output.
Table 14. Sigmoidal neural output for mortality prediction.

UNIVERSITY OF VAASA
Faculty of Technology
Author: Alabi Rasheed Omobolaji
Topic of the Thesis: Prediction of Recurrence and Mortality of Oral Tongue Cancer using Artificial Neural Network
Supervisor: Professor Mohammed Salem Elmusrati
Instructor: PhD Alhadi Almangush
Degree: Master of Science in Technology
Major Subject: Communications and Systems Engineering
Year of Entering the University: 2015
Year of Completing the Thesis: 2017
Pages: 180

ABSTRACT

Cancer is a dreadful disease that has caused the death of millions of people. It is characterized by an uncontrollable growth of cells that form lumps or masses of tissue known as tumours. It is a concern to all, as these tumours in most cases release hormones that have a negative impact on the body. Data mining approaches, statistical methods and machine learning algorithms have been proposed for effective cancer data classification. In this thesis, Artificial Neural Networks (ANN) have been used for the prediction of recurrence and mortality of oral tongue cancer in patients. ANN was also used to examine the diagnostic and prognostic factors, with the aim of determining which of these factors influence the prediction of recurrence and mortality of oral tongue cancer. Three different ANNs have been applied in the learning and testing phases in order to find the most effective technique: the Elman, Feedforward and Layer Recurrent neural networks. The Elman neural network was not able to make an acceptable prediction of the recurrence or the mortality of tongue cancer based on the data. In contrast, the Feedforward neural network captured the relationship between the prognostic factors and correctly predicted recurrence. However, it failed to predict mortality based on the patient's data.
The Layer Recurrent neural network has been very effective and successfully predicted both the recurrence and the mortality of oral tongue cancer in patients. The constructed Layer Recurrent neural network has been used to investigate the correlation between the prognostic factors. It was found that only 5 of the 11 prognostic factors in the data sheet had a considerable impact on recurrence and mortality. These are grade, depth, budding, modified stage, and gender. Time in months and disease free months were also used to train the network.

KEYWORDS: Artificial Neural Network, Feedforward, Elman, Layer Recurrent, Recurrence, Mortality, Prediction, Prognostic factors, Cancer, Oral Tongue

1 INTRODUCTION

Cancer is a distressing, alarming and dreadful disease. Many deaths have been recorded as a result of it, and the number of cancer-related deaths makes cancer a concern to medical practitioners. Cancer is one of the main causes of death in many developed nations. Similarly, in developing countries, the impact of cancer, as well as diabetes, as a major contributor to the death rate cannot be overemphasized (Shikha & Jitendra 2015). Accurate classification of tumours is very important for the treatment of cancer. As with other diseases, proper identification, classification, and prediction are key factors in achieving efficient and effective treatment and management for cancer patients. Proper identification of cancer cells is therefore a crucial step. Even in developed countries with advanced and up-to-date medical facilities, ineffective cancer classification methodology has been a major cause of cancer-related death, because cancer classification has been based on clinical and histopathological findings. More often than not, this classification approach produces incomplete or misleading results.
Thus, the need to consider other options for the classification of cancer in patients becomes imperative. DNA microarrays and molecular level diagnostics, to mention but a few, have offered a way of classifying cancer (Oliver et al 2009:157; Wang et al 2005). Although both DNA microarrays and molecular level diagnostics provide accurate prediction and diagnosis of cancer, gene expression data generally comprise a huge number of genes, which has been a major concern for stakeholders in the medical discipline. Hence, a better approach to efficient and effective classification and prediction, drawing on other disciplines and fields of study, becomes imperative. Data mining approaches, statistical methods and machine learning algorithms have been proposed to evaluate these data effectively (Sung-Bae Cho & Hong-Hee 2007). As a result, support vector machines (SVM), k-nearest neighbor and neural network techniques, to mention but a few, have attracted more attention in recent years. Specifically, the use of Artificial Neural Networks (ANN) in medical research has been on the increase. It has been extensively employed in medical areas such as urology, radiology, medical microbiology and medical biochemistry. The use of ANN in other medical areas, as well as in the analysis of patients' data, is growing at a geometric rate.

Neural network models have been successfully implemented in various classification problems. Neural network techniques are very useful for the detection, prediction, and monitoring of cancer. For instance, they were recently used in the clustering and classification of gene expression data. Two kinds of model are used: supervised and unsupervised. Classification can be achieved with supervised models, while unsupervised models are used for clustering. Classification is crucially important for cancer diagnosis and treatment. Artificial neural networks have been proven to be a very effective method for pattern recognition.
This makes them very useful for the diagnosis of cancer at very early stages.

Oral tongue cancer is the most common and the most aggressive epithelial cancer diagnosed within the oral cavity. The incidence of oral tongue cancer has increased tremendously and has thus attracted the attention of clinicians and researchers. Therefore, the focus of this thesis is to examine the use of ANN in the prediction of oral tongue cancer recurrence and mortality. In addition, the thesis examines ANN as a means of determining which of the prognostic factors are needed in the prediction of recurrence and mortality. Classification, as used in medicine, includes detection, prediction, and treatment based on certain parameters. For the purpose of this thesis, most attention is given to prediction based on the learning outcome of MATLAB ANN using the dataset provided, that is, the case study. In addition, fixed and dynamic datasets generated by the supervisor using a certain algorithm are used to test the effectiveness of the code before it is applied to the real data, that is, the case study data.

Furthermore, this thesis also attempts to examine whether modified_stage is a good prognostic factor to be considered in the prediction of recurrence and mortality. This can be achieved by removing certain columns from the given dataset during training in MATLAB ANN to see whether each such column has any effect on the expected outcome. In conclusion, the thesis aims to give clinicians an efficient approach to predicting the patient’s situation. The description of the dataset can be found in Section 1.1, The Dataset. Section 1.2, Motivation, gives further explanation of the thesis, and it is worthy of note that the thesis focuses on tongue cancer patients.

1.1 The Dataset

This research entails the use of data from patients. Therefore, ethical and privacy concerns are given serious consideration.
The use of patient data was approved by the National Supervisory Authority for Welfare and Health (VALVIRA) (Almangush et al 2013). Despite this approval, it is worthy of note that the identification number used in this data set for each patient has been coined and developed by the author of this research work. This measure further protects privacy, as this research work will be publicly accessible (online or in print) through the University of Vaasa. Hence, the identification given here has no relationship with the identification number contained in the original data.

The diagnostic histological slides of 340 patients with T1/T2 N0M0 oral tongue Squamous Cell Carcinoma (SCC) managed between 1979 and 2009 at the University Hospitals of Helsinki, Oulu, Turku, Tampere, and Kuopio were collected from the hospitals’ archives. Similarly, the data of patients from one hospital in Sao Paulo, Brazil, were also included. The criteria for inclusion of cases were as described in (Brandwein-Gensler et al 2010).

1.2 Motivation

Undoubtedly, early detection of any disease, and of cancer in particular, gives good insight into the disease and, ultimately, into its practical management. In cancer detection and prediction, machine learning methods such as Artificial Neural Networks are among the methods being investigated. ANNs offer a unique and efficient approach to cancer prognosis owing to their ability to learn and generalize from data. The dataset used for analysis in this thesis is contained in Appendix III. It was supplied by the Dentistry Department of the Helsinki Teaching Hospital.

Thus, this thesis aims to examine and identify the optimal cutoff point of tumor depth (column F) for risk stratification in the T1 and T2 stages separately, and for both stages together. Furthermore, with the help of ANN, the thesis seeks to determine whether tumor depth can be used to modify the cTNM staging system (column K).
In addition, it asks whether the modified T-stage (column G) is better for prognostication. Finally, this research work aims to identify the optimal cutoff point of tumor budding (B) for risk stratification in the T1 and T2 stages separately, and for both stages together. For future study, it would be ideal to test and examine the interpretation of ANNs for the other prognostic factors (e.g. WPOI, Grade, etc.) contained in the data set.

1.3 Thesis Structure

The thesis is organized in six chapters. Chapter 1 introduces the subject areas of the research and the research questions. Chapter 2 reviews the previous work on Artificial Neural Networks for cancer prediction and diagnosis and presents an overview of the Elman Neural Network (ENN). ENN was chosen as the neural network for the prognostication of mobile tongue cancer because of its ready-made function for medical applications; its recurrent nature also made it a good choice for this research work. There are, however, networks such as the layer recurrent network that produce better performance than Elman. Therefore, both the Elman and the layer recurrent networks are mainly examined. Chapter 3 presents recent applications of ANN in medicine. Chapter 4 examines simulations of the Feedforward and Elman neural network approaches using some arbitrary data. The data was supplied by my thesis supervisor and generated using a certain algorithm. The aim of this chapter is to understand the difference between static and dynamic variables. The case study is examined in Chapter 5, which presents a comprehensive analysis of the data set using the ANN so as to answer the research questions presented in Chapter 1. Chapter 6 discusses the results obtained. The research questions presented in Chapter 1 form the foundation of that discussion.
In addition, the future research question presented in Chapter 1 is also examined. The main contribution of this thesis is in Chapter 6, where conclusions, recommendations, and possibilities for future study are made based on the results presented in Chapter 5. Appendices I-III contain a few samples of the dataset and Appendix IV contains some of the useful code for the ANN.

2 FUNDAMENTALS OF ARTIFICIAL NEURAL NETWORKS

2.1 Artificial Neural Network

An Artificial Neural Network (ANN) is a statistically oriented modeling tool. It is modeled on the biological nervous system. The basic processing element in the ANN is known as the neuron. It is not the same as the neuron in the human body, but in terms of functionality it works in the same manner; hence the name artificial neuron. Its output normally lies in the range (-1, +1); it could also be (0, 1) (Ying et al 1998, Ferari & Stengel 2005). The neuron can be viewed as a processor that computes the sum of weighted inputs and then applies a non-linear transfer function to the computed sum. An example of such a transfer function is the tan-sigmoid. Figure 1 shows an example of an artificial neuron and its components.

Figure 1. Artificial neuron and its components (Hassan et al 2016).

From the structure depicted in Figure 1, it can be said that an artificial neuron consists of inputs and weights, a transfer function and an activation function (Heimes & B.van Heuveln 2005, Jayadeva et al 2002, Setino & A.Gavada 2000). These neurons are interconnected so that they work in unison to address a particular problem. In most fields of study, it is becoming imperative to detect trends and extract patterns in some scenarios. Doing this with traditional methods, that is, through human, statistical and computer techniques, is becoming increasingly difficult.
Therefore, ANN has come to provide a unique approach to solving these problems. This characteristic has led ANN to be widely used in many applications nowadays, such as engineering, medicine and statistics, to mention but a few. A neural network operates in a similar way to an adaptive system; that is, it changes its structure during the learning phase. ANN has been touted as effective in modeling both simple and complex relationships. With regard to its application in data science, it can be used to find patterns and clusters in data (Spelt et al 2013).

Today, ANN represents a major extension to computation. It has been reported to provide better results and performance than traditional statistical tools for prediction and classification in various applications (Paliwal M. & Kumar U.A 2009). ANNs offer short computation times, a low computational burden and the opportunity to reformulate the problem so that only the important variables and parameters from the given data set, or certain unknown areas of interest, are considered. Different types of neural networks are designed and developed for various applications; the solutions offered have yet to reach 100% accuracy, but their contribution cannot be overemphasized.

ANN-based expert systems in tongue cancer study have been attracting much attention in recent years, just as the use of ANN gained popularity amongst researchers in automatic breast cancer diagnosis in the past few years (Ubeyli 2007, Karabatak & Ince 2009, Furundzic et al 1998). Most current approaches to cancer diagnosis and treatment have been based mainly on the years of experience of the medical officers. ANN, however, can approximate complex and non-linear problems without having to know the mathematical representation of the system or to learn from the wealth of experience of the medical officer.
This feature has made ANN attract attention in the study of cancer, especially in cancer case prediction. The clinical size (T1 or T2) of early oral tongue cancer has failed to differentiate between patients with a possibly favourable outcome and patients with an adverse outcome when both are given treatment (Kellermann et al 2007:849-853).

In addition, early-stage detection (T1/T2N0M0) of tongue cancer does not always provide a reliable prediction of outcome, as 20-40% of cases have already spread to other areas (metastasis) at the presentation stage (Ganly et al 2012, Ho et al 1992). Therefore, an effective and efficient means of prediction becomes imperative. Prediction at the early stage of oral tongue cancer is necessary because it makes it possible to identify the subset of patients with mobile tongue cancer who are likely to have an unfavourable outcome. Such patients will need more aggressive treatment, for which multimodality therapy is a good approach. Conversely, the prediction result also identifies the subset of patients who are likely to have a favourable outcome. For this latter group, surgical treatment should suffice (Kellermann et al 2007:849-853).

ANN has been touted as a reasonable approach to differentiating between these two groups. The idea is to supply the neural network with sufficient training data and let it find relationships between these data without requiring user intervention. For example, in this thesis, the status of the patients can be inferred from the results presented in Chapter 5. However, designing an ANN is a complex task.
Various design aspects and parameters, such as the network topology, the learning algorithm, the initial values of the weights, and the learning rate, to mention but a few, need to be optimized properly for the ANN to perform efficiently. The network topology includes the number of hidden layers and nodes. Some studies and books also consider input/feature selection (FS) as part of the ANN design (Walczak & Cerpa 1999). There is evidence that FS significantly improves ANN performance (Setiono & Liu 1997). The design parameters and the input feature subset of the ANN need to be optimized together, because the two influence each other.

ANN offers some unique features. The learning process, an important feature of ANN, makes it flexible in terms of application. For example, ANN can be used for data classification and pattern categorization through a learning process. Neurons are arranged to form layers and connection patterns. Based on these arrangements, different network configurations and structures can be formed.

On this basis, ANN can be divided into feed-forward networks and recurrent networks. The feed-forward network is considered to be the first and simplest artificial neural network. In a feed-forward network:
(i) the neurons can be ordered without any backward connection, that is, independently of time;
(ii) the output does not depend on the past (the network is not cyclic);
(iii) movement is in the forward direction (from the input through the hidden nodes to the output nodes).

Figure 2. Feed-forward network structure (Tahmasebi et al 2011).

The feed-forward network is characterized by the absence of cycles and loops. It has been used extensively in classification, pattern recognition, and prediction. Conversely, the recurrent network contains loops, and it has been used for processing tasks and control signals (Turkson et al 2016).
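The forward-only flow of signals in a feed-forward network can be illustrated with a short sketch. This is a minimal plain-Python illustration and not the MATLAB toolbox implementation used in this thesis; the function names, the tanh transfer function in every layer, and all weight values are assumptions made for the example.

```python
import math

def tanh_layer(inputs, weights, biases):
    """One layer: each neuron sums its weighted inputs plus a bias, then applies tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(inputs, layers):
    """Pass the signal through each (weights, biases) layer in order; nothing flows backwards."""
    signal = inputs
    for weights, biases in layers:
        signal = tanh_layer(signal, weights, biases)
    return signal

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output (made-up weights).
hidden = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2])
output = ([[1.0, -1.0]], [0.0])
result = feed_forward([1.0, 0.5], [hidden, output])
```

Because each layer only reads the output of the layer before it, the result depends on the current input alone, which is property (ii) in the list above.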
An example of a recurrent network is the Elman network. Figure 3 shows a diagrammatic representation of the concept of the Elman neural network.

As shown in Figure 3, the Elman network is made up of three layers, represented as A, B and C respectively. These layers can be arranged to form clusters, each containing the three layers. The B layer is referred to as the hidden layer. The inputs are fed into the system in the forward direction through the first layer, which is why the Elman neural network is also regarded as a variant of the feed-forward neural network. Apart from these three layers, there are also the context units, represented as K in Figure 3.

Figure 3. The concept of Elman neural network.

The hidden layer serves two functions. It connects to the context layer with a weight of one. Additionally, a copy of the previous value of the hidden units is stored in the context units through back connections. This memory characteristic allows the network to remember the previous hidden layer states. When the inputs are fed forward into the system, an appropriate training algorithm is applied and the learning process is completed. Here, (B * W1) is the value passed between the first and second layers and (C * W1) is the value fed into the third layer (Elman 1990).

Because of the complicated design issues, coupled with the requirement for FS, there are increasing suggestions to hybridize the ANN design with an evolutionary algorithm. Design methods such as trial-and-error cannot handle many design parameters and FS simultaneously, hence the need for hybridization. The Genetic Algorithm (GA) has global search features that make it particularly preferred. GA is capable of generating both an optimal feature subset and support vector machine (SVM) parameters without degrading the accuracy of the machine (Rui et al 2005; Huang & Wang 2006).
Hence, there have been several efforts to combine GA with ANN. For instance, GA was used to search for the architecture, learning algorithm and nodes' activation function (Ferentinos 2005), while the search for the learning algorithm together with its parameters, hidden layer information, transfer function, weights and bias values was also an option (Almeida & Ludermir 2010). Additionally, a hybrid technique that combines fuzzy clustering, a statistical tool and granular computing has been proposed (Yuchun et al 2008). A comprehensive review on combining evolutionary algorithms with ANN can be found in (Yao 1999). Promising results were obtained from these combinations.

Therefore, this thesis aims to further explore ANN for oral (mobile) tongue cancer prognosis. It employs the given dataset for FS, initial weight and hidden node size optimization of the most common ANN architecture. The dataset (Appendices I-III) was divided into training, validation and testing datasets for the simulation exercises presented in Chapter 4 and Chapter 5 of this thesis.

2.2 Classification of ANN

ANN can broadly be classified as either feed-forward or recurrent. Examples of the feed-forward network include:
I. Single layer feed-forward network.
II. Multilayer feed-forward network.
Similarly, examples of the recurrent network include:
I. A single node with its own feedback.
II. Multilayer recurrent network.

A single layer feed-forward structure is a simple perceptron. The schematic diagram is shown in Figure 4.

Figure 4. Single layer feed-forward network.

A single-layer feed-forward network, as the name implies, has one input layer, one output layer, and no feedback connections. Inputs are fed through a series of weights directly to the outputs. In each node, the inputs are multiplied by the weights and the result is compared with a threshold, as shown in Figure 7 below.
The basis for the comparison is that if the sum of the products of the inputs and their corresponding weights is above some threshold (typically 0), the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1). Neurons that exhibit this kind of behaviour are called artificial neurons or linear threshold units. In the literature, the term perceptron often refers to networks consisting of just one of these units. A similar neuron was described by Warren McCulloch and Walter Pitts in the 1940s. A perceptron can be created using any values for the activated and deactivated states as long as the threshold value lies between the two. Most perceptrons have outputs of 1 or -1 with a threshold of 0.

A multi-layer neural network, on the other hand, is characterized by the fact that it can calculate a continuous output instead of a step function. The sigmoid (logistic) function is a common choice for the multi-layer neural network. The sigmoid function and its derivative are given by:

\[ f(x) = \frac{1}{1 + e^{-x}}, \qquad f'(x) = f(x)\,\bigl(1 - f(x)\bigr) \]

The fact that the sigmoid function has a continuous derivative has made it a preferred choice in ANN, as it can be used extensively in back-propagation. The derivative is also easy to calculate, since it is expressed in terms of the function itself.

The multilayer recurrent neural network, or simply recurrent neural network (RNN), is also an example of ANN. The multi-layer perceptron (MLP) is made up of two or more layers of neurons that are connected sequentially. Neurons in different layers are connected by weighted signal pathways.

Signals are sent through these pathways to the other neurons. The input layer is the first layer of a network. It receives signals from the data entering the network.
The last layer, called the output layer, delivers the outcome to the outside world.

Figure 5. The structure of multi-layer perceptron.

In a recurrent neural network, by contrast, the connections between units form a directed cycle. This gives the network a dynamic temporal behaviour. Its internal memory can also be used to process arbitrary sequences of inputs, which is why RNN is used in so many applications. Like the MLP, it uses the back-propagation algorithm. However, it is worthy of note that back-propagation is mainly used for networks whose activation functions are differentiable. In addition, there are some issues associated with back-propagation. These include the speed of convergence, overfitting, and the possibility of ending up in a local minimum of the error function.

2.3 Training of ANN

Training is otherwise called learning. ANN learning processes are divided into three, namely:
I. Supervised learning
II. Unsupervised learning
III. Reinforcement learning

When an ANN is trained with a "teacher", that is, with predetermined target output values known for each input pattern, the training process is termed supervised learning (Hu et al 1994). Because the targets are known, the error between the network output and the target can be measured and minimized during training. Back-propagation, time delay, and multiple adaptive linear neurons, to mention but a few, are all examples of supervised neural networks.

Unsupervised training, by contrast, is characterized by the absence of target outputs from an instructor (Hu et al 1992;1994). In this case, the network finds the structure in the inputs by itself. Kohonen Self-Organizing Feature Maps are an example of unsupervised training; the closely related Learning Vector Quantization (LVQ) is trained with labelled data.
Reinforcement learning (RL) is an area of machine learning inspired by behaviourist psychology. RL is beyond the scope of this thesis.

By definition, training describes the procedure by which the parameters are tuned or adjusted in such a way that the neural network adapts itself to a stimulus and consequently produces the desired output. The output of the network is compared with the expected (target) output to see how effectively the network has learned during the training period.

In general, a neural network uses internal calculations to compute output values from input values, as shown in Figure 6 (Delgrange et al 1998). The values of the weights are adjusted repeatedly until the output produced by the network agrees closely with the targets. Such output is known as the trained output or neural output. The trained network can then be used to predict the output for any given set of inputs. Traditionally, not all of the dataset is used for training; part of it is held back to check the effectiveness of the output and, most importantly, to minimize the error between the network output and the target output.

Figure 6. Neural Network training structure (Delgrange et al 1998)

Therefore, datasets are generally divided into training, validation and testing datasets. The training dataset is used to compute the gradient and to update the network weights and biases. The validation dataset is used for validation: the error on it is monitored during the training phase. The testing dataset, however, is not used at all in the training process. It is used mainly for comparison purposes and provides an important measure of how well the network has learned in the training phase.
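The division of a dataset into training, validation and testing subsets can be sketched as follows. This is a minimal illustration in plain Python, not the procedure used in the thesis experiments: the helper name `split_dataset`, the 70/15/15 proportions, and the fixed random seed are all assumptions chosen for the example.

```python
import random

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the samples and cut them into training, validation and testing subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)        # reproducible shuffle
    n_train = int(round(train_frac * len(shuffled)))
    n_val = int(round(val_frac * len(shuffled)))
    train = shuffled[:n_train]                    # used to update weights and biases
    val = shuffled[n_train:n_train + n_val]       # monitored during training
    test = shuffled[n_train + n_val:]             # never seen during training
    return train, val, test

# Hypothetical dataset of 100 patient record indices.
train, val, test = split_dataset(range(100))
```

Every sample ends up in exactly one subset, so the testing subset stays independent of the data used to adjust the weights.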
2.4 Training Algorithm

Since this thesis examines both the feed-forward neural network and the recurrent neural network, it is pertinent to examine some of the training algorithms used. The training algorithms are numerous, and each neural network type has a default training algorithm that is appropriate for it. A few of them are presented in Table 1.

Table 1. Training Algorithms of ANN.

Training Function    Algorithm
trainlm              Levenberg-Marquardt
trainbr              Bayesian Regularization
trainbfg             BFGS Quasi-Newton
traingdm             Gradient Descent with Momentum
traingd              Gradient Descent
trainoss             One Step Secant

The most widely used algorithms are the Levenberg-Marquardt and Quasi-Newton methods because they are very fast and produce very low errors. Both are mostly used for small datasets. For large datasets, Scaled Conjugate Gradient and Resilient Backpropagation are mostly the preferred options for training.

The default training method for the feed-forward network is Levenberg-Marquardt. It is worthy of mention that the term backpropagation refers to the gradient descent algorithm for the training of neural networks (Demuth et al 2007).

Figure 7. Single node anatomy of ANN.

As pointed out earlier, the neuron, or node as it can also be called, forms the basic component of ANN. Figure 7 shows the anatomy of a single node, where the inputs are a1, a2, ..., an and the output is Oj. As shown in Figure 7, the node is the summing point. The node can accept more inputs than the ones shown in Figure 7. The function of the node is to manipulate the inputs to give a single output signal. The values W1j, W2j, ..., Wnj are the weight factors associated with the inputs to the node. Weights are adaptive coefficients within the network that determine the intensity of the input signal.
Each input (a1, a2, ..., an) is multiplied by its corresponding weight factor (W1j, W2j, ..., Wnj), and the node uses the summation of these weighted inputs (W1j * a1, W2j * a2, ..., Wnj * an) to estimate an output signal using a transfer function.

The other input to the node, bj, is the node's internal threshold, also called the bias. It is initialized to a randomly chosen value and governs the node's net input uj through the following equation:

\[ u_j = \sum_{i=1}^{n} W_{ij}\, a_i + b_j \]

The node's output is determined by applying a transfer function to the node's net input. Sigmoid, hyperbolic tangent and linear transfer functions can be used effectively. The transfer function can transform the node's net input in a linear or non-linear manner.

Sigmoid Transfer Function:
\[ O_j = \frac{1}{1 + e^{-u_j}} \]

Hyperbolic Tangent Transfer Function:
\[ O_j = \tanh(u_j) = \frac{e^{u_j} - e^{-u_j}}{e^{u_j} + e^{-u_j}} \]

Linear Transfer Function:
\[ O_j = u_j \]

The neuron's output Oj is thus obtained by applying any of the above transfer functions to the neuron's net input uj.

2.5 Advantages and Disadvantages of ANN

Advantages of ANN:
I. Adaptive learning: Once the network has been trained with an appropriate training algorithm, it can perform the task based on the data given for training.
II. Self-organization: An ANN can create its own organization of the information it receives during learning.
III. Real-time operation: Many neural network computations can be carried out in parallel. Specific hardware devices are being designed to take advantage of this ability.
IV. Fault tolerance via redundant information coding: Partial damage to a neural network structure leads to a degradation of performance. However, some network abilities may be retained even after major network damage.

Disadvantages of ANN:
I. Size and complexity: The size and complexity of neural networks are very high.
3 APPLICATION OF ANN IN MEDICINE

3.1 Artificial Neural Network in Medicine

Advancements in the field of information and communication technology (ICT) have proceeded at a geometric rate. These advancements have been felt positively in other areas of endeavour, and medicine is no exception. The tremendous development of ICT has contributed immensely to medicine through the development of powerful tools, such as lasers and ultrasonics, that aid medical treatment. ICT has also contributed to medicine through data analysis and machine learning. For instance, Artificial Intelligence (AI) has contributed immensely to medicine and biological research.

ANNs are an interesting and extensively studied branch of AI. They have been touted as a promising research area, and researchers in machine learning and data science expect ANNs to find extensive application to various biomedical problems in the future. ANNs have already gained attention in medicine, having been successfully applied to areas such as diagnostic systems, biochemical analysis, image analysis, and drug development (Konstantina 2017). This thesis looks at the application of ANN from the point of view of predicting the patient's situation.

ANNs have been extensively applied in diagnosis, electronic signal analysis, medical image analysis, and radiology. ANNs are aimed at assisting doctors in detecting the complex nonlinear relationships between dependent and independent variables in the patient's data. The neural network is able to learn, capture, draw inferences and establish relationships from the provided data, and this is what it produces as the output of the learning process. A trained ANN can therefore be viewed as a simplified digital model inspired by the biological brain.

Nowadays, ANNs are widely used for medical applications in various disciplines of medicine, especially in cancer treatment, cardiology and so on.
They have also been used extensively in diagnostic systems because an ANN is not affected by factors such as stress, fatigue, working conditions, and emotional states that could affect traditional diagnostic procedures. The network can easily be trained, and the trained network can produce an output which demonstrates that it has captured the relationship between the variables contained in the given dataset.

Furthermore, ANNs have found application in biochemical analysis, where they have been widely used to track glucose levels in diabetic patients. ANN has also been used to detect pathological conditions such as tuberculosis.

Image analysis is nowadays a core aspect of medicine, and the need for proper image analysis cannot be overemphasized. ANNs have assisted in tumour detection and the classification of chest X-rays. The results produced through the application of ANNs to medicine have been promising so far.

Drug development, modeling, clinical research, pharmacoepidemiology, and medical data mining are some other medical areas where ANNs have been extensively used. The high computation rates of ANNs have also contributed to their acceptance in medicine, paving the way for ANN to be applied in telemedicine as well.

Having highlighted the importance of ANN in medicine, it is important to ask whether ANN can replace human experts. The answer is no. ANN is a tool that is poised to help doctors and researchers in the medical field. ANNs can assist doctors in the screening process and, ultimately, help them to double-check and confirm their diagnoses.

3.2 Types of Neural Networks used in this research

3.2.1 Feed-Forward Neural Network (feedforwardnet)

The feed-forward neural network, also known as feedforwardnet, consists of layers. The first layer is connected to the network input. The output is produced by the last layer.
Each intermediate layer has a connection from the previous layer. The network is used specifically to map the relationship between input and output. In its simplest form, the network has one hidden layer with a number of neurons. In terms of syntax, feedforwardnet is given by:

feedforwardnet(hiddenSizes,trainFcn)

Table 2. Feedforwardnet parameters.

Parameter      Description
hiddenSizes    Row vector of one or more hidden layer sizes (default = 10)
trainFcn       Training function (default = 'trainlm')

As pointed out, feedforwardnet maps input-output relationships. When more specific functionality is required, specialized versions such as fitnet (function fitting) and patternnet (pattern recognition) are good choices.

Similarly, the cascade feed-forward neural network (cascadeforwardnet) offers unique functionality, as it connects the input layer to all other layers. Hence, fully established connections between layers are employed in cascadeforwardnet.

3.2.2 Elman Neural Network (elmannet)

The most widely cited example of a simple recurrent network is the Elman Neural Network (ENN) (Elman 1990). It is characterized by local memory and feedback connections, and was first proposed by J.L. Elman in 1990. It is a two-layer network trained by back-propagation, consisting of a hidden layer and an output layer in addition to the input layer. The feedback connection is usually from the output of the hidden layer to its input, as shown in Figure 8. MATLAB provides a ready-made function for it called elmannet.

Figure 8. Block Diagram of the Elman Neural Network (Kannathal 2006).

The feedback connection ensures that Elman networks learn effectively. In addition, temporal and spatial patterns are easy to recognize and generate with the help of the feedback connection.

Mathematically, the algorithm of the Elman neural network is presented in Equations (1)-(3) below. However, it is important to mention that ENN uses staticderiv, which is not a full dynamic derivative.
The final output of the trained network is usually compared with the expected output to see how well and effectively the network has learned. The structure of ENN is represented in Figure 9.

Figure 9. Structural representation of Elman neural network.

It can be deduced from Figure 9 that the network has four types of nodes. The connections of the input, hidden and output nodes are similar to those of the feed-forward network. As discussed in Chapter 2, the structure of the Elman neural network consists of input nodes, hidden nodes, context nodes and output nodes. The structure also includes the weights. The symbols in Figure 9 are as follows:

Wc represents the weight between the context and hidden nodes.
Wij denotes the weight between the input and hidden nodes.
Wjk is the weight between the hidden and output nodes.
X1, ..., Xn represent the inputs.
Ʈj and Ʈk are activation functions (hyperbolic tangents).
Uj is the hidden output.
Xc(k) represents the context layer.
θj and θk are the biases in the hidden and output layers.
Z(k) represents the output.

The feedback characteristic is a unique feature of the Elman network: it uses the context nodes to memorize and return the hidden layer's output values. This feature makes the Elman network sensitive to, and suitable for, learning from and analysing time series and historical data. In terms of syntax, the Elman network is given as follows:

elmannet(layerdelays,hiddenSizes,trainFcn)

Table 3. Syntax parameters of Elman network.
Parameter      Description
layerdelays    Row vector of increasing 0 or positive delays (default = 1:2)
hiddenSizes    Row vector of one or more hidden layer sizes (default = 10)
trainFcn       Training function (default = 'trainlm')

Based on the syntax and the structural representation of the Elman neural network, the output of each type of node can be modelled mathematically as follows:

\[ X_c(k) = \alpha\, X_c(k-1) + U(k-1) \tag{1} \]
\[ U_j = f\bigl[\, W_c\, X_c(k) + W_{ij}\, X_n \,\bigr] \tag{2} \]
\[ y(k) = g\bigl[\, W_{jk}\, U_j \,\bigr] \tag{3} \]

Equations (1)-(3) represent the context output, the hidden output and the final output respectively.

Here, f() and g() are the nonlinear and linear output functions of the hidden nodes and output nodes respectively. Equations (1)-(3) can be modified further: when the input vectors are mapped to the set of hidden nodes through an activation function Ʈj, the mathematical representation is as shown in Equation 4. Since the Elman network has a context layer, there are delayed hidden variables, represented as Uj', from the prior training iteration.

In the same way, the output layer can be modified through a procedure similar to that explained for the hidden layer. In this case, the activation function is Ʈk. Finally, the mapping relationship between the hidden and output layers is represented as yk and given in Equation 5.

With the introduction of full dynamic derivative calculations based on fpderiv and bttderiv, the time delay neural network (timedelaynet), the layer recurrent neural network (layrecnet), the nonlinear autoregressive neural network (NARNET) and the nonlinear autoregressive neural network with external inputs (NARXNET) are now the preferred networks because they produce better error performance than the Elman neural network. These networks are examined in Chapter 4.

3.2.3 Time delay neural network (timedelaynet)

The time delay neural network (TDNN) was first developed in the 1980s (Waibel et al 1989).
It is an artificial neural network that is characterized by two special layers: the hidden layer and the output layer. In TDNN, the nodes are fully connected by direct connections, as shown in Figure 10.

Figure 10. Single time delay neuron (TDN) with inputs and delays at the time (t) (Hongying et al 2016).

The nodes of the hidden layer and the output layer are time-delay neurons (TDNs). The inputs are multiple time series with time step (t). They are grouped as M inputs, written explicitly as I_1(t) to I_M(t). Each input has a bias value represented as b_i.

Similarly, as shown in Figure 10, each input of the TDN has N delays, written explicitly as D_i1 to D_iN. Because of these delays, the neuron is capable of storing the previous inputs I_i(t-d), where d varies from 1 to N. Each input also has N independent unknown weights, represented as w_i1, ..., w_iN. F is the transfer function f(x). The output O(t) is obtained by applying F to the weighted sum of the current and delayed inputs together with the biases.

The overall outcome of the neuron therefore depends on the current time (t) and also on the previous time steps (t-d).

Figure 11. The Architecture of TDNN neural network (Hongying et al 2016).

Where:
W_id^j are the weights of the hidden node H_j.
W_jd^r are the weights of the output node O_r.
b_i^j and c_i^r are biases.
N1 is the number of delays for the output layer.
N2 is the number of delays for the hidden layer.

The single TDN forms the basic building block of TDNN, and understanding it makes it easier to model the dynamic nonlinear characteristics of series inputs. In terms of architecture, TDNN has a hidden layer with J TDNs and an output layer with R TDNs, all fully connected as shown in Figure 11.

TDNN can be trained using the Levenberg-Marquardt algorithm (Levenberg 1944; Marquardt 1963), a traditional training algorithm that optimizes the weights iteratively.
The input time series X(t) and the known labels Y(t) are iterated for t = 1, ..., T, where T is the length of the sequence.

3.3 Layer Recurrent Neural Network (layrecnet)

The layer recurrent neural network (LRNN) is a dynamic artificial neural network. In this network, the connections between units form a directed cycle; hence the name dynamic neural network. LRNN is similar to the feed-forward network but differs in that each layer has a recurrent connection with a tap delay associated with it. This feature gives LRNN an infinite dynamic response to time series input data.

When a finite input response is desired, however, the time delay (timedelaynet) and distributed delay (distdelaynet) neural networks are the networks of choice. In terms of syntax, LRNN is given by:

layrecnet(layerDelays,hiddenSizes,trainFcn)

Layrecnet (layer recurrent network) takes the arguments shown in Table 4.

Table 4. Layrecnet parameters

Parameter      Description
layerDelays    Row vector of increasing 0 or positive delays (default = 1:2)
hiddenSizes    Row vector of one or more hidden layer sizes (default = 10)
trainFcn       Training function (default = 'trainlm')

With the above parameters, the layrecnet function returns a layer recurrent neural network. The Elman network is a simplified form of LRNN. LRNN is characterized by a feedback loop, with a single delay, around each layer of the network except the last layer, as shown in Figure 12.

There are different types of recurrent networks, such as the fully recurrent network, recursive neural networks, the Hopfield network, Elman and Jordan neural networks, the continuous-time RNN, and the bi-directional RNN, to mention but a few.

Figure 12. Layer recurrent network architecture (MATLAB 2017).
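The single-delay feedback loop around a recurrent layer can be sketched in a few lines. This is only an illustrative plain-Python sketch under stated assumptions, not the layrecnet implementation: the function names, the tanh transfer function, the zero initial state, and the weight values are all hypothetical.

```python
import math

def recurrent_step(x, h_prev, w_in, w_rec, bias):
    """New layer state from the current input x and the delayed state h_prev."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(w_in[j], x))
                      + sum(wr * hp for wr, hp in zip(w_rec[j], h_prev))
                      + bias[j])
            for j in range(len(bias))]

def run_sequence(xs, w_in, w_rec, bias):
    """Feed a time series through the layer, starting from an all-zero state."""
    h = [0.0] * len(bias)
    states = []
    for x in xs:
        h = recurrent_step(x, h, w_in, w_rec, bias)  # feedback with one time-step delay
        states.append(h)
    return states

# Hypothetical layer: 1 input, 2 units, made-up weights.
w_in = [[0.5], [-0.3]]
w_rec = [[0.1, 0.2], [0.0, 0.4]]
bias = [0.0, 0.1]
states = run_sequence([[1.0], [0.0], [0.0]], w_in, w_rec, bias)
```

Even though the input is zero from the second time step onward, the first unit's state stays non-zero because the previous state is fed back, which illustrates the lasting response to earlier inputs described above.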
3.3.1 Fully Recurrent Neural Network

The fully recurrent network is a network of neuron-like units developed in the 1980s. Each unit has a directed connection to every other unit. Each connection has a modifiable real-valued weight. Similarly, each unit is characterised by a time-varying real-valued activation.

3.3.2 Hopfield Neural Network (HNN)

The Hopfield network is a form of recurrent artificial neural network invented in 1982 and named after its inventor, John Hopfield. Its essential feature is that its dynamics are guaranteed to converge to a local minimum, although it sometimes converges to a false local minimum rather than the intended one. It is not used for sequences of patterns, which makes it a unique neural network. All connections are symmetric, and it is designed specifically to require stationary inputs, as shown in Figure 13.

Figure 13. A four nodes Hopfield neural network (Hopfield 1982)

Connections in a Hopfield network have two restrictions: (i) no unit has a connection with itself, and (ii) connections must be symmetric. The network has a few variations, such as the bidirectional associative memory (BAM). The units of a Hopfield network take two values for their states, 1 and -1, determined by whether or not the input to the unit reaches the threshold: if it does, the state is +1; otherwise it is -1. Some literature uses the values 0 and 1 instead.

Updating one unit in the HNN is based on the following rule:

\[ s_i \leftarrow \begin{cases} +1 & \text{if } \sum_{j} w_{ij}\, s_j \ge \theta_i, \\ -1 & \text{otherwise,} \end{cases} \]

where wij is the weight of the connection from unit j to unit i, sj is the state of unit j, and θi is the threshold of unit i.

HNN units can be updated either synchronously or asynchronously. In synchronous updating, all the units are updated at the same time. This method has the disadvantage that it is not effective. In asynchronous updating, only one unit is updated at a time. The unit to be updated can be picked at random or in a pre-defined order.
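Asynchronous updating in a pre-defined order can be sketched as below. This is a minimal plain-Python illustration under stated assumptions: the function names are hypothetical, a unit is set to +1 when its weighted input is greater than or equal to its threshold, and the weight matrix is a made-up example that stores the single pattern (1, -1, 1) with symmetric weights and no self-connections.

```python
def update_unit(states, weights, thresholds, i):
    """Update unit i in place: +1 if its weighted input reaches the threshold, else -1."""
    total = sum(weights[i][j] * states[j] for j in range(len(states)) if j != i)
    states[i] = 1 if total >= thresholds[i] else -1
    return states

def settle(states, weights, thresholds, max_sweeps=10):
    """Sweep over the units one at a time until no state changes (or max_sweeps is hit)."""
    states = list(states)                      # work on a copy
    for _ in range(max_sweeps):
        before = list(states)
        for i in range(len(states)):           # pre-defined update order
            update_unit(states, weights, thresholds, i)
        if states == before:
            break
    return states

# Hypothetical 3-unit network: symmetric weights, zero diagonal, zero thresholds.
weights = [[0, -1, 1],
           [-1, 0, -1],
           [1, -1, 0]]
thresholds = [0, 0, 0]
recalled = settle([1, 1, 1], weights, thresholds)
```

Starting from the corrupted state (1, 1, 1), the network settles on the stored pattern, and further sweeps leave that pattern unchanged.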
With regard to the learning rules in HNN, a rule can be either local or incremental. A learning or training rule is considered local in HNN if each weight is updated based on the information available to the neurons on either side of the connection associated with that particular weight. Conversely, in incremental learning rules, a new pattern can be learned without using information from the previous patterns.

3.3.3 Recursive Neural Network

As the name implies, the same set of weights is applied recursively over a structure. The architecture of a simple recursive neural network is shown in Figure 14.

Figure 14. An architecture of recursive neural network (Hammer et al 2004).

The recursive neural network is trained using one of the widely used algorithms, such as scaled conjugate gradient. The gradient is calculated using backpropagation through structure (BPTS), which is a generalization of the backpropagation through time (BPTT) used in recurrent neural networks (Goller and Kuchler 1996).

In addition, recurrent and recursive neural networks differ. In a recurrent neural network, the hidden representation of the previous time step and the current input are combined to produce a representation of the current time step. Also, the chain of the recurrent neural network is linear. A recursive neural network, on the other hand, operates on a hierarchical structure, where the parent representation arises when the child representations are combined (Hammer et al 2004).

https://en.wikipedia.org/wiki/Recursion

3.4 Nonlinear autoregressive neural network (NARNET)

NARNET is used to make predictions of a time series based on the past values of that particular series (Nyanteh et al 2013 and Lopez 2012). In MATLAB, the nonlinear autoregressive neural network has a function known as narnet. It takes the following arguments:

narnet(feedbackDelays,hiddenSizes,trainFcn)

Table 5.
Parameters for narnet

feedbackDelays    Row vector of increasing 0 or positive delays (default = 1:2)
hiddenSizes       Row vector of one or more hidden layer sizes (default = 10)
trainFcn          Training function (default = 'trainlm')

Modelling time series using a linear model is, more often than not, a difficult task. This is because time series applications have high variations and fast transient durations. Therefore, the need for a better model for such applications becomes imperative. NARNET has been touted to handle some of the complications that are peculiar to time series data. NARNET is given by:

$$ y(t) = h\big(y(t-1), y(t-2), \ldots, y(t-p)\big) + \varepsilon(t) $$

The equation above explains how predictions can be made using NARNET, that is, from the past values of the data series (Ibrahim et al 2016). Here y(t) is the predicted value. The function h() of the previous values is not known, but the past values of y(t) make the prediction realizable. In the world of neural networks, the function h() is approximated during network training by optimizing the weights and biases. The term \varepsilon(t) denotes the error of the approximation of the series. The structure of the NARNET is shown in Figure 15.

Figure 15. Nonlinear autoregressive network (NARNET) (Luiz Gonzaga et al 2016)

The past values, that is, the p values y(t-1), ..., y(t-p), are known as the feedback delays. To obtain the network topology that provides the best performance, several factors need to be considered. Examples of such factors include the training algorithm, the number of hidden layers and the number of neurons, to mention but a few. Usually, adjusting the number of hidden neurons or changing the training algorithm gives better performance. The number of hidden layers and the training algorithm are adjusted and varied by trial and error. It is nevertheless important to understand that the complexity of the system is directly proportional to the number of neurons.
As the number of neurons increases, the network also becomes more complex, although an increased number of neurons gives better generalization efficiency and speed of computation of the network.

NARNET is based on the Levenberg-Marquardt backpropagation (LMBP) algorithm (Alwakeel & Shaaban 2010; Marquardt 1963; Hagan et al 1996). It is the default because it is the fastest type of backpropagation algorithm. In addition, the training duration is very short: the algorithm approaches second-order training speed without the need to compute the Hessian matrix. Instead, it uses the Jacobian matrix for the calculation. NARNET uses either the Mean Square Error (MSE) or the Error Sum of Squares (SSE), and both are given by:

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 $$

$$ \mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 $$

Where:
y_i is the i-th data sample.
\hat{y}_i is the approximated data obtained by the network for the value y_i.
n is the number of data samples.

In this thesis work, the layer recurrent neural network will be used to predict the recurrence and mortality of tongue cancer in the patients. Both NARNET and NARXNET can be considered as a recommendation for future work.

3.5 Nonlinear Autoregressive Neural Network with external input (narxnet)

This is similar to NARNET but differs in the addition of an external input. It is also a nonlinear model that predicts future values based on the past values and also on exogenous or external data supplied to the network. In terms of syntax, it is given by:

narxnet(inputDelays,feedbackDelays,hiddenSizes,trainFcn)

Where:

Table 6. NARXNET parameters

inputDelays       Row vector of increasing 0 or positive delays (default = 1:2)
feedbackDelays    Row vector of increasing 0 or positive delays (default = 1:2)
hiddenSizes       Row vector of one or more hidden layer sizes (default = 10)
trainFcn          Training function (default = 'trainlm')

In the NARX network, known as NARXNET, the network is able to predict the series y(t) given the past values of the series y and of another external series x(t), as shown in the equation below:

$$ y(t) = h\big(x(t-1), \ldots, x(t-p), y(t-1), \ldots, y(t-p)\big) + \varepsilon(t) $$

The external or exogenous series can be single or multidimensional.
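To make the role of the delays concrete, the sketch below builds, in base MATLAB, the lagged regressors that a NARX-type model sees at each time step. The toy series and the choice of two delays are assumptions made only for illustration; no network is trained here.

```matlab
% Toy series, assumed purely for illustration.
y = (1:8)';                  % series to be predicted
x = 10 * (1:8)';             % external (exogenous) series
p = 2;                       % number of feedback delays and of input delays

n = numel(y);
rows = (p + 1):n;            % time steps for which p past values are available

% Each row holds [y(t-1) y(t-2) x(t-1) x(t-2)] for one time step t.
regressors = zeros(numel(rows), 2 * p);
for k = 1:p
    regressors(:, k)     = y(rows - k);
    regressors(:, p + k) = x(rows - k);
end

target = y(rows);            % y(t), the value the network should reproduce
```

Dropping the two x columns leaves exactly the regressors of the purely autoregressive (NARNET) case, which is the only structural difference between the two models.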
The architecture for NARXNET is shown in Figure 16.

Figure 16. The architecture of nonlinear autoregressive network with external inputs (Luiz Gonzaga et al 2016).

NARXNET and NARNET are quite similar and differ only in the addition of external inputs in the case of NARXNET. The training of NARXNET equally uses LMBP. NARXNET produces better performance than NARNET, but the complexity of NARXNET has made NARNET more widely preferred (Safavieh et al 2007). Therefore, in this thesis work, both methods will be used in the analysis of the dataset.

4 ANN SIMULATION OF FIXED AND DYNAMIC DATASETS

This chapter is aimed at examining the performance analysis and error estimation and, most importantly, at gaining a clear understanding of some of the neural network types discussed in Chapter 3. This is in preparation for applying the neural networks to the real data, that is, the case study to be examined in Chapter 5 of this thesis. The dataset used in this chapter was provided by my supervisor, Professor Elmusrati. The dataset was generated by a certain algorithm known to him. My main task is to test the dataset with the neural network types discussed in Chapter 3.

The datasets were sent in two batches. The first batch was a fixed dataset, while the second batch was a dynamic dataset. The datasets were sent in Excel format (xlsx) and can be found as Appendixes I and II respectively in the appendix section of this thesis.

By fixed dataset, it is meant that the values within the rows and columns of the dataset are related by a direct equation without the need for any past or future values. Conversely, dynamic datasets were obtained by establishing a relationship with either the past or the future values, or both, as the case may be. In both cases, and especially in the dynamic datasets, it is possible that the values of the columns and rows change each time the file is opened, thereby giving false results.
To avoid this error, the file was opened once and imported directly into the MATLAB workspace. Therefore, Chapter 4 looks at the neural networks with regard to the dynamic dataset only.

Finally, in this chapter, the MATLAB code, the various plots of performance analysis, the regression plots, and the plots of expected and trained values will be shown without interpreting the plots. This is because the main work of this thesis is in Chapter 5, which deals with the main dataset of interest. All the analysis and explanation of each plot will be given in Chapter 5.

4.1 Simulation exercise with Feedforward Neural Network

Using the dynamic dataset contained in Appendix II, the following commands were issued to MATLAB, as shown in Figure 17.

Figure 17. Feedforwardnet MATLAB code window.

From Figure 17, the performance error was very small. The learning was thus successful.

Figure 18. Neural network training output

It is worthy of note that some of the code might be missing from Figure 17. The output shown in Figure 17 is a truncated output. Similarly, Figure 18 shows the neural network training output.

Figure 19. Performance error plot for feedforwardnet.

The network was trained using the Levenberg-Marquardt algorithm (trainlm) and the error from the training was calculated using the Mean Squared Error.

Figure 20. Target outputs and the neural outputs

The dataset was divided randomly using the dividerand function. Hence, there was no need to manually divide the data into training, validation and testing sets. However, the indices of the samples assigned to each subset are stored in the training record as tr.trainInd, tr.valInd and tr.testInd, and these indices can be used to divide the data into training, validation and testing sets manually. The plot from the training is shown in Figure 19.

Figure 21. Error histogram of the targets and the neural outputs.

The variation between the target output and the neural outputs is shown in Figure 20.
Following Figure 20, the difference between the target and the neural output gives the error shown in the error histogram of Figure 21. The network training state is shown in Figure 22. It is a plot that shows the gradient, the validation checks and the epoch. The epoch gives the iteration at which the network validation performance reached its minimum. In this case, as shown in Figure 22, the network validation reached the minimum at about 1000 epochs.

Figure 22. Training state of the network.

It is important to mention that it is possible to have errors after the training, and the results might not be as expected. To improve the results in this case, it is always good practice to initialize the network again and repeat the training. This is because each time the network is initialized, especially a feedforwardnet, the network parameters are different, and thus the results might be different on each occasion. Also, the number of hidden neurons can be increased. It is important to mention that the number of hidden layer neurons should not be unnecessarily large, to avoid under-characterization issues. Finally, the training algorithm can be changed, for example from Levenberg-Marquardt to Bayesian regularization training.

The regression plot is shown in Figure 23. It shows the relationship between the variables that make up the inputs and, most importantly, how the inputs and outputs are related. The dashed line in the plot indicates how the expected output (target output) relates to the trained outputs (neural outputs), while the solid line indicates the linear regression between outputs and targets.

Figure 23. Regression analysis of the network training.

The regression value, R, shown in Figure 23 for the training, testing and validation had the value of 1. Hence, it is an indication of an exact linear relationship between the targets and the neural outputs.
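The regression value R and the fitted line discussed above can also be computed by hand in base MATLAB. The sketch below uses made-up targets and outputs that agree exactly (an assumption for illustration, not values from Appendix II), which is the situation in which R equals 1.

```matlab
% Hypothetical targets and network outputs, made up for illustration.
t = [0.1 0.4 0.5 0.9 1.2];
y = [0.1 0.4 0.5 0.9 1.2];   % outputs that match the targets exactly

% Correlation coefficient R between targets and outputs.
C = corrcoef(t, y);
R = C(1, 2);

% Least-squares line: output = m * target + b.
coeffs = polyfit(t, y, 1);
m = coeffs(1);               % slope, close to 1 for a perfect fit
b = coeffs(2);               % offset, close to 0 for a perfect fit
```

When the outputs drift away from the targets, R falls below 1 and the fitted slope and offset move away from 1 and 0 respectively.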
Finally, a simple plot of the expected or target output against the neural output is shown in Figure 24. It can be seen from Figure 24 that there is not much difference between the target output and the neural output plots. It is quite difficult to observe any difference in the plots.

Figure 24. Plot of Target and Neural Outputs of feedforward network

This is because the network effectively learned the relationship between the inputs and the output. Consequently, the performance error was quite insignificant, as shown in Figure 17. Similarly, the regression plot of Figure 23 was also a strong indication that the network had effectively learned the relationship between the input variables and the output.

4.2 Simulation using Elman Neural Network

From the dataset contained in Appendix II, the network performance of the Elman neural network can be examined. Unlike the feedforward neural network, which uses the function feedforwardnet, the Elman neural network uses the elmannet function, with the input and target time series prepared by another function called preparets. Though the Elman neural network is also a variant of feedforwardnet, the difference lies in the training function. The dataset was randomly divided using dividerand, but the default training function for elmannet was Gradient Descent With Momentum and Adaptive LR, using the traingdx function.

Figure 25. Elman neural network command window

The MATLAB command window of the Elman neural network is shown in Figure 25. It was observed that this function did not train the network effectively; thus, I changed the training function to the Levenberg-Marquardt function, trainlm. Furthermore, the number of hidden neurons was increased from 10 to 20 to ensure better performance, as shown in Figure 26.

Figure 26. Training window for Elman neural network.

The truncated outputs of the target and the trained output, that is, the neural output, are shown in Figure 27.
Despite the change in the training algorithm and the increase in the number of hidden neurons, the Elman neural network did not learn the relationship between the inputs and the outputs effectively. This is because the Elman neural network uses a simplified derivative calculation known as staticderiv. This simplified derivative calculation is known to ignore delayed connections. Thus, the learning was not as efficient as that of the feedforwardnet discussed in Section 4.1.

Figure 27. Target and Neural outputs of the Elman neural network.

Therefore, the need to check other networks becomes imperative. The advent of full derivative calculations, such as fpderiv and bttderiv, gives researchers a wide range of neural networks to choose from for better performance. Based on the failure of the Elman neural network to perform the learning effectively, it is important to examine other networks, such as timedelaynet, layrecnet, narnet and narxnet, for better training and efficient prediction of the input-output relationships. These will be shown in subsequent sections of Chapter 4 of this thesis.

To probe further into why the Elman network did not learn as expected, the performance of the network is shown in Figure 28. This plot shows the performance on the data when divided into training, validation and testing sets. The Mean Squared Error was used in the calculation of the training performance and, as shown in Figure 28, the training ran for 6 epochs, with the best fit occurring at the 4th epoch.

Figure 28. Training performance of Elman neural network.

The regression plot of Figure 29 clearly shows that there exists some level of relationship for the training, testing and validation data respectively. The regression value, R, was not exactly 1 but close to 1, which means that there is, to a large extent, a relationship between the inputs and the output.
Furthermore, the 'all' plot shown on the regression plot in Figure 29 is an indication of the extent of the relationship between the inputs and the output.

Figure 29. Regression plot of Elman neural network training

The clusters of points on each plot show how many of the variables have no relationship between the inputs and the outputs. Interestingly, the training was thorough, as shown on the training panel of the regression plot.

Finally, to put the difference between the target and the neural outputs succinctly, Figure 30 provides a quick view of the variation between the target and the neural output.

Figure 30. Elman neural network plot of target and neural outputs

Based on Figure 30, it can be seen that variation occurs only on a few occasions, that is, about 8 times (8 rows) out of the 100 rows considered. The Elman neural network learned well, but it was not as effective as the feedforward neural network.

4.3 Timedelay Neural Network's simulation exercise

The timedelay neural network is of paramount importance because its input weights have a tap delay line associated with them. Hence, it is mostly preferred for time series input data, because it has a finite dynamic response. Levenberg-Marquardt was the training method used and the dataset was randomly divided using dividerand. The command window implementation of the timedelay neural network is shown in Figure 31.

Figure 31. Command window for time delay neural network.

Both the training algorithm and the number of hidden neurons were changed to ensure that the network effectively learned the relationship. The training algorithm used is shown in Figure 32. However, these efforts of changing the training algorithm and increasing the number of hidden neurons had no positive impact on the performance of the network.

Figure 32. Training window of timedelay neural network.
As shown in Figure 33, a great deal of variation was observed between the target and the neural output. The reason for such great disparities in values remains unknown, even after changing the training algorithm to more sophisticated ones such as trainscg and trainrp. The output is as given below:

Figure 33. Target and neural outputs of timedelay neural network.

The regression analysis plot shown in Figure 34 clearly summarizes the fact that the network failed to learn the relationship effectively.

Figure 34. Regression plot of the relationship between the target and the neural outputs.

The regression value, R, indicates the degree of the relationship. Finally, the target and the neural outputs are plotted together in Figure 35 to give a clear view of the variation between the target and the neural output.

Figure 35. Variation in target and neural output plot.

It is, however, worth mentioning that an improvement in the learning efficiency was observed, in terms of the performance and of how close the target values were to the neural output, when the distributed delay neural network, distdelaynet, was used. This is because it uses delays on the layer weights as well as on the input weights. However, this is beyond the scope of this thesis.

4.4 Layer Recurrent Neural Network

The layer recurrent neural network uses the function layrecnet, and the command window is shown in Figure 36.

Figure 36. Command window for layer recurrent neural network.

Similarly, Figure 37 shows some of the learning outcomes for both the target and the neural outputs.

Figure 37. Target and neural outputs of layer recurrent neural network.

From Figure 37, it can be said that the layer recurrent neural network perfectly learned the relationship between the inputs and the output.

Figure 38. Learning window of layer recurrent neural network.

Levenberg-Marquardt was also used in the training and the dataset was randomly divided. The performance was calculated using the Mean Squared Error.
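The Mean Squared Error used as the performance measure throughout this chapter, together with the related Error Sum of Squares, can be computed by hand as in the base MATLAB sketch below. The targets and outputs are made-up numbers used only for illustration, not values from Appendix II.

```matlab
% Hypothetical targets and network outputs, made up for illustration.
t = [1 2 3 4];
y = [1.5 2 2.5 4];

e = t - y;                        % error for each sample
sseValue = sum(e .^ 2);           % error sum of squares (SSE)
mseValue = sseValue / numel(e);   % mean squared error (MSE)
```

Here two of the four outputs are off by 0.5, so the SSE is 0.5 and the MSE is 0.125.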
The plot shown in Figure 39 gives a quick view of the values of the target and the neural output.

Figure 39. Target and neural outputs of layer recurrent neural network.

The plot in Figure 39 shows that the neural network effectively learned the relationship; thus, there was no observable difference between the target and the neural output.

4.5 Nonlinear Autoregressive Neural Network (NARNET)

The nonlinear autoregressive neural network is trained using the function narnet. The command window for narnet is shown in Figure 40.

Figure 40. Command window for the nonlinear autoregressive neural network.

It is important to examine the performance of narnet, as it offers full dynamic derivative calculation. Although this neural network will not be considered in the analysis of the data contained in Appendix IV, it is necessary to introduce it, as it will be recommended as the neural network for future analysis due to its use of the fpderiv algorithm. As with the previously enumerated neural networks, the dataset was randomly divided.

Figure 41. Training window of the nonlinear autoregressive training.

Levenberg-Marquardt was also used in the training. Some of the outputs are shown in Figure 42. From the output, clear disparities between the values of the target and the neural output can be seen.

Figure 42. Narnet training results showing both target and neural outputs.

In addition, the regression plot shown in Figure 43 indicates that the neural network did not effectively learn the relationship between the inputs and the output.

Figure 43. Narnet regression plot of the learning process.

The regression value was R = 0.17587, which is well below 0.5. It therefore means that there was hardly any relationship between the target and the neural output.

To put the relationship in a more readable format, Figure 44 gives a quick insight into the variation between the expected output and the trained output.
Figure 44. A graph of the target and neural values after training with Narnet.

Since there was not much agreement between the expected value and the trained value, and since changing the training algorithm and increasing the number of hidden neurons did not have any meaningful effect on the output, there is a need to examine a variant of the nonlinear autoregressive neural network known as the nonlinear autoregressive neural network with external inputs. This will be examined in the final section of this chapter.

4.6 Nonlinear Autoregressive Neural Network with External Inputs

The function used by this network is known as narxnet. It is similar to narnet with the exception of the external inputs. Appendix II was used as the training dataset for this network, and the MATLAB command window is shown in Figure 45.

Figure 45. Narxnet MATLAB command window.

Interestingly, narxnet is able to make predictions given the past values of the series (the feedback input) and another time series called the external or exogenous time series, hence the name nonlinear autoregressive neural network with exogenous input. However, this feature is not needed for the purpose of this thesis, as this thesis deals with patients' data and the physiological conditions vary from patient to patient.

The training algorithm used was Levenberg-Marquardt and the dataset was randomly divided, as shown in Figure 46.

Figure 46. Training window for narxnet.

The target and neural outputs are shown in Figure 47.

Figure 47. Narxnet target and neural outputs.

The regression plot in Figure 48 indicates the input-output relationship. The value on the regression plot showed that there was little or no relationship.

Figure 48. Regression plot for narxnet.

Finally, a quick view of the target and the neural plot is shown in Figure 49.

Figure 49. Narxnet target and neural outputs.
In conclusion, based on the use of the dataset contained in Appendix II on the different neural networks in Chapter 4 of this thesis, it can be concluded that feedforwardnet, elmannet and layrecnet provided a good learning outcome for the relationship between the inputs and the output. In other words, the values of the target and the neural outputs are in agreement beyond reasonable doubt. Therefore, these will be the neural networks considered for the case study analysis in the next chapter, that is, Chapter 5.

5. ANN SIMULATION OF THE TONGUE CANCER'S CASE STUDY

This chapter is aimed at examining the objectives of the thesis as outlined in Chapter 1. As pointed out in the introductory chapter of this thesis, cancer is a dreadful disease and the need to be proactive in its diagnosis and treatment becomes imperative. Tongue cancer, that is, oral (mobile) tongue Squamous Cell Carcinoma (SCC), is not an exception, although the tongue is characterised by a high amount of muscle bundles, which may inhibit the potential spread of a tumour on it. Despite this fact, detection and prognostic studies are very important in order to record significant success in the crusade against cancer. While the detection of cancer has to do with having the information that the patient has cancer at an early stage, the prognosis, on the other hand, examines the likelihood of survival of the patients.

It is worthy of note that early detection of tongue cancer is not always an indicator of a good prognosis (chance of survival from treatment). This is because evidence has shown that about 20% to 40% of cases had spread vigorously to other parts of the body (metastasis) (Ganly et al 2012 & Ho et al 1992). Thus, prognostic studies are poised to divide the patients into two groups. Firstly, the patients whose tongue cancer is severe and who would thus need aggressive treatment, such as chemotherapy or multimodal therapy.
Secondly, the patients who would need surgical treatment alone (Kellermann et al 2007). This important classification into low and high risk will represent a major advancement in the management of this dreadful disease.

Clinical size classification into T1 and T2 of early oral tongue SCC was unable to divide the patients into low risk and high risk respectively (Keski-Santti et al 2007). Hence, the need to look for parameters to be used in the prognostication of cancer becomes imperative. Previous studies have used histomorphological parameters such as depth of invasion, tumour budding and histological risk, to mention but a few (Almangush et al 2013). Most of these histomorphological parameters make up the columns of Appendix III. In addition, the meaning of some of these parameters is defined in Section 5.1.

5.1 Definition of SCC related terms

5.1.1 Tumour Budding

Tumour budding means loss of cellular cohesion together with active invasive movement. Both of these are properties of malignancy.

5.1.2 Tumour size, Prognosis and Metastasis

Prognosis is a measure of survival, while metastasis defines the spread of cancer from the affected part of the body to another. Tumour size is the diameter of the tumour (Baran et al 2015).

5.1.3 Depth of Invasion (DOI)

This is a measure of the thickness of the tumour. In other words, it defines the extent of the growth of the tumour. There are different patterns of invasion, with the worst pattern of invasion (WPOI) known as type 4. There are other types, such as type 5 (tumour satellites), and also perineural invasion (PNI).

5.1.4 Symptom

A symptom can be said to be an indicator that a person or patient has changed from the normal condition or feelings to an unusual state or feelings due to the presence of something, such as a disease. In terms of cancer, it can be interpreted as a measure of the aggressiveness of the tumour, which has significant effects on prognosis (Baran et al 2015).
5.1.5 Pathological Stage (TNM Stage)

Pathological stage (cTNM) classifies the cancer in terms of the tumour size and location. TNM is nowadays the commonly used pathological staging and it is defined below:
T size and location of the main t