UNIVERSITY OF VAASA         

FACULTY OF TECHNOLOGY 

COMMUNICATIONS AND SYSTEMS ENGINEERING 

 
Alabi Rasheed Omobolaji 

PREDICTION OF RECURRENCE AND MORTALITY OF ORAL TONGUE CANCER USING ARTIFICIAL NEURAL NETWORK

(A case study of 5 hospitals in Finland and 1 hospital from Sao Paulo, Brazil)  

 
Master’s thesis for the degree of Master of Science in Technology submitted for inspection, 

In Vaasa, 12.08.2017 
 

Thesis Instructor  Dr Alhadi Almangush 

Thesis Supervisor Professor Mohammed Elmusrati 



 
ACKNOWLEDGMENT 

In the name of Allaah, the Most Beneficent, the Most Merciful. All thanks to Allaah, and I beseech His peace and benedictions on the noblest of mankind, Muhammad, peace and blessings of Allaah be upon him, and on the generality of Muslims till the day of accountability.

First and foremost, I would like to express my profound appreciation to my supervisor, 

Professor Mohammed Elmusrati, for his guidance throughout the development of this work 

and also throughout my Master’s program. You are indeed a role model. I have gained both 

academic knowledge and knowledge towards a unique approach to life and situations from 

you. It is a rare opportunity to work with you and I really appreciate the opportunity. Without 

an iota of doubt, you have left a positive mark in my life and I will always remain grateful. I 

am very proud to be your student. 

Similarly, I sincerely appreciate the efforts, contributions and continuous monitoring of the progress of the work by my instructor, Dr Alhadi Almangush. It was indeed a good learning experience to have worked with you. The professionalism and maturity shown during the course of this work were well appreciated. Most importantly, I am grateful for the opportunity given to me to work with the dataset of the cancer patients. I thank you for the guidance throughout this work. I will forever be grateful to you.

Furthermore, my deepest appreciation goes to my beloved mother for her unconditional love and unending support right from my primary school days. My mother is indeed the best teacher that I have ever known. Talking to her alone gives me the joy and happiness needed to continue with my day-to-day activities.

The whole of my Master's program (MSc) and this thesis are specially and lovingly dedicated to my wonderful wife, Ummu Mu'adh - Atunrase Mistura. She encouraged me to pursue this MSc program. Her emotional support was vital, given the long weekly separation and discomfort we both endured to accomplish this feat. It is not easy travelling Helsinki-Vaasa-Helsinki on a weekly basis. She believed in me more than I believed in myself, and that gave me the courage to persevere whenever I hit roadblocks. I love you so much.

 

 
To my lovely son, Mu'adh Adebayo, may Allaah bless you. Thinking about you alone is enough for me to be happy. I love you so much. It was not easy to leave you in Helsinki on a weekly basis while I was busy in Vaasa with my studies. I sincerely appreciate your understanding. I pray to Allaah to make you a scholar of high repute.

I am thankful to Professor Timo Mantere, Tobias Glocker, Ahmed Elgrgouri, Dr Ali Altowati and Shaima AbdulMageed for their immense contribution in deepening my knowledge through the various courses taught in this Master's degree program. This acknowledgement will not be complete without showing appreciation to my brother and friend, AbdulRahman Olaobaju, for his understanding and the numerous tutorials that made sure I kept up with my studies. To all the members of academic and non-academic staff of the Faculty of Technology, Communications and Systems Engineering Group, especially Marjukka Isaksen, I say a big thank you for your contribution to the success of this program. My classmates must also be acknowledged for their support and positive contribution during the course work.

 
Vaasa, Finland, August, 2017, 

Alabi Rasheed Omobolaji 

 

 
CONTENTS

ACKNOWLEDGMENT
CONTENTS
SYMBOLS
ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1  INTRODUCTION
   1.1 The Dataset
   1.2 Motivation
   1.3 Thesis Structure
2  FUNDAMENTALS OF ARTIFICIAL NEURAL NETWORKS
   2.1 Artificial Neural Network
   2.2 Classification of ANN
   2.3 Training of ANN
   2.4 Training Algorithm
   2.5 Advantages and Disadvantages of ANN
3  APPLICATION OF ANN IN MEDICINE
   3.1 Artificial Neural Networks in Medicine
   3.2 Types of Neural Network used in the thesis
      3.2.1 Feedforward Neural Network (feedforwardnet)
      3.2.2 Elman Neural Network (ENN) (elmannet)
      3.2.3 Timedelay Neural Network (timedelaynet)
   3.3 Layer Recurrent Neural Network (LRNN)
      3.3.1 Fully Recursive Neural Network
      3.3.2 Hopfield Neural Network
      3.3.3 Recursive Neural Network
   3.4 Nonlinear Autoregressive Neural Network (NARNET)
   3.5 Nonlinear Autoregressive Neural Network with External Input (NARXNET)
4  SIMULATION OF FIXED AND DYNAMIC DATASETS
   4.1 Simulation exercise with Feedforward Neural Network
   4.2 Simulation using Elman Neural Network
   4.3 Timedelay Neural Network Exercise
   4.4 Layer Recurrent Neural Network
   4.5 Nonlinear Autoregressive Neural Network
   4.6 Nonlinear Autoregressive Neural Network with External Input (NARXNET)
5  ANN CASE STUDY SIMULATION OF TONGUE CANCER
   5.1 Definition of SCC related terms
      5.1.1 Tumour budding
      5.1.2 Tumour Size, Prognosis and Metastasis
      5.1.3 Depth of invasion
      5.1.4 Symptoms
      5.1.5 Pathological Stage
   5.2 ANN in clinical Prognostication
   5.3 Data collection and training process
   5.4 Neural Network for predictions
      5.4.1 Prediction of recurrence from feedforward network
      5.4.2 Prediction of statuslatest for feedforward network
   5.5 Prediction of statuslatest for feedforward
      5.5.1 Prediction of mortality from feedforward network
   5.6 Elman neural network for the prediction of recurrence and mortality
      5.6.1 ENN for recurrence prediction
   5.7 Layer Recurrent Neural Network for the prediction
      5.7.1 Prediction of Recurrence of Tongue cancer using Layer Recurrent Neural Network for the prediction of recurrence
      5.7.2 Prediction of Mortality of Tongue cancer using Layer Recurrent Neural Network for the prediction of recurrence
   5.8 Prediction of mortality using LRNN
   5.9 Analyses of the dependency of variables
      5.9.1 Analyses of variables dependencies for recurrence
      5.9.2 Verification of the newly proposed dependent markers
      5.9.3 Variables dependencies on the prediction of mortality
      5.9.4 Using the new markers to predict mortality
   5.10 Sigmoid function on the output layer
      5.10.1 Prediction of Recurrence based on sigmoidal function output
      5.10.2 Prediction of Mortality based on sigmoidal function output
6  CONCLUSION AND FUTURE WORK
REFERENCES
APPENDIXES

 

 
SYMBOLS 

ɛ (t) Error of approximation of the series 

θi Threshold of the unit i

a1......an Inputs 

bj Thresholds 

Di1....DiN Delays

f(x) Sigmoid function 

f'(x) Derivative of the sigmoid function, used in back propagation

h() Previous Values 

I1(t).....IM(t) Inputs

N1....NM Number of Delays 

n Number of data sample 

net Neural Network 

Oj Output 

Sj State of Unit J 

T1 Tumour Size one 

T2 Tumour Size two 

trainFcn Training Function 

u Weights 

v Weights 

Wnj Weights 

x(k) Outputs of the hidden layer 

xc(k) Outputs of context layer 

ý Approximated data obtained by the network  for value yi 



 
yi i-th Data Sample 

yj(t) Output of hidden layer 

yk(t) Final Output 

y(t) Data series for prediction 

y(t-1)..y(t-p) Past Values / Feedback Delays 



 
ABBREVIATIONS 

 
AI Artificial Intelligence. 

ANN Artificial Neural Network. 

BAM Bidirectional Associative Memory

BPTS Back Propagation Through Structure  

BPTT Back Propagation Through Time  

CAFs Cancer Associated Fibroblasts 

cTNM Clinical Tumour Size, Lymph Nodes, Metastasis (TNM) staging.

DNA Deoxyribonucleic Acid

dividerand Function that divides the dataset randomly into training, validation and testing sets

ENN Elman Neural Network. 

elmannet Elman Neural Network function

feedforwardnet Feedforward Neural Network function 

FS Feature/Input Selection 

GA Genetic Algorithm 

HNN Hopfield Neural Network 

ICT  Information Communication Technology  

layrecnet Layer Recurrent Neural Network function 

LHR Lymphocytic Host Response 

LMBP Levenberg-Marquardt Backpropagation

LRNN Layer Recurrent Neural Network 

LVQ Learning Vector Quantization  

RL Reinforcement Learning  

RNN Recurrent Neural Network 



 
NAR Nonlinear Auto Regressive 

NN Neural Network. 

NARNET Nonlinear Autoregressive Neural Network function 

NARXNET Nonlinear Autoregressive Neural Network with external input function

MATLAB MathWorks Simulation Tool R2014b. 

MLP  Multi-Layer Perceptron 

MSE Mean Square Error 

PNI Perineural Invasion

SCC Squamous Cell Carcinoma. 

SGD Stochastic Gradient Descent 

SMA Smooth Muscle Actin

SSE Sum of Squares Error 

SVM Support Vector Machine. 

tr.trainInd Training record field holding the indices of the samples in the training set

tr.valInd Training record field holding the indices of the samples in the validation set

tr.testInd Training record field holding the indices of the samples in the testing set

trainFcn Training Function 

timedelaynet Time Delay Network 

TDN Time Delay Neuron 

TDNN Time Delay Neural Network 

VALVIRA National Supervisory Authority for Welfare and Health. 

WPOI Worst Pattern Of Invasion 



 
LIST OF FIGURES

Figure 1. Artificial neuron and its components (Hassan et al 2016).
Figure 2. Feed-forward network structure (Tahmasebi et al 2011).
Figure 3. Elman simple recurrent neural network (Elman 1990).
Figure 4. Single layer feed-forward network.
Figure 5. The structure of multi-layer perceptron (Hassan et al 2016).
Figure 6. Neural Network training Structure.
Figure 7. Single node anatomy of ANN (Hassan et al 2016).
Figure 8. Block Diagram of the Elman Neural Network (Kannathal 2006).
Figure 9. Structural representations of Elman (Yin and Chen 2016).
Figure 10. Single time delay neuron (TDN) with inputs and delays at time (Hongying et al 2016).
Figure 11. Architecture of TDNN neural network (Hongying et al 2016).
Figure 12. Layer recurrent network architecture (MATLAB 2017).
Figure 13. A four-node Hopfield Neural Network.
Figure 14. An Architecture of recursive neural network (Hammer et al 2004).
Figure 15. Nonlinear autoregressive network (NARNET) (Luiz Gonzaga et al 2016).
Figure 16. The architecture of nonlinear autoregressive network with external inputs (NARXNET) (Luiz Gonzaga et al 2016).
Figure 17. Feedforwardnet MATLAB code window.
Figure 18. Neural Network training output.
Figure 19. Performance error plot for feedforwardnet.
Figure 20. Target outputs and the neural outputs.
Figure 21. Error histogram of the targets and the neural outputs.
Figure 22. Training state of the network.
Figure 23. Regression plot of the network training.
Figure 24. Plot of Target and Neural Outputs of feedforward network.
Figure 25. Elman neural network command window.
Figure 26. Training window for Elman neural network.
Figure 27. Target and Neural outputs of the Elman neural network.
Figure 28. Training performance of Elman neural network.
Figure 29. Regression plot of Elman neural network training.
Figure 30. Elman neural network plot of target and neural outputs.
Figure 31. Command window for time delay neural network.
Figure 32. Training window of timedelay network.
Figure 33. Target and neural outputs of timedelay neural network.
Figure 34. Regression plot of timedelaynet of the relationship between the target and the neural outputs.
Figure 35. Variation in target and neural output plot.
Figure 36. Command window for layer recurrent neural network.
Figure 37. Target and neural outputs of layer recurrent neural network.
Figure 38. Learning window of layer recurrent neural network.
Figure 39. Target and neural outputs of layer recurrent neural network.
Figure 40. Command window for nonlinear autoregressive neural network.
Figure 41. Training window of nonlinear autoregressive training.
Figure 42. Narnet training results showing both target and neural outputs.
Figure 43. Narnet regression plot of the learning process.
Figure 44. A graph of target and neural values after training with Narnet.
Figure 45. NARNET MATLAB command window.
Figure 46. Training window for narxnet.
Figure 47. Narxnet target and neural outputs.
Figure 48. Regression plot for narxnet.
Figure 49. Narxnet target and neural outputs.
Figure 50. Data collection and training process.
Figure 51. The design of the network with desired inputs and outputs.
Figure 52. Schematic diagram of the output.
Figure 53. The command window showing codes for feedforward network.
Figure 54. The expected and trained output after training with neural network.
Figure 55. A plot of the extent of variation between target and neural output.
Figure 56. Training summary for feedforward neural network of the real data.
Figure 57. Regression plot from feedforward neural network of the real dataset.
Figure 58. Prediction of recurrence as an output of given new inputs.
Figure 59. Prediction of recurrence based on newly formed inputs.
Figure 60. The training summary of status latest as output.
Figure 61. The command window code for statuslatest as output.
Figure 62. Expected and neural output where mortality was the output variable.
Figure 63. The representation of the target and neural output where statuslatest was the output.
Figure 64. The feedforward network performance when mortality was the output.
Figure 65. Regression plot for mortality as output using feedforward network.
Figure 66. Controlled prediction using one of the known input rows.
Figure 67. Uncontrolled predictions of arbitrary inputs.
Figure 68. Output from resilient backpropagation training algorithm.
Figure 69. The performance measurement of 20 inputs with a changed training algorithm.
Figure 70. The training summary of ENN on the real data.
Figure 71. The default training algorithm for ENN.
Figure 72. Performance measure of ENN on the real data.
Figure 73. Expected and neural output of ENN on the real data.
Figure 74. Layer recurrent neural network for prediction of recurrence and mortality.
Figure 75. Training algorithm for layer recurrent neural network.
Figure 76. The command window showing the performance of LRNN.
Figure 77. The expected and neural output for LRNN in recurrence prediction.
Figure 78. The expected and neural output for prediction of recurrence in layer recurrent neural network.
Figure 79. The regression plot of the training phase of the layer recurrent network for recurrence prediction.
Figure 80. Prediction of recurrence using layer recurrent network.
Figure 81. Arbitrary input prediction for recurrence using layer recurrent network.
Figure 82. The performance output for the prediction of mortality.
Figure 83. Output of the performance when the algorithm was changed.
Figure 84. Layer recurrent neural network with increased number of neurons for mortality prediction.
Figure 85. Training summary of layer recurrent with increased hidden neurons.
Figure 86. Mean Square Error performance of layer recurrent neural network.
Figure 87. Expected and trained outputs after increased hidden neurons for layer recurrent.
Figure 88. Error histogram plot of the difference between the expected and the trained value.
Figure 89. The prediction of mortality using layer recurrent network.
Figure 90. Prediction of mortality using randomly predicted inputs.
Figure 91. The network performance when all the inputs are used.
Figure 92. Training network for the new markers.
Figure 93. The performance of the neural network with the new inputs.
Figure 94. Expected and neural output using the new markers.
Figure 95. The controlled prediction using the new markers.
Figure 96. Randomly generated input for the prediction of recurrence.
Figure 97. Input and output summary for the dependencies analysis.
Figure 98. The performance of the network for prediction of mortality.
Figure 99. The training performance of the network with the new markers for mortality.
Figure 100. Controlled prediction of mortality using the new markers.
Figure 101. Random prediction of inputs for mortality.
Figure 102. Neural output with activation function.
Figure 103. Prediction of recurrence from sigmoidal neural output.
Figure 104. Arbitrary input prediction on sigmoidal function layer for recurrence.
Figure 105. Sigmoidal output for mortality.

 

 
LIST OF TABLES

Table 1. Training Algorithms for ANN.
Table 2. Feedforwardnet parameters.
Table 3. Syntax parameters for Elman Neural Network.
Table 4. Layer recurrent parameters.
Table 5. Parameters for NARNET.
Table 6. NARXNET parameters.
Table 7. Explanation of inputs and output variables.
Table 8. Important markers for the prediction of recurrence of tongue cancer.
Table 9. Order of significance of identified markers by ANN for recurrence prediction.
Table 10. Important markers for the prediction of mortality.
Table 11. Order of significance of identified markers by ANN for mortality prediction.
Table 12. Combined markers found to be important for ANN.
Table 13. Sigmoidal function analysis on the neural output.
Table 14. Sigmoidal neural output for mortality prediction.

 

 
UNIVERSITY OF VAASA  

Faculty of Technology

Author:  Alabi Rasheed Omobolaji 

Topic of the Thesis:  Prediction of Recurrence and Mortality of Oral Tongue Cancer using Artificial Neural Network

Supervisor:  Professor Mohammed Salem Elmusrati 

Instructor:  PhD Alhadi Almangush 

Degree:  Master of Science in Technology 

Major Subject:  Communications and Systems Engineering

Year of Entering the University:  2015  

Year of Completing the Thesis:  2017  Pages: 180 

ABSTRACT 

Cancer is a dreadful disease that has caused the death of millions of people. It is characterized by an uncontrollable growth of cells that form lumps or masses of tissue known as tumours. It is a concern to all because these tumours mostly release hormones which have a negative impact on the body. Data mining approaches, statistical methods and machine learning algorithms have been proposed for effective cancer data classification. In this thesis, Artificial Neural Networks (ANN) were used to predict the recurrence and mortality of oral tongue cancer in patients. ANN was also used to examine the diagnostic and prognostic factors, with the aim of determining which of these factors influence the prediction of recurrence and mortality. Three different ANNs were applied in the learning and testing phases in order to find the most effective technique: the Elman, Feedforward, and Layer Recurrent neural networks. The Elman neural network was not able to make acceptable predictions of the recurrence or the mortality of tongue cancer based on the data. In contrast, the Feedforward neural network captured the relationship between the prognostic factors and correctly predicted recurrence; however, it failed to predict mortality based on the patients' data. The Layer Recurrent neural network was very effective and successfully predicted both the recurrence and the mortality of oral tongue cancer in patients. The constructed Layer Recurrent neural network was then used to investigate the correlation between the prognostic factors. It was found that only 5 of the 11 prognostic factors in the data sheet had a considerable impact on recurrence and mortality: grade, depth, budding, modified stage, and gender. Time in months and disease-free months were also used to train the network.

KEYWORDS: Artificial Neural Network, Feedforward, Elman, Layer Recurrent, 

Recurrence, Mortality, Prediction, Prognostic factors, Cancer, Oral Tongue 

 

 
1 INTRODUCTION 

 
Cancer is a distressing, alarming and dreadful disease that has caused many deaths. The number of cancer-related deaths makes cancer a concern to medical practitioners. Cancer is one of the main causes of death in many developed nations. Similarly, in developing countries, the impact of cancer, as well as diabetes, as a major contributor to the death rate cannot be overemphasized (Shikha & Jitendra 2015). Accurate classification of tumors is very important for the treatment of cancer. As with other diseases, proper identification, classification, and prediction are key factors in achieving efficient and effective treatment and management for cancer patients. Therefore, proper identification of cancer cells is an important step. Even in developed countries with advanced and up-to-date medical facilities, ineffective cancer classification methodology has been a major cause of cancer-related death, because cancer classification has been based on clinical and histopathological facts.

 
More often than not, this classification approach produces incomplete or misleading results. Thus, the need to consider other options for the classification of cancer in patients becomes imperative. DNA microarrays and molecular-level diagnostics, to mention but a few, offer a way of classifying cancer (Oliver et al 2009:157; Wang et al 2005). Although both DNA microarray and molecular-level diagnostics provide accurate prediction and diagnosis of cancer, gene expression data generally comprise a huge number of genes, which has been a major concern for stakeholders in the medical discipline. Hence, a better approach to efficient and effective classification and prediction, drawing on other disciplines and fields of study, becomes imperative.

 
Data mining approaches, statistical methods and machine learning algorithms have been proposed to evaluate these data effectively (Sung-Bae Cho & Hong-Hee 2007). As a result, support vector machines (SVM), k-nearest neighbor and neural network techniques, to mention but a few, have attracted more attention in recent years. Specifically, the use of Artificial Neural Networks (ANN) in medical research has been increasing. ANN has been extensively employed in medical areas such as urology, radiology, medical microbiology and medical biochemistry. Its use in other medical areas, as well as in the analysis of patients' data, is growing rapidly.



 
Neural network models have been successfully implemented in various classification problems. Neural network techniques are very useful for the detection, prediction, and monitoring of cancer. For instance, they were recently used in the clustering and classification of gene expression data. Two kinds of models are used: supervised and unsupervised. Classification can be achieved with supervised models, while unsupervised models are used for clustering. Classification is crucially important for cancer diagnosis and treatment. Artificial neural networks have proven to be a very effective method for pattern recognition, which makes them very useful for the diagnosis of cancer at very early stages.

Oral tongue cancer is the most common and the most aggressive epithelial cancer diagnosed within the oral cavity. The incidence of oral tongue cancer has increased tremendously, and it has thus attracted the attention of clinicians and researchers. Therefore, the focus of this thesis is to examine the use of ANN in the prediction of oral tongue cancer recurrence and mortality. In addition, the thesis examines ANN as a means to determine which of the prognostic factors are needed in the prediction of recurrence and mortality. Although classification, as used in medicine, includes detection, prediction, and treatment based on certain parameters, this thesis concentrates on prediction based on the learning outcome of MATLAB ANN using the dataset provided, that is, the case study. In addition, fixed and dynamic datasets generated by the supervisor based on a certain algorithm are tested to check the effectiveness of the code prior to testing with the real data, that is, the case study data.

Furthermore, this thesis also attempts to examine whether modified_stage is a good prognostic factor to be considered in the prediction of recurrence and mortality. This can be achieved by removing certain columns from the given dataset during training in MATLAB ANN to see whether such a column has any effect on the expected outcome.

In conclusion, the thesis aims to offer clinicians an efficient approach to predicting a patient's situation. The dataset is described in Section 1.1. Section 1.2 explains the motivation for the thesis, which focuses on tongue cancer patients.

 

 
1.1 The Dataset 

This research entails the use of data from patients. Therefore, ethical and privacy concerns were given serious consideration. The use of patient data was approved by the National Supervisory Authority for Welfare and Health (VALVIRA) (Almangush et al 2013). Despite this approval, it is worthy of note that the identification number used in this data set for each patient was coined and developed by the author of this research work. This measure further protects privacy, as this research work will be publicly accessible (online or in print) through the University of Vaasa. Hence, the identification given here has no relationship with the identification number contained in the original data. The diagnostic histological slides of 340 patients with T1/T2 N0M0 oral tongue Squamous Cell Carcinoma (SCC) managed between 1979 and 2009 at the University Hospitals of Helsinki, Oulu, Turku, Tampere, and Kuopio were collected from the hospitals' archives. Similarly, the data of patients from one hospital in Sao Paulo, Brazil, were also included. The criteria for inclusion of cases were as described in (Brandwein-Gensler et al 2010).

1.2 Motivation 

Undoubtedly, early detection of any disease, and of cancer in particular, gives good insight into the disease and, ultimately, into its practical management. In cancer detection and prediction, machine learning methods such as Artificial Neural Networks are among the methods being investigated. ANNs offer a unique and efficient approach to cancer prognosis due to their ability to learn and generalize from data. The dataset contained in Appendix III was used for analysis in this thesis. The data was supplied by the Dentistry Department of the Helsinki Teaching Hospital. This thesis aims to examine and identify the optimal cutoff point of tumor depth (column F) for risk stratification in T1 and T2 stages separately, and for both stages together. Furthermore, the thesis seeks to determine, with the help of ANN, whether tumor depth can be used to modify the cTNM staging system (column K), and whether the modified T-stage (column G) is better in prognostication. Finally, this research work aims to identify the optimal cutoff point of tumor budding (B) for risk stratification in T1 and T2 stages separately, and for both stages together. For future study, it would be ideal to test and examine the interpretation of ANNs for the other prognostic factors (e.g. WPOI, Grade, etc.) contained in the data set.



 
1.3 Thesis Structure 

The thesis is organized in six chapters. Chapter 1 introduces the subject area of the research and the research questions. Chapter 2 reviews previous work on Artificial Neural Networks for cancer prediction and diagnosis and presents an overview of the Elman Neural Network (ENN), named after Jeff Elman. ENN was chosen for the prognostication of oral (mobile) tongue cancer because of its ready-made function for medical applications; its recurrent nature also made it a good choice for this research work. However, there are neural networks, such as the layer recurrent network, that perform better than Elman. Therefore, both the Elman and layer recurrent networks are mainly examined.

Chapter 3 presents recent applications of ANN in medicine. Chapter 4 examines simulations of the Feedforward and Elman neural network approaches using arbitrary data, which was supplied by my thesis supervisor and generated based on a certain algorithm. The aim of that chapter is to understand the difference between static and dynamic variables. The case study is examined in Chapter 5, which provides a comprehensive analysis of the data set using ANN so as to answer the research questions presented in Chapter 1. Chapter 6 discusses the results obtained.

 
The research questions presented in Chapter 1 form the foundation of this research. In addition, the future research question presented in Chapter 1 is equally examined. The main contribution of this thesis is in Chapter 6, where conclusions, recommendations, and possibilities for future study are made based on the results presented in Chapter 5. Appendices I-III contain a few samples of the dataset, and Appendix IV contains some of the useful codes for the ANN.

 

 
2 FUNDAMENTALS OF ARTIFICIAL NEURAL NETWORKS 

2.1 Artificial Neural Network 

An Artificial Neural Network (ANN) is a statistically oriented modeling tool modeled on the biological nervous system. The basic processing element in the ANN is known as the neuron. It is not the same as the neuron in the human body, but in terms of functionality it works in the same manner; hence the name artificial neuron. Its output normally lies in the range (-1, +1), although it could also be (0, 1) (Ying et al 1998, Ferari & Stengel 2005). The neuron can be viewed as a processor that computes the sum of weighted inputs and then applies a non-linear transfer function to the computed sum. An example of such a transfer function is the tan-sigmoid. Figure 1 shows an example of an artificial neuron and its components.

 
Figure 1. Artificial neuron and its components (Hassan et al 2016).  

 
From the structure depicted in Figure 1, an artificial neuron consists of inputs and weights, a transfer function and an activation function (Heimes & B.van Heuveln 2005, Jayadeva et al 2002, Setino & A.Gavada 2000). These neurons are interconnected so that they work in unison to address a particular problem. In most fields of study, it is becoming imperative to detect trends and extract patterns from data. Doing so with traditional methods, that is, through human, statistical and computer techniques, is becoming increasingly difficult.

Therefore, ANN has come to provide a unique approach to solving these problems. This characteristic has led ANN to be widely used in many applications nowadays, including engineering, medicine and statistics, to mention but a few. A neural network operates in a similar way to an adaptive system, meaning that it changes its structure during the learning phase. ANN has been touted to be effective in modeling both simple and complex relationships. With regard to its application in data science, it can be used to find patterns and clusters in data (Spelt et al 2013).

 
Today, ANN represents a major extension to computation. It has been reported to provide better results than traditional statistical tools for prediction and classification in various applications (Paliwal M. & Kumar U.A 2009). ANNs offer short computation times, a low computational burden and the opportunity to reformulate the problem so that only the important variables and parameters from the given data set, or certain unknown areas of interest, are considered. Different types of neural networks have been designed and developed for various applications. Although the solutions offered are yet to reach 100% accuracy, their contribution cannot be overemphasized.

 
Artificial Neural Network (ANN) based expert systems for tongue cancer have been attracting much attention in recent years, just as the use of ANN gained popularity amongst researchers in automatic breast cancer diagnosis in the past few years (Ubeyli 2007, Karabatak & Ince 2009, Furundzic et al 1998). Most current approaches to cancer diagnosis and treatment are based largely on the years of experience of the medical officers. ANN, however, is able to approximate complex and non-linear problems without knowing the mathematical representation of the system and without drawing on the wealth of experience of the medical officer.

 
This feature has attracted attention to ANN in the study of cancer, especially in cancer case prediction. The clinical size (T1 or T2) of early oral tongue cancer has failed to differentiate between patients with a possibly favourable outcome and patients with an adverse outcome when both are given treatment (Kellermann et al 2007:849-853).

 

 
In addition, detection at an early stage (T1/T2N0M0) of tongue cancer does not always give a reliable prediction of outcome, as 20-40% of cases have already spread to other areas (metastasis) at presentation (Ganly et al 2012, Ho et al 1992).

 
Therefore, an effective and efficient means of prediction becomes imperative. Prediction at the early stage of oral tongue cancer is necessary because it makes it possible to identify the subset of patients who are likely to have an unfavourable outcome from mobile tongue cancer. Such patients will need more aggressive treatment, and multimodality therapy is a good approach for this category of patients. Conversely, the prediction result also identifies the subset of patients who are likely to have a favourable outcome. For this latter group, surgical treatment should suffice (Kellermann et al 2007:849-853).

 
ANN has been touted as a reasonable approach for differentiating between the two groups described above. The idea is to supply the neural network with sufficient training data and let it find relationships within these data without requiring user intervention. For example, in this thesis, the status of the patients can be inferred from the results presented in Chapter 5. However, designing an ANN is a complex task. Various design aspects and parameters, such as the network topology, the learning algorithm, the initial values of the weights and the learning rate, to mention but a few, need to be chosen properly for the ANN to perform efficiently. Network topology here includes the number of hidden layers and nodes. Some researchers and books also consider input/feature selection (FS) to be part of the ANN design (Walczak & Cerpa 1999).

 
There is evidence that FS significantly improves ANN performance (Setiono & Liu 1997). The design parameters and the input feature subset of an ANN need to be optimized together, because the two problems influence each other. ANN also offers some unique features. Its learning process, in particular, gives ANN flexibility in terms of its application.

 
For example, ANN can be used for data classification and pattern categorization through a learning process. Neurons are arranged to form layers and connection patterns, and different network configurations and structures can be formed from these arrangements.

On this basis, ANN can be divided into feed-forward networks and recurrent networks. The feed-forward network was the first and simplest type of artificial neural network. In a feed-forward network: (i) the neurons can be ordered without any backward connection, that is, the network is independent of time; (ii) the output does not depend on past values (the network is not cyclic); and (iii) signals move only in the forward direction, from the input nodes through the hidden nodes to the output nodes.
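The three properties above can be illustrated with a minimal Python sketch. The layer sizes, the weights and the use of tanh as the transfer function are illustrative assumptions and are not taken from the networks used in this thesis. Each layer reads only the output of the layer before it, so the signal never loops back.

```python
import math

def layer_forward(inputs, weights, biases):
    """Compute one layer: each node applies tanh to its weighted input sum plus bias."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        net = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(math.tanh(net))
    return outputs

def feed_forward(inputs, layers):
    """Pass a signal strictly forward: input -> hidden layer(s) -> output."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Hypothetical 2-input, 3-hidden-node, 1-output network with made-up weights.
hidden = ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1])
output = ([[0.7, -0.2, 0.4]], [0.05])

y_first = feed_forward([1.0, 2.0], [hidden, output])
y_again = feed_forward([1.0, 2.0], [hidden, output])
print(y_first, y_first == y_again)  # the network keeps no state between calls
```

Because there are no feedback connections, presenting the same input twice gives the same output, which is no longer guaranteed once a network stores earlier states.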

 
Figure 2. Feed-forward network structure (Tahmasebi et al 2011). 

 
The feed-forward network contains neither cycles nor loops. It has been used extensively in classification, pattern recognition, and prediction. Conversely, the recurrent network contains loops, and it has been used for processing tasks and control signals (Turkson et al 2016). An example of a recurrent network is the Elman network. Figure 3 shows a diagrammatic representation of the concept of the Elman neural network.

 

 
As shown in Figure 3, the Elman network is made up of three layers, represented as A, B and C respectively. These layers can also be arranged to form clusters, each containing the three layers. The B layer is referred to as the hidden layer. The inputs are fed into the system in the forward direction through the first layer, which is why the Elman neural network is also regarded as a variant of the feedforward neural network. Apart from these three layers, there are also the context units, represented as K in Figure 3.

 
Figure 3. The concept of Elman neural network.

 
The hidden layer basically serves two functions. It connects to the context layer with a weight of one. Additionally, a copy of the previous values of the hidden units is stored in the context units through back connections. This memory characteristic allows the network to remember previous hidden layer states.

 
When the inputs are fed forward through the system, an appropriate training algorithm is applied to complete the learning process. Here (B * W1) is the value passed between the first and second layers, and (C * W1) is the value fed into the third layer (Elman 1990).

 
Because of the complicated design issues, coupled with the requirement for FS, there are increasing suggestions to hybridize ANN design with evolutionary algorithms. Design methods such as trial-and-error cannot handle many design parameters and FS simultaneously; hence the need for hybridization with an evolutionary algorithm. The Genetic Algorithm (GA) is particularly preferred because of its global search capability. GA is capable of generating both the optimal feature subset and the support vector machine (SVM) parameters without degrading the accuracy of the machine (Rui et al 2005; Huang & Wang 2006).

 
Hence, there have been several efforts to combine GA with ANN. For instance, GA was used to search for the architecture, learning algorithm and node activation functions (Ferentinos 2005), while in another study the search covered the learning algorithm together with its parameters, hidden layer information, transfer function, and the values of the weights and biases (Almeida & Ludermir 2010). Additionally, a hybrid technique that combines fuzzy clustering, a statistical tool and granular computing has been proposed (Yuchun et al 2008). A comprehensive review on combining evolutionary algorithms with ANN can be found in (Yao 1999). Promising results were obtained from these combinations. Therefore, this thesis aims to further explore ANN for oral (mobile) tongue cancer prognosis. It employs the given dataset for FS, initial weight and hidden node size optimization of the most common ANN architecture. The dataset (Appendices I-III) was divided into training, validation and testing datasets for the simulation exercises presented in Chapter 4 and Chapter 5 of this thesis.

 

 
2.2 Classification of ANN 

ANN can broadly be classified as either feed-forward or recurrent. Examples of the feed-forward network include:

I. Single layer feed-forward network.

II. Multilayer feed-forward network.

Similarly, examples of the recurrent network include:

I. A single node with its own feedback.

II. Multilayer recurrent network.

 
A single layer feed-forward structure is a simple perceptron. The schematic diagram is shown in Figure 4.

 
Figure 4. Single layer feed-forward network. 

 

 
A single-layer feed-forward network, as the name implies, has one input layer, one output layer, and no feedback connections. Inputs are applied to the network and passed through a series of weights to the outputs. In each node, the inputs are multiplied by the weights, summed, and compared with a threshold, as shown in Figure 7.

 
If the sum of the products of the inputs and their corresponding weights is above some threshold (typically 0), the neuron fires and takes the activated value (typically 1); otherwise it takes the deactivated value (typically -1). Neurons that exhibit this kind of behavior are called artificial neurons or linear threshold units. In the literature, the term perceptron often refers to networks consisting of just one of these units. A similar neuron was described by Warren McCulloch and Walter Pitts in the 1940s. A perceptron can be created using any values for the activated and deactivated states as long as the threshold value lies between the two. Most perceptrons have outputs of 1 or -1 with a threshold of 0.

A multi-layer neural network, on the other hand, can calculate a continuous output instead of a step function. The sigmoid function, or logistic function, is a common choice for the multi-layer neural network. The sigmoid function is given by:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Furthermore, the sigmoid function has a continuous derivative, which has made it a preferred choice in ANN because it can be used in back-propagation.
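To make the contrast between the step output and the continuous output concrete, the following Python sketch places a linear threshold unit next to the logistic function. The function names and example weights are hypothetical. The derivative uses the well-known identity f'(x) = f(x)(1 - f(x)) for the logistic function.

```python
import math

def threshold_unit(inputs, weights, threshold=0.0, on=1, off=-1):
    """Linear threshold unit: return `on` if the weighted sum exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return on if total > threshold else off

def sigmoid(x):
    """Logistic function, a smooth alternative to the step output."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Derivative of the logistic function, expressed through the function value."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(threshold_unit([1.0, 1.0], [0.6, 0.6]))    # weighted sum 1.2 is above 0
print(threshold_unit([1.0, -1.0], [0.2, 0.6]))   # weighted sum -0.4 is below 0
print(sigmoid(0.0), sigmoid_derivative(0.0))
```

The threshold unit can only switch between two values, whereas the sigmoid changes smoothly with its input, which is what allows a gradient to be computed during training.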

 
Interestingly, the derivative of the function is easily calculated from the function itself: $f'(x) = f(x)\,(1 - f(x))$. The multilayer recurrent neural network, or simply recurrent neural network (RNN), is also an example of ANN. The multi-layer perceptron (MLP) is made up of two or more layers of neurons that are connected sequentially. Neurons in different layers are connected by weighted signal pathways.

 

 
Signals are sent through these pathways to the other neurons. The input layer is the first layer of the network and receives signals from the data entering the network. The last layer, called the output layer, delivers the outcome to the outside world.

 
Figure 5. The structure of multi-layer perceptron. 

 
In a recurrent neural network, the connections between units form a directed cycle, which gives the network dynamic temporal behaviour. Its internal memory can also be used to process arbitrary sequences of inputs, which is why RNNs are used in so many applications. Like the MLP, the RNN uses the back-propagation algorithm. However, it is worth noting that back-propagation is mainly used for networks whose activation functions are differentiable. In addition, there are some issues associated with back-propagation. These include the speed of convergence, overfitting and the possibility of ending up in a local minimum of the error function.

 

 
2.3 Training of ANN 

 
Training is also called learning. ANN learning processes are divided into three types, namely:

 
I. Supervised learning  

II. Unsupervised learning  

III. Reinforcement learning  

 
When an ANN is trained with a teacher, that is, when the predetermined target output values are known for each input pattern, the training process is termed supervised learning (Hu et al 1994). This type of training minimizes the error in the training process because the network output can be compared directly with the known targets. Back propagation, time delay and multiple adaptive linear neurons, to mention but a few, are all examples of supervised neural networks.

In contrast, unsupervised training does not use output knowledge from a teacher (Hu et al 1992;1994). In this case, the network finds the relationships in the data by itself. Kohonen Self-Organizing Feature Maps and Learning Vector Quantization (LVQ) are examples of unsupervised training. Reinforcement learning (RL) is an area of machine learning inspired by behaviourist psychology; it is beyond the scope of this thesis.

By definition, training describes the procedure by which the parameters are tuned or adjusted so that the neural network adapts itself to a stimulus and consequently produces the desired output. The output produced is compared with the expected output to see how effectively the network has learned during the training period.

 

 
In general, a neural network uses internal calculations to compute output values from input values, as shown in Figure 6 (Delgrange et al 1998). The values of the weights are adjusted, based on the difference between the network output and the target, until the network output agrees closely with the targets. This output is known as the trained output or neural output. The trained network can then be used to predict the outputs for any given set of inputs. Traditionally, not all of the dataset is used for training, so that the effectiveness of the network can be checked and, most importantly, the error between the network output and the target output can be minimized.

 
Figure 6.  Neural Network training structure (Delgrange et al 1998) 

 
Therefore, datasets are generally divided into training, validation and testing datasets. The training dataset is used to compute the gradient and to update the network weights and biases. The validation dataset is used for validation: the error on it is monitored during the training phase. The testing dataset, however, is not used at all in the training process. It is used mainly for comparison purposes and provides an important measure of how well the trained network has learned in the training phase.
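As an illustration of this division, the sketch below shuffles a list of records and splits it into three subsets. The 70/15/15 ratios, the fixed seed and the function name split_dataset are assumptions chosen for the example; the ratios actually used in the simulations may differ.

```python
import random

def split_dataset(samples, train_ratio=0.70, val_ratio=0.15, seed=0):
    """Shuffle the samples and divide them into training, validation and test subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    n_val = round(len(shuffled) * val_ratio)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # held out; never used to adjust the weights
    return train, val, test

patients = list(range(100))  # placeholder record identifiers, not real patient data
train, val, test = split_dataset(patients)
print(len(train), len(val), len(test))
```

Shuffling before splitting avoids any ordering in the records leaking into one subset, and fixing the seed makes the split repeatable between runs.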


 
2.4 Training Algorithm 

 
Since the thesis examines both feed-forward and recurrent neural networks, it is pertinent to examine some of the training algorithms used. The training algorithms are numerous, and each neural network type has a default training algorithm that is appropriate for it. The most widely used algorithms are the Levenberg-Marquardt and Quasi-Newton methods because they are very fast and produce very low errors. Both are mostly used for datasets that are not large. For large datasets, Scaled Conjugate Gradient and Resilient Backpropagation are mostly the preferred options for training. A few of the training algorithms are presented in Table 1.

Table 1. Training Algorithms of ANN.

 
Training Function    Algorithm
trainlm              Levenberg-Marquardt
trainbr              Bayesian Regularization
trainbfg             BFGS Quasi-Newton
traingdm             Gradient Descent with Momentum
traingd              Gradient Descent
trainoss             One Step Secant



 
The default training method for the feedforward network is Levenberg-Marquardt. It is worth mentioning that the term backpropagation refers to the gradient descent algorithm for the training of neural networks (Demuth et al 2007).
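The gradient descent update that underlies the Gradient Descent and Gradient Descent with Momentum entries of Table 1 can be sketched as follows. This is a simplified illustration on a toy error surface, not MATLAB's implementation; the learning rate, the momentum value and the function names are assumptions.

```python
def gradient_descent(grad_fn, weights, learning_rate=0.1, momentum=0.0, steps=100):
    """Minimise an error function by repeatedly stepping against its gradient."""
    velocity = [0.0] * len(weights)
    w = list(weights)
    for _ in range(steps):
        grads = grad_fn(w)
        # With momentum = 0 this reduces to plain gradient descent.
        velocity = [momentum * v - learning_rate * g for v, g in zip(velocity, grads)]
        w = [wi + vi for wi, vi in zip(w, velocity)]
    return w

def toy_gradient(w):
    """Gradient of the toy error E(w) = (w0 - 3)^2 + (w1 + 1)^2, minimum at (3, -1)."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

plain = gradient_descent(toy_gradient, [0.0, 0.0])
with_momentum = gradient_descent(toy_gradient, [0.0, 0.0], momentum=0.5)
print(plain, with_momentum)
```

In a neural network the gradient would come from propagating the output error backwards through the layers; here it is supplied directly so that the update rule itself stays visible.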

 
Figure 7. Single node anatomy of ANN. 

 
As pointed out earlier, the neuron, or node as it is also called, forms the basic component of ANN. Figure 7 shows the anatomy of a single node of an ANN, where the inputs are a1, a2, … , an and the output is Oj. As shown in Figure 7, the node is the summing point. It can accept more inputs than the ones shown in Figure 7. The function of the node is to manipulate the inputs to give a single output signal. The values W1j, W2j, … , Wnj are weight factors associated with the inputs to the node. Weights are adaptive coefficients within the network that determine the intensity of the input signal. Each input (a1, a2, … , an) is multiplied by its corresponding weight factor (W1j, W2j, … , Wnj), and the node uses the summation of these weighted inputs (W1j * a1, W2j * a2, … , Wnj * an) to estimate an output signal using a transfer function.

 

 
The other input to the node, bj, is the node's internal threshold, also called bias. This is a randomly chosen value that governs the node's net input through the following equation:

$$u_j = \sum_{i=1}^{n} W_{ij}\, a_i + b_j$$

The node's output is determined by applying a transfer function to the node's net input. Sigmoid, hyperbolic tangent and linear transfer functions can be used effectively. The transfer function can transform the node's net input in a linear or non-linear manner.

Sigmoid Transfer Function

$$O_j = \frac{1}{1 + e^{-u_j}}$$

Hyperbolic Tangent Transfer Function

$$O_j = \tanh(u_j) = \frac{e^{u_j} - e^{-u_j}}{e^{u_j} + e^{-u_j}}$$

Linear Transfer Function

$$O_j = u_j$$

The neuron's output Oj is obtained by applying any of the aforementioned transfer functions to the neuron's net input uj.
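The computation performed by a single node can be sketched in Python as follows. The example inputs, weights and bias are made up, and the bias is assumed to be added to the weighted sum.

```python
import math

def net_input(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (assumed additive here)."""
    return sum(w * a for w, a in zip(weights, inputs)) + bias

TRANSFER_FUNCTIONS = {
    "sigmoid": lambda u: 1.0 / (1.0 + math.exp(-u)),  # output in (0, 1)
    "tanh": math.tanh,                                 # output in (-1, 1)
    "linear": lambda u: u,                             # output unbounded
}

def node_output(inputs, weights, bias, transfer="sigmoid"):
    """Apply the chosen transfer function to the node's net input."""
    return TRANSFER_FUNCTIONS[transfer](net_input(inputs, weights, bias))

a = [0.5, -1.0, 2.0]   # example inputs a1..a3
w = [0.4, 0.3, 0.1]    # example weights W1j..W3j
b = 0.1                # example bias bj
for name in TRANSFER_FUNCTIONS:
    print(name, node_output(a, w, b, name))
```

The same net input gives three different outputs depending on the transfer function, which is why the choice of transfer function fixes the output range of the node.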

 

 
2.5 Advantages and Disadvantages of ANN 

 
Advantages of ANN:  

 
I. Adaptive learning: After being trained with an appropriate training algorithm, the network can perform a task based on the data given for training.

II. Self-organization: An ANN can create its own organization of the information it receives during learning.

III. Real-time operation: Many neural network computations can be carried out in parallel. Specific hardware devices are being designed to take advantage of this ability of neural networks.

IV. Fault-tolerance via redundant information coding: Partial damage to a neural network structure leads to a degradation of performance. However, some network capabilities may be retained even after major network damage.

Disadvantages of ANN:

I. Size and Complexity: The size and complexity of neural networks are very high.
 


 
3 APPLICATION OF ANN IN MEDICINE 

 
3.1 Artificial Neural Network in Medicine  

Advancements in the field of information and communication technology (ICT) have been rapid, and they have been felt positively in other areas of endeavour. Medicine is not an exception. The development of ICT has contributed immensely to medicine through powerful tools such as lasers and ultrasonics that aid medical treatment. ICT has also contributed to medicine in data analysis and machine learning. For instance, Artificial Intelligence (AI) has contributed immensely to medicine and biological research. ANNs are an interesting and extensively studied branch of AI. They have been touted as a promising research area, and researchers in the field of machine learning and data science expect ANNs to have extensive application to various biomedical problems in the future. Presently, ANNs have gained attention in medicine, having been successfully applied to medical areas such as diagnostic systems, biochemical analysis, image analysis, and drug development (Konstantina 2017). This thesis looks at the application of ANN from the point of view of predicting the patient's situation.

ANNs have been extensively applied in diagnosis, electronic signal analysis, medical image analysis, and radiology. ANNs are intended to assist doctors in detecting the complex nonlinear relationships between dependent and independent variables in the patient's data. The neural network is able to learn, capture, draw inferences and establish relationships from the provided data, and these are produced as the output of the learning process. In this sense, a trained ANN is a digitized model inspired by the biological brain. Nowadays, ANNs are widely used in various disciplines of medicine, especially in cancer treatment, cardiology and so on.

 

 
ANNs have also been used extensively in diagnostic systems because the ANN is not affected by factors such as stress, fatigue, working conditions, emotional states, equipment error and so on that could affect traditional diagnostic procedures. The network can easily be trained, and the trained network produces an output that reflects the relationships between the variables contained in the given dataset. Furthermore, ANNs have found application in biochemical analysis, where they have been widely used to track glucose levels in diabetic patients. ANN is also capable of detecting pathological conditions such as tuberculosis. Image analysis is nowadays a core aspect of medicine, and the need for proper image analysis cannot be overemphasized. ANNs have assisted in tumour detection and in the classification of chest X-rays. The results produced through the application of ANNs to medicine have been promising so far. Drug development, modeling, clinical research, pharmacoepidemiology, and medical data mining are some other medical areas where ANNs have been extensively used. The high computation rate of ANNs has also contributed to their acceptance in medicine, paving the way for ANN to be applied in telemedicine as well. Having highlighted the importance of ANN in medicine, it is important to ask whether ANN can replace human experts. The answer is no. ANN is a tool that is poised to help doctors and researchers in the medical field. ANNs can assist doctors in the screening process and, ultimately, help them to double-check and confirm their diagnoses.

 

 
3.2 Types of Neural Networks used in this research 

3.2.1 Feed-Forward  Neural Network (feedforwardnet) 

A feedforward neural network, created in MATLAB with feedforwardnet, consists of a series of layers. The first layer has a connection from the network input, each subsequent layer has a connection from the previous layer, and the last layer produces the network output. It is used to map the relationship between inputs and outputs. In the simplest form, the network has one hidden layer with a number of neurons. In terms of syntax, feedforwardnet is given by:

feedforwardnet(hiddenSizes,trainFcn)

Table 2. Feedforwardnet parameters.

hiddenSizes    Row vector of one or more hidden layer sizes (default = 10)
trainFcn       Training function (default = 'trainlm')

As pointed out, feedforwardnet maps input-output relationships. When more functionality is required, specialized versions such as the fitting network (fitnet) and the pattern recognition network (patternnet) are good choices.



 
Similarly, the cascade-forward neural network (cascadeforwardnet) offers unique functionality, as it includes a connection from the input and from every previous layer to the following layers. Hence, the layers in cascadeforwardnet are fully connected to one another.

3.2.2 Elman Neural Network (elmannet) 

 
The most widely cited example of a recurrent network is the Elman Neural Network (ENN), first proposed by J.L. Elman in 1990 (Elman 1990). It is characterized by local memory and feedback connections. It is also a backpropagation neural network and a two-layer neural network. It basically consists of the input layer, hidden layer, and output layer. The feedback connection is usually from the output of the hidden layer to its input, as shown in Figure 8. It has a ready-made function called elmannet in MATLAB.

 
 Figure 8. Block Diagram of the Elman Neural Network (Kannathal 2006). 

 
The feedback connection ensures that Elman networks learn effectively. In addition, temporal 

and spatial patterns are easy to recognize and generate with the help of the feedback 

connection.   



 
Mathematically, the algorithm of the Elman neural network is presented in the equations below. However, it is important to mention that ENN uses staticderiv, which is not a full dynamic derivative. The final output of the trained network is usually compared with the expected output to see how well the network has learned. Structurally, the network is represented as shown in Figure 9.

 
Figure 9. Structural representation of Elman neural network.

 
The structure of ENN is presented in Figure 9. It can be deduced from this figure that it has four types of nodes. As discussed in Chapter 2, the structure of the Elman neural network consists of input nodes, hidden nodes, context nodes and output nodes. The connections of the input, hidden and output nodes are similar to those of the feed-forward network. The structure also includes the weights. The symbols used in Figure 9 are:

Wc            weight between context and hidden nodes.
Wij           weight between input and hidden nodes.
Wjk           weight between hidden and output nodes.
X1......Xn    inputs.
Ʈj and Ʈk     activation functions (hyperbolic tangents).
Uj            hidden output.
Xc(k)         context layer.
Ѳj and Ѳk     biases in the hidden and output layers.
Z(k)          output.



 
The feedback characteristic is a unique feature of the Elman network: it uses the context nodes to memorize and return the hidden layer's output values. This feature makes the Elman network sensitive to past inputs and therefore suitable for the analysis of time series and historical data.

 
In terms of syntax, the Elman network is given as follows:

elmannet(layerdelays,hiddenSizes,trainFcn)

Table 3. Syntax parameters of Elman network.

layerdelays    Row vector of increasing 0 or positive delays (default = 1:2)
hiddenSizes    Row vector of one or more hidden layer sizes (default = 10)
trainFcn       Training function (default = 'trainlm')

 
Based on the syntax and the structural representation of the Elman neural network, the output of each node can be mathematically modelled as follows:

 
Xc(k) =  αXc (k-1) + U(j-1)         (1) 

 
Uj =  f [ (Wc * Xc(k)) + (Wij * Xn) ]       (2) 

 
y(k) = g [(Wjk * Uj)]         (3) 

 
Equations 1-3 above represent the outputs of the context nodes, the hidden nodes and the final output respectively.
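The three equations above can be traced step by step in a short Python sketch. It assumes that the term U(j-1) in equation 1 denotes the hidden output of the previous time step, that f is the hyperbolic tangent and that g is the identity; the weights and the network size are hypothetical.

```python
import math

def elman_step(x, context_prev, hidden_prev, w_in, w_ctx, w_out, alpha=0.0):
    """One time step of an Elman network following equations (1)-(3)."""
    n_hidden = len(hidden_prev)
    # (1) context units: decayed old context plus the previous hidden output
    context = [alpha * c + u for c, u in zip(context_prev, hidden_prev)]
    # (2) hidden units: f (tanh here) of the context and input contributions
    hidden = []
    for j in range(n_hidden):
        net = sum(w_ctx[j][c] * context[c] for c in range(n_hidden))
        net += sum(w_in[j][i] * x[i] for i in range(len(x)))
        hidden.append(math.tanh(net))
    # (3) outputs: g (identity here) of the weighted hidden outputs
    output = [sum(w_out[k][j] * hidden[j] for j in range(n_hidden))
              for k in range(len(w_out))]
    return output, context, hidden

def run_sequence(sequence, w_in, w_ctx, w_out, alpha=0.0):
    """Feed a sequence through the network, carrying the context between steps."""
    n_hidden = len(w_ctx)
    context, hidden = [0.0] * n_hidden, [0.0] * n_hidden
    outputs = []
    for x in sequence:
        y, context, hidden = elman_step(x, context, hidden, w_in, w_ctx, w_out, alpha)
        outputs.append(y)
    return outputs

# Hypothetical weights: 1 input, 2 hidden/context units, 1 output.
W_IN = [[0.5], [-0.3]]
W_CTX = [[0.2, 0.1], [0.0, 0.4]]
W_OUT = [[1.0, -1.0]]

ys = run_sequence([[1.0], [1.0], [1.0]], W_IN, W_CTX, W_OUT)
print(ys)  # identical inputs, but the outputs differ because of the context units
```

Setting all context weights to zero removes the dependence on earlier steps, and the network then behaves like a plain feed-forward network.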



 
Additionally, f() and g() in the equations above are the nonlinear and linear output functions of the hidden nodes and output nodes respectively. Equations 1-3 can be modified further so that, when the input vectors are mapped to the set of hidden nodes through an activation function Ʈj, the mathematical representation is as shown in equation 4.

Since Elman has a context layer, there will be delayed hidden variables, represented as Uj', from the prior training iteration. Thus, the result of the mathematical modification is as shown above. In the same way, the output layer can be modified through a similar procedure to the one explained for the hidden layer. In this case, the activation function is given as Ʈk. Finally, the mapping relationship between the hidden and output layers is represented as yk and given in equation 5 as:
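As a hedged sketch of these two mappings, written with τ and θ for the activation functions and biases defined for Figure 9, one plausible form is given below. The index ranges and the placement of the biases are assumptions based on the usual Elman formulation rather than a reproduction of equations 4 and 5.

```latex
% Hidden layer: inputs X_i and delayed hidden values U'_c from the context layer
U_j = \tau_j\!\left( \sum_{i=1}^{n} W_{ij}\, X_i + \sum_{c} W_{c}\, U'_c + \theta_j \right)

% Output layer: weighted hidden outputs passed through the output activation
y_k = \tau_k\!\left( \sum_{j} W_{jk}\, U_j + \theta_k \right)
```

In this form the only difference from a feed-forward hidden layer is the second sum, which carries the delayed hidden values back into the hidden nodes.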

 
With the introduction of full dynamic derivative calculations using fpderiv and bttderiv, the time delay neural network (timedelaynet), the layer recurrent neural network (layrecnet), the nonlinear autoregressive neural network (narnet) and the nonlinear autoregressive neural network with external inputs (narxnet) are now the preferred networks because they produce better error performance than the Elman neural network. These networks are examined in Chapter 4.

 

 
3.2.3 Time delay neural network (timedelaynet) 

 
The time delay neural network (TDNN) was first developed in the 1980s (Waibel et al 1989). It is an artificial neural network characterized by two special layers, the hidden layer and the output layer. In TDNN, the nodes are fully connected by direct connections, as shown in Figure 10.

 
Figure 10. Single time delay neuron (TDN) with inputs and delays at the time (t) (Hongying 

et al 2016). 
 

The nodes of the hidden layer and the output layer are time-delay neurons (TDNs). The inputs are multiple time series with time step t. They are grouped as M inputs, written explicitly as $I_1(t)$ to $I_M(t)$. Each input has a bias value represented as $b_i$.

Similarly, as shown in Figure 10, each input of the TDN has N delays, written explicitly as $D^{i}_{1}$ to $D^{i}_{N}$. Because of these delays, the neuron is capable of storing the previous inputs $I_i(t-d)$, where d varies from 1 to N. Each input also has N independent unknown weights, represented as $w^{i}_{1}, \dots, w^{i}_{N}$. F is the transfer function. The output O(t) is represented by the equation below:

 
From the output equation above, the overall outcome of the neurons could be considered to 

be dependent on the current time (t) and also the previous time steps (t-d).  
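The dependence of the neuron output on current and delayed inputs can be illustrated with a short Python sketch. It assumes that the neuron sums the weighted current and delayed values of every input, adds the bias of each input and passes the total through the transfer function. This summation form, the use of tanh as the transfer function and the name tdn_output are assumptions made for illustration only.

```python
import math

def tdn_output(history, weights, biases, transfer=math.tanh):
    """Output of a single time-delay neuron (illustrative form only).

    history[i][d] is input i delayed by d steps (d = 0 is the current value),
    weights[i][d] is the weight on that value and biases[i] is the bias of input i.
    """
    total = 0.0
    for inputs_i, weights_i, bias_i in zip(history, weights, biases):
        # weighted sum over the current and delayed values of one input
        total += sum(w * v for w, v in zip(weights_i, inputs_i)) + bias_i
    return transfer(total)
```

Because the delayed values are passed in explicitly, the output at time t changes whenever any of the stored values from t-1 to t-N changes.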

 
Figure 11. The Architecture of TDNN neural network (Hongying et al 2016). 

 
Where:

$W_{id}^{j}$ is the weight of the hidden node $H_j$.

$W_{jd}^{r}$ is the weight of the output node $O_r$.

$b_i^{j}$ and $c_i^{r}$ are biases.

N1 is the number of delays for the output layer.

N2 is the number of delays for the hidden layer.

 

 
Having understood the single TDN, it becomes easier to model the dynamic nonlinear characteristics of series inputs, since the TDN forms the basic building block of the TDNN. In terms of architecture, the TDNN has a hidden layer with J TDNs and an output layer with R TDNs, which are fully connected as shown in Figure 11.

 
A TDNN can be trained using the Levenberg-Marquardt algorithm (Levenberg 1944; Marquardt 1963), a widely used algorithm for training feedforward networks. The training process optimizes the weights through iterations. The input time series X(t) and the known labels Y(t) are iterated for t = 1, ..., T, where T is the length of the sequence.

 
3.3 Layer Recurrent Neural Network (layrecnet) 

 
Layer recurrent neural networks (LRNN) are dynamic artificial neural networks. In this type of network, the connections between units form a directed cycle; hence the name dynamic neural network. An LRNN is similar to a feedforward network but differs in that each layer has a recurrent connection with a tap delay associated with it. This feature gives the LRNN an infinite dynamic response to time series input data.

 

 
However, when a finite input response is desired, time delay (timedelaynet) and distributed delay (distdelaynet) neural networks are the networks of choice.

In terms of syntax, the LRNN is given by:

layrecnet(layerDelays,hiddenSizes,trainFcn)

Where layrecnet (layer recurrent network) takes the arguments shown in Table 4.

 
Table 4. Layrecnet parameters 

 
Parameter      Description
layerDelays    Row vector of increasing 0 or positive delays (default = 1:2)
hiddenSizes    Row vector of one or more hidden layer sizes (default = 10)
trainFcn       Training function (default = 'trainlm')

 
With the above parameters, the layrecnet function returns a layer recurrent neural network. Although the Elman network is a simplified form of the LRNN, the LRNN is characterized by a feedback loop, with a single delay, around each layer of the network except the last layer, as shown in Figure 12.

 
https://se.mathworks.com/help/nnet/ref/timedelaynet.html
https://se.mathworks.com/help/nnet/ref/distdelaynet.html



 
There are different types of LRNN, such as the fully recurrent network, recursive neural networks, the Hopfield network, Elman and Jordan neural networks, the continuous-time RNN and the bi-directional RNN, to mention but a few.

 
Figure 12. Layer recurrent network architecture (MATLAB 2017). 

 
3.3.1 Fully Recurrent Neural Network 

 
The fully recurrent network is a network of neuron-like units developed in the 1980s. Each unit has a directed connection to every other unit, and each connection has a modifiable real-valued weight. Similarly, each unit is characterised by a time-varying real-valued activation.

 

 
3.3.2 Hopfield Neural Network (HNN) 

 
The Hopfield neural network is a form of recurrent artificial neural network invented in 1982 and named after its inventor, John Hopfield. An essential feature is that its dynamics are guaranteed to converge to a local minimum, although it sometimes converges to a false local minimum. It is not used for sequences of patterns, which makes it unique among recurrent networks. All connections are symmetric, and the network is designed specifically for stationary inputs, as shown in Figure 13.

 
Figure 13. A four nodes Hopfield neural network (Hopfield 1982) 

 
Connections in a Hopfield network therefore have two restrictions: (i) no unit has a connection with itself, and (ii) connections must be symmetric. The network has a few variations, such as the bidirectional associative memory (BAM). The units of a Hopfield neural network take one of two values for their states, +1 or -1, determined by whether the input to the unit exceeds its threshold. If the input exceeds the threshold, the state is +1; otherwise it is -1. However, the values 0 and 1 are used in some literature.



 
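Updating one unit in the HNN is based on the following rule:

$$ s_i \leftarrow \begin{cases} +1 & \text{if } \sum_j w_{ij} s_j \ge \theta_i \\ -1 & \text{otherwise} \end{cases} $$

Given that:

$w_{ij}$ is the weight of the connection from unit j to unit i.

$s_j$ is the state of unit j.

$\theta_i$ is the threshold of unit i.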

 
HNN units can be updated either synchronously or asynchronously. In synchronous updating, all the units are updated at the same time. A disadvantage of this method is that it is less effective. In asynchronous updating, by contrast, only one unit is updated at a time; the unit to be updated can be picked at random or in a pre-defined order. The learning rules of an HNN can be local or incremental. A learning rule is local if each weight is updated using only the information available to the neurons on either side of the connection associated with that weight. A learning rule is incremental if a new pattern can be learned without using information from the patterns learned previously.
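Asynchronous updating in a pre-defined order can be sketched in Python as follows. The sketch uses the Hebbian rule to build the weights; this rule is not named in the text and is assumed here only because it is both local and incremental. States are +1/-1, thresholds default to zero, and all function names are hypothetical.

```python
def hebbian_weights(patterns):
    """Build a symmetric weight matrix with a zero diagonal from +1/-1 patterns."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # restriction (i): no unit connects to itself
                    w[i][j] += p[i] * p[j] / len(patterns)
    return w

def update_unit(state, w, i, theta=0.0):
    """Set unit i to +1 if its weighted input reaches the threshold, else -1."""
    total = sum(w[i][j] * state[j] for j in range(len(state)))
    state[i] = 1 if total >= theta else -1
    return state

def recall(state, w, sweeps=5):
    """Asynchronous recall: units are updated one at a time in a fixed order."""
    state = list(state)  # work on a copy of the starting pattern
    for _ in range(sweeps):
        for i in range(len(state)):
            update_unit(state, w, i)
    return state
```

Starting from a stored pattern with one flipped unit, repeated sweeps of this kind move the state back towards the stored pattern.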

 

 
3.3.3 Recursive Neural Network  

 
As the name implies, the same set of weights is applied recursively over a structure. The architecture of a simple recursive neural network is as shown in Figure 14.

 
Figure 14.  An architecture of recursive neural network (Hammer et al 2004).  

 
A recursive neural network can be trained using a widely used algorithm such as scaled conjugate gradient. The gradient is calculated using backpropagation through structure (BPTS), a variant of the backpropagation through time (BPTT) used in recurrent neural networks (Goller and Kuchler 1996). Recurrent and recursive neural networks differ, however. In a recurrent neural network, the hidden representation of the previous time step is combined with the current input to produce the representation of the current time step, so the chain of computation is linear. A recursive neural network, on the other hand, operates on a hierarchical structure in which the representation of a parent arises from combining the representations of its children (Hammer et al 2004).
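The recursive application of one shared set of weights over a hierarchy can be sketched in Python. The sketch assumes a binary tree whose leaves are vectors, a tanh activation and a parent formed from the concatenation of its two children. These choices, and the names compose and encode, are illustrative assumptions rather than the formulation of the cited works.

```python
import math

def compose(left, right, W, b):
    """Combine two child vectors into a parent vector with shared weights W and bias b."""
    children = left + right  # concatenate the child representations
    return [math.tanh(sum(w * c for w, c in zip(row, children)) + bias)
            for row, bias in zip(W, b)]

def encode(tree, W, b):
    """Recursively encode a binary tree; leaves are lists, internal nodes are 2-tuples."""
    if isinstance(tree, tuple):  # internal node: (left_subtree, right_subtree)
        left, right = tree
        return compose(encode(left, W, b), encode(right, W, b), W, b)
    return tree  # leaf: already a vector
```

The same W and b are used at every internal node, which is what distinguishes this scheme from a layered network with separate weights per layer.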

 
https://en.wikipedia.org/wiki/Recursion



 
3.4 Nonlinear autoregressive neural network (NARNET) 

 
NARNET is used to predict a time series from the past values of that same series (Nyanteh et al 2013; Lopez 2012). In MATLAB, the nonlinear autoregressive neural network is created with the function narnet, which takes the following arguments:

 
narnet(feedbackDelays,hiddenSizes,trainFcn)   

 
Table 5. Parameters for narnet 

 
Parameter       Description
feedbackDelays  Row vector of increasing 0 or positive delays (default = 1:2)
hiddenSizes     Row vector of one or more hidden layer sizes (default = 10)
trainFcn        Training function (default = 'trainlm')

 
Modelling a time series with a linear model is often a difficult task, because time series applications have high variations and fast transient durations. A better model for such applications is therefore needed. NARNET has been reported to handle some of the complications that are peculiar to time series data. NARNET is given by:

$$ y(t) = h\big(y(t-1), y(t-2), \dots, y(t-p)\big) + \varepsilon(t) $$

 
The equation above explains how NARNET predicts a value of the series from its past values (Ibrahim et al 2016). Here y(t) is the predicted value and h() is an unknown function of the previous values, which is approximated from the past values of y(t) so that the prediction can be realized.

In a neural network, the function h() is approximated during training by optimizing the weights and biases. The term ɛ(t) denotes the error of the approximation of the series. The structure of the NARNET is as shown in Figure 15.

 
Figure 15. Nonlinear autoregressive network (NARNET) (Luiz Gonzaga et al 2016) 
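Before such a network can be trained, the series has to be arranged so that each value is paired with its own past values. A minimal Python sketch of that arrangement is given here. It assumes p consecutive delays of 1 to p steps, and the function name make_nar_samples is hypothetical. It only builds the training pairs and does not train a network.

```python
def make_nar_samples(series, p):
    """Pair each value y(t) with its p past values y(t-1), ..., y(t-p)."""
    inputs, targets = [], []
    for t in range(p, len(series)):
        inputs.append([series[t - d] for d in range(1, p + 1)])  # most recent first
        targets.append(series[t])
    return inputs, targets
```

For a series of length T this yields T - p training pairs, since the first p values have no complete history.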

 
The past values, that is, the p values y(t-1), ..., y(t-p), are known as the feedback delays. To obtain the network topology that provides the best performance, several factors need to be considered, including the training algorithm and the number of hidden layers and neurons. Adjusting the number of hidden neurons or changing the training algorithm usually gives better performance. The number of hidden layers and the training algorithm are varied by trial and error. It is nevertheless important to understand that the complexity of the system grows with the number of neurons: as the number of neurons increases, the network becomes more complex.

 

 
However, an increased number of neurons can also give the network better generalization capability and computational power. NARNET training is based on the Levenberg-Marquardt backpropagation (LMBP) algorithm (Alwakeel & Shaaban 2010; Marquardt 1963; Hagan et al 1996).

 
LMBP is the default training algorithm because it is the fastest type of backpropagation algorithm. It approaches second-order training speed without having to compute the Hessian matrix; instead, the Hessian is approximated from the Jacobian matrix. NARNET uses either the Mean Squared Error (MSE) or the Sum of Squared Errors (SSE) as its performance function, and both are given by:

$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{SSE} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 $$

Where:

$y_i$ is the i-th data sample.

$\hat{y}_i$ is the approximated value obtained by the network for $y_i$.

n is the number of data samples.
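The two error measures can be computed directly from the targets and the network outputs. The Python sketch below follows the usual definitions of the sum and the mean of squared differences; the function names sse and mse are chosen for illustration.

```python
def sse(targets, outputs):
    """Sum of squared errors between data samples and network approximations."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, outputs))

def mse(targets, outputs):
    """Mean squared error: the SSE divided by the number of samples."""
    return sse(targets, outputs) / len(targets)
```

The MSE is the more convenient of the two when datasets of different sizes are compared, because it does not grow with the number of samples.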

 
In this thesis, the layer recurrent neural network is used to predict the recurrence and mortality of tongue cancer in the patients. Both NARNET and NARXNET are considered as recommendations for future work.

 

 
3.5 Nonlinear Autoregressive Neural Network with external input (narxnet) 

 
This network is similar to NARNET but differs in the addition of an external input. It is also a nonlinear model, which predicts future values based on the past values and on exogenous (external) data supplied to the network. In terms of syntax, it is given by:

narxnet(inputDelays,feedbackDelays,hiddenSizes,trainFcn)

Where:

 
Table 6. NARXNET parameters 

 
Parameter       Description
inputDelays     Row vector of increasing 0 or positive delays (default = 1:2)
feedbackDelays  Row vector of increasing 0 or positive delays (default = 1:2)
hiddenSizes     Row vector of one or more hidden layer sizes (default = 10)
trainFcn        Training function (default = 'trainlm')

 
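In the NARX network, known as NARXNET, the network predicts the series y(t) given the past values of y and of another, external series x(t), as shown in the equation below:

$$ y(t) = h\big(x(t-1), \dots, x(t-k), y(t-1), \dots, y(t-p)\big) + \varepsilon(t) $$

Here k and p denote the numbers of input delays and feedback delays respectively. The external or exogenous series can be single or multidimensional. The architecture of NARXNET is as shown in Figure 16.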

 
Figure 16. The architecture of nonlinear regressive network with external inputs (Luiz 

Gonzaga et al 2016). 
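The way the past values of both series feed the prediction of y(t) can be sketched in Python. The sketch assumes k consecutive delays of a one-dimensional external series x and p consecutive delays of the predicted series y; the symbols k and p and the name make_narx_samples are used for illustration. It only arranges the training pairs.

```python
def make_narx_samples(x, y, k, p):
    """Pair y(t) with x(t-1), ..., x(t-k) followed by y(t-1), ..., y(t-p)."""
    start = max(k, p)  # first time step with a complete history of both series
    inputs, targets = [], []
    for t in range(start, len(y)):
        past_x = [x[t - d] for d in range(1, k + 1)]
        past_y = [y[t - d] for d in range(1, p + 1)]
        inputs.append(past_x + past_y)
        targets.append(y[t])
    return inputs, targets
```

Dropping the external series from this arrangement leaves only the feedback delays, which is the NARNET case.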

 
NARXNET and NARNET are quite similar and differ only in the addition of external inputs in the case of NARXNET. The training of NARXNET equally uses LMBP. NARXNET produces better performance than NARNET, but the complexity of NARXNET has made NARNET more widely preferred (Safavieh et al 2007). Therefore, in this thesis, both methods are used in the analysis of the simulation dataset in Chapter 4.

 

 
4 ANN SIMULATION OF FIXED AND DYNAMIC DATASETS 

This chapter examines the performance and error estimation of some of the neural network types discussed in Chapter 3 and, most importantly, provides a clear understanding of them. This is in preparation for applying the neural networks to the real data, that is, the case study examined in Chapter 5 of this thesis. The dataset used in this chapter was provided by my supervisor, Professor Elmusrati, and was generated by an algorithm known to him. My main task was to test the dataset with the neural network types discussed in Chapter 3.

The datasets were sent in two batches. The first batch was a fixed dataset, while the second batch was a dynamic dataset. The datasets were sent in Excel format (.xlsx) and can be found as Appendixes I and II respectively in the appendix section of this thesis. In a fixed dataset, the values within the rows and columns are related by a direct equation without the need for any past or future values. Conversely, dynamic datasets were obtained by establishing a relationship with past values, future values or both. In both cases, and especially for the dynamic datasets, the values of the columns and rows may change each time the file is opened, thereby giving false results. To avoid this error, the file was opened once and imported directly into the MATLAB workspace.

Chapter 4 therefore looks at the neural networks with regard to the dynamic dataset only. Finally, in this chapter the MATLAB code, performance plots, regression plots, and plots of expected and trained values are shown without detailed interpretation. This is because the main work of this thesis, and the main dataset of interest, is in Chapter 5, where each type of plot is analysed and explained.

 

 
4.1 Simulation exercise with Feedforward Neural Network 

Using the dynamic dataset as contained in Appendix II, the following commands were issued 

to MATLAB as shown in Figure 17.  

 
Figure 17. Feedforwardnet MATLAB code window. 

From Figure 17, the performance error was very small. The learning was thus successful. 

 
Figure 18. Neural network training output 

It is worth noting that some of the code might be missing from Figure 17, because the output shown there is truncated. Figure 18 shows the neural network training output.

 
Figure 19. Performance error plot for feedforwardnet. 

The network was trained using Levenberg-Marquardt algorithm (trainlm) and the error from 

the training was calculated using Mean Squared Error. 

 
Figure 20. Target outputs and the neural outputs 

 

 
The dataset was divided randomly using the dividerand function, so there was no need to divide the data manually into training, validation and testing sets. After training, the fields tr.trainInd, tr.valInd and tr.testInd of the training record hold the indices of the samples assigned to training, validation and testing respectively, and they can be used to reproduce the division manually. The performance plot from the training is as shown in Figure 19.
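The idea of a random division into three index sets can be sketched in Python. This is not the MATLAB dividerand function itself. The 70/15/15 split, the optional seed and the name divide_rand are assumptions made for illustration.

```python
import random

def divide_rand(n_samples, train_ratio=0.70, val_ratio=0.15, seed=None):
    """Randomly split sample indices into training, validation and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    n_train = round(train_ratio * n_samples)
    n_val = round(val_ratio * n_samples)
    train_ind = sorted(indices[:n_train])
    val_ind = sorted(indices[n_train:n_train + n_val])
    test_ind = sorted(indices[n_train + n_val:])  # remaining samples
    return train_ind, val_ind, test_ind
```

Fixing the seed makes the division repeatable, which is useful when the same split has to be reused across several networks.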

 
Figure 21. Error histogram of the targets and the neural outputs. 

 
The variation between the target outputs and the neural outputs is as shown in Figure 20. The difference between the target and the neural output gives the error, as shown in the error histogram of Figure 21. The network training state is shown in Figure 22, a plot of the gradient and the validation checks against the epoch. The epoch gives the iteration at which the network validation performance reached its minimum. In this case, as shown in Figure 22, the network validation reached the minimum at about 1000 epochs.
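The construction of an error histogram from the targets and the neural outputs can be sketched in Python. The sketch assumes equal-width bins spanning the smallest to the largest error; the number of bins and the name error_histogram are illustrative choices.

```python
def error_histogram(targets, outputs, n_bins=20):
    """Return the errors (target minus output) and their counts per equal-width bin."""
    errors = [t - o for t, o in zip(targets, outputs)]
    lo, hi = min(errors), max(errors)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all errors are equal
    counts = [0] * n_bins
    for e in errors:
        idx = min(int((e - lo) / width), n_bins - 1)  # the largest error goes in the last bin
        counts[idx] += 1
    return errors, counts
```

A histogram concentrated in the bins around zero indicates that most neural outputs are close to their targets.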

 

 
Figure 22. Training state of the network. 

 
It is important to mention that errors may remain after training and the results might not be as expected. To improve the results in this case, it is good practice to initialize the network again and repeat the training. This is because, each time a network (especially a feedforwardnet) is initialized, the network parameters are different and thus the results might differ on each occasion. Also, the number of hidden neurons can be increased. However, the number of hidden layer neurons should not be unnecessarily large, to avoid the problem becoming under-characterized. Finally, the training algorithm can be changed, for example from Levenberg-Marquardt to Bayesian regularization.

 

 
The regression plot is shown in Figure 23. It plots the neural outputs against the targets and thereby shows how well the network has learned the relationship between the inputs and the outputs. The dashed line in the plot indicates the ideal result, in which the trained outputs (neural outputs) equal the expected outputs (target outputs).

 
Figure 23. Regression analysis of the network training. 

 
The solid line indicates the best-fit linear regression between outputs and targets. The regression value, R, shown in Figure 23 for the training, testing and validation sets was 1, which indicates an exact linear relationship between the targets and the neural outputs. Finally, a simple plot of the expected or target output against the neural output is shown in Figure 24. It can be seen from Figure 24 that there is hardly any difference between the target output and the neural output; it is quite difficult to observe any difference in the plots.
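The regression value R can be understood as the correlation coefficient between the targets and the neural outputs. The Python sketch below assumes the Pearson correlation coefficient, and the function name regression_r is hypothetical.

```python
import math

def regression_r(targets, outputs):
    """Pearson correlation coefficient between targets and network outputs."""
    n = len(targets)
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    return cov / math.sqrt(var_t * var_o)
```

A value of 1 corresponds to outputs that lie exactly on a rising straight line when plotted against the targets, while values near 0 indicate no linear relationship.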

 
Figure 24. Plot of Target and Neural Outputs of feedforward network 

 
This is because the network effectively learned the relationship between the inputs and the output. Consequently, the performance error was insignificant, as shown in Figure 17. Similarly, the regression plot of Figure 23 strongly indicates that the network had effectively learned the relationship between the input variables and the output.

 

 
4.2 Simulation using Elman Neural Network 

Using the dataset contained in Appendix II, the performance of the Elman neural network can be examined. Unlike the feedforward neural network, which uses the function feedforwardnet, the Elman neural network uses the elmannet function, with the input and target time series prepared by another function called preparets. Though the Elman neural network is closely related to feedforwardnet, a difference lies in the training function. The dataset was randomly divided using dividerand, but the default training function for elmannet is gradient descent with momentum and adaptive learning rate (traingdx).

 
Figure 25. Elman neural network command window  

The MATLAB command window for the Elman neural network is as shown in Figure 25. It was observed that this training function did not train the network effectively; thus, I changed the training function to Levenberg-Marquardt (trainlm). Furthermore, the number of hidden neurons was increased from 10 to 20 to ensure better performance, as shown in Figure 26.

 
Figure 26. Training window for Elman neural network. 

 
The truncated target outputs and trained (neural) outputs are shown in Figure 27. Despite the change of training algorithm and the increased number of hidden neurons, the Elman neural network did not learn the relationship between the inputs and the outputs effectively. This is because the Elman neural network uses a simplified derivative calculation known as staticderiv, which is known to ignore delayed connections. Thus, the learning was not as efficient as that of the feedforwardnet discussed in Section 4.1.

 
Figure 27. Target and Neural outputs of the Elman neural network.  

Therefore, it becomes necessary to check other networks. The advent of full derivative calculations such as fpderiv and bttderiv gives researchers a wide range of neural networks to choose from for better performance. Given the failure of the Elman neural network to learn effectively, it is important to examine other networks, such as timedelaynet, layrecnet, narnet and narxnet, for better training and more efficient prediction of the input-output relationship. These are shown in the subsequent sections of this chapter. To probe further why the Elman network did not learn as expected, its performance is shown in Figure 28. This plot shows the performance on the data when divided into training, validation and testing sets.

Mean Squared Error was used to calculate the training performance. As shown in Figure 28, training ran for 6 epochs, and the best validation performance occurred at epoch 4.

 
Figure 28. Training performance of Elman neural network. 

The regression plot of Figure 29 shows that there is some level of relationship between the targets and the neural outputs for the training, testing and validation data. The regression value, R, was not exactly 1 but close to 1, which means that the network captured the relationship between the inputs and the output to a large extent. Furthermore, the 'All' panel of the regression plot in Figure 29 indicates the extent of the relationship over the whole dataset.

     
Figure 29. Regression plot of Elman neural network training  

The points clustered away from the fit line in each panel show the samples for which the neural outputs did not match the targets. Interestingly, the training was thorough, as shown in the training panel of the regression plot.

 

 
Finally, Figure 30 provides a quick view of the variation between the target and the neural outputs.

 
Figure 30. Elman neural network plot of target and neural outputs 

Based on Figure 30, it can be seen that variation occurs on only a few occasions, that is, in about 8 rows out of the 100 rows considered. The Elman neural network learned well, but not as effectively as the feedforward neural network.

 

 
4.3  Timedelay Neural Network's simulation exercise 

The timedelay neural network is of particular importance because its input weights have a tap delay line associated with them. It is therefore widely preferred for time series input data, because it has a finite dynamic response. Levenberg-Marquardt was the training method used and the dataset was randomly divided using dividerand. The command window implementation of the timedelay neural network is shown in Figure 31.

 
Figure 31. Command window for time delay neural network. 

Both the training algorithm and the number of hidden neurons were changed in an attempt to ensure that the network effectively learned the relationship. The training algorithm used is as shown in Figure 32. However, changing the training algorithm and increasing the number of hidden neurons had no positive impact on the performance of the network.

 
Figure 32. Training window of timedelay neural network. 

As shown in Figure 33, a great deal of variation was observed between the target and the neural output. The reason for such great disparities remains unknown, given that they persisted after the training algorithm was changed to alternatives such as trainscg and trainrp. The output is as given below:

 
Figure 33. Target and neural outputs of timedelay neural network. 

Regression analysis plot shown in Figure 34 clearly summarizes the fact that the network 

failed to effectively learn the relationship.  

 
Figure 34. Regression plot of the relationship between the target and the neural outputs. 

 

 
The regression value, R, indicates the degree of the relationship. Finally, the target and the neural outputs are plotted together in Figure 35 to show clearly the variation between them.

 
Figure 35. Variation in target and neural output plot. 

It is worth mentioning, though, that an improvement in learning efficiency was observed, in terms of both the performance and the closeness of the target values to the neural outputs, when the distributed delay neural network (distdelaynet) was used. This is because it uses delays on the layer weights as well as on the input weights. However, this is beyond the scope of this thesis.

 

 
4.4 Layer Recurrent Neural Network 

Layer recurrent neural network uses the function layrecnet and the command window is 

shown in Figure 36. 

 
Figure 36. Command window for layer recurrent neural network. 

Similarly, Figure 37 shows some of the target and neural outputs obtained after learning.

 
Figure 37. Target and neural outputs of layer recurrent neural network. 



 
From Figure 37, it can be said that the layer recurrent neural network learned the relationship between the inputs and the output perfectly.

 
Figure 38. Learning window of layer recurrent neural network. 

Levenberg-Marquardt was also used in the training and the dataset was randomly divided. 

The performance was calculated using the Mean Squared Error.  

 

 
The plot in Figure 39 gives a quick view of the values of the target and neural outputs.

 
Figure 39. Target and neural outputs of layer recurrent neural network. 

The plot in Figure 39 shows that the neural network effectively learned the relationship; thus, there was no observable difference between the target and the neural output.

 

 
4.5  Nonlinear Autoregressive Neural Network (NARNET) 

Nonlinear autoregressive neural network is trained using the function narnet. The command 

window for narnet is shown in Figure 40.  

 
Figure 40. Command window for the nonlinear autoregressive neural network. 

It is important to examine the performance of narnet, as it offers full dynamic derivative calculation. Although this neural network is not considered in the analysis of the data contained in Appendix IV, it is introduced here because it is recommended for future analysis owing to its use of the fpderiv algorithm.

 

 
As with the previously enumerated neural networks, the dataset was randomly divided. 

 
Figure 41. Training window of the nonlinear autoregressive training. 

Levenberg-Marquardt was also used in the training. Some of the outputs are shown in Figure 42. From the output, clear disparities between the target and neural values can be seen.

 
Figure 42. Narnet training results showing both target and neural outputs. 



 
In addition, the regression plot shown in Figure 43 indicated that the neural network did not 

effectively learn the relationships between the inputs and output. 

 
Figure 43. Narnet regression plot of the learning process.  

The regression value was R = 0.17587, which is well below 0.5. This means that there was hardly any relationship between the target and the neural output.

 

 
To put the relationship in a more readable format, Figure 44 gives a quick insight into the variation between the expected output and the trained output.

 
Figure 44. A graph of the target and neural values after training with Narnet. 

There was not much agreement between the expected value and the trained value, and changing the training algorithm and increasing the number of hidden neurons did not have any meaningful effect on the output. Hence, there is a need to examine a variant of this network, the nonlinear autoregressive neural network with external inputs. This is examined in the final section of this chapter.

 

 
4.6 Nonlinear Autoregressive Neural Network with External Inputs

The function used by this network is narxnet. It is similar to narnet, with the addition of the external inputs. Appendix II was used as the training dataset for this network, and the MATLAB command window is shown in Figure 45.

 
Figure 45. Narxnet MATLAB command window. 

Interestingly, narxnet is able to make predictions given the past values of the series (the feedback input) and another time series, called the external or exogenous time series; hence the name nonlinear autoregressive neural network with exogenous input. However, this feature is not needed for the purpose of this thesis, as the thesis deals with patient data and the physiological conditions of patients vary from patient to patient.

 

 
The training algorithm used was Levenberg-Marquardt and the dataset was randomly divided 

as shown in Figure 46. 

 
Figure 46. Training window for narxnet. 

The target and neural outputs are shown in Figure 47. 

 
Figure 47. Narxnet target and neural outputs. 



 
The regression plot in Figure 48 indicates the input-output relationship. The regression value showed that there was little or no relationship.

 
Figure 48. Regression plot for narxnet. 

 

 
Finally, a quick view of the target and the neural plot is shown in Figure 49.  

 
Figure 49. Narxnet target and neural outputs. 

In conclusion, based on the tests of the dataset contained in Appendix II on different neural networks in this chapter, feedforwardnet, elmannet and layrecnet provided a good learning outcome of the relationship between the inputs and the output. In other words, the values of the target and the neural outputs were in close agreement. Therefore, these are the neural networks considered for the case study analysis in the next chapter, Chapter 5.

 

 
5 ANN SIMULATION OF THE TONGUE CANCER CASE STUDY

 
This chapter addresses the objectives of the thesis as outlined in Chapter 1. As pointed out in the introductory chapter, cancer is a dreadful disease, and it is imperative to be proactive in its diagnosis and treatment. Tongue cancer, that is, oral (mobile) tongue squamous cell carcinoma (SCC), is no exception. The tongue has a high amount of muscle bundles, which may inhibit the potential spread of a tumour within it. Despite this fact, detection and prognostic studies are very important in order to record significant success in the crusade against cancer. While detection has to do with establishing at an early stage that the patient has cancer, prognosis examines the likelihood of survival of the patient. It is worth noting that early detection of tongue cancer is not always an indicator of a good prognosis (chance of survival from treatment). This is because evidence has shown that about 20% to 40% of cases had already spread aggressively to other parts of the body (metastasis) (Ganly et al 2012; Ho et al 1992).

 
Thus, prognostic studies aim to divide the patients into two groups: first, the patients whose tongue cancer is severe and who would therefore need aggressive treatment such as chemotherapy or multimodal therapy; second, the patients who would need surgical treatment alone (Kellermann et al 2007). This classification into low and high risk would represent a major advancement in the management of this dreadful disease. The clinical size classification of early oral tongue SCC into T1 and T2 was unable to divide the patients into low-risk and high-risk groups (Keski-Santti et al 2007). Hence, it becomes necessary to look for other parameters to be used in the prognostication of the cancer. Previous studies have used histomorphological parameters such as depth of invasion, tumour budding and histological risk, to mention but a few (Almangush et al 2013). Most of these histomorphological parameters make up the columns of Appendix III. The meaning of some of these parameters is defined in Section 5.1.

 

 
5.1 Definition of SCC related terms 

5.1.1 Tumour Budding 

 
Tumour budding reflects loss of cellular cohesion together with active invasive movement.

Both of these are properties of malignancy.

 
5.1.2 Tumour Size, Prognosis and Metastasis

 
Prognosis is a measure of the likelihood of survival, while metastasis is the spread of cancer

from the affected part of the body to another part. Tumour size is the diameter of the tumour

(Baran et al 2015).

 
5.1.3 Depth of Invasion (DOI) 

 
This is a measure of the thickness of the tumour. In other words, it defines the extent of the

growth of the tumour. There are different patterns of invasion, with the worst pattern of

invasion (WPOI) known as type 4. There are other types, such as type 5 (tumour satellites),

as well as perineural invasion (PNI).

 
5.1.4 Symptom  

 
A symptom is an indication, experienced by a person or patient, of a change from the normal

condition or feeling to an unusual one due to the presence of a disease. In terms of cancer, it

can be interpreted as a measure of the aggressiveness of the tumour, which has a significant

effect on prognosis (Baran et al 2015).



 
5.1.5 Pathological Stage (TNM Stage)

 
Pathological stage cTNM classifies the cancer in terms of the tumour size and location. TNM

is nowadays the commonly used pathological staging system and it is defined below:

 
T size and location of the main t