
AN INCREMENTAL GENETIC ALGORITHM AND

NEURAL NETWORK FOR CLASSIFICATION AND

SENSITIVITY ANALYSIS OF THEIR

PARAMETERS

by

Gözde BAKIRLI

September, 2009
İZMİR


AN INCREMENTAL GENETIC ALGORITHM AND

NEURAL NETWORK FOR CLASSIFICATION AND

SENSITIVITY ANALYSIS OF THEIR

PARAMETERS

A Thesis Submitted to the

Graduate School of Natural and Applied Sciences of Dokuz Eylül University
in Partial Fulfillment of the Requirements for the Master of Science in

Computer Engineering

by

Gözde BAKIRLI

September, 2009
İZMİR


M.Sc THESIS EXAMINATION RESULT FORM

We have read the thesis entitled “AN INCREMENTAL GENETIC ALGORITHM AND NEURAL NETWORK FOR CLASSIFICATION AND SENSITIVITY ANALYSIS OF THEIR PARAMETERS” completed by GÖZDE BAKIRLI under the supervision of PROF. DR. ALP KUT, and we certify that in our opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.

Prof. Dr. Alp Kut

Supervisor

Yrd. Doç. Dr. Derya Birant Yrd. Doç. Dr. Reyat Yılmaz

(Jury Member) (Jury Member)

Prof. Dr. Cahit HELVACI
Director


ACKNOWLEDGMENTS

I would like to thank my supervisor, Prof. Dr. Alp Kut, for his support, supervision, and useful suggestions throughout this study. I am also highly thankful to Dr. Derya Birant for her valuable suggestions throughout the study.

I owe my deepest gratitude to my family. This thesis would not have been possible without their unflagging love and support. I am indebted to my father Mustafa Ali Bakırlı, my mother Beyza Bakırlı, and my brother Demir Bakırlı for their care and love.

Special thanks are directed to TUBITAK for their financial support throughout these two years. Thanks to TUBITAK BIDEB, I could obtain the necessary hardware and software equipment and allocate more time to my thesis.


AN INCREMENTAL GENETIC ALGORITHM AND NEURAL NETWORK FOR CLASSIFICATION AND SENSITIVITY ANALYSIS OF THEIR

PARAMETERS

ABSTRACT

This study proposes classification using algorithms inspired by computation in biological systems, namely the genetic algorithm and the neural network, and compares them. A new incremental genetic algorithm and a new incremental neural network algorithm are developed for classification, to handle new transactions efficiently. To achieve incremental classification, a specific model that includes all information about a training operation (the rules for each class for the genetic algorithm, and the weight values for the neural network) is created after each training operation. Later, these models are used for testing, correctness testing, model comparison, and incremental classification. With this new incremental method, the training time for a new dataset becomes smaller, and the experimental results support this claim. This thesis introduces the new method and its importance. The study also includes a sensitivity analysis of the parameters of the incremental and traditional genetic algorithm and of the neural network. In this analysis, many specific models were created using the same training dataset but with different parameter values, and then the performances of the models were compared. To carry out these operations, two tools are developed, one for the genetic algorithm and one for the neural network, and all of these investigations are performed using them.

Keywords: Genetic Algorithm, Neural Network, Classification, Data Mining,


CLASSIFICATION WITH AN INCREMENTAL GENETIC ALGORITHM AND ARTIFICIAL NEURAL NETWORKS, AND SENSITIVITY ANALYSIS OF THEIR PARAMETERS

ÖZ

The aim of this study is to perform classification with the genetic algorithm and artificial neural networks, which were developed under the inspiration of biological systems, to develop their incremental algorithms, and to compare them with the traditional method. For this purpose, a model specific to each training operation is created after that training. This model stores all information about the training and the outputs obtained after the training stage; this information is later used for testing, checking the correctness of the model, computing and comparing the performances of the models, and providing incremental classification. Incremental classification achieved higher performance with a shorter training time than the traditional method, and the experimental observations made for this purpose prove this performance gain. In addition, a sensitivity analysis was carried out by changing the values of the genetic algorithm and neural network parameters and comparing the results of the training operations. To carry out all these operations, two separate classification tools were developed, one for the genetic algorithm and one for artificial neural networks.

Keywords: Genetic Algorithm, Artificial Neural Networks, Classification, Data


CONTENTS

Page

M.Sc THESIS EXAMINATION RESULT FORM...ii

ACKNOWLEDGMENTS ... iii

ABSTRACT...iv

ÖZ ...v

CHAPTER ONE - INTRODUCTION ...1

1.1 Data Mining ...1

1.2 Classification...1

CHAPTER TWO – GENETIC ALGORITHM ...3

2.1 Related Works...3

2.2 Genetic Algorithm for Classification ...5

2.3 Fitness Function ...7

2.4 Parent Selection Techniques ...10

2.4.1 Roulette-Wheel Selection...10

2.4.2 Tournament Selection ...10

2.4.3 Top Percent Selection ...10

2.4.4 Best Selection...11

2.4.5 Rank Selection ...11

2.4.6 Random Selection ...11

2.5 Crossover ...12

2.5.1 One Point Crossover ...12

2.5.2 Two-Point Crossover ...13

2.5.3 Uniform Crossover...13

2.6 Mutation ...15


2.7.1 Generation Number...16

2.7.2 Fitness Value...16

2.8 Experimental Results ...22

2.8.1 Description of the Dataset...22

2.8.2 Determining Crossover and Mutation Probability ...24

2.8.3 Importance of Elitism ...26
2.8.4 Population Size ...27
2.8.5 Traditional vs. Incremental GA ...29
2.8.6 Classification Accuracy ...32
2.9 Interface ...34
2.9.1 Training ...35
2.9.2 Comparing Models ...39
2.9.3 Testing ...41
2.9.4 Incremental GA ...44

CHAPTER THREE – NEURAL NETWORK...49

3.1 Related Works...49

3.2 Neural Network...49

3.2.1 Simple Single Unit Network ...49

3.2.2 Activation Functions ...51

3.2.3 Termination Criteria...53

3.2.4 Neural Network Types ...54

3.3 Approach ...60

3.3.1 Incremental Approach...60

3.3.2 Performance Calculation of the Models...62

3.4 Experimental Results ...62

3.4.1 Model Construction...62

3.4.2 Description of the Dataset...64

3.4.3 Iteration Number ...64


3.4.5 Learning Rate (µ) ...67

3.4.6 Neural Network Types ...68

3.4.7 Incremental vs. Traditional Backpropagation NN ...69

3.5 Interface ...71
3.5.1 Backpropagation ...71
3.5.2 SLP ...73
3.5.3 MLP ...74
3.5.4 SOM ...75
3.5.5 Testing ...77
3.5.6 Comparing Models ...79
3.5.7 Incremental NN ...80

CHAPTER FOUR - CONCLUSION ...83

4.1 Conclusion for GA ...83

4.2 Conclusions for NN ...84

4.3 GA vs. NN for Incremental Classification...85


CHAPTER ONE
INTRODUCTION

1.1 Data Mining

Data mining is the process of extracting hidden patterns from large datasets. It has been widely used in many areas, such as marketing, banking and finance, medicine, and manufacturing; these areas produce large amounts of data, and data mining is the most widely used method for processing them. There are four common data mining tasks: classification, clustering, regression, and association rule learning.

1.2 Classification

Classification is a procedure in which individual items are placed into groups based on quantitative information on some characteristics inherent in the items. In the classification process, a collection of labelled classes is provided and a training set is used to learn the descriptions of classes. Classification rules are discovered and then these rules are used to determine the most likely label of a new pattern. The most widely used classification techniques are neural networks, decision trees, k-nearest neighbours, support vector machines, and naive Bayes.

The neural network is one of the most widely used classifiers, and much successful classification has been done with it. There are many examples of neural network classification, but many of them do not support testing operations, incremental NN, performance calculation of models, or model comparison.

The genetic algorithm is not counted among the most widely used classifiers, but classification can be done very successfully with it. There have been a few examples of the genetic algorithm used as a classifier, but none of them supports testing operations, model comparison, or incremental GA.

The aim of this study is to show the importance of saving the information of each training operation. A model is created for each training; it includes all inputs and outputs of the training operation. This information is used for testing (finding the classes of new patterns), comparing models, correctness testing, and decreasing the training time for a dataset that is updated regularly.

This study also presents a sensitivity analysis of the GA parameters, such as crossover probability, mutation probability, elitism (on/off), and population size, and of the NN parameters, such as the numbers of input and output neurons, the number of hidden layers, and the activation function. The aim of this analysis is to evaluate the performances of classification models constructed from the same training dataset with different GA and NN parameter values. Each classification model (classifier) for GA consists of input parameters (crossover and mutation probabilities, population size, etc.), applied techniques (parent selection type, crossover type, termination criteria, etc.), and outputs (average fitness value, classification rules, etc.) related to the training process. Each model for NN consists of input parameters (NN type, number of network layers, network dimensions, etc.) and weight values. The models are compared by applying the n-fold cross-validation method. In order to implement all these experiments, two tools are developed, named Generic Genetic Classifier Tool and Neural Network Modeller.


CHAPTER TWO
GENETIC ALGORITHM

2.1 Related Works

Genetic algorithms are a family of computational models motivated by the process of natural selection in biological systems. The concept of evolutionary computing appeared in the 1960s with I. Rechenberg. The GA was first developed by Holland in 1975 and then improved by many other researchers (Booker, Goldberg & Holland, 1989). Currently, the GA is one of the most important techniques of artificial intelligence. GAs are used for soft constraint satisfaction, scheduling problems, finding game strategies, and so forth.

The basis of the genetic algorithm is natural selection: individuals that have sufficient features to live are transferred to the next generation, and individuals that are not good enough disappear. The stronger candidates remain in the population, while the weaker ones are discarded (Shapiro, 2001). Thus the new generation gets closer to the best solution at each step, and this operation continues until the termination criteria are met. For the basic concepts of genetic algorithms, please refer to Goldberg (1989).

In recent years, GAs have been applied to the classification problem, to discover classification rules, in only a few studies. Ishibuchi, Nakashima, and Murata (2001) constructed a fuzzy classifier system in which a population of fuzzy if-then rules is evolved by genetic algorithms. Avcı (2009) implemented a classification method by combining genetic algorithm and support vector machine techniques. Fan, Chen, Ma, and Zhu (2007) created an approach for proposal grouping in which knowledge rules are designed to interact with proposal classification and a genetic algorithm is developed to search for the expected groupings. Yuen et al. (2009) proposed a hybrid model combining a genetic algorithm and a neural network for classifying garment defects. Kwong, Chang, and Tsim (2008) used a genetic algorithm to discover knowledge about fluid dispensing. Dehuri, Patnaik, Ghosh, and Mal (2008) used an elitist multi-objective genetic algorithm for mining classification rules from large databases. Yılmaz, Yıldırım, and Yazıcı (2007) used a genetic algorithm to classify video segments into objects.

According to this review of GA-based classification methods, previous studies use either the traditional genetic algorithm or a combination of the genetic algorithm with another AI technique, such as fuzzy logic or a neural network. They do not propose incremental usage of the genetic algorithm for classification when new data is added to the existing dataset.

The problem of incrementally updating mined patterns as the database changes has, however, been addressed for other data mining tasks, such as clustering and association rule mining. Lin, Hong, and Lu (2009) propose an efficient method for incrementally modifying a set of association rules when new transactions have been inserted into the database. Lühr and Lazarescu (2009) introduce an incremental graph-based clustering algorithm that both incrementally clusters new data and selectively retains important cluster information within a knowledge repository. Fan, Tseng, Chern, and Huang (2009) propose an incremental technique that handles added-in data without re-running the original rough-set-based rule induction algorithm for a dynamic database.

Sensitivity analysis is the study of how a given model output depends upon the input parameters (Saltelli, 2008). In other words, it is the process of varying input parameters over a reasonable range and observing the relative change in the model response. It is an important process for checking the quality of a given model, as well as a powerful tool for checking its robustness and reliability. A sensitivity analysis can be conducted by changing each parameter value by +/-10% and +/-50% (Cacuci, 2003). This study compares the performance of classification models constructed with different GA parameter settings.


2.2 Genetic Algorithm for Classification

Each phase in a GA (Figure 2.1) produces a new generation of potential solutions for a given problem. In the first stage, an initial population, which is a set of encoded bit strings (chromosomes), is created to initiate the search process. The performance of the strings is then evaluated with respect to the fitness function, which represents the constraints of the problem. After a sorting operation, the individuals with better performance (fitness value) are selected for a subsequent genetic manipulation process; the selection policy is responsible for ensuring survival of the best-fit individuals. In the next stages, a new population is generated using two genetic operations: the crossover operation (recombination of the bits/genes of each two selected strings/chromosomes) and the mutation operation (alteration of the bits/genes at one or more randomly selected positions of the strings/chromosomes). This process is repeated until certain criteria are met.

Figure 2.1 Basic genetic algorithm
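This basic loop can be sketched as follows. This is a minimal illustration, not the code of the tool developed in this thesis; it assumes the caller supplies the fitness, individual-generation, crossover, and mutation functions for the problem at hand.

```python
import random

def genetic_algorithm(fitness, random_individual, crossover, mutate,
                      pop_size=50, generations=100):
    """Minimal GA skeleton: evaluate, select, recombine, mutate, repeat."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate and sort by fitness (best first).
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: keep the two best unchanged
        while len(next_gen) < pop_size:
            # Select two parents from the better half of the population.
            p1, p2 = random.sample(population[:pop_size // 2], 2)
            c1, c2 = crossover(p1, p2)
            next_gen.extend([mutate(c1), mutate(c2)])
        population = next_gen[:pop_size]
    return max(population, key=fitness)
```

On a toy "maximize the number of 1 bits" problem, this skeleton converges quickly because elitism keeps the best chromosomes from being lost.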

The search space is the set of all possible solutions of the problem.

The population is a subset of n randomly chosen solutions from the search space. For data mining, a population is created once per class; for example, if there are five classes in the dataset, a population is created five times, because the training operation runs once for each class. If there are two classes, ‘YES’ and ‘NO’, a population is first created for the ‘YES’ class and the solutions (rules) for that class are found, and then the same operation is performed for the ‘NO’ class.

A population consists of chromosomes, which are strings representing possible solutions of the problem. The length of the chromosomes is determined while the population is being created randomly at the beginning: the number of different values of each attribute is counted, and the chromosomes are created according to these numbers. To see this clearly, consider the Nursery dataset, one of the datasets studied here.

In the Nursery dataset there are nine attributes, the last of which is the class value. After the number of attributes is determined, the dataset is scanned nine times; each scan finds how many different values the relevant attribute has. For example, the first attribute, named ‘parent’, has three different values: ‘usual’, ‘pretentious’, and ‘great_pret’. So the length of the part of the chromosome for that attribute is three: for the first value, ‘usual’, the string part is ‘100’; for the second value, ‘pretentious’, it is ‘010’; and for the last value, ‘great_pret’, it is ‘001’. The first part of a chromosome can therefore take the following forms:

100: if parent = usual
010: if parent = pretentious
001: if parent = great_pret
110: if parent = usual or parent = pretentious
011: if parent = pretentious or parent = great_pret
101: if parent = usual or parent = great_pret
111: if parent = usual or parent = pretentious or parent = great_pret (in this case the attribute is noneffective for the relevant class)

At the beginning the population is created randomly, meaning these string parts are created randomly. For example, while a chromosome is being created, the part for the ‘parent’ attribute will be one of the string parts above, and this operation is done for every attribute.

These string parts are created for each attribute and then pieced together; this operation is repeated population-size times.
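As an illustration, this encoding can be sketched as follows. The attribute schema, function names, and two-attribute example are hypothetical, chosen only to mirror the Nursery description above.

```python
import random

# Hypothetical attribute schema, modeled on the Nursery example:
# each attribute maps to the list of its possible values.
ATTRIBUTES = {"parent": ["usual", "pretentious", "great_pret"],
              "finance": ["convenient", "inconv"]}

def random_chromosome(attributes):
    """One bit per attribute value; 1 means the value is allowed by the rule."""
    bits = []
    for values in attributes.values():
        part = [random.randint(0, 1) for _ in values]
        if not any(part):                  # avoid an all-zero (unsatisfiable) part
            part[random.randrange(len(part))] = 1
        bits.extend(part)
    return bits

def decode(chrom, attributes):
    """Translate the bit string back into a readable IF-condition."""
    clauses, i = [], 0
    for name, values in attributes.items():
        part = chrom[i:i + len(values)]
        i += len(values)
        if all(part):                      # 111...: attribute is noneffective
            continue
        allowed = [v for bit, v in zip(part, values) if bit]
        clauses.append(name + " in {" + ", ".join(allowed) + "}")
    return " AND ".join(clauses) or "TRUE"
```

For instance, the chromosome 100|11 decodes to the condition "parent in {usual}", because the all-ones finance part makes that attribute noneffective.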

After population creation, all individuals’ fitness values are calculated.

2.3 Fitness Function

This function is used to determine how acceptable an individual is. There is no fixed function for calculating the fitness value; the fitness function depends on the problem. For classification with rule discovery using a genetic algorithm, however, a well-known fitness function is used (Freitas, 1999).

The rule for each class is of the form “IF condition THEN class”. The fitness value of a rule is its predictive accuracy, which is found by computing the confidence factor of the rule.

Definition 1. Confidence factor of rule

Confidence factor = #(condition & class) / #(condition)

#(condition): the number of examples satisfying the condition

#(condition & class): the number of examples satisfying the condition and having that class.


For example, if there are 10 examples matching the condition (#(condition) = 10) and 3 of them are from the class (#(condition & class) = 3), then the confidence factor of the rule is 3/10.

But there is a problem. Suppose #(condition) = 1 and #(condition & class) = 1, that is, there is only one pattern matching “IF condition THEN class”; according to the confidence factor formula, the result is 1/1 = 100%. But it should not be, because a single example may not be enough to say that the rule defines class “C”.

To overcome that problem, other components are used.

TP (True Positives): number of examples satisfying both the condition and the class
FP (False Positives): number of examples satisfying the condition but not the class
FN (False Negatives): number of examples not satisfying the condition but satisfying the class
TN (True Negatives): number of examples satisfying neither the condition nor the class

In this situation, the confidence factor is equal to:

Definition 2. The confidence factor of a rule is as follows:

Confidence factor = TP / (TP + FP)

The confidence factor alone is not enough for calculating the fitness value of a rule, so another component is needed: completeness, which determines how complete the rule is.


Definition 3. Completeness is calculated as follows:

Comp = TP / (TP + FN)

So now the fitness function is as follows:

Definition 4. Fitness value for classification operation is calculated as follows;

Fitness = confidence factor x Comp

This is the fitness function used in this project for classification.
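A sketch of this fitness computation, assuming (as an illustration, not the thesis tool's code) that the dataset is given as (record, class label) pairs and the rule condition as a predicate over a record:

```python
def rule_fitness(dataset, condition, target_class):
    """Fitness = confidence * completeness = (TP/(TP+FP)) * (TP/(TP+FN))."""
    tp = fp = fn = 0
    for record, label in dataset:
        if condition(record):
            if label == target_class:
                tp += 1          # condition and class both satisfied
            else:
                fp += 1          # condition satisfied, class not
        elif label == target_class:
            fn += 1              # class satisfied, condition not
    if tp == 0:
        return 0.0               # rule matches nothing of the class
    confidence = tp / (tp + fp)
    completeness = tp / (tp + fn)
    return confidence * completeness
```

On the worked example above (TP = 3, FP = 7, FN = 2), this gives (3/10) × (3/5) = 0.18, which shows how completeness penalizes rules that cover only a small part of the class.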

The training operation is applied for each class, and then the average fitness value of each operation is calculated. This is the resulting fitness value, and it affects the performance of the model.

Definition 5. The population fitness is defined as:

Population_Fitness = Σ_{i=1}^{|chr|} Fitness(chr_i)

where |chr| is the number of chromosomes in the population (the population size) and chr_i is the i-th chromosome in the population.

Definition 6. The resulting fitness value is calculated as shown below:

result_fitness = ( Σ_{i=1}^{|c|} Population_Fitness(p_i) ) / |c|

where |c| is the total number of classes and p_i is the population for the i-th class.

The fitness value is calculated for every individual in the population. After this operation, the individuals are sorted according to their fitness values.


After sorting, parents are selected for the crossover operation according to one of the selection techniques.

2.4 Parent Selection Techniques

2.4.1 Roulette-Wheel Selection

This is the most widely used selection technique. Roulette wheel selection is implemented as follows:

1. Find the total fitness of the population.

2. Generate a random number n between 0 and the total fitness.

3. Return the first population member whose fitness, added to the fitnesses of the preceding population members, is greater than or equal to n.
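These three steps can be sketched as follows; this is an illustrative implementation assuming non-negative fitness values, not the thesis tool's code.

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Pick an individual with probability proportional to its fitness."""
    total = sum(fitnesses)                # step 1: total fitness
    n = random.uniform(0, total)          # step 2: random point on the wheel
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit                    # step 3: first cumulative sum >= n
        if running >= n:
            return individual
    return population[-1]                 # guard against float rounding
```

Note that an individual with zero fitness can never be returned before a fitter predecessor, which is exactly the behavior rank selection later addresses.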

2.4.2 Tournament Selection

As its name suggests, there is a tournament among individuals: a few individuals are selected from the population randomly, and the one with the highest fitness value among them is selected.

2.4.3 Top Percent Selection

In this selection technique, the top n percent of the population is used. For example, if there are 100 individuals in the population and n is 20, then one individual is selected randomly from among the first 20 individuals.


2.4.4 Best Selection

The parents are the first two individuals of the population, i.e., those with the highest fitness values. The purpose of this method is to accelerate training by avoiding individuals with poor fitness values. This method may fail when the best individuals are not good enough, because other individuals with lower fitness values may still perform well and may get close to the solution after crossover or mutation operations.

2.4.5 Rank Selection

Roulette-wheel selection has problems when the fitnesses differ very much. For example, if the best chromosome's fitness occupies 90% of the roulette wheel, then the other chromosomes have very few chances to be selected.

Rank selection first ranks the population, and then every chromosome receives a fitness from this ranking: the worst has fitness 1, the second worst 2, and so on, with the best having fitness N (the number of chromosomes in the population).
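A minimal sketch of this scheme, using Python's standard library (the function name is illustrative):

```python
import random

def rank_select(population, fitnesses):
    """Select using rank-based weights: worst gets weight 1, best gets N."""
    ranked = sorted(zip(population, fitnesses), key=lambda pf: pf[1])
    weights = range(1, len(ranked) + 1)    # ranks 1 .. N replace raw fitness
    individuals = [ind for ind, _ in ranked]
    return random.choices(individuals, weights=weights, k=1)[0]
```

With three individuals the selection probabilities become 1/6, 2/6, and 3/6 regardless of how skewed the raw fitness values are, which is the point of the technique.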

2.4.6 Random Selection

With this method, parents are selected randomly: a number n is generated between 1 and the population size, and the n-th member of the population is selected as a parent.

There are two further options related to crossover. These are not crossover techniques; they are choices made to get better performance and to decrease the time needed to reach a solution: steady-state selection and elitism. The mission of steady-state selection is to transfer individuals with high fitness values into the next generation without any operation, so that good individuals are saved and not lost. The purpose of elitism is similar, but with elitism only the two individuals with the highest fitness values are added directly into the next generation.

After the parents are selected, the crossover operation is applied.

2.5 Crossover

In this operation, the genetic information of two individuals is merged; the purpose is to find the best individual. There are three main crossover techniques used in data mining.

2.5.1 One Point Crossover

Each parent is divided into two parts at a point that is either fixed (a number between 1 and the string length) or generated randomly. The first part of the first parent is merged with the second part of the second parent to create the first child; the second part of the first parent and the first part of the second parent create the second child.


2.5.2 Two-Point Crossover

Each parent is divided into three parts at two points that are either fixed (numbers between 1 and the string length) or generated randomly. The first and third parts of the first parent and the second part of the second parent create the first child; the second part of the first parent and the first and third parts of the second parent create the second child.

Figure 2.3 Two point crossover

2.5.3 Uniform Crossover

A mask string is generated randomly. The mask determines which bits are copied from the first parent and which from the other parent: bits that are ‘1’ in the mask are copied from the first parent, and ‘0’ bits are copied from the second parent.


Figure 2.4 Uniform crossover

The crossover operation is shown as pseudocode in Figure 2.5.

Figure 2.5 Crossover
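A compact sketch of the three crossover techniques described above, assuming equal-length parents given as bit lists (this is an illustration, not the pseudocode of Figure 2.5 itself):

```python
import random

def one_point(p1, p2, point=None):
    """Swap the tails of the parents at one cut point."""
    point = point if point is not None else random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point(p1, p2, a, b):
    """Swap the middle segment between the two cut points a and b."""
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def uniform(p1, p2, mask):
    """Mask bit 1 copies from p1 into the first child, 0 copies from p2."""
    c1 = [x if m else y for m, x, y in zip(mask, p1, p2)]
    c2 = [y if m else x for m, x, y in zip(mask, p1, p2)]
    return c1, c2
```

For example, one_point([1,1,1,1], [0,0,0,0], 2) yields the children [1,1,0,0] and [0,0,1,1].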


The crossover operation is applied according to the crossover probability, a value between 0 and 1. Before the operation, a random number between 0 and 1 is generated; if this number is smaller than the probability, crossover is applied, otherwise the parents are transferred directly, without any operation.

2.6 Mutation

After the crossover operation, mutation is applied. Mutation is a random deformation of the strings with a certain probability, similar to the crossover probability: if a random number n is smaller than the mutation probability, the mutation operation is performed and certain bits of the string are changed. The aim of the mutation operation is to avoid local minima. Figure 2.6 shows the pseudocode of the mutation operation.

Figure 2.6 Mutation
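The probability-gated mutation described above can be sketched as follows (an illustration on bit-list chromosomes, not the pseudocode of Figure 2.6):

```python
import random

def maybe_mutate(chromosome, mutation_prob):
    """Flip one randomly chosen bit, but only with the given probability."""
    if random.random() < mutation_prob:   # probability gate, as for crossover
        chromosome = chromosome[:]        # copy so the parent stays intact
        i = random.randrange(len(chromosome))
        chromosome[i] ^= 1                # flip the selected bit
    return chromosome
```

With a probability of 0 the chromosome passes through unchanged; with a probability of 1 exactly one bit is flipped.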

2.7 Termination Criteria

These crossover and mutation operations continue until the termination criteria are met. The two most widely used termination criteria are the generation number and the fitness value.


2.7.1 Generation Number

As its name suggests, there is a given number that determines how many new generations will be created. For example, if the generation number is 50, then after 50 generations the generation loop terminates regardless of the average fitness of the population. The disadvantage of this method is that training can be terminated before a solution is reached.

2.7.2 Fitness Value

In this case, the fitness value of the population must reach a given fitness value for training to terminate. The disadvantage of this method is that a wrong fitness value can be set as the threshold. If the threshold is too low, the loop may terminate without the best solution: for example, if 25 is set as the threshold fitness value but a population with a fitness value of 30, closer to the best solution, is attainable, that better solution is never reached. On the other hand, a threshold that is too high causes an infinite loop: for example, if the population's best attainable fitness value is 30 but 35 is chosen as the threshold, the threshold can never be reached and the loop cannot terminate.

When the termination criterion has been reached, there is a population that includes rules for one class; these rules characterize the current class. After the rules are found for one class, the same operation is done for another class to find its rules.

The first purpose of this study is to achieve an incremental GA. To realize this, the incremental genetic algorithm for classification shown in Figure 2.7 was developed.


Figure 2.7 Incremental GA

In the training process, the main difference between the traditional GA and our incremental GA is the generation of the initial population. While in the traditional GA the initial population is generated fully randomly, in the incremental GA the previously discovered classification rules are also added into the randomly generated population. Experimental results show that the results obtained from the traditional and the incremental GA are the same, but the incremental GA reduces the generation number and decreases the time needed to reach a solution. The flowchart of the proposed training process is depicted in Figure 2.8.
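This difference in initial-population generation can be sketched as follows. The helper below is an assumed illustration, not the thesis tool's code; prior_rules is taken to hold the rule chromosomes read from a previously saved model.

```python
def initial_population(pop_size, random_chromosome, prior_rules=()):
    """Build the initial population.

    Traditional GA: call with no prior_rules, so everything is random.
    Incremental GA: the rule chromosomes saved in an earlier model are
    injected first, and only the remainder is generated randomly.
    """
    population = list(prior_rules)[:pop_size]
    while len(population) < pop_size:
        population.append(random_chromosome())
    return population
```

Because the injected rules already have high fitness on the previously seen part of the dataset, the search starts near a good region and needs fewer generations to reach the expected fitness value.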


Figure 2.8 The flowchart of the proposed training process

Using the above algorithm, the training (classification) operation is implemented as described before. After that operation, a specific model of the operation exists; this model is shown in Figure 2.9.


This model includes the inputs that were entered: selection type (roulette wheel, tournament, top percent, best, rank, or random; if the selection type is tournament, the model also includes a group size value, and if it is top percent, a percent value), steady-state selection (true or false), elitism (true or false), crossover probability (between 0 and 1), mutation probability (between 0 and 1), population size, crossover type (one-point, two-point, or uniform), replace-if-better (true or false), replace-always (true or false), and termination criteria (generation number or fitness value). The model also includes the outputs of the classification operation: generation number, average fitness, attributes, and rules.

This specific structure is used for calculating the performances of the models.

The performance of a model depends on the generation number and the resulting fitness of the population: performance is directly proportional to result_fitness and inversely proportional to the generation number.

Definition 7. The performance of the model is calculated as follows, where Pm stands for the performance of the model and result_fitness is declared in Definition 6:

Pm ∝ result_fitness / generation_number

The result fitness and the generation number both affect performance, but the strength of their effects is not the same, because the main purpose of training with a genetic algorithm is first to have maximum fitness and then a minimum generation number; that is, the fitness value has priority. From this perspective, the training operation cannot be cut short merely to obtain a small generation number. We can say that fitness should have double the effect on the performance calculation, in which case the following formula is used.


Definition 8. Performance calculation when the fitness value has double effect on the calculation:

Pm = (2 × result_fitness) / generation_number

But sometimes the generation number can be more important than the fitness value, or there can be a delicate balance between average fitness and generation number, so weights should be used in the calculation.

Definition 9. Performance calculation with user-defined weights, where w1 is the weight for the result fitness and w2 is the weight for the generation number:

Pm = (w1 × result_fitness) / (w2 × generation_number)

Comparisons of the models' performances are done using the Pm formula in Definition 9. With these comparisons, the parameters that are ideal for a dataset can be determined.
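Definition 9 can be written directly as a small function; setting w1 = 2 and w2 = 1 recovers the double-weighted-fitness case of Definition 8 (the function name is illustrative):

```python
def model_performance(result_fitness, generation_number, w1=2.0, w2=1.0):
    """Pm = (w1 * result_fitness) / (w2 * generation_number).

    Defaults w1=2, w2=1 reproduce Definition 8, where fitness has
    double effect; pass other weights for a user-defined balance.
    """
    return (w1 * result_fitness) / (w2 * generation_number)
```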

This study provides an effective way to classify datasets that are updated regularly. To achieve this, the models created after each training operation are used (Figure 2.9). These specific models include rules for every class in the dataset. When new patterns are added to the dataset, a new classification, i.e., a new training operation, is needed. Previously, all operations were applied from scratch each time, which wastes time, because part of the dataset was trained before and the rules in it were already found. This study solves that problem and eliminates the wasted time: the previously found rules are added into the initial population, and then the other operations proceed as usual. That is, the initial population is not created fully randomly; an intervention is applied to it. All classes in the new dataset are determined, and the rules for the classes that were in the previous dataset are taken from the model and added into the initial populations. This makes it possible to reach the expected fitness value with a smaller generation number. Figure 2.10 shows this operation.

Figure 2.10 Incremental approach for handling new patterns

The initial dataset is trained and a model of that dataset is created; this model includes all information and rules for that training operation. Later, new patterns are added to the dataset. For example, suppose there were 15000 patterns in the initial dataset and 1000 new patterns are added. The whole dataset must be trained again in any case, but the generation number, and thus the training time, can be reduced by using the model created for the initial dataset.

The following notations are used in the proposed model:

St  Selection Type
Mp  Mutation Probability
Cp  Crossover Probability
Ps  Population Size
Ct  Crossover Type
Tc  Termination Criteria
Gn  Generation Number
Pv  Percent value for top percent selection
Gs  Group size for tournament selection
GN  Generation number as a termination criterion
FV  Fitness value as a termination criterion

2.8 Experimental Results

2.8.1 Description of the Dataset

For the purpose of testing the performance of the proposed incremental GA, classification experiments are conducted on the real-world data "Nursery" from the UCI Machine Learning Repository (Asuncion & Newman, 2007). Nursery data consists of 12960 instances with 9 features derived from a hierarchical decision model originally developed to rank applications for nursery schools in Ljubljana, Slovenia. As shown in Table 2.1, all attributes have categorical values and the target (class) attribute has five different classes, namely, not_recom, recommend, very_recom, priority, and spec_prior.


Table 2.1 Attributes and attribute values of the Nursery dataset

Attribute Name  Attribute Values
parents         usual, pretentious, great_pret
has_nurs        proper, less_proper, improper, critical, very_crit
form            complete, completed, incomplete, foster
children        1, 2, 3, more
housing         convenient, less_conv, critical
finance         convenient, inconv
social          non-prob, slightly_prob, problematic
health          recommended, priority, not_recom
class           not_recom, recommend, very_recom, priority, spec_prior

When generating the initial population, chromosomes are created using binary encoding. In binary encoding, the length of a chromosome is determined by the number of different values of each attribute. For example, if an attribute has three different values, 'usual', 'pretentious' and 'great_pret', then the length of the part of the chromosome for that attribute becomes three, as shown below.

100: if parent = usual

010: if parent = pretentious
001: if parent = great_pret

110: if parent = usual or parent = pretentious 011: if parent = pretentious or parent = great_pret 101: if parent = usual or parent = great_pret

111: if parent = usual or parent = pretentious or parent = great_pret (in the last case, the attribute is ineffective for the relevant class)

The encoding operation is done for every attribute; each attribute constitutes a part of the string, and chromosomes are randomly constructed by concatenating them. The following string is one of the possible chromosomes for the Nursery dataset.


101-00001-1100-0110-111-01-001-100-00001

The meaning of that chromosome is;

If (parent=usual or parent=great_pret) and (has_nurs=very_crit) and

(form=complete or form=completed) and (children=2 or children=3) and

(housing=convenient or housing=less_conv or housing=critical) and (finance=inconv) and

(social=problematic) and (health=recommended) Then class=spec_prior

Chromosomes of the initial population for one class are randomly generated in this way until the desired population size is reached. After the termination criterion is met, other initial populations are created for the other classes.
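The decoding of a chromosome back into a rule, as in the example above, can be sketched as follows: a Python illustration using the attribute values of Table 2.1, where the chromosome covers only the eight condition attributes and `decode` is a hypothetical helper.

```python
# Attribute values of the Nursery dataset (Table 2.1).
ATTRIBUTES = [
    ("parents",  ["usual", "pretentious", "great_pret"]),
    ("has_nurs", ["proper", "less_proper", "improper", "critical", "very_crit"]),
    ("form",     ["complete", "completed", "incomplete", "foster"]),
    ("children", ["1", "2", "3", "more"]),
    ("housing",  ["convenient", "less_conv", "critical"]),
    ("finance",  ["convenient", "inconv"]),
    ("social",   ["non-prob", "slightly_prob", "problematic"]),
    ("health",   ["recommended", "priority", "not_recom"]),
]

def decode(chromosome):
    """Translate a '-'-separated binary chromosome into the condition
    part of an IF rule; an all-ones part (e.g. '111') means the
    attribute is ineffective for the rule and is skipped."""
    conditions = []
    for part, (name, values) in zip(chromosome.split("-"), ATTRIBUTES):
        if part.count("1") == len(values):  # attribute ineffective
            continue
        chosen = [v for bit, v in zip(part, values) if bit == "1"]
        conditions.append("(" + " or ".join(f"{name}={v}" for v in chosen) + ")")
    return " and ".join(conditions)

rule = decode("101-00001-1100-0110-111-01-001-100")
```

Decoding the example chromosome yields the conditions of the rule shown above; the all-ones housing part is dropped because it matches every housing value.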

2.8.2 Determining Crossover and Mutation Probability

In this section the importance of the crossover and mutation probabilities for the fitness value is depicted. The crossover and mutation probabilities directly affect the training time, and thus the generation number and the fitness value. If the crossover probability is too big, needless skipping occurs, and important individuals with high fitness values can be lost. If the crossover probability is too small, the training time increases. Figure 2.11 shows the importance of the crossover probability and the replace type. The blue line shows the fitness value-crossover probability relation when the replace type is if_better, that is, children are added into the new generation only if their fitness values are greater than their parents'; otherwise the parents are transferred into the new generation. The pink line is for the replace_always replace type, where children are added into the new generation without checking their fitness values. Table 2.2 lists the parameters that are fixed for this observation. Cps are taken between


0.3 and 0.9: 7 different values. For each Cp, 10 trainings are done and the averages of these observations are taken. As can be seen from the graph shown in Figure 2.11, the crossover probability (Cp) directly affects the fitness value (FV): the more Cp increases, the more FV increases for a fixed generation number (GN). The FV measurements are listed in Table 2.3.

The mutation probability (Mp) also substantially affects FV, like Cp. Mp is taken between 0.3 and 0.8. The more Mp increases, the more FV increases for a fixed GN, up to a point; that limit is 0.7. After that point, the more Mp increases, the more FV decreases, because the system starts running randomly and in some situations good individuals are lost because of frequent mutation. This observation is depicted in Figure 2.12. For each Mp, 10 trainings are applied and the averages of the results are taken. The fixed parameters and their values are listed in Table 2.4.
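A minimal sketch of the two operators whose probabilities are examined here: two-point crossover applied with probability Cp, and bit-flip mutation applied per bit with probability Mp. Python is used for illustration and the function names are hypothetical.

```python
import random

def two_point_crossover(p1, p2, cp):
    """With probability cp, swap the segment between two random cut
    points; otherwise return unchanged copies of the parents."""
    if random.random() >= cp:
        return p1[:], p2[:]
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def mutate(chrom, mp):
    """Flip each bit independently with probability mp."""
    return [1 - b if random.random() < mp else b for b in chrom]

c1, c2 = two_point_crossover([0] * 8, [1] * 8, cp=0.8)
```

With a very high Mp the second operator approaches random search, which matches the observed drop in FV beyond Mp = 0.7.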

Table 2.2 Parameters of training for Cp-FV Involvement examination

St          Pv  Steady_State  Elitism  Mp   Ps   Ct         Tc  Gn
Toppercent  10  False         False    0.5  100  two-point  GN  50

Figure 2.11 Cp-FV Involvement

Table 2.3 Results of Cp-FV Involvement examination

Cp              0.3     0.4     0.5     0.6     0.7     0.8     0.9
FV (If_Better)  10.633  13.128  17.794  24.197  33.235  34.081  39.081
FV (Always)     9.624   11.875  15.783  20.232  27.317  32.377  38.801

Figure 2.12 Mp-FV Involvement

Table 2.4 Parameters of training for Mp-FV Involvement examination

2.8.3 Importance of Elitism

With elitism, the first two individuals with the highest fitness values are added into the next generation directly, so the strongest individuals are not lost; they are protected against all operations such as mutation and crossover. To investigate the importance of elitism, 100 trainings are applied: 50 with elitism and 50

St          Steady_State  Cp   Mp   Ps   Ct         Replace    Tc  FV
Tournament  False         0.5  0.8  100  two-point  If_better  FV  25


without it. For this examination, tournament selection is used as the parent selection technique, and the effect of the group size for tournament selection is investigated as well. Five different group sizes are taken: 10, 15, 20, 25 and 30. For each group size, 10 trainings are applied and the average is taken. Figure 2.13 shows the result of the investigation as a graph; the parameters shown in Table 2.4 are used during the observation. The graph in Figure 2.13 shows the effects of Gs and elitism on GN. FV is used as the termination criterion and is set to 25 for this investigation. The blue line in the graph represents classification with elitism and the pink line represents classification without elitism. The graph proves the importance of elitism: elitism makes it possible to reach the expected FV with a smaller GN. Gs affects GN substantially; too small and too big Gs values both increase GN, as seen in Figure 2.13. 25 is the most suitable Gs for this classification.
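Tournament selection with group size Gs, combined with elitism, can be sketched as follows; a Python illustration with hypothetical names.

```python
import random

def tournament_select(population, fitness, gs):
    """Pick the fittest chromosome out of a random group of size gs."""
    group = random.sample(range(len(population)), gs)
    return population[max(group, key=lambda i: fitness[i])]

def next_generation(population, fitness, gs, elitism=True):
    """Keep the best individual untouched (elitism), then fill the
    rest of the new generation by tournament selection."""
    new_gen = []
    if elitism:
        best = max(range(len(population)), key=lambda i: fitness[i])
        new_gen.append(population[best])
    while len(new_gen) < len(population):
        new_gen.append(tournament_select(population, fitness, gs))
    return new_gen
```

A small Gs makes selection nearly random, while a Gs close to the population size makes it nearly deterministic; both extremes slow convergence, consistent with the observation above.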

Figure 2.13 GN-Gs Involvement (with Elitism vs. without Elitism)

2.8.4 Population Size

The population size (Ps) is another parameter that affects classification with a genetic algorithm. To show the effect of Ps, 50 classification operations are done: five different Ps values between 25 and 250 are taken and their FVs are observed. 10


classifications are done for each Ps and the average of these operations is taken. This investigation is shown in Figure 2.14 and Figure 2.15. The more Ps increases, the more FV increases for a fixed GN (Figure 2.14), but the training time increases as well (Figure 2.15). The parameters that are fixed during the training are listed in Table 2.5.

Table 2.5 Parameters of training for Ps-FV Involvement examination

Figure 2.14 Ps-FV Involvement

St Steady_State Elitism Cp Mp Ct Replace Tc GN


Figure 2.15 Time-Ps Involvement

Table 2.6 Results for Ps-FV Involvement

Ps   FV     Time (s)
25   5.1    6.93
50   9.56   7.39
75   12.47  8.62
100  16.48  9.85
250  52.99  16.58

2.8.5 Traditional vs. Incremental GA

In this section, the performances of the traditional GA and the incremental GA are compared through experiments based on the real-world dataset Nursery. The implementations of both algorithms were run with various parameter settings, and every classification problem was solved 10 times, with average minimum costs calculated. Consequently, 120 different experiments were handled for this purpose.


In the first case, 150 patterns are removed from the Nursery dataset and the training operation is done on that reduced dataset. The fixed parameters used for the training operation are shown in Table 2.7. After the training operation, a specific model is created, and then the 150 patterns are added into the dataset as new patterns. Training is then applied twice, once using the previously created model and once as a normal training operation without model usage; the differences in FVs are shown in Figure 2.16. Each operation is done with GN set to 25, 50 and 75, with 6 operations each. When GN is 25 the FV difference is the highest; the more GN increases, the more the FV differences decrease.

In the second case, the termination criterion was changed from GN to FV. The initial parameters were assigned the values listed in Table 2.8, and the training operations were repeated with different FV values: 20, 25 and 30. According to the results shown in Figure 2.17, the incremental GA reached the expected FV at least 3 times faster than the traditional GA. So, the incremental GA makes it possible to reach the expected fitness value with a lower generation number.

Table 2.7 Parameters of training for incremental GA when termination criterion is GN.

St Steady_State Elitism Cp Mp Ct Replace Tc Ps


GN                   25      50      75
FV (Incremental GA)  83.23   88.323  88.323
FV (Traditional GA)  27.142  39.324  54.97

Figure 2.16 Comparison of Traditional GA and Incremental GA according to the average FVs with various GN parameter settings

Table 2.8 Parameters of training for incremental GA when termination criterion is FV.

St Steady_State Elitism Cp Mp Ct Replace Tc Ps


FV (threshold)    20  25  30
GN (Incremental)  4   6   8
GN (Traditional)  15  18  29

Figure 2.17 Comparison of Traditional GA and Incremental GA according to GN values with various FV parameter settings

2.8.6 Classification Accuracy

The classification accuracy is evaluated as the ratio of the number of samples correctly classified by the trained models to the total number of samples in the generated test set. For example, the dataset could be randomly divided into two portions, with 70 percent of the data in the training set and 30 percent in the validation (test) set. After the training operation is performed on the training set, the classification accuracy rate is computed on the test set.

Commonly used validation techniques for classification are simple validation, cross validation, n-fold cross validation, and the bootstrap method (Kim, 2009). In my experiments, classification accuracy is estimated by using the 5-fold cross validation technique. In n-fold cross validation, the dataset is divided into n subsets and the method is repeated n times. Each time, one of the n subsets is used as the test set and the other n-1 subsets are put together to form the training set. Then the average error


across all n trials is computed. I used this technique because it matters less how the data gets divided.
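The n-fold splitting described above can be sketched as follows; a Python illustration assuming a simple shuffled, index-based split.

```python
import random

def n_fold_indices(n_samples, n_folds, seed=0):
    """Split sample indices into n folds and yield (train, test)
    index lists; each fold serves exactly once as the test set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        yield train, test
```

For 5-fold cross validation over the Nursery dataset one would train five models, each on four folds, and average the accuracies measured on the held-out fold.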

In my experiments, the highest classification accuracy (89%) is obtained when the input parameters are assigned with the values listed in Table 2.9.

Table 2.9 Specific model that is used for classification

Table 2.10 Rules that are used for classification

If (parents=usual) and (has_nurs=proper) and (form=complete) and (children=1) and (housing=convenient) and (finance=convenient) and (social=nonprob or social=slightly_prob) and (health=recommended) Then class=recommend

If (parents=usual or parents=pretentious) and (has_nurs=proper or has_nurs=less_proper or has_nurs=improper) and (health=recommended or health=priority) Then class=priority

If (health=not_recom) Then class=not_recom

If (parents=usual or parents=pretentious) and (has_nurs=proper or has_nurs=less_proper) and (form=complete or form=completed or form=incomplete) and (children=1 or children=2) and (housing=convenient or housing=less_conv) and (social=nonprob or social=slightly_prob) and (health=recommended) Then class=very_recom

If (parents=pretentious or parents=great_pret) and (has_nurs=improper or has_nurs=critical or has_nurs=very_crit) and (health=recommended or health=priority) Then class=spec_prior

St Steady_State Elitism Cp Mp Ct Replace Tc Ps GN FV


2.9 Interface

All operations are applied using a tool developed for classification with a genetic algorithm. The tool is developed in Visual Studio .NET 2008, using C# as the programming language.

All parameters for classification are entered as inputs. This gives the user full control and allows the comparison of models created with different GA parameters.

The tool provides model training, comparison, testing, a correctness test and incremental GA. Figure 2.18 shows the interface.


2.9.1 Training

This function of the tool performs the classification. All parameters of the genetic algorithm are entered manually to give full control to the user. There are 7 main parts of the page; the first part is for the dataset, shown in Figure 2.19.

Figure 2.19 Entering dataset

The file name of the dataset is entered manually or by using the search button. After that, the splitter between attributes in the dataset is selected. There are three choices: comma, space and full stop.

After these operations, the result file name is entered. All information and results of the training operation are saved in that file. This is called the 'model' of the classification, and it is used later for many operations.


Figure 2.20 Selection techniques

The alternatives for the selection technique are Roulette Wheel, Tournament, Top Percent, Best, Rank and Random. If the user selects tournament, then she/he should determine the group size for the tournament. There are two alternatives for the group size: the user can enter it manually, or she/he can select random size, in which case the group size is determined randomly. If top percent is selected as the parent selection technique, then the percent value should be entered manually. For example, suppose there are 1000 individuals in the population and the user enters 30 for the percent value; in that situation, parents are selected from among the first 300 individuals in the population.
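The top percent selection described in the example (1000 individuals, percent value 30, parents drawn from the best 300) can be sketched as follows; a Python illustration with a hypothetical helper.

```python
import random

def top_percent_select(population, fitness, pv):
    """Select a parent uniformly from the top pv percent of the
    population ranked by fitness (e.g. pv=30 with 1000 individuals
    means parents come from the best 300)."""
    ranked = sorted(range(len(population)),
                    key=lambda i: fitness[i], reverse=True)
    cut = max(1, len(population) * pv // 100)
    return population[random.choice(ranked[:cut])]
```

Lower pv values concentrate selection pressure on the fittest individuals, while pv=100 degenerates to uniform random selection.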

After the selection technique is determined, there are two alternatives which can improve the effectiveness of the training operation: Steady-State Selection and Elitism. Selecting one of them is optional. If Steady-State Selection is selected, then the first 30% of the population with the highest fitness values are transferred into the next generation without any operation that could change them. If Elitism is selected, then the individual with the highest fitness value is transferred into the next generation directly. Steady-State Selection and Elitism cannot be selected together.


GA parameters such as Crossover Probability, Mutation Probability and Population Size are determined by the user, as shown in Figure 2.21.

Figure 2.21 GA parameters

Crossover probability and mutation probability should be between 0 and 1.

Figure 2.22 Crossover techniques

There are three crossover technique alternatives for classification with a GA: One Point, Two Point and Uniform, illustrated in Figure 2.22. After the crossover techniques, there are two choices that can improve the performance of training: Replace If Better and Replace Always. Selecting these is optional, like Elitism and Steady_State_Selection in Figure 2.20.


The termination criteria are under user control like the other GA parameters. There are two termination criteria, Generation Number and Fitness Value; see Figure 2.23. If generation number is selected as the termination criterion, then the generation number should be determined; for example, for the training shown in Figure 2.23, training will stop after 50 new generations. If the termination criterion is Fitness Threshold, then the user should determine the Fitness Value; training terminates when the fitness value of the population reaches that user-defined value.

During the training operation, the generation number and fitness value of the operation are shown in the graph illustrated in Figure 2.24.

Figure 2.24 Graphic

This graph is drawn using ZedGraph, and it is drawn for each class. For example, for the Nursery dataset this graph is shown to the user 5 times, because there are five distinct classes in the dataset. The graph shows when the model reaches its maximum FV. For example, for the training shown in Figure 2.24, the maximum FV is reached after about 5 generations, which tells the user that a generation number of 50 is too big for that training.


Figure 2.25 Rules

For each class, the rules are listed as output as in Figure 2.25, where ',' means and, '|' means or, and '=' stands for then. All of these rules are also saved in the model file for later use.

2.9.2 Comparing Models

This function of the tool compares models according to their performances, and the results are shown using a bar chart. Figure 2.26 shows the interface of 'Comparing Models'.

There is a DataGridView that shows all the information of the compared models; any number of models can be compared. The DataGridView lists the model name, selection type, information about steady_state selection and elitism, crossover and mutation probabilities, population size, crossover type, replace_if_better and replace_always information, termination criteria, generation number and average fitness value.


Figure 2.26 Comparing models

First of all, models are selected with the 'Search' button and added into the list in the DataGridView using the 'Add->' button. After these operations, the models are compared with the 'Compare' button. As can be seen in Figure 2.26, the performances of all models are illustrated using a bar chart drawn with ZedGraph. For example, in the comparison shown in Figure 2.26, the first model, called '25', has the highest performance, and model '75' has greater performance than model 'den'. The reason for this result is shown in the DataGridView; Figure 2.27 shows the part of the DataGridView that is not visible in Figure 2.26.


Figure 2.27 Information of models

Performance is directly proportional to the fitness and inversely proportional to the generation number, and is calculated as declared in Definition 9. The fitness value of the first model is 83.53 and the fitness value of the second model is 88.32. If only the fitness values were considered in the performance calculation, then the '75' model would have higher performance than the '25' model. But when the generation numbers are examined, it can be seen that the '75' model needed three times the generation number of the '25' model, so the '25' model has the highest performance. The 'den' model has the smallest fitness value and also the highest generation number, so it has the smallest performance among these models. The user can see the reason for this result by looking at the information of the models. For example, the population sizes of the first two models are 226, while the 'den' model has a population size of 100; it can be said that the more the population size increases, the more the performance increases. The other difference between the first two models and the third is the crossover and mutation probabilities. So all of the factors that affect performance can be seen with the comparing function of the tool.

If other models are to be compared, the 'Clear' button is used to clear the DataGridView and the graph.

2.9.3 Testing

This part of the tool classifies patterns that do not yet belong to any class. The testing page includes two parts: the first part is for classifying, and the second part tests how accurate the model is. Figure 2.28 shows the testing page.


Figure 2.28 Testing

The model file name, which includes the classification rules, is entered in the first textbox. The test file name, which includes the patterns to be classified, is written in the second text box. The splitter that separates the attributes in the test file is selected: 'comma', 'space' or 'full stop'. The model output file name should be written in the third text box; the class names of all patterns are written into that model output file. After pressing the 'Test' button, the class names found for the patterns are written into the 'output.txt' file, for the example shown in Figure 2.29.


Figure 2.29 Classification

To test the correctness of the model, the second part of the page is used. To use this function, the user should have a file that includes the correct class names of the patterns. The file name of the correct outputs is entered in the 'Correct Outputs File' text box, and the model output file name is entered in the 'Model Output File' text box. After pressing the 'Correctness Test' button, the correctness ratio of the model is shown by a pie chart drawn with ZedGraph, as shown in Figure 2.30.


Figure 2.30 Correctness test

2.9.4 Incremental GA

This function of the tool provides training that needs less classification time than traditional training. It is used for datasets that are updated regularly.


Figure 2.31 Incremental GA

To perform incremental GA, the name of the model created for the previous dataset is entered in the 'Model File Name' text box. This model includes the classification rules found by the traditional GA classification of the previous dataset. The file name of the dataset that includes the newly added patterns is entered in the 'New Dataset File Name' text box, shown in Figure 2.32.


Figure 2.32 Incremental GA I

The symbol that separates the attributes in the dataset is selected from three choices: 'comma', 'space' and 'full stop'. Then the result file name, into which all information about the training will be written, is entered in the 'Result File Name' text box.

Another advantage of this operation is that there is no need to enter GA parameters for the training, because all needed GA parameters are already in the model file.

Termination criteria should be chosen to terminate the training operation. There are two choices, as in traditional training: Generation Number and Fitness Value, as in Figure 2.33. If Generation Number is selected as the termination criterion, then the generation number should be written in the 'Generation Number' text box; if Fitness Value is selected, then the fitness value should be entered in the 'Fitness Value' text box.


After pressing the 'START' button, the training operation starts. The graph shown in Figure 2.34 is drawn during training; it shows the progress of the operation and gives the user information about FV and GN for each class, as in the traditional training operation.

Figure 2.34 Graph for Incremental GA

In the example illustrated in the figures above, the Nursery dataset with 12810 patterns is first trained and a model is created for that operation. After that, 150 new patterns are added to the dataset. Without the incremental GA function of the tool, the traditional training operation would have to be applied, which would waste time. Looking at Figure 2.34, it can be seen that the FV for the spec_prior class starts from 32.60, not 0, because the initial population is not created fully at random; the classification that exists in the model file is added into the initial population.



CHAPTER THREE
NEURAL NETWORK

3.1 Related Works

In recent years, many studies in which NNs have been applied to the classification problem have been done. Mazurowski, Habas, Zurada, Lo, Baker and Tourassi (2007) investigate the effect of class imbalance in training data when developing neural network classifiers for computer-aided medical diagnosis. Molnár, Keserű, Papp, Lőrincz, Ambrus and Darvas (2006) developed a NN-based classification approach using cytotoxicity data measured for 30,000 compounds to predict cytotoxicity. Yu and Zhu (2009) combined neural networks and semantic feature space for email classification. Manevitz and Yousef (2006) developed one-class document classification using neural networks. Banerjee, Kiran, Murty and Venkateswarlu (2008) presented an ANN for classification and identification of Anopheles mosquito species based on the internal transcribed spacer 2 (ITS2) data of a ribosomal DNA string. Übeyli (2008) used a combined NN model to guide model selection for classification of electroencephalogram (EEG) signals.

3.2 Neural Network

A NN is a system that includes units with a small amount of local memory. These units are connected to each other by more than one communication channel, which carry numerical data. Each unit processes its local data, and the units run asynchronously.

3.2.1 Simple Single Unit Network

A simple single-unit network includes inputs, weights, a nucleus, an activation function and an output, as shown in Figure 3.1.


Figure 3.1 Simple single unit network

In is the input of the network and wn is the weight of In. The nucleus includes a summation function that calculates the weighted sum of the inputs. The most widely used summation function is illustrated below.

Definition 10 The weighted sum of the inputs is calculated as

Yin = Σ (i = 1..n) Ii · wi

where Ii is the i-th input value and wi is its weight.

Yin is the input of activation function, f. Activation function provides to process input that is calculated with summation function, and to calculate output of the network. There are many activation functions. In that study Sigmoid, Gaussian, Identity, Unit Step, Piecewise Linear and Hyperbolic Tangent functions are used as activation functions.
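The single-unit computation, a weighted sum (Definition 10) followed by an activation function, can be sketched as follows. This is a Python illustration; the sigmoid form with a negative exponent is the standard assumption.

```python
import math

def unit_output(inputs, weights, beta=1.0, b=0.0):
    """Single-unit network: the weighted sum of the inputs
    (Definition 10) passed through a sigmoid activation."""
    y_in = sum(i * w for i, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-beta * (y_in + b)))
```

A zero weighted sum (with zero bias) yields 0.5, and large positive or negative sums saturate toward 1 or 0 respectively.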


3.2.2 Activation Functions

3.2.2.1 Sigmoid Function

This is the most widely used activation function in NNs. The sigmoid function gives continuous, not discrete, results; it is suitable for problems where a sensitive evaluation is needed. The result of the sigmoid function is between 0 and 1.

Definition 11 The sigmoid function is

f(x) = 1 / (1 + e^(-β(x + b)))

where β is the gradient, x is the input and b is the bias.

3.2.2.2 Gaussian Function

The Gaussian function makes it easier to predict the behaviour of the net when the input patterns differ strongly from all teaching patterns.

Definition 12 The Gaussian function as an activation function is

f(x) = (1 / √(2πβ²)) · e^(-(x - µ)² / (2β²))


3.2.2.3 Identity Function

The identity function leaves the input unchanged.

Definition 13 The identity function is

f(x) = x

3.2.2.4 Unit Step Function

If the input is greater than 0, the output is 1; otherwise the output is 0. This function can be used for simple problems, but it is not useful for complex ones.

Definition 14 The unit step function is

f(x) = 1 if x > 0,  f(x) = 0 if x ≤ 0

3.2.2.5 Piecewise Linear Function

The piecewise linear function is a combination of the sigmoid and unit step functions: it returns continuous values between 0 and 1 inside an interval, and the discrete values 0 and 1 outside it.

Definition 15 The piecewise linear function is

f(x) = 0        if x ≤ xmin
f(x) = mx + b   if xmin < x < xmax
f(x) = 1        if x ≥ xmax


3.2.2.6 Hyperbolic Tangent

The difference of this function from the others is that it returns results between -1 and 1.

Definition 16 The hyperbolic tangent function is

f(x) = (e^(2x) − 1) / (e^(2x) + 1)
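The activation functions used in this study can be sketched in Python as follows. This is an illustration; in the piecewise linear function, the slope and intercept are chosen so the linear segment joins 0 at x_min and 1 at x_max, an assumption consistent with Definition 15.

```python
import math

def sigmoid(x, beta=1.0, b=0.0):
    return 1.0 / (1.0 + math.exp(-beta * (x + b)))

def gaussian(x, mu=0.0, beta=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * beta ** 2)) / math.sqrt(2 * math.pi * beta ** 2)

def identity(x):
    return x

def unit_step(x):
    return 1.0 if x > 0 else 0.0

def piecewise_linear(x, x_min=-1.0, x_max=1.0):
    if x <= x_min:
        return 0.0
    if x >= x_max:
        return 1.0
    m = 1.0 / (x_max - x_min)   # slope and intercept chosen so the
    b = -m * x_min              # segment joins 0 and 1 at the bounds
    return m * x + b

def hyperbolic_tangent(x):
    return (math.exp(2 * x) - 1) / (math.exp(2 * x) + 1)
```

Note the different output ranges: sigmoid, unit step and piecewise linear map into [0, 1], while the hyperbolic tangent maps into (-1, 1).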

3.2.3 Termination Criteria

The training operation in a neural network continues until the termination criteria are met. The two most widely used termination criteria, minimum error and iteration number, are both used in this study.

3.2.3.1 Minimum Error

As its name suggests, a value is given that represents the minimum error the network should reach, and training continues until the error value of the network drops to that value. The disadvantage of this method is that a minimum error value the system can never reach may be given; in that situation, the loop cannot terminate.

3.2.3.2 Iteration Number

A number is given that shows how many iterations will be done; training continues until the iteration count reaches that number, and then terminates. The disadvantage of this method is that a too-small iteration number may be given as the termination criterion, in which case training terminates before reaching the optimum result. When the given number is bigger than the network needs to reach the optimum solution, training continues unnecessarily and wastes time.
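Both termination criteria can be sketched in one training loop; a Python illustration where `step` is a hypothetical callback that performs one training iteration and returns the current error.

```python
def train(step, max_iterations=None, min_error=None):
    """Run training steps until either termination criterion is met:
    the iteration count reaches max_iterations, or the error returned
    by step() drops to min_error or below."""
    iteration, error = 0, float("inf")
    while True:
        error = step()
        iteration += 1
        if max_iterations is not None and iteration >= max_iterations:
            break
        if min_error is not None and error <= min_error:
            break
    return iteration, error

# Toy error curve that halves each iteration; training stops once
# the error reaches the minimum-error threshold of 1.0.
errors = iter([8.0, 4.0, 2.0, 1.0, 0.5])
it, err = train(lambda: next(errors), min_error=1.0)
```

Using both criteria together guards against the two failure modes described above: an unreachable minimum error no longer loops forever, and an oversized iteration number is cut short once the error target is met.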
