
ORIGINAL ARTICLE

A new clustering method based on the bio-inspired cuttlefish optimization algorithm

Adel Sabry Eesa¹ | Zeynep Orman²

¹ Computer Science Department, University of Zakho, Duhok, KRG, Iraq
² Department of Computer Engineering, Istanbul University-Cerrahpasa, Istanbul, Turkey

Correspondence
Zeynep Orman, Department of Computer Engineering, Istanbul University-Cerrahpasa, Istanbul, Turkey.
Email: ormanz@istanbul.edu.tr

Abstract

Most well-known clustering methods based on distance measures, distance metrics and similarity functions share two main problems: they tend to get stuck in local optima, and their performance depends strongly on the initial values of the cluster centers. This paper presents a new approach that enhances clustering with the bio-inspired Cuttlefish Algorithm (CFA) by searching for the best cluster centers that minimize the clustering metrics. Various UCI Machine Learning Repository datasets are used to test and evaluate the performance of the proposed method. For comparison, we also analyse several algorithms such as K-means, the Genetic Algorithm (GA) and the Particle Swarm Optimization (PSO) Algorithm. The simulations and obtained results demonstrate that the proposed CFA-Clustering method performs better than the counterpart algorithms in most cases. Therefore, the CFA can be considered an alternative stochastic method for solving clustering problems.

KEYWORDS

clustering, cuttlefish optimization algorithm, genetic algorithm, K-means algorithm, particle swarm optimization

1 | INTRODUCTION

Clustering is a descriptive unsupervised learning method whose main objective is to discover new sets of categories in a dataset by grouping similar instances together (Filippone, Camastra, Masulli, & Rovetta, 2008). In recent years, data clustering has been employed in many areas, such as data mining (Peters, 2006; Shanghooshabad & Abadeh, 2016), image compression (Gupta & Sinha, 2017; Ryu, Lee, & Lee, 2014), image segmentation (Ray & Turi, 2000) and machine learning (Al-Omary & Jamil, 2006; Min et al., 2018). K-means is one of the most commonly used algorithms for data clustering; it utilizes the squared-error criterion and partitions a set of n objects into k clusters so that the resulting intra-cluster similarity is high and the inter-cluster similarity is low. The intra-cluster similarity is measured according to the mean value of the objects in the cluster (Lai, Huang, & Liaw, 2009).

The main problem with the K-means Algorithm is its tendency to converge to a local optimum. The other problem is that it depends strongly on the initial values of the cluster centers (Likas, Vlassis, & Verbeek, 2003). In the recent literature, several methods have been proposed to overcome these limitations. Maulik and Bandyopadhyay (2000) presented Genetic Algorithms (GAs) to search for the cluster centers which minimize the sum of the absolute Euclidean distances of each point from its respective cluster center. Fränti and Virmajoki (2006) used an iterative shrinking approach to generate clusterings hierarchically by eliminating one cluster at a time; the cluster to be removed was selected optimally. In the papers (Pham & Al-Jabbouli, 2007) and (Akila, Jayakumar, Shree, & Jain, 2012), the Bees Algorithm (BA) was used to avoid local optima by searching for the best cluster centers which minimize a given clustering metric. Borah and Ghose (2009) proposed a method based on Automatic Initialization of Means (AIM) which enhances the efficiency of automating the selection of

DOI: 10.1111/exsy.12478

Expert Systems. 2019;e12478. wileyonlinelibrary.com/journal/exsy © 2019 John Wiley & Sons, Ltd 1 of 13 https://doi.org/10.1111/exsy.12478



the initial means. This method was in fact an extension of the K-means Algorithm that also provides the number of clusters to be generated. In addition, many Evolutionary Algorithms (EAs) have been proposed for clustering in the last decade (Bandyopadhyay & Maulik, 2002; Vijendra & Laxman, 2014). Swarm optimization has also been considered by many researchers to enhance the clustering process (Chuan Tan, Ting, & Teng, 2011; He, Hui, & Sim, 2006).

In recent years, many papers have been published in the literature in which new meta-heuristic optimization algorithms were proposed. These algorithms and related papers are summarized in (Wikipedia, the free encyclopedia, 2019). In most of these research papers, in order to show the effectiveness of the proposed methods, each of them is compared with the most well-known algorithms such as GA and PSO (Harifi, Khalilian, Mohammadzadeh, & Ebrahimnejad, 2019), (Yang, 2012), (Heidari et al., 2019), (Biyanto et al., 2016), and (Yang & Suash Deb, 2009).

Bio-inspired optimization algorithms have been used to solve different types of problems such as engineering problems (Karagöz & Yıldız, 2017), (A. R. Yıldız, Kurtuluş, Demirci, Yıldız, & Karagöz, 2016), (B. S. Yıldız & Yıldız, 2017), (B. S. Yıldız, 2017), (B. S. Yıldız & Yıldız, 2018), (Kiani & Yıldız, 2016), (Yıldız, 2013), and (Yıldız, 2012), feature selection (Adel S. Eesa, Orman, & Brifcani, 2015), (Rostami & Moradi, 2014), (Ahmad, Salah, Sabry, ALhabib, & Shaikhow, 2018) and (Adel S. Eesa, Abdulazeez, & Orman, 2017), data mining (Shanghooshabad & Abadeh, 2016), (Shi, Tian, Kou, Peng, & Li, 2011) and (Parpinelli, Lopes, & Freitas, 2002), and image processing (Bejinariu, Costin, Rotaru, Luca, & Nita, 2015), (Jino Ramson, Lova Raju, Vishnu, & Anagnostopoulos, 2019), and (Hemanth & Balas, 2019).

In this study, a new clustering approach based on the CFA is proposed to avoid the local optima of the K-means Algorithm; it searches for the best cluster centers by minimizing the clustering metrics. To show the effectiveness of the proposed approach, the obtained results are compared with those of the most well-known algorithms: GA, PSO and K-means.

This paper is organized as follows: Section 2 provides preliminaries and a brief overview of K-means and the CFA. The proposed clustering approach based on the CFA is described in Section 3. Section 4 presents the experimental setup and the obtained comparative results. Finally, Section 5 provides the conclusions and suggestions for future work.

2 | PRELIMINARIES

2.1 | K-means Algorithm

The K-means Algorithm is one of the most popular partitioning-based clustering methods. For a given set of numeric objects S and an integer number k, the K-means Algorithm splits S into k groups that minimize the sum of the squares of the Euclidean distance between data objects and their closest cluster centers (Lai & Liaw, 2008). The formulation of this criterion is given by Equation (1).

E_min = Σ_{i=1}^{k} Σ_{X ∈ S_i} √((X − C_i)²)   (1)

where k is the number of clusters, S_i is the subset of the data belonging to cluster i, X is a vector representing a data object belonging to S_i, and C_i is the mean of the points in S_i, which represents the center of cluster i.
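As a concrete reading of Equation (1), the criterion can be sketched in a few lines of Python with NumPy. This is an illustrative sketch, not the authors' Visual C# implementation; the function name is hypothetical.

```python
import numpy as np

def clustering_fitness(data, centers):
    """Sum of Euclidean distances from each object to its nearest
    cluster center, i.e. the criterion of Equation (1)."""
    # pairwise point-to-center distances, shape (n_objects, k)
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    # each object contributes its distance to the closest center
    return dists.min(axis=1).sum()
```

K-means seeks centers that minimize this value; the CFA-Clustering method described later reuses the same quantity as its fitness function.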

2.2 | Cuttlefish Optimization Algorithm (CFA)

The CFA is a population-based optimization algorithm developed in 2013 (Sabry Eesa, Mohsin, Brifcani, & Orman, 2013; Adel Sabry Eesa, Brifcani, & Orman, 2014; Adel Sabry Eesa et al., 2015). The algorithm mimics the colour-changing behavior of a cuttlefish by considering two main processes: reflection and visibility. The reflection process imitates the light-reversal mechanism, while the visibility process imitates the clarity of matching patterns. Finding a new solution (newP) is formulated in Equation (2).

newP = reflection + visibility   (2)

The algorithm is based on the interaction between the different layers of cells found in a cuttlefish, namely the chromatophores, leucophores and iridophores, and on the light-reflection operator that allows the cuttlefish to produce a wide array of patterns and colours. The three main skin layers and the reflection operators, through the six cases used by the cuttlefish, are depicted in Figure 1.

The CFA divides the population into four groups, G = {G1, G2, G3, G4}, and carries out two global and two local searches to find the global optimum. For one of the global searches, the interaction between the chromatophore and iridophore cells is considered in cases 1 and 2 for G1, which is used to search for the global optimum in a specific interval around the best solution. For the other global search, the reflection operator of the leucophore cells in case 6 for G4 is used to reflect any random solution from the search space. As a local search, the iridophore cells and their interaction with chromatophores in cases 3 and 4 for G2 are used to produce an interval around the best solution, and this interval is used as a new search space. The other local search is considered in case 5 for G3 as the interaction between the chromatophore and leucophore cells, and it is used to generate a new state space around the best solution. This new space is produced from the best solution and the average value of the best points.

3 | CLUSTERING USING CUTTLEFISH ALGORITHM

Since the CFA can be used as a swap-and-combine mechanism, as mentioned in (Adel Sabry Eesa et al., 2015) and (Taheri, Ahmadzadeh, & Kharazmi, 2015), the same mechanism and formulations are reused in the current study. To solve the clustering problem, the CFA is used as a search strategy that finds the appropriate cluster centers by minimizing the sum of squares of the Euclidean distance in Equation (1). We should point out that this is the first time the CFA has been applied to overcome the limitations of the clustering task. The main steps of the proposed clustering approach are summarized in Algorithm 1:

3.1 | Initialization

The proposed method starts with a population P of N random solutions, P = {p1, p2, …, pN}. Each p_i contains k centers chosen arbitrarily from the given dataset, where k represents the number of clusters. For a dimension space D, the centers are formed as a vector with a length of k*D real values. To illustrate this, consider a two-dimensional dataset where k is set to 2. If we assume that p_i is given as p_i = {25.5, 30.7, 55.4, 10.6}, then (25.5, 30.7) represents center 1 and (55.4, 10.6) represents center 2. The k clusters are produced by assigning each record to the cluster with the nearest of the chosen centers. To update the centers, an average point is calculated for each cluster, and then the previous centers are replaced by the newly calculated centers. The formulation for finding the new centers is defined by Equation (3).

Z_i = (1 / L_i) Σ_{X_j ∈ C_i} X_j,   i = 1, 2, …, k   (3)

where Z_i is the new center of cluster i, X_j is an instance j belonging to cluster i, C_i is cluster i and L_i is the number of data instances in C_i. To keep the best solution, the population is evaluated using a fitness function which minimizes Equation (1).
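The initialization and the center update of Equation (3) can be sketched as follows. This is a minimal Python/NumPy sketch under the paper's encoding (a solution is a flat vector of k*D values); function names and the empty-cluster fallback are assumptions, not the original code.

```python
import numpy as np

def init_population(data, n_solutions, k, rng):
    """Each solution holds k centers drawn at random from the dataset,
    flattened to a vector of length k*D (Section 3.1)."""
    n, d = data.shape
    pop = []
    for _ in range(n_solutions):
        idx = rng.choice(n, size=k, replace=False)
        pop.append(data[idx].reshape(k * d))
    return pop

def update_centers(data, centers_flat, k):
    """Assign each record to its nearest center, then replace every
    center by the mean of its cluster members (Equation 3)."""
    d = data.shape[1]
    centers = centers_flat.reshape(k, d)
    labels = np.linalg.norm(
        data[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
    new = centers.copy()
    for i in range(k):
        members = data[labels == i]
        if len(members):          # assumed fallback: keep old center if empty
            new[i] = members.mean(axis=0)
    return new.reshape(k * d)
```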

FIGURE 1 The six cases of reflection used by the cuttlefish (A.S. Eesa et al., 2015)


3.2 | Applying the Light Reflection Mechanism

In order to mimic the six cases of the light reflection mechanism used by the cuttlefish, the population is divided equally into four groups: G1, G2, G3 and G4. G1 uses case 1 and case 2, G2 uses case 3 and case 4, G3 uses case 5, and finally G4 uses case 6.

3.2.1 | Case 1 and Case 2 for G1

These two cases represent the light-reflection actions of the chromatophore and iridophore cells, which are used to find a new solution. The chromatophore cell produces an interval in which the saccule can shrink or stretch, corresponding to the contraction and relaxation of the muscles, respectively. In our case, this interval is represented by a subset of center values such that the size of this subset can be stretched or shrunk. For these two cases, the same formulations for finding the new solution used in (Adel Sabry Eesa et al., 2015) are reused as follows:

newCenters = Reflection ∪ Visibility   (4)

Reflection = randomSubset[R] of currentCenters   (5)

V = bestCenters.Size − R
Visibility = bestCenters[V]   (6)

Equation (4) represents the merging (union) of the Reflection and Visibility subsets. In Equation (5), R denotes an integer whose value is selected randomly between 1 and the size of the current centers. Reflection is a subset produced by choosing R random elements from the current centers, while Visibility is a subset of size V whose elements are selected from the best centers; the indexes of the selected elements correspond to the remaining indexes in the current centers. For further illustration, consider the following example:

Example 1. Let p be any element in G1 and Best be the best solution obtained so far. Consider k = 2 with a two-dimensional dataset, p.Centers = {pc11, pc12, pc21, pc22} and Best.Centers = {Bestc11, Bestc12, Bestc21, Bestc22}, where (pc11, pc12) and (pc21, pc22) are the points of center 1 and center 2, respectively. The operations of case 1 and case 2 then work as follows: R = random(1, 4) = 2, where 4 is the size of p.Centers. Reflection is produced as a subset by choosing two random elements from p.Centers, for example Reflection = {pc11, pc22}, and V = 4 − 2 = 2. The Visibility subset can then be found directly from Best.Centers using the remaining indexes 2 and 3, which give Bestc12 and Bestc21, respectively. Thus, the new solution is found by combining the Reflection and the Visibility into a new set, as shown below:

newCenters = {pc11, pc22} ∪ {Bestc12, Bestc21} = {pc11, Bestc12, Bestc21, pc22}

This operator works in the same way as the crossover operator in Genetic Algorithms.

ALGORITHM 1 Proposed CFA-Based Clustering Approach

Input:
K: number of clusters
N: size of population P

Output:
A set of K clusters (best solution)

Method:
1. Initialize the population P (see Section 3.1).
2. Divide P into 4 groups (G1, G2, G3, G4).
3. Repeat:
a. Apply cases (1, 2) on G1 (Global1) to produce a new solution. Compare the new solution with the current and best solutions, then keep the best.
b. Apply cases (3, 4) on G2 (Local1) to produce a new solution. Compare the new solution with the current and best solutions, then keep the best.
c. Apply case (5) on G3 (Local2) to produce a new solution. Compare the new solution with the current and best solutions, then keep the best.
d. Apply case (6) on G4 (Global2) to produce a new solution. Compare the new solution with the current and best solutions, then keep the best.
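The subset swap-and-combine of cases 1 and 2 can be sketched as below. This is an illustrative Python interpretation of Example 1 (the paper's implementation was in C#); the index bookkeeping and function name are assumptions.

```python
import numpy as np

def case_1_2(current, best, rng):
    """Cases 1 and 2: keep R randomly chosen positions from the current
    solution (Reflection) and fill the remaining positions from the best
    solution (Visibility), as in Example 1."""
    size = len(current)
    r = rng.integers(1, size + 1)                   # R = random(1, size)
    kept = rng.choice(size, size=r, replace=False)  # indexes taken from current
    new = np.asarray(best, dtype=float).copy()      # remaining indexes from best
    new[kept] = np.asarray(current, dtype=float)[kept]
    return new
```

Like the crossover operator it resembles, every position of the child comes from exactly one of the two parents.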

3.2.2 | Case 3 and Case 4 for G2

Since the iridophore cells are used by a cuttlefish to hide its organs, the final outgoing colour of the cuttlefish corresponds to the Best solution in the proposed algorithm. To this end, the Reflection subset is produced by removing only one element from the Best solution randomly, and the Visibility subset is represented by only one element selected from the current solution. This element has the same index as the element removed from the Best solution. The final reflected colour is then produced directly by combining the Visibility subset with the Reflection subset. The formulations for finding the Reflection and Visibility used in (Adel Sabry Eesa et al., 2015) are reused as shown below:

Reflection = bestCenters − bestCenters[R]   (7)

V = R
Visibility = currentCenters[V]   (8)

where R represents the random index of the element that should be removed from the best solution, and V represents the index of the element that should be selected from the current solution. Now, consider the following example for further clarification:

Example 2. Using the same assumptions as in Example 1, with p.Centers = {pc11, pc12, pc21, pc22} and Best.Centers = {Bestc11, Bestc12, Bestc21, Bestc22}, the operations of case 3 and case 4 are calculated as follows: R = random(1, 4) = 3, where 4 is the size of bestCenters.

Reflection = {Bestc11, Bestc12, Bestc21, Bestc22} − {Bestc21} = {Bestc11, Bestc12, Bestc22}

Since V = R = 3, Visibility = p.Centers[V] = pc21. Finally, the new solution is found by combining the Reflection and the Visibility:

newCenters = {Bestc11, Bestc12, Bestc22} + pc21 = {Bestc11, Bestc12, pc21, Bestc22}
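Cases 3 and 4 reduce to a one-position swap and can be sketched as follows (again an illustrative Python sketch of Equations 7 and 8, not the original code; the function name is hypothetical).

```python
import numpy as np  # used by the usage example below

def case_3_4(current, best, rng):
    """Cases 3 and 4: copy the best solution but overwrite one randomly
    chosen position with the corresponding value from the current
    solution (Equations 7 and 8, with V = R)."""
    new = best.copy()
    r = rng.integers(len(best))   # index removed from best / taken from current
    new[r] = current[r]
    return new
```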

3.2.3 | Case 5 for G3

In order to mimic case 5 for G3, a new solution is formed from the average of the best-solution and current-solution values together with a random value drawn between the best solution and that average. The equations for finding the Reflection and the Visibility are reformulated as follows:

Reflection_i = R * bestCenters[i]   (9)

Visibility_i = V * (bestCenters[i] − Av_i)   (10)

where i is the i-th point of the centers, Av_i is the average of the i-th best-solution and i-th current-solution centers, R is set to 1 and V is a real random value in (−1, 1). The following example gives more details on this case.


Example 3. Let Best.Centers[i] = 52.5 and the current solution p.Centers[i] = 46.8. The average value Av_i is calculated as Av_i = (Best.Centers[i] + p.Centers[i]) / 2 = 49.65, and

V = random(−1, 1) = 0.8

Since Reflection = bestCenters[i] = 52.5, then Visibility = 0.8 * (52.5 − 49.65) = 2.28. The new solution newCenters[i] is found by adding the Visibility to the Reflection:

newCenters[i] = Reflection + Visibility = 52.5 + 2.28 = 54.78
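Equations (9) and (10) with R = 1 can be sketched element-wise as below; a hedged Python sketch of the case 5 perturbation, with an assumed function name.

```python
import numpy as np

def case_5(current, best, rng):
    """Case 5: new value = best + V * (best - Av), where Av is the
    average of the best and current values and V is a random value
    in (-1, 1) (Equations 9 and 10, R fixed to 1)."""
    best = np.asarray(best, dtype=float)
    current = np.asarray(current, dtype=float)
    av = (best + current) / 2.0
    v = rng.uniform(-1.0, 1.0)
    reflection = 1.0 * best          # R = 1
    visibility = v * (best - av)
    return reflection + visibility
```

With Example 3's numbers (best 52.5, current 46.8), the result always falls within best ± |best − current| / 2, i.e. between 49.65 and 55.35.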

3.2.4 | Case 6 for G4

A cuttlefish uses the leucophore layer to hide itself in its environment by reflecting the incoming colour without any changes. To simulate this, any random solution can be presented as a new solution; in our case, the random solutions are generated from the input dataset. This case works in exactly the same manner as the initialization process.
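Case 6 therefore amounts to a random restart: draw k fresh centers from the dataset, exactly as in initialization. A minimal sketch (illustrative names, assuming a NumPy array dataset):

```python
import numpy as np

def case_6(data, k, rng):
    """Case 6: reflect an entirely random solution -- k new centers
    sampled from the input dataset, as in the initialization step."""
    idx = rng.choice(len(data), size=k, replace=False)
    return data[idx].reshape(-1)   # flattened vector of k*D values
```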

The flow chart of the main steps of the proposed method is shown in Figure 2.

4 | EXPERIMENTS AND RESULTS

In order to evaluate the performance of the proposed CFA-Clustering approach, two types of datasets (Shape and UCI datasets) are used for comparison with the most well-known algorithms: K-means, GA and PSO. The Shape datasets are two-dimensional datasets containing different numbers of instances and classes. The other type is a collection of several UCI real-life datasets. The numbers of dimensions, instances and classes vary in each dataset. All these datasets are available in (Fränti & Sieranoja, 2018). The details of the two types of datasets are described in Tables 1 and 2.

FIGURE 2 The flow chart of the proposed method

The four clustering methods, CFA-Clustering, GA-Clustering, PSO-Clustering and K-means clustering, are implemented in the Visual Studio .NET 2010 environment using the Visual C# language. The implementation of GA-Clustering and its parameters are borrowed from (Maulik & Bandyopadhyay, 2000), with crossover probability c = 0.8 and mutation probability m = 0.01. The parameters of PSO are borrowed from (Eberhart & Shi, 2000): the values of c1 and c2 are set to 1.49445, while the inertia factor w is set to 0.729. Since all datasets are small, the population size of these algorithms is set to 12 and the number of iterations is set to 200. Finally, the sum of the squares of the Euclidean distance is used as the fitness function to evaluate the performance of these four methods. The results reported for all experiments are averages over 30 independent runs. Tables 3–5 and Figures 3–5 illustrate the results obtained using the Shape datasets, while Tables 6–8 and Figures 6–8 illustrate the results obtained using the UCI datasets. The two terms Max and Min in the tables denote the maximum and minimum fitness values obtained using Equation (1). The terms Mean and SD represent the average fitness and the standard deviation, respectively. The bold results indicate the algorithms showing better performance than the others.

From Tables 3–5, using the Shape datasets, it is clear that the performance of the CFA is considerably better than that of the other algorithms in all cases except for the Flame dataset, on which the PSO produces the same Mean, Min, Max and SD results as the CFA. The worst results in all terms are obtained by the K-means Algorithm.

The results obtained using the UCI datasets, given in Tables 6–8, show that the Mean, Min, Max and SD values of the CFA are the best when compared with the other algorithms, except for the results obtained on the Glass dataset. On this dataset, GA produced better results than the other algorithms in terms of the Min and Max values. The worst results on the UCI datasets are again obtained with the K-means Algorithm. In addition, Figures 3–8 clearly show that the progress of the convergence curve of the CFA during the iterations is more robust and stable than those of the other algorithms.

From Table 3 and Figure 3 it is clearly seen that the performance of the CFA is much better than that of the GA and the K-means Algorithm in all cases, especially for the SD term: CFA obtained 2.31261E-13, while GA and K-means obtained 0.723706433 and 4.05075191, respectively. PSO obtained the same Mean, Min, Max and SD results as the CFA. The worst results in all terms were obtained with the K-means Algorithm.

For the Jain dataset, while the CFA, GA, and PSO produced the same result for the Min value, the performance of the CFA was better than the other algorithms in terms of Mean, Max and SD as shown in Table 4. The worst results in all terms were obtained with the K-means Algorithm. PSO was faster than the CFA as seen in Figure 4. However, CFA showed its robustness and its stability during the progress of the iterations.

From Table 5 and Figure 5 it is clearly seen that the performance of the CFA on the R15 dataset was much better than that of all the other algorithms in all terms. While GA performed better than both the PSO and K-means Algorithms, for the Mean and Min terms K-means obtained better results than PSO.

TABLE 1 Shape datasets

Name #Instances #Dimensions #Classes

Flame 240 2 2

Jain 373 2 2

R15 600 2 15

TABLE 2 UCI real-life datasets

Name #Instances #Dimensions #Classes

Glass 214 9 7

Iris 150 4 3

Thyroid 215 5 2

TABLE 3 Results using Flame dataset

Algorithms Mean Max Min SD

CFA 772.3627293 772.3627293 772.3627293 2.31261E-13

GA 772.7308331 776.02645 772.3627293 0.723706433

PSO 772.3627293 772.3627293 772.3627293 2.31261E-13


For the Glass dataset, the performance of the CFA was better than that of all the other algorithms for the Mean and SD terms. GA produced better results for the Min and Max terms when compared with the results obtained by the other algorithms. The worst results were obtained with the PSO and K-means Algorithms, and there was a large gap between the performance of CFA and GA on one side and that of PSO and K-means on the other. The obtained results and the convergence curves for this experiment are given in Table 6 and Figure 6, respectively.

FIGURE 3 The convergence curves of CFA, GA and PSO using the Flame dataset

TABLE 4 Results using Jain dataset

Algorithms Mean Max Min SD

CFA 2604.9633 2605.341828 2604.911646 0.113937892

GA 2605.40625 2605.929192 2604.911646 0.199057034

PSO 2604.97254 2605.351547 2604.911646 0.140600126

K_means 2615.133509 2625.721302 2605.351547 9.610909913

FIGURE 4 The convergence curves of CFA, GA and PSO using the Jain dataset

TABLE 5 Results using R15 dataset

Algorithms Mean Max Min SD

CFA 255.317668 308.4257726 225.1770571 20.2864408

GA 274.9271411 315.3965549 225.1770571 26.87985635

PSO 458.7424906 549.925483 351.1966147 47.8892323


From Table 7, for the Iris dataset, the Mean, Min, Max and SD terms of the CFA were the best among the algorithms compared in this paper. From the numerical comparisons, PSO was the second-best algorithm, while K-means performed the worst. The progress of the CFA during the iterations is shown in Figure 7.

FIGURE 5 The convergence curves of CFA, GA and PSO using the R15 dataset

TABLE 6 Results using Glass dataset

Algorithms Mean Max Min SD

CFA 205.2477766 214.4836371 201.4054599 2.570716072

GA 205.8260701 213.6382142 201.1815294 3.05930592

PSO 218.8993808 233.4458545 206.463713 7.966352712

K_means 219.1191883 254.2095796 207.3027761 9.129736823

FIGURE 6 The convergence curves of CFA, GA and PSO using the Glass dataset

TABLE 7 Results using Iris dataset

Algorithms Mean Max Min SD

CFA 97.22617299 97.23224099 97.22212765 0.005039202

GA 97.25381816 97.37411774 97.22212765 0.049184489

PSO 97.23318153 97.32592423 97.22212765 0.03154641


For the Thyroid dataset, the Mean, Min, Max and SD values of the CFA were the best among all the algorithms listed in Table 8. The SD term of the CFA was at least seven orders of magnitude better than the SD values obtained by the other algorithms. While PSO performed as the second-best algorithm, K-means performed as the worst one. Figure 8 describes the convergence curves of the three algorithms CFA, GA and PSO.

5 | CONCLUSIONS

This paper has studied the application of the CFA to clustering problems. It has been shown that the CFA is capable of searching for the best cluster centers. The proposed method ensures that the cluster centers will not easily get stuck in local minima, which is a major drawback of the K-means Algorithm. The CFA was used as a search strategy to minimize the clustering metrics. The performance of the proposed CFA-Clustering method has been evaluated on two types of datasets, Shape and UCI real-life datasets. Then, it has been

FIGURE 7 The convergence curves of CFA, GA and PSO using the Iris dataset

TABLE 8 Results using Thyroid dataset

Algorithms Mean Max Min SD

CFA 305662346.7 305662346.7 305662346.7 1.21247E-07

GA 305909819.4 308472558.9 305662346.7 598709.2981

PSO 306125944.9 310298329.1 305662346.7 1414570.708

K_means 307891970.2 310476251.3 305662346.7 2276659.107

FIGURE 8 The convergence curves of the three algorithms CFA, GA and PSO using the Thyroid dataset


compared with the three well-known algorithms GA, PSO and K-means. The empirical results revealed that the proposed CFA-Clustering method performs better than the other algorithms in most cases. Across six experiments using six different datasets, the proposed CFA-Clustering performed better than the other algorithms in all terms on five datasets. GA performed better on only one dataset, and only in the Min and Max terms. It has also been observed that the worst results for all datasets were obtained with the K-means Algorithm. Therefore, the CFA can be preferred as an alternative method for solving clustering problems.

As future work, it is suggested to adjust the number of best k clusters by adding the distance between the clusters as a new parameter to the evaluation criteria. More comparative studies will also advance the field of building clustering techniques based on the CFA.

ORCID

Adel Sabry Eesa https://orcid.org/0000-0001-7106-7999

Zeynep Orman https://orcid.org/0000-0002-0205-4198

REFERENCES

Ahmad, A. S. S., Salah, M., Sabry, A., ALhabib, O., & Shaikhow, S. (2018). Features Optimization for ECG Signals Classification. International Journal of Advanced Computer Science and Applications, 9(11), 383–389. https://doi.org/10.14569/IJACSA.2018.091154

Akila, D., Jayakumar, C., Shree, G., & Jain, S. (2012). Link-based Ensemble Approach for Web Information Retrieval Using Honey Bee and K-Means Algorithm. International Journal of Advanced Research in Computer Science and Software Engineering, 2(11), 75–81. Retrieved from www.ijarcsse.com Al-Omary, A. Y., & Jamil, M. S. (2006). A new approach of clustering based machine-learning algorithm. Knowledge-Based Systems, 19(4), 248–258.

https://doi.org/10.1016/J.KNOSYS.2005.10.011

Bandyopadhyay, S., & Maulik, U. (2002). Genetic clustering for automatic evolution of clusters and application to image classification. Pattern Recognition, 35(6), 1197–1208. https://doi.org/10.1016/S0031-3203(01)00108-X

Bejinariu, S.-I., Costin, H., Rotaru, F., Luca, R., & Nita, C. (2015). Image processing by means of some bio-inspired optimization algorithms. E-Health and Bioengineering Conference. IEEE, 1–4. https://doi.org/10.1109/EHB.2015.7391356

Biyanto, T. R., Fibrianto, H. Y., Nugroho, G., Hatta, A. M., Listijorini, E., Budiati, T., & Huda, H. (2016). Duelist Algorithm: An Algorithm Inspired by How Duelist Improve Their Capabilities in a Duel. In ICSI 2016. Lecture notes in computer science (pp. 39–47). Cham: Springer. https://doi.org/10.1007/ 978-3-319-41000-5_4

Borah, S., & Ghose, M. K. (2009). Performance Analysis of AIM-K-means & K-means in Quality Cluster Generation. Journal of Computing, 1(1), 175–178. Retrieved from. https://pdfs.semanticscholar.org/a080/875e22c4fb9771cdc7f7daedc2cd723c97b1.pdf

Chuan Tan, S., Ting, K. M., & Teng, W. (2011). Simplifying and improving ant-based clustering. Procedia Computer Science, 4, 46–55. Elsevier. https://doi. org/10.1016/j.procs.2011.04.006

Eberhart, R. C., & Shi, Y. (2000). Comparing inertia weights and constriction factors in particle swarm optimization. In Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512),1, 84–88. IEEE. https://doi.org/10.1109/CEC.2000.870279

Eesa, A. S., Abdulazeez, A. M., & Orman, Z. (2017). A DIDS Based on The Combination of Cuttlefish Algorithm and Decision Tree. Science Journal of University of Zakho, 5(7), 313–318. https://doi.org/10.25271/2017.5.4.382

Eesa, A. S., Brifcani, A., & Orman, Z. (2014). A New Tool for Global Optimization Problems-Cuttlefish Algorithm. International Journal of Computer and Information Engineering, World Academy of Science, Engineering and Technology, 8(9), 1235–1239. Retrieved from. https://waset.org/publications/ 9999515/a-new-tool-for-global-optimization-problems-cuttlefish-algorithm

Eesa, A. S., Orman, Z., & Brifcani, A. M. A. (2015). A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems. Expert Systems with Applications, 42(5), 2670–2679. https://doi.org/10.1016/J.ESWA.2014.11.009

Filippone, M., Camastra, F., Masulli, F., & Rovetta, S. (2008). A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1), 176–190. https://doi.org/10.1016/J.PATCOG.2007.05.018

Fränti, P., & Sieranoja, S. (2018). K-means properties on six clustering benchmark datasets. Applied Intelligence, 48(12), 4743–4759. https://doi.org/ 10.1007/s10489-018-1238-7

Fränti, P., & Virmajoki, O. (2006). Iterative shrinking method for clustering problems. Pattern Recognition, 39(5), 761–775. https://doi.org/10.1016/ J.PATCOG.2005.09.012

Gupta, H., & Sinha, P. (2017). A proposed algorithm for image compression using clustering approach. In 2017 IEEE International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM). IEEE, 322–325. https://doi.org/10.1109/ICSTM. 2017.8089178

Harifi, S., Khalilian, M., Mohammadzadeh, J., & Ebrahimnejad, S. (2019). Emperor Penguins Colony: a new metaheuristic algorithm for optimization. Evolutionary Intelligence, 12(2), 211–226. https://doi.org/10.1007/s12065-019-00212-x

He, Y., Hui, S. C., & Sim, Y. (2006). A novel ant-based clustering approach for document clustering. In Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/11880592_43

Heidari, A. A., Mirjalili, S., Faris, H., Aljarah, I., Mafarja, M., & Chen, H. (2019). Harris hawks optimization: Algorithm and applications. Future Generation Computer Systems, 97, 849–872. https://doi.org/10.1016/J.FUTURE.2019.02.028

Hemanth, J., & Balas, V. E. (Eds.) (2019). Nature inspired optimization techniques for image processing applications, vol.150. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-96002-9

Jino Ramson, S. R., Lova Raju, K., Vishnu, S., & Anagnostopoulos, T. (2019). Nature inspired optimization techniques for image processing—A short review. In Intelligent Systems Reference Library (Vol. 150). Cham: Springer. https://doi.org/10.1007/978-3-319-96002-9_5

Karagöz, S., & Yıldız, A. R. (2017). A comparison of recent metaheuristic algorithms for crashworthiness optimisation of vehicle thin-walled tubes considering sheet metal forming effects. International Journal of Vehicle Design, 73(1/2/3), 179–188. https://doi.org/10.1504/IJVD.2017.082593

Kiani, M., & Yıldız, A. R. (2016). A comparative study of non-traditional methods for vehicle crashworthiness and NVH optimization. Archives of Computational Methods in Engineering, 23(4), 723–734. https://doi.org/10.1007/s11831-015-9155-y

Lai, J. Z. C., Huang, T.-J., & Liaw, Y.-C. (2009). A fast k-means clustering algorithm using cluster center displacement. Pattern Recognition, 42(11), 2551–2556. https://doi.org/10.1016/J.PATCOG.2009.02.014

Lai, J. Z. C., & Liaw, Y.-C. (2008). Improvement of the k-means clustering filtering algorithm. Pattern Recognition, 41(12), 3677–3681. https://doi.org/10.1016/J.PATCOG.2008.06.005

Likas, A., Vlassis, N., & Verbeek, J. J. (2003). The global k-means clustering algorithm. Pattern Recognition, 36(2), 451–461. https://doi.org/10.1016/S0031-3203(02)00060-2

Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33(9), 1455–1465. https://doi.org/10.1016/S0031-3203(99)00137-5

Min, E., Guo, X., Liu, Q., Zhang, G., Cui, J., & Long, J. (2018). A Survey of Clustering With Deep Learning: From the Perspective of Network Architecture. IEEE Access, 6, 39501–39514. https://doi.org/10.1109/ACCESS.2018.2855437

Parpinelli, R. S., Lopes, H. S., & Freitas, A. A. (2002). Data mining with an ant colony optimization algorithm. IEEE Transactions on Evolutionary Computation, 6(4), 321–332. https://doi.org/10.1109/TEVC.2002.802452

Peters, G. (2006). Some refinements of rough k-means clustering. Pattern Recognition, 39(8), 1481–1491. https://doi.org/10.1016/J.PATCOG.2006.02.002

Pham, D. T., & Al-Jabbouli, H. (2007). Data clustering using the Bees Algorithm. Cardiff CF24 3AA, UK. Retrieved from https://www.researchgate.net/publication/241767604

Ray, S., & Turi, R. H. (2000). Determination of number of clusters in K-means clustering and application in colour segmentation. In The 4th International Conference on Advances in Pattern Recognition and Digital Techniques (pp. 137–143). Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.587.3517

Rostami, M., & Moradi, P. (2014). A clustering based genetic algorithm for feature selection. In 2014 6th Conference on Information and Knowledge Technology (IKT) (pp. 112–116). IEEE. https://doi.org/10.1109/IKT.2014.7030343

Ryu, T., Lee, B. G., & Lee, S.-H. (2014). Image compression system using colorization and Meanshift clustering methods (pp. 165–172). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-41671-2_22

Eesa, A. S., Mohsin Abdulazeez Brifcani, A., & Orman, Z. (2013). Cuttlefish Algorithm - a novel bio-inspired optimization algorithm. International Journal of Scientific and Engineering Research, 4(9), 1978–1986. Retrieved from http://www.ijser.org

Shanghooshabad, A. M., & Abadeh, M. S. (2016). Robust medical data mining using a clustering and swarm-based framework. International Journal of Data Mining and Bioinformatics, 14(1), 22–39. https://doi.org/10.1504/IJDMB.2016.073342

Shi, Y., Tian, Y., Kou, G., Peng, Y., & Li, J. (2011). Optimization based data mining: Theory and applications. London: Springer London. https://doi.org/10.1007/978-0-85729-504-0

Taheri, R., Ahmadzadeh, M., & Kharazmi, M. R. (2015). A New Approach For Feature Selection In Intrusion Detection System. Cumhuriyet University Faculty of Science Science Journal, 36(6), 1344–1357. https://doi.org/10.17776/CSJ.30621

Vijendra, S., & Laxman, S. (2014). Effective Evolution of Clusters: A Genetic Clustering Approach. Research Journal of Information Technology, 6(2), 81–100. https://doi.org/10.3923/rjit.2014.81.100

Wikipedia, the free encyclopedia. (2019). Retrieved May 22, 2019, from https://en.wikipedia.org/wiki/List_of_metaphor-based_metaheuristics

Yang, X.-S. (2012). Flower pollination algorithm for global optimization (pp. 240–249). Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-32894-7_27

Yang, X.-S., & Deb, S. (2009). Cuckoo search via Lévy flights. In 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC) (pp. 210–214). IEEE. https://doi.org/10.1109/NABIC.2009.5393690

Yıldız, A. R. (2012). A comparative study of population-based optimization algorithms for turning operations. Information Sciences, 210, 81–88. https://doi.org/10.1016/j.ins.2012.03.005

Yıldız, A. R. (2013). Comparison of evolutionary based optimization algorithms for structural design optimization. Engineering Applications of Artificial Intelligence, 26(1), 327–333. https://doi.org/10.1016/j.engappai.2012.05.014

Yıldız, A. R., Kurtuluş, E., Demirci, E., Yıldız, B. S., & Karagöz, S. (2016). Optimization of thin-wall structures using hybrid gravitational search and Nelder-Mead algorithm. Materials Testing, 58(1), 75–78. https://doi.org/10.3139/120.110823

Yıldız, B. S. (2017). A comparative investigation of eight recent population-based optimisation algorithms for mechanical and structural design problems. International Journal of Vehicle Design, 73(1/2/3), 208–218. https://doi.org/10.1504/IJVD.2017.082603

Yıldız, B. S., & Yıldız, A. R. (2017). Moth-flame optimization algorithm to determine optimal machining parameters in manufacturing processes. Materials Testing, 59(5), 425–429. https://doi.org/10.3139/120.111024

Yıldız, B. S., & Yıldız, A. R. (2018). Comparison of grey wolf, whale, water cycle, ant lion and sine-cosine algorithms for the optimization of a vehicle engine connecting rod. Materials Testing, 60(3), 311–315. https://doi.org/10.3139/120.111153

A U T H O R B I O G R A P H I E S

Adel Sabry Eesa received his PhD degree in Computer Science from the University of Zakho, Iraq in 2015. He is currently working as an Assistant Professor in the Computer Science Department of Zakho University. His research interests include artificial intelligence, optimization algorithms, soft computing, and network security.

Zeynep Orman received her BSc, MSc, and PhD degrees from Istanbul University, Istanbul, Turkey, in 2001, 2003, and 2007, respectively. She studied as a postdoctoral research fellow in the Department of Information Systems and Computing, Brunel University, London, United Kingdom, in 2009. She is currently working as an Associate Professor in the Department of Computer Engineering, Istanbul University-Cerrahpasa. Her research interests include artificial intelligence, neural networks, nonlinear systems, machine learning, and data science.

How to cite this article: Eesa AS, Orman Z. A new clustering method based on the bio-inspired cuttlefish optimization algorithm. Expert Systems. 2019;e12478. https://doi.org/10.1111/exsy.12478
