ORIGINAL ARTICLE
Semi-supervised fuzzy neighborhood preserving analysis
for feature extraction in hyperspectral remote sensing images
Hasan Ali Akyürek 1 · Barış Koçer 2

1 Department of Management Information Sciences, School of Applied Sciences, Necmettin Erbakan University, Konya, Turkey
2 Department of Computer Engineering, Faculty of Engineering, Selcuk University, Konya, Turkey
Correspondence: Hasan Ali Akyürek, hakyurek@konya.edu.tr

Received: 6 February 2017 / Accepted: 13 November 2017 / Published online: 21 November 2017
© The Natural Computing Applications Forum 2017
https://doi.org/10.1007/s00521-017-3279-y

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00521-017-3279-y) contains supplementary material, which is available to authorized users.
Abstract
Semi-supervised feature extraction methods are an important focus of interest in data mining and machine learning areas.
These methods are improved methods based on learning from a combination of labeled and unlabeled data. In this study, a
semi-supervised feature extraction method called semi-supervised fuzzy neighborhood preserving analysis (SFNPA) is
proposed to improve the classification accuracy of hyperspectral remote sensing images. The proposed method combines
the principal component analysis (PCA) method, which is an unsupervised feature extraction method, and the supervised
fuzzy neighborhood preserving analysis (FNPA) method and increases the classification accuracy by using a limited
number of labeled data. Experimental results on eight popular hyperspectral remote sensing datasets show that the proposed
method significantly improves classification accuracy on hyperspectral remote sensing images compared to the well-known
semi-supervised dimension reduction methods.
Keywords Semi-supervised feature extraction · Hyperspectral image classification · Remote sensing · Fuzzy neighborhood preserving analysis
1 Introduction
Hyperspectral cameras collect information as a set of
images represented by hundreds of spectral bands. While
hyperspectral images provide richer spectral information
than traditional RGB and multispectral images, this huge number of spectral bands also presents a challenge for traditional spectral data processing techniques. As the dimension increases, the classification accuracy of conventional classification methods decreases [1], due to the multidimensional problem (curse of dimensionality) when labeled data are limited. To cope
with the curse of dimensionality, feature extraction is used
to reduce the dimensionality of hyperspectral data and to
extract necessary information as much as possible. Feature
extraction methods are often used to classify, perceive and
visualize remote sensing data since they can represent
information in hyperspectral images with fewer bands [2].
There are many supervised and unsupervised methods
for feature extraction. The most well-known unsupervised
feature extraction method is the principal component
analysis (PCA) [3], which is widely used in hyperspectral
remote sensing images. In recent years, in order to preserve
the properties of local neighborhoods, locally linear
embedding [4], Laplacian eigenmaps [5], linear local tangent space alignment [6], neighborhood preserving embedding [7], locality preserving projections [8] and nonlinear dimensionality reduction via tangent space alignment [9] have been applied to extract features from hyperspectral images. These methods, which preserve local
neighborhood information, can provide separation in the
high-dimensional feature space.
Supervised methods provide a data projection based on the existing labeled data. Supervised feature extraction methods such as linear discriminant analysis (LDA) [10] and modified Fisher's linear discriminant
analysis [11] are also used for feature extraction in
hyperspectral remote sensing images.
In real-world applications, the labeled data are rather limited, and considerable expert effort is often needed to label large amounts of data [12, 13]. On the other hand, unlabeled data can be obtained at very low cost. For this reason, semi-supervised methods, which increase the performance of feature extraction by using unlabeled data together with a limited number of labeled data, have gained popularity in the field of machine learning [14–17]. Many semi-supervised feature extraction methods are based on the principle of combining a supervised method and an unsupervised method [18].
In this paper, a novel semi-supervised fuzzy neighborhood preserving analysis (SFNPA) method for feature extraction in hyperspectral remote sensing images is presented. The proposed SFNPA method aims to find a projection by preserving local neighborhood information and maximizing informative and non-redundant data. In the proposed method, fuzzy neighborhood preserving analysis (FNPA), a supervised method, and principal component analysis (PCA), an unsupervised method, are combined. Features extracted by principal component analysis are classified according to the labeled data, and a new projection is created by the supervised FNPA method to enhance classification accuracy by enriching the data projection together with the labeled data. The rest of the paper is organized as follows: In Sect. 2, a brief note on semi-supervised feature extraction methods is given. The proposed SFNPA method is explained in Sect. 3. The datasets and experimental design are described in Sect. 4, and the experimental results on eight different hyperspectral datasets are given and discussed in Sect. 5. Finally, the conclusion is given in the last section.
2 Related work
Semi-supervised feature extraction methods are an important focus of interest in data mining and machine learning areas. These methods are improved methods based on learning from a combination of labeled and unlabeled data. The methods can be grouped into three main categories: constraint-based methods, which rely on pairwise constraints between samples; distance-based methods, which apply metric learning techniques to pairwise constraints; and hybrid methods, which combine the first two approaches in a general probabilistic framework.
On the other hand, locality preserving is important for retaining local structures [8]. As seen in the literature, local structures are usually more important for discriminant analysis than global structures [19–21], because local structures maximize the distance between data points of different classes in local areas. Many locality preserving methods have been introduced in the literature, such as optimizing the kernel function with applications to kernel principal analysis and locality preserving projection for feature extraction [22] and kernel self-optimized locality preserving discriminant analysis for feature extraction and recognition [23]. In the hyperspectral image classification context in particular, Li et al. proposed locality preserving discriminant analysis in kernel-induced feature spaces for hyperspectral image classification [24], locality preserving dimensionality reduction and classification for hyperspectral image analysis [25], and bilateral filtering inspired locality preserving projections for hyperspectral images [26].
The fuzzy neighborhood preserving analysis (FNPA) method, an effective feature projection method, was proposed by Khushaba et al. [27, 28]. This supervised method is based on fuzzy and neighborhood discriminant analysis. By using this projection method, data in the vicinity of the same class become closer to each other, while data in different classes move farther apart. The data $x_k$ given in the scattered space can be expressed as $X = \{x_1, x_2, \ldots, x_k\}$, where $k = 1, 2, \ldots, l$ and $l$ is the total number of samples. The membership function of the $k$th sample in the $i$th class can be expressed as in Eq. (1).
$$\mu_{ik} = \mu_i(x_k) \in [0, 1] \qquad (1)$$
The diameter $r$ of class $i$ can be expressed as in Eq. (2), where $\bar{x}_i$ is the mean of the data in class $i$.

$$r = \max_k \lVert \bar{x}_i - x_k \rVert \qquad (2)$$
In this case, the fuzzy membership function $\mu_{ik}$ can be expressed as in Eq. (3).

$$\mu_{ik} = \left( \frac{\lVert \bar{x}_i - x_k \rVert_{\sigma}}{r + \epsilon} \right)^{-2/(m-1)} \qquad (3)$$
where $m$ is the fuzzification parameter, which controls the degree of the membership function, $\epsilon > 0$ is a small value to avoid the zero-division error, and $\sigma$ is the standard deviation involved in the standardized Euclidean distance calculation. Finally, each sample's memberships over all classes are normalized so that $\sum_{i=1}^{c} \mu_{ik} = 1$. The total membership value $B_i$ of the elements in fuzzy class $c_i$ is expressed in Eq. (4).

$$B_i = \sum_{k=1}^{l_i} \mu_{ik} \qquad (4)$$
The fuzziness $N$ of the elements in all fuzzy classes is expressed in Eq. (5).

$$N = \sum_{i=1}^{c} B_i \qquad (5)$$
The weighted mean $v_i$ of the samples in class $i$ can be expressed as in Eq. (6).

$$v_i = \frac{\sum_{k=1}^{l_i} \mu_{ik}\, x_k}{\sum_{k=1}^{l_i} \mu_{ik}} \qquad (6)$$
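To make the membership construction concrete, the following is a minimal NumPy sketch of Eqs. (1)–(6) (the paper's own implementation is in MATLAB; all names here are illustrative, the plain Euclidean distance stands in for the standardized one, and the negative exponent is chosen so that nearer samples receive larger memberships):

```python
import numpy as np

def fuzzy_memberships(X, labels, m=2.0, eps=1e-6):
    """X: (l, d) samples; labels: (l,) integer class ids."""
    classes = np.unique(labels)
    U = np.zeros((len(classes), X.shape[0]))      # mu_ik of Eq. (1)
    for i, ci in enumerate(classes):
        x_bar = X[labels == ci].mean(axis=0)      # class mean
        dist = np.linalg.norm(X - x_bar, axis=1)  # distance to the class mean
        r = dist[labels == ci].max()              # class diameter, Eq. (2)
        # Eq. (3); eps guards both divisions (a zero distance would otherwise blow up):
        U[i] = ((dist + eps) / (r + eps)) ** (-2.0 / (m - 1.0))
    U /= U.sum(axis=0, keepdims=True)             # normalize: sum_i mu_ik = 1
    B = U.sum(axis=1)                             # total memberships, Eq. (4), summed over all samples
    N = B.sum()                                   # total fuzziness, Eq. (5)
    V = (U @ X) / B[:, None]                      # weighted class means, Eq. (6)
    return U, B, N, V
```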
By using these equations, the generalized class scattering matrix $S_W$ and the fuzziness scattering matrix $S_B$ are given in Eqs. (7) and (8), respectively.

$$S_W = \sum_{i=1}^{c} \frac{1}{2 B_i} \sum_{k=1}^{l_i} \sum_{j=1}^{l_i} \mu_{ik}\,\mu_{ij}\,(x_k - x_j)(x_k - x_j)^{T} \qquad (7)$$
where $c$ is the number of classes, $B_i$ is the total membership value, $\mu_{ik}$ and $\mu_{ij}$ are the memberships of samples $k$ and $j$ in class $i$, and $x_k$ and $x_j$ are the $k$th and $j$th samples that belong to class $i$.
$$S_B = \frac{1}{2N} \sum_{i=1}^{c} \sum_{j=1}^{c} B_i B_j\,(v_i - v_j)(v_i - v_j)^{T} \qquad (8)$$
where $c$ is the number of classes, $N$ is the sum of the fuzziness of the samples in all fuzzy classes, $B_i$ and $B_j$ are the total membership values of classes $i$ and $j$, and $v_i$ and $v_j$ are the weighted means of the samples in classes $i$ and $j$. The objective of $S_W$ is to minimize the distance between samples of the same class; it also incorporates the membership values, thus considering each sample's contribution to its class when preserving distances. The objective of $S_B$ is to maximize the distance between classes. Based on these equations, the transformation matrix $G_{\mathrm{FNPA}}$ can be found as the eigenvectors of Eq. (9).
$$G_{\mathrm{FNPA}} = \arg\max_{G}\; \operatorname{trace}\!\left( \frac{G^{T} S_B\, G}{G^{T} S_W\, G} \right) \qquad (9)$$
where the function aims at maximizing the distance between the samples of different classes while minimizing the distance between the samples of the same class. The projection of the data $X$ is obtained by multiplying the data by the transformation matrix, as given in Eq. (10).

$$X^{*} = X\, G_{\mathrm{FNPA}} \qquad (10)$$
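Under the same assumptions, a sketch of Eqs. (7)–(10) follows. It reuses fuzzy_memberships from the sketch above, expands the pairwise sums algebraically so they run in O(l·d²), and solves Eq. (9) as a generalized eigenvalue problem; the small ridge added to $S_W$ is a numerical-stability assumption, not something the paper specifies:

```python
import numpy as np
from scipy.linalg import eigh

def fnpa_transform(X, labels, r_dim, m=2.0, eps=1e-6):
    U, B, N, V = fuzzy_memberships(X, labels, m, eps)  # from the sketch above
    d = X.shape[1]
    # Eq. (7): sum_{k,j} w_k w_j (x_k - x_j)(x_k - x_j)^T equals
    # 2*(sum_j w_j)*sum_k w_k x_k x_k^T - 2*(sum_k w_k x_k)(sum_j w_j x_j)^T
    S_w = np.zeros((d, d))
    for i in range(U.shape[0]):
        w = U[i]
        u = w @ X                                      # weighted sum of samples
        S_w += (w[:, None] * X).T @ X - np.outer(u, u) / B[i]
    # Eq. (8), expanded the same way over the weighted class means v_i:
    t = B @ V
    S_b = (B[:, None] * V).T @ V - np.outer(t, t) / N
    # Eq. (9): columns of G are the top generalized eigenvectors of (S_b, S_w)
    evals, evecs = eigh(S_b, S_w + eps * np.eye(d))    # ridge keeps S_w positive definite
    G = evecs[:, np.argsort(evals)[::-1][:r_dim]]
    return X @ G, G                                    # Eq. (10): projected data
```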
3 Semi-supervised fuzzy neighborhood preserving analysis
The success of classification methods on high-dimensional data depends on increasing the discriminability of data from different classes. The principal component analysis (PCA) method, which has been widely used for dimension reduction in the classification of hyperspectral remote sensing images, significantly reduces the dimension of the data while also significantly improving classification accuracy. In this study, the feature projection ability of the fuzzy neighborhood preserving analysis method, which is a supervised method, has been improved with the superior feature projection ability of the principal component analysis method, which is an unsupervised method. The semi-supervised fuzzy neighborhood preserving analysis (SFNPA) method combines the unsupervised discriminability of PCA and the supervised discriminability of FNPA; the separation between the classes is increased while the dimension of the data is reduced. In SFNPA, the center $\bar{x}_i$ of class $i$ used in the FNPA method is calculated as the weighted mean in Eq. (11).

$$\bar{x}_i = \bar{x}_{\mathrm{TRAIN}_i}\, a + \bar{x}_{\mathrm{PCA}_i}\,(1 - a) \qquad (11)$$
where $\bar{x}_{\mathrm{TRAIN}_i}$ is the center of class $i$ obtained from the labeled data, $\bar{x}_{\mathrm{PCA}_i}$ is the center of class $i$ in the unlabeled data that were dimension-reduced with PCA and pre-classified via the SVM pre-classification map, and $a$ is a semi-supervision constant in the range [0, 1]. Class centers are computed as class means. In this study, as a result of experiments, $a$ was set to 0.9. By using Eq. (11), PCA-guided pre-classified unlabeled data can enhance FNPA's discriminability when labeled data are scarce. In this method, we focus on combining the advantages of PCA and FNPA's discriminability in the case of small labeled data. The pseudocode of the proposed method is listed in Table 1.
Table 1 Pseudocode of the proposed method

Algorithm: Semi-supervised fuzzy neighborhood preserving analysis for feature extraction in hyperspectral remote sensing images
Inputs: X: M×N×D-sized hyperspectral image; L: M×N label matrix of labeled data; Rdim: size of the reduced dimension; a: weight parameter for the class-center calculation
Output: X*: M×N×Rdim-sized features of data X
Step 1: Reduce the dimension of data X to Rdim by principal component analysis
Step 2: Classify the obtained data X_PCA according to label matrix L
Step 3: Calculate the class centers by Eq. (11) with the labeled data X_TRAIN and the pre-classified data X_PCA obtained in Step 2
Step 4: Analyze the data X by the FNPA method using the new class centers from Step 3
In Step 1, the dimension of the input data is reduced to Rdim by using principal component analysis. In Step 2, the dimension-reduced data X_PCA are classified according to the label matrix L. In Step 3, class centers are calculated by Eq. (11) with the labeled data X_TRAIN and the pre-classified data X_PCA obtained in Step 2. In Step 4, the calculated class centers are used to analyze the input data X by the FNPA method and obtain the transformation matrix. Finally, the dimension-reduced data X* are obtained from Eq. (10) by using the transformation matrix G_FNPA obtained from Eq. (9).
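As a rough illustration of Steps 1–3 (not the authors' code: the SVC pre-classifier settings, the choice to blend the centers in the original band space, the fallback for empty pre-classified classes, and all names are assumptions), the blended class centers of Eq. (11) could be computed as follows:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def sfnpa_centers(X, y_train, train_mask, r_dim, a=0.9):
    """X: (n_pixels, n_bands); train_mask marks the labeled pixels,
    y_train holds their labels."""
    X_pca = PCA(n_components=r_dim).fit_transform(X)          # Step 1
    y_pre = SVC().fit(X_pca[train_mask], y_train).predict(X_pca)  # Step 2
    centers = {}
    for ci in np.unique(y_train):                             # Step 3: Eq. (11)
        x_train_i = X[train_mask][y_train == ci].mean(axis=0)  # labeled center
        hits = X[y_pre == ci]                                  # pre-classified pixels
        x_pca_i = hits.mean(axis=0) if len(hits) else x_train_i
        centers[ci] = a * x_train_i + (1.0 - a) * x_pca_i
    return centers
```

With a = 0.9, as chosen in the experiments, the labeled centers dominate and the pre-classified pixels only nudge them.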
The flowchart of the proposed method for classifying hyperspectral remote sensing data is shown in Fig. 1.
At the beginning of the flowchart, the hyperspectral data are loaded. The data are then filtered with a 3-D Gaussian filter for denoising. Next, the proposed method is applied to obtain the dimension-reduced data. The new data are classified by using SVM and KELM. Finally, the classification results are used to obtain classification maps.
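The flow can be summarized with a short, hedged NumPy/SciPy sketch (the random stand-in cube, the filter width sigma = 1.0, and all helper names are assumptions, not values from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
cube = rng.random((145, 145, 200))                  # stand-in for a loaded hyperspectral cube
cube = gaussian_filter(cube, sigma=1.0)             # 3-D Gaussian denoising
X = cube.reshape(-1, cube.shape[-1])                # pixels as rows
X = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)  # reflectance normalized to [0, 1]
# ... project X with the SFNPA sketches above, then classify with SVM/KELM and
# reshape the predicted labels back to (145, 145) to draw the classification map.
```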
4 Datasets and experimental design
Experiments were conducted on eight popular hyperspectral remote sensing datasets to verify the success of the proposed method.
4.1 Indian Pines dataset
The Indian Pines hyperspectral image was taken by the AVIRIS sensor over the Indian Pines test site in northwestern Indiana. The image, which has a spatial size of 145 × 145 pixels, consists of 224 spectral bands in the 0.4–2.5 μm wavelength range. Sixteen classes are defined in the ground truth, as given in Table 2. In the dataset, which consists of 10,249 samples excluding the background, the total number of bands was reduced to 200 by deleting 24 spectral bands affected by water absorption [29].
4.2 PaviaU dataset
The PaviaU hyperspectral image was taken over the Pavia site in northern Italy by the ROSIS sensor. The image, with a spatial size of 610 × 340 pixels, consists of 103 spectral bands in the 0.43–0.86 μm wavelength range. Nine classes are defined in the ground truth, as given in Table 3. The dataset consists of 42,776 samples excluding the background, and the geometric resolution is 1.3 m [30].
4.3 Salinas dataset
The Salinas hyperspectral image was taken over the Salinas Valley in California by the AVIRIS sensor. The image has a spatial size of 512 × 217 pixels and consists of 224 spectral bands in the 0.4–2.5 μm wavelength range. Sixteen classes are defined in the ground truth, as given in Table 4. In the dataset, which consists of 54,129 samples excluding the background, 20 spectral bands that are problematic due to water absorption were deleted, and the total number of bands was reduced to 204 [30].

Fig. 1 Flowchart of semi-supervised fuzzy neighborhood preserving analysis for hyperspectral remote sensing images

Table 2 Ground-truth classes for the Indian Pines scene and their respective sample numbers

Class  Class name  Samples
1  Alfalfa  46
2  Corn-notill  1428
3  Corn-mintill  830
4  Corn  237
5  Grass-pasture  483
6  Grass-trees  730
7  Grass-pasture-mowed  28
8  Hay-windrowed  478
9  Oats  20
10  Soybean-notill  972
11  Soybean-mintill  2455
12  Soybean-clean  593
13  Wheat  205
14  Woods  1265
15  Buildings-Grass-Trees-Drives  386
16  Stone-Steel-Towers  93
Total  10,249
4.4 Kennedy Space Center dataset

The Kennedy Space Center (KSC) hyperspectral image was acquired by the NASA AVIRIS sensor over the Kennedy Space Center in Florida. The image has a spatial size of 512 × 614 pixels and consists of 224 spectral bands in the 0.4–2.5 μm wavelength range. Table 5 defines the 13 classes in the ground truth. In the dataset, which consists of 9130 samples excluding the background, the total number of bands was reduced to 176 by deleting 48 spectral bands that are problematic due to water absorption [31].
4.5 MUUFL Gulf Port
The MUUFL Gulf Port hyperspectral image was acquired by the ITRES CASI-1500 sensor over the campus of the University of Southern Mississippi-Gulfport in Long Beach, MS. The image has a spatial size of 325 × 220 pixels and consists of 64 spectral bands in the 375–1050 nm wavelength range [32]. Table 6 defines the 11 classes in the ground truth [33].
Table 3 Ground-truth classes for the Pavia University scene and their respective sample numbers
Class Class name Samples
1 Asphalt 6631
2 Meadows 18,649
3 Gravel 2099
4 Trees 3064
5 Painted metal sheets 1345
6 Bare Soil 5029
7 Bitumen 1330
8 Self-Blocking Bricks 3682
9 Shadows 947
Total 42,776
Table 4 Ground-truth classes for the Salinas scene and their respective sample numbers
Class Class name Samples
1 Brocoli Green Weeds_1 2009
2 Brocoli Green Weeds_2 3726
3 Fallow 1976
4 Fallow Rough Plow 1394
5 Fallow Smooth 2678
6 Stubble 3959
7 Celery 3579
8 Grapes Untrained 11,271
9 Soil Vinyard Develop 6203
10 Corn Senesced Green Weeds 3278
11 Lettuce Romaine 4wk 1068
12 Lettuce Romaine 5wk 1927
13 Lettuce Romaine 6wk 916
14 Lettuce Romaine 7wk 1070
15 Vinyard Untrained 7268
16 Vinyard Vertical Trellis 1807
Total 54,129
Table 5 Ground-truth classes for the KSC scene and their respective sample numbers

Class  Class name  Samples
1  Scrub  46
2  Willow Swamp  1428
3  Cabbage Palm Hammock  830
4  Cabbage Palm/Oak Hammock  830
5  Slash Pine  237
6  Oak/Broadleaf Hammock  483
7  Hardwood Swamp  730
8  Graminoid Marsh  28
9  Spartina Marsh  478
10  Cattail Marsh  20
11  Salt Marsh  972
12  Mud Flats  2455
13  Water  593
Total  9130
Table 6 Ground-truth classes for the MUUFL Gulf Port scene and their respective sample numbers
Class Class name Samples
1 Trees 23,246
2 Grass Pure 4270
3 Grass Groundsurface 6882
4 Dirt and Sand 1826
5  Road Materials  6687
6  Water  466
7  Shadow Building  2233
8  Buildings  6240
9  Sidewalk  1385
10  Yellowcurb  183
11  Cloth Panels  269
Total  71,500
4.6 Urban
Urban is one of the most widely used hyperspectral images in hyperspectral unmixing studies. There are 307 × 307 pixels, each of which corresponds to a 2 × 2 m² area. The image has 210 wavelengths ranging from 400 to 2500 nm, resulting in a spectral resolution of 10 nm. After channels 1–4, 76, 87, 101–111, 136–153 and 198–210 are removed (due to dense water vapor and atmospheric effects), 162 channels remain. (This is a common preprocessing step in hyperspectral unmixing analyses.) There are three versions of the ground truth, which contain 4, 5 and 6 endmembers, respectively [34–36]. Table 7 defines the six classes in the ground truth.
4.7 Samson
Samson is a sample dataset that is available from the Opticks project. The image has 952 × 952 pixels. Each pixel is recorded at 156 channels covering the wavelengths from 401 to 889 nm. The spectral resolution is as high as 3.13 nm. As the original image is too large, which is very expensive in terms of computational cost, a region of 95 × 95 pixels is used, starting from the (252, 332)th pixel of the original image. These data are not degraded by blank or badly noised channels [34–36]. Table 8 defines the three classes in the ground truth.
Table 7 Ground-truth classes for the Urban scene and their respective sample numbers

Class  Class name  Samples
1  Asphalt  18,570
2  Grass  35,198
3  Tree  22,468
4  Roof  6821
5  Metal  2436
6  Dirt  8756
Total  94,249
Table 8 Ground-truth classes for the Samson scene and their respective sample numbers
Class Class name Samples
1 Soil 3015
2 Tree 3666
3 Water 2344
Total 9025
Table 9 Ground-truth classes for the Jasper scene and their respective sample numbers
Class Class name Samples
1 Road 3493
2 Soil 3326
3 Water 2428
4 Tree 753
Total 10,000
Table 10 Overall classification accuracy (OA, %) and standard deviation (STD) on all datasets classified by the SVM classifier when a different number of samples is selected from each class for the training set

NC         Indian Pines  PaviaU  Salinas  KSC    MUUFL  Urban  Samson   Jasper
1    OA    50.61   51.99   78.57   77.06   42.67   47.90   57.37    55.15
     STD   –       –       –       –       –       –       –        –
2    OA    59.47   65.30   84.25   88.42   54.24   51.98   72.53    62.96
     STD   ±0.000  ±0.000  ±0.170  ±0.000  ±0.000  ±0.000  ±0.000   ±1.130
3    OA    69.26   68.42   87.61   93.87   60.20   56.47   77.00    69.33
     STD   ±0.001  ±0.000  ±0.142  ±0.499  ±0.555  ±0.194  ±24.000  ±0.496
4    OA    74.57   76.58   87.98   94.80   63.27   60.72   80.76    72.38
     STD   ±0.000  ±0.000  ±0.115  ±0.814  ±1.170  ±0.289  ±3.280   ±0.335
5    OA    78.20   77.76   89.65   96.14   66.28   61.68   75.93    71.04
     STD   ±0.000  ±0.002  ±0.087  ±0.963  ±0.874  ±0.165  ±3.290   ±1.370
10   OA    87.98   85.91   92.70   98.50   74.12   65.13   84.31    73.03
     STD   ±0.000  ±0.000  ±0.059  ±0.455  ±0.000  ±0.153  ±1.440   ±1.180
20   OA    93.25   92.66   95.98   99.65   76.59   65.65   83.92    75.48
     STD   ±0.000  ±0.000  ±0.032  ±0.326  ±0.244  ±0.774  ±0.805   ±1.220
30   OA    95.69   95.56   96.11   99.85   78.71   66.68   86.97    76.77
     STD   ±0.000  ±0.020  ±0.000  ±0.228  ±0.435  ±0.035  ±1.020   ±0.125
50   OA    97.64   96.96   98.17   99.94   81.41   68.05   87.59    79.41
     STD   ±0.000  ±0.256  ±0.017  ±0.184  ±0.429  ±0.240  ±0.525   ±0.589
100  OA    98.93   98.49   98.96   99.99   85.44   68.74   90.27    80.99
     STD   ±0.016  ±0.193  ±0.056  ±0.088  ±0.568  ±2.510  ±0.241   ±0.842
4.8 Jasper
Jasper Ridge is a popular hyperspectral image used in hyperspectral studies. It has 512 × 614 pixels. Each pixel is recorded at 224 channels ranging from 380 to 2500 nm. The spectral resolution is up to 9.46 nm. Since this hyperspectral image is too complex to obtain the ground truth, we consider a sub-image of 100 × 100 pixels.
Table 11 Overall classification accuracy (OA, %) and standard deviation (STD) on all datasets classified by the KELM classifier when a different number of samples is selected from each class for the training set

NC         Indian Pines  PaviaU  Salinas  KSC    MUUFL  Urban  Samson  Jasper
1    OA    50.61   51.99   78.57   77.06   47.32   47.90   57.37   57.37
     STD   –       –       –       –       –       –       –       –
2    OA    59.02   62.60   83.72   88.14   56.47   51.34   69.10   69.10
     STD   ±0.001  ±0.001  ±0.000  ±0.000  ±0.000  ±0.000  ±0.000  ±0.000
3    OA    67.95   65.56   87.33   93.78   63.07   55.88   74.61   74.61
     STD   ±0.000  ±0.001  ±0.001  ±0.000  ±0.001  ±0.000  ±0.000  ±0.000
4    OA    73.28   74.91   87.52   94.62   65.04   60.02   80.72   80.72
     STD   ±0.000  ±0.002  ±0.001  ±0.000  ±0.002  ±0.000  ±0.000  ±0.000
5    OA    77.35   76.67   89.34   95.99   69.24   61.42   73.54   73.54
     STD   ±0.001  ±0.001  ±0.002  ±0.000  ±0.001  ±0.001  ±0.000  ±0.000
10   OA    86.54   86.04   93.25   98.51   77.33   64.82   83.36   83.36
     STD   ±0.000  ±0.002  ±0.004  ±0.000  ±0.005  ±0.002  ±0.000  ±0.000
20   OA    92.49   93.03   95.89   99.68   79.24   67.70   83.26   83.26
     STD   ±0.000  ±0.002  ±0.002  ±0.000  ±0.005  ±0.004  ±0.000  ±0.000
30   OA    95.02   95.34   96.43   99.87   81.03   68.08   86.01   86.01
     STD   ±0.000  ±0.004  ±0.003  ±0.000  ±0.006  ±0.004  ±0.000  ±0.000
50   OA    97.21   97.31   98.15   99.97   83.41   70.28   86.60   86.60
     STD   ±0.000  ±0.004  ±0.002  ±0.000  ±0.008  ±0.006  ±0.000  ±0.000
100  OA    98.73   98.70   99.19   100.0   86.14   72.11   89.53   89.53
     STD   ±0.000  ±0.003  ±0.003  ±0.000  ±0.011  ±0.007  ±0.000  ±0.000
Fig. 2 Classification maps generated by SVM classifier on the Indian Pines dataset
The first pixel of the sub-image starts from the (105, 269)th pixel of the original image. After removing channels 1–3, 108–112, 154–166 and 220–224 due to dense water vapor and atmospheric effects, 198 channels remain [34–36]. Table 9 defines the four classes in the ground truth.
4.9 Experimental design
Experiments were performed on a laptop with 16 GB RAM and a 2.70 GHz i7-3740QM CPU, and all source code was written in MATLAB. The proposed feature extraction method is compared with three well-known methods: principal component analysis (PCA) [29], semi-supervised local Fisher discriminant analysis for dimensionality reduction (SELF) [30] and semi-supervised discriminant analysis (SDA) [31]. Support vector machines (SVMs) [32] and the kernel extreme learning machine (KELM) [33] were used as the classification methods. The SVM was used due to its capability to deal with high-dimensional data. Its flexibility, owing to the kernel function, allows alternative strategies for including spatial features in the classification process, such as feature fusion or composite kernels. KELM has been used in many studies for multispectral and hyperspectral remote sensing image classification. The results show that KELM is more accurate than, or similar to, SVMs in terms of classification accuracy and offers notably low computational cost.
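Since the paper does not spell KELM out, the following is a minimal sketch of a kernel extreme learning machine in its common formulation, with output weights from (I/C + K)⁻¹T; the RBF kernel, C, gamma and all names are assumptions rather than the settings used in the paper:

```python
import numpy as np

def kelm_train_predict(X_tr, y_tr, X_te, C=100.0, gamma=1.0):
    def rbf(A, B):  # RBF kernel matrix between two sample sets
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    classes = np.unique(y_tr)
    T = (y_tr[:, None] == classes[None, :]).astype(float)   # one-hot targets
    K = rbf(X_tr, X_tr)
    alpha = np.linalg.solve(np.eye(len(X_tr)) / C + K, T)   # output weights
    scores = rbf(X_te, X_tr) @ alpha                        # kernel expansion on test pixels
    return classes[scores.argmax(axis=1)]
```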
In the experiments, spectral reflectance values were normalized to [0, 1], and all parameters were determined by tenfold cross-validation (leave-one-out cross-validation for sample sizes lower than 10). The classification accuracies reported in this paper are given as the average of the classification accuracies over 10 trials. In the experiments, classification results are compared by classwise accuracy (provided as supplementary material), average accuracy, overall accuracy, the standard deviation over the tenfold cross-validation, and the kappa coefficient [37], a metric that compares an observed accuracy with an expected accuracy. The classification results are also validated by the Wilcoxon test [38].
Fig. 3 Classification maps generated by KELM classifier on the Indian Pines dataset
Table 12 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the Indian Pines dataset

NC   PCA     SELF    SDA     FNPA
1    0.014   0.0001  0.508   0.012
2    0.0001  0.065   0.0001  0.296
3    0.57    0.0001  0.862   0.0001
4    0.026   0.0001  0.033   0.0001
5    0.029   0.0001  0.251   0.001
10   0.028   0.001   0.026   0.0001
20   0.006   0.005   0.033   0.001
30   0.01    0.058   0.002   0.001
50   0.019   0.01    0.016   0.0001
100  0.012   0.196   0.003   0.003
Table 13 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the Indian Pines dataset

NC   PCA     SELF    SDA     FNPA
1    0.015   0.0001  0.433   0.021
2    0.0001  0.001   0.0001  0.507
3    0.009   0.0001  0.85    0.0001
4    0.0001  0.0001  0.019   0.0001
5    0.003   0.0001  0.042   0.001
10   0.001   0.001   0.018   0.0001
20   0.001   0.001   0.005   0.0001
30   0.001   0.001   0.002   0.003
50   0.014   0.007   0.052   0.002
100  0.055   0.003   0.017   0.014
Table 14 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the Indian Pines dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    42.65 – (0.37)         49.63 – (0.43)         50.61 – (0.44)         27.83 – (0.18)         50.45 – (0.45)
2    55.95 ±3.060 (0.53)    58.71 ±1.570 (0.54)    59.47 ±0.000 (0.55)    36.01 ±2.630 (0.29)    58.54 ±1.690 (0.54)
3    46.99 ±3.010 (0.42)    68.00 ±1.660 (0.64)    69.26 ±0.001 (0.66)    42.57 ±1.460 (0.37)    64.96 ±1.670 (0.61)
4    56.67 ±2.750 (0.52)    72.01 ±1.140 (0.69)    74.57 ±0.000 (0.71)    49.90 ±2.020 (0.43)    70.75 ±1.220 (0.67)
5    59.76 ±1.130 (0.56)    75.69 ±0.707 (0.73)    78.20 ±0.000 (0.75)    55.11 ±2.880 (0.49)    74.06 ±0.973 (0.71)
10   62.94 ±0.891 (0.59)    86.14 ±0.649 (0.84)    87.98 ±0.000 (0.86)    79.43 ±4.050 (0.76)    83.75 ±0.647 (0.82)
20   78.12 ±0.345 (0.76)    92.25 ±0.278 (0.91)    93.25 ±0.000 (0.92)    90.31 ±1.750 (0.89)    91.38 ±0.300 (0.90)
30   88.91 ±0.224 (0.87)    94.61 ±0.135 (0.94)    95.69 ±0.000 (0.95)    94.52 ±1.080 (0.94)    94.00 ±0.213 (0.93)
50   93.93 ±0.070 (0.93)    97.07 ±0.099 (0.97)    97.64 ±0.000 (0.97)    96.73 ±1.100 (0.96)    96.64 ±0.120 (0.96)
100  97.90 ±0.043 (0.98)    98.71 ±0.105 (0.99)    98.93 ±0.016 (0.99)    98.71 ±0.696 (0.99)    98.59 ±0.076 (0.98)
Table 15 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the Indian Pines dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    43.26 – (0.38)         49.64 – (0.43)         50.61 – (0.44)         27.84 – (0.18)         50.46 – (0.45)
2    58.63 ±0.001 (0.55)    57.57 ±0.003 (0.53)    59.02 ±0.001 (0.54)    28.74 ±0.000 (0.23)    56.27 ±0.000 (0.52)
3    48.36 ±0.006 (0.43)    66.21 ±0.001 (0.62)    67.95 ±0.000 (0.64)    39.15 ±0.003 (0.34)    62.34 ±0.001 (0.58)
4    57.20 ±0.004 (0.53)    71.50 ±0.001 (0.68)    73.28 ±0.000 (0.70)    44.27 ±0.005 (0.40)    67.05 ±0.002 (0.63)
5    60.72 ±0.004 (0.57)    75.32 ±0.002 (0.72)    77.35 ±0.001 (0.74)    50.04 ±0.005 (0.46)    71.38 ±0.003 (0.68)
10   63.34 ±0.002 (0.59)    84.89 ±0.002 (0.83)    86.54 ±0.000 (0.85)    73.41 ±0.006 (0.70)    80.99 ±0.003 (0.79)
20   78.55 ±0.010 (0.76)    91.44 ±0.002 (0.90)    92.49 ±0.000 (0.91)    87.81 ±0.005 (0.86)    89.39 ±0.003 (0.88)
30   89.54 ±0.006 (0.88)    94.11 ±0.001 (0.93)    95.02 ±0.000 (0.94)    90.89 ±0.004 (0.90)    92.91 ±0.003 (0.92)
50   94.55 ±0.002 (0.94)    96.74 ±0.001 (0.96)    97.21 ±0.000 (0.97)    94.58 ±0.002 (0.94)    95.97 ±0.003 (0.95)
100  98.16 ±0.001 (0.98)    98.53 ±0.001 (0.98)    98.73 ±0.000 (0.99)    97.51 ±0.001 (0.97)    98.16 ±0.002 (0.98)
The Wilcoxon test is a nonparametric statistical hypothesis test, which bases the hypothesis decision on the p value and determines whether the differences between the classification results of two methods are statistically significant. A p value smaller than 0.05 indicates that the difference between classification accuracies is statistically significant with 95% confidence. The results of the Wilcoxon test are shown in the experimental results, including comparisons between the proposed method and the other methods; a statistical difference (p value < 0.05) in the results indicates that the improved performance of the proposed method is statistically significant.
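As a concrete, hedged illustration of how such a comparison is computed (the per-trial accuracies below are made-up numbers for the example, not values from the tables):

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical overall accuracies of two methods over the same 10 trials:
acc_sfnpa = np.array([87.9, 88.1, 88.0, 87.8, 88.2, 88.0, 87.9, 88.1, 88.0, 87.9])
acc_pca = np.array([86.1, 86.0, 86.3, 86.2, 85.9, 86.1, 86.2, 86.0, 86.1, 86.2])

stat, p = wilcoxon(acc_sfnpa, acc_pca)  # paired nonparametric test
print(p < 0.05)  # True -> the accuracy difference is significant at 95% confidence

# One common way to obtain the kappa coefficient of a classification map is
# sklearn.metrics.cohen_kappa_score(y_true, y_pred), which compares the observed
# agreement with the agreement expected by chance.
```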
5 Experimental results
Experimental results for all datasets are listed in Tables
10
and
11
obtained by SVM and KELM, respectively. NC is
the number of samples in training set for each class where
total sample size is lower than NC; all samples are used for
that class.
5.1 Indian Pines dataset
For the Indian Pines [29] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.
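This per-class sampling protocol, used for every dataset below, can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def split_per_class(y, nc, rng=np.random.default_rng(0)):
    """Pick min(nc, class size) random training samples per class;
    all remaining samples form the test set."""
    train = []
    for ci in np.unique(y):
        idx = np.flatnonzero(y == ci)
        train.extend(rng.choice(idx, size=min(nc, idx.size), replace=False))
    train = np.asarray(train)
    test = np.setdiff1d(np.arange(y.size), train)
    return train, test
```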
Classwise classification accuracy on the Indian Pines dataset when 10 samples are selected from each class for the training set is given in Online Resource 1. As shown in Online Resource 1, the classwise discriminability of SFNPA is better than that of the other methods.
Figure 2 shows the classification maps generated by the SVM classifier on the Indian Pines dataset when 10 samples are selected from each class for the training set.
Fig. 4 Classification maps generated by SVM classifier on the PaviaU dataset
Figure 3 shows the classification maps generated by the KELM classifier on the Indian Pines dataset when 10 samples are selected from each class for the training set.
The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 12. The statistical differences (p value < 0.05) in Table 12 indicate that the improved performance of the proposed method is statistically significant.
The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 13. The statistical differences (p value < 0.05) in Table 13 indicate that the improved performance of the proposed method is statistically significant.
Table 14 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set. As listed in Table 14, the classwise discriminability of SFNPA is better than that of the other methods.
Table 15 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.
As shown in the results for the Indian Pines dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.2 PaviaU dataset
For the PaviaU [30] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.

Classwise classification accuracy on the PaviaU dataset when 10 samples are selected from each class for the training set is given in Online Resource 2. As shown in Online Resource 2, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 4 shows the classification maps generated by the SVM classifier on the PaviaU dataset when 10 samples are selected from each class for the training set.

Figure 5 shows the classification maps generated by the KELM classifier on the PaviaU dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 16. The statistical differences (p value < 0.05) in Table 16 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 17. The statistical differences (p value < 0.05) in Table 17 indicate that the improved performance of the proposed method is statistically significant.

Table 18 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set. As listed in Table 18, the classwise discriminability of SFNPA is better than that of the other methods.
Table 16 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the PaviaU dataset

NC   PCA    SELF   SDA    FNPA
1    0.009  0.005  0.724  0.078
2    0.055  0.004  0.17   0.003
3    0.043  0.005  0.081  0.002
4    0.069  0.005  0.485  0.002
5    0.025  0.003  0.889  0.002
10   0.108  0.005  0.295  0.002
20   0.043  0.004  0.224  0.002
30   0.014  0.008  0.093  0.002
50   0.078  0.002  0.017  0.003
100  0.021  0.012  0.021  0.004
Table 17 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the PaviaU dataset

NC   PCA    SELF   SDA    FNPA
1    0.017  0.005  0.625  0.162
2    0.021  0.005  0.021  0.002
3    0.012  0.004  0.014  0.002
4    0.006  0.004  0.006  0.002
5    0.002  0.002  0.005  0.002
10   0.014  0.004  0.017  0.002
20   0.021  0.002  0.01   0.002
30   0.05   0.002  0.017  0.002
50   0.108  0.002  0.008  0.007
100  0.235  0.014  0.03   0.003
Table 18 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the PaviaU dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    47.89 – (0.38)         49.64 – (0.41)         51.99 – (0.43)         32.88 – (0.23)         53.34 – (0.44)
2    43.69 ±0.000 (0.34)    65.03 ±0.000 (0.57)    65.30 ±0.000 (0.58)    49.35 ±0.000 (0.38)    63.47 ±0.000 (0.55)
3    38.71 ±1.420 (0.29)    68.86 ±0.919 (0.61)    68.42 ±0.000 (0.61)    52.88 ±1.970 (0.43)    66.80 ±0.943 (0.59)
4    52.25 ±0.119 (0.42)    76.79 ±1.130 (0.71)    76.58 ±0.000 (0.71)    61.25 ±1.510 (0.52)    74.68 ±1.870 (0.68)
5    45.58 ±0.672 (0.36)    76.53 ±1.010 (0.71)    77.76 ±0.002 (0.72)    65.30 ±1.360 (0.57)    77.50 ±1.530 (0.71)
10   42.45 ±0.582 (0.32)    84.23 ±0.598 (0.80)    85.91 ±0.000 (0.82)    77.62 ±1.220 (0.72)    83.63 ±1.330 (0.79)
20   80.15 ±0.373 (0.74)    91.42 ±0.591 (0.89)    92.66 ±0.000 (0.90)    89.78 ±0.726 (0.87)    90.58 ±0.387 (0.88)
30   87.51 ±0.276 (0.84)    94.89 ±0.334 (0.93)    95.56 ±0.020 (0.94)    93.60 ±0.469 (0.92)    94.02 ±0.486 (0.92)
50   91.51 ±0.247 (0.89)    96.11 ±0.354 (0.95)    96.96 ±0.256 (0.96)    94.55 ±0.230 (0.93)    95.92 ±0.268 (0.95)
100  94.73 ±0.061 (0.93)    98.13 ±0.162 (0.98)    98.49 ±0.193 (0.98)    97.91 ±0.131 (0.97)    97.95 ±0.150 (0.97)
Table 19 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the PaviaU dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.3 Salinas dataset
For the Salinas [30] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.

Classwise classification accuracy on the Salinas dataset when 10 samples are selected from each class for the training set is given in Online Resource 3. As shown in Online Resource 3, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 6 shows the classification maps generated by the SVM classifier on the Salinas dataset when 10 samples are selected from each class for the training set.

Figure 7 shows the classification maps generated by the KELM classifier on the Salinas dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 20. The statistical differences (p value < 0.05) in Table 20 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 21. The statistical differences (p value < 0.05) in Table 21 indicate that the improved performance of the proposed method is statistically significant.

Table 22 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 23 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the Salinas dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.4 Kennedy Space Center dataset

For the Kennedy Space Center (KSC) [31] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.

Classwise classification accuracy on the KSC dataset when 10 samples are selected from each class for the training set is given in Online Resource 4. As shown in Online Resource 4, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 8 shows the classification maps generated by the SVM classifier on the KSC dataset when 10 samples are selected from each class for the training set.
Table 19 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the PaviaU dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    49.80 – (0.40)         49.78 – (0.41)         51.99 – (0.43)         32.88 – (0.23)         53.67 – (0.45)
2    44.63 ±0.015 (0.35)    59.51 ±0.004 (0.51)    62.60 ±0.001 (0.55)    37.68 ±0.002 (0.28)    55.86 ±0.003 (0.47)
3    41.02 ±0.012 (0.30)    62.86 ±0.005 (0.55)    65.56 ±0.001 (0.58)    42.87 ±0.002 (0.33)    59.12 ±0.005 (0.51)
4    53.45 ±0.012 (0.43)    72.11 ±0.011 (0.65)    74.91 ±0.002 (0.69)    44.87 ±0.002 (0.35)    66.49 ±0.010 (0.58)
5    49.22 ±0.017 (0.39)    74.70 ±0.011 (0.69)    76.67 ±0.001 (0.71)    50.96 ±0.001 (0.42)    71.38 ±0.008 (0.64)
10   44.05 ±0.038 (0.32)    83.77 ±0.017 (0.79)    86.04 ±0.002 (0.82)    63.99 ±0.006 (0.56)    77.73 ±0.012 (0.72)
20   81.70 ±0.058 (0.76)    91.51 ±0.016 (0.89)    93.03 ±0.002 (0.91)    80.93 ±0.012 (0.76)    87.50 ±0.013 (0.84)
30   88.06 ±0.037 (0.85)    94.12 ±0.015 (0.92)    95.34 ±0.004 (0.94)    89.19 ±0.018 (0.86)    91.88 ±0.012 (0.89)
50   92.24 ±0.022 (0.90)    96.42 ±0.015 (0.95)    97.31 ±0.004 (0.96)    92.91 ±0.024 (0.91)    94.43 ±0.017 (0.93)
100  95.24 ±0.015 (0.94)    98.36 ±0.008 (0.98)    98.70 ±0.003 (0.98)    96.68 ±0.014 (0.96)    97.50 ±0.009 (0.97)
Figure 9 shows the classification maps generated by the KELM classifier on the KSC dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 24. The statistical differences (p value < 0.05) in Table 24 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 25. The statistical differences (p value < 0.05) in Table 25 indicate that the improved performance of the proposed method is statistically significant.
Fig. 6 Classification maps generated by SVM classifier on the Salinas Valley dataset
Table 26 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 27 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the KSC dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.5 MUUFL Gulf Port
For the MUUFL Gulf Port [32, 33] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.

Fig. 7 Classification maps generated by KELM classifier on the Salinas Valley dataset
Classwise classification accuracy on the MUUFL Gulf Port dataset when 10 samples are selected from each class for the training set is given in Online Resource 5. As shown in Online Resource 5, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 10 shows the classification maps generated by the SVM classifier on the MUUFL Gulf Port dataset when 10 samples are selected from each class for the training set.

Figure 11 shows the classification maps generated by the KELM classifier on the MUUFL Gulf Port dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 28. The statistical differences (p value < 0.05) in Table 28 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 29. The statistical differences (p value < 0.05) in Table 29 indicate that the improved performance of the proposed method is statistically significant.

Table 30 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 31 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the MUUFL Gulf Port dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
Table 20 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the Salinas dataset

NC   PCA     SELF    SDA     FNPA
1    0.001   0.0001  0.121   0.008
2    0.003   0.0001  0.856   0.0001
3    0.0001  0.0001  0.08    0.0001
4    0.0001  0.0001  0.235   0.0001
5    0.003   0.0001  0.837   0.0001
10   0.0001  0.001   0.457   0.0001
20   0.0001  0.001   0.0001  0.0001
30   0.177   0.004   0.324   0.38
50   0.033   0.052   0.012   0.115
100  0.889   0.001   0.003   0.017
Table 21 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the Salinas dataset

NC   PCA     SELF    SDA    FNPA
1    0.001   0.0001  0.121  0.007
2    0.0001  0.0001  0.926  0.0001
3    0.0001  0.0001  0.14   0.0001
4    0.001   0.0001  0.507  0.0001
5    0.0001  0.0001  0.481  0.0001
10   0.0001  0.0001  0.365  0.0001
20   0.0001  0.0001  0.237  0.001
30   0.0001  0.001   0.007  0.021
50   0.001   0.001   0.007  0.258
100  0.031   0.0001  0.001  0.262
Table 22 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the Salinas dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                    SDA
1    68.08 – (0.65)         76.27 – (0.74)         78.57 – (0.76)         37.45 – (0.28)          80.76 – (0.79)
2    74.66 ±0.630 (0.72)    83.01 ±0.539 (0.81)    84.25 ±0.170 (0.83)    51.37 ±5.840 (0.46)     86.23 ±0.742 (0.85)
3    77.76 ±0.577 (0.75)    86.42 ±0.567 (0.85)    87.61 ±0.142 (0.86)    56.79 ±1.520 (0.52)     90.19 ±0.593 (0.89)
4    83.95 ±0.384 (0.82)    87.54 ±0.176 (0.86)    87.98 ±0.115 (0.87)    61.90 ±3.330 (0.58)     89.25 ±0.678 (0.88)
5    84.02 ±0.495 (0.82)    89.01 ±0.478 (0.88)    89.65 ±0.087 (0.88)    64.88 ±3.320 (0.61)     90.09 ±0.458 (0.89)
10   82.92 ±0.428 (0.81)    91.77 ±0.392 (0.91)    92.70 ±0.059 (0.92)    82.48 ±13.600 (0.81)    92.58 ±0.420 (0.92)
20   93.26 ±0.107 (0.92)    94.96 ±0.211 (0.94)    95.98 ±0.032 (0.96)    94.20 ±7.070 (0.94)     95.37 ±0.118 (0.95)
30   94.47 ±0.155 (0.94)    95.38 ±0.164 (0.95)    96.11 ±0.000 (0.96)    95.87 ±0.992 (0.95)     96.43 ±0.190 (0.96)
50   95.30 ±0.120 (0.95)    97.34 ±0.116 (0.97)    98.17 ±0.017 (0.98)    97.40 ±0.329 (0.97)     97.66 ±0.101 (0.97)
100  96.65 ±0.505 (0.96)    98.53 ±0.053 (0.98)    98.96 ±0.056 (0.99)    98.61 ±0.107 (0.98)     98.67 ±0.056 (0.99)
5.6 Urban
For the Urban [34–36] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.
Classwise classification accuracy on the Urban dataset when 10 samples are selected from each class for the training set is given in Online Resource 6. As shown in Online Resource 6, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 12 shows the classification maps generated by the SVM classifier on the Urban dataset when 10 samples are selected from each class for the training set.

Figure 13 shows the classification maps generated by the KELM classifier on the Urban dataset when 10 samples are selected from each class for the training set.
The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 32. The statistical differences (p value < 0.05) in Table 32 indicate that the improved performance of the proposed method is statistically significant.
Table 23 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the Salinas dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    68.44 – (0.65)         76.28 – (0.74)         78.57 – (0.76)         37.45 – (0.28)         80.77 – (0.79)
2    75.77 ±0.010 (0.73)    82.33 ±0.001 (0.80)    83.72 ±0.000 (0.82)    42.23 ±0.000 (0.35)    85.68 ±0.004 (0.84)
3    78.77 ±0.004 (0.77)    86.05 ±0.003 (0.85)    87.33 ±0.001 (0.86)    43.37 ±0.002 (0.36)    89.79 ±0.007 (0.89)
4    84.55 ±0.006 (0.83)    86.60 ±0.003 (0.85)    87.52 ±0.001 (0.86)    47.64 ±0.002 (0.41)    88.75 ±0.003 (0.88)
5    84.89 ±0.007 (0.83)    88.19 ±0.005 (0.87)    89.34 ±0.002 (0.88)    50.06 ±0.003 (0.44)    91.30 ±0.008 (0.90)
10   84.69 ±0.007 (0.83)    92.70 ±0.013 (0.92)    93.25 ±0.004 (0.93)    69.80 ±0.003 (0.66)    92.93 ±0.010 (0.92)
20   93.59 ±0.002 (0.93)    95.11 ±0.005 (0.95)    95.89 ±0.002 (0.95)    91.79 ±0.001 (0.91)    95.89 ±0.005 (0.95)
30   95.00 ±0.004 (0.94)    95.94 ±0.007 (0.95)    96.43 ±0.003 (0.96)    95.88 ±0.000 (0.95)    96.86 ±0.008 (0.97)
50   96.06 ±0.002 (0.96)    97.59 ±0.005 (0.97)    98.15 ±0.002 (0.98)    97.56 ±0.000 (0.97)    98.12 ±0.007 (0.98)
100  97.29 ±0.003 (0.97)    98.89 ±0.004 (0.99)    99.19 ±0.003 (0.99)    98.36 ±0.003 (0.98)    99.03 ±0.005 (0.99)
Fig. 8 Classification maps generated by SVM classifier on the KSC dataset
Fig. 9 Classification maps generated by KELM classifier on the KSC dataset
Table 24 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the KSC dataset

NC   PCA    SELF    SDA    FNPA
1    0.059  0.001   0.629  0.001
2    0.16   0.0001  0.182  0.0001
3    0.776  0.0001  0.08   0.0001
4    0.093  0.0001  0.453  0.0001
5    0.033  0.001   0.887  0.0001
10   0.001  0.005   0.066  0.0001
20   0.006  0.001   0.727  0.0001
30   0.009  0.002   0.005  0.001
50   0.005  0.031   0.029  0.001
100  0.014  0.044   0.014  0.083
Table 25 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the KSC dataset

NC   PCA    SELF    SDA    FNPA
1    0.298  0.001   0.449  0.001
2    0.002  0.0001  0.156  0.0001
3    0.001  0.0001  0.142  0.0001
4    0.001  0.0001  0.856  0.0001
5    0.004  0.001   0.079  0.001
10   0.003  0.001   0.66   0.001
20   0.021  0.001   0.078  0.001
30   0.007  0.069   0.76   0.001
50   0.03   0.155   0.107  0.001
100  0.093  0.205   0.093  0.262
The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 33. The statistical differences (p value < 0.05) in Table 33 indicate that the improved performance of the proposed method is statistically significant.

Table 34 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 35 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the Urban dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.7 Samson
For the Samson [34–36] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.
Table 26 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the KSC dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    57.87 – (0.53)         76.61 – (0.74)         77.06 – (0.74)         47.26 – (0.41)         78.18 – (0.76)
2    60.63 ±0.000 (0.57)    87.93 ±0.000 (0.87)    88.42 ±0.000 (0.87)    65.22 ±0.000 (0.61)    87.49 ±0.000 (0.86)
3    62.33 ±1.220 (0.58)    92.53 ±1.600 (0.92)    93.87 ±0.499 (0.93)    74.54 ±1.670 (0.72)    90.51 ±0.657 (0.89)
4    66.99 ±1.380 (0.64)    93.72 ±1.570 (0.93)    94.80 ±0.814 (0.94)    80.60 ±2.200 (0.78)    92.27 ±1.200 (0.91)
5    71.19 ±0.266 (0.68)    95.52 ±0.650 (0.95)    96.14 ±0.963 (0.96)    86.03 ±2.950 (0.84)    93.23 ±0.205 (0.92)
10   77.18 ±0.455 (0.75)    97.70 ±0.370 (0.97)    98.50 ±0.455 (0.98)    93.47 ±1.340 (0.93)    97.37 ±0.286 (0.97)
20   90.22 ±0.245 (0.89)    99.26 ±0.228 (0.99)    99.65 ±0.326 (1.00)    98.01 ±0.222 (0.98)    99.13 ±0.293 (0.99)
30   95.77 ±0.221 (0.95)    99.54 ±0.154 (0.99)    99.85 ±0.228 (1.00)    98.87 ±0.091 (0.99)    99.52 ±0.094 (0.99)
50   99.11 ±0.161 (0.99)    99.74 ±0.125 (1.00)    99.94 ±0.184 (1.00)    99.63 ±0.062 (1.00)    99.85 ±0.113 (1.00)
100  99.93 ±0.073 (1.00)    99.98 ±0.050 (1.00)    99.99 ±0.088 (1.00)    99.91 ±0.056 (1.00)    99.98 ±0.041 (1.00)
Table 27 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the KSC dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    59.65 – (0.55)         76.68 – (0.74)         77.06 – (0.74)         47.28 – (0.41)         78.90 – (0.77)
2    61.75 ±0.031 (0.58)    87.15 ±0.005 (0.86)    88.14 ±0.000 (0.87)    58.73 ±0.001 (0.54)    87.22 ±0.005 (0.86)
3    62.49 ±0.034 (0.58)    92.72 ±0.004 (0.92)    93.78 ±0.000 (0.93)    65.22 ±0.001 (0.62)    90.86 ±0.007 (0.90)
4    68.87 ±0.033 (0.66)    93.63 ±0.010 (0.93)    94.62 ±0.000 (0.94)    73.78 ±0.005 (0.71)    92.44 ±0.012 (0.92)
5    72.79 ±0.041 (0.70)    95.14 ±0.010 (0.95)    95.99 ±0.000 (0.96)    80.43 ±0.004 (0.78)    92.67 ±0.008 (0.92)
10   77.39 ±0.017 (0.75)    98.10 ±0.010 (0.98)    98.51 ±0.000 (0.98)    93.00 ±0.007 (0.92)    97.54 ±0.010 (0.97)
20   90.30 ±0.051 (0.89)    99.49 ±0.004 (0.99)    99.68 ±0.000 (1.00)    98.97 ±0.014 (0.99)    99.29 ±0.005 (0.99)
30   95.92 ±0.013 (0.95)    99.73 ±0.003 (1.00)    99.87 ±0.000 (1.00)    99.20 ±0.009 (0.99)    99.83 ±0.003 (1.00)
50   99.19 ±0.003 (0.99)    99.93 ±0.002 (1.00)    99.97 ±0.000 (1.00)    99.87 ±0.003 (1.00)    99.96 ±0.001 (1.00)
100  99.95 ±0.000 (1.00)    99.99 ±0.000 (1.00)    100.0 ±0.000 (1.00)    99.99 ±0.000 (1.00)    100.0 ±0.000 (1.00)
Classwise classification accuracy on the Samson dataset when 10 samples are selected from each class for the training set is given in Online Resource 7. As shown in Online Resource 7, the classwise discriminability of SFNPA is better than that of the other methods.
Figure 14 shows the classification maps generated by the SVM classifier on the Samson dataset when 10 samples are selected from each class for the training set.

Figure 15 shows the classification maps generated by the KELM classifier on the Samson dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 36. The statistical differences (p value < 0.05) in Table 36 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 37. The statistical differences (p value < 0.05) in Table 37 indicate that the improved performance of the proposed method is statistically significant.

Table 38 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 39 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the Samson dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.8 Jasper
For the Jasper [34–36] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.
Fig. 11 Classification maps generated by KELM classifier on the MUUFL Gulf Port dataset

Table 28 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the MUUFL Gulf Port dataset

NC   PCA    SELF   SDA    FNPA
1    0.002  0.004  0.01   0.021
2    0.001  0.003  0.028  0.003
3    0.001  0.001  0.018  0.001
4    0.002  0.001  0.057  0.001
5    0.001  0.001  0.293  0.001
10   0.001  0.006  0.083  0.001
20   0.001  0.057  0.798  0.001
30   0.001  0.003  0.164  0.001
50   0.002  0.001  0.551  0.003
100  0.001  0.057  0.094  0.001
Table 29 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the MUUFL Gulf Port dataset

NC   PCA    SELF   SDA    FNPA
1    0.002  0.002  0.016  0.41
2    0.001  0.001  0.011  0.029
3    0.001  0.002  0.003  0.001
4    0.001  0.001  0.025  0.001
5    0.001  0.001  0.01   0.002
10   0.001  0.003  0.005  0.006
20   0.001  0.201  0.029  0.106
30   0.002  0.029  0.013  0.065
50   0.001  0.01   0.038  0.033
100  0.003  0.045  0.029  0.005
Classwise classification accuracy on the Jasper dataset when 10 samples are selected from each class for the training set is given in Online Resource 8. As shown in Online Resource 8, the classwise discriminability of SFNPA is better than that of the other methods.
Figure 16 shows the classification maps generated by the SVM classifier on the Jasper dataset when 10 samples are selected from each class for the training set.

Figure 17 shows the classification maps generated by the KELM classifier on the Jasper dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 40. The statistical differences (p value < 0.05) in Table 40 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 41. The statistical differences (p value < 0.05) in Table 41 indicate that the improved performance of the proposed method is statistically significant.

Table 42 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.
Table 30 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the MUUFL Gulf Port dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    31.22 – (0.22)         23.39 – (0.12)         42.67 – (0.34)         13.64 – (0.06)         27.52 – (0.19)
2    31.83 ±0.000 (0.22)    31.07 ±0.000 (0.20)    54.24 ±0.000 (0.44)    23.33 ±0.000 (0.13)    40.81 ±0.000 (0.30)
3    35.93 ±1.230 (0.26)    40.21 ±1.160 (0.29)    60.20 ±0.555 (0.51)    31.49 ±1.510 (0.20)    52.14 ±2.440 (0.42)
4    30.48 ±1.580 (0.21)    37.04 ±2.060 (0.28)    63.27 ±1.170 (0.55)    34.70 ±2.070 (0.24)    48.85 ±1.370 (0.40)
5    37.73 ±1.000 (0.26)    49.39 ±1.840 (0.39)    66.28 ±0.874 (0.58)    45.36 ±1.380 (0.33)    62.30 ±1.200 (0.53)
10   58.42 ±1.070 (0.49)    57.78 ±0.985 (0.48)    74.12 ±0.000 (0.67)    61.01 ±1.050 (0.52)    69.87 ±0.948 (0.62)
20   68.04 ±0.819 (0.60)    65.93 ±0.630 (0.58)    76.59 ±0.244 (0.70)    73.16 ±0.646 (0.66)    74.39 ±0.695 (0.68)
30   73.16 ±0.713 (0.66)    71.67 ±0.931 (0.65)    78.71 ±0.435 (0.73)    76.57 ±0.613 (0.70)    78.49 ±0.514 (0.73)
50   77.47 ±0.380 (0.71)    76.42 ±0.456 (0.70)    81.41 ±0.429 (0.76)    78.57 ±0.317 (0.73)    80.43 ±0.557 (0.75)
100  80.63 ±0.359 (0.75)    83.21 ±0.304 (0.79)    85.44 ±0.568 (0.81)    84.97 ±0.614 (0.81)    85.53 ±0.388 (0.81)
Table 31 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the MUUFL Gulf Port dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    36.76 – (0.26)         23.49 – (0.13)         47.32 – (0.37)         13.64 – (0.06)         27.53 – (0.19)
2    35.91 ±0.027 (0.25)    23.76 ±0.001 (0.18)    56.47 ±0.000 (0.46)    20.04 ±0.000 (0.10)    38.49 ±0.023 (0.28)
3    39.85 ±0.018 (0.29)    38.05 ±0.003 (0.26)    63.07 ±0.001 (0.54)    24.11 ±0.004 (0.14)    49.81 ±0.029 (0.40)
4    32.04 ±0.009 (0.23)    33.18 ±0.006 (0.24)    65.04 ±0.002 (0.57)    24.98 ±0.006 (0.15)    44.81 ±0.032 (0.36)
5    41.19 ±0.026 (0.28)    42.50 ±0.007 (0.32)    69.24 ±0.001 (0.62)    27.98 ±0.007 (0.19)    57.49 ±0.025 (0.48)
10   58.86 ±0.028 (0.49)    47.81 ±0.017 (0.39)    77.33 ±0.005 (0.71)    48.25 ±0.015 (0.39)    64.02 ±0.038 (0.56)
20   70.53 ±0.039 (0.63)    60.49 ±0.014 (0.52)    79.24 ±0.005 (0.74)    72.14 ±0.015 (0.65)    72.60 ±0.035 (0.66)
30   75.46 ±0.039 (0.69)    67.58 ±0.015 (0.60)    81.03 ±0.006 (0.76)    74.28 ±0.022 (0.68)    77.58 ±0.030 (0.72)
50   78.39 ±0.033 (0.73)    73.65 ±0.018 (0.67)    83.41 ±0.008 (0.79)    77.59 ±0.024 (0.72)    79.56 ±0.031 (0.74)
100  82.52 ±0.037 (0.78)    81.28 ±0.018 (0.76)    86.14 ±0.011 (0.82)    84.09 ±0.024 (0.80)    85.25 ±0.031 (0.81)
Table 43 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the Jasper dataset, the proposed method provides nearly superior performance to the FNPA, PCA, SELF and SDA methods when the training set has more than 10 samples for each class. Because the dataset does not have ground-truth data, the class with the highest membership value in the unmixing data is considered as the ground truth. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
Fig. 12 Classification maps generated by SVM classifier on the Urban dataset
Fig. 13 Classification maps generated by KELM classifier on the Urban dataset
Table 32 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the Urban dataset

NC   PCA    SELF   SDA    FNPA
1    0.343  0.009  1.000  1.000
2    0.236  0.009  0.813  0.155
3    0.541  0.006  0.041  0.006
4    0.541  0.011  0.067  0.006
5    0.041  0.011  0.014  0.006
10   0.053  0.011  0.032  0.006
20   0.019  0.011  0.011  0.006
30   0.011  0.011  0.025  0.006
50   0.006  0.032  0.019  0.006
100  0.025  0.262  0.032  0.185
Table 33 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the Urban dataset

NC   PCA    SELF   SDA    FNPA
1    0.154  0.008  0.76   0.838
2    0.476  0.006  0.221  0.359
3    0.476  0.006  0.053  0.008
4    0.919  0.008  0.083  0.008
5    0.308  0.008  0.019  0.008
10   0.359  0.006  0.053  0.008
20   0.610  0.008  0.683  0.008
30   0.415  0.008  0.76   0.008
50   0.359  0.008  0.185  0.008
100  0.541  0.006  0.359  0.025
Table 34 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the Urban dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    41.89 – (0.28)         47.74 – (0.33)         47.90 – (0.33)         23.38 – (0.07)         44.73 – (0.33)
2    44.98 ±0.000 (0.33)    54.77 ±0.000 (0.43)    51.98 ±0.000 (0.40)    26.83 ±0.000 (0.11)    52.62 ±0.000 (0.40)
3    44.53 ±4.980 (0.30)    55.75 ±0.674 (0.44)    56.47 ±0.194 (0.44)    34.29 ±2.980 (0.18)    48.91 ±2.440 (0.37)
4    39.86 ±3.280 (0.24)    59.52 ±1.940 (0.48)    60.72 ±0.289 (0.49)    35.93 ±3.680 (0.20)    56.15 ±1.390 (0.44)
5    42.26 ±1.700 (0.27)    59.01 ±1.280 (0.48)    61.68 ±0.165 (0.51)    42.51 ±2.900 (0.26)    55.64 ±0.951 (0.44)
10   42.32 ±0.883 (0.29)    62.36 ±1.730 (0.52)    65.13 ±0.153 (0.55)    48.81 ±2.330 (0.33)    60.39 ±1.290 (0.49)
20   37.63 ±0.588 (0.23)    64.48 ±1.040 (0.54)    65.65 ±0.774 (0.56)    57.56 ±2.890 (0.46)    63.28 ±1.120 (0.53)
30   32.94 ±0.787 (0.18)    65.05 ±0.884 (0.55)    66.68 ±0.035 (0.57)    60.61 ±1.570 (0.49)    64.73 ±0.943 (0.54)
50   58.47 ±0.713 (0.47)    65.87 ±0.642 (0.56)    68.05 ±0.240 (0.59)    63.47 ±0.608 (0.53)    65.70 ±0.746 (0.55)
100  67.38 ±0.446 (0.58)    67.80 ±0.500 (0.58)    68.74 ±2.510 (0.60)    68.06 ±0.643 (0.59)    68.19 ±0.485 (0.59)