ORIGINAL ARTICLE
Semi-supervised fuzzy neighborhood preserving analysis
for feature extraction in hyperspectral remote sensing images
Hasan Ali Akyürek 1 · Barış Koçer 2

1 Department of Management Information Sciences, School of Applied Sciences, Necmettin Erbakan University, Konya, Turkey
2 Department of Computer Engineering, Faculty of Engineering, Selcuk University, Konya, Turkey
Correspondence: Hasan Ali Akyürek, hakyurek@konya.edu.tr

Received: 6 February 2017 / Accepted: 13 November 2017 / Published online: 21 November 2017
© The Natural Computing Applications Forum 2017
https://doi.org/10.1007/s00521-017-3279-y

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00521-017-3279-y) contains supplementary material, which is available to authorized users.
Abstract
Semi-supervised feature extraction methods are an important focus of interest in data mining and machine learning areas.
These methods are improved methods based on learning from a combination of labeled and unlabeled data. In this study, a
semi-supervised feature extraction method called semi-supervised fuzzy neighborhood preserving analysis (SFNPA) is
proposed to improve the classification accuracy of hyperspectral remote sensing images. The proposed method combines
the principal component analysis (PCA) method, which is an unsupervised feature extraction method, and the supervised
fuzzy neighborhood preserving analysis (FNPA) method and increases the classification accuracy by using a limited
number of labeled data. Experimental results on eight popular hyperspectral remote sensing datasets show that the proposed
method significantly improves classification accuracy on hyperspectral remote sensing images compared to the well-known
semi-supervised dimension reduction methods.
Keywords Semi-supervised feature extraction · Hyperspectral image classification · Remote sensing · Fuzzy neighborhood preserving analysis
1 Introduction
Hyperspectral cameras collect information as a set of
images represented by hundreds of spectral bands. While
hyperspectral images provide richer spectral information
than traditional RGB and multispectral images, this huge number of spectral bands also presents a challenge for traditional spectral data processing techniques. As the dimension increases, the classification accuracy of conventional classification methods decreases [1], due to the multidimensional problem (curse of dimensionality) when labeled data are limited. To cope
with the curse of dimensionality, feature extraction is used
to reduce the dimensionality of hyperspectral data and to
extract necessary information as much as possible. Feature
extraction methods are often used to classify, perceive and
visualize remote sensing data since they can represent
information in hyperspectral images with fewer bands [2].
There are many supervised and unsupervised methods
for feature extraction. The most well-known unsupervised
feature extraction method is the principal component
analysis (PCA) [3], which is widely used in hyperspectral
remote sensing images. In recent years, in order to preserve
the properties of local neighborhoods, locally linear
embedding [4], Laplacian eigenmaps [5], linear local tangent space alignment [6], neighborhood preserving embedding [7], locality preserving projections [8] and nonlinear dimensionality reduction via tangent space alignment [9] have been applied to extract features from hyperspectral images. These methods, which preserve local
neighborhood information, can provide separation in the
high-dimensional feature space.
Supervised methods provide a data projection based on the existing labeled data. Supervised feature extraction methods such as linear discriminant analysis (LDA) [10] and modified Fisher's linear discriminant
analysis [11] are also used for feature extraction in
hyperspectral remote sensing images.
In real-world applications, the labeled data are rather limited, and considerable expert effort is often needed to label large amounts of data [12, 13]. On the other hand, unlabeled data can be obtained at very low cost. For this reason, semi-supervised methods, which increase the performance of feature extraction by using unlabeled data together with a limited number of labeled data, have gained popularity in the field of machine learning [14–17]. Many semi-supervised feature extraction methods are based on the principle of combining a supervised method and an unsupervised method [18].
In this paper, a novel semi-supervised fuzzy neighborhood preserving analysis (SFNPA) method for feature extraction in hyperspectral remote sensing images is presented. The proposed SFNPA method aims to find a projection by preserving local neighborhood information and maximizing informative and non-redundant data. In the proposed method, fuzzy neighborhood preserving analysis (FNPA), a supervised method, and principal component analysis (PCA), an unsupervised method, are combined. Features extracted by principal component analysis are classified according to the labeled data, and a new projection is created by the supervised FNPA method to enhance classification accuracy by enriching the data projection together with the labeled data. The rest of the paper is organized as follows: In Sect. 2, a brief note on semi-supervised feature extraction methods is given. The proposed SFNPA method is explained in Sect. 3. The datasets and experimental design are described in Sect. 4, and the experimental results on eight different hyperspectral datasets are given and discussed in Sect. 5. Finally, the conclusion is given in the last section.
2 Related work
Semi-supervised feature extraction methods are an important focus of interest in data mining and machine learning areas. These methods are improved methods based on learning from a combination of labeled and unlabeled data. The methods can be grouped into three main categories: constraint-based methods, which rely on pairwise constraints between samples; distance-based methods, which apply metric learning techniques to pairwise constraints; and hybrid methods, which combine the first two approaches in a general probabilistic framework.
On the other hand, locality preserving is important for retaining local structures [8]. As seen in the literature, local structures are usually more important for discriminant analysis than global structures [19–21], because local structures maximize the distance between data points of different classes in local areas. Many locality preserving methods have been introduced in the literature, such as optimizing the kernel function with applications to kernel principal analysis and locality preserving projection for feature extraction [22] and kernel self-optimized locality preserving discriminant analysis for feature extraction and recognition [23]. In the hyperspectral image classification context in particular, Li et al. proposed locality preserving discriminant analysis in kernel-induced feature spaces for hyperspectral image classification [24], locality preserving dimensionality reduction and classification for hyperspectral image analysis [25], and bilateral filtering inspired locality preserving projections for hyperspectral images [26].
The fuzzy neighborhood preserving analysis (FNPA) method, an effective feature projection method, was proposed by Khushaba et al. [27, 28]. This supervised method is based on fuzzy and neighborhood discriminant analysis. By using this projection method, data in the vicinity of the same class become closer to each other, while data in different classes move farther apart. The data $x_k$ given in the scattered space can be expressed as $X = \{x_1, x_2, \ldots, x_k\}$, where $k = 1, 2, \ldots, l$ and $l$ is the total number of samples. The membership function of the $k$th sample in the $i$th class can be expressed as in Eq. (1).
$$\mu_{ik} = \mu_i(x_k) \in [0, 1] \qquad (1)$$
The diameter $r$ of class $i$ can be expressed as in Eq. (2), where $\bar{x}_i$ is the mean of the data in class $i$.

$$r = \max_k \lVert \bar{x}_i - x_k \rVert \qquad (2)$$
In this case, the fuzzy membership function $\mu_{ik}$ can be expressed as in Eq. (3).

$$\mu_{ik} = \left( \frac{\lVert \bar{x}_i - x_k \rVert_{\sigma}}{r + \epsilon} \right)^{-2/(m-1)} \qquad (3)$$
where $m$ is the fuzzification parameter, which controls the degree of the membership function, $\epsilon > 0$ is a small value to avoid the zero-division error, and $\sigma$ is the standard deviation involved in the standardized Euclidean distance calculation. Finally, each sample's memberships over all classes are normalized so that $\sum_{i=1}^{c} \mu_{ik} = 1$. The total membership value $B_i$ of the elements in fuzzy class $c_i$ is expressed in Eq. (4).

$$B_i = \sum_{k=1}^{l_i} \mu_{ik} \qquad (4)$$
The fuzziness $N$ of the elements in all fuzzy classes is expressed in Eq. (5).

$$N = \sum_{i=1}^{c} B_i \qquad (5)$$
The weighted mean $v_i$ of the samples in class $i$ can be expressed as in Eq. (6).

$$v_i = \frac{\sum_{k=1}^{l_i} \mu_{ik}\, x_k}{\sum_{k=1}^{l_i} \mu_{ik}} \qquad (6)$$
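To make the membership construction concrete, the following is a minimal NumPy sketch of Eqs. (1)–(6) (the paper's own implementation is in MATLAB; all names here are illustrative, the plain Euclidean distance stands in for the standardized one, and the negative exponent is chosen so that nearer samples receive larger memberships):

```python
import numpy as np

def fuzzy_memberships(X, labels, m=2.0, eps=1e-6):
    """X: (l, d) samples; labels: (l,) integer class ids."""
    classes = np.unique(labels)
    U = np.zeros((len(classes), X.shape[0]))      # mu_ik of Eq. (1)
    for i, ci in enumerate(classes):
        x_bar = X[labels == ci].mean(axis=0)      # class mean
        dist = np.linalg.norm(X - x_bar, axis=1)  # distance to the class mean
        r = dist[labels == ci].max()              # class diameter, Eq. (2)
        # Eq. (3); eps guards both divisions (a zero distance would otherwise blow up):
        U[i] = ((dist + eps) / (r + eps)) ** (-2.0 / (m - 1.0))
    U /= U.sum(axis=0, keepdims=True)             # normalize: sum_i mu_ik = 1
    B = U.sum(axis=1)                             # total memberships, Eq. (4), summed over all samples
    N = B.sum()                                   # total fuzziness, Eq. (5)
    V = (U @ X) / B[:, None]                      # weighted class means, Eq. (6)
    return U, B, N, V
```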
By using these equations, the generalized class scattering matrix $S_W$ and the fuzziness scattering matrix $S_B$ are given in Eqs. (7) and (8), respectively.

$$S_W = \sum_{i=1}^{c} \frac{1}{2 B_i} \sum_{k=1}^{l_i} \sum_{j=1}^{l_i} \mu_{ik}\,\mu_{ij}\,(x_k - x_j)(x_k - x_j)^{T} \qquad (7)$$
where $c$ is the number of classes, $B_i$ is the total membership value, $\mu_{ik}$ and $\mu_{ij}$ are the memberships of samples $k$ and $j$ in class $i$, and $x_k$ and $x_j$ are the $k$th and $j$th samples that belong to class $i$.
$$S_B = \frac{1}{2N} \sum_{i=1}^{c} \sum_{j=1}^{c} B_i B_j\,(v_i - v_j)(v_i - v_j)^{T} \qquad (8)$$
where $c$ is the number of classes, $N$ is the sum of the fuzziness of the samples in all fuzzy classes, $B_i$ and $B_j$ are the total membership values of classes $i$ and $j$, and $v_i$ and $v_j$ are the weighted means of the samples in classes $i$ and $j$. The objective of $S_W$ is to minimize the distance between samples of the same class; it also incorporates the membership values, thus considering each sample's contribution to its class when preserving distances. The objective of $S_B$ is to maximize the distance between classes. Based on these equations, the transformation matrix $G_{\mathrm{FNPA}}$ can be found as the eigenvectors of Eq. (9).
$$G_{\mathrm{FNPA}} = \arg\max_{G}\; \operatorname{trace}\!\left( \frac{G^{T} S_B\, G}{G^{T} S_W\, G} \right) \qquad (9)$$
where the function aims at maximizing the distance between the samples of different classes while minimizing the distance between the samples of the same class. The projection of the data $X$ is obtained by multiplying the data by the transformation matrix, as given in Eq. (10).

$$X^{*} = X\, G_{\mathrm{FNPA}} \qquad (10)$$
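Under the same assumptions, a sketch of Eqs. (7)–(10) follows. It reuses fuzzy_memberships from the sketch above, expands the pairwise sums algebraically so they run in O(l·d²), and solves Eq. (9) as a generalized eigenvalue problem; the small ridge added to $S_W$ is a numerical-stability assumption, not something the paper specifies:

```python
import numpy as np
from scipy.linalg import eigh

def fnpa_transform(X, labels, r_dim, m=2.0, eps=1e-6):
    U, B, N, V = fuzzy_memberships(X, labels, m, eps)  # from the sketch above
    d = X.shape[1]
    # Eq. (7): sum_{k,j} w_k w_j (x_k - x_j)(x_k - x_j)^T equals
    # 2*(sum_j w_j)*sum_k w_k x_k x_k^T - 2*(sum_k w_k x_k)(sum_j w_j x_j)^T
    S_w = np.zeros((d, d))
    for i in range(U.shape[0]):
        w = U[i]
        u = w @ X                                      # weighted sum of samples
        S_w += (w[:, None] * X).T @ X - np.outer(u, u) / B[i]
    # Eq. (8), expanded the same way over the weighted class means v_i:
    t = B @ V
    S_b = (B[:, None] * V).T @ V - np.outer(t, t) / N
    # Eq. (9): columns of G are the top generalized eigenvectors of (S_b, S_w)
    evals, evecs = eigh(S_b, S_w + eps * np.eye(d))    # ridge keeps S_w positive definite
    G = evecs[:, np.argsort(evals)[::-1][:r_dim]]
    return X @ G, G                                    # Eq. (10): projected data
```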
3 Semi-supervised fuzzy neighborhood preserving analysis
The success of classification methods on high-dimensional data depends on increasing the discriminability of data from different classes. The principal component analysis (PCA) method, which has been widely used for dimension reduction in the classification of hyperspectral remote sensing images, significantly reduces the dimension of the data while also significantly improving classification accuracy. In this study, the feature projection ability of the fuzzy neighborhood preserving analysis method, which is a supervised method, has been improved with the superior feature projection ability of the principal component analysis method, which is an unsupervised method. The semi-supervised fuzzy neighborhood preserving analysis (SFNPA) method combines the unsupervised discriminability of PCA and the supervised discriminability of FNPA; the separation between the classes is increased while the dimension of the data is reduced. In SFNPA, the center $\bar{x}_i$ of class $i$ used in the FNPA method is calculated as the weighted mean in Eq. (11).

$$\bar{x}_i = \bar{x}_{\mathrm{TRAIN}_i}\, a + \bar{x}_{\mathrm{PCA}_i}\,(1 - a) \qquad (11)$$
where $\bar{x}_{\mathrm{TRAIN}_i}$ is the center of class $i$ obtained from the labeled data, $\bar{x}_{\mathrm{PCA}_i}$ is the center of class $i$ in the unlabeled data that were dimension-reduced with PCA and pre-classified via the SVM pre-classification map, and $a$ is a semi-supervision constant in the range [0, 1]. Class centers are computed as class means. In this study, as a result of experiments, $a$ was set to 0.9. By using Eq. (11), PCA-guided pre-classified unlabeled data can enhance FNPA's discriminability when labeled data are scarce. In this method, we focus on combining the advantages of PCA and FNPA's discriminability in the case of small labeled data. The pseudocode of the proposed method is listed in Table 1.
Table 1 Pseudocode of the proposed method

Algorithm: Semi-supervised fuzzy neighborhood preserving analysis for feature extraction in hyperspectral remote sensing images
Inputs: X: M×N×D-sized hyperspectral image; L: M×N label matrix of labeled data; Rdim: size of the reduced dimension; a: weight parameter for the class-center calculation
Output: X*: M×N×Rdim-sized features of data X
Step 1: Reduce the dimension of data X to Rdim by principal component analysis
Step 2: Classify the obtained data X_PCA according to label matrix L
Step 3: Calculate the class centers by Eq. (11) with the labeled data X_TRAIN and the pre-classified data X_PCA obtained in Step 2
Step 4: Analyze the data X by the FNPA method using the new class centers from Step 3
In Step 1, the dimension of the input data is reduced to Rdim by using principal component analysis. In Step 2, the dimension-reduced data X_PCA are classified according to the label matrix L. In Step 3, class centers are calculated by Eq. (11) with the labeled data X_TRAIN and the pre-classified data X_PCA obtained in Step 2. In Step 4, the calculated class centers are used to analyze the input data X by the FNPA method and obtain the transformation matrix. Finally, the dimension-reduced data X* are obtained from Eq. (10) by using the transformation matrix G_FNPA obtained from Eq. (9).
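As a rough illustration of Steps 1–3 (not the authors' code: the SVC pre-classifier settings, the choice to blend the centers in the original band space, the fallback for empty pre-classified classes, and all names are assumptions), the blended class centers of Eq. (11) could be computed as follows:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def sfnpa_centers(X, y_train, train_mask, r_dim, a=0.9):
    """X: (n_pixels, n_bands); train_mask marks the labeled pixels,
    y_train holds their labels."""
    X_pca = PCA(n_components=r_dim).fit_transform(X)          # Step 1
    y_pre = SVC().fit(X_pca[train_mask], y_train).predict(X_pca)  # Step 2
    centers = {}
    for ci in np.unique(y_train):                             # Step 3: Eq. (11)
        x_train_i = X[train_mask][y_train == ci].mean(axis=0)  # labeled center
        hits = X[y_pre == ci]                                  # pre-classified pixels
        x_pca_i = hits.mean(axis=0) if len(hits) else x_train_i
        centers[ci] = a * x_train_i + (1.0 - a) * x_pca_i
    return centers
```

With a = 0.9, as chosen in the experiments, the labeled centers dominate and the pre-classified pixels only nudge them.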
The flowchart of the proposed method for classifying hyperspectral remote sensing data is shown in Fig. 1.
At the beginning of the flowchart, the hyperspectral data are loaded. The data are then filtered with a 3-D Gaussian filter for denoising. Next, the proposed method is applied to obtain the dimension-reduced data. The new data are classified by using SVM and KELM. Finally, the classification results are used to obtain classification maps.
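The flow can be summarized with a short, hedged NumPy/SciPy sketch (the random stand-in cube, the filter width sigma = 1.0, and all helper names are assumptions, not values from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
cube = rng.random((145, 145, 200))                  # stand-in for a loaded hyperspectral cube
cube = gaussian_filter(cube, sigma=1.0)             # 3-D Gaussian denoising
X = cube.reshape(-1, cube.shape[-1])                # pixels as rows
X = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)  # reflectance normalized to [0, 1]
# ... project X with the SFNPA sketches above, then classify with SVM/KELM and
# reshape the predicted labels back to (145, 145) to draw the classification map.
```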
4 Datasets and experimental design
Experiments were conducted on eight popular hyperspectral remote sensing datasets to verify the success of the proposed method.
4.1 Indian Pines dataset
The Indian Pines hyperspectral image was taken by the AVIRIS sensor over the Indian Pines test site in northwestern Indiana. The image, which has a spatial size of 145 × 145 pixels, consists of 224 spectral bands in the 0.4–2.5 μm wavelength range. Sixteen classes are defined in the ground truth, as given in Table 2. In the dataset, which consists of 10,249 samples excluding the background, the total number of bands was reduced to 200 by deleting 24 spectral bands affected by water absorption [29].
4.2 PaviaU dataset
The PaviaU hyperspectral image was taken over the Pavia site in northern Italy by the ROSIS sensor. The image, with a spatial size of 610 × 340 pixels, consists of 103 spectral bands in the 0.43–0.86 μm wavelength range. Nine classes are defined in the ground truth, as given in Table 3. The dataset consists of 42,776 samples excluding the background, and the geometric resolution is 1.3 m [30].
4.3 Salinas dataset
The Salinas hyperspectral image was taken over the Salinas Valley in California by the AVIRIS sensor. The image has a spatial size of 512 × 217 pixels and consists of 224 spectral bands in the 0.4–2.5 μm wavelength range. Sixteen classes are defined in the ground truth, as given in Table 4. In the dataset, which consists of 54,129 samples excluding the background, 20 spectral bands that are problematic due to water absorption were deleted, and the total number of bands was reduced to 204 [30].

Fig. 1 Flowchart of semi-supervised fuzzy neighborhood preserving analysis for hyperspectral remote sensing images

Table 2 Ground-truth classes for the Indian Pines scene and their respective sample numbers

Class  Class name  Samples
1  Alfalfa  46
2  Corn-notill  1428
3  Corn-mintill  830
4  Corn  237
5  Grass-pasture  483
6  Grass-trees  730
7  Grass-pasture-mowed  28
8  Hay-windrowed  478
9  Oats  20
10  Soybean-notill  972
11  Soybean-mintill  2455
12  Soybean-clean  593
13  Wheat  205
14  Woods  1265
15  Buildings-Grass-Trees-Drives  386
16  Stone-Steel-Towers  93
Total  10,249
4.4 Kennedy Space Center dataset

The Kennedy Space Center (KSC) hyperspectral image was acquired by the NASA AVIRIS sensor over the Kennedy Space Center in Florida. The image has a spatial size of 512 × 614 pixels and consists of 224 spectral bands in the 0.4–2.5 μm wavelength range. Table 5 defines the 13 classes in the ground truth. In the dataset, which consists of 9130 samples excluding the background, the total number of bands was reduced to 176 by deleting 48 spectral bands that are problematic due to water absorption [31].
4.5 MUUFL Gulf Port
The MUUFL Gulf Port hyperspectral image was acquired by the ITRES CASI-1500 sensor over the campus of the University of Southern Mississippi-Gulfport in Long Beach, MS. The image has a spatial size of 325 × 220 pixels and consists of 64 spectral bands in the 375–1050 nm wavelength range [32]. Table 6 defines the 11 classes in the ground truth [33].
Table 3 Ground-truth classes for the Pavia University scene and their respective sample numbers
Class Class name Samples
1 Asphalt 6631
2 Meadows 18,649
3 Gravel 2099
4 Trees 3064
5 Painted metal sheets 1345
6 Bare Soil 5029
7 Bitumen 1330
8 Self-Blocking Bricks 3682
9 Shadows 947
Total 42,776
Table 4 Ground-truth classes for the Salinas scene and their respective sample numbers
Class Class name Samples
1 Brocoli Green Weeds_1 2009
2 Brocoli Green Weeds_2 3726
3 Fallow 1976
4 Fallow Rough Plow 1394
5 Fallow Smooth 2678
6 Stubble 3959
7 Celery 3579
8 Grapes Untrained 11,271
9 Soil Vinyard Develop 6203
10 Corn Senesced Green Weeds 3278
11 Lettuce Romaine 4wk 1068
12 Lettuce Romaine 5wk 1927
13 Lettuce Romaine 6wk 916
14 Lettuce Romaine 7wk 1070
15 Vinyard Untrained 7268
16 Vinyard Vertical Trellis 1807
Total 54,129
Table 5 Ground-truth classes for the KSC scene and their respective sample numbers

Class  Class name  Samples
1  Scrub  46
2  Willow Swamp  1428
3  Cabbage Palm Hammock  830
4  Cabbage Palm/Oak Hammock  830
5  Slash Pine  237
6  Oak/Broadleaf Hammock  483
7  Hardwood Swamp  730
8  Graminoid Marsh  28
9  Spartina Marsh  478
10  Cattail Marsh  20
11  Salt Marsh  972
12  Mud Flats  2455
13  Water  593
Total  9130
Table 6 Ground-truth classes for the MUUFL Gulf Port scene and their respective sample numbers
Class Class name Samples
1 Trees 23,246
2 Grass Pure 4270
3 Grass Groundsurface 6882
4 Dirt and Sand 1826
5  Road Materials  6687
6  Water  466
7  Shadow Building  2233
8  Buildings  6240
9  Sidewalk  1385
10  Yellowcurb  183
11  Cloth Panels  269
Total  71,500
4.6 Urban
Urban is one of the most widely used hyperspectral images in hyperspectral unmixing studies. There are 307 × 307 pixels, each of which corresponds to a 2 × 2 m² area. The image has 210 wavelengths ranging from 400 to 2500 nm, resulting in a spectral resolution of 10 nm. After channels 1–4, 76, 87, 101–111, 136–153 and 198–210 are removed (due to dense water vapor and atmospheric effects), 162 channels remain. (This is a common preprocessing step in hyperspectral unmixing analyses.) There are three versions of the ground truth, which contain 4, 5 and 6 endmembers, respectively [34–36]. Table 7 defines the six classes in the ground truth.
4.7 Samson
Samson is a sample dataset that is available from the Opticks project. The image has 952 × 952 pixels. Each pixel is recorded at 156 channels covering the wavelengths from 401 to 889 nm. The spectral resolution is as high as 3.13 nm. As the original image is too large, which is very expensive in terms of computational cost, a region of 95 × 95 pixels is used, starting from the (252, 332)th pixel of the original image. These data are not degraded by blank or badly noised channels [34–36]. Table 8 defines the three classes in the ground truth.
Table 7 Ground-truth classes for the Urban scene and their respective sample numbers

Class  Class name  Samples
1  Asphalt  18,570
2  Grass  35,198
3  Tree  22,468
4  Roof  6821
5  Metal  2436
6  Dirt  8756
Total  94,249
Table 8 Ground-truth classes for the Samson scene and their respective sample numbers
Class Class name Samples
1 Soil 3015
2 Tree 3666
3 Water 2344
Total 9025
Table 9 Ground-truth classes for the Jasper scene and their respective sample numbers
Class Class name Samples
1 Road 3493
2 Soil 3326
3 Water 2428
4 Tree 753
Total 10,000
Table 10 Overall classification accuracy (OA, %) and standard deviation (STD) on all datasets classified by the SVM classifier when a different number of samples is selected from each class for the training set

NC         Indian Pines  PaviaU  Salinas  KSC    MUUFL  Urban  Samson   Jasper
1    OA    50.61   51.99   78.57   77.06   42.67   47.90   57.37    55.15
     STD   –       –       –       –       –       –       –        –
2    OA    59.47   65.30   84.25   88.42   54.24   51.98   72.53    62.96
     STD   ±0.000  ±0.000  ±0.170  ±0.000  ±0.000  ±0.000  ±0.000   ±1.130
3    OA    69.26   68.42   87.61   93.87   60.20   56.47   77.00    69.33
     STD   ±0.001  ±0.000  ±0.142  ±0.499  ±0.555  ±0.194  ±24.000  ±0.496
4    OA    74.57   76.58   87.98   94.80   63.27   60.72   80.76    72.38
     STD   ±0.000  ±0.000  ±0.115  ±0.814  ±1.170  ±0.289  ±3.280   ±0.335
5    OA    78.20   77.76   89.65   96.14   66.28   61.68   75.93    71.04
     STD   ±0.000  ±0.002  ±0.087  ±0.963  ±0.874  ±0.165  ±3.290   ±1.370
10   OA    87.98   85.91   92.70   98.50   74.12   65.13   84.31    73.03
     STD   ±0.000  ±0.000  ±0.059  ±0.455  ±0.000  ±0.153  ±1.440   ±1.180
20   OA    93.25   92.66   95.98   99.65   76.59   65.65   83.92    75.48
     STD   ±0.000  ±0.000  ±0.032  ±0.326  ±0.244  ±0.774  ±0.805   ±1.220
30   OA    95.69   95.56   96.11   99.85   78.71   66.68   86.97    76.77
     STD   ±0.000  ±0.020  ±0.000  ±0.228  ±0.435  ±0.035  ±1.020   ±0.125
50   OA    97.64   96.96   98.17   99.94   81.41   68.05   87.59    79.41
     STD   ±0.000  ±0.256  ±0.017  ±0.184  ±0.429  ±0.240  ±0.525   ±0.589
100  OA    98.93   98.49   98.96   99.99   85.44   68.74   90.27    80.99
     STD   ±0.016  ±0.193  ±0.056  ±0.088  ±0.568  ±2.510  ±0.241   ±0.842
4.8 Jasper
Jasper Ridge is a popular hyperspectral image used in hyperspectral studies. It has 512 × 614 pixels. Each pixel is recorded at 224 channels ranging from 380 to 2500 nm. The spectral resolution is up to 9.46 nm. Since this hyperspectral image is too complex to obtain the ground truth, we consider a sub-image of 100 × 100 pixels.
Table 11 Overall classification accuracy (OA, %) and standard deviation (STD) on all datasets classified by the KELM classifier when a different number of samples is selected from each class for the training set

NC         Indian Pines  PaviaU  Salinas  KSC    MUUFL  Urban  Samson  Jasper
1    OA    50.61   51.99   78.57   77.06   47.32   47.90   57.37   57.37
     STD   –       –       –       –       –       –       –       –
2    OA    59.02   62.60   83.72   88.14   56.47   51.34   69.10   69.10
     STD   ±0.001  ±0.001  ±0.000  ±0.000  ±0.000  ±0.000  ±0.000  ±0.000
3    OA    67.95   65.56   87.33   93.78   63.07   55.88   74.61   74.61
     STD   ±0.000  ±0.001  ±0.001  ±0.000  ±0.001  ±0.000  ±0.000  ±0.000
4    OA    73.28   74.91   87.52   94.62   65.04   60.02   80.72   80.72
     STD   ±0.000  ±0.002  ±0.001  ±0.000  ±0.002  ±0.000  ±0.000  ±0.000
5    OA    77.35   76.67   89.34   95.99   69.24   61.42   73.54   73.54
     STD   ±0.001  ±0.001  ±0.002  ±0.000  ±0.001  ±0.001  ±0.000  ±0.000
10   OA    86.54   86.04   93.25   98.51   77.33   64.82   83.36   83.36
     STD   ±0.000  ±0.002  ±0.004  ±0.000  ±0.005  ±0.002  ±0.000  ±0.000
20   OA    92.49   93.03   95.89   99.68   79.24   67.70   83.26   83.26
     STD   ±0.000  ±0.002  ±0.002  ±0.000  ±0.005  ±0.004  ±0.000  ±0.000
30   OA    95.02   95.34   96.43   99.87   81.03   68.08   86.01   86.01
     STD   ±0.000  ±0.004  ±0.003  ±0.000  ±0.006  ±0.004  ±0.000  ±0.000
50   OA    97.21   97.31   98.15   99.97   83.41   70.28   86.60   86.60
     STD   ±0.000  ±0.004  ±0.002  ±0.000  ±0.008  ±0.006  ±0.000  ±0.000
100  OA    98.73   98.70   99.19   100.0   86.14   72.11   89.53   89.53
     STD   ±0.000  ±0.003  ±0.003  ±0.000  ±0.011  ±0.007  ±0.000  ±0.000
Fig. 2 Classification maps generated by SVM classifier on the Indian Pines dataset
The first pixel of the sub-image starts from the (105, 269)th pixel of the original image. After removing channels 1–3, 108–112, 154–166 and 220–224 due to dense water vapor and atmospheric effects, 198 channels remain [34–36]. Table 9 defines the four classes in the ground truth.
4.9 Experimental design
Experiments were performed on a laptop with 16 GB RAM and a 2.70 GHz i7-3740QM CPU, and all source code was written in MATLAB. The proposed feature extraction method is compared with three well-known methods: principal component analysis (PCA) [29], semi-supervised local Fisher discriminant analysis for dimensionality reduction (SELF) [30] and semi-supervised discriminant analysis (SDA) [31]. Support vector machines (SVMs) [32] and the kernel extreme learning machine (KELM) [33] were used as the classification methods. The SVM was used due to its capability to deal with high-dimensional data. Its flexibility, owing to the kernel function, allows alternative strategies for including spatial features in the classification process, such as feature fusion or composite kernels. KELM has been used in many studies for multispectral and hyperspectral remote sensing image classification. The results show that KELM is more accurate than, or similar to, SVMs in terms of classification accuracy and offers notably low computational cost.
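Since the paper does not spell KELM out, the following is a minimal sketch of a kernel extreme learning machine in its common formulation, with output weights from (I/C + K)⁻¹T; the RBF kernel, C, gamma and all names are assumptions rather than the settings used in the paper:

```python
import numpy as np

def kelm_train_predict(X_tr, y_tr, X_te, C=100.0, gamma=1.0):
    def rbf(A, B):  # RBF kernel matrix between two sample sets
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    classes = np.unique(y_tr)
    T = (y_tr[:, None] == classes[None, :]).astype(float)   # one-hot targets
    K = rbf(X_tr, X_tr)
    alpha = np.linalg.solve(np.eye(len(X_tr)) / C + K, T)   # output weights
    scores = rbf(X_te, X_tr) @ alpha                        # kernel expansion on test pixels
    return classes[scores.argmax(axis=1)]
```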
In the experiments, spectral reflectance values were normalized to [0, 1], and all parameters were determined by tenfold cross-validation (leave-one-out cross-validation for sample sizes lower than 10). The classification accuracies reported in this paper are given as the average of the classification accuracies over 10 trials. In the experiments, classification results are compared by classwise accuracy (provided as supplementary material), average accuracy, overall accuracy, the standard deviation over the tenfold cross-validation, and the kappa coefficient [37], a metric that compares an observed accuracy with an expected accuracy. The classification results are also validated by the Wilcoxon test [38].
Fig. 3 Classification maps generated by KELM classifier on the Indian Pines dataset
Table 12 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the Indian Pines dataset

NC   PCA     SELF    SDA     FNPA
1    0.014   0.0001  0.508   0.012
2    0.0001  0.065   0.0001  0.296
3    0.57    0.0001  0.862   0.0001
4    0.026   0.0001  0.033   0.0001
5    0.029   0.0001  0.251   0.001
10   0.028   0.001   0.026   0.0001
20   0.006   0.005   0.033   0.001
30   0.01    0.058   0.002   0.001
50   0.019   0.01    0.016   0.0001
100  0.012   0.196   0.003   0.003
Table 13 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the Indian Pines dataset

NC   PCA     SELF    SDA     FNPA
1    0.015   0.0001  0.433   0.021
2    0.0001  0.001   0.0001  0.507
3    0.009   0.0001  0.85    0.0001
4    0.0001  0.0001  0.019   0.0001
5    0.003   0.0001  0.042   0.001
10   0.001   0.001   0.018   0.0001
20   0.001   0.001   0.005   0.0001
30   0.001   0.001   0.002   0.003
50   0.014   0.007   0.052   0.002
100  0.055   0.003   0.017   0.014
Table 14 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the Indian Pines dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    42.65 – (0.37)         49.63 – (0.43)         50.61 – (0.44)         27.83 – (0.18)         50.45 – (0.45)
2    55.95 ±3.060 (0.53)    58.71 ±1.570 (0.54)    59.47 ±0.000 (0.55)    36.01 ±2.630 (0.29)    58.54 ±1.690 (0.54)
3    46.99 ±3.010 (0.42)    68.00 ±1.660 (0.64)    69.26 ±0.001 (0.66)    42.57 ±1.460 (0.37)    64.96 ±1.670 (0.61)
4    56.67 ±2.750 (0.52)    72.01 ±1.140 (0.69)    74.57 ±0.000 (0.71)    49.90 ±2.020 (0.43)    70.75 ±1.220 (0.67)
5    59.76 ±1.130 (0.56)    75.69 ±0.707 (0.73)    78.20 ±0.000 (0.75)    55.11 ±2.880 (0.49)    74.06 ±0.973 (0.71)
10   62.94 ±0.891 (0.59)    86.14 ±0.649 (0.84)    87.98 ±0.000 (0.86)    79.43 ±4.050 (0.76)    83.75 ±0.647 (0.82)
20   78.12 ±0.345 (0.76)    92.25 ±0.278 (0.91)    93.25 ±0.000 (0.92)    90.31 ±1.750 (0.89)    91.38 ±0.300 (0.90)
30   88.91 ±0.224 (0.87)    94.61 ±0.135 (0.94)    95.69 ±0.000 (0.95)    94.52 ±1.080 (0.94)    94.00 ±0.213 (0.93)
50   93.93 ±0.070 (0.93)    97.07 ±0.099 (0.97)    97.64 ±0.000 (0.97)    96.73 ±1.100 (0.96)    96.64 ±0.120 (0.96)
100  97.90 ±0.043 (0.98)    98.71 ±0.105 (0.99)    98.93 ±0.016 (0.99)    98.71 ±0.696 (0.99)    98.59 ±0.076 (0.98)
Table 15 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the Indian Pines dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    43.26 – (0.38)         49.64 – (0.43)         50.61 – (0.44)         27.84 – (0.18)         50.46 – (0.45)
2    58.63 ±0.001 (0.55)    57.57 ±0.003 (0.53)    59.02 ±0.001 (0.54)    28.74 ±0.000 (0.23)    56.27 ±0.000 (0.52)
3    48.36 ±0.006 (0.43)    66.21 ±0.001 (0.62)    67.95 ±0.000 (0.64)    39.15 ±0.003 (0.34)    62.34 ±0.001 (0.58)
4    57.20 ±0.004 (0.53)    71.50 ±0.001 (0.68)    73.28 ±0.000 (0.70)    44.27 ±0.005 (0.40)    67.05 ±0.002 (0.63)
5    60.72 ±0.004 (0.57)    75.32 ±0.002 (0.72)    77.35 ±0.001 (0.74)    50.04 ±0.005 (0.46)    71.38 ±0.003 (0.68)
10   63.34 ±0.002 (0.59)    84.89 ±0.002 (0.83)    86.54 ±0.000 (0.85)    73.41 ±0.006 (0.70)    80.99 ±0.003 (0.79)
20   78.55 ±0.010 (0.76)    91.44 ±0.002 (0.90)    92.49 ±0.000 (0.91)    87.81 ±0.005 (0.86)    89.39 ±0.003 (0.88)
30   89.54 ±0.006 (0.88)    94.11 ±0.001 (0.93)    95.02 ±0.000 (0.94)    90.89 ±0.004 (0.90)    92.91 ±0.003 (0.92)
50   94.55 ±0.002 (0.94)    96.74 ±0.001 (0.96)    97.21 ±0.000 (0.97)    94.58 ±0.002 (0.94)    95.97 ±0.003 (0.95)
100  98.16 ±0.001 (0.98)    98.53 ±0.001 (0.98)    98.73 ±0.000 (0.99)    97.51 ±0.001 (0.97)    98.16 ±0.002 (0.98)
The Wilcoxon test is a nonparametric statistical hypothesis test, which bases the hypothesis decision on the p value and determines whether the differences between the classification results of two methods are statistically significant. A p value smaller than 0.05 indicates that the difference between classification accuracies is statistically significant with 95% confidence. The results of the Wilcoxon test are shown in the experimental results, including comparisons between the proposed method and the other methods; a statistical difference (p value < 0.05) in the results indicates that the improved performance of the proposed method is statistically significant.
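As a concrete, hedged illustration of how such a comparison is computed (the per-trial accuracies below are made-up numbers for the example, not values from the tables):

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical overall accuracies of two methods over the same 10 trials:
acc_sfnpa = np.array([87.9, 88.1, 88.0, 87.8, 88.2, 88.0, 87.9, 88.1, 88.0, 87.9])
acc_pca = np.array([86.1, 86.0, 86.3, 86.2, 85.9, 86.1, 86.2, 86.0, 86.1, 86.2])

stat, p = wilcoxon(acc_sfnpa, acc_pca)  # paired nonparametric test
print(p < 0.05)  # True -> the accuracy difference is significant at 95% confidence

# One common way to obtain the kappa coefficient of a classification map is
# sklearn.metrics.cohen_kappa_score(y_true, y_pred), which compares the observed
# agreement with the agreement expected by chance.
```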
5 Experimental results
Experimental results for all datasets are listed in Tables
10
and
11
obtained by SVM and KELM, respectively. NC is
the number of samples in training set for each class where
total sample size is lower than NC; all samples are used for
that class.
5.1 Indian Pines dataset
For the Indian Pines [29] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.
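This per-class sampling protocol, used for every dataset below, can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def split_per_class(y, nc, rng=np.random.default_rng(0)):
    """Pick min(nc, class size) random training samples per class;
    all remaining samples form the test set."""
    train = []
    for ci in np.unique(y):
        idx = np.flatnonzero(y == ci)
        train.extend(rng.choice(idx, size=min(nc, idx.size), replace=False))
    train = np.asarray(train)
    test = np.setdiff1d(np.arange(y.size), train)
    return train, test
```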
Classwise classification accuracy on the Indian Pines dataset when 10 samples are selected from each class for the training set is given in Online Resource 1. As shown in Online Resource 1, the classwise discriminability of SFNPA is better than that of the other methods.
Figure 2 shows the classification maps generated by the SVM classifier on the Indian Pines dataset when 10 samples are selected from each class for the training set.
Fig. 4 Classification maps generated by SVM classifier on the PaviaU dataset
Figure 3 shows the classification maps generated by the KELM classifier on the Indian Pines dataset when 10 samples are selected from each class for the training set.
The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 12. The statistical differences (p value < 0.05) in Table 12 indicate that the improved performance of the proposed method is statistically significant.
The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 13. The statistical differences (p value < 0.05) in Table 13 indicate that the improved performance of the proposed method is statistically significant.
Table 14 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set. As listed in Table 14, the classwise discriminability of SFNPA is better than that of the other methods.
Table 15 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.
As shown in the results for the Indian Pines dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.2 PaviaU dataset
For the PaviaU [30] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.

Classwise classification accuracy on the PaviaU dataset when 10 samples are selected from each class for the training set is given in Online Resource 2. As shown in Online Resource 2, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 4 shows the classification maps generated by the SVM classifier on the PaviaU dataset when 10 samples are selected from each class for the training set.

Figure 5 shows the classification maps generated by the KELM classifier on the PaviaU dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 16. The statistical differences (p value < 0.05) in Table 16 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 17. The statistical differences (p value < 0.05) in Table 17 indicate that the improved performance of the proposed method is statistically significant.

Table 18 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set. As listed in Table 18, the classwise discriminability of SFNPA is better than that of the other methods.
Table 16 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the PaviaU dataset

NC   PCA    SELF   SDA    FNPA
1    0.009  0.005  0.724  0.078
2    0.055  0.004  0.17   0.003
3    0.043  0.005  0.081  0.002
4    0.069  0.005  0.485  0.002
5    0.025  0.003  0.889  0.002
10   0.108  0.005  0.295  0.002
20   0.043  0.004  0.224  0.002
30   0.014  0.008  0.093  0.002
50   0.078  0.002  0.017  0.003
100  0.021  0.012  0.021  0.004
Table 17 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the PaviaU dataset

NC   PCA    SELF   SDA    FNPA
1    0.017  0.005  0.625  0.162
2    0.021  0.005  0.021  0.002
3    0.012  0.004  0.014  0.002
4    0.006  0.004  0.006  0.002
5    0.002  0.002  0.005  0.002
10   0.014  0.004  0.017  0.002
20   0.021  0.002  0.01   0.002
30   0.05   0.002  0.017  0.002
50   0.108  0.002  0.008  0.007
100  0.235  0.014  0.03   0.003
Table 18 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the PaviaU dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    47.89 – (0.38)         49.64 – (0.41)         51.99 – (0.43)         32.88 – (0.23)         53.34 – (0.44)
2    43.69 ±0.000 (0.34)    65.03 ±0.000 (0.57)    65.30 ±0.000 (0.58)    49.35 ±0.000 (0.38)    63.47 ±0.000 (0.55)
3    38.71 ±1.420 (0.29)    68.86 ±0.919 (0.61)    68.42 ±0.000 (0.61)    52.88 ±1.970 (0.43)    66.80 ±0.943 (0.59)
4    52.25 ±0.119 (0.42)    76.79 ±1.130 (0.71)    76.58 ±0.000 (0.71)    61.25 ±1.510 (0.52)    74.68 ±1.870 (0.68)
5    45.58 ±0.672 (0.36)    76.53 ±1.010 (0.71)    77.76 ±0.002 (0.72)    65.30 ±1.360 (0.57)    77.50 ±1.530 (0.71)
10   42.45 ±0.582 (0.32)    84.23 ±0.598 (0.80)    85.91 ±0.000 (0.82)    77.62 ±1.220 (0.72)    83.63 ±1.330 (0.79)
20   80.15 ±0.373 (0.74)    91.42 ±0.591 (0.89)    92.66 ±0.000 (0.90)    89.78 ±0.726 (0.87)    90.58 ±0.387 (0.88)
30   87.51 ±0.276 (0.84)    94.89 ±0.334 (0.93)    95.56 ±0.020 (0.94)    93.60 ±0.469 (0.92)    94.02 ±0.486 (0.92)
50   91.51 ±0.247 (0.89)    96.11 ±0.354 (0.95)    96.96 ±0.256 (0.96)    94.55 ±0.230 (0.93)    95.92 ±0.268 (0.95)
100  94.73 ±0.061 (0.93)    98.13 ±0.162 (0.98)    98.49 ±0.193 (0.98)    97.91 ±0.131 (0.97)    97.95 ±0.150 (0.97)
Table 19 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the PaviaU dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.3 Salinas dataset
For the Salinas [30] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.

Classwise classification accuracy on the Salinas dataset when 10 samples are selected from each class for the training set is given in Online Resource 3. As shown in Online Resource 3, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 6 shows the classification maps generated by the SVM classifier on the Salinas dataset when 10 samples are selected from each class for the training set.

Figure 7 shows the classification maps generated by the KELM classifier on the Salinas dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 20. The statistical differences (p value < 0.05) in Table 20 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 21. The statistical differences (p value < 0.05) in Table 21 indicate that the improved performance of the proposed method is statistically significant.

Table 22 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 23 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the Salinas dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.4 Kennedy Space Center dataset

For the Kennedy Space Center (KSC) [31] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.

Classwise classification accuracy on the KSC dataset when 10 samples are selected from each class for the training set is given in Online Resource 4. As shown in Online Resource 4, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 8 shows the classification maps generated by the SVM classifier on the KSC dataset when 10 samples are selected from each class for the training set.
Table 19 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the PaviaU dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    49.80 – (0.40)         49.78 – (0.41)         51.99 – (0.43)         32.88 – (0.23)         53.67 – (0.45)
2    44.63 ±0.015 (0.35)    59.51 ±0.004 (0.51)    62.60 ±0.001 (0.55)    37.68 ±0.002 (0.28)    55.86 ±0.003 (0.47)
3    41.02 ±0.012 (0.30)    62.86 ±0.005 (0.55)    65.56 ±0.001 (0.58)    42.87 ±0.002 (0.33)    59.12 ±0.005 (0.51)
4    53.45 ±0.012 (0.43)    72.11 ±0.011 (0.65)    74.91 ±0.002 (0.69)    44.87 ±0.002 (0.35)    66.49 ±0.010 (0.58)
5    49.22 ±0.017 (0.39)    74.70 ±0.011 (0.69)    76.67 ±0.001 (0.71)    50.96 ±0.001 (0.42)    71.38 ±0.008 (0.64)
10   44.05 ±0.038 (0.32)    83.77 ±0.017 (0.79)    86.04 ±0.002 (0.82)    63.99 ±0.006 (0.56)    77.73 ±0.012 (0.72)
20   81.70 ±0.058 (0.76)    91.51 ±0.016 (0.89)    93.03 ±0.002 (0.91)    80.93 ±0.012 (0.76)    87.50 ±0.013 (0.84)
30   88.06 ±0.037 (0.85)    94.12 ±0.015 (0.92)    95.34 ±0.004 (0.94)    89.19 ±0.018 (0.86)    91.88 ±0.012 (0.89)
50   92.24 ±0.022 (0.90)    96.42 ±0.015 (0.95)    97.31 ±0.004 (0.96)    92.91 ±0.024 (0.91)    94.43 ±0.017 (0.93)
100  95.24 ±0.015 (0.94)    98.36 ±0.008 (0.98)    98.70 ±0.003 (0.98)    96.68 ±0.014 (0.96)    97.50 ±0.009 (0.97)
Figure 9 shows the classification maps generated by the KELM classifier on the KSC dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 24. The statistical differences (p value < 0.05) in Table 24 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 25. The statistical differences (p value < 0.05) in Table 25 indicate that the improved performance of the proposed method is statistically significant.
Fig. 6 Classification maps generated by SVM classifier on the Salinas Valley dataset
Table 26 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 27 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the KSC dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.5 MUUFL Gulf Port
For the MUUFL Gulf Port [32, 33] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.

Fig. 7 Classification maps generated by KELM classifier on the Salinas Valley dataset
Classwise classification accuracy on the MUUFL Gulf Port dataset when 10 samples are selected from each class for the training set is given in Online Resource 5. As shown in Online Resource 5, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 10 shows the classification maps generated by the SVM classifier on the MUUFL Gulf Port dataset when 10 samples are selected from each class for the training set.

Figure 11 shows the classification maps generated by the KELM classifier on the MUUFL Gulf Port dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 28. The statistical differences (p value < 0.05) in Table 28 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 29. The statistical differences (p value < 0.05) in Table 29 indicate that the improved performance of the proposed method is statistically significant.

Table 30 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 31 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the MUUFL Gulf Port dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
Table 20 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the Salinas dataset

NC   PCA     SELF    SDA     FNPA
1    0.001   0.0001  0.121   0.008
2    0.003   0.0001  0.856   0.0001
3    0.0001  0.0001  0.08    0.0001
4    0.0001  0.0001  0.235   0.0001
5    0.003   0.0001  0.837   0.0001
10   0.0001  0.001   0.457   0.0001
20   0.0001  0.001   0.0001  0.0001
30   0.177   0.004   0.324   0.38
50   0.033   0.052   0.012   0.115
100  0.889   0.001   0.003   0.017
Table 21 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the Salinas dataset

NC   PCA     SELF    SDA    FNPA
1    0.001   0.0001  0.121  0.007
2    0.0001  0.0001  0.926  0.0001
3    0.0001  0.0001  0.14   0.0001
4    0.001   0.0001  0.507  0.0001
5    0.0001  0.0001  0.481  0.0001
10   0.0001  0.0001  0.365  0.0001
20   0.0001  0.0001  0.237  0.001
30   0.0001  0.001   0.007  0.021
50   0.001   0.001   0.007  0.258
100  0.031   0.0001  0.001  0.262
Table 22 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the Salinas dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                    SDA
1    68.08 – (0.65)         76.27 – (0.74)         78.57 – (0.76)         37.45 – (0.28)          80.76 – (0.79)
2    74.66 ±0.630 (0.72)    83.01 ±0.539 (0.81)    84.25 ±0.170 (0.83)    51.37 ±5.840 (0.46)     86.23 ±0.742 (0.85)
3    77.76 ±0.577 (0.75)    86.42 ±0.567 (0.85)    87.61 ±0.142 (0.86)    56.79 ±1.520 (0.52)     90.19 ±0.593 (0.89)
4    83.95 ±0.384 (0.82)    87.54 ±0.176 (0.86)    87.98 ±0.115 (0.87)    61.90 ±3.330 (0.58)     89.25 ±0.678 (0.88)
5    84.02 ±0.495 (0.82)    89.01 ±0.478 (0.88)    89.65 ±0.087 (0.88)    64.88 ±3.320 (0.61)     90.09 ±0.458 (0.89)
10   82.92 ±0.428 (0.81)    91.77 ±0.392 (0.91)    92.70 ±0.059 (0.92)    82.48 ±13.600 (0.81)    92.58 ±0.420 (0.92)
20   93.26 ±0.107 (0.92)    94.96 ±0.211 (0.94)    95.98 ±0.032 (0.96)    94.20 ±7.070 (0.94)     95.37 ±0.118 (0.95)
30   94.47 ±0.155 (0.94)    95.38 ±0.164 (0.95)    96.11 ±0.000 (0.96)    95.87 ±0.992 (0.95)     96.43 ±0.190 (0.96)
50   95.30 ±0.120 (0.95)    97.34 ±0.116 (0.97)    98.17 ±0.017 (0.98)    97.40 ±0.329 (0.97)     97.66 ±0.101 (0.97)
100  96.65 ±0.505 (0.96)    98.53 ±0.053 (0.98)    98.96 ±0.056 (0.99)    98.61 ±0.107 (0.98)     98.67 ±0.056 (0.99)
5.6 Urban
For the Urban [34–36] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.
Classwise classification accuracy on the Urban dataset when 10 samples are selected from each class for the training set is given in Online Resource 6. As shown in Online Resource 6, the classwise discriminability of SFNPA is better than that of the other methods.

Figure 12 shows the classification maps generated by the SVM classifier on the Urban dataset when 10 samples are selected from each class for the training set.

Figure 13 shows the classification maps generated by the KELM classifier on the Urban dataset when 10 samples are selected from each class for the training set.
The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 32. The statistical differences (p value < 0.05) in Table 32 indicate that the improved performance of the proposed method is statistically significant.
Table 23 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the Salinas dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    68.44 – (0.65)         76.28 – (0.74)         78.57 – (0.76)         37.45 – (0.28)         80.77 – (0.79)
2    75.77 ±0.010 (0.73)    82.33 ±0.001 (0.80)    83.72 ±0.000 (0.82)    42.23 ±0.000 (0.35)    85.68 ±0.004 (0.84)
3    78.77 ±0.004 (0.77)    86.05 ±0.003 (0.85)    87.33 ±0.001 (0.86)    43.37 ±0.002 (0.36)    89.79 ±0.007 (0.89)
4    84.55 ±0.006 (0.83)    86.60 ±0.003 (0.85)    87.52 ±0.001 (0.86)    47.64 ±0.002 (0.41)    88.75 ±0.003 (0.88)
5    84.89 ±0.007 (0.83)    88.19 ±0.005 (0.87)    89.34 ±0.002 (0.88)    50.06 ±0.003 (0.44)    91.30 ±0.008 (0.90)
10   84.69 ±0.007 (0.83)    92.70 ±0.013 (0.92)    93.25 ±0.004 (0.93)    69.80 ±0.003 (0.66)    92.93 ±0.010 (0.92)
20   93.59 ±0.002 (0.93)    95.11 ±0.005 (0.95)    95.89 ±0.002 (0.95)    91.79 ±0.001 (0.91)    95.89 ±0.005 (0.95)
30   95.00 ±0.004 (0.94)    95.94 ±0.007 (0.95)    96.43 ±0.003 (0.96)    95.88 ±0.000 (0.95)    96.86 ±0.008 (0.97)
50   96.06 ±0.002 (0.96)    97.59 ±0.005 (0.97)    98.15 ±0.002 (0.98)    97.56 ±0.000 (0.97)    98.12 ±0.007 (0.98)
100  97.29 ±0.003 (0.97)    98.89 ±0.004 (0.99)    99.19 ±0.003 (0.99)    98.36 ±0.003 (0.98)    99.03 ±0.005 (0.99)
Fig. 8 Classification maps generated by SVM classifier on the KSC dataset
Fig. 9 Classification maps generated by KELM classifier on the KSC dataset
Table 24 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the KSC dataset

NC   PCA    SELF    SDA    FNPA
1    0.059  0.001   0.629  0.001
2    0.16   0.0001  0.182  0.0001
3    0.776  0.0001  0.08   0.0001
4    0.093  0.0001  0.453  0.0001
5    0.033  0.001   0.887  0.0001
10   0.001  0.005   0.066  0.0001
20   0.006  0.001   0.727  0.0001
30   0.009  0.002   0.005  0.001
50   0.005  0.031   0.029  0.001
100  0.014  0.044   0.014  0.083
Table 25 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the KSC dataset

NC   PCA    SELF    SDA    FNPA
1    0.298  0.001   0.449  0.001
2    0.002  0.0001  0.156  0.0001
3    0.001  0.0001  0.142  0.0001
4    0.001  0.0001  0.856  0.0001
5    0.004  0.001   0.079  0.001
10   0.003  0.001   0.66   0.001
20   0.021  0.001   0.078  0.001
30   0.007  0.069   0.76   0.001
50   0.03   0.155   0.107  0.001
100  0.093  0.205   0.093  0.262
The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 33. The statistical differences (p value < 0.05) in Table 33 indicate that the improved performance of the proposed method is statistically significant.

Table 34 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 35 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the Urban dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.7 Samson
For the Samson [34–36] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.
Table 26 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the KSC dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    57.87 – (0.53)         76.61 – (0.74)         77.06 – (0.74)         47.26 – (0.41)         78.18 – (0.76)
2    60.63 ±0.000 (0.57)    87.93 ±0.000 (0.87)    88.42 ±0.000 (0.87)    65.22 ±0.000 (0.61)    87.49 ±0.000 (0.86)
3    62.33 ±1.220 (0.58)    92.53 ±1.600 (0.92)    93.87 ±0.499 (0.93)    74.54 ±1.670 (0.72)    90.51 ±0.657 (0.89)
4    66.99 ±1.380 (0.64)    93.72 ±1.570 (0.93)    94.80 ±0.814 (0.94)    80.60 ±2.200 (0.78)    92.27 ±1.200 (0.91)
5    71.19 ±0.266 (0.68)    95.52 ±0.650 (0.95)    96.14 ±0.963 (0.96)    86.03 ±2.950 (0.84)    93.23 ±0.205 (0.92)
10   77.18 ±0.455 (0.75)    97.70 ±0.370 (0.97)    98.50 ±0.455 (0.98)    93.47 ±1.340 (0.93)    97.37 ±0.286 (0.97)
20   90.22 ±0.245 (0.89)    99.26 ±0.228 (0.99)    99.65 ±0.326 (1.00)    98.01 ±0.222 (0.98)    99.13 ±0.293 (0.99)
30   95.77 ±0.221 (0.95)    99.54 ±0.154 (0.99)    99.85 ±0.228 (1.00)    98.87 ±0.091 (0.99)    99.52 ±0.094 (0.99)
50   99.11 ±0.161 (0.99)    99.74 ±0.125 (1.00)    99.94 ±0.184 (1.00)    99.63 ±0.062 (1.00)    99.85 ±0.113 (1.00)
100  99.93 ±0.073 (1.00)    99.98 ±0.050 (1.00)    99.99 ±0.088 (1.00)    99.91 ±0.056 (1.00)    99.98 ±0.041 (1.00)
Table 27 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the KSC dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    59.65 – (0.55)         76.68 – (0.74)         77.06 – (0.74)         47.28 – (0.41)         78.90 – (0.77)
2    61.75 ±0.031 (0.58)    87.15 ±0.005 (0.86)    88.14 ±0.000 (0.87)    58.73 ±0.001 (0.54)    87.22 ±0.005 (0.86)
3    62.49 ±0.034 (0.58)    92.72 ±0.004 (0.92)    93.78 ±0.000 (0.93)    65.22 ±0.001 (0.62)    90.86 ±0.007 (0.90)
4    68.87 ±0.033 (0.66)    93.63 ±0.010 (0.93)    94.62 ±0.000 (0.94)    73.78 ±0.005 (0.71)    92.44 ±0.012 (0.92)
5    72.79 ±0.041 (0.70)    95.14 ±0.010 (0.95)    95.99 ±0.000 (0.96)    80.43 ±0.004 (0.78)    92.67 ±0.008 (0.92)
10   77.39 ±0.017 (0.75)    98.10 ±0.010 (0.98)    98.51 ±0.000 (0.98)    93.00 ±0.007 (0.92)    97.54 ±0.010 (0.97)
20   90.30 ±0.051 (0.89)    99.49 ±0.004 (0.99)    99.68 ±0.000 (1.00)    98.97 ±0.014 (0.99)    99.29 ±0.005 (0.99)
30   95.92 ±0.013 (0.95)    99.73 ±0.003 (1.00)    99.87 ±0.000 (1.00)    99.20 ±0.009 (0.99)    99.83 ±0.003 (1.00)
50   99.19 ±0.003 (0.99)    99.93 ±0.002 (1.00)    99.97 ±0.000 (1.00)    99.87 ±0.003 (1.00)    99.96 ±0.001 (1.00)
100  99.95 ±0.000 (1.00)    99.99 ±0.000 (1.00)    100.0 ±0.000 (1.00)    99.99 ±0.000 (1.00)    100.0 ±0.000 (1.00)
Classwise classification accuracy on the Samson dataset when 10 samples are selected from each class for the training set is given in Online Resource 7. As shown in Online Resource 7, the classwise discriminability of SFNPA is better than that of the other methods.
Figure 14 shows the classification maps generated by the SVM classifier on the Samson dataset when 10 samples are selected from each class for the training set.

Figure 15 shows the classification maps generated by the KELM classifier on the Samson dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 36. The statistical differences (p value < 0.05) in Table 36 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 37. The statistical differences (p value < 0.05) in Table 37 indicate that the improved performance of the proposed method is statistically significant.

Table 38 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.

Table 39 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the Samson dataset, the proposed method provides superior performance to the FNPA, PCA, SELF and SDA methods. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
5.8 Jasper
For the Jasper [34–36] dataset, the learning sets consist of 1, 2, 3, 4, 5, 10, 20, 30, 50 and 100 samples randomly selected from each class, and the remaining data constitute the test set.
Fig. 11 Classification maps generated by KELM classifier on the MUUFL Gulf Port dataset

Table 28 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the MUUFL Gulf Port dataset

NC   PCA    SELF   SDA    FNPA
1    0.002  0.004  0.01   0.021
2    0.001  0.003  0.028  0.003
3    0.001  0.001  0.018  0.001
4    0.002  0.001  0.057  0.001
5    0.001  0.001  0.293  0.001
10   0.001  0.006  0.083  0.001
20   0.001  0.057  0.798  0.001
30   0.001  0.003  0.164  0.001
50   0.002  0.001  0.551  0.003
100  0.001  0.057  0.094  0.001
Table 29 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the MUUFL Gulf Port dataset

NC   PCA    SELF   SDA    FNPA
1    0.002  0.002  0.016  0.41
2    0.001  0.001  0.011  0.029
3    0.001  0.002  0.003  0.001
4    0.001  0.001  0.025  0.001
5    0.001  0.001  0.01   0.002
10   0.001  0.003  0.005  0.006
20   0.001  0.201  0.029  0.106
30   0.002  0.029  0.013  0.065
50   0.001  0.01   0.038  0.033
100  0.003  0.045  0.029  0.005
Classwise classification accuracy on the Jasper dataset when 10 samples are selected from each class for the training set is given in Online Resource 8. As shown in Online Resource 8, the classwise discriminability of SFNPA is better than that of the other methods.
Figure 16 shows the classification maps generated by the SVM classifier on the Jasper dataset when 10 samples are selected from each class for the training set.

Figure 17 shows the classification maps generated by the KELM classifier on the Jasper dataset when 10 samples are selected from each class for the training set.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the SVM classifier, are listed in Table 40. The statistical differences (p value < 0.05) in Table 40 indicate that the improved performance of the proposed method is statistically significant.

The results of the Wilcoxon test, including comparisons between the proposed method and the other methods for the KELM classifier, are listed in Table 41. The statistical differences (p value < 0.05) in Table 41 indicate that the improved performance of the proposed method is statistically significant.

Table 42 shows the classification accuracy of the SVM classifier when a different number of samples is selected from each class for the training set.
Table 30 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the MUUFL Gulf Port dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    31.22 – (0.22)         23.39 – (0.12)         42.67 – (0.34)         13.64 – (0.06)         27.52 – (0.19)
2    31.83 ±0.000 (0.22)    31.07 ±0.000 (0.20)    54.24 ±0.000 (0.44)    23.33 ±0.000 (0.13)    40.81 ±0.000 (0.30)
3    35.93 ±1.230 (0.26)    40.21 ±1.160 (0.29)    60.20 ±0.555 (0.51)    31.49 ±1.510 (0.20)    52.14 ±2.440 (0.42)
4    30.48 ±1.580 (0.21)    37.04 ±2.060 (0.28)    63.27 ±1.170 (0.55)    34.70 ±2.070 (0.24)    48.85 ±1.370 (0.40)
5    37.73 ±1.000 (0.26)    49.39 ±1.840 (0.39)    66.28 ±0.874 (0.58)    45.36 ±1.380 (0.33)    62.30 ±1.200 (0.53)
10   58.42 ±1.070 (0.49)    57.78 ±0.985 (0.48)    74.12 ±0.000 (0.67)    61.01 ±1.050 (0.52)    69.87 ±0.948 (0.62)
20   68.04 ±0.819 (0.60)    65.93 ±0.630 (0.58)    76.59 ±0.244 (0.70)    73.16 ±0.646 (0.66)    74.39 ±0.695 (0.68)
30   73.16 ±0.713 (0.66)    71.67 ±0.931 (0.65)    78.71 ±0.435 (0.73)    76.57 ±0.613 (0.70)    78.49 ±0.514 (0.73)
50   77.47 ±0.380 (0.71)    76.42 ±0.456 (0.70)    81.41 ±0.429 (0.76)    78.57 ±0.317 (0.73)    80.43 ±0.557 (0.75)
100  80.63 ±0.359 (0.75)    83.21 ±0.304 (0.79)    85.44 ±0.568 (0.81)    84.97 ±0.614 (0.81)    85.53 ±0.388 (0.81)
Table 31 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the KELM classifier on the MUUFL Gulf Port dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    36.76 – (0.26)         23.49 – (0.13)         47.32 – (0.37)         13.64 – (0.06)         27.53 – (0.19)
2    35.91 ±0.027 (0.25)    23.76 ±0.001 (0.18)    56.47 ±0.000 (0.46)    20.04 ±0.000 (0.10)    38.49 ±0.023 (0.28)
3    39.85 ±0.018 (0.29)    38.05 ±0.003 (0.26)    63.07 ±0.001 (0.54)    24.11 ±0.004 (0.14)    49.81 ±0.029 (0.40)
4    32.04 ±0.009 (0.23)    33.18 ±0.006 (0.24)    65.04 ±0.002 (0.57)    24.98 ±0.006 (0.15)    44.81 ±0.032 (0.36)
5    41.19 ±0.026 (0.28)    42.50 ±0.007 (0.32)    69.24 ±0.001 (0.62)    27.98 ±0.007 (0.19)    57.49 ±0.025 (0.48)
10   58.86 ±0.028 (0.49)    47.81 ±0.017 (0.39)    77.33 ±0.005 (0.71)    48.25 ±0.015 (0.39)    64.02 ±0.038 (0.56)
20   70.53 ±0.039 (0.63)    60.49 ±0.014 (0.52)    79.24 ±0.005 (0.74)    72.14 ±0.015 (0.65)    72.60 ±0.035 (0.66)
30   75.46 ±0.039 (0.69)    67.58 ±0.015 (0.60)    81.03 ±0.006 (0.76)    74.28 ±0.022 (0.68)    77.58 ±0.030 (0.72)
50   78.39 ±0.033 (0.73)    73.65 ±0.018 (0.67)    83.41 ±0.008 (0.79)    77.59 ±0.024 (0.72)    79.56 ±0.031 (0.74)
100  82.52 ±0.037 (0.78)    81.28 ±0.018 (0.76)    86.14 ±0.011 (0.82)    84.09 ±0.024 (0.80)    85.25 ±0.031 (0.81)
Table 43 shows the classification accuracy of the KELM classifier when a different number of samples is selected from each class for the training set.

As shown in the results for the Jasper dataset, the proposed method provides nearly superior performance to the FNPA, PCA, SELF and SDA methods when the training set has more than 10 samples for each class. Because the dataset does not have ground-truth data, the class with the highest membership value in the unmixing data is considered as the ground truth. Furthermore, the proposed method preserves neighborhood information and effectively discriminates small structures and edges.
Fig. 12 Classification maps generated by SVM classifier on the Urban dataset
Fig. 13 Classification maps generated by KELM classifier on the Urban dataset
Table 32 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the SVM classifier on the Urban dataset

NC   PCA    SELF   SDA    FNPA
1    0.343  0.009  1.000  1.000
2    0.236  0.009  0.813  0.155
3    0.541  0.006  0.041  0.006
4    0.541  0.011  0.067  0.006
5    0.041  0.011  0.014  0.006
10   0.053  0.011  0.032  0.006
20   0.019  0.011  0.011  0.006
30   0.011  0.011  0.025  0.006
50   0.006  0.032  0.019  0.006
100  0.025  0.262  0.032  0.185
Table 33 p values obtained through Wilcoxon tests between the proposed method and the other competing methods for the KELM classifier on the Urban dataset

NC   PCA    SELF   SDA    FNPA
1    0.154  0.008  0.76   0.838
2    0.476  0.006  0.221  0.359
3    0.476  0.006  0.053  0.008
4    0.919  0.008  0.083  0.008
5    0.308  0.008  0.019  0.008
10   0.359  0.006  0.053  0.008
20   0.610  0.008  0.683  0.008
30   0.415  0.008  0.76   0.008
50   0.359  0.008  0.185  0.008
100  0.541  0.006  0.359  0.025
Table 34 Overall classification accuracy (OA, %), standard deviation (STD) and kappa values of the SVM classifier on the Urban dataset when a different number of samples is selected from each class for the training set; cells give OA ±STD (kappa)

NC   FNPA                   PCA                    SFNPA                  SELF                   SDA
1    41.89 – (0.28)         47.74 – (0.33)         47.90 – (0.33)         23.38 – (0.07)         44.73 – (0.33)
2    44.98 ±0.000 (0.33)    54.77 ±0.000 (0.43)    51.98 ±0.000 (0.40)    26.83 ±0.000 (0.11)    52.62 ±0.000 (0.40)
3    44.53 ±4.980 (0.30)    55.75 ±0.674 (0.44)    56.47 ±0.194 (0.44)    34.29 ±2.980 (0.18)    48.91 ±2.440 (0.37)
4    39.86 ±3.280 (0.24)    59.52 ±1.940 (0.48)    60.72 ±0.289 (0.49)    35.93 ±3.680 (0.20)    56.15 ±1.390 (0.44)
5    42.26 ±1.700 (0.27)    59.01 ±1.280 (0.48)    61.68 ±0.165 (0.51)    42.51 ±2.900 (0.26)    55.64 ±0.951 (0.44)
10   42.32 ±0.883 (0.29)    62.36 ±1.730 (0.52)    65.13 ±0.153 (0.55)    48.81 ±2.330 (0.33)    60.39 ±1.290 (0.49)
20   37.63 ±0.588 (0.23)    64.48 ±1.040 (0.54)    65.65 ±0.774 (0.56)    57.56 ±2.890 (0.46)    63.28 ±1.120 (0.53)
30   32.94 ±0.787 (0.18)    65.05 ±0.884 (0.55)    66.68 ±0.035 (0.57)    60.61 ±1.570 (0.49)    64.73 ±0.943 (0.54)
50   58.47 ±0.713 (0.47)    65.87 ±0.642 (0.56)    68.05 ±0.240 (0.59)    63.47 ±0.608 (0.53)    65.70 ±0.746 (0.55)
100  67.38 ±0.446 (0.58)    67.80 ±0.500 (0.58)    68.74 ±2.510 (0.60)    68.06 ±0.643 (0.59)    68.19 ±0.485 (0.59)