PLANTS CLASSIFICATION USING SVM AND KNN CLASSIFIERS

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF APPLIED SCIENCES

OF

NEAR EAST UNIVERSITY

By

ABDALLAH KHALED ALZOUHBI

In Partial Fulfillment of the Requirements for the Degree of Master of Science

in

Mechatronics Engineering

NICOSIA, 2017


Abdallah Khaled ALZOUHBI: PLANTS CLASSIFICATION USING SVM AND KNN CLASSIFIERS

Approval of Director of Graduate School of Applied Sciences

Prof. Dr. Nadire CAVUS

We certify that this thesis is satisfactory for the award of the degree of Master of Science in Mechatronics Engineering

Examining Committee in Charge:

Prof. Dr. Rahib H. ABIYEV Head of Department of Computer Engineering, NEU

Prof. Dr. Kamil Dimililer Committee member, Department of Electrical Engineering, NEU

Assist. Prof. Dr. Elbrus Imanov Supervisor, Department of Computer Engineering, NEU

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Last name: Abdallah ALZOUHBI Signature:

Date:


ACKNOWLEDGEMENTS

It is a pleasure to thank everybody who made this thesis possible. First, let me thank my supervisor, Assist. Prof. Dr. Elbrus Imanov, who helped me in several ways to finish this work; the discussions I had with him were very valuable. I also have to mention that this work would not have been achieved without the assistance of my department chairperson, Prof. Dr. Bulent Bilgehan, who gave me the opportunity and the chance to finish this project, so I must thank him for his efforts; his direction, guidance, and support helped me in many ways. Special gratitude goes to my thesis committee members for their attendance and support. My last words go to my family and my parents: let me thank them and acknowledge the assistance and support that helped me achieve my aims.


ABSTRACT

Nowadays, digital image processing, artificial neural networks, and machine vision have been progressing rapidly, and they cover a significant part of artificial intelligence and of the interaction between humans and electro-mechanical devices.

These technologies have been used in a wide range of agricultural operations, in medicine, and in manufacturing. In this work, several such functions were prepared.

The classification of maize leaves from pictures involves many steps, starting with image pre-processing, feature extraction, plant recognition, matching and training, and finally obtaining the results, all executed in MATLAB.

The extracted features are independent of leaf maturity and of image translation, rotation, and scaling, and they are computed in order to develop an approach that yields the best classification algorithm as a result.

While a botanist may be presented with an organism for registration, with a plant class observed in its natural habitat to be identified and registered later, this work aims at supplying an in-depth recognition.

Keywords: Digital image processing; artificial neural network; machine vision; classification; MATLAB; recognition

ÖZET

Nowadays, digital image processing, artificial neural networks, and machine vision are progressing rapidly, and they cover a significant part of artificial intelligence and of the interaction between humans and electro-mechanical devices. These technologies are used in a wide range of agricultural operations, medicine, and manufacturing. In this work, several such functions were prepared. The classification of maize leaves from pictures involves many steps, starting with image pre-processing, feature extraction, plant recognition, matching and training, and finally obtaining the results executed in MATLAB. The extracted features are independent of leaf maturity and of image translation, rotation, and scaling, and they are computed to develop an approach that yields the best classification algorithm as a result.

While a botanist may be presented with an organism for registration, with a plant class observed in its natural habitat to be identified and registered later, this work proceeds by supplying an in-depth identification.

Keywords: Digital image processing; artificial neural network; machine vision; classification; MATLAB; recognition

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ... i
ABSTRACT ... ii
ÖZET ... iii
TABLE OF CONTENTS ... iv
LIST OF TABLES ... vii
LIST OF FIGURES ... viii
LIST OF ABBREVIATIONS ... ix

CHAPTER 1: INTRODUCTION
1.1 Computer Vision ... 1
1.2 Measurement/Extraction of Features ... 1
1.3 Pattern Classification ... 2
1.4 Pattern Recognition ... 2
1.5 Applications of Computer Vision ... 3
1.6 Thesis Organization ... 3

CHAPTER 2: RELATED WORK
2.1 Neural-Network and Statistical Classifier Technology ... 6
2.2 Machine Learning Algorithm ... 11
2.3 Feature Extraction + Domain Knowledge ... 12
2.4 Feature Selection ... 12
2.5 Choice of Algorithm ... 12
2.6 Training ... 12
2.7 Choice of Metrics/Evaluation Criteria ... 13
2.8 Testing ... 13
2.9 Qualifications ... 13
2.9.1 Other Benefits ... 14
2.10 Methodology ... 14

CHAPTER 3: MACHINE LEARNING TECHNIQUES
3.1 Supervised Learning ... 17
3.1.1 Classification Techniques ... 17
3.1.2 Regression Techniques ... 18
3.1.3 Steps in Supervised Learning ... 18
3.1.4 Set Data ... 19
3.1.5 Choose an Algorithm ... 19
3.1.6 Fit a Model ... 19
3.1.7 Choose a Validation Method ... 20
3.2 Characteristics of Classification Algorithms ... 21
3.3 Categorical Predictor Support ... 22
3.4 Unsupervised Learning ... 23
3.4.1 Clustering ... 24
3.4.2 How Do You Decide Which Machine Learning Algorithm to Use? ... 25

CHAPTER 4: MACHINE LEARNING WITH MATLAB
4.1 Materials and Methods ... 28
4.1.1 Image Database ... 28
4.1.2 Plants Classification ... 28
4.1.3 Template Matching ... 29
4.1.4 Support Vector Machine ... 29
4.1.5 K-Nearest Neighbors Algorithm (k-NN) ... 30
4.1.6 KNN for Classification ... 30
4.1.7 When Do We Use the KNN Algorithm? ... 32

CHAPTER 5: EXPERIMENTAL RESULTS
5.1 MATLAB Work ... 39
5.1.1 Load Image Data ... 39
5.1.2 Display Class Names and Counts ... 39
5.1.3 Display Sampling of Image Data ... 40
5.1.4 Images Separation into a Training Set and Test Set ... 41
5.1.5 Create Visual Vocabulary ... 41
5.1.6 Visualize Extracted Feature Vectors ... 42
5.1.7 Create a Table Using the Encoded Features ... 42
5.1.8 Use Features to Train a Model Using Different Classifiers ... 43
5.1.9 Test Out Accuracy on Test Set ... 45
5.1.10 Visualize How the Classifier Works ... 50

CHAPTER 6: DISCUSSION AND CONCLUSION
6.1 Discussion ... 51
6.2 Conclusion ... 51

REFERENCES ... 53

APPENDICES
Appendix 1: MATLAB Code ... 56
Appendix 2: Neural Network Daily Using ... 59


LIST OF TABLES

Table 3.1: Characteristics of Classification Algorithms ... 21

Table 3.2: Data-type ... 23

Table 4.1: KNN example ... 32

Table 5.1: Classification ratio………. 46

Table 5.2: Classification ratio (percentage)... 47

Table 5.3: Optimal parameter setting for the local pattern operators... 48


LIST OF FIGURES

Figure 2.1: Flowchart of the proposed methodology ... 6

Figure 3.1: Machine learning techniques……. ... 16

Figure 3.2: Clustering ... 24

Figure 3.3: Machine learning techniques. ... 25

Figure 4.1: Maize plant. ... 28

Figure 4.2: Non maize plants ... 28

Figure 4.3: KNN samples(1) ... 33

Figure 4.4: KNN samples(2) ... 34

Figure 4.5: K factor samples ... 35

Figure 4.6: K factor histogram ... 36

Figure 4.7: K factor histogram ... 37

Figure 5.1: Main stages of the system ... 39

Figure 5.2: Display class names and counts ... 40

Figure 5.3: Display sampling of image data ... 40

Figure 5.4: Extracted features... 42

Figure 5.5: Encoded features ... 42

Figure 5.6: New features to train a model ... 43

Figure 5.7: Scene image data ... 43

Figure 5.8: KNN and SVM accuracies ... 44

Figure 5.9: Confusion matrix ... 45

Figure 5.10: Comparison of existing methods against LDP. ... 49

Figure 5.11: Accuracy ... 49

Figure 5.12: Results ... 50


LIST OF ABBREVIATIONS

K-NN: K-Nearest Neighbors Algorithm
SVM: Support Vector Machine
ANN: Artificial Neural Network
NN: Neural Network
CCM: Color Co-occurrence Method
SOM: Self-Organizing Map
CBIR: Content-Based Image Retrieval
ECOC: Error-Correcting Output Codes
BS: Blue Star
RC: Red Circles
GS: Green Squares
LDP: Local Directional Pattern
LTP: Local Ternary Pattern

CHAPTER 1
INTRODUCTION

1.1 Computer Vision

This thesis focuses on automatic identification through computer vision and machine learning. Computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. In other words, machine learning and computer vision are the science and technology of machines that have the ability to see and recognize.

Snyder describes the term computer vision as "the process whereby a machine, usually a computing device, automatically processes an image and reports what is in the image; that is, it recognizes the content of the image. Often the content may be a machined part, and the objective is not only to locate the part but to inspect it as well."

Computer vision, also referred to as machine vision, consists of three parts: measurement of features, pattern classification based on those features, and pattern recognition. This thesis was directed at developing a system that extracts different features from a leaf image and groups different categories of leaves according to the extracted features. Moreover, the system uses the results of the classification scheme to identify the class of new leaf images.

1.2 Measurement/Extraction of Features

Image processing techniques are used to extract a set of features that describe or represent the image. Together, these features give a brief description of the information in the image. For example, a set of features that describes a triangle might be the lengths of each of its sides.

1.3 Pattern Classification

Pattern classification is the organization of patterns into groups of patterns sharing the same set of properties. Given a set of measurements of an unidentified item and knowledge of the possible classes to which an item may belong, a decision can be made about the class to which the unidentified object belongs. For example, if information about the lengths of the sides of an unknown triangle is extracted, it can be decided whether the triangle is equilateral, isosceles, or scalene. In general, if a set of features/measurements is extracted from a leaf, a decision about the probable class of the leaf can be made. Pattern classification may be statistical or syntactic.

Statistical classification is the categorization of individual objects into groups based on quantitative data for one or more features/measurements of the object and based on a training set of previously classified objects. An example of this type of classification is clustering; this study uses clustering for pattern classification.

Syntactic (structural) classification is the classification of individual items based on a structure in the pattern of the measurements. Objects are classified syntactically only if there is a clear arrangement in the pattern of the measured dimensions.

1.4 Pattern Recognition

Pattern recognition is the procedure of classifying data or patterns based on the information extracted from them. The patterns to be recognized are usually sets of measurements or observations defining points in a suitable multidimensional space. In this thesis, pattern recognition is applied to a set of test images in order to verify and evaluate the performance of the underlying classification scheme.

1.5 Applications of Computer Vision

Typical applications of computer vision and machine learning include face recognition, fingerprint identification, image-based searching, optical character recognition, remote sensing, and number-plate detection.

This thesis is strongly motivated by the real-world application of machine learning using different types of classifiers.

The primary concept behind most of these technologies is automation, an interdisciplinary concept that uses computing technology to simplify complex problems in other fields or in daily life. This thesis focuses on using image processing and artificial neural networks to automate classification and perform plant identification based on images of their leaves. Automatic plant classification and identification can help botanists in their research as well as assist laymen to classify and study plants more easily and more deeply.

Several shape-related features were extracted from these images using image processing and neural network methods. Based on these features, a statistical classification of plants was conducted. The classification scheme was then validated using a set of test images.

1.6 Thesis Organization

This thesis is organized into six chapters: Introduction, Related Work, Machine Learning Techniques, Machine Learning with MATLAB, Experimental Results, and Discussion and Conclusion.

The Introduction chapter gives a short definition of the general idea of this thesis and briefly describes what comes next.

The Related Work chapter gives a short overview of similar studies that have been done previously and which may be grouped into different categories. The machine learning algorithm section explains what a machine learning algorithm is, how it works, and why it is needed in image processing. The qualifications section explains the specifications and motivations for using machine learning and artificial neural networks. The methodology section describes and explains the different image processing, neural network, and feature extraction techniques used, and the classification algorithm implemented for correctly identifying plants based on their leaves.

The Machine Learning Techniques chapter goes deeper and explains different methodologies of image classification and recognition through its two parts, supervised and unsupervised learning algorithms.

The Machine Learning with MATLAB chapter describes how to use MATLAB to build a machine learning classifier and explains the relation between MATLAB and image processing algorithms. The materials and methods section explains the materials that helped accomplish this study, and describes every related step, such as the k-nearest neighbors algorithm (k-NN), support vector machine, template matching, plant classification, and the image database.

The Experimental Results chapter shows the final results and analyzes them. Finally, the Discussion and Conclusion chapter summarizes the outcomes of this study and discusses how the SVM and KNN classifiers behave and differ in accuracy.

CHAPTER 2
RELATED WORK

Many studies have attempted to identify plants. Some studies identify plants based on the plant image color histogram, edge features, and texture information. They additionally grouped the plants as trees, shrubs, and herbs using complicated classifier algorithms. This thesis instead takes a simple approach by considering only leaf details, using a basic Support Vector Machine (SVM) classifier and the K-Nearest Neighbor (KNN) rule for image classification, without many complications.

Several researchers have suggested different strategies for finding the area of the leaf in an image. Among these, this work uses a simple and robust area computation that uses a reference object as an indicator. Of the various edge detection mechanisms, the proposed work is based on the Sobel edge detection operator, which finds the boundary pattern successfully.

Figure 2.1: Flowchart of the proposed methodology (input leaf image from database, image pre-processing with histogram equalization, leaf edge detection, feature extraction, and classification)

2.1 Neural-Network and Statistical Classifier Technology

Neural networks have earned recognition for numerous practical applications in manufacturing. They have performed particularly well as classifiers for image-processing applications and as function estimators for linear and non-linear problems.

A study by Lee and Slaughter (1995) discussed the practical feasibility of using an artificial neural network combined with hardware to increase the processing rate and leaf classification ratio. A real-time intelligent robotic weed-control system for tomatoes was developed and tested for selective spraying of in-row weeds using a machine vision system and precision chemical application. This study showed that new features needed to be developed for better identification of the tomato set. With the hardware-based neural network, only 34.7% of tomato cotyledons, 36.4% of true tomato plants, and 78.6% of weeds were correctly identified. In a related 1998 project, artificial neural networks were applied to image classification of crop and weeds in a field of maize. An artificial neural network model was trained to distinguish between maize plants and weeds using the color index of every pixel. The results illustrate the potential of artificial neural networks for accurate and fast image processing and detection. The accuracy ratio of image identification was as high as 91 to 99% for maize. Several weed types have red stems, whereas the stems of corn and soybean are green. These color features were used by El-Faki et al. (2001) in an important study to set up a straightforward weed classification method using a color machine-vision approach.

This method was more practical than texture- or shape-based strategies because of its low sensitivity to canopy overlap, leaf orientation, camera focusing, and wind effects. A statistical classifier based on discriminant analysis (DA) was produced, along with two artificial neural network classifiers. The results showed that the statistical DA classifier was more accurate than the neural network classifiers in classification accuracy. The least-squares means of the classification rates using the DA classifiers for soybean and wheat were 54.9 and 59.9%, respectively. The misclassification rates for most weed varieties were under one third. Color co-occurrence method (CCM) texture information was used by Burks et al. (2000a) as input variables for a back-propagation (BP) neural-network weed classification model. The study estimated classification accuracy as a function of network configuration and training parameter selection. Additionally, training cycle requirements and training repeatability were studied. The best symmetrical BP network achieved 94.8% classification accuracy for a model comprising eleven inputs, five nodes at each of the two hidden layers, and seven output nodes. A tapered topology outperformed all other BP topologies with an overall accuracy of 96% and individual category accuracies of 90% or more. Moshou et al. (2003) proposed a new neural-network design, the self-organizing map (SOM).

Neural networks have also served as function estimators for both linear and non-linear problems. A study by Lee and Slaughter (1997) presented the practical feasibility of using an artificial neural network together with hardware for increasing processing speed and plant classification rate. A real-time intelligent robotic weed-control system for tomatoes was developed and tested for selective spraying of in-row weeds using a machine vision method and precision chemical application. This study mentioned that new features needed to be developed for better recognition of tomato plants. With the hardware-based neural network, only 38.9% of tomato cotyledons, 37.5% of true tomato leaves, and 85.7% of weeds were correctly identified. In the 1998 project, artificial neural networks (ANN) were used for image classification of crops and weeds in a field of maize. The ANN model was trained to differentiate between maize plants and weeds using the color index of every pixel. The results show the potential of ANNs for accurate and fast image processing and recognition. The accuracy ratio of image recognition was as high as 90–100% for maize and 60–70% for weeds. Several weed species have red stems, whereas the stems of wheat and soybean are green.

A BP network achieved its best classification accuracy with a model made of eleven inputs, five nodes at each of the two hidden layers, and six output nodes; a tapering topology outperformed all other BP topologies, with an overall accuracy of 96.7% and individual category accuracies of 90% or more. Moshou et al. (2003) proposed a new neural-network design, the self-organizing map (SOM).

A study by Lee and Slaughter (1997) offered the practical feasibility of using an artificial neural network combined with hardware for increasing processing speed and plant classification ratio. A real-time intelligent robotic weed-control method for tomatoes was created and tested for selective spraying of in-row weeds using a machine vision system and precision chemical application. This research indicated that new features needed to be developed for better identification of tomato plants. With the hardware-based NN, only 38.9% of tomato cotyledons, 37.5% of true tomato leaves, and 85.7% of weeds were properly identified. In the 1998 work, artificial neural networks were used for image detection of crops and weeds in a wide field of maize. The artificial neural network model was trained to tell maize plants apart from weeds using the color index of every pixel. The results show the potential of artificial neural networks for accurate and quick image processing and detection. The accuracy ratio of image identification was as high as 89 to 99% for maize and as low as 62% for weeds. Several weed species have scarlet stems, but the stems of wheat and soybean are green. These color features were used by El-Faki et al. (2000) in a study to establish a simple weed detection technique using a color machine-vision system. This technique was more practical than texture- or shape-based strategies owing to its low sensitivity to canopy overlap, leaf orientation, camera focusing, and wind effects. A statistical classifier was created based on discriminant analysis (DA); the statistical DA classifier was more accurate than the neural-network classifiers in classification accuracy. The least-squares means of the classification rates using the DA classifiers for soybean and wheat were 54.9 and 62.2%, respectively.

The misclassification rates for most weed species were below one third. Color co-occurrence method (CCM) texture statistics were used by Burks et al. (2000a) as input variables for a BP neural-network weed classification model. The research evaluated classification accuracy as a function of network configuration and training parameter selection; additionally, training cycle requirements and training repeatability were studied. The best symmetrical BP network achieved 94.7% classification accuracy for a model comprising eleven inputs, five nodes at each of the two hidden layers, and six output nodes. A tapered topology outperformed all other BP topologies, with an overall accuracy of 96.7% and individual class accuracies of 90% or higher. Moshou et al. (2002) proposed a new neural-network design: the self-organizing map (SOM) neural network was used in a supervised strategy for classification of crops and entirely different kinds of weeds using spectral reflectance measurements. The classification performance of the proposed procedure was confirmed to be better than that of various other neural classifiers.

Research on the use of moments for object representation in both invariant and non-invariant tasks has received significant attention recently. A considerable amount of work has been done on general shape-based plant classification and recognition. Wu et al. [1] extracted twelve commonly used digital morphological features that were further composed into five key factors using PCA. They used 1800 leaves to classify thirty-two kinds of plants using a probabilistic neural network framework. Wang et al. used the centroid-contour distance (CCD) curve, eccentricity, and angle code histogram (ACH). Fu et al. [3] also used the centroid-contour distance curve to represent leaf shapes, within which an integrated approach for an ontology-based leaf classification is proposed. For leaf shape classification, a scaled CCD code framework is proposed to infer the basic shape and margin type of a leaf using the same taxonomy rules adopted by botanists. A trained neural network is then used to recognize the detailed tooth patterns.

The CCD framework takes a plant picture as input and finds the matching plant from a plant image database, and is intended to give users a simple way to find information about their plants. With a larger database, the framework can also be used by researchers as a straightforward means of accessing plant databases. A max-flow min-cut algorithm is used as the image segmentation method to separate the plant from the background of the image, in order to extract the final structure of the plant. Different color, texture, and shape features extracted from the segmented plant region are used in matching images to the database. Color and texture analyses are based on commonly used features, particularly color histograms in several color spaces, color co-occurrence matrices, and texture maps. With respect to shape, some new descriptors are introduced to capture the outer contour features of a plant.

While color is extremely useful in many CBIR (content-based image retrieval) problems, in this particular problem it also presents some difficulties, since many plants simply vary within particular tints of green. Results show that for 54% of the queries, the correct plant image is retrieved among the top 15 results, using a database of plants from varied plant types. In addition, tests were also carried out on a clean database in which all the plant images have smooth shape descriptors. The results obtained using this clean database raised the top-15 retrieval rate to 68%. Image enhancement procedures can make objects within the source image clearer.

Based on the different structures and sizes of the image blocks of leaves, they can be separated and extracted from the source images. Then, using image analysis tools from MATLAB, features such as size, area, perimeter, solidity, and eccentricity can be computed.

Then, using them as an input data file, a radial basis function network can be created. The data file is divided into two parts: one half is used to train the network and the other to check the validity of the model.

Finally, input files from different image structures under identical conditions can be used to test the model. The world of plants is far wider than the worlds of animals, birds, or insects.

2.2 Machine Learning Algorithm

Machine learning, simply put, is the process of building a machine that repeatedly learns and improves from previous experience. Recently, machine learning has gained a lot of popularity and is finding its way into wide-ranging areas such as medicine, finance, and entertainment.

There is much that can be done with it; see, for example, "How Google Uses Machine Learning and Neural Networks to Improve Data Centers".

In this study I will discuss the steps involved in solving a problem by means of machine learning.

2.3 Feature Extraction + Domain Knowledge

First and foremost, we really need to understand what kind of data we are dealing with and what we ultimately want to get out of it. Essentially we need to understand how and what features should be extracted from the data. For example, suppose we want to build software that distinguishes between male and female names. All the names in text can be thought of as our raw data, while our features could be the number of vowels in the name, its length, its first and last characters, and so on.

2.4 Feature Selection

In many situations we end up with a lot of features at our disposal. We may want to select a subset of those based on the resources and computation power we have. In this stage we select a few of the most influential features and separate them from the not-so-influential ones. There are many ways to do this: information gain, gain ratio, correlation, and so on.

2.5 Choice of Algorithm

There is a wide range of algorithms from which we can choose, based on whether we are trying to do prediction, classification, or clustering. We can also choose between linear and non-linear algorithms. Naive Bayes, support vector machines, decision trees, and k-means clustering are some commonly used algorithms.

2.6 Training

In this stage we tune our algorithm based on the data we already have. This data is called the training set, as it is used to train our algorithm. This is the part where our machine or software learns and improves with experience.

2.7 Choice of Metrics/Evaluation Criteria

Here we choose the evaluation criteria for our algorithm. Essentially we come up with metrics to evaluate our results. Commonly used measures of performance are precision, recall, F1-measure, robustness, specificity and sensitivity, error rate, and so on.

2.8 Testing

Finally, we examine how our machine learning algorithm performs on an unseen set of data. The training set is used in the training stage, while the test set is used in this stage. Methods such as cross-validation and leave-one-out can be used to deal with situations where we do not have enough data. The above list of steps is certainly not exhaustive and cannot do full justice to a comprehensive field like machine learning. Even so, most of the time a machine learning system will include most of the steps stated above, if not all.

2.9 Qualifications

In this study I have chosen a machine learning algorithm to benefit from its desirable characteristics, which are mentioned in the sections above. This lets me build my classifier algorithm with good runtime and accuracy.

2.9.1 Other Benefits

Consider using machine learning when you have a complex task or problem involving a large amount of data and lots of variables, but no existing formula or equation.

2.10 Methodology

The history goes back about sixty years, when Alan Turing devised the "Turing test" to determine whether a computer had real intelligence. It can be argued, though, that the past two decades have seen the biggest leaps and bounds in terms of advances in the technology. But I'm getting ahead of myself here.

As a human, and as a technology user, you complete certain tasks that require you to make a decision or classify something. For instance, when you read your email inbox, you decide whether to mark an email as junk. How would a computer know what to do? Machine learning comprises algorithms that teach computers to perform tasks that human beings do naturally on a daily basis.

The first attempts at artificial intelligence involved teaching a computer by writing a rule. If we wanted to teach a computer how to make recommendations based on the weather, we would write a rule that said: IF the weather is cloudy AND the chance of rainfall is greater than 50%, THEN suggest not having a BBQ outside.

The problem with this approach, used in traditional expert systems, is that we don't know how much confidence to place in the rule. Is it right 50% of the time? More? Less?

Machine learning has since evolved to reproduce the pattern matching that our brains can do.

Today, machine learning algorithms teach computers to recognize and classify features of objects. In these models, for example, a computer is shown some kinds of plants and learns to categorize them as maize and non-maize plants. The computer then uses that data to classify the numerous features of different plants, building upon new data every time.

Initially, a computer might categorize a maize plant by referring to its shape, and build a model stating that if something has the same shape, it belongs to that leaf class. Later, when additional kinds are presented, the computer learns to classify them with respect to their colors as well. If, for instance, a tomato is presented, the computer has to recognize that it is not the same kind of plant at all. The computer must continually adapt its model based on new data and assign a predictive value to each feature, representing the degree of confidence that an item is one object rather than another. For example, yellow is a more predictive value for a banana than red is for an apple.

For this reason, computer vision and machine learning have spread widely in the real world, with applications in search, image understanding and classification, apps, medicine, drones, and self-driving cars. Central to many of these applications are visual recognition tasks such as image classification, localization, and detection. Recent developments in neural network ("deep learning") approaches have greatly advanced the performance of these state-of-the-art visual recognition systems.

Image classification based on artificial intelligence, which uses the image processing tools provided by MATLAB to recognize plants with a fast response time and high precision, is the main motivation for using machine learning and neural networks. In addition, machine learning procedures offer us the necessary flexibility to take the best combination of features and classifiers, and we can obtain good results even with minimal data.

The subtle shift from typical deep neural network training of the perceptron weights through back-propagation to the layer-wise greedy technique was the most important development in machine learning in the last 15 years. That is what we refer to as "machine learning", though it might also be called convolutional neural networks by some people.

CHAPTER 3
MACHINE LEARNING TECHNIQUES

Machine learning uses two kinds of techniques: supervised learning, which trains a model on known input and output data so that it can predict future outputs, and unsupervised learning, which finds hidden patterns or intrinsic structures in input data.

Figure 3.1: Machine learning techniques

3.1 Supervised Learning

Supervised machine learning builds a model that makes predictions based on evidence in the presence of uncertainty. A supervised learning algorithm takes a known set of input data and known responses to the data (output), and trains a model to generate reasonable predictions for the response to new data. Use supervised learning if you have known data for the output you are trying to predict.

Supervised learning uses classification and regression techniques to develop predictive models.

3.1.1 Classification techniques

Classification techniques predict discrete responses, for example, whether an email is genuine or spam, or whether a tumor is malignant or benign. Classification models categorize input data into classes. Typical applications include medical imaging, speech recognition, and credit scoring.

Use classification if your data can be labeled, categorized, or divided into specific groups or classes. For instance, applications for handwriting recognition use classification to recognize letters and numbers. In image processing and computer vision, unsupervised pattern recognition methods are also used for object recognition and image segmentation.

Common algorithms for performing classification include support vector machines (SVM), boosted and bagged decision trees, k-nearest neighbor, naive Bayes, discriminant analysis, logistic regression, and neural networks.

3.1.2 Regression techniques

Regression techniques predict continuous responses, for example, changes in temperature or fluctuations in power demand. Typical applications include electricity load forecasting and algorithmic trading.

Use regression techniques if you are working with a data range or if the nature of your response is a real number, such as temperature or the time until failure for a piece of equipment.

Common regression algorithms include linear models, nonlinear models, regularization, stepwise regression, boosted and bagged decision trees, neural networks, and adaptive neuro-fuzzy learning.

3.1.3 Steps in supervised learning

While there are many Statistics and Machine Learning Toolbox algorithms for supervised learning, most use the same basic workflow for obtaining a predictor model. The steps for supervised learning are:

Set data
Choose an algorithm
Fit a model
Choose a validation method
Examine fit and update until satisfied
Use the fitted model for predictions

3.1.4 Set data

All supervised learning methods start with an input data matrix, usually called X here. Each row of X represents one observation, and each column of X represents one variable, or predictor. Missing entries are represented with NaN values in X.

Statistics and Machine Learning Toolbox supervised learning algorithms can handle NaN values, either by ignoring them or by ignoring any row with a NaN value.

3.1.5 Choose an algorithm

There are tradeoffs between several characteristics of algorithms, for example:

Speed of training
Memory usage
Predictive accuracy on new data
Transparency or interpretability, meaning how easily you can understand the reasons the algorithm makes its predictions

3.1.6 Fit a model

The fitting function you use depends on the algorithm you choose. The main model types are listed below; a minimal sketch of the corresponding fitting functions follows the list.

 Classification trees
 Regression trees
 Discriminant analysis (classification)
 K-nearest neighbors (KNN classification)
 Support vector machines (SVM) for classification
 SVM for regression
 Multiclass models for SVM or other classifiers
 Classification or regression ensembles
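As a rough illustration of the fitting functions behind several of these model types, the sketch below uses MATLAB's built-in Fisher iris data rather than the thesis dataset; the variable names are illustrative only.

% Minimal sketch (assumed example data, not the thesis dataset).
% Each fitting function is part of Statistics and Machine Learning Toolbox.
load fisheriris                          % provides meas (150x4) and species (150x1)
X = meas;                                % numeric predictors
Y = species;                             % cell array of class labels

treeMdl = fitctree(X, Y);                            % classification tree
knnMdl  = fitcknn(X, Y, 'NumNeighbors', 5);          % k-nearest neighbors
idx     = strcmp(Y,'setosa') | strcmp(Y,'versicolor');
svmMdl  = fitcsvm(X(idx,:), Y(idx));                 % binary SVM (two classes only)
ecocMdl = fitcecoc(X, Y);                            % multiclass SVM via error-correcting output codes

% Predict the class of a new observation with any fitted model:
label = predict(knnMdl, [5.9 3.0 5.1 1.8]);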

3.1.7 Choose a validation method

The three main methods for examining the accuracy of the resulting fitted model are:

 Examine the resubstitution error. For examples, see:
 Classification Tree Resubstitution Error
 Cross Validate a Regression Tree
 Test Ensemble Quality
 Example: Resubstitution Error of a Discriminant Analysis Classifier
 Examine the cross-validation error. For examples, see:
 Cross Validate a Regression Tree
 Test Ensemble Quality
 Classification with Many Categorical Levels
 Cross Validating a Discriminant Analysis Classifier
 Try a different algorithm. For applicable choices, see:
 Characteristics of Classification Algorithms
 Choose an Applicable Ensemble Method

When you are satisfied with a model of some type, you can trim it using the appropriate compact function (compact for classification trees, compact for regression trees, compact for discriminant analysis, compact for naive Bayes, compact for SVM, compact for ECOC models, compact for classification ensembles, and compact for regression ensembles).

Compact removes training data and other properties not required for prediction, e.g., pruning information for decision trees, from the model to reduce memory consumption. Because KNN classification models require all of the training data to predict labels, you cannot reduce the size of a ClassificationKNN model.

3.2 Characteristics of Classification Algorithms

The table below shows typical characteristics of the various supervised learning algorithms. The characteristics in any particular case can vary from those listed. Use the table as a guide for your initial choice of algorithm, and decide on the tradeoff you want in speed, memory usage, flexibility, and interpretability.

Table 3.1: Characteristics of classification algorithms

Classifier                        | Multiclass support            | Categorical predictor support | Prediction speed  | Memory usage                           | Interpretability
Decision trees (fitctree)         | Yes                           | Yes                           | Fast              | Small                                  | Easy
Discriminant analysis (fitcdiscr) | Yes                           | No                            | Fast              | Small for linear, large for quadratic  | Easy
SVM (fitcsvm)                     | No; combine many binary SVMs  | Yes                           | Medium for linear | Medium for linear                      | Easy for linear SVM
Nearest neighbor (fitcknn)        | Yes                           | Yes                           | Slow for cubic    | Medium                                 | Hard
Ensembles (fitcensemble)          | Yes                           | Yes                           | Fast to medium    | Low to high, depending on choice       | Hard

The results in this table are based on an analysis of many data sets. The data sets in the analysis have up to 7000 observations, 80 predictors, and 50 classes. The following list defines the terms in the table.

Speed:
Fast — 0.01 second
Medium — 1 second
Slow — 100 seconds

Memory:
Small — 1 MB
Medium — 4 MB
Large — 100 MB

Note: The table provides a general guide. Your results depend on your data and the speed of your machine.

3.3 Categorical Predictor Support

Table 3.2 summarizes the data-type support of predictors for each classifier.

3.4 Unsupervised Learning

Unsupervised learning finds hidden patterns or intrinsic structures in data. It is used to draw inferences from datasets consisting of input data without labeled responses.

3.4.1 Clustering

Clustering is used for exploratory data analysis to find hidden patterns or groupings in data. Applications for cluster analysis include gene sequence analysis, market research, and object recognition. As an example, if a cell phone company wants to optimize the locations where it builds cell phone towers, it can use machine learning to estimate the number of clusters of people relying on its towers.

A phone can only talk to one tower at a time, so the team uses clustering algorithms to design the best placement of cell towers to optimize signal reception for groups, or clusters, of its customers.

Common algorithms for performing clustering include k-means and k-medoids, hierarchical clustering, Gaussian mixture models, hidden Markov models, self-organizing maps, fuzzy c-means clustering, and subtractive clustering.

Figure 3.2: Clustering
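For concreteness, here is a minimal k-means sketch in MATLAB (synthetic points, not data from this thesis) showing how observations are grouped into clusters.

% Minimal k-means sketch with made-up 2-D points.
rng(1);                                            % reproducible random numbers
X = [randn(100,2)*0.5; randn(100,2)*0.5 + 3];      % two synthetic groups of points
k = 2;                                             % assumed number of clusters
[idx, centers] = kmeans(X, k);                     % idx: cluster index per point

% Plot the clusters and their centroids
gscatter(X(:,1), X(:,2), idx);
hold on
plot(centers(:,1), centers(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2);
hold off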

3.4.2 How Do You Decide Which Machine Learning Algorithm to Use?

Selecting the right algorithm can seem overwhelming: there are dozens of supervised and unsupervised machine learning algorithms, and each takes a different approach to learning.

There is no best method or one size fits all. Finding the right and most accurate algorithm is partly trial and error; even highly experienced data scientists cannot tell whether an algorithm will work without trying it out. However, algorithm selection also depends on the size and type of data you are working with, the insights you want to get from the data, and how those insights will be used.

Figure 3.3: Machine learning techniques

Here are some tips on choosing between supervised and unsupervised machine learning:

1- Choose supervised learning if you need to train a model to make a prediction, for example, the future value of a continuous variable, such as temperature or a stock price, or a classification, for example, identifying makes of cars from webcam video footage.

2- Choose unsupervised learning if you need to explore your data and want to train a model to find a good internal representation, such as splitting the data up into clusters.

CHAPTER 4
MACHINE LEARNING WITH MATLAB

How can you harness the power of machine learning to use data to make better decisions? MATLAB makes setting up machine learning simple. With tools and functions for handling big data, as well as apps that make machine learning accessible, MATLAB is an ideal environment for applying machine learning to your data analytics.

With MATLAB, engineers and data scientists have immediate access to prebuilt functions, extensive toolboxes, and specialized apps for classification, regression, and clustering.

MATLAB lets you:

Compare approaches such as logistic regression, classification trees, support vector machines, ensemble methods, and deep learning.

Use model refinement and reduction techniques to create an accurate model that best captures the predictive power of your data.

Integrate machine learning models into enterprise systems, clusters, and clouds, and target models to real-time embedded hardware.

Perform automatic code generation for embedded sensor analytics.

Support integrated workflows from data analytics to deployment.

4.1 Materials and Methods

4.1.1 Image database

In the experiments, images of diverse kinds of plants are used in two datasets: the first contains the maize plant and the second is mixed, with more than twenty kinds of plants, predominantly wildflowers. The sample pictures were collected in the field. The image database includes 200 color image segments of the maize plant commonly grown in Mediterranean countries, together with 50 samples of diverse kinds of plants. Images were acquired at different times of day. In addition, plants with varying canopy extent were selected to increase the difficulty of the classification problem, and they are captured from different points of view.

Figure 4.1: Maize plant

Figure 4.2: Non maize plants

4.1.2 Plants classification

For classifying the different plant images, several machine learning methods such as template matching, a Bayesian classifier, the k-nearest neighbors algorithm (k-NN), or a support vector machine (SVM) can be used. In this study, both KNN and support vector machine classifiers were used for the classification task.

4.1.3 Template matching

During the training stage, histograms of training samples from images of the same class are averaged to produce the template model for that specific class. Using this technique, two template histograms were designed to model the broadleaf and grass images. The difference between the sample and the template histograms is a goodness-of-fit measure that can be computed using a non-parametric statistical test, such as the chi-square statistic or the log-likelihood ratio. After computing the difference value for each class, the test sample is assigned to the class with the minimum difference value. In this study, the chi-square statistic is used to measure the difference value.
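As a rough illustration of this idea (a sketch under assumptions, not the author's exact code), the chi-square distance between a test histogram and each class template could be computed in MATLAB as follows; the histogram values and class names are hypothetical placeholders.

% Hypothetical sketch: assign a test histogram to the class whose template
% histogram has the smallest chi-square distance.  The histograms below are
% made-up; in practice they would be computed from the training images.
broadleafHist = [0.10 0.30 0.40 0.20];
grassHist     = [0.40 0.35 0.15 0.10];
testHist      = [0.12 0.28 0.42 0.18];

templates  = {broadleafHist, grassHist};
classNames = {'broadleaf', 'grass'};

chi2 = zeros(1, numel(templates));
for c = 1:numel(templates)
    t = templates{c};
    chi2(c) = sum((testHist - t).^2 ./ (testHist + t + eps));  % chi-square distance
end

[~, best] = min(chi2);            % class with the minimum difference value
predictedClass = classNames{best}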

4.1.4 Support vector machine

SVM is a state-of-the-art machine learning method based on statistical learning theory. It has been successfully applied to diverse classification problems. SVM performs the classification by constructing a hyperplane in such a way that the separating margin between positive and negative examples is optimal. This separating hyperplane then serves as the decision surface.

Here, the αi are Lagrange multipliers of the dual optimization problem, b is a threshold parameter, and K is a kernel function. The hyperplane maximizes the separating margin with respect to the training samples with αi > 0, which are called the support vectors. SVM makes binary decisions. To achieve multi-class classification, the common approach is to adopt the one-against-rest strategy or to decompose the problem into several two-class problems. In this study, we used the one-against-rest strategy

with two different kernels, namely the polynomial kernel and the Radial Basis Function (RBF) kernel. A grid search is carried out to choose appropriate kernel parameter values.
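The decision-surface equation referred to above is not reproduced in the extracted text; the standard SVM decision function that matches the quantities described (an assumption about what was intended) is, in LaTeX form:

f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right)

where the y_i \in \{-1, +1\} are the training labels, the \alpha_i are the Lagrange multipliers, b is the threshold, and K is the kernel function.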

4.1.5 K-nearest neighbors algorithm (k-NN)

KNN is a non-parametric, lazy learning algorithm. That is a fairly concise statement. When we say a technique is non-parametric, it means that it does not make any assumptions about the underlying data distribution. This is quite useful, since in the real world most practical data does not obey the typical theoretical assumptions (e.g., Gaussian mixtures, linear separability, and so on). Non-parametric algorithms like KNN come to the rescue here.

It is also a lazy algorithm. What this means is that it does not use the training data points to do any generalization. In other words, there is no explicit training phase, or it is very minimal, so the training stage is quite fast. The lack of generalization means that KNN keeps all the training data; more precisely, all the training data is needed during the testing stage. This is in contrast to other techniques like SVM, where all non-support vectors can be discarded without any problem. Most lazy algorithms, and KNN in particular, make decisions based on the entire training data set (or, at best, a subset of it).

The tradeoff is fairly obvious here: there is a non-existent or minimal training stage but a costly testing stage. The cost is in terms of both time and memory. More time may be needed because, in the worst case, all data points take part in the decision, and more memory is needed because all the training data must be stored.

4.1.6 KNN for classification

Let us see how to use KNN for classification. In this setting, we are given some data points for training and also a new unlabelled point for testing. Our goal is to find the class label for the new point. The procedure behaves differently depending on k.

Case 1: k = 1, or the Nearest Neighbor Rule

This is the simplest case. Let x be the point to be labelled. Find the point closest to x and call it y. The nearest neighbor rule then assigns the label of y to x. This seems too simplistic and sometimes even counter-intuitive. If you feel that this approach will produce a huge error, you are right, but there is a catch: this intuition holds only when the number of data points is not large.

If the number of data points is very large, then there is a good chance that the labels of x and y are the same. An example may help. Suppose you have a (potentially) biased coin. You toss it one billion times and you get heads 900,000,000 times; then most likely your next call should be heads. We can use a similar argument here. Let me try an informal argument: assume all points lie in a D-dimensional space and the number of points is reasonably large. This means that the density of points anywhere in the space is fairly high; in other words, within any subregion there is an adequate number of points. Consider a point x in such a subregion which also has a lot of neighbors, and let y be its nearest neighbor. If x and y are sufficiently close, then we can assume that the probability that x and y belong to the same class is fairly high; then, by decision theory, x and y have the same class.

In the bound below, P* is the Bayes error rate, c is the number of classes, and P is the error rate of the nearest neighbor rule. The result is certainly very striking (at least to me), because it says that if the number of points is fairly large then the error rate of the nearest neighbor rule is less than twice the Bayes error rate. Not bad for a simple algorithm like KNN. Do read the book for all the details.
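The bound referred to above is missing from the extracted text; the classical Cover–Hart asymptotic bound for the nearest neighbor rule, which matches the quantities described, is (in LaTeX form):

P^{*} \;\le\; P \;\le\; P^{*}\left( 2 - \frac{c}{c-1}\,P^{*} \right)

where P^{*} is the Bayes error rate, c is the number of classes, and P is the asymptotic error rate of the nearest neighbor rule.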

Case 2: k = K, or the k-Nearest Neighbor Rule

This is a straightforward extension of 1NN. Essentially what we do is find the k nearest neighbors and take a majority vote. Typically k is odd when the number of classes is 2. Let's say k = 5 and there are 3 instances of C1 and 2 instances of C2. In this case, KNN says that the new point must be labelled as C1, since it forms the majority. We follow a similar argument when there are multiple classes. One of the traditional refinements is not to give one vote to every neighbor. A very common approach is weighted KNN, where each point has a weight that is typically calculated from its distance. For example, under inverse-distance weighting, each point has a weight equal to the inverse of its distance to the point to be classified. This means that nearer points receive a higher vote than points farther away. It is quite obvious that the accuracy might increase when you increase k, but the computation cost also increases.
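For concreteness, MATLAB's fitcknn supports exactly this kind of inverse-distance weighting; the sketch below (using the built-in Fisher iris data, not the thesis dataset) illustrates it.

% Minimal sketch of weighted KNN: nearer neighbors get a larger vote via
% inverse-distance weighting.
load fisheriris                           % built-in example data: meas (150x4), species (150x1)
mdl = fitcknn(meas, species, ...
              'NumNeighbors',   5, ...
              'DistanceWeight', 'inverse');    % weight = 1/distance

% Estimate generalization error with 10-fold cross-validation
cvmdl = crossval(mdl);
err = kfoldLoss(cvmdl)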

4.1.7 When do we use the KNN algorithm?

KNN can be used for both classification and regression predictive problems. However, it is more widely used for classification problems in industry. To evaluate any technique, we generally look at three important aspects:

1. Ease of interpreting the output
2. Calculation time
3. Predictive power

Let us take a few examples to place KNN on this scale:

Table 4.1: KNN example

The KNN algorithm fares well across all of these considerations. It is commonly used for its ease of interpretation and low calculation time.

Let us take a simple case to understand this algorithm. Following is a spread of red circles (RC) and green squares (GS):

Figure 4.3: KNN samples(1)

We intend to find out the class of the blue star (BS). BS can be either RC or GS, and nothing else. The "K" in the KNN algorithm is the number of nearest neighbors we wish to take a vote from. Let's say K = 3. We will therefore draw a circle with BS as the center, just large enough to enclose only three data points on the plane. Refer to the following diagram for more details:


Figure 4.4: KNN samples(2)

The three points closest to BS are all RC. Hence, with a good confidence level, we can say that BS should belong to the class RC. Here, the choice became very obvious, as all three votes from the nearest neighbors went to RC. The choice of the parameter K is very critical in this algorithm. Next, we will understand the factors to be considered in order to choose the best K.

First, let us try to understand what exactly K affects in the algorithm. Looking at the last example, given that all 6 training observations remain constant, with a given K value we can draw boundaries for each class. These boundaries will separate RC from GS. In the same way, let us try to see the effect of the value of K on the class boundaries. Following are the different boundaries separating the two classes for different values of K.


Figure 4.5: K factor samples

As shown, the boundary becomes smoother with increasing values of K. With K increasing to infinity, it finally becomes all blue or all red, depending on the overall majority. The training error rate and the validation error rate are two quantities we need to assess for different K values. Following is the curve for the training error rate with varying values of K.


Figure 4.6: K factor histogram

As shown, the error rate at K=1 is always zero for the training sample. This is because the closest point to any training data point is itself; hence the prediction is always accurate with K=1. If the validation error curve were similar, our choice of K would have been 1. Following is the validation error curve with varying values of K.


Figure 4.7: K factor histogram

End note: The KNN algorithm is one of the simplest classification algorithms. Even with such simplicity, it can give highly competitive results. The KNN algorithm can also be used for regression problems; the only difference from the classification discussed above is using the average of the nearest neighbors rather than voting from the nearest neighbors. KNN can be coded in a single line in R; I am yet to explore how the KNN algorithm can be used in SAS.

CHAPTER 5
EXPERIMENTAL RESULTS

In order to focus on the key structure of the program, the MATLAB application, the database retrieval and fixed feature generation, we use a subset of the built-in functions available in MATLAB for digital image processing. The final program, as shown in Figure 2.1, follows a classification algorithm. In addition, as in most image recognition programs, a database of leaf images has to be built, together with a method to extract the features for the database and a further technique to retrieve the best match from the database.

Input data training: Once the feature extraction was complete, two files were obtained: (1) training texture feature data and (2) test texture feature data. Classification using a Support Vector Machine based on a linear classifier: a software routine was written in MATLAB that takes in the files representing the training and test data, trains the classifier using the training file, and then uses the test file to perform the classification task on the test data.

As a result, a MATLAB routine loads all the data files (training and test data files) and adjusts the data according to the chosen model.


Figure 5.1: Main stages of the system

5.1 MATLAB Work

5.1.1 Load image data

First, the "imageDatastore" function was used to automatically read all the given images.

5.1.2 Display class names and counts

Then the "countEachLabel" function is used in order to count the images under each label in the database, as shown in the figure below. A minimal sketch of these two steps follows the figure.


Figure 5.2: Display class names and counts
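As a rough sketch of these two steps (the folder name below is a hypothetical placeholder, not the thesis's actual path):

% Minimal sketch: read a labelled image set and count the images per class.
imds = imageDatastore('plantImages', ...               % hypothetical root folder
                      'IncludeSubfolders', true, ...
                      'LabelSource', 'foldernames');   % folder names become labels

tbl = countEachLabel(imds)                             % table of class names and counts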

5.1.3 Display sampling of image data

As a third step, I display a sampling of the maize images using the "montage" function, as shown in the figure below:

Figure 5.3: Display sampling of image data

5.1.4 Images separation into a training set and test set

Now it is the turn of pre-processing the training data. Feature extraction using bag of features, also known as bag of visual words, is one way to extract features from images. To represent an image using this approach, the image can be treated as a document, and the occurrences of visual "words" in the image are used to generate a histogram that represents the image.

At the time of writing, bagOfFeatures extraction still requires an imageSet object to run. This is on the roadmap to change in the future, but for now we need to convert the datastore to an imageSet object, so we split our images into a training set and a test set using this separate function:

function [tr_set, test_set] = prepareInputFiles(dsObj)
% Convert an imageDatastore into an imageSet and split it for training/testing.
image_location = fileparts(dsObj.Files{1});                      % folder of the first image
imset = imageSet(strcat(image_location,'\..'), 'recursive');     % one imageSet per class folder
[tr_set, test_set] = imset.partition(15);                        % 15 images per class for training
test_set = test_set.partition(10);                               % keep 10 images per class for testing
end

so that it can be called later in the software.

5.1.5 Create visual vocabulary

To extract features from the given images I used the "bagOfFeatures" function. These features are used as inputs to the SVM and K-NN classifiers, which are trained and then used for classification. MATLAB also offers a classification app, which gives a good opportunity to try as many different types of classifier as we want. A minimal sketch of this step is shown below.
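As a rough sketch of how these pieces fit together (assuming the tr_set and test_set imageSet objects produced by prepareInputFiles above; fitcecoc and fitcknn are used here as a programmatic stand-in for the classification app, so this is an illustration rather than the thesis's exact code):

% Hypothetical sketch: build a visual vocabulary, encode each image as a
% histogram of visual words, and train SVM and KNN classifiers on the result.
bag = bagOfFeatures(tr_set);                  % visual vocabulary from the training images

trainFeatures = encode(bag, tr_set);          % one feature vector (histogram) per image
trainLabels = {};
for i = 1:numel(tr_set)                       % class label = folder description, repeated per image
    trainLabels = [trainLabels; repmat({tr_set(i).Description}, tr_set(i).Count, 1)];
end
trainLabels = categorical(trainLabels);

svmMdl = fitcecoc(trainFeatures, trainLabels);                    % multiclass SVM (ECOC)
knnMdl = fitcknn(trainFeatures, trainLabels, 'NumNeighbors', 5);  % KNN classifier

% Evaluate both models on the held-out test set
testFeatures = encode(bag, test_set);
testLabels = {};
for i = 1:numel(test_set)
    testLabels = [testLabels; repmat({test_set(i).Description}, test_set(i).Count, 1)];
end
testLabels = categorical(testLabels);

svmAccuracy = mean(predict(svmMdl, testFeatures) == testLabels)
knnAccuracy = mean(predict(knnMdl, testFeatures) == testLabels)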
