An efficient query optimization strategy for spatio-temporal queries in video databases


A thesis submitted to the Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University in partial fulfillment of the requirements for the degree of Master of Science

By Gülay Ünel

July, 2002

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.

Assoc. Prof. Dr. Özgür Ulusoy (Supervisor)

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.

Assist. Prof. Dr. Attila Gürsoy

I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.

Assist. Prof. Dr. İbrahim Körpeoğlu

Approved for the Institute of Engineering and Science:

Prof. Dr. Mehmet B. Baray

ABSTRACT

AN EFFICIENT QUERY OPTIMIZATION STRATEGY FOR SPATIO-TEMPORAL QUERIES IN VIDEO DATABASES

Gülay Ünel

M.S. in Computer Engineering

Supervisors: Assoc. Prof. Dr. Özgür Ulusoy and Assist. Prof. Dr. Uğur Güdükbay

July, 2002

Interest in multimedia database management systems has grown rapidly due to the need to store huge volumes of multimedia data in computer systems. An important building block of a multimedia database system is the query processor, and a query optimizer embedded in the query processor is needed to answer user queries efficiently. The query optimization problem has been widely studied for conventional database systems; however, it is a new research area for multimedia database systems. Due to the differences in query processing strategies, the query optimization techniques used in multimedia database systems are different from those used in traditional databases. In this thesis, the query optimization problem in video database systems is outlined and a query optimization strategy is proposed as a solution to this problem. Reordering algorithms, to be applied on the query execution tree, are also described. Finally, the performance results obtained by testing the proposed algorithms are presented.

Keywords: video databases, query optimization, query tree, querying of video ...

ÖZET

AN EFFICIENT QUERY OPTIMIZATION STRATEGY FOR SPATIO-TEMPORAL QUERIES IN VIDEO DATABASES

Gülay Ünel

M.S. in Computer Engineering

Supervisors: Assoc. Prof. Dr. Özgür Ulusoy and Assist. Prof. Dr. Uğur Güdükbay

July, 2002

Interest in multimedia database management systems has grown rapidly because of the need to store large volumes of multimedia data. The query processor is one of the important building blocks of a multimedia database system, and a query optimizer placed inside the query processor is needed to answer queries efficiently. The query optimization problem has been studied extensively for conventional databases, whereas it is a new research area for multimedia database systems. Because of the differences in query processing strategies, the query optimization techniques used in multimedia database systems differ from those used in traditional databases. In this thesis, the query optimization problem in video database systems is outlined and a query optimization strategy is proposed as a solution to this problem. In addition, ordering algorithms to be applied to the query execution tree are defined. Finally, the performance results obtained by testing the proposed algorithms are presented.

Keywords: video databases, query optimization, query tree, video ...

Acknowledgement

I would like to express my special thanks and gratitude to my supervisors Assoc. Prof. Dr. Özgür Ulusoy and Assist. Prof. Dr. Uğur Güdükbay for their concern in the supervision of the thesis.

I would like to express my gratitude to Assist. Prof. Dr. Attila Gürsoy and Assist. Prof. Dr. İbrahim Körpeoğlu for their interest in the subject matter and for spending their time reading and reviewing the thesis.

I would like to acknowledge the support of the Turkish Scientific and Technical Research Council (TÜBİTAK).

I would like to express my special thanks to Mehmet Emin Dönderler for his support and patience in all stages of the thesis research.

I thank my spouse Cüneyt, my brother Semih, and my mother and father for their support.

Finally, I would like to express my special thanks and gratitude to my manager ...

Contents

1 Introduction
    1.1 Organization of the Thesis
2 Related Work
3 BilVideo: A Video DBMS
    3.1 Video Database System Architecture
    3.2 Video Query Language
    3.3 Query Types
    3.4 Query Processing
4 Query Optimization
    4.1 Structure of the Query Tree
    4.2 Internal Node Reordering Algorithm
        4.2.1 Examples
    4.3 Leaf Node Reordering Algorithm
        4.3.1 Examples
5 Performance Results
    5.1 Fact Base Statistics
    5.2 Performance Results
    5.3 Examples
6 Conclusions and Future Work
References
Appendices
A Sample Fact Base for an Example Video

List of Figures

3.1 BilVideo database system architecture
3.2 Web client - query processor interaction
3.3 Query processing phases
4.1 Query optimization process
4.2 Internal node reordering algorithm
4.3 (a) Initial query tree for Query 1 and (b) Query tree for Query 1 after internal node reordering
4.4 (a) Initial query tree for Query 2 and (b) Query tree for Query 2 after internal node reordering
4.5 Leaf node reordering algorithm
4.6 The function that finds subquery tree of a leaf node
4.7 The function that reorders the located subquery tree
4.8 The function that finds if there is a 'NOT-OR' type node in a tree
4.9 The function that orders leaf nodes
4.11 The function that sorts leaf nodes
4.12 The function that puts the elements to the leaf nodes
4.13 (a) Initial subquery tree for Query 1 and (b) Subquery tree for Query 1 after leaf node reordering
5.1 (a) Initial query tree for Query 1 and (b) Query tree for Query 1 after optimization
5.2 (a) Initial query tree for Query 2 and (b) Query tree for Query 2 ...

List of Tables

5.1 The statistics of the fact base
5.2 Leaf node reorder algorithm test results (msecs)
5.3 Query optimization algorithm test results (msecs)
5.4 Convergence to the optimal query tree; first test results (msecs)
5.5 Convergence to the optimal query tree; second test results (msecs)

1 Introduction

Interest in multimedia database systems has grown rapidly with the advances in computer technology. Research on content-based image retrieval by visual features (color, shape and texture) and keywords [4, 5] has progressed over time towards video databases dealing with spatio-temporal and semantic features of video data. First, the techniques devised for image retrieval were used for supporting content-based video retrieval. These techniques treated a video as a consecutive sequence of images ordered in time. Some video database systems such as VideoQ, KMED, QBIC and OVID [6, 7, 5, 8] were implemented. Querying video objects by motion properties has also been studied [16, 17, 18, 19].

The building blocks of multimedia database systems are the multimedia data model, multimedia storage management, the query interface, and query processing and retrieval. Data models used in multimedia Database Management Systems (DBMSs) are different from those used in conventional DBMSs, so new modeling techniques are required to represent the semantics of multimedia data. Besides, a multimedia storage manager is needed, and storage devices capable of storing large volumes of data must be supported to achieve better performance. The query interface in a multimedia database system must enable the user to construct well-defined queries easily. Query processing and retrieval is also important, since providing powerful querying facilities on multimedia data is a crucial issue.

... exact queries on conventional types of data, but querying multimedia databases requires additional techniques to support multimedia data types, like image, audio and video. Extensions to the conventional query languages are required that take into account the particular characteristics of multimedia data. In addition to these, different query optimization techniques need to be implemented and integrated into the system.

The success of a database system depends on the effectiveness of its query optimization module. The input to this module is some internal representation of a query given by the user. This representation is the query tree in our case. The aim of query optimization is to select the most efficient strategy to access the relevant data and answer the query. Let S be the set of all strategies (query trees) that can be used to answer a given query. Each member s of S has a cost c(s). The goal of any optimization algorithm is to find a member of S that has the minimum cost.
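Stated compactly, and using only the symbols S and c(s) defined above, the optimizer looks for a strategy that no other strategy beats on cost. The name s* is introduced here purely for illustration:

```latex
s^{*} = \arg\min_{s \in S} c(s),
\qquad \text{i.e.} \qquad
c(s^{*}) \le c(s) \quad \text{for all } s \in S .
```

The minimizer need not be unique; any strategy attaining the minimum cost satisfies the goal.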

Query optimization has been a challenging research area since the beginning of relational database management systems. A summary of the research efforts on query optimization and other related concepts in database systems can be found in [10].

In this thesis, we study the query optimization problem in multimedia database systems. Our work concentrates on reordering query trees when processing queries in a multimedia database system to achieve the minimum cost. We propose algorithms for reordering query trees. The goal of the optimization algorithms is to change the order of processing the subqueries contained in the query tree so that the parts that are more selective (i.e., result in fewer frames and/or objects) are executed first. The query optimization module contains two types of reorderings for query trees to ensure more efficient processing of queries. The first type is internal node reordering, which reconstructs the query tree by reordering the children of internal nodes. The second type is leaf node reordering, which restructures the query contents of the leaf nodes of the query tree. The query optimization algorithms are implemented as a part of the query processor.

The work done in this thesis constitutes a part of a video database system, BilVideo, developed by Dönderler et al. [1, 2, 3]. In this system, a rule-based spatio-temporal model for videos and a video query processor, which can answer spatial, temporal, trajectory, motion and object queries for videos, are proposed. The work done in this thesis is integrated into the query processor of BilVideo.

1.1 Organization of the Thesis

The remainder of the thesis is organized as follows. In Chapter 2, related work on multimedia query optimization is discussed. The video database system, into which the query optimization module is integrated, is described in Chapter 3. In Chapter 4, our query optimization algorithms are presented. Performance results are discussed in Chapter 5. Conclusions of our work and future research directions are given in Chapter 6. The fact base of the example database and the query sets ...

2 Related Work

Basic principles of query optimization in database systems are explained by Jarke and Koch [10]. In their paper, a wide variety of approaches are proposed to improve the performance of query processing, including logic-based and semantic transformations, fast implementation of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using a relational calculus representation of queries. In addition to these methods, nonstandard query optimization issues are also discussed in the paper. According to Jarke and Koch, the goals of query transformation are: (1) the construction of a standardized starting point for query optimization (standardization), (2) the elimination of redundancy (simplification), and (3) the construction of expressions that are improved with respect to evaluation performance (amelioration). The transformation rules for the general query expressions referenced in the paper are also valid for our query expressions.

Chaudhuri [13] focuses primarily on the optimization of SQL queries in relational database systems. According to the paper, the two key components of the query evaluation component of an SQL database system are the query optimizer and the query execution engine. The paper discusses the System-R optimization framework, the search space that is considered by optimizers, cost estimation and ... uses statistical summaries of data that have been stored. It also determines the statistical summary of the output data stream and the estimated cost of executing the operation, given an operator and the statistical summary for each of its input data streams. The idea of collecting statistical summaries for cost estimation is also used in our query optimization module.

The survey of query evaluation techniques for large databases by Graefe [11] describes query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query execution plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains. According to the survey, query optimization is a special form of planning that employs techniques from artificial intelligence such as plan representation, search (including directed search and pruning), dynamic programming and branch-and-bound algorithms.

Semantic query optimization for tree and chain queries by Sun and Yu [9] provides an effective and systematic approach for optimizing queries by appropriately choosing semantically equivalent transformations. Basically, there are two different types of transformations: transformations by eliminating unnecessary joins, and transformations by adding/eliminating redundant beneficial/nonbeneficial selection operations (restrictions). An algorithm is proposed by Sun and Yu to minimize the number of joins in tree queries. They claim that the important operations in semantic query optimization are the detection of a contradiction, the elimination of as many unnecessary joins as possible, and the addition/elimination of beneficial/nonbeneficial redundant restrictions.

Alternative plan generation methods for multiple query optimization by Menekse et al. [12] focus on generating a number of alternative plans in such a way that the sharing between queries is maximized and an optimal execution plan with minimal cost is obtained. They state that a global execution plan can be constructed by choosing one plan for each query and then merging these plans together. Two algorithms for alternative plan generation have been implemented, ... alternative plan generation is also proposed to eliminate useless alternative plans by introducing a sharing factor concept.

The paper by Soffer and Samet [14] presents optimization methods for processing of pictorial queries specified by pictorial query trees. Their optimization strategy for computing the result of the pictorial query tree is to change the order of processing individual query images in order to execute the parts that are more selective first. The selectivity of a pictorial query is based on matching selectivity, contextual selectivity, and spatial selectivity. Matching and contextual selectivity are computed based on the statistics stored as histograms in the database that indicate the distribution of classifications and certainty levels in the images. These histograms are constructed when populating the database. The selectivity of an individual pictorial query (leaf) is computed by combining these three selectivity factors. The query language used in their system has different characteristics from the query language that we use. Their query language includes only spatial relations in the pictorial query tree, and they reorder the tree according to the statistics stored for these spatial relations. Our query language has more complex features, enabling the user to query spatio-temporal relations that will be described in the next chapter. In the query optimization module of our system, fact base statistics are used to reorder spatial relations. In addition to this, reordering algorithms for other types of nodes, such as internal nodes that contain operators, are added.

Mahalingam and Candan propose techniques for performing query optimization in different types of databases, such as multimedia and Web databases, which rely on top-k predicates [15]. Top-k predicates are the k predicates that return the most relevant portion of all possible results. They propose an optimization model that takes into account different binding patterns associated with query predicates and considers the variations in the query result size, depending on the execution order. Their optimization model assigns a value (to be minimized) to all partial or complete plans in the search space. It also determines the output size of the data stream for every operator and predicate in the plan. So, the ... our optimization algorithm. The major difference of their optimization algorithm from ours is that the number of query results can also change depending on the query execution order in their work, whereas it is independent from the query ...

3 BilVideo: A Video DBMS

In this chapter, the video database system BilVideo [1, 2, 3], into which the work in this thesis is integrated, is described. BilVideo is a video database management system that supports spatio-temporal and semantic queries on video data. A spatio-temporal query may contain any combination of spatial, temporal, object-appearance, external-predicate, trajectory-projection and similarity-based object-trajectory conditions. The system handles spatio-temporal queries using a knowledge-base, which consists of a fact-base and a comprehensive set of rules implemented in Prolog, while utilizing an object-relational database to respond to semantic (keyword, event/activity, and category-based), color, shape and texture video queries. The organization of this chapter is as follows: The architecture of BilVideo is given in Section 3.1. The video query language of BilVideo is described in Section 3.2. The query types are presented in Section 3.3. Query processing issues in BilVideo are discussed in Section 3.4.

3.1 Video Database System Architecture

Figure 3.1 illustrates the overall architecture of BilVideo. The system is built on a client-server architecture and the users access the video database on the ...

Figure 3.1: BilVideo database system architecture. (Components shown: Users, Visual Query Interface, Web Client, Query Processor, Knowledge-Base, Extracted Facts, Fact-Extractor, Video Clips, Video Annotator, Feature Database, Object-Relational DBMS, Raw Video Database (File System); the arrows are labeled Query and Results.)

The query processor lies at the heart of the system. It is responsible for answering user queries in a multi-user environment. The query processor communicates with the object-relational database (Oracle) and the knowledge-base. Semantic data is stored in the Oracle database and fact-based metadata is stored in the knowledge-base. Video data and raw video data are stored separately. Semantic properties of videos used for keyword, activity/event and category-based queries on video data are stored in the feature database. These features are generated and maintained by a video annotator tool. The knowledge-base is used to answer spatio-temporal queries. The fact-base is generated by the fact-extractor tool.

The rules used for querying the video data, called query rules, have associated frame numbers. A second set of rules, called extraction rules, was also created to work with frame intervals to extract spatio-temporal relations from video data. Extracted spatio-temporal relations are converted to facts with frame numbers of the keyframes in the knowledge-base. These facts are used by the query rules for query processing.

3.2 Video Query Language

The query language has four basic statements for retrieving information:

    select video from all [where condition];
    select video from videolist where condition;
    select segment from range where condition;
    select variable from range where condition;

The target of a query is specified in the select clause. A query may return videos (video), segments of videos (segment), or values of variables (variable) with/without segments of videos where the values are obtained. Variables might be used for object identifiers and trajectories. If the target of a query is videos (video), the users may also specify the maximum number of videos to be returned as a result. The range of a query is specified in the from clause. The range may be either the entire video collection or a list of specific videos. Query conditions are given in the where clause.

• Supported operators: The query language supports logical and temporal operators to be used in query conditions. Logical operators are and, or and not. Temporal operators are before, meets, overlaps, starts, during, finishes and their inverse operators. In addition to these, the query language has a trajectory-projection operator, project, which is used to extract subtrajectories of video objects on a given spatial condition. The language also has the operators '=' and '!=', which can be used for assignment and comparison.

• Aggregate functions: The aggregate functions of the query language are average, sum and count. They take a set of intervals (segments) as input and return a time value in minutes for each video clip satisfying given conditions.

• External predicates: The query language has a condition type external, ... spatial predicates. If an external predicate is to be used, facts and/or rules related to the predicate should be added to the knowledge-base.
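The temporal operator names listed above coincide with the interval relations of Allen's interval algebra. The following is a minimal sketch of those relations on frame intervals, assuming that the operators follow the standard Allen definitions; the `Interval` type, the function names and the `inverse` helper are hypothetical and are not taken from BilVideo.

```python
from typing import Callable, NamedTuple


class Interval(NamedTuple):
    """A frame interval [start, end]; this representation is an assumption."""
    start: int
    end: int


Relation = Callable[[Interval, Interval], bool]


def before(a: Interval, b: Interval) -> bool:
    # a ends strictly before b starts
    return a.end < b.start


def meets(a: Interval, b: Interval) -> bool:
    # a ends exactly where b starts
    return a.end == b.start


def overlaps(a: Interval, b: Interval) -> bool:
    # a starts first, b starts inside a, and b ends after a
    return a.start < b.start < a.end < b.end


def starts(a: Interval, b: Interval) -> bool:
    # both start together, a ends first
    return a.start == b.start and a.end < b.end


def during(a: Interval, b: Interval) -> bool:
    # a lies strictly inside b
    return b.start < a.start and a.end < b.end


def finishes(a: Interval, b: Interval) -> bool:
    # both end together, a starts later
    return a.end == b.end and a.start > b.start


def inverse(relation: Relation) -> Relation:
    """Build an inverse operator (for example ibefore) by swapping the arguments."""
    return lambda a, b: relation(b, a)


ibefore = inverse(before)
iduring = inverse(during)
```

With these definitions, `before(Interval(1, 5), Interval(8, 12))` holds, and so does `ibefore(Interval(8, 12), Interval(1, 5))`, since an inverse operator is the same relation with its operands exchanged.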

3.3 Query Types

The query language supports spatio-temporal, semantic and low-level queries. The different query types that can be specified by the query language are as follows:

• Object queries: This type of queries may be used to retrieve objects, along with video segments where the objects appear.

• Spatial queries: This type of queries may be used to query videos by spatial properties of objects defined with respect to each other. Supported spatial properties for objects can be grouped into mainly three categories: topological relations that describe neighborhood and incidence in 2D-space, directional relations that describe order in 2D-space, and 3D-relations that describe object positions on the z-axis of the three dimensional space. There are eight distinct topological relations: equal, cover, covered-by, inside, touch, disjoint, overlap and contains. Directional relations are west, south, north, east, northwest, northeast, southwest and southeast. 3D relations are infrontof, behind, strictlyinfrontof, strictlybehind, touchfrombehind, touchedfrombehind and samelevel.

• Similarity-based object-trajectory queries: This type of queries may be used to query videos to find out the object and/or time interval of an object having a trajectory in the video in a given direction.

• Temporal queries: This type of queries is used to specify the order of occurrence for conditions in time.

• Aggregate queries: This type of queries may be used to retrieve statistical data about objects and events in video data. The three aggregate ...

• Low-level queries: This type of queries is used to query video data by visual properties such as color, shape and texture.

• Semantic queries: This type of queries is used to query video data by semantic features. In the system, videos are partitioned into semantic units, which form a hierarchy. This semantic video hierarchy contains three levels: video, sequence and scene. Videos consist of sequences, and sequences consist of scenes that need not be consecutive in time. With this semantic data model, three types of queries can be answered, which are video, event/activity and object.

3.4 Query Processing

Figure 3.2 illustrates how the query processor communicates with Web clients and the underlying system components to answer user queries. Figure 3.3 shows the phases of query processing for spatio-temporal queries. Web clients make a connection request to the query request handler, which creates a process for each request, passing a new socket for communication between the process and the Web client. Then, user queries are sent to the processes created by the query request handler. If the queries are specified visually, they are transformed into SQL-like textual query language expressions before being sent to the server. After receiving the query from the client, each process calls the query processor with a query string and waits for the query answer. When the query processor returns, the process communicates the answer to the Web client issuing the query and exits. The query processor first groups spatio-temporal, semantic, color, shape and texture query conditions into proper types of subqueries. Spatio-temporal subqueries are reconstructed as Prolog-type knowledge-base queries. Semantic, color, shape and texture subqueries are sent as SQL queries to an object-relational database. The query processor integrates the intermediate results and returns them to the query request handler, which communicates the final results to Web clients.

Figure 3.2: Web client - query processor interaction. (Components shown: Web Client (Java Applet), Query Request Handler (C++), Query Processor (C++); user queries and query result sets are exchanged between them.)

Figure 3.3: Query processing phases. (Query recognition phase: the lexer turns the query into tokens and the parser builds a parse tree; query decomposition phase: the query decomposer builds a query tree; query execution phase: the query executor produces the result set.)

The phases of query processing for spatio-temporal queries ...

1. Query recognition: The lexical analyzer partitions a query into tokens, which are passed to the parser with possible values for further processing. The parser assigns structure to the resulting pieces and creates a parse tree to be used as a starting point for query processing. This phase is called the query recognition phase.

2. Query decomposition: The parse tree generated after the query recognition phase is traversed in a second phase, which is called the query decomposition phase, to construct a query tree. The query tree is constructed from the parse tree by decomposing a query into three basic types of subqueries: Prolog subqueries (directional, topological, 3D-relation, external predicate and object-appearance) that can be directly sent to the inference engine, Prolog; trajectory-projection subqueries that are handled by the trajectory projector; and similarity-based object-trajectory subqueries that are processed by the trajectory processor. Maximal subqueries are subqueries that are formed by grouping Prolog-type predicates. A query is decomposed in such a way that a minimum number of subqueries are formed.

3. Query execution: The input for the query execution phase is a query tree. This query tree is traversed in postorder in the query execution phase, executing ... processed in this phase and final answers to user queries are formed after ...

4 Query Optimization

The aim of the query optimization algorithms designed and implemented for BilVideo is to process more selective subqueries earlier than the others. The algorithms restructure the initial query tree and construct an optimal query tree in which the more selective subqueries are executed earlier by the query processor. The query optimization process is outlined in Figure 4.1.

The query optimization process implemented during query execution has two basic parts, which are internal node reordering and leaf node reordering. In addition to these parts, the statistics collected for the video are read from a file before executing the leaf node reordering algorithm. These statistics are used to determine the selectivities of the relations in the condition part of the query. The selectivity of a relation is inversely proportional to the number of facts stored for that relation. The internal node reordering algorithm reorders the children of internal nodes: when the right child of an 'AND' node is more selective than the left child, the right child is moved to the left of its parent.

    InternalNodeReorder(querytree);
    ReadStatistics();
    LeafNodeReorder(querytree);

Figure 4.1: Query optimization process

The leaf node reordering algorithm deals with the leaf nodes. Every leaf node in the query tree has a content which stores the subquery to be executed. The leaf node reordering algorithm restructures these contents. It uses the subquery trees constructed for each of these contents during the construction of the initial query tree. This algorithm sorts the relations in the contents of the leaf nodes that are connected by 'AND' operators according to their selectivity. As a result of these reorderings, more selective operations are executed earlier than the others.
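The selectivity rule above (fewer stored facts means higher selectivity) can be illustrated with a small sketch. It assumes that the fact base statistics are available as a mapping from relation name to fact count; the function name, the dictionary layout and the numbers are hypothetical and are not taken from BilVideo.

```python
from typing import Dict, List


def order_by_selectivity(relations: List[str], fact_counts: Dict[str, int]) -> List[str]:
    """Return the relations with the most selective one (fewest facts) first.

    Selectivity is inversely proportional to the number of stored facts,
    so sorting by ascending fact count is the same as sorting by
    descending selectivity.
    """
    # Assumption of this sketch: a relation missing from the statistics
    # is treated as having zero facts.
    return sorted(relations, key=lambda name: fact_counts.get(name, 0))


# Made-up statistics for three relations joined by 'AND' in one leaf node.
fact_counts = {"west": 120, "overlap": 15, "inside": 40}
ordered = order_by_selectivity(["west", "inside", "overlap"], fact_counts)
```

Here `ordered` places `overlap` first and `west` last, so the relation with the fewest facts is evaluated before the others.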

This chapter is organized as follows: In Section 4.1, our query tree structure is explained. In Section 4.2, the internal node reordering algorithm is described. Finally, the leaf node reordering algorithm is presented in Section 4.3.

4.1 Structure of the Query Tree

In our multimedia database model, a query is represented by a query tree containing the spatio-temporal relationships between the data that is to be selected. The condition in the where clause of the query is kept in this query tree. The condition part can contain spatial relationships. Other functions that can take place in the condition part are object trajectory and project type query functions. Trajectory queries find out the object and/or time interval of an object having a trajectory in the video in a given direction. Project queries are used to extract subtrajectories of video objects on a given spatial condition. The boolean (logical) operators of the query language are and, or, not. The operators that can be included in a query are categorized into three types:

1. AND: and

2. NOT-OR: not, or

3. TEMPORAL: before, during, meets, overlaps, starts, finishes, and their inverse operators, ibefore, iduring, imeets, ioverlaps, istarts, ...

There are two types of nodes in the query tree: internal nodes that contain the operators defined above and leaf nodes that contain spatio-temporal subqueries. These subqueries have three types:

1. Plain Prolog Queries (PPQ): Spatial subqueries processed by Prolog,

2. Trajectory Queries (TRQ): Object-trajectory subqueries, and

3. Project Queries (PRQ): Project subqueries.
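The node and subquery types above suggest a simple binary tree representation. The sketch below is one possible way to model it, together with the postorder traversal that Section 3.4 mentions for query execution; the class names, field names and example subquery strings are hypothetical and do not reproduce the BilVideo implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class NodeType(Enum):
    # Internal node types (operators)
    AND = "AND"
    NOT_OR = "NOT-OR"
    TEMPORAL = "TEMPORAL"
    # Leaf node types (subqueries)
    PPQ = "PPQ"   # plain Prolog query
    TRQ = "TRQ"   # trajectory query
    PRQ = "PRQ"   # project query


@dataclass
class QueryNode:
    type: NodeType
    content: str = ""                      # operator name or subquery text
    left: Optional["QueryNode"] = None
    right: Optional["QueryNode"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def postorder(node: Optional[QueryNode]) -> List[QueryNode]:
    """Visit the left subtree, then the right subtree, then the node itself."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node]


# A two-leaf example tree; the subquery strings are placeholders.
tree = QueryNode(
    NodeType.AND, "and",
    left=QueryNode(NodeType.PPQ, "spatial subquery"),
    right=QueryNode(NodeType.TRQ, "trajectory subquery"),
)
visit_order = [n.content for n in postorder(tree)]
```

In postorder, both children of a node are visited before the node itself, and the left child comes first, which is why moving a more selective subquery to the left makes it run earlier.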

4.2 Internal Node Reordering Algorithm

Inthe querytree, theinternalnodes arereordered rst. Internalnodereordering

algorithmplaces the more selective nodes as left children of their parents, since

theleftchildofaparentisprocessed rst. Theproposedalgorithmiteratesonthe

query tree and restructures the tree to get the optimal internal node structured

query tree. The internal node reordering algorithmisgiven in Figure4.2.

Theinternalnodereorderingalgorithmiteratesonthequerytreeandreorders

the children of `AND'typed nodes suchthat:

 The `AND', `TEMPORAL', `PPQ', `PRQ', `TRQ' type child nodes must

beonleftiftheother childis`NOT-OR'type. Since`NOT-OR'typenodes

combineresults fromtwo di erent result sets, they are found out tobethe

least selective compared to the other nodes.

 The `AND'typechild nodes must be onleft if the other childis

`TEMPO-RAL'type. Thisisbecauseofthefactthat`AND'typenodesareprocessed

faster than the `TEMPORAL' type nodes.

 The `PPQ' type child nodes with zero global variables must be on left if

the other child is `PRQ' or `TRQ' type. This is because of the fact that

(28)

InternalNodeReorder(QueryNode qnode)
    // Process the nodes which have children both on left and right
    if (qnode->Left != NULL and qnode->Right != NULL)
    begin
        type=qnode->Type
        ltype=qnode->Left->Type
        rtype=qnode->Right->Type
        // Reorder the children of `AND' nodes
        if (type==AND)
        begin
            // `AND', `TEMPORAL', `PPQ', `PRQ', `TRQ' type child
            // nodes must be on left if the other child is
            // `NOT-OR' type
            if (ltype==NOT-OR and
                (rtype==AND or rtype==TEMPORAL or rtype==PPQ
                 or rtype==PRQ or rtype==TRQ))
                exchange(qnode->Left, qnode->Right)
            // `AND' type child nodes must be on left
            // if the other child is `TEMPORAL' type
            else if (ltype==TEMPORAL and rtype==AND)
                exchange(qnode->Left, qnode->Right)
            // `PPQ' type child nodes with zero global variables
            // must be on left if the other child is
            // `PRQ' or `TRQ' type
            else if ((ltype==PRQ or ltype==TRQ) and
                     ((rtype==PPQ) and (gvcount(qnode->Right)==0)))
                exchange(qnode->Left, qnode->Right)
            // `PRQ', `TRQ' type child nodes must be on left if
            // other child is `PPQ' type with global variables
            else if (((ltype==PPQ) and (gvcount(qnode->Left)>0))
                     and (rtype==PRQ or rtype==TRQ))
                exchange(qnode->Left, qnode->Right)
            // `PRQ' type child nodes must be on left
            // if the other child is `TRQ' type
            else if (ltype==TRQ and rtype==PRQ)
                exchange(qnode->Left, qnode->Right)
            // `TRQ' type child nodes with zero global
            // variables must be on left if the other
            // child is `TRQ' type with global variables
            else if ((ltype==TRQ) and (gvcount(qnode->Left)>0)
                     and (rtype==TRQ) and (gvcount(qnode->Right)==0))
                exchange(qnode->Left, qnode->Right)
        end
    end
    // call the function recursively for left and right subtrees
    if (qnode->Left != NULL)
        InternalNodeReorder(qnode->Left)
    if (qnode->Right != NULL)
        InternalNodeReorder(qnode->Right)

Figure 4.2: Internal node reordering algorithm

• The `PRQ', `TRQ' type child nodes must be on the left if the other child is `PPQ' type with global variables. This is because `PRQ' and `TRQ' type nodes were found to be more selective than `PPQ' type nodes with global variables.

• The `PRQ' type child nodes must be on the left if the other child is `TRQ' type. This is because the subquery in the `PRQ' node can have a variable to be used by the subquery contained in the `TRQ' node.

• The `TRQ' type child nodes with zero global variables must be on the left if the other child is `TRQ' type with global variables. This is because `TRQ' type nodes with zero global variables are more selective than `TRQ' type nodes with global variables.

The query tree is restructured using the above rules because the nodes placed on the left were found to be more selective in the experiments. The gvcount function in the algorithm returns the global variable count of a particular node.
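The six exchange rules can be condensed into a single predicate. The following is a minimal sketch in Python under stated assumptions, not the BilVideo C++ code: the Node class, its gv field (standing in for gvcount) and the function names are hypothetical, and the example tree mirrors the shape of a query with two temporal conditions, assuming that its inequality subquery has one global variable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    type: str                        # "AND", "NOT-OR", "TEMPORAL", "PPQ", "PRQ" or "TRQ"
    gv: int = 0                      # global variable count (assumed to be precomputed)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def should_swap(l: Node, r: Node) -> bool:
    """True if the right child should be moved to the left under the six rules."""
    lt, rt = l.type, r.type
    if lt == "NOT-OR" and rt in {"AND", "TEMPORAL", "PPQ", "PRQ", "TRQ"}:
        return True                  # rule 1: NOT-OR is the least selective
    if lt == "TEMPORAL" and rt == "AND":
        return True                  # rule 2: AND before TEMPORAL
    if lt in {"PRQ", "TRQ"} and rt == "PPQ" and r.gv == 0:
        return True                  # rule 3: PPQ without global variables first
    if lt == "PPQ" and l.gv > 0 and rt in {"PRQ", "TRQ"}:
        return True                  # rule 4: PRQ/TRQ before PPQ with global variables
    if lt == "TRQ" and rt == "PRQ":
        return True                  # rule 5: PRQ before TRQ
    if lt == "TRQ" and l.gv > 0 and rt == "TRQ" and r.gv == 0:
        return True                  # rule 6: TRQ without global variables first
    return False


def internal_node_reorder(node: Optional[Node]) -> None:
    """Reorder the children of every AND node in place, top-down."""
    if node is None:
        return
    if node.type == "AND" and node.left is not None and node.right is not None:
        if should_swap(node.left, node.right):
            node.left, node.right = node.right, node.left
    internal_node_reorder(node.left)
    internal_node_reorder(node.right)


# AND(TEMPORAL, AND(TEMPORAL, AND(PPQ with one global variable, PRQ)))
inner = Node("AND", left=Node("PPQ", gv=1), right=Node("PRQ"))
mid = Node("AND", left=Node("TEMPORAL"), right=inner)
root = Node("AND", left=Node("TEMPORAL"), right=mid)
internal_node_reorder(root)
```

Because every rule leads to the same action (an exchange), testing the rules in any order gives the same result as the else-if chain of the algorithm.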

4.2.1 Examples

Some query tree examples are given in this part. In each example, the initial query tree and the query tree after internal node reordering are shown.

Query 1:

select segment, X, Y

from video

where ((west(X,Y) and disjoint(X,Y) and X != Y)

or Z=project(X, [west(X,a)])) and

(west(X,Y) and X=car1 and appear(Y) and south(Y,X))

In the query tree, the children of the root `AND' node are exchanged since the type of the left child is `NOT-OR' and the type of the right child is `PPQ' in the initial query tree.

Figure 4.3: (a) Initial query tree for Query 1 and (b) Query tree for Query 1 after internal node reordering

Query 2:

select segment, X, Y

from video

where (west(X,Y) before disjoint(X,Y)) and

((appear(Y) before touch(X,Y)) and

(X != car1 and Z=project(X, [west(X,a)])))

Figure 4.4: (a) Initial query tree for Query 2 and (b) Query tree for Query 2 after internal node reordering

In the query tree, the children of the root `AND' node are exchanged since the type of the left child is `TEMPORAL' and the type of the right child is `AND' in the initial query tree. The children of the `AND' node which is a child of the root node are also exchanged since the type of the left child is `TEMPORAL' and the type of the right child is `AND'.

4.3 Leaf Node Reordering Algorithm

After the internal node reordering, the leaf nodes are reordered for each deepest internal node. Fact base statistics for each video are kept in a separate file. The number of facts of each spatio-temporal relation in the video is stored in this file, so the numbers of south, northwest, southwest, equal, cover, inside, touch, disjoint, overlap, infrontof, behind, strictlyinfrontof, strictlybehind, touchfrombehind, touchedfrombehind and samelevel facts are included in the file. These fact base statistics are used in the leaf node reordering algorithm. In this algorithm, the facts in the leaf nodes are sorted starting from the fact with the least number in the fact base statistics file to the fact with the largest number. `PPQ' and `PRQ' type leaf nodes are reordered according to these statistics. These leaf nodes contain maximal subqueries that can be directly sent to the inference engine, so subquery trees for these maximal subqueries must be constructed to reorder the leaf nodes. This construction is implemented within the query tree construction part. As a result, subquery trees for each maximal subquery in the `PPQ' and `PRQ' type leaf nodes are built and kept in a list data structure. The leaf node reordering algorithm is given in Figure 4.5.

This algorithm iterates on the query tree. The steps of the algorithm are as follows:

1. Find the `PPQ' and `PRQ' type leaf nodes.

2. Find the subquery trees of these nodes in the subquery list.

3. Reorder these subquery trees.

4. Get the content of the reordered subqueries.

5. Replace the contents of the leaf nodes with this content.

As can be seen from the algorithm, only the condition parts of the `PRQ' type leaf nodes are replaced. The functions used in the algorithm are explained below.

LeafNodeReorder(QueryNode qnode,QueryTree qtree)
    // Iterate on the tree if node is not null
    if (qnode != NULL)
    begin
        type=qnode->Type()
        queryid=qnode->getQID(INORDER)
        // locate `PPQ' and `PRQ' leaf nodes
        if (type==PPQ or type==PRQ)
        begin
            // find the subquery tree of
            // the nodes in subquery list
            tmpppq=FindPPQinList(qtree, queryid)
            // reorder the subquery tree
            reorderAlg(tmpppq->ppqnode)
            // get the reordered subquery
            subquery=getSubquery(tmpppq->ppqnode)
            // set the content of the node
            if (type==PPQ)
                set content of qnode as subquery
            else if (type==PRQ)
                set content of the condition part of
                qnode as subquery
        end
        // call the function recursively for left and right subtrees
        if (qnode->Left != NULL)
            LeafNodeReorder(qnode->Left,qtree)
        if (qnode->Right != NULL)
            LeafNodeReorder(qnode->Right,qtree)
    end

Figure 4.5: Leaf node reordering algorithm

FindPPQinList(QueryTree qtree, int queryid)
    // locate the subquery tree of the leaf node with
    // id=queryid in the subquery list
    tmpppq=qtree->headppq
    for (int i=1; i<qtree->ppqcount; i++)
        if (queryid != tmpppq->queryid)
            tmpppq=tmpppq->nextppq
    return tmpppq

Figure 4.6: The FindPPQinList function

The FindPPQinList function is used for locating the subquery tree of a particular leaf node in the subquery list (see Figure 4.6).

The reorderAlg function iterates on the subquery tree which is located in the subquery tree list and restructures this query tree (see Figure 4.7). This algorithm first locates the highest `AND' type node in the subquery tree. If this node has left and right children, the left child is `NOT-OR' type and the right one is `AND' type, it exchanges the left and right nodes. If the children are `PPQ' or `AND' type and there is no `NOT-OR' type node below these children, this subtree is called a maximal AND subtree and it is reordered according to the fact base statistics. If the children are `PPQ' or `AND' type and there is at least one `NOT-OR' type node below these children, the algorithm checks whether the right child is a maximal AND subtree. If it is, the algorithm exchanges it with the left child. If the algorithm locates a maximal AND subtree, it does not recurse because it has already reordered all the nodes in the subtree; otherwise it recurses.

The ThereIsNoOrNot function returns 0 if there is a `NOT-OR' type node in a tree and returns 1 otherwise, that is, if all the internal nodes are `AND' type (see Figure 4.8).
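The test that identifies a maximal AND subtree can be written compactly. The following is a minimal sketch in Python that mirrors the described behaviour of ThereIsNoOrNot; the Node class and the name has_no_not_or are hypothetical, and, as in the description, only the nodes below the root are inspected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    type: str                        # "AND", "NOT-OR" or "PPQ" in this sketch
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def has_no_not_or(root: Node) -> bool:
    """True if no node below root is of NOT-OR type."""
    for child in (root.left, root.right):
        if child is None:
            continue
        if child.type == "NOT-OR" or not has_no_not_or(child):
            return False
    return True


# An AND node whose descendants are only AND and PPQ nodes is a maximal AND subtree.
pure_and = Node("AND", Node("PPQ"), Node("AND", Node("PPQ"), Node("PPQ")))
# A NOT-OR node anywhere below the root disqualifies the subtree.
mixed = Node("AND", Node("PPQ"), Node("AND", Node("NOT-OR"), Node("PPQ")))
```

The sketch returns a Boolean instead of the 1/0 values used in the pseudocode; the logic is otherwise the same.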

The OrderLeafNodes function orders a maximal AND subtree. It first puts the leaf nodes into an array structure, sorts the array according to the fact base statistics and puts the leaf nodes back to the tree (see Figure 4.9).

The GetLeafNodes function gets the leaf nodes of a tree and puts the contents and global variable counts of the nodes into an array structure to be used in the sorting procedure (see Figure 4.10).

The SortLeafNodes function sorts the leaf nodes according to the fact base statistics. It orders the relations in increasing order of their statistics (see Figure 4.11). The getnum function gets the statistics of a particular relation from the statistics file of the video. After sorting the relations according to the statistics, the function puts the relations that query an inequality between any two objects in the video to the end of the order.

reorderAlg(QueryNode qnode)
    // Iterate on the subquery tree located
    // in the subquery tree list
    norecurse=0
    if (qnode != NULL)
    begin
        type=qnode->Type
        // locate the highest `AND' node on the subquery tree
        if (type==AND)
            if (qnode->Left != NULL and qnode->Right != NULL)
            begin
                ltype=qnode->Left->Type
                rtype=qnode->Right->Type
                // exchange left and right children
                // if the left child is `NOT-OR' type
                // and the right child is `AND' type
                if (ltype==NOT-OR and rtype==AND)
                    exchange(qnode->Left, qnode->Right)
                // If children are `PPQ' and `AND' typed and
                // there is no `NOT-OR' type node below these
                // children order the leaf nodes of this subtree
                // else if there is no `NOT-OR' type node in the
                // right subtree put this subtree to left
                else if ((ltype==AND and rtype==AND)
                         or (ltype==AND and rtype==PPQ)
                         or (ltype==PPQ and rtype==AND)
                         or (ltype==PPQ and rtype==PPQ))
                    if (ThereIsNoOrNot(qnode)==1)
                    begin
                        OrderLeafNodes(qnode)
                        norecurse=1
                    end
                    else if (ThereIsNoOrNot(qnode->Right)==1)
                        exchange(qnode->Left, qnode->Right)
            end
        // call the function recursively for left and right
        // subtrees if a maximal `AND' subtree is not located
        if (norecurse != 1)
        begin
            reorderAlg(qnode->Left)
            reorderAlg(qnode->Right)
        end
    end

Figure 4.7: The reorderAlg function

ThereIsNoOrNot(QueryNode root)
    // return 0 if there is at least one `NOT-OR'
    // type node in the tree return 1 otherwise
    if (root->Left != NULL)
    begin
        if (root->Left->Type==NOT-OR)
            return 0
        if (ThereIsNoOrNot(root->Left)==0)
            return 0
    end
    if (root->Right != NULL)
    begin
        if (root->Right->Type==NOT-OR)
            return 0
        if (ThereIsNoOrNot(root->Right)==0)
            return 0
    end
    return 1

Figure 4.8: The function that finds if there is a `NOT-OR' type node in a tree

OrderLeafNodes(QueryNode qnode)
    // get the leaf nodes of the maximal AND subtree
    // sort the leaf nodes according to the fact base statistics
    // put the leaf nodes back to the tree
    leafcounter=0
    GetLeafNodes(qnode,nodesarr)
    SortLeafNodes(nodesarr)
    leafcounter=0
    PutLeafNodes(qnode,nodesarr)

Figure 4.9: The OrderLeafNodes function

GetLeafNodes(QueryNode qnode,nodedata nodesarr[])
    // get the leaf nodes of the tree and put their contents
    // and global variable counts to the array nodesarr
    if (qnode->Left != NULL)
        if (qnode->Left->Type==PPQ)
        begin
            nodesarr[leafcounter].ncontent=qnode->Left->Content
            nodesarr[leafcounter].ppqflag=gvcount(qnode->Left)
            leafcounter++
        end
    if (qnode->Right != NULL)
        if (qnode->Right->Type==PPQ)
        begin
            nodesarr[leafcounter].ncontent=qnode->Right->Content
            nodesarr[leafcounter].ppqflag=gvcount(qnode->Right)
            leafcounter++
        end
    // call the function recursively for left and right subtrees
    if (qnode->Left != NULL)
        GetLeafNodes(qnode->Left, nodesarr)
    if (qnode->Right != NULL)
        GetLeafNodes(qnode->Right, nodesarr)

Figure 4.10: The GetLeafNodes function

SortLeafNodes(nodedata nodesarr[])
    // sort the leaf nodes according to the fact base
    // statistics
    for (i=1; i<leafcounter; i++)
    begin
        for (j=i; j>0 and getnum(nodesarr[j])
                  <getnum(nodesarr[j-1]); j--)
            exchange(nodesarr[j],nodesarr[j-1])
    end
    // put the relations that query an inequality
    // between any two objects in the video
    // to the end of the order
    for (i=0; i<leafcounter; i++)
        if ((nodesarr[i].ncontent.find("!=")) and
            (nodesarr[i].ppqflag>1))
        begin
            shift nodesarr left starting from i+1 to the end of the array
            put nodesarr[i] to the end of the array nodesarr
        end

Figure 4.11: The function that sorts leaf nodes

The PutLeafNodes function puts the elements of an array structure to the leaf nodes of a tree. So, the nodes of the unsorted tree are replaced with the sorted nodes (see Figure 4.12).
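The collect, sort and write-back cycle carried out on a maximal AND subtree can be illustrated as follows. This is a simplified sketch in Python, not the BilVideo code: it visits the leaves strictly from left to right in both directions, it ignores the special handling of inequalities, and the Node class and function names are hypothetical. The fact counts are the west, disjoint and appear values of the example video reported in Table 5.1.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass
class Node:
    type: str                        # "AND" or "PPQ" in this sketch
    content: str = ""
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def get_leaf_nodes(node: Optional[Node], out: List[str]) -> None:
    """Collect the contents of the PPQ leaves from left to right."""
    if node is None:
        return
    if node.type == "PPQ":
        out.append(node.content)
    get_leaf_nodes(node.left, out)
    get_leaf_nodes(node.right, out)


def put_leaf_nodes(node: Optional[Node], items: Iterator[str]) -> None:
    """Write the items back into the PPQ leaves in the same traversal order."""
    if node is None:
        return
    if node.type == "PPQ":
        node.content = next(items)
    put_leaf_nodes(node.left, items)
    put_leaf_nodes(node.right, items)


def order_leaf_nodes(root: Node, key: Callable[[str], int]) -> None:
    """Sort the leaf contents of a maximal AND subtree by the given key."""
    contents: List[str] = []
    get_leaf_nodes(root, contents)
    contents.sort(key=key)
    put_leaf_nodes(root, iter(contents))


FACT_COUNTS = {"west": 1055, "disjoint": 1682, "appear": 10234}


def fact_count(relation: str) -> int:
    # The relation name is the text before the first parenthesis.
    return FACT_COUNTS[relation.split("(")[0]]


subtree = Node(
    "AND",
    left=Node("AND", left=Node("PPQ", "appear(X)"), right=Node("PPQ", "west(X,Y)")),
    right=Node("PPQ", "disjoint(X,Y)"),
)
order_leaf_nodes(subtree, key=fact_count)
```

The shape of the tree is left untouched; only the contents of the leaves move, which is why no further recursion is needed once a maximal AND subtree has been ordered.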

PutLeafNodes(QueryNode qnode,nodedata nodesarr[])
    // put the elements of the array nodesarr to the
    // leaf nodes of the tree with the root qnode
    if (qnode->Left != NULL)
    begin
        if (qnode->Left->Type==PPQ)
        begin
            qnode->Left->setContent(nodesarr[leafcounter].ncontent)
            leafcounter++
        end
        PutLeafNodes(qnode->Left,nodesarr)
    end
    if (qnode->Right != NULL)
    begin
        if (qnode->Right->Type==PPQ)
        begin
            qnode->Right->setContent(nodesarr[leafcounter].ncontent)
            leafcounter++
        end
        PutLeafNodes(qnode->Right,nodesarr)
    end

Figure 4.12: The PutLeafNodes function

4.3.1 Examples

Some query examples are given in this part. The initial queries and the queries after leaf node reordering according to the fact base statistics are shown. The relations in the query examples are reordered assuming that (south facts < samelevel facts < west facts < overlap facts < disjoint facts < appear facts) in the fact base.

Query 1:

select segment, X, Y

from video

where (samelevel(X,Y) and appear(X) and overlap(X,Y))

or (appear(X) and west(X, Y) and disjoint(X,Y))

Query 1 after leaf node reordering:

select segment, X, Y

from video

where (samelevel(X,Y) and overlap(X,Y)

and appear(X)) or (west(X,Y) and

disjoint(X,Y) and appear(X))
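The reordering of the two conjunctions of Query 1 can be reproduced with a short sketch. The following Python fragment is an illustration under stated assumptions, not the BilVideo implementation: the function names are hypothetical, the fact counts are taken from the statistics of the example video (Table 5.1), which satisfy the ordering assumed above, and, as a simplification, every equality is placed first and every inequality last regardless of its global variable count.

```python
FACT_COUNTS = {
    "south": 206,
    "samelevel": 487,
    "west": 1055,
    "overlap": 1235,
    "disjoint": 1682,
    "appear": 10234,
}


def sort_key(relation: str):
    """Equalities first, then facts by increasing count, inequalities last."""
    if "!=" in relation:
        return (2, 0)
    if "=" in relation:
        return (0, 0)
    name = relation.split("(")[0].strip()
    return (1, FACT_COUNTS[name])


def reorder_conjuncts(relations):
    # sorted() is stable, so relations with equal keys keep their original order.
    return sorted(relations, key=sort_key)


first = reorder_conjuncts(["samelevel(X,Y)", "appear(X)", "overlap(X,Y)"])
second = reorder_conjuncts(["appear(X)", "west(X, Y)", "disjoint(X,Y)"])
print(" and ".join(first))
print(" and ".join(second))
```

The two printed conjunctions list the relations in the same order as the reordered form of Query 1.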

The initial subquery tree for Query 1 and the subquery tree for Query 1 after leaf node reordering, which are located in the subquery tree list, are shown in Figure 4.13.

Figure 4.13: (a) Initial subquery tree for Query 1 and (b) Subquery tree for Query 1 after leaf node reordering

The relations in Query 1 are reordered as shown in the second query, since samelevel facts < overlap facts < appear facts and west facts < disjoint facts < appear facts.

Query 2:

select segment, X, Y

from video

where disjoint(X,Y) and X != Y and west(X,Y)


Query 2 after leaf node reordering:

select segment, X, Y

from video

where X=car1 and south(Y,X) and west(X,Y)

and disjoint(X,Y) and appear(Y) and X != Y

The relations in Query 2 are reordered as shown in the second query, since south facts < west facts < disjoint facts < appear facts. The equality relations are executed at the beginning of the condition part and the inequality relations are executed at the end.

Performance Results

In this chapter, the performance results obtained for the query optimization algorithm are presented. Performance tests have been conducted on an example video that is extracted from television news. The tests have been carried out in a Linux environment using the query processor of BilVideo implemented in C++. The performance parameters that affect query optimization are as follows:

1. Size of the query: As the size of the query increases, the performance gain obtained by our strategy also increases. For small queries, there will be a small number of reorderings between the nodes, so the performance gain will be less than that with large queries.

2. Size of the video: The size of the video is another parameter affecting query optimization since the size of the fact base is directly related to the size of the video. The performance gain will increase if the size of the video increases.

The organization of this chapter is as follows: In Section 5.1, statistics of the fact base used are provided. Example facts from this fact base can be found in Appendix A. The performance test results are presented in Section 5.2. Example queries are discussed in Section 5.3.

5.1 Fact Base Statistics

The fact base of the example video is created using the fact extractor tool of BilVideo. The statistics of the video are given in Table 5.1. These statistics are used in the optimization algorithm to reorder the leaf nodes.

Table 5.1: The statistics of the fact base

Type of relation     Number
west                 1055
east                 1055
south                206
northwest            0
southwest            0
disjoint             1682
overlap              1235
inside               0
appear               10234
touch                9
touchfrombehind      37
strictlyinfrontof    184
infrontof            276
samelevel            487

5.2 Performance Results

Five query sets are used in the performance tests. The first query set is used for testing the leaf node reordering algorithm. The second set is used for testing the whole optimization algorithm. The third and fourth sets are constructed for testing the algorithm on different reorderings of the same query. Finally, the fifth set is used for testing the same query on different sizes of fact bases and result sets. The query sets can be found in Appendix B. The optimization overhead is given in the result tables, and the performance gain is defined in Formula 5.1. The first set of results is given in Table 5.2.

\[
\text{performance gain} = \frac{\text{proc. time w/o opt.} - \text{proc. time with opt.}}{\text{proc. time w/o opt.}} \tag{5.1}
\]

Table 5.2: Leaf node reorder algorithm test results (msecs)

query   time without opt.   time with opt.   optimization overhead   performance gain
1       310                 263              1.0                     0.15
2       1002                609              1.0                     0.39
3       512                 264              1.0                     0.48
4       490                 291              1.0                     0.41
5       508                 217              1.0                     0.57
6       423                 261              1.0                     0.38
7       2027                259              1.0                     0.87
8       752                 708              2.0                     0.06
9       303                 258              1.0                     0.15
10      2030                1603             3.0                     0.21
11      225                 214              1.0                     0.05
12      270                 215              1.0                     0.20
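As a quick check of Formula 5.1 against the reported values, the gain can be recomputed from the two processing times. The following Python sketch uses a hypothetical function name and three rows of Table 5.2; it assumes that the optimization overhead is not subtracted separately, which is consistent with the reported gains after rounding to two decimals.

```python
def performance_gain(time_without_opt: float, time_with_opt: float) -> float:
    """Relative reduction in processing time, as in Formula 5.1."""
    return (time_without_opt - time_with_opt) / time_without_opt


# (time without opt., time with opt., reported gain) for queries 1, 2 and 7
rows = [(310, 263, 0.15), (1002, 609, 0.39), (2027, 259, 0.87)]
for without_opt, with_opt, reported in rows:
    computed = round(performance_gain(without_opt, with_opt), 2)
    print(without_opt, with_opt, computed, reported)
```

A gain of 0 means that optimization did not change the processing time, and values close to 1 mean that almost all of the original processing time was saved.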

These results show that the leaf node reordering algorithm increases the performance of the query processor. The performance gain differs for each query in the set because it depends on the size of the query and the difference between the initial query tree and the optimal query tree. The sizes of the first, ninth, eleventh and twelfth queries are small, so their performance gains are at most 0.21. If the size of the query is small, the performance gain is also small compared to that of the larger queries.

The leaf node reordering algorithm reduces the processing cost because the relations in the leaf nodes are ordered starting from the relation with the smallest output to the relations with larger outputs. So the unbound variables in the nodes are first bound with smaller sets of values, and relations with constant parameters are executed earlier. This results in an increase in performance. The second set of results is given in Table 5.3.

Table 5.3: Query optimization algorithm test results (msecs)

query   time without opt.   time with opt.   optimization overhead   performance gain
1       690                 212              1.0                     0.69
2       958                 530              2.0                     0.45
3       532                 270              1.0                     0.49
4       327                 267              1.0                     0.18
5       644                 283              2.0                     0.56
6       639                 344              1.0                     0.46
7       545                 337              1.0                     0.38
8       274                 214              1.0                     0.22
9       261                 211              1.0                     0.19
10      985                 286              1.0                     0.71
11      302                 213              2.0                     0.29
12      845                 283              2.0                     0.67

These results show that the overall query optimization algorithm increases the query processing performance. The factors that affect the results obtained with the leaf node reordering algorithm discussed above also affect those with the whole optimization algorithm.

The query optimization algorithm reduces the processing cost because the more selective subqueries are processed before the less selective ones. For example, if the children of an `and' node are `or' and `and' type internal nodes, the `and' type child is processed before the other, which results in a considerable gain in performance.

As mentioned previously, the performance gain depends on the size and complexity of the query. Another factor affecting the performance is the difference between the initial query tree and the optimal query tree. The third and fourth performance tests are done using different reorderings of the same query. The query tree converges to the optimal query tree starting from the first query. The third result set, which uses a simple Prolog query, is given in Table 5.4. The fourth result set is given in Table 5.5.

Table 5.4: Convergence to the optimal query tree; first test results (msecs)

query   time without opt.   time with opt.   optimization overhead   performance gain
1       1327                256              2.0                     0.81
2       341                 256              2.0                     0.25
3       305                 255              1.0                     0.16
4       253                 253              1.0                     0.00

Table 5.5: Convergence to the optimal query tree; second test results (msecs)

query   time without opt.   time with opt.   optimization overhead   performance gain
1       1306                218              2.0                     0.83
2       1213                220              1.0                     0.82
3       663                 218              2.0                     0.67
4       647                 219              3.0                     0.66
5       563                 220              2.0                     0.61
6       345                 222              2.0                     0.36
7       324                 219              2.0                     0.32
8       219                 219              2.0                     0.00

These two result sets show that as the query tree converges to the optimal query tree, the performance gain of the optimization algorithm decreases. This also supports the correctness of the optimization algorithm.

The last performance test investigates the effect of the query result set size on performance gain. A query is selected and its result set size is decreased by decreasing the fact base size at each step. The results of this test are given in Table 5.6.

As can be seen from the performance results, when the size of the query result set decreases, the performance gain of the query does not change much, and it is between 0.64 and 0.71 for all result set sizes in Table 5.6.

Table 5.6: Query result set size parameter test results (msecs)

size of result set   time without opt.   time with opt.   performance gain
133                  2533                786              0.69
120                  2259                713              0.68
105                  2067                665              0.68
94                   2013                632              0.69
85                   1960                616              0.69
74                   1673                538              0.68
65                   1399                449              0.68
45                   1275                379              0.70
34                   1209                353              0.71
27                   830                 281              0.66
20                   688                 251              0.64
11                   669                 231              0.65
2                    650                 208              0.68

The performance test results show that the query optimization method implemented for BilVideo improves the performance of the query processor. Since the performance gain is observed to decrease when the query tree converges to the optimal query tree, it can be said that the reordering heuristics used by the algorithm are correct. In conclusion, processing the more selective subqueries contained in the internal nodes and leaf nodes of the query tree earlier than the others is shown to be very useful in optimizing query processing times in multimedia database systems.

5.3 Examples

Some queries selected from the set of queries used in the performance tests are discussed in this part. The initial query trees and the query trees after optimization are shown.

Query 1:

select segment, X, Y

from video

where (west(X,Y) and disjoint(X,Y) and X != car1

or Z = project(X,[west(X, car1)])) and (west(X,Y)

and T = project(X,[west(X, car1)]))

Figure 5.1: (a) Initial query tree for Query 1 and (b) Query tree for Query 1 after optimization

The initial query tree of Query 1 (Figure 5.1 (a)) is processed in 985 milliseconds and the optimized query tree (Figure 5.1 (b)) is processed in 286 milliseconds. So, the performance gain is 71%.

Query 2:

select segment, X, Y

from video

where (samelevel(X,Y) before disjoint(X,Y)) and

(infrontof(X,Y) and X != car1 and tr(X, [[west], [1]]))

The initial query tree of Query 2 (Figure 5.2 (a)) is processed in 845 milliseconds and the optimized query tree (Figure 5.2 (b)) is processed in 283 milliseconds. So, the performance gain is 67%.

Figure 5.2: (a) Initial query tree for Query 2 and (b) Query tree for Query 2 after optimization

Conclusions and Future Work

Query processing is essential for retrieving data from database management systems and has been explored in the last 30 years in the context of relational and object-oriented database management systems. Query optimization constitutes an important part of query processing, and it is a promising research area since the amount of data that can be managed by database systems is growing rapidly and new data types are becoming widely used. Also, new database management systems such as multimedia databases require new techniques for query processing and query optimization.

In this thesis, we have presented a query optimization method for video database systems, which was implemented on a particular system, BilVideo. The proposed optimization method has two parts: internal node reordering and leaf node reordering. The children of the internal nodes of the query tree of a given query are reordered using the internal node reordering algorithm, which places more selective children to the left of their parents. The contents of the Prolog and project type leaf nodes are reordered using the leaf node reordering algorithm, which makes use of statistical information to sort the relations forming the contents of the leaf nodes. Therefore, our optimization method reorders the query tree along two dimensions, which results in a considerable improvement in performance. The performance tests conducted on the query processor justify the use of internal node reordering and leaf node reordering.

Currently, the proposed optimization algorithms are used by a query processor which uses linear processing methods. As future work, the algorithms can be adapted to a parallel query processor, which can result in even better performance. Another direction for future work is the use of genetic algorithms in the query optimization of BilVideo, as they are becoming a widely used and accepted method for new and difficult optimization problems. Such a method must propose a fitness value function for the query trees in the solution space and adapt crossover and mutation operators to query trees.


Sample Fact Base for an Example Video

This is an example video containing 16,351 frames and 98 salient objects. Some salient objects in the video are tank1, car1, bodyguard1 and powell. The video is extracted from television news.

// Directional Relations
west(tank1,car1,259).
west(car1,car2,259).
west(tank1,car1,272).
west(car1,car2,272).
west(car2,car3,272).
west(tank1,car1,277).
west(car1,car2,277).
west(car2,car3,277).
west(car3,car4,277).
west(tank1,car1,280).
west(car1,car2,280).
west(car2,car3,280).
west(car3,car4,280).
west(car4,car5,280).
west(tank1,car1,282).
west(car1,car2,282).
west(car2,car3,282).
west(car3,car4,282).
west(car4,car5,282).
west(car1,car2,287).
west(car2,car3,287).
west(car3,car4,287).
west(car4,car5,287).
west(car1,car2,298).
west(car2,car3,298).
west(car3,car4,298).
west(car4,car5,298).
west(car1,car2,303).
west(car2,car3,303).
west(car3,car4,303).
west(car4,car5,303).
west(tank3,tank4,366).
west(tank3,tank4,386).
west(tank3,tank4,395).
west(tank3,tank4,408).
west(tank5,israelisoldier1,409).
west(tank5,israelisoldier1,435).
west(tank5,israelisoldier1,456).
west(palestinianofficer4,bodyguard2,484).
south(palestinianofficer1,officialcar,527).
south(palestinianofficer4,powell,527).
south(palestinianofficer3,officialcar,531).
south(palestinianofficer1,officialcar,531).
south(palestinianofficer4,powell,531).
south(palestinianofficer1,officialcar,535).
south(palestinianofficer4,powell,535).
south(palestinianofficer1,officialcar,542).
south(palestinianofficer4,powell,542).
south(palestinianofficer3,bodyguard1,542).
south(palestinianofficer1,officialcar,547).
south(palestinianofficer4,powell,547).
south(palestinianofficer3,bodyguard1,547).
south(palestinianofficer1,officialcar,550).
south(palestinianofficer4,powell,550).
south(palestinianofficer3,bodyguard1,550).
south(palestinianofficer4,officialcar,560).
south(palestinianofficer3,bodyguard1,560).
south(palestinianofficer4,palestinianofficer1,568).
south(palestinianofficer4,officialcar,568).
south(palestinianofficer3,bodyguard1,568).
south(palestinianofficer4,officialcar,572).
south(palestinianofficer3,bodyguard1,572).
south(palestinianofficer4,officialcar,578).
south(palestinianofficer3,bodyguard1,578).
south(palestinianofficer4,officialcar,579).

// Topological Relations
disjoint(car2,car5,303).
disjoint(car1,car5,303).
disjoint(tank5,israelisoldier1,409).
disjoint(tank5,israelisoldier1,435).
disjoint(tank5,israelisoldier1,456).
disjoint(palestinianofficer2,bodyguard1,484).
disjoint(palestinianofficer3,powell,484).
disjoint(powell,bodyguard2,484).
disjoint(palestinianofficer2,palestinianofficer4,484).
disjoint(palestinianofficer4,bodyguard2,484).
disjoint(palestinianofficer2,palestinianofficer1,484).
disjoint(bodyguard1,powell,484).
disjoint(bodyguard1,bodyguard2,484).
disjoint(palestinianofficer2,bodyguard2,484).
disjoint(palestinianofficer2,powell,484).
disjoint(palestinianofficer1,bodyguard2,484).
disjoint(palestinianofficer3,bodyguard2,484).
disjoint(palestinianofficer2,bodyguard1,492).
disjoint(palestinianofficer3,powell,492).
disjoint(palestinianofficer3,palestinianofficer4,492).
disjoint(bodyguard1,palestinianofficer4,492).
disjoint(powell,bodyguard2,492).
disjoint(palestinianofficer2,palestinianofficer4,492).
disjoint(palestinianofficer4,bodyguard2,492).
disjoint(palestinianofficer2,palestinianofficer1,492).
disjoint(bodyguard1,powell,492).
disjoint(bodyguard1,bodyguard2,492).
disjoint(palestinianofficer2,bodyguard2,492).
disjoint(palestinianofficer2,powell,492).
disjoint(palestinianofficer1,bodyguard2,492).
disjoint(palestinianofficer3,bodyguard2,492).
disjoint(palestinianofficer3,powell,498).
disjoint(palestinianofficer3,palestinianofficer4,498).
disjoint(bodyguard1,palestinianofficer4,498).
disjoint(powell,bodyguard2,498).
disjoint(palestinianofficer2,palestinianofficer4,498).
overlap(palestinianofficer4,powell,503).
overlap(palestinianofficer1,powell,503).
overlap(palestinianofficer4,palestinianofficer1,503).
overlap(palestinianofficer4,officialcar,503).
overlap(palestinianofficer3,palestinianofficer1,503).
overlap(palestinianofficer3,bodyguard1,503).
overlap(palestinianofficer3,officialcar,503).
overlap(palestinianofficer2,officialcar,503).
overlap(palestinianofficer1,officialcar,503).
overlap(officialcar,powell,503).
overlap(bodyguard1,officialcar,503).
overlap(palestinianofficer1,bodyguard1,503).
overlap(bodyguard2,officialcar,503).
overlap(palestinianofficer2,bodyguard1,512).
overlap(palestinianofficer3,powell,512).
overlap(palestinianofficer4,powell,512).
