Story Link Detection in Turkish Corpus

Academic year: 2021

(1)

Story Link Detection in Turkish Corpus

A. Can Polatkan, Güven Köse, Hamid Ahmadlouei, Yaşar Tonta

polatkan@informatik.uni-tuebingen.de, guvenkose@googlemail.com, hamid2026@googlemail.com, tonta@hacettepe.edu.tr

University of Tübingen, Center for Bioinformatics, Integrative Transcriptomics

Hacettepe University, Department of Information Management

Web Intelligence’13 · Atlanta, USA · 18.11.13

Session 6: Web Data Analysis · 11:30 - 11:45 · Tower Room 1204

(2)

Motivation

Story Link Detection in Turkish Corpus WI’2013

(3)

Information Retrieval (IR) systems aim to find information in documents across different environments and deliver it to interested users

(4)

Motivation

In recent years, IR systems have focused mainly on Topic Detection and Tracking (TDT)

TDT aims to

detect new, unreported stories
organize them temporally
link incoming news items with previously detected stories on the same topic
monitor news streams online until stories peter out

Chen et al.

(5)

Motivation

TDT tries to solve the following tasks:

First Story Detection
Story Clustering
Topic Tracking
Story Link Detection

(6)

First Story Detection

find the first story about the topic

(7)

Story Clustering

put stories of the same topic together

(8)

Topic Tracking

given a few stories on the topic, find the rest

(9)

Story Link Detection

determine whether two different stories are on the same subject

(10)

source: news360.com

(11)

source: news360.com

(12)

Motivation

Story link detection (SLD) algorithms play a key role in establishing links between stories discussing the same subject

Carrying out the SLD task successfully is expected to solve many problems in TDT

[Diagram: IR, T-IR, TDT, SLD]

Allan et al.

(13)

Methodology

(14)

TDT Test Collection

We performed a study on Turkish news items

The BilCol-2005 test collection contains 209,305 items from 5 different Turkish news sources

5,882 news items classified under 80 topics
203,423 items classified as unknown

(15)

Retrieval Models used in the study

Retrieval models used in SLD are similar to those of traditional IR systems

Boolean Model
Vector Space Model
Probabilistic Model
Language Model
Relevance Model

In this study, we used the Vector Space Model (VSM) and the Relevance Model (RM) to carry out the SLD tasks

(16)

Retrieval Evaluation

We assessed performance by computing

Precision
Recall
F-measure

Assumption: higher precision, recall and F-measure = better results

Van Rijsbergen et al., Rennie et al.
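The three measures can be sketched for the link detection setting as follows. This is a minimal illustration, not the authors' evaluation code: it assumes that links are unordered story pairs stored as tuples, and that the F-measure is the balanced (harmonic mean) form. The function names and the toy story ids are hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean); the balanced form is an assumption."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_links(predicted_links, true_links):
    """Return (precision, recall, F-measure) for predicted story-pair links."""
    predicted, truth = set(predicted_links), set(true_links)
    hits = len(predicted & truth)  # links that were predicted and are correct
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical toy data: each pair is (story_id, story_id).
true_links = {("s1", "s2"), ("s2", "s4"), ("s3", "s4"), ("s1", "s4")}
predicted_links = {("s1", "s2"), ("s1", "s3"), ("s2", "s4")}

precision, recall, f = evaluate_links(predicted_links, true_links)
print(f"precision={precision:.4f} recall={recall:.4f} F={f:.4f}")
```

The balanced form is at least consistent with the figures reported later in the talk: a precision of 0.3393 and a recall of 0.2642 give roughly 0.297, close to the reported F-measure of 0.2970.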

(17)

Vector Space Model

Documents and queries are represented as vectors of index terms

The similarity between these vectors indicates how well a document matches a query

The coefficients in the vectors express the importance of each term, that is, the extent to which it represents the documents and/or queries

(18)

Vector Space Model

Represent the query as a weighted tf.idf vector
Represent each document as a weighted tf.idf vector
Compute the cosine similarity score for the query vector and each document vector
Rank documents with respect to the query by score
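The four steps above can be sketched in plain Python. The weighting details are assumptions, since the slides do not give them: raw term counts for tf, a smoothed logarithmic idf, and the query story counted as one more item when collecting document frequencies. All function names and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build one sparse tf.idf vector (dict: term -> weight) per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed idf (an assumption): log((1 + N) / (1 + df)) + 1
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_tokens, docs_tokens):
    """Return (document index, score) pairs, highest cosine score first."""
    vectors = tfidf_vectors([query_tokens] + docs_tokens)
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scores = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already tokenized toy stories.
query = "deprem istanbul hasar".split()
docs = [
    "istanbul deprem hasar bina".split(),
    "futbol lig gol".split(),
    "deprem arama ekip".split(),
]
ranking = rank_by_cosine(query, docs)
print(ranking)
```

For link detection, a pair of stories would then be declared linked when its score exceeds a threshold; how that threshold was chosen in the study is not stated in the slides.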

(19)

Relevance Model

RM is an advanced version of the language model and is used extensively in SLD tasks

RM offers a new approach to estimating probabilities when the required training data are absent

The probability distributions were compared on the basis of Kullback-Leibler divergence to determine document similarity

(20)

Relevance Model

Represent the query as a probability distribution
Represent each document as a probability distribution
Compute the Kullback-Leibler divergence score for the query and each document
Rank documents with respect to the query by score
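The comparison step can be sketched as below. This is deliberately simpler than a full relevance model: it assumes plain unigram distributions with add-alpha smoothing over a shared vocabulary, so that every probability is non-zero and the divergence is defined. Function names, the `alpha` parameter and the toy tokens are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens, vocabulary, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {term: (counts[term] + alpha) / total for term in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[term] * math.log(p[term] / q[term]) for term in p)


def rank_by_kl(query_tokens, docs_tokens):
    """Return (document index, divergence) pairs, lowest divergence first."""
    vocabulary = set(query_tokens).union(*docs_tokens)
    query_model = unigram_model(query_tokens, vocabulary)
    scores = [
        (i, kl_divergence(query_model, unigram_model(tokens, vocabulary)))
        for i, tokens in enumerate(docs_tokens)
    ]
    # A smaller divergence means the two distributions are more alike.
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical, already tokenized toy stories.
query = "deprem istanbul hasar".split()
docs = [
    "istanbul deprem hasar bina".split(),
    "futbol lig gol".split(),
    "deprem arama ekip".split(),
]
ranking = rank_by_kl(query, docs)
print(ranking)
```

Note that the ranking is ascending here, because KL divergence measures dissimilarity; it is also asymmetric, so KL(query || document) and KL(document || query) generally differ.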

(21)

Testing

(22)

Test Collection

The BilCol-2005 collection is divided into training (1/3 of news items) and test (2/3 of news items) sets

Tests were carried out on the Turkish corpus with

3,922 news items with known topic titles
135,609 news items with unknown topic titles

(23)

TDT Test Collection

To identify the effect of the number of index terms on matching performance, the tests were repeated with 1, 2, 3, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 125, 150, 175, 200, 225, 250, 275, 300, 400, 500 and 1000 index terms

(24)

Discussion and Conclusion

(25)

Test Results of Vector Space Model

[Chart: Recall, Precision, F-measure]

Best Performance

Index Terms: 30
F-measure: 0.2970
Recall: 0.2642
Precision: 0.3393

(26)

Test Results of Relevance Model

[Chart: Recall, Precision, F-measure]

Best Performance

Index Terms: 4
F-measure: 0.1910
Recall: 0.1625
Precision: 0.2316

(27)

Test Results: VSM vs RM

The best VSM configuration outperforms the best RM configuration, with recall higher by 10.17 percentage points and precision higher by 10.77 percentage points
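The 10.17 and 10.77 figures match the absolute differences between the best-performance values reported on the two preceding result slides, not relative improvements. A short check, using only the numbers from those slides (the helper name is hypothetical):

```python
# Best-performance values as reported on the VSM and RM result slides.
best_vsm = {"recall": 0.2642, "precision": 0.3393}
best_rm = {"recall": 0.1625, "precision": 0.2316}


def point_gap(a, b, metric):
    """Absolute difference in percentage points, rounded to two decimals."""
    return round((a[metric] - b[metric]) * 100, 2)


recall_gap = point_gap(best_vsm, best_rm, "recall")        # 10.17
precision_gap = point_gap(best_vsm, best_rm, "precision")  # 10.77
print(recall_gap, precision_gap)
```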

(28)

Test Results: AND combination of VSM and RM

[Chart: Recall, Precision, F-measure]

Best Performance

Index Terms: 4
F-measure: 0.2216
Recall: 0.1504
Precision: 0.4183

(29)

Test Results: OR combination of VSM and RM

[Chart: Recall, Precision, F-measure]

Best Performance

Index Terms: 15
F-measure: 0.2641
Recall: 0.2762
Precision: 0.2531

(30)

Test Results: AND vs OR

The AND combination of the methods increased precision by 7.9 percentage points over the best single-method precision

The OR combination of the methods increased recall by 1.2 percentage points over the best single-method recall
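The slides do not say how the two methods were combined. One plausible reading, assumed here, is that AND keeps a link only when both models declare it and OR keeps a link when either model declares it. The sketch below uses hypothetical toy links to show why the first tends to favor precision and the second recall.

```python
def combine_and(vsm_links, rm_links):
    """Keep a link only if both models declared it (assumed meaning of AND)."""
    return set(vsm_links) & set(rm_links)


def combine_or(vsm_links, rm_links):
    """Keep a link if at least one model declared it (assumed meaning of OR)."""
    return set(vsm_links) | set(rm_links)


def precision_recall(predicted, truth):
    """Return (precision, recall) of predicted links against true links."""
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: pairs of story ids.
truth = {(1, 2), (1, 3), (2, 3), (4, 5)}
vsm_links = {(1, 2), (1, 3), (4, 6)}
rm_links = {(1, 2), (4, 5), (5, 6)}

vsm_scores = precision_recall(vsm_links, truth)
and_scores = precision_recall(combine_and(vsm_links, rm_links), truth)
or_scores = precision_recall(combine_or(vsm_links, rm_links), truth)
print(vsm_scores, and_scores, or_scores)
```

In this toy case the intersection is smaller but cleaner (higher precision, lower recall), while the union covers more true links at the cost of extra false ones. That is the same direction as the reported results, where AND gives the highest precision (0.4183) and OR the highest recall (0.2762).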

(31)

Conclusion

SLD, which has drawn special attention in TDT research, is applied for the first time to a Turkish corpus, using two different methods

VSM performs better than RM in identifying the similarities of news items on a Turkish Corpus

Retrieval performance of SLD algorithms can be increased to some extent by employing both VSM and RM models

(32)

Authors

Yaşar Tonta
Güven Köse
Hamid Ahmadlouei

(33)

Thank you for your attention!

Questions?

(34)

References

* please refer to the publication
