Information-based approach to punctuation

Bilge Say

Dept. of Computer Engineering and Information Science, Bilkent University
Bilkent, Ankara 06533, Turkey
Email: say@cs.bilkent.edu.tr

Punctuation marks play an important role in bringing out the meaning of a text. Recent computational work on punctuation in Natural Language Processing (NLP) has mostly followed Nunberg's pioneering work (Nunberg 1990), in which he bridged the gap between descriptive linguistic treatments of the actual usage of punctuation and prescriptive accounts by setting out the features of a "text grammar" for the orthographic sentence. NLP researchers subsequently showed that several grammars for syntactic parsing that incorporate punctuation reduce parse failures and parsing ambiguities (Briscoe 1996). Nunberg's approach to presenting punctuation (and other formatting devices) was also partially incorporated into Natural Language Generation systems. However, little work has examined how punctuation marks contribute semantic and discourse cues to a text, or whether those cues can be exploited computationally. The aim of this thesis is to analyze the semantic and discourse aspects of punctuation in an information-based framework and to draw out their computational implications for NLP systems. This should not only help NLP software developers make effective use of punctuation marks but may also reveal interesting linguistic phenomena associated with them.

Discourse Representation Theory (DRT) (Kamp and Reyle 1993) is taken as the theoretical framework of the thesis because DRT is a dynamic, information-based theory that deals with a range of semantic and discourse-related phenomena. In particular, Asher's extension of DRT (Asher 1993), namely Segmented Discourse Representation Theory (SDRT), whose constituents are called Segmented Discourse Representation Structures (SDRSs), proves valuable: SDRSs provide various devices for representing discourse structure, together with constraints on those representations for the resolution of abstract anaphora. The definition of SDRSs also includes precise logical definitions of discourse relations and a defeasible logic for inference.
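To make the idea of a dynamic, information-based representation more concrete, the following Python sketch builds a single DRS incrementally for the textbook discourse "A farmer owns a donkey. He beats it." It is a deliberately simplified illustration under stated assumptions, not Kamp and Reyle's construction algorithm and not SDRT: the DRS is flat (no subordinate boxes, so accessibility is trivial), the names `DRS`, `new_referent`, `add` and `resolve` are hypothetical, and pronouns are resolved with an assumed recency-plus-feature heuristic.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DRS:
    """Hypothetical flat DRS: a universe of discourse referents plus
    conditions over them. Subordinate DRSs, and therefore DRT's
    accessibility constraints, are deliberately left out."""

    referents: list[str] = field(default_factory=list)
    conditions: list[tuple[str, ...]] = field(default_factory=list)

    def new_referent(self) -> str:
        # Introduce a fresh referent x1, x2, ... into the universe.
        ref = f"x{len(self.referents) + 1}"
        self.referents.append(ref)
        return ref

    def add(self, *condition: str) -> None:
        # Conditions are tuples such as ("farmer", "x1") or ("own", "x1", "x2").
        self.conditions.append(tuple(condition))

    def resolve(self, feature: str) -> Optional[str]:
        # Assumed heuristic: pick the most recently introduced referent
        # that carries the pronoun's feature (e.g. "male", "nonhuman").
        for ref in reversed(self.referents):
            if (feature, ref) in self.conditions:
                return ref
        return None


# Each sentence updates the same DRS, so later sentences can refer back
# to referents introduced earlier (the "dynamic" aspect of DRT).
drs = DRS()

# "A farmer owns a donkey."
x = drs.new_referent()
drs.add("farmer", x)
drs.add("male", x)
y = drs.new_referent()
drs.add("donkey", y)
drs.add("nonhuman", y)
drs.add("own", x, y)

# "He beats it."
he = drs.resolve("male")
it = drs.resolve("nonhuman")
assert he is not None and it is not None
drs.add("beat", he, it)

print(drs.referents)       # ['x1', 'x2']
print(drs.conditions[-1])  # ('beat', 'x1', 'x2')
```

In Kamp and Reyle's construction a pronoun introduces its own discourse referent plus an equation such as u = x1; the sketch collapses that step and reuses the antecedent's referent directly, which is enough to show how information accumulates across sentences in one structure.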

So far, a preliminary study has shown how pieces of discourse containing punctuated sentences can affect discourse phenomena such as anaphora resolution, or behave contrary to the expectations of DRT or SDRT (Say and Akman 1996). A more detailed study of the usage of dashes, based on observations from several computerized English corpora, is being conducted (Say and Akman 1997). Sentences with dashes tend to favor certain discourse relations over others in specific ways, for example in their parenthetical usage. Moreover, dashed sentences have characteristic features with respect to anaphora resolution and the determination of focus and information structure.

Copyright © 1997, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

818 DOCTORAL CONSORTIUM

Future work will involve similar corpus-based studies of other punctuation marks (the semicolon, the colon, and parentheses) and the incorporation of the findings into a model of the semantic and discourse implications of punctuation.

Acknowledgments

This work is being carried out under the supervision of Varol Akman (Bilkent University). Many thanks to Akman, Ted Briscoe (Cambridge University), AAAI, and the Scientific and Technical Research Council of Turkey (TÜBİTAK) for support.

References

Asher, N. 1993. Reference to Abstract Objects in Discourse. Dordrecht, Netherlands: Kluwer.

Briscoe, T. 1996. The Syntax and Semantics of Punctuation and Its Use in Interpretation. In Punctuation in Computational Linguistics, 1-8. UCSC, Santa Cruz, CA: SIGPARSE 1996 (Post Conference Workshop of ACL96). http://www.cogsci.ed.ac.uk/hcrc/publications/wp-2.html.

Kamp, H., and Reyle, U. 1993. From Discourse to Logic. Dordrecht, Netherlands: Kluwer.

Nunberg, G. 1990. The Linguistics of Punctuation. Number 18 in CSLI Lecture Notes. Stanford, CA: Stanford University Press.

Say, B., and Akman, V. 1996. Information-Based Aspects of Punctuation. In Punctuation in Computational Linguistics, 49-56. UCSC, Santa Cruz, CA: SIGPARSE 1996 (Post Conference Workshop of ACL96). http://www.cogsci.ed.ac.uk/hcrc/publications/wp-2.html.

Say, B., and Akman, V. 1997. A Case for Punctuation within Discourse Representation. Manuscript.

Referanslar

Benzer Belgeler

Fatih Timurhan Mektebi ve Süleymaniye Medresesi'nde eğitim gören 1857 doğumlu Mehmet (Efendi), babası Haşan Efendi'nin baharat ve çiğ kahve satan küçük dükkaruna çırak

For the purpose of preventing occupational burnout, various stress management interventions have been shown to help improve employee health and wellbeing in the workplace and lower

Sülüsan mekteplerde muallimler tarafından her gün devam jurnali tutularak özürsüz üç gün mektebe devam etmeyen çocukların köylerde muhtar ve ihtiyar meclisine ve

• Doğal Dil İşleme, NLP (Natural Language Processing) olarak bilinen Yapay Zeka ve Dil Biliminin bir alt kategorisidir.. • Türkçe, İngilizce, Almanca, Fransızca gibi

To experiment with the model as detailed in section “6.3 Things to explore with the logistic equations” of Dynamic Ecology you need to change values of the

Five-year intervals were used to draw the timeline, and all the items for each slice are shown in Fig. The distribution of the categories can be evaluated in four basic clusters,

The model forecasts the time, the place, the type and the reason of the possible accidents which could happen in future time, and in order to forecast these values it analysis

Kasrı, Yeşil Ev, Fenerbahçe Parkı, Soğukçeşme Sokağı, Malta Köşkü, Cedid Mehmet Efendi Medresesi, Soğukkuyu Medresesi, Büyükada Kültür Evi, Büyükada,