TRANSLATION OF ‘WHAT TO EXPECT FROM NEURAL MACHINE TRANSLATION: A PRACTICAL IN-CLASS TRANSLATION EVALUATION EXERCISE’



HATİCE EBRAR KUL

2020

Undergraduate Thesis

İstanbul, 2020

İSTANBUL 29 MAYIS UNIVERSITY
FACULTY OF LETTERS

TRANSLATION OF ‘WHAT TO EXPECT FROM NEURAL MACHINE TRANSLATION: A PRACTICAL IN-CLASS TRANSLATION EVALUATION EXERCISE’


Undergraduate Thesis

TRANSLATION OF ‘WHAT TO EXPECT FROM NEURAL MACHINE TRANSLATION: A PRACTICAL IN-CLASS TRANSLATION EVALUATION EXERCISE’

HATİCE EBRAR KUL

DEPARTMENT OF TRANSLATION STUDIES

İstanbul 29 Mayıs University, İstanbul

June 2020


TRANSLATION OF ‘WHAT TO EXPECT FROM NEURAL MACHINE TRANSLATION: A PRACTICAL IN-CLASS

TRANSLATION EVALUATION EXERCISE’

Hatice Ebrar Kul

Supervisor: Nilüfer Alimen

İstanbul 29 Mayıs University, Faculty of Letters

Prepared as an UNDERGRADUATE THESIS in the Department in accordance with the Undergraduate Thesis Regulations


TABLE OF CONTENTS

PREFACE
1. INTRODUCTION
2. AIM OF THE PROJECT
3. PREPARATION PROCESS
   3.1 Text Selection
   3.2 Terminology
   3.3 Methodology
4. COMMENTARY
5. CONCLUDING REMARKS
REFERENCES
6. APPENDICES
   APPENDIX I
   APPENDIX II
   APPENDIX III


DECLARATION

I hereby declare that this thesis has been written in accordance with the rules of scientific ethics; that the works of others, where used, have been cited in accordance with scientific norms; that no falsification has been made in the data used; and that no part of this thesis has been presented as another thesis study at this or any other university.

HATİCE EBRAR KUL 02.06.2020


PREFACE

Before starting, I would like to express my gratitude to Asst. Prof. Dr. Nilüfer Alimen for supervising this thesis, for paving the way towards a better outcome, and for helping me finalise my work meticulously.

I would also like to acknowledge how hard and overwhelming this process must have felt because of the outbreak; I appreciate everyone’s hard work and congratulate them on completing the process without further delay.

Lastly, I want to thank my family and friends, who supported me wholeheartedly.


1. INTRODUCTION

It is clear that the world is undergoing rapid technological change, and those who can keep pace with it will remain unshaken.

Nowadays, the field of translation is also associated with technology more closely than ever before.

As we all know, machine translation is now a widely used tool in the modern world; to some translators it is something to be avoided, while to others it is the key to new opportunities.

For translators, new technologies such as quality assurance tools offer many ways to improve the quality of target texts, and no one can underestimate their aid. That is why I believe that staying away from technology would only harm translators themselves.

On the one hand, if translators consider themselves users of these newly developed technologies and tools, there is no doubt that they will succeed. On the other hand, if they fear the idea of being replaced by machine translation or another tool and refuse to be associated with them, they will presumably fall behind.

So, what should we do to reduce these fears and encourage translators to embrace these achievements and use them to their advantage? The article translated for this project enables us to compare the latest technology with the previous one and to understand the opportunities that could be exploited in the near future.

Since the main points of the article are statistical machine translation and neural machine translation, I would like to give a short briefing about them.


Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora (Koehn 2009, 27).

Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modelling entire sentences in a single integrated model (Kalchbrenner and Blunsom 2013, 1700–1709).
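This definition can be made concrete with a toy sketch: an NMT decoder scores a target sentence as the product of conditional token probabilities given the source and the tokens produced so far (the chain rule). The probabilities below are invented for illustration; a real system computes them with a neural network, not a lookup table.

```python
import math

# Hypothetical conditional probabilities for a two-token target sentence:
# P(t1..tn | source) = product of P(ti | t1..t(i-1), source).
token_probs = {
    ("makine",): 0.40,             # P("makine" | source)
    ("makine", "çevirisi"): 0.75,  # P("çevirisi" | "makine", source)
}

def sentence_log_prob(tokens, probs):
    """Sum the log-probability of each token given its prefix (chain rule)."""
    total = 0.0
    for i in range(len(tokens)):
        prefix = tuple(tokens[: i + 1])
        total += math.log(probs[prefix])
    return total

lp = sentence_log_prob(["makine", "çevirisi"], token_probs)
print(round(math.exp(lp), 3))  # joint probability 0.40 * 0.75 = 0.3
```

Working in log space, as above, is standard practice because multiplying many small probabilities underflows floating-point numbers.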

Before the statistical models, the machine translation paradigms were rule-based and example-based; after them came the neural paradigm. Yet the statistical and neural paradigms cannot really be separated, because the neural paradigm can be considered a hybrid form of statistical machine translation. It is therefore understandable that these paradigms are hard to teach to trainees and students in the abstract, which makes hands-on practice a reasonable and logical idea.

Thanks to some machine translation classes I attended in the previous year, I was able to grasp the main points and ideas implied in the article. As a student in the translation studies department, I was not a total stranger to the subject but rather well informed about it.


2. AIM OF THE PROJECT

First, I would like to explain why I chose this article for my dissertation.

I was not really aware of the technologies and tools used in translation until I started attending technology classes at university and learned about specialised technologies that serve as translators’ aids, such as CAT tools. At first, I thought that machine translation and similar technologies would make human translators obsolete, but my mind took a big turn when I later found myself getting significant help from them. Perhaps they were not so bad after all, and could actually help translators become faster and more available in terms of jobs and workflows. That was one of the biggest reasons why I chose this article.

The article tries to identify the reasons why translators act diffidently towards machine translation, and as I see it, this is completely relatable to translators’ present-day worries. O’Brien suggests that the ‘increasing technologization of the profession is not a threat, but an opportunity to expand skill sets and take on new roles’ (2012, 118). I also believe that, instead of fearing the new technology, it is better to try to understand it.

A good way to understand the new technology is to compare it with the older ones. In the article, one can see both where the MT paradigms differ and where they produce the same outcomes.


3. PREPARATION PROCESS

This part describes some important steps that I followed before the translation.

3.1 Text Selection

Before selecting the article to be translated, I learned the criteria that needed to be considered in order to plan smoothly. After some time spent collecting articles related to our field, I discussed them with my thesis advisor and selected the most suitable one among my findings: What to expect from Neural Machine Translation: a practical in-class translation evaluation exercise by Joss Moorkens, published in the journal The Interpreter and Translator Trainer. The article is written in clear language, without ambiguity or self-contradictory meanings, and is therefore fluent and readable.

After downloading the article as a PDF, I converted it to a Word file in order to count the characters. It contains approximately 35,000 characters without spaces and is 15 pages long, including its cover and references.
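Such a "characters without spaces" count is also easy to reproduce without a word processor. A minimal sketch, using a stand-in sentence rather than the actual article text:

```python
# Stand-in text; in practice this would be the full exported article.
text = "Machine translation is currently undergoing a paradigm shift."

# Count every non-whitespace character, matching the word processor's
# "characters (no spaces)" statistic.
chars_without_spaces = sum(1 for ch in text if not ch.isspace())
print(chars_without_spaces)
```

Using `str.isspace()` rather than comparing against `" "` also excludes tabs and newlines, which matters for multi-page documents.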

3.2 Terminology

Since the text deals with a field-specific subject, it was important to examine the terms carefully in order to build a coherent context.


I used an online term extractor to get a full terminology list. Thanks to my existing knowledge of translation technologies, I was not a total stranger to the terms, and examining them was rather easy.

This examination also continued during the translation process, because building an error-free term list requires a degree of continual re-checking.

For example, I needed to change the abbreviations of some terms once I realised how they are generally used. At first, I thought I would use NMT for neural machine translation, but it turned out that the Turkish abbreviation NMÇ was also acceptable. This showed that checking the term list alone was not sufficient, so I continually re-checked related words and abbreviations together.

Equally important, I think, is keeping track of the same words in the same contexts. Maintaining the consistency of the source text in the target text makes it much more readable and fluent.

3.3 Methodology

Before starting the translation, I did some quick research on the old and new machine translation paradigms and tried to recall what I had learned the previous year. Thanks to numerous articles on the internet, the detailed research on the machine translation paradigms was a success. Finding parallel texts for the article was also rather easy, given how quickly awareness of translation technologies is spreading today. After gathering the required information about the terminology, the tools, and the technological details, I tried to bring all of these together and build a consistent context in order to produce a translation as competent as the source text. When we examine the source text, it is clear that its word choice creates smooth transitions within and between its topics.


4. COMMENTARY

In this part, I will give some information about the translation process, the problems I faced during it, and how I dealt with them.

The article lets us observe different cohorts trying statistical and neural machine translation and comparing them in an in-class exercise. Using different language pairs was also a plus for the project, I think, because limiting it to only a small selection would have weakened it. From Spanish to Chinese, participants had the opportunity to see the results and outputs of both statistical and neural machine translation.

For this in-class exercise, the texts to be selected before the practice were left to the cohort groups, meaning that the individuals picked the texts that were going to be put into the machine translation systems. Participants thus chose their texts freely, with no restrictions other than technical requirements such as a character limit and word count.

The cohort groups chose their texts mainly from Wikipedia, and there was a theory about this. Briefly: as students put their texts into machine translation, the results would differ according to the order. If statistical machine translation is used first, it will possibly produce a more accurate output, because of the nature of Wikipedia articles: a subject starts with a plain explanation and moves on to more complicated and detailed descriptions. I strongly agree that using neural or statistical machine translation first would change the findings.


Even though the order might pose a problem, the findings mainly tended to favour neural machine translation, and they are all supported in the article with examples from students’ notes and similar material.

Returning to the core issues of the translation, one of the main problems was abbreviation. Since these are relatively new terms, I thought that using their English abbreviations would suffice; for instance, I initially planned to use SMT for statistical machine translation and NMT for neural machine translation. But when I learned that the Turkish forms are now also common, I changed them to İMÇ (istatistiksel makine çevirisi) and NMÇ (nöral makine çevirisi). For TQA tools it was really hard to decide, so I asked people knowledgeable about the issue and did some research. What I found was that no form has been settled yet; usage differs from one person to another. In Turkish, TQA is generally rendered as ‘çeviri kalite değerlendirmesi’ rather than ‘çeviri kalite kontrolü’, abbreviated ÇKD, so I decided to use that form. I could have used the English form directly, but since the translated form is commonly known in Turkish, it was better to use it; that is why I went with ÇKD. For CAT tools I used ‘bilgisayar destekli çeviri araçları’ and for MT ‘makine çevirisi’, abbreviated BDÇ and MÇ respectively.

Even if it does not look like it, punctuation was a real challenge for me in some specific places. For example, the title of the source text is punctuated as What to expect from Neural Machine Translation: a practical in-class translation evaluation exercise, with just a colon, and after the colon only lower-case letters are used, in contrast to the initial capitals. At first, I did not alter the original form and used it as it was, but then, to make the target text more natural in the translated version, I rendered it as: Nöral Makine Çevirisinden Ne Beklemeli? Sınıf İçi Uygulamalı Bir Çeviri Değerlendirme Çalışması. There were also some French phrases from the exercise findings. After Google Translate produced some mismatches, I overcame this ‘foreign words problem’ with the help of my thesis advisor. Note that in the source text these French phrases and sentences were left untranslated, in their original form. At first, I thought that giving their Turkish meanings in footnotes would work, but in the end we decided to give the Turkish meanings in parentheses, because this usage suits a translated text better and keeps it smooth. For instance: “(İMÇ çıktısında bulunan) hata tiplerini düzeltmenin ise zor olmadığını ileri sürmüştür. Daha sık karşılaşılan ‘cinsiyet uyumunu düzeltme’ [ör. par un camarade > par une camarade (bir yoldaş tarafından > -kadın- bir yoldaş tarafından), nöral çıktıdan] ve ‘kelime dizilimindeki sorunları düzenlemeyi’ [ör. et (supréme) extremement confiant en soi meme (de soi-confiant) (ve (yüce) kendinden aşırı derecede emin (kendine güvenen)), nöral çıktıdan] ‘oldukça kesin kelimeleri ve zaman problemlerini’ [ör. Les sentiments pour l’un l’autre n’est (sont) jamais explicités (Birbirlerine karşı duyguları asla açıklığa kavuşturulmamıştır), nöral çıktıdan] bulmaktan çok daha kolay olduğunu düşünen öğrenciler çoğunluktadır”, and so on.


5. CONCLUDING REMARKS

The birth of neural/hybrid machine translation goes back to 2013, and in 2020 it continues to grow. As everyone knows, there is a widespread belief that these technologies will replace human translators. I think this was largely media exaggeration, because machine translation still lacks human-level judgement and needs a human brain to make those choices. So where is neural machine translation heading? I believe that offering a hands-on experience, as in the article, was a really clever way to show what is actually going on in the industry, and it would also help answer some of the questions on translators’ and trainees’ minds.

The stated aim is to empower translators and trainees to use translation technologies instead of fearing them. It is only natural that something one never gets to know will eventually be avoided, yet it is nearly impossible to run away from this technology. I believe we should also encourage other translators to use the technologies the new era offers and to make the most of them; that is how a field reaches an improved state, and the same goes for translation. I also hope that more people read articles like this, learn more about translation technologies, and see that translators are still needed alongside machine translation; I am rooting for the misconceptions about machine translation to fade, so that they no longer affect translators and translation students.


REFERENCES

Kalchbrenner, Nal, and Philip Blunsom. 2013. “Recurrent Continuous Translation Models.” In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 1700–1709.

Koehn, Philipp. 2009. Statistical Machine Translation. Cambridge: Cambridge University Press.

O’Brien, Sharon. 2012. “Translation as Human-Computer Interaction.” Translation Spaces 1: 101–122.


6. APPENDICES

APPENDIX I

Source Text

“What to expect from Neural Machine Translation: a practical in-class translation evaluation exercise”

ABSTRACT

Machine translation is currently undergoing a paradigm shift from statistical to neural network models. Neural machine translation (NMT) is difficult to conceptualise for translation students, especially without context. This article describes a short in-class evaluation exercise to compare statistical and neural MT, including details of student results and follow-on discussions. As part of this exercise, students carry out evaluations of two types of MT output using three translation quality assurance (TQA) metrics: adequacy, post-editing productivity, and a simple error taxonomy. In this way, the exercise introduces NMT, TQA, and post-editing. In our module, a more detailed explanation of NMT followed the evaluation. The rise of NMT has been accompanied by a good deal of media hyperbole about neural networks and machine learning, some of which has suggested that several professions, including translation, may be under threat. This evaluation exercise is intended to empower the students, and help them understand the strengths and weaknesses of this new technology. Students’ findings using several language pairs mirror those from published research, such as improved fluency and word order in NMT output, with some unpredictable problems of omission and mistranslation.

Introduction

Teaching third-level students about translation technology is complicated by the dynamic technological environment, with standard practices regularly overridden by new and updated tools and technologies. Where these technologies can help translators to ‘maintain high levels of productivity and offer value-added services’ (Olohan 2007, 59), it is incumbent on translation trainers to ensure that students are made aware of their usefulness in order to maximise their agency as translators, and to fulfil industry employment needs. One particularly disruptive change in recent years has been the inception of neural machine translation (NMT). Since the early 2000s, statistical MT (SMT) systems, trained on human translations, have become commonplace. Although the first wave of NMT publications appeared relatively recently (Bahdanau, Cho, and Bengio 2014; Cho et al. 2014), NMT has quickly gained a foothold in both academia and industry by outperforming statistical systems in competitions (Bojar et al. 2016) and in well-publicised research and deployment (Wu et al. 2016). NMT is also a statistical paradigm, with systems trained on human translations. Nonetheless, claims that NMT is, according to the title of a 2016 Google research paper, ‘bridging the gap between human and machine translation’ (Wu et al. 2016), or that Microsoft have achieved ‘human parity on automatic Chinese to English news translation’ (Hassan et al. 2018), have led to a great deal of media hyperbole about the potential uses of NMT and related displacement of human translators (Castilho et al. 2017a). In conjunction with widespread technological determinism in media reports about machine learning, this may lead translators to fear NMT, and engender a sense of powerlessness due to a perception that the technology ‘gets precedence and is inevitable’ (Cadwell, O’Brien, and Teixeira 2017, 17). O’Brien, however, suggests that the ‘increasing technologisation of the profession is not a threat, but an opportunity to expand skill sets and take on new roles’ (2012, 118). The points of potential intervention in the SMT preparation, training, and post-editing processes that may benefit from the skills of the translator, as highlighted by Kenny and Doherty (2014), still hold true for NMT. With this point in mind, I contend that helping students to learn about new technologies, including NMT, is a positive and empowering intervention. MT tends to be perceived negatively by many translators due to low quality expectations, imposition of the technology without any choice to opt out, and fear of being replaced (Cadwell, O’Brien, and Teixeira 2017). It was originally envisaged that MT would replace human translators (Hutchins 1986), and although that intention may not still hold true (Way 2018), the perception remains. As MT has moved towards statistical methods, the MT process has become more difficult to explain. NMT can be particularly difficult for students and scholars to conceptualise, not least because neural networks are complex and NMT output can be unpredictable (Arthur, Neubig, and Nakamura 2016).1 This makes it all the more important for translation students to familiarise themselves with and demystify NMT output, and to become aware that, despite the hype about machine learning, NMT output has many weaknesses as well as strengths. This article reports a practical in-class translation evaluation exercise to comparatively evaluate statistical and neural MT output for one language pair. It was set following a large-scale comparative evaluation carried out as part of the TraMOOC EU-funded project (Castilho et al. 2017b), and scaled down so that students could complete the evaluations within two hours. The exercise was designed based on the constructivist paradigm so that students could independently learn about – and build an ‘evaluative awareness’ (Massey 2017) of – a cutting-edge translation technology, while also gaining experience at using several translation quality assessment (TQA) techniques. For most students, this was also a first experience of MT post-editing. The exercise was carried out with second year undergraduate students of translation and repeated with a cohort of Translation Studies PhD students in another university. Students’ reports showed many insights into what may be expected of each technology. A lecture on MT, incorporating NMT and published NMT evaluations from research and from industry, followed one week after the in-class comparative task for the undergraduate cohort, providing context for their findings, and explaining (at the level of concepts rather than mathematical equations) the processes of creating and training an NMT system that lead to the type of output that they had evaluated. Their findings meant that they had personal experience of the systems discussed, and also mirrored the findings of the large-scale, randomised TraMOOC evaluation across four language pairs, along with a number of other recently published comparative evaluations. The following section details the technological and industrial context of the emergence of NMT, including summarised results from previously published evaluations, and the potential employability benefits of learning about NMT.

Thereafter, the evaluation task is described, with results from both cohorts summarised, followed by some in-class feedback and discussion points that were raised by the exercise.

Technological and industrial context

Although the application of neural networks to speech recognition became commonplace some years before, the first published NMT papers appeared in 2014.2 Researchers began to create single neural networks, tuned to maximise the ‘probability of a correct translation given a source sentence’ (Bahdanau, Cho, and Bengio 2014, 1) based on contextual information from the source text and previously produced target text. New techniques were developed to cope with variable sentence lengths and to help with translation of unknown source words by breaking words into subword particles (Sennrich, Haddow, and Birch 2016), improving the quality of NMT output. When NMT systems were entered into competitive MT environments in 2016, they scored above SMT for many language pairs (English-German and vice versa, English-Czech, English-Russian; see Bojar et al. 2016) despite the comparatively few years of NMT development, leading to great anticipation within the MT research community of a leap forward in quality.

Comparative evaluation and the state of the art for machine translation

Several papers subsequently appeared, using automatic and human evaluation methods to compare NMT with SMT. All highlight the low number of word order errors found in NMT output and associated improved scores for fluency. Bentivogli et al. (2016, 265) found that NMT had ‘significantly pushed ahead the state of the art’. In their evaluation, technical post-editing effort (in terms of the number of edits) for English to German was reduced on average by 26% when using NMT rather than the best-performing SMT system, with fewer overall errors, and notably fewer word order and verb placement errors. Wu et al. (2016) used automatic evaluation and human ranking of 500 Wikipedia segments that had been machine-translated from English into Spanish, French, Simplified Chinese, and vice-versa, concluding that NMT strongly outperformed other approaches, improving translation quality for morphologically rich languages. The publicity surrounding this publication and the subsequent move to NMT by Google Translate for these language pairs led to a spike in NMT hype (Castilho et al. 2017a). Several further language pairs have now moved to Google NMT, prompting Burchardt et al. (2017, 169) to note a ‘striking improvement’ in English-German translation quality. A detailed evaluation by Popović (2017) of SMT and NMT output for English-German and vice-versa found that NMT produced fewer overall errors, and that NMT output contained improved verb order and verb forms, with fewer verbal omissions. NMT also performed strongly regarding morphology and word order, particularly on articles, phrase structure, English noun collocations, and German compound words. She reported, however, that NMT was less successful in translating prepositions, ambiguous English words, and continuous English verbs. Castilho et al. (2017a) found inconsistent evaluation results for adequacy and post-editing effort, and comparatively higher numbers of identified errors of omission, addition and mistranslation in NMT output in several language pairs and three domains. They conclude that, although NMT represents a significant improvement ‘for some language pairs and specific domains, there is still much room for research and improvement’ (110). They caution that ‘overselling a technology that is still in need of more research may cause negativity about MT’, and more specifically, a ‘wave of discontent and suspicion among translators’ (118).

Employability considerations

The outlook for employment in translation in the short-to-medium term looks to be relatively positive. The U.S. Bureau of Labor Statistics expects an 18% increase in jobs for translators and interpreters in the U.S.A. between 2016 and 2026, equating to 12,100 new positions. The demand for translators is predicted to vary depending on speciality or language pair, and opportunities ‘should be plentiful for interpreters and translators specializing in healthcare and law, because of the critical need for all parties to fully understand the information communicated in those fields’ (Bureau of Labor Statistics 2018). The translation industry as a whole continues to report growth, with an increasing proportion of turnover coming from post-editing of MT (Lommel and DePalma 2016). The likelihood is therefore that many graduate translators will have to work with MT output. Post-editors of NMT in Castilho et al. (2017b, 11) ‘found NMT errors more difficult to identify’, whereas ‘word order errors and disfluencies requiring revision were detected faster’ in SMT output. Familiarity with NMT output, especially considering the increasing importance of speed in the translator workplace (see Bowker and McBride 2017), should improve the efficiency of NMT error identification. At present, MT providers are moving to entirely neural or neural-hybrid MT systems. As mentioned, Google Translate has been increasingly adopting neural methods, Microsoft have released online NMT engines, Kantan MT has created an NMT product called NeuralFleet, and the ModernMT project has moved to NMT,3 despite being led by one of the creators of the popular Moses SMT system (Koehn et al. 2007). This all suggests that professional linguists will soon find themselves in workflows that incorporate NMT to a greater or lesser extent. Learning about NMT should empower the translator ‘as an agent who is very much present throughout’ such a workflow, rather than taking a ‘limited or reductive role’ (Kenny and Doherty 2014, 290). Gaspari, Almaghout, and Doherty (2015) identified MT, TQA, and post-editing as underrepresented skills in translator training programmes generally. The exercise described in the following section incorporates each of these skills, and is also intended to contribute towards the translator’s technological competence (EMT Network 2017) and instrumental competence (Hurtado Albir 2007). More specifically, student participants gained TQA experience using three metrics. The first employs the construct of adequacy, a functional measure of equivalence between source and target text, commonly used (along with fluency, both measured using a Likert-type scale) for human evaluations of MT quality. The second metric employs error annotation using a simple typology of errors, common among research and industry models but little-used in academic scenarios. The third is post-editing effort, using one of the three categories of effort introduced by Krings (2001): temporal, technical, and cognitive effort. Temporal effort, or time spent post-editing, is commonly used, often to highlight the benefit of MT as a translation aid despite a lack of enthusiasm from translator users (Plitt and Masselot 2010; Guerberof 2012).
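Of Krings’ three categories, technical effort is the easiest to approximate in code: a common proxy is the token-level edit distance between the raw MT output and its post-edited version. A minimal sketch (the example sentences are invented for illustration, not taken from the study):

```python
def levenshtein(a, b):
    """Minimum number of single-token insertions, deletions, and
    substitutions needed to turn sequence a into sequence b."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances from the empty prefix of a
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # delete a[i-1]
                          curr[j - 1] + 1,    # insert b[j-1]
                          prev[j - 1] + cost) # substitute (or match)
        prev = curr
    return prev[n]

mt_output = "the cat sat in mat".split()
post_edited = "the cat sat on the mat".split()
edits = levenshtein(mt_output, post_edited)
print(edits)  # 2: substitute "in" -> "on", insert "the"
```

Normalising this count by the length of the post-edited segment gives a score comparable across segments, in the spirit of edit-distance-based MT metrics.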

Evaluation task

This evaluation task was set for a cohort of 46 second-year translation undergraduate students at Dublin City University as part of a Computer-Aided Translation module with a two-hour time limit. These students had a small amount of translation experience in a classroom environment and no experience of post-editing. It contributed to 20% of their marks for the module, with the remainder awarded on the basis of results of a Translation Memory project. Following this module, students should be able to demonstrate awareness of appropriate tools that assist in the translation process, and awareness of the historical development of CAT tools and their importance in modernday translation practice. They should be able to apply translation memory and terminology management tools at a basic level, and to explain the interaction between translation memory and terminology management tools. Prior to this evaluation, students had learned about TQA and the history of MT, but had not yet learned about SMT and NMT in any detail. All students had used online MT without any knowledge of the MT paradigm employed.

Their attitudes to MT in the classroom were broadly positive, but were not measured using any tests or questionnaires. The task was repeated with a cohort of 9 participants from 14 attendees at a PhD Summer School, where postgraduate students and staff from another university performed the evaluation in two one-hour blocks. The rationale for this was to see whether the exercise would be useful for a different group with a background in translation studies and more translation experience. The first cohort used the Microsoft Translator MT Try & Compare website (note 5), and the second cohort were given the option of using this same site or Google Translate for NMT and Google Sheets for SMT, as some of the participants’ language pairs were not included in the Microsoft site.

Task description

Students were given the following preamble:

The most popular Machine Translation (MT) paradigm over the past 15 years or so has been Statistical Machine Translation (SMT). More recently, there have been claims that Neural Machine Translation (NMT), a statistical paradigm that carries out more operations simultaneously, can produce improved output

You will compare SMT and NMT using three evaluation methods Post-editing effort

Adequacy Error typology

They were provided with a list of available languages for the exercise based on those available on the Microsoft site (note 6). Within the allotted time, students were asked to choose a page from Wikipedia and copy 20 segments in one of the languages supported by the MT systems (again, their choice), aiming for two groups of 10 with similar sentence length. While Wikipedia articles could not be considered ‘appropriate, authentic texts’ (Kenny 2007, 204) for a translation workflow, they were suggested for a time-limited task to avoid time-consuming terminology searches. Students could translate material on a topic that was familiar and interesting to them. The students had 10 segments translated using NMT and 10 using SMT, and copied the output to their documents, making sure to clearly differentiate the segments produced by the SMT and NMT systems. The students then carried out three evaluations based on the following instructions:

(1) Post-editing effort (time spent, measured using the computer clock, noting Total Editing Time in Word before and after evaluation, or ideally a phone stopwatch). This is known as temporal post-editing effort.

(2) Adequacy: How much of the meaning expressed in the source fragment appears in the translation fragment? (1) None of it (2) Little of it (3) Most of it (4) All of it

(3) Error typology: Mark word order errors, mistranslations, omissions, and additions:
● Word order errors: incorrect word order at phrase or word level
● Mistranslations: incorrectly translated word, wrong gender, number, or case
● Omissions: word(s) from the ST have been omitted from the TT
● Additions: word(s) not in the ST have been added in the TT
● Tip: Use the highlighter tool and count the number of occurrences for each error.

Students were asked to work out their average post-editing time per segment for each MT paradigm over the ten segments, the average segment adequacy score for each paradigm, and the frequency of each error type by dividing the total number of occurrences by the number of segments (presumably 10).
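The averaging step described above is simple enough to sketch in a few lines of Python. The record structure and all values below are invented for illustration (they are not students’ actual results); a real evaluation would hold ten such records per MT paradigm.

```python
from statistics import mean

# Hypothetical per-segment records for one MT paradigm.
# Fields: post-editing time in seconds, adequacy (1-4 Likert scale),
# and a count for each error type from the typology.
segments = [
    {"pe_seconds": 95, "adequacy": 3,
     "errors": {"word_order": 1, "mistranslation": 2, "omission": 0, "addition": 0}},
    {"pe_seconds": 70, "adequacy": 4,
     "errors": {"word_order": 0, "mistranslation": 1, "omission": 1, "addition": 0}},
    # ... eight more segments in a real evaluation
]

def summarise(segments):
    """Average PE time, average adequacy, and per-type error frequency
    (total occurrences divided by the number of segments)."""
    n = len(segments)
    summary = {
        "avg_pe_seconds": mean(s["pe_seconds"] for s in segments),
        "avg_adequacy": mean(s["adequacy"] for s in segments),
    }
    for err in ("word_order", "mistranslation", "omission", "addition"):
        summary[f"{err}_per_segment"] = sum(s["errors"][err] for s in segments) / n
    return summary

print(summarise(segments))
```

Running the same summary over the NMT and SMT segment groups then gives directly comparable figures of the kind reported in the students’ results below.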

Deliverables and marking criteria

Students were asked to submit a short report (up to 500 words), in which they were to state a preference for one MT system or the other for their language pair and to explain their reasons, using examples from the exercise. They were also asked to suggest scenarios in which these MT systems might be useful. The marking criteria provided for the exercise were as follows:

● Credible results for each evaluation: 5
● Reasoned preference for an MT paradigm: 5
● Examples provided: 5
● Appropriate scenario for MT system(s): 3
● Originality and critical input: 2

The second cohort (PhD students) were not graded for the exercise, as the four-day summer school format was time-limited and no grades or credits were awarded; their findings were presented orally.

Students’ results

The study by Castilho et al. (2017b) on which this evaluation task was loosely based found (for English to Portuguese, Greek, German, and Russian, in the educational domain) that fluency is improved and word order errors are fewer using NMT when compared with SMT. NMT produces fewer morphological errors and fewer segments that require any editing. The authors found no clear improvement for omission or mistranslation when using NMT, nor did they find any improvement in post-editing throughput. Students’ results, presented here with the caveat that this was not a controlled study, but rather an exercise for them to analyse MT output, followed the same lines as the published research study using professional translator participants, despite the different language pairs and domains. 93% of each cohort stated a preference for NMT.

Language pairs chosen by cohort one were English to French (8 students), French to English (19 students), German to English (13 students), Spanish to English (4 students), and English to Spanish (2 students). Cohort two chose Chinese to Spanish, Arabic to English, and English to Spanish (2), Chinese (2), Turkish, and Russian. In cohort one, 62% of undergraduate students spent less time post-editing the NMT output. The average rating for adequacy (the extent to which the meaning expressed in the source fragment appears in the translation fragment) was 2.95 for SMT and 3.46 for NMT (where 3 = most of it and 4 = all of it). Overall errors were fewer for NMT (10.60 as opposed to 18.79), and students found fewer word order errors (NMT: 1.75, SMT: 3.91), fewer omission errors (NMT: 1.44, SMT: 2.86), fewer addition errors (NMT: 1.02, SMT: 1.66), and fewer mistranslations (NMT: 6.78, SMT: 10.01). The results for cohort two were similar, with mean NMT adequacy rated at 2.9 and mean SMT adequacy rated at 2.0.

Further evaluation results for students in this cohort are in Table 1. The student evaluating Arabic-English output also rated errors from 1 (low) to 4 (high) for gravity, with NMT errors averaging 1.8 and SMT errors averaging 3.7. The Chinese to Spanish translations were most likely translated in two stages (Chinese to English, English to Spanish) using English as a pivot language, due to the lack of availability of bilingual texts to use as MT training data for the Chinese-Spanish language pair. Although students mostly preferred the NMT output, it became clear to them that errors could still be expected using the neural paradigm. One student wrote that the ‘Neural Translation tool proved more effective, however of course it was not perfect and contained some errors which I spent 20 minutes in post editing time correcting.’ Another student found mistranslations via both MT engines, for example when ‘the word “wetting” was used instead of “humidity” in the neural machine translation.’ ‘Some words in both [paradigms] were also made plural when there was no reason to do so.’ In general, students were surprised by the high quality of NMT output: ‘I found NMT to produce surprisingly good results in the case of long sentences, consisting of multiple clauses (which are known to generally cause many problems during machine translation).’ One student suggested that NMT output was of higher quality ‘due to the multiple operations it is capable of performing simultaneously.’

One of the students believed that German-English NMT was of a quality sufficient to make monolingual post-editing feasible ‘by someone with no knowledge of German at all, removing the need for a German-speaking post-editor.’ On the other hand, a student with a preference for SMT found NMT errors difficult to identify, whereas the ‘types of errors [found in SMT output] were not difficult to correct’. More common was the student who found repairs such as ‘correcting gender agreement (e.g.: par un camarade > par une camarade, from Neural [output]) and fixing flaws in word order (e.g. et (suprême) extrêmement confiant en soi même (de soi-confiant.) (w.o), from Neural [output])’ to be far easier than finding ‘extremely precise vocabulary and fixing tense issues (e.g. les sentiments pour l’un l’autre n’est (sont) jamais explicités, from Statistical [output]).’ One PhD student with a preference for NMT in English-Russian also wrote that it is ‘easier to spot mistakes in SMT, but easier to correct errors in NMT (they are fewer, but harder to spot at times)’. The average mark (out of 20) for the undergraduate cohort was 15.3 or 77%, showing that they engaged well with the exercise and crafted reports that demonstrated an excellent understanding of the strengths and weaknesses of NMT output for their chosen language pair. The average mark for the module overall was 62%, so their efforts in the comparative evaluation served to boost their mark overall, particularly for some of the weaker students.

Table 1. Cohort two findings for adequacy and error annotation.

Language pair | System | Mean adequacy | Errors
AR-EN         | NMT    | 3.2           | Stylistic, ‘awkward’ phrasing
              | SMT    | 2.1           | Compound errors, ‘gibberish’
EN-IT         | NMT    | 2.7           | All mistranslations (9)
              | SMT    | 2.1           | Mostly mistranslations (25)
EN-ES         | NMT    | 3.1           | Mostly mistranslation (3) and omission (2)
              | SMT    | 1.9           | Mistranslation (9) and word order (3)
EN-RU         | NMT    | 3.1           | 8 mistranslations, 2 omissions (elaborate, not easy to detect)
              | SMT    | 1.7           | Mistranslations (11), word order (2), omission (2), addition (3)
ZH-ES         | NMT    | 2.4           | Mistranslation (8), word order (3), omission (7)
              | SMT    | 2.3           | Mistranslation (10), word order (8), omission (10)

Feedback and discussion points

In discussions following on from this comparative evaluation, students expressed surprise at the high quality and fluency of NMT, particularly for morphologically complex languages that have proved troublesome for MT systems, such as Arabic, Russian, and even German. They also noted, with some relief, problems of omission and mistranslation in NMT output. They did not consider NMT to be a threat to translators as yet, but were concerned that improvements in MT quality over time might make it an attractive option in some scenarios, such as news translation or software documentation. Most of the students were nonetheless positive about the technology, and would be interested in working with NMT in future. They enjoyed post-editing MT output, and found it easier than translating from scratch. Two students noted that NMT had produced neologisms in their target languages (Spanish and Turkish), which initially looked like comprehensible compound words, yet could not be found in dictionaries. This may be due to the process mentioned previously of training using words broken down into smaller chunks or subword units (Sennrich, Haddow, and Birch 2016), so as to better translate words that are rare or do not appear in the MT training data. In the follow-up lecture and discussion, students engaged with this complex topic, asking questions related to – and building on – their own experience of working with NMT output. Students discussed scenarios where use of NMT may be appropriate – for perishable texts, or as a springboard for ideas during rush jobs – and those where NMT would currently be highly inappropriate: where transcreation is required, for high-risk and literary texts, or for translation jobs that involve regulatory compliance. As suggested by Massey (2017), the areas of ethics and risk proved to be a useful starting point for further discussion: at present, machine learning copies human activities with an increasing level of intelligence but no consciousness, and as such cannot independently consider ethics or evaluate risk.

Limitations of this evaluation

As mentioned previously, this was not a controlled experiment, but rather an in-class exercise carried out as part of a standard computer-aided translation module. The purpose was not to publish a detailed comparative evaluation of MT paradigms, so there is little point in detailed data analysis. No background information was requested from participants, nor any information about their language ability level. The undergraduate cohort had no experience of research design, and had not considered the importance of the order in which they completed the task or its possible effects on post-editing effort. When asked, most said that they had completed the SMT evaluation first, which may have caused their post-editing speed to be slower for that paradigm. Conversely, the task order may also have put the NMT evaluation at a disadvantage, as Wikipedia articles tend to begin simply, in the general domain, before becoming more complex and describing a topic in more detail, using domain-specific language. Also, even though students were asked to aim for two groups of ten sentences of similar length, several admitted afterwards that they had not taken this instruction into account in their evaluations.

Concluding remarks

Previous work has shown the benefits of hands-on experience working with MT in improving the levels of confidence and self-efficacy among translation students (Doherty and Kenny 2014). As NMT is a relatively new technology, and the technical barriers to building and training NMT systems remain high, this comparative evaluation task was developed as a way for students to understand the level of translation quality that could be expected from the current standard (SMT) and incoming (NMT) state-of-the-art automatic translation systems, using evaluation metrics that are standard in research and industry.

As NMT use becomes more commonplace, with availability in a wider range of languages, the opportunity opens to incorporate this technology into other translation modules, as suggested by Mellinger (2017). The two cohorts who took part in this evaluation gained experience in translation quality assessment using the construct of adequacy, a functional measure of equivalence between source and target text, and error annotation using a simple typology of errors. The students also experienced the task of post-editing, and were introduced to the concept of post-editing effort, using one of the three measures of effort developed by Krings (2001). Finally, the students gained hands-on experience of a cutting-edge MT paradigm that is only beginning to impinge on the profession of translation, but is highly likely to be disruptive in the coming years.

Rather than fearing the incoming technology, amidst widespread technological determinism, we hope that they will instead be empowered to discuss its relative merits and drawbacks from their own – albeit limited – personal experience.

Notes

1. See Forcada (2017) for an accessible introduction to the technology behind NMT.

2. Forcada and Ñeco (1997) had, in fact, suggested a method that was effectively a precursor to NMT some years before.

3. See https://github.com/ModernMT/MMT.


4. See Doherty et al. (2018) and Lommel (2018) for a further discussion of these.

5. From August 2017, the site at https://translator.microsoft.com/neural/ allowed users to compare NMT and SMT output. This changed in March 2018 so that users can test the research systems described in Hassan et al. (2018). At the time of writing (May 2018), users can still access Google SMT via Google Sheets. Due to these being free online tools, the MT systems involved are liable to change without warning.

6. Arabic, Chinese (simplified), English, French, German, Italian, Japanese, Korean, Portuguese, Russian, and Spanish.

7. Most Wikipedia topics chosen by the students were geographical locations, with less obvious choices including Russian Blue cats, Women in Nazi Germany, and Korean pop group Girls’ Generation.

Disclosure statement

No potential conflict of interest was reported by the author.

Funding

This research was supported by the ADAPT Centre for Digital Content Technology, which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

References of the Article

Arthur, P., G. Neubig, and S. Nakamura. 2016. “Incorporating Discrete Translation Lexicons into Neural Machine Translation.” In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1557–1567. Austin, TX: Association for Computational Linguistics.

Bahdanau, D., K. Cho, and Y. Bengio. 2014. “Neural Machine Translation by Jointly Learning to Align and Translate.” Computing Research Repository, abs/1409.0473. https://arxiv.org/abs/1409.0473.

Bentivogli, L., A. Bisazza, M. Cettolo, and M. Federico. 2016. “Neural versus Phrase-Based Machine Translation Quality: A Case Study.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), 257–267. http://arxiv.org/abs/1608.04631.

Bojar, O., R. Chatterjee, C. Federmann, Y. Graham, B. Haddow, M. Huck, A. Jimeno Yepes, et al. 2016. “Findings of the 2016 Conference on Machine Translation.” In Proceedings of the First Conference on Machine Translation, 131–198. Association for Computational Linguistics.

Bowker, L., and C. McBride. 2017. “Précis-Writing as a Form of Speed Training for Translation Students.” The Interpreter and Translator Trainer 11 (4): 259–279. doi:10.1080/1750399X.2017.1359758.

Burchardt, A., V. Macketanz, J. Dehdari, G. Heigold, J.-T. Peter, and P. Williams. 2017. “A Linguistic Evaluation of Rule-Based, Phrase-Based, and Neural MT Engines.” The Prague Bulletin of Mathematical Linguistics 108: 159–170. doi:10.1515/pralin-2017-0017.

Bureau of Labor Statistics. 2018. Occupational Outlook Handbook. https://www.bls.gov/ooh/media-and-communication/interpreters-and-translators.htm.

Cadwell, P., S. O’Brien, and C. S. C. Teixeira. 2017. “Resistance and Accommodation: Factors for the (Non-)Adoption of Machine Translation among Professional Translators.” Perspectives 26 (3): 301–321. doi:10.1080/0907676X.2017.1337210.

Castilho, S., J. Moorkens, F. Gaspari, I. Calixto, J. Tinsley, and A. Way. 2017a. “Is Neural Machine Translation the New State-of-The-Art?” The Prague Bulletin of Mathematical Linguistics 108: 109–120. doi:10.1515/pralin-2017-0013.

Castilho, S., J. Moorkens, F. Gaspari, R. Sennrich, V. Sosoni, P. Georgakopoulou, P. Lohar, A. Way, A. Valerio Miceli Barone, and M. Gialama. 2017b. “A Comparative Quality Evaluation of PBSMT and NMT Using Professional Translators.” In Proceedings of MT Summit 2017. Nagoya: Asia-Pacific Association for Machine Translation.

Cho, K., B. van Merrienboer, D. Bahdanau, and Y. Bengio. 2014. “On the Properties of Neural Machine Translation: Encoder-Decoder Approaches.” In Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 103–111, Doha, Qatar. Computing Research Repository, abs/1409.1259.

Doherty, S., F. Gaspari, J. Moorkens, and S. Castilho. 2018. “On Education and Training in Translation Quality Assessment.” In Translation Quality Assessment: From Principles to Practice, edited by J. Moorkens, S. Castilho, F. Gaspari, and S. Doherty, 95–106. Berlin: Springer. doi:10.1007/978-3-319-91241-7_5.

Doherty, S., and D. Kenny. 2014. “The Design and Evaluation of a Statistical Machine Translation Syllabus for Translation Students.” The Interpreter and Translator Trainer 8 (2): 295–315. doi:10.1080/1750399X.2014.937571.

EMT Network. 2017. European Master’s in Translation Competence Framework 2017. https://ec.europa.eu/info/sites/info/files/emt_competence_fwk_2017_en_web.pdf.

Forcada, M., and R. P. Ñeco. 1997. “Recursive Hetero-Associative Memories for Translation.” In Biological and Artificial Computation: From Neuroscience to Technology, edited by J. Mira, R. Moreno-Díaz, and J. Cabestany, 453–462. Berlin: Springer.

Forcada, M. 2017. “Making Sense of Neural Machine Translation.” Translation Spaces 6 (2): 291–309. doi:10.1075/ts.6.2.06for.

Gaspari, F., H. Almaghout, and S. Doherty. 2015. “A Survey of Machine Translation Competences: Insights for Translation Technology Educators and Practitioners.” Perspectives 23 (3): 333–358. doi:10.1080/0907676X.2014.979842.

Guerberof, A. 2012. “Productivity and Quality in the Post-Editing of Outputs from Translation Memories and Machine Translation.” PhD diss., Universitat Rovira i Virgili.

Hassan, H., A. Aue, C. Chen, V. Chowdhary, J. Clark, C. Federmann, X. Huang, et al. 2018. “Achieving Human Parity on Automatic Chinese to English News Translation.” Computing Research Repository arXiv:1803.05567v1. https://arxiv.org/abs/1803.05567.

Hurtado Albir, A. 2007. “Competence-Based Curriculum Design for Training Translators.” The Interpreter and Translator Trainer 1 (2): 163–195. doi:10.1080/1750399X.2007.10798757.

Hutchins, W. J. 1986. Machine Translation: Past, Present, Future. Chichester: Ellis Horwood.

Kenny, D. 2007. “Translation Memories and Parallel Corpora: Challenges for the Translation Trainer.” In Across Boundaries: International Perspectives on Translation, edited by D. Kenny and K. Ryou, 192–208. Newcastle-upon-Tyne: Cambridge Scholars Publishing.

Kenny, D., and S. Doherty. 2014. “Statistical Machine Translation in the Translation Curriculum: Overcoming Obstacles and Empowering Translators.” The Interpreter and Translator Trainer 8 (2): 276–294. doi:10.1080/1750399X.2014.936112.

Koehn, P., H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, et al. 2007. “Moses: Open Source Toolkit for Statistical Machine Translation.” In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). Prague: Association for Computational Linguistics.

Krings, H. P. 2001. Repairing Texts. Kent, OH: Kent State University Press.

Lommel, A. 2018. “The Multidimensional Quality Metrics and Dynamic Quality Framework.” In Translation Quality Assessment: From Principles to Practice, edited by J. Moorkens, S. Castilho, F. Gaspari, and S. Doherty, 109–127. Berlin: Springer. doi:10.1007/978-3-319-91241-7_6.

Lommel, A., and D. A. DePalma. 2016. “Post-Editing Goes Mainstream.” Common Sense Advisory Report. Boston, MA: Common Sense Advisory.

Massey, G. 2017. “Machine Learning: Implications for Translator Education.” In Proceedings of CIUTI Forum 2017: Short- and Long-Term Impact of Artificial Intelligence on Language Professions. doi:10.1515/les-2017-0021.

Mellinger, C. D. 2017. “Translators and Machine Translation: Knowledge and Skills Gaps in Translator Pedagogy.” The Interpreter and Translator Trainer 11 (4): 280–293. doi:10.1080/1750399X.2017.1359760.


O’Brien, S. 2012. “Translation as Human-Computer Interaction.” Translation Spaces 1 (1): 101–122. doi:10.1075/ts.1.05obr.

Olohan, M. 2007. “Economic Trends and Developments in the Translation Industry.” The Interpreter and Translator Trainer 1 (1): 37–63. doi:10.1080/1750399X.2007.10798749.

Plitt, M., and F. Masselot. 2010. “A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context.” The Prague Bulletin of Mathematical Linguistics 93: 7–16. doi:10.2478/v10108-010-0010-x.

Popović, M. 2017. “Comparing Language Related Issues for NMT and PBMT between German and English.” The Prague Bulletin of Mathematical Linguistics 108: 209–220. doi:10.1515/pralin-2017-0021.

Sennrich, R., B. Haddow, and A. Birch. 2016. “Neural Machine Translation of Rare Words with Subword Units.” In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 1715–1725. Berlin: Association for Computational Linguistics.

Way, A. 2018. “Traditional and Emerging Use-Cases for Machine Translation.” In Translation Quality Assessment: From Principles to Practice, edited by J. Moorkens, S. Castilho, F. Gaspari, and S. Doherty, 159–178. Berlin: Springer. doi:10.1007/978-3-319-91241-7_8.

Wu, Y., M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, et al. 2016. “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation.” Computing Research Repository arXiv:1609.08144. https://arxiv.org/abs/1609.08144.


APPENDIX II

Target Text

Nöral Makine Çevirisinden Ne Beklemeli? Sınıf İçi Uygulamalı Bir Çeviri Değerlendirme Çalışması

ÖZET

Makine çevirisi son zamanlarda istatistiksel modellerden nöral ağ modellerine doğru yönlendirilmektedir. Çeviri öğrencileri için nöral makine çevirisini (NMÇ) kavramsallaştırmak, hele de bunu herhangi bir bağlama oturtmadan yapmak zordur. Bu makalede, istatistiksel ve nöral MÇ karşılaştırması yapmak adına, öğrenci sonuçlarının detayları ve devamındaki tartışmaları da içeren kısa bir sınıf içi değerlendirme çalışması betimlenmektedir. Bu çalışmanın bir parçası olarak öğrenciler, üç çeviri kalite değerlendirme (ÇKD) ölçütü kullanarak iki tür MÇ çıktısının değerlendirmesini yapmıştır; bu ölçütler yeterlilik, post-edit üretkenliği ve temel bir hata sınıflandırmasıdır. Bu şekilde, alıştırma NMÇ'yi, ÇKD'yi ve post-edit'i tanıtmaktadır. Bölümümüzde NMÇ'nin daha detaylı bir açıklaması değerlendirmenin hemen ardından sunulmaktadır.

NMÇ'nin doğuşu, nöral ağlar ve makine öğrenimi hakkında epey bir medya abartmasını da beraberinde getirdi. Bunlardan bazılarına göre, belki de çevirmenlik mesleği dahil olmak üzere pek çok meslek tehdit altında olabilirdi. Bu değerlendirme egzersizi öğrencileri yüreklendirmeyi ve bu yeni teknolojinin güçlü ve zayıf yönlerini anlamalarına yardımcı olmayı amaçlamaktadır. Öğrencilerin çeşitli dil çiftlerini kullanarak elde ettiği bulgular, NMÇ çıktısında gelişmiş akıcılık ve kelime sırası gibi yayınlanmış araştırmalardan elde edilen bulgulara, bazı öngörülemeyen yok sayma ve yanlış çeviri sorunlarıyla ayna tutmaktadır.

Giriş

Standart uygulamaların yerini sürekli olarak yeni ve güncel araç ve teknolojilerin almasıyla oluşan dinamik ve teknolojik çevrede, yükseköğretim öğrencilerine çeviri teknolojilerini öğretmek hayli karmaşık bir hal almıştır. Bu teknolojiler, çevirmenlere üst düzey üretkenliğin sürdürülmesinde yardımcı olabilmekte ve katma değerli hizmetler sunmaktadır (Olohan 2007, 59). Çeviri eğitmenlerinin üstüne düşense öğrencilerin, çevirmen olarak kendi failliklerini çoğaltmak ve endüstrideki istihdam ihtiyacının karşılanması adına ne denli faydalı olabileceklerinin farkına varmalarını sağlamaktır.

Nöral makine çevirisinin (NMÇ) başlaması son zamanlarda ezber bozan bir dönüşümle olmuştur.

2000'lerin başından beri, insan çevirisi üzerine geliştirilmiş istatistiksel MÇ sistemleri (İMÇ) sıkça kullanılır hale gelmiştir. NMÇ'nin ilk tanıtımlarının ardından hayli az bir zaman geçmesine rağmen (Bahdanau, Cho ve Bengio 2014; Cho ve ark. 2014) istatistiksel sistemleri, müsabakalarda ve incelemelerde geride bırakarak ve daha çok yayılarak (Wu ve ark. 2016) hem akademik çevrede hem de endüstride sağlam bir yer edinmiştir. NMÇ insan çevirisi üzerine geliştirilmiş sistemleriyle, aynı zamanda istatistiksel durumun bir örneğidir. Bununla beraber 2016 tarihli bir Google araştırma makalesinin iddiasına göre, NMÇ, "insan ve makine çevirisi arasında köprü kuruyor" (Wu ve ark. 2016), ya da başka bir iddiaya göre "Microsoft, otomatik Çince-İngilizce haber çevirilerinde insan seviyesine ulaştı" (Hassan ve ark. 2018). Bunun sonucunda NMÇ'nin potansiyel kullanımı ve buna bağlı olarak insan çevirmenlerin yerinin alınması hakkında birçok medya abartması ortaya çıkmıştır (Castilho ve ark. 2017a). Makine öğrenimiyle ilgili medya raporlarında yer alan teknolojik gerekliliğin geniş kitlelere yayılmasıyla birlikte, çevirmenler NMÇ'den korkabilir ve çevirmenlerde teknolojinin "üstünlük kazandığı ve bunun kaçınılmaz olduğu" (Cadwell, O'Brien ve Teixeira 2017, 17) algısı nedeniyle bir güçsüzlük hissi meydana gelebilir. Ancak O'Brien'a göre "bu mesleğin giderek teknolojikleşmesi bir tehdit değil aksine beceri skalasını genişletmek ve yeni roller üstlenebilmek için bir fırsattır" (2012, 118). Kenny ve Doherty'nin (2014) İMÇ'deki hazırlık, eğitim ve post-edit süreçlerindeki olası müdahale noktalarında çevirmenin yeteneklerinden faydalanılabileceği yönündeki vurgusu, NMÇ için de geçerlidir. Bu düşünceyle, öğrencilere NMÇ dahil yeni teknolojileri öğrenme konusunda yardımcı olmanın pozitif ve yüreklendirici bir girişim olduğu kanısındayım.

MÇ, pek çok çevirmen tarafından düşük kalite beklentisi, dışında kalma seçeneği olmaksızın bu teknolojiye maruz kalma ve yerlerini kaptırma korkusu gibi nedenlerden ötürü negatif bir şekilde algılanmıştır (Cadwell, O'Brien ve Teixeira 2017). Başlangıçta MÇ'nin insan çevirmenlerin yerini alacağı öngörülmekteydi (Hutchins 1986); ve her ne kadar bu düşünce geçerliliğini yitiriyor gibi olsa da (Way 2018), aynı algı varlığını sürdürmektedir. MÇ, istatistiksel yöntemlere doğru ilerledikçe MÇ sürecini açıklamak da zorlaştı. NMÇ'nin özellikle öğrenciler ve akademisyenler için kavramsallaştırması kolay değil, hem de hiç kolay değil çünkü nöral ağlar karmaşıktır ve NMÇ çıktısı öngörülemeyebilir (Arthur, Neubig ve Nakamura 2016). Bu sebeple çeviri öğrencileri için NMÇ çıktısıyla haşır neşir olup bu işin gizemini çözmenin önemi daha da artmakla birlikte MÇ ile ilgili yanıltıcı haberlere rağmen NMÇ çıktısının güçlü yönleri yanında zayıf yönlerinin de olduğunun farkında olmalıdırlar.


Bu makale karşılaştırmalı olarak tek bir dil çifti için istatistiksel ve nöral MÇ çıktılarını değerlendirmek adına uygulamalı bir sınıf içi değerlendirme çalışması raporu çıkartıyor.

TraMooc AB destekli projenin bir parçası olarak büyük ölçekli bir karşılaştırmalı değerlendirme akabinde başlatılmıştır (Castilho ve ark. 2017). Ardından öğrenciler değerlendirmeleri iki saat içinde tamamlayabilsinler diye kapsam daraltılmıştır. Bu çalışma yapılandırıcı paradigmaya göre dizayn edilmiştir. Böylelikle öğrenciler bağımsız olarak en yeni çeviri teknolojileri hakkında bilgi sahibi olabilecek ve “değerlendirmeci farkındalık” (Massey 2017) oluşturabileceklerdi. Aynı zamanda çeşitli çeviri kalite değerlendirme tekniklerini (ÇKD) kullanmada tecrübe kazanmış olacaklardı. Çoğu öğrenci için aynı zamanda MÇ post-edit de yeni bir tecrübe olmuştur. Bu çalışma lisans ikinci sınıf çeviri öğrencileri ile gerçekleştirilmiş ve başka üniversitede bir grup çeviribilim doktora öğrencisiyle tekrarlanmıştır. Öğrenci raporlarında, her bir teknolojiden neler beklenebileceğine dair pek çok görüş vardır. MÇ hakkında, NMÇ ve araştırma ve endüstriden yayımlanmış NMÇ değerlendirmelerini içeren bir ders yapılmıştır. Sınıf içi karşılaştırma görevinden bir hafta sonra lisans grubu için gerçekleştirilmiştir. Bulgularını bir bağlama oturtmak ve oluşturma sürecini ve değerlendirmesini yaptıkları tarzda bir çıktıya etken olan bir NMÇ sistem eğitimini (matematiksel formüllerden ziyade kavramlar seviyesinde) açıklama amaçlamıştır.

Öğrencilerin bulguları, bahsi geçen sistemlerle kişisel bir tecrübe yaşadıkları anlamına gelmekteydi ve bazı diğer son zamanlarda yayımlanmış karşılaştırmalı değerlendirmelerle birlikte, rasgele seçilmiş büyük ölçekli TraMooc değerlendirme sonuçlarına dört dil çifti üzerinden ayna tutmaktaydı.

Bir sonraki bölüm NMÇ'nin ortaya çıkmasının teknolojik ve endüstriyel bağlamının detaylarını içermektedir. Aynı zamanda daha önce yayımlanmış değerlendirmelerin özet sonuçları ve NMÇ'yi öğrenmenin kişiye sağlayabileceği potansiyel iş olanakları hakkında da bilgi vermektedir. Sonrasında iki grubun da özet sonuçlarıyla beraber değerlendirme görevi tanımlanmıştır. Akabindeyse sınıf içi dönüt ve çalışmadan doğan tartışma noktaları aktarılmıştır.

Teknolojik ve endüstriyel bağlam

Her ne kadar nöral ağların ses tanımaya uygulanması birkaç yıl önce genel geçerleşse de ilk yayımlanmış NMÇ yazıları 2014'te meydana çıkmaya başlamıştır. Araştırmacılar "verilen bir kaynak cümlenin doğru çeviri olasılığını" olabildiğince yükseltmeye ayarlı, tekli nöral ağlar oluşturmaya başlamıştır (Bahdanau, Cho ve Bengio 2014, 1). Bu ayar, kaynak metnin bağlamsal bilgisine ve daha önce çevrilmiş kaynak metne dayalı olarak yapılmaktaydı. Farklı cümle uzunluklarına ayak uydurabilmek ve bilinmeyen kaynaklı kelimelerin çevirisine yardımcı olmak adına daha küçük kelimelere ayrılarak yapılması için NMÇ çıktısının kalitesini artıracak yeni teknikler geliştirilmiştir (Sennrich, Haddow ve Birch 2016). 2016'da NMÇ sistemleri rekabetçi MÇ çevreleriyle tanıştığı zaman, birçok dil çiftinde (İngilizce-Almanca ve Almanca-İngilizce, İngilizce-Çekçe, İngilizce-Rusça; Bojar ve ark. 2016) MÇ'nin üstünde bir başarı kaydetmiştir. Hem de diğerlerine kıyasla henüz birkaç yıl içinde gelişmesine rağmen NMÇ, kalitede büyük bir sıçrayış olacağı konusunda MÇ araştırma toplulukları için müthiş bir beklentiye etken olmuştur.

Makine çevirisi üzerine karşılaştırmalı bir değerlendirme ve son teknoloji

Daha sonra NMÇ'yi İMÇ ile kıyaslamak için otomatik olarak ve insanların gerçekleştirdiği değerlendirme yöntemleri kullanılarak çeşitli yazılar ortaya çıkmıştır.

Hepsinde NMÇ çıktısındaki kelime diziliminde bulunan hataların az olduğu belirtilmiş ve akıcılık konusunda yüksek puanlar kaydedilmiştir. Bentivogli ve arkadaşlarının
