Academic year: 2021


DEPARTMENT OF FOREIGN LANGUAGE EDUCATION
PROGRAMME OF ENGLISH LANGUAGE TEACHING

TESTING AND ASSESSMENT OF SPEAKING SKILLS IN PREPARATORY CLASSES

MASTER'S THESIS

SUPERVISOR

ASSOC. PROF. DR. HASAN ÇAKIR

PREPARED BY
AHMET ÖNAL


ACKNOWLEDGEMENTS

First and foremost, I would like to express my immense gratitude to my supervisor, Assoc. Prof. Dr. Hasan ÇAKIR, for his support, guidance, and patience throughout my research study. I could never have achieved this without his encouragement.

I am also very grateful to my colleagues Gökhan POLAT, Ayşe Nur ÖRÜMCÜ, Ayşe Naldemir PALA, and Rabia YALÇIN at the School of Foreign Languages, Suleyman Demirel University, for their valuable support and comments. I would like to thank our teaching assistant Fielding Francis JEZREEL for helping me with the proofreading of this study.

Special thanks go to my friends who shared their time with me and supported me in overcoming a number of intellectual and psychological challenges during the completion of this thesis.

Finally, I also wish to express my gratitude and special thanks to my family who shared all the difficulties in this study and throughout my life.


T.C.

SELÇUK ÜNİVERSİTESİ
Directorate of the Institute of Social Sciences (Sosyal Bilimler Enstitüsü)

Student's Name Surname: Ahmet ÖNAL
Student Number: 074218021006
Department / Programme: Foreign Language Education / English Language Teaching
Supervisor: Assoc. Prof. Dr. Hasan ÇAKIR
Thesis Title: Hazırlık Sınıflarında Konuşma Yeteneğinin Ölçme ve Değerlendirmesi (Testing and Assessment of Speaking Skills in Preparatory Classes)

ÖZET (ABSTRACT)

It hardly needs stating that learning a foreign language has become a basic requirement for every field of academic study. In line with this fact, many universities today require their students to complete intensive foreign language preparatory education lasting about a year. The main problem encountered in our national foreign language education policy is the lack of ability to use, or speak, the foreign language. In other words, speaking skill is usually ignored first by teachers and subsequently by students. The processes of teaching and assessment have always been closely related, and the assessment of speaking skill has likewise been neglected by foreign language teachers. The aim of this study is to offer some practical methods for testing speaking skill.

The first chapter of the study provides information on the importance of the subject, the hypothesis, and the purpose and method of the study.

The second and third chapters present a comprehensive review of the literature on the subject. In addition to the testing and assessment of speaking skill, the views of relevant experts on the strategies and methods in use today are discussed in these chapters.

The fourth chapter covers the administration and interpretation of the interviews designed to compile the testing and assessment practices carried out at the case of the study, the School of Foreign Languages (YDYO) of Süleyman Demirel University (SDÜ).

In the fifth and final chapter, an evaluation is made in the light of the data obtained from the interviews, and suggestions on foreign language teaching are offered to individuals and institutions.

Keywords: Testing methods, Test reliability, Speaking


T.C.

SELÇUK ÜNİVERSİTESİ
Directorate of the Institute of Social Sciences (Sosyal Bilimler Enstitüsü)

Student's Name Surname: Ahmet ÖNAL
Student Number: 074218021006
Department / Programme: Foreign Language Education / English Language Teaching
Supervisor: Assoc. Prof. Dr. Hasan ÇAKIR
Thesis Title (English): Testing and Assessment of Speaking Skills in Preparatory Classes

SUMMARY

It goes without saying that learning a foreign language has become a primary necessity for all fields of academic study. In line with this fact, most universities stipulate that their students take intensive preparatory classes for about a year. The main problem encountered in our national language education policy has been the inability to use or speak the target language. In other words, speaking skills are usually ignored first by the teachers and then by the students. The processes of teaching and testing have always been closely related, and the testing of speaking skills is likewise neglected by foreign language teachers. This study aims to offer some practical ways of testing speaking skills.

The first chapter of the study contains information about the importance of the subject, the hypothesis, and the goal and type of the study.

The second and third chapters include a comprehensive review of literature related to the subject. Introduced in these chapters are the views and opinions of experts and methodologists about testing and assessment of speaking as well as strategies and methods in use today.


The fourth chapter includes the administration and interpretation of the interviews designed to compile relevant data on the assessment procedures at the case of the study, namely the School of Foreign Languages (SOFL) of Suleyman Demirel University (SDU).

The fifth and final chapter presents evaluations and discussions based on the data collected through the interviews, along with suggestions for individuals and institutions involved in foreign language teaching.


TABLE OF CONTENTS

SCIENTIFIC ETHICS PAGE
MASTER'S THESIS ACCEPTANCE FORM
ACKNOWLEDGEMENTS
ÖZET (ABSTRACT)
SUMMARY
TABLE OF CONTENTS
ABBREVIATIONS
APPENDICES
LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1. Background to the Study
1.2. Statement of the Problem
1.3. Hypothesis
1.4. Purpose and Importance of the Study
1.5. Method of the Study

CHAPTER 2 LITERATURE REVIEW – PART I
2.1. Introduction
2.2. Evaluation, Assessment, and Testing
2.3. Speaking Skill
2.4. Basic Types of Speaking
2.5. Planned and Unplanned Speech
2.6. Testing Performance
2.7. Assessing Speaking Skill
2.8. Why Test Speaking?
2.9. Differences between Writing and Speaking Skills
2.10. The Relationship between Teaching and Testing Processes
2.11. Prejudices and Problems about Testing Speaking Skill
2.12. Planning Evaluation
2.13. What Should Teachers Know about Assessment?
2.14. What Should Students Know about Speaking Tests?
2.15. Backwash
2.16. Qualities of Language Tests
2.16.1. Validity
2.16.1.1. Content Validity
2.16.1.2. Criterion-related Validity
2.16.1.3. Validity in Scoring
2.16.1.4. Face Validity
2.16.1.5. How to Make Tests More Valid
2.16.2. Reliability
2.16.2.1. Scorer Reliability
2.16.2.2. How to Make Tests More Reliable
2.16.3. Reliability and Validity
2.16.4. Practicality

CHAPTER 3 LITERATURE REVIEW – PART II
3.1. Kinds of Tests and Testing
3.1.1. Proficiency Tests
3.1.2. Achievement Tests
3.1.3. Diagnostic Tests
3.1.4. Placement Tests
3.1.5. Discrete-Point versus Integrative Testing
3.1.6. Norm-referenced versus Criterion-referenced Testing
3.1.7. Objective Testing versus Subjective Testing
3.1.8. Direct versus Indirect Testing
3.1.9. Formative versus Summative Assessment
3.2. Stages of Test Development
3.3. Characteristics of Raters
3.5. Informal Assessment of Speaking
3.6. Multiple Measures Assessment
3.7. Task Types in Testing Speaking Skill
3.7.1. Open-ended Response Tasks
3.8. Conducting Low-Level Speaking Exams
3.9. Speaking Test Formats
3.9.1. Oral Interview
3.9.2. Variations on the Framework
3.9.2.1. Information-gap Activities Requiring Student-student Interaction
3.9.2.2. Picture Cue
3.9.2.3. Prepared Monologue
3.9.2.4. Role Play
3.9.2.5. Oral Presentations
3.9.2.6. Debate on a Controversial Topic
3.9.2.7. Mini-situations on Tape
3.9.2.8. The Free Interview / Conversation
3.9.2.9. The Controlled Interview
3.9.2.10. Sentence/Dialogue Completion Tasks
3.9.2.11. Giving Instructions and Directions
3.9.2.12. Paraphrasing/Summarizing

CHAPTER 4 COLLECTION AND INTERPRETATION OF DATA
4.1. Introduction
4.2. A Brief Introduction of the SDU, SOFL
4.3. Data Collection
4.4. Data Analysis
4.4.1. Question 1
4.4.2. Question 2
4.4.3. Question 3
4.4.4. Question 4
4.4.5. Question 5
4.4.6. Question 6
4.4.7. Question 7
4.4.8. Question 8
4.4.9. Question 9
4.4.10. Question 10
4.4.11. Question 11
4.4.12. Question 12
4.4.13. Question 13
4.4.14. Question 14
4.4.15. Question 15
4.4.16. Question 16
4.4.17. Question 17
4.4.18. Question 18
4.4.19. Question 19
4.4.20. Question 20
4.4.21. Question 21
4.4.22. Question 22
4.4.23. Question 23
4.4.24. Question 24
4.4.25. Question 25
4.4.26. Question 26
4.4.27. Question 27
4.4.28. Question 28
4.4.29. Question 29
4.4.30. Question 30
4.4.31. Question 31
4.4.32. Question 32

CHAPTER 5 CONCLUSION AND SUGGESTIONS
5.1. Introduction
5.2. Conclusion
5.3. Suggestions

BIBLIOGRAPHY
ÖZGEÇMİŞ (CURRICULUM VITAE)


ABBREVIATIONS

SDU: Suleyman Demirel University
SOFL: School of Foreign Languages
FLE: Foreign Language Education
ELT: English Language Teaching
TLU: Target Language Use
FCE: First Certificate in English examination
TOEFL: Test of English as a Foreign Language


APPENDICES

Appendix 1. Sample Checklist for Oral Interview
Appendix 2. Sample Checklist for Oral Presentation
Appendix 3. Interview Form


LIST OF TABLES


CHAPTER 1

INTRODUCTION

1.1. Background to the Study

The importance of receiving a proper education is accepted in every walk of life, and when the issue of education is broached, the first thing that comes to mind is the evaluation process. The significance of evaluation stems from the fact that, as a result of this process, learners either pass or fail the teaching programme. This is not as straightforward as it looks, because passing or failing a particular exam may determine whether a candidate is accepted for a vacant position in a company or, in some situations, given permission to enter a country. As teachers or lecturers, most of us have observed that during the first or second week of instruction students start asking questions about the date, format, or difficulty level of the exam(s) they are required to take in order to pass the course. This being the case, arriving at valid and reliable decisions about the performance of students is probably the most challenging task a teacher has to carry out during the whole process of instruction.

The process of evaluation is relatively simple when the aim is to evaluate or measure a concrete entity. For instance, measuring the height or weight of a given object is fairly easy as long as you have a measuring tape at hand. In the domain of foreign language education, however, the assessor has to measure the knowledge of the learners, which is quite difficult because the thing to be measured is not tangible. Many variables affect the performances of test takers, and direct measurement of 'foreign language knowledge' is not possible. Accordingly, the evaluator has to devise and develop several methods and procedures in order to fulfill his/her aim. Within this study, the researcher aims to deal with the variables which affect the process of evaluation and to discuss the methods and procedures that can be employed to assess the performance of foreign language learners.


1.2. Statement of the Problem

One well-known fact about Foreign Language Education (FLE) is that students experience great difficulty in developing oral fluency. There are many reasons behind this, and probably the most striking is that speaking skill is somehow neglected in FLE. Although people listen and speak more often than they read and write in their daily lives, educators omit listening and, even more often, speaking not only from the teaching process but also from the testing process. A teacher who does not integrate oral language into the overall testing process clearly sends a message to the students. On getting this sort of message, the students understandably do not attach much importance to improving their oral fluency and, as a result, they encounter problems in speaking the target language.

As can be understood, the problem mainly stems from teachers' not attaching due importance to the teaching and testing of oral language in the first place. In general, teachers do not assess students' oral fluency because preparing and administering valid and reliable speaking tests is a highly complex endeavor; it is much easier to prepare a test of reading or grammar than a test of speaking. Most educational institutions in our country do not attach the required importance to speaking skill, and the learners educated in these institutions complain that they are not able to speak the target language fluently.

1.3. Hypothesis

The importance given to speaking must be inherent not only in the teaching process but also in the assessment process. Students cannot be expected to attach due importance to improving their speaking skills unless their oral fluency is tested. The more students are exposed to speaking activities within the classroom and assessed in terms of their speaking skill, the more fluent and accurate they become in speaking the target language.


In this respect, several questions related to the procedure of teaching and testing speaking have been designed by the researcher and included in the interview form. It is hypothesized that if the items in the interview form are followed by an educational institution, that institution can be regarded as satisfactory or successful in teaching and improving its students' speaking skills.

1.4. Purpose and Importance of the Study

According to the findings of a study conducted by Köksal (2004) on ‘Teachers’ Testing Skills in ELT’ in Turkey, most of the foreign language teachers in our schools prepare and administer language tests which are far from satisfactory. The reasons underlying this situation are:

• Teachers lack training in testing (test construction, administration, and assessment).

• Testing and teaching do not overlap: teachers teach one thing but test another. In other words, instructional objectives are not taken into account when choosing test tasks.

• Tests focus on recognition rather than production. Most importantly, testing focuses on the learning itself, not on the outcomes of the learning.

Therefore, the main purpose of this study is to help teachers of English prepare and administer more valid, reliable, and practical language tests by providing the necessary background and theoretical knowledge about language testing.

1.5. Method of the Study

This paper consists of five chapters and is intended to provide relevant information on testing speaking skill and to suggest ways to prepare and administer more valid, reliable, and practical speaking tests. Suleyman Demirel University (SDU), School of Foreign Languages (SOFL) was chosen as the case for this study, and the data on the assessment procedures adopted by the school were collected through interviews conducted with three members of the academic staff. The data obtained in this way are then interpreted in terms of the methods for testing and assessment of speaking skills suggested in the literature review parts of the study.

The first chapter of this paper provides general background and states the hypothesis, purpose, importance, and method of the study.

The second and third chapters include a comprehensive review of literature related to the subject matter of the study. Views and opinions of experts and methodologists about testing and speaking tests as well as test formats in use today are introduced in these chapters.

The fourth chapter identifies the testing procedures employed by the SOFL throughout the 2008 / 2009 Academic Year and interprets these procedures in terms of the views put forward by experts and methodologists.

The fifth chapter consists of the conclusion of the study and includes suggestions about how to develop better speaking tests.


CHAPTER 2

LITERATURE REVIEW – PART I

2.1. Introduction

The concept of testing and assessment, in general, is inherent in our daily lives. Before deciding on doing (or not doing) something, people usually embark on a process of testing and assessment. At the end of this sophisticated process, people arrive at a decision, and each sound decision is based on a process of testing and assessment. For example, before someone leaves for work in the morning, s/he takes several factors into consideration and then decides to leave at a certain time. This decision is the result of an assessment process. With regard to the context of education, teachers prepare and apply certain types of formal or informal examinations and, in the end, some students pass the course while others fail it. Most of the time, this crucial decision is made by the teacher or some other administrative body, and the students are either positively or negatively affected by it. According to Fulcher, G. & Davidson, F.:

“Testing and assessment are part of modern life. Schoolchildren around the world are constantly assessed, whether to monitor their educational progress, or for governments to evaluate the quality of school systems. Adults are tested to see if they are suitable for a job they have applied for, or if they have the skills necessary for promotion. Entrance to educational establishments, to professions and even to entire countries is sometimes controlled by tests. Tests play a fundamental and controversial role in allowing access to the limited resources and opportunities that our world provides. The importance of what we test, how we test, and the impact that the use of tests has on individuals and societies cannot be overstated.” (2007: p. XIX)

At this point, it should be noted that this process affects the majority of people either as teachers or learners, because most families include learners or teachers among their members. Considering the significance of this process, this paper has been designed to introduce and explain the fundamentals of testing and assessment in foreign language teaching/learning and, more specifically, the essentials of testing and assessment of speaking skill in foreign language classes.

2.2. Evaluation, Assessment, and Testing

Although most teachers use them interchangeably, the terms 'evaluation', 'assessment', and 'testing' refer to distinct but related concepts. All three are used in the discipline of education, but there is a subtle distinction in what they denote.

The widest basis for gathering data in education is the all-inclusive term evaluation. Evaluation is related to all the factors that form and affect the learning process, such as syllabus objectives, course design, and materials. Evaluation is not limited to student achievement or language assessment; it deals with all aspects of teaching and learning. Weiss states that "Evaluation can be defined as the systematic gathering of information for the purpose of making decisions." (cited in Bachman, Lyle F., 1990: p. 22)

The scope of assessment, on the other hand, is narrower than that of evaluation: when assessing, the teacher is interested in the student and what the student performs, and so assessment forms a crucial part of the evaluation process. Through assessment, information on the learner's language ability and achievement is collected in several ways. According to Coombe, C., et al., "…assessment is an umbrella term for all types of measures used to evaluate student progress." (2007: p. XV)

As regards testing, it is possible to define it as a procedure for gathering information about students' behavior, and so it can be regarded as a subcategory of assessment. Brown, D. H. states that:


“Tests are prepared administrative procedures that occur at identifiable times in a curriculum when learners muster all their faculties to offer peak performance, knowing that their responses are being measured and evaluated. Assessment, on the other hand, is an ongoing process that encompasses a much wider domain. Whenever a student responds to a question, offers a comment, or tries out a new word or structure, the teacher subconsciously makes an assessment of the student’s performance. Tests, then, are a subset of assessment; they are certainly not the only form of assessment that a teacher can make. Tests can be useful devices, but they are only one among many procedures and tasks that teachers can ultimately use to assess students.” (2004: p. 4)

Carroll states that "…a psychological or educational test is a procedure designed to elicit certain behavior from which one can make inferences about certain characteristics of an individual." (cited in Bachman, Lyle F., 1990: p. 20)

“In summary, evaluation includes the whole course or program, and information is collected from many sources, including the learner. While assessment is related to the learner and his or her achievements, testing is part of assessment, and it measures learner achievement.” (Coombe, C., et al. 2007: p. XV) Bachman, Lyle F. concludes by stating: “…then, not all measures are tests, not all tests are evaluative, and not all evaluation involves either measurement or tests.” (1990: p. 24)

2.3. Speaking Skill

The four skills in FLE (listening, speaking, reading, and writing) are classified into two main groups: oral and written language. In their daily lives, it is the oral language that people need and use more often: during the day, people listen and speak more often than they read or write. If we, as teachers of a foreign language, want our students to communicate in the target language, then we have to take this fact into consideration and design our teaching and assessment accordingly. Another way to classify the four skills is that, when people read or listen to something, they need to comprehend the message, whereas they produce some sort of output when they write or speak. As anyone can guess, it is usually much more challenging to produce or create language. In any case, when people speak, they produce language orally.

“Speaking (in short turns) involves the reciprocal ability to use both receptive and productive skills in a continual encoding and decoding of developing message(s). This form of communication involves due observance of accepted routines and the continuous evaluation and negotiation of meaning on the part of the participants. Management of interaction may well be required with opportunity for agenda management and the need for conventions of turn-taking to be respected by participants.” (Weir, Cyril J., 1993: p. 33-34)

Two main features of spoken interaction are time pressure and spontaneity. When people speak to each other, there is not much space for long silence and only a limited amount of silence is tolerated. This feature of spoken interaction needs to be integrated into the test of speaking ability. Another feature of spoken interaction, spontaneity, refers to the situations in which the speaker has no time to prepare. In this case, the speaker has to speak extemporaneously and short speaking turns are more common. The speaker generally tends to make use of loosely-strung-together phrases, rather than complete and neat sentences. On the other hand; long speaking turns, such as oral presentations and lectures, require some kind of prior planning and the teacher or the tester needs to specify how much time is to be allowed for the preparation process.

Speaking necessitates the simultaneous use of many different abilities and each of these abilities develops at different rates. When the process of speaking is analyzed, it is possible to argue that the speaker needs to make use of pronunciation, grammar, vocabulary, fluency (the ease and flow of the speech) and comprehension (assuming that the speaker responds to oral stimuli).

Çakır, H. mentions a problem encountered by students when they try to speak:


“In speaking, the problem is not always directly related with language competence. Students simply do not know how to present and develop a topic. The knowledge on presenting a subject will enhance their communicative competence. It will be surprising to see students confidently conversing on a subject if they have an adequate practice on general subjects.” (2008: p. 1408)

However, because it is much more difficult and time-consuming to test oral skills, most teachers and examination bodies either handle listening and speaking superficially or exclude these skills from their assessment and testing practices altogether. This has a negative washback effect: the students, accordingly, do not try to improve their oral skills and may even ignore the oral language altogether. Having focused on written language and grammar during the learning process, the students are unable to communicate in the target language in real life, and this situation makes it harder and harder to succeed in FLE. Madsen, Harold S. claims that: "The testing of speaking is widely regarded as the most challenging of all language exams to prepare, administer and score. For this reason, many people don't even try to measure the speaking skill. They simply don't know where to begin the task of evaluating spoken language." (1983: p. 147)

2.4. Basic Types of Speaking

It is possible to talk about five distinct types of speaking and each of these speech types is arranged in an order. This arrangement is said to be indicative of the proficiency level of the foreign language speaker and, in general, assessment criteria adopted by the examiners are based on this arrangement.

Imitative Speech refers to the ability to simply parrot back or imitate a word or phrase, or possibly a sentence. Although this is a purely phonetic level of oral production, disregarding the meaning of utterances altogether, a number of prosodic, lexical, and grammatical properties of language may be included in the criterion performance.


Intensive Speech involves the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (such as prosodic elements – intonation, stress, rhythm, juncture). Examples of intensive assessment tasks are directed response tasks, reading aloud, sentence and dialogue completion; limited picture-cued tasks including simple sequences; and translation up to the simple sentence level.

Responsive Speech, as the name suggests, requires responding to either a written or an oral stimulus. "Responsive assessment tasks include interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like." (Brown, H. D., 2004: p. 141-142) There is a spoken stimulus, and the test taker is required to respond to it in an appropriate way. There may also be one or two follow-up questions.

Interactive Speech is similar to responsive speaking tasks, but the main difference is the length and complexity of the interaction. "Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, or interpersonal exchanges, which have the purpose of maintaining social relationships." (Brown, H. D., 2004: p. 141-142)

Extensive Speech (or Monologue) tasks include speeches, oral presentations, and story-telling activities. During these activities, the opportunity for oral interaction from listeners is either highly limited or ruled out altogether, because the speaker is required to speak for a previously decided period of time. "Language style is frequently more deliberative and formal for extensive tasks." (Brown, H. D., 2004: p. 141-142)


2.5. Planned and Unplanned Speech

In some situations such as lectures, conference presentations and expert discussions, the speaker is to deliver a planned and possibly rehearsed formal speech. In this type of speech, the speaker is expected to use complex grammatical features and a high degree of written language influence may be observed. Louma, S. (2004: p. 12-13) states that “Unplanned speech, in contrast, is spoken on the spur of the moment, often in reaction to other speakers” and involves short idea units and incomplete sentences.

2.6. Testing Performance

Because they are costly and demand significant resources to implement, performance-based elements in large-scale testing are usually restricted to a small number of controlled task types, usually involving writing and speaking. This is especially true of testing speaking. Inside the classroom, on the other hand, activities and assessment are almost entirely performance-based and fully integrated. This is hardly surprising, "…because it is a social learning environment that encourages interaction, communication, achieving shared goals and providing feedback from learner to learner as well as teacher to learner. A particular feature of the classroom context is collaboration between learners." (Fulcher, G., Davidson, F., 2007: p. 29)

There are two main reasons for considering assessment of writing and speaking samples extremely important. First of all, the best way to assess learners’ writing or speaking performance in the real world is to get students to write or speak in the test. Secondly, without including writing and speaking in a test, the test will not have the required washback on the teaching process.

However, it is not so easy to define 'the real world' and to recreate that world in the test. Another obstacle is how to score the limited sample that we are able to collect in the test, so that the score carries meaning for the target domain and is relevant to any instructional context that is preparing learners to become competent communicators within that domain.

Fulcher, G., Davidson, F. argue that “When judges look at a piece of writing or a sample of speech they must focus on a particular part of what is written or said. The judges’ reaction, whether presented as a score, summary or comment, is an abstraction. One of the key tasks is to ensure that the abstractions we create are the most useful for test purpose.” (2007: p. 249)

2.7. Assessing Speaking Skill

Speaking is the most important channel of communication in daily life, and when assessing students' speaking skills it is necessary to simulate real-life situations in which students engage in conversation, ask and answer questions, and give information on a topic. Coombe, C., et al. point out that:

“In an academic English program, the emphasis may shift to participating in class discussions and debates or giving academic presentations. In a business English course, students might develop telephone skills, make reports, and interact in common situations involving meetings, travel, and sales. Whatever the teaching focus, valid assessment should reflect the course objectives.” (2007: p. 112)

The work of any language tester is far more difficult than, say, that of a sociolinguist, a discourse analyst, or a phonologist, in that the latter do not have to create 'sociolinguistic things', 'discourses', or 'spoken utterances'. Language testing is about doing: language testers are supposed to create tests. According to Madsen, H. S. (1983), another factor that makes the job of the language tester even more difficult is that the nature of the speaking skill itself is not well defined, and this results in disagreement on what criteria to adopt in evaluating oral production. Since the elements of speaking are numerous and not easy to identify, there is also disagreement on how to weight each factor.


Weir, Cyril J. says that in testing Speaking skill:

“… to determine whether learners can communicate orally, it is necessary to get them to take part in direct spoken language activities. We are no longer interested in testing whether candidates merely know how to assemble sentences in the abstract: we want candidates to perform relevant language tasks and adapt their speech to the circumstances, making decisions under time pressure, implementing them fluently, and making any necessary adjustments as unexpected problems arise.” (2005: p. 103)

Heaton, J. B. summarizes the issue of testing speaking in a more comprehensive manner:

“Testing the ability to speak is a most important aspect of language testing. However, at all stages beyond the elementary levels of mimicry and repetition it is an extremely difficult skill to test, as it is far too complex a skill to permit any reliable analysis to be made for the purpose of objective testing. Questions relating to the criteria for measuring the speaking skills and to the weighting given to such components as correct pronunciation remain largely unanswered. It is possible for people to produce practically all the correct sounds but still be unable to communicate their ideas appropriately and effectively. On the other hand, people can make numerous errors in both phonology and syntax and yet succeed in expressing themselves fairly clearly. Furthermore, success in communication often depends as much on the listener as on the speaker: a particular listener may have a better ability to decode the foreign speaker’s message or may share a common nexus of ideas with him or her, thereby making communication simpler. Two native speakers will not always, therefore, experience the same degree of difficulty in understanding the foreign speaker.” (1990: p. 88)

Speaking ability tests are generally acknowledged to be the most challenging of all language tests to prepare, administer and score, because the nature of speaking skills is not well defined. The most problematic area of speaking ability tests is deciding on the criteria to evaluate oral communication. The main aim of speaking ability tests is to determine whether the test takers have the ability to communicate accurately and effectively in real-life situations. Unfortunately, these tests necessitate a great amount of time to administer and score.

Weir, Cyril J. emphasizes the communicative aspect of speaking skills by stating:

“Testing speaking ability offers plenty of scope for meeting the criteria for communicative testing, namely that: tasks developed within this paradigm should be purposive, interesting and motivating, with a positive washback effect on teaching that precedes the test; interaction should be a key feature; there should be a degree of intersubjectivity among participants; the output should be to a certain extent unpredictable; a realistic context should be provided and processing should be done in real time. Perhaps more than in any other skill there is the possibility of building into a test a number of the dynamic characteristics of actual communication”. (1990: p. 73)

In real-life contexts, people usually listen to a verbal stimulus and speak in response to that stimulus. “From a pragmatic view of language performance, listening and speaking are almost always closely interrelated. Only in limited contexts of speaking (monologues, speeches, or telling a story and reading aloud) can we assess oral language without the aural participation of an interlocutor.” (Brown, Douglas H., 2004: p. 140)

The examiner(s) may adopt two main sets of assessment criteria in speaking exams: a scale of overall or global competence, and a scale of competence in particular sub-skills of speaking. These sub-skills may consist of discourse management, interactive communication, grammar and vocabulary, and pronunciation.

Burgess, S., Head, K. (2005: p. 105-108) state that: “Discourse management involves the ability to control language over more than a single utterance, and to express ideas and opinions in coherent, connected speech.” The examiner(s) should not expect the candidates to be fully fluent, but it is important that the candidates go on speaking despite some hesitation and searching for words. “Interactive communication is the ability to engage in conversation or discussion. The main skills of interactive communication are appropriate turn-taking, initiating, and responding at the required speed and in the correct rhythm.” (2005: p. 105-108) The examiner(s) may also assess the range and accuracy of the grammatical and lexical forms used by the candidate(s). Since it is a speaking test and not a grammar or a vocabulary test, the examiner(s) should try to concentrate more on the speaking skills of the candidate(s). “Pronunciation is assessed in all speaking exams, in relation to both production of individual sounds and control of prosodic features (stress, rhythm, and intonation).” (2005: p. 105-108)
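The sub-skill ratings discussed above can be combined into a single mark by weighting each criterion. The sketch below is not taken from any exam board; the equal weights and the 0-5 rating scale are assumptions chosen purely for illustration:

```python
# Illustrative sketch (not an exam board's actual procedure): combining
# analytic sub-skill ratings into one speaking score with hypothetical weights.

# Hypothetical equal weights for the four sub-skills; in practice these
# would be fixed in the test specifications.
WEIGHTS = {
    "discourse_management": 0.25,
    "interactive_communication": 0.25,
    "grammar_and_vocabulary": 0.25,
    "pronunciation": 0.25,
}

def weighted_score(ratings):
    """Return the weighted average of sub-skill ratings (each on a 0-5 scale)."""
    assert set(ratings) == set(WEIGHTS), "every sub-skill must be rated"
    return sum(WEIGHTS[skill] * rating for skill, rating in ratings.items())

# One examiner's ratings for a hypothetical candidate:
candidate = {
    "discourse_management": 4,
    "interactive_communication": 3,
    "grammar_and_vocabulary": 4,
    "pronunciation": 3,
}
print(weighted_score(candidate))  # 3.5 with equal weights
```

Agreeing such weights with co-examiners before the exam, as suggested below, also supports the reliability of the scoring.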

The channel of communication also needs to be considered because it makes a great difference to the performance of the test-taker whether s/he is required to carry out a face-to-face conversation or asked to simulate a telephone conversation with an interlocutor in a different room. Unless the test-takers are expected to perform in such settings in their future careers (as air traffic controllers are, for instance), such a testing procedure will not be a proper one.

The context in which the communication process takes place also carries significance. “The important role of context as a determinant of communicative language ability is paramount. Language cannot be meaningful if it is devoid of context (linguistic, discoursal and sociocultural).” (Weir, Cyril J., 1993: p. 36) Therefore, a meaningful and acceptable context needs to be presented to the test takers during the testing process.

Prior to the designing stage of the speaking test, it is necessary to decide whether to focus more on accuracy or fluency. It is recommended by Coombe, C., et al. (2007) to focus equally on fluency and accuracy. As long as there is no breakdown in the communication process, it is advisable to ignore or play down the problems. Reflecting this within your assessment criteria prior to the exam and deciding upon the assessment criteria with your colleagues (if they will also participate in the speaking test) will enhance the reliability of the test.

In order to test oral production skills of the learners, it is necessary to have appropriate criteria for the assessment. The micro- and macro-skills of speaking explained below will certainly be extremely useful for creating such assessment criteria. Producing smaller chunks of language such as phonemes, morphemes, words, collocations, and phrasal units makes up the micro-skills of speaking. Larger elements like fluency, discourse, function, style, cohesion, nonverbal communication and strategic options are included within the macro-skills of speaking. Micro-skills of speaking are:

 Producing differences among English phonemes and allophonic variants.

 Producing chunks of language of different lengths.

 Producing English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours.

 Producing reduced forms of words and phrases.

 Using an adequate number of lexical units (words) to accomplish pragmatic purposes.

 Producing fluent speech at different rates of delivery.

 Monitoring one’s own oral production and using various strategic devices – pauses, fillers, self-corrections, backtracking – to enhance the clarity of the message.


 Using grammatical word classes (nouns, verbs, etc.), systems (e.g., tense agreement, pluralization), word order, patterns, rules, and elliptical forms.

 Producing speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents.

 Expressing a particular meaning in different grammatical forms.

 Using cohesive devices in spoken discourse.

Macro-skills, on the other hand, are:

 Appropriately accomplishing communicative functions according to situations, participants, and goals.

 Using appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations.

 Conveying links and connections between events and communicating such relations as focal and peripheral ideas, events and feelings, new information and given information, generalization and exemplification.

 Conveying facial features, kinesics, body language, and other nonverbal cues along with verbal language.

 Developing and using a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well your interlocutor is understanding you.

(Brown, H. D., 2004: p. 142-143)

2.8. Why Test Speaking?

Speaking is a complex skill in that the speaker has to use different abilities such as pronunciation, grammar, vocabulary, fluency, and comprehension simultaneously. There are also some contextual and interactional factors to be considered. Likewise, testing speaking has its own difficulties; however, every effort should be made to pay as much attention to the assessment of the speaking skill as to the other language skills. As is known, English has now become a global language which is studied by many people worldwide. In communicative language teaching, speaking is viewed as the most neglected skill, so it should be emphasized in the language curriculum and “…if we value communication skills, we must assess them or we send a double message to our students about what we consider to be important.” (Coombe, C., et al. 2007: p. 113)

According to Birjandi, P., et al.:

“… we test in order to give the learners a sense of achievement, to end their dissatisfactions and frustrations, to foster learning through diagnosing the problematic areas, to enhance learning by making the learners aware of the course objectives, to adjust the learner’s personal goals, to give promotion, and to show the effectiveness and efficiency of instruction, etc. Tests may gauge the teacher’s ability and, in general, they serve a two-fold instructional purpose: as a guide to the learners, and as a guide to the teacher.” (2001: p. 9-10)


2.9. Differences between Writing and Speaking Skills

A common misconception among most people is that the language of speech is very much the same as the language of writing. Buck, G. (2001) argues that this is not true. Although speech and writing are both variants of the same linguistic system, there are some considerable differences between them and these differences need to be taken into consideration both in teaching and testing procedures.

Writing and speaking are the two productive skills, and there are many similarities as well as differences between them. First of all, speaking is more ephemeral than writing unless the performances of the test takers are recorded. While students use full, complex, and well-organized sentences in writing, they tend to utter incomplete, simple and loosely organized sentences when speaking. The test takers make use of discourse markers to help the reader when they write; however, when speaking, they frequently use fillers to facilitate their speech. They do not know who will read and score their written texts, but most of the time there is face-to-face communication in speaking tests.

Weigle, S.C. (2002) supports the view that written language is permanent and it is possible to read and reread it many times, but oral language is transitory and needs to be processed in real time. Writers have the chance to plan, review and revise their output whereas speakers have to plan, formulate and deliver their utterances within a few seconds, except for certain special circumstances such as oral presentations or lectures. Another outstanding characteristic of written language is that it usually involves complex and longer sentences whereas oral language often has shorter sentences connected by coordinators. On the other hand, oral language has certain advantages over written language. For instance, the speaker may make use of many devices such as stress, intonation, pitch, volume and pausing to enhance the message, whereas in written language these devices are not available. Moreover, in written language there is a gap between the writer and the reader in terms of time and space, and this removes the shared context between the participants of the communication. Çakır, H. states on this issue:


“While speaking, the students, I have often observed in my class, struggle to make up sentences like those of a book. Almost few native speakers of English or man of letters speak in perfect sentences. Bookish delivery is appropriate to books, exactly to written medium of language. The participants in the written communication do not see each other. The reader is not able to ask any question to the writer of the book when he does not understand a given point. The nature of written communication involves relatively longer and grammatically perfect sentences. The literary work is therefore verbatim and redundant in style.” (2008: p. 1414)

Another difference between speaking and writing is that idea units rather than sentences are usually encountered in speech. Idea units refer to short phrases and clauses, grammar of which is much simpler and more basic than that of written language with its long sentences and dependent and subordinate clauses. As the process of speaking occurs in real time, both the speaker and the listener have memory limitations and this creates the need for simpler and shorter sentences or idea units. Buck, G. supports this view by adding:

“One important point, for example, is that people do not usually speak in sentences. Rather, spoken language, especially in informal situations, consists of short phrases or clauses, called idea units, strung together in a rather loose way, often connected more by the coherence of the ideas than by any formal grammatical relationship. The vocabulary and the grammar also tend to be far more colloquial and much less formal. There are many words and expressions that are only used in speech, never in writing.” (2001: p. 9)

2.10. The Relationship between Teaching and Testing Processes

According to Farhady H. (2006), there exists a close relationship between teaching and testing methods. The teaching method adopted by the teacher makes it necessary to employ corresponding testing methods. In other words, what the teacher teaches determines what s/he will test and, likewise, what s/he tests will affect what s/he teaches. In this context, it is possible to predict that a teacher will utilize translation and essay type items if s/he adopts the Grammar Translation method. Likewise, the Audio-Lingual method will probably require discrete-point test items and the Notional-Functional Approach will entail functional test items.

Hughes argues that the proper relationship between teaching and testing is one of partnership, and that a breakdown in this cooperation will lead to harmful backwash:

“The proper relationship between teaching and testing is surely that of partnership. It is true that there may be occasions when the teaching programme is potentially good and appropriate but the testing is not; we are then likely to suffer from harmful backwash. … But equally there may be occasions when the teaching is poor or inappropriate and when testing is able to exert a beneficial influence. We cannot expect testing only to follow teaching. Rather, we should demand of it that it is supportive of good teaching and, where necessary, exerts a corrective influence on bad teaching.” (2003: p. 2)

The relationship between testing and teaching is so close that it is virtually impossible to work in either field without being constantly concerned with the other. One reason for designing tests is to reinforce learning and to motivate the student or to assess the student’s performance in the language. “In the former case, the test is geared to the teaching that has taken place, whereas in the latter case the teaching is often geared largely to the test.” (Heaton, J. B., 1990: p. 5)

Testing is a natural step in teaching and learning processes. The widest view on this issue is that teaching and testing are the two sides of the same coin; that is, they are very closely interrelated. “… testing itself can be a teaching device. Testing is not an end in itself. Rather, it is another step forward.” (Birjandi, P. et al. 2001: p. 2)


2.11. Prejudices and Problems about Testing Speaking Skill

It is notoriously difficult to test speaking due to several factors such as lack of time, large numbers of students, lack of available tests, and administrative difficulties. Resource requirements, high subjectivity in the scoring process, and the resulting threats to the reliability and practicality of scoring are other main challenges. As there are so many issues to be considered, many language teachers do not even attempt to assess speaking.

Great care should be taken by the teachers when they design and apply a test in order not to cause problems or inconveniences. Birjandi, P., et al. argue that:

“Testing is a delicate and complex responsibility … because decisions are going to be made based on the test results. That is, appropriate decision making requires exact information, and exact data may be obtained through accurate testing and evaluation. If the decisions are not appropriate, they may influence pupils’ future lives badly.” (2001: p. 1-2)

2.12. Planning Evaluation

Without proper planning, evaluation will certainly become useless and ineffective. The issue of evaluation should be taken into consideration right at the beginning of the school year and it should be an integral part of planning each lesson or unit. As mentioned above, teaching and evaluation processes should be considered together in order to ensure that the teaching process lends itself to the evaluation process and that the results of evaluation can direct ongoing instructional planning. If the planning of evaluation process is not integrated into the instructional planning at the beginning of the school year, there will be problems such as lack of time for testing and lack of reliability, etc. When planning evaluation, the following questions need to be answered by the tester:


 What will I assess?

 When will I assess?

 How will I assess?

 How will I record the results of my assessment?

(Genesee, F., Upshur, John A., 1996: p. 44-45)

The answers to the questions above make up the outline of the evaluation process. Without that outline, the test will not assess what it aims to, and will therefore be a poor test. Luoma, S. calls this outline ‘test specifications’ and explains:

“Anyone who develops a speaking assessment will have ideas about what kind of speaking it will focus on, how the assessment will be done, and what the rating criteria will be. The written version of these ideas is called the test specifications, or specs for short. The specifications contain the developers’ definition of the construct(s) assessed in the test, and detailed definitions of the tasks and rating criteria to guide the development of comparable tasks and the delivery of fair ratings. The specifications record the rationale for why the assessment focuses on certain constructs, and how the tasks and criteria operationalise them.” (2004: p. 113)
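As a rough illustration of what such specifications might record, the fragment below sketches one possible structured form; the construct, tasks, timings and criteria are all invented examples, not a prescription:

```python
# Illustrative sketch: a minimal, machine-readable record of the kind of
# information test specifications contain. All values are invented.
specs = {
    "construct": "interactive speaking ability in everyday situations",
    "tasks": [
        {"name": "interview", "minutes": 5, "format": "examiner-candidate"},
        {"name": "paired discussion", "minutes": 4, "format": "candidate-candidate"},
    ],
    "rating_criteria": [
        "discourse management",
        "interactive communication",
        "grammar and vocabulary",
        "pronunciation",
    ],
    "rationale": "course objectives emphasise real-time face-to-face communication",
}

# Total speaking-test time implied by the task list:
total_minutes = sum(task["minutes"] for task in specs["tasks"])
print(total_minutes)  # 9
```

Keeping the specifications in such an explicit form makes it easier to develop comparable tasks and to brief co-examiners on the rating criteria.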

2.13. What Should Teachers Know about Assessment?

The close relationship between teaching and testing means that for an effective teaching process to take place, it is necessary to design a proper test. Most of the time, it is the classroom teacher, rather than a body such as a testing office, who is responsible for designing, administering and scoring the test, which makes it compulsory for the classroom teacher to be a good tester. In order to write effective test items, administer the test without experiencing problems and score the tests reliably, the classroom teacher needs to keep the following points in mind:

 Teachers should be skilled in choosing assessment methods appropriate for instructional decisions.

 Teachers should be skilled in developing assessment methods appropriate for instructional decisions.

 Teachers should be skilled in administering, scoring and interpreting the results of both externally-produced and teacher-produced assessment methods.

 Teachers should be skilled in using assessment results when making decisions about individual students, planning teaching, developing curriculum and institutional improvement.

 Teachers should be skilled in developing, using and evaluating valid student grading procedures which use student assessments.

 Teachers should be skilled in communicating assessment results to students, educational decision makers and other concerned stakeholders.

 Teachers should be skilled in recognizing unethical, illegal and otherwise inappropriate assessment methods and uses of assessment information.

(Brindley, G., 2000: p. 128)

Furthermore, Köksal suggests in his online article (2004) that teachers should not use a technique that has not been used in the teaching process as a test technique, so as to ensure a positive washback effect of testing on language learning and teaching. Another point to consider is that teachers should test learners’ writing skills by having them write and their speaking skills by having them speak, which is known as ‘construct validity’.

2.14. What Should Students Know about Speaking Tests?

The candidates feel a great deal of anxiety prior to entering speaking tests and this will certainly have a negative impact on their performance. Most of the time, they perform worse than they would otherwise do. However, knowing what to expect in a speaking exam may allay the anxiety on the part of the test-takers and thus, the test becomes more reliable. “It is important that students know beforehand exactly what test procedure will be adopted and how the examiners will behave.” (Burgess, S., Head, K., 2005: p. 103-105)

A sample video illustrating both the exam procedure and the level of performance required can be shown to students; in this way, the students may get a clue as to what to expect. Moreover, the candidates generally feel nervous on entering the room to take the test. First of all, the examiner(s) should do their best to put the candidate at ease by welcoming him/her and showing him/her where to sit. The examiner(s) should give each candidate enough time and try not to interrupt him/her even if s/he misunderstands and wanders away from the actual topic. Burgess, S., Head, K. suggest the following list to be followed by the test takers in order to improve their performance on the test:

 Arrive in good time.

 Find someone to talk to in English while you are waiting to go into the exam room.

 When the examiner greets you, reply appropriately.

 When you sit down, take your time to make yourself comfortable. The examiner will not begin until you say you are ready.


 Listen carefully to what the examiner says, and ask him/her politely to repeat it if there is something you do not understand.

 Show the examiner what you can do. Do not just give one-word answers to questions.

 If you are taking the exam together with another candidate, be sure to give each other enough time to speak.

 Remember that the examiner wants you to do well. He/She is not trying to scare you or trip you up.

 At the end of the exam, thank the examiner, and say goodbye.

(2005: p. 11)

2.15. Backwash

‘Backwash’ (also known as ‘washback’) is the effect of testing on teaching and learning, and it can be harmful or beneficial. “If a test is regarded as important, if the stakes are high, preparation for it can come to dominate all teaching and learning activities. And if the test content and testing techniques are at variance with the objectives of the course, there is likely to be harmful backwash.” (Hughes, A., 2003: p. 1) Danesi argues that a test has harmful backwash when training for a particular test comes to dominate classroom work: teachers teach one thing while the test concentrates on another, or teachers end up teaching to the test.

For example, if the students are following a curriculum concentrating on language skills such as writing and speaking, but are to be tested only by multiple choice items, then they are likely to practice and focus on multiple choice items and ignore writing or speaking. This is something that contradicts the motto – test what you teach – and a common example of what is called negative or harmful backwash.

In contrast, imagine that, for the situation above, a new test was devised and the students were required to perform tasks similar to the ones they would need in their future studies; this change would have an instant effect on the teaching process: the syllabus would be adapted to the test, the text-books would probably be changed, and classes would have to be conducted in a different way. This would be a case of positive or beneficial backwash.

Weir, Cyril J. argues that it is not possible to ignore the importance of the washback effect of tests on teaching. “Problems are always likely to arise if the achievement tests do not correspond to the teaching that has preceded them. Students nurtured in a heavily structuralist approach are unlikely to perform well on tasks they have not previously met, such as spontaneous spoken interaction tests.” (1993: p. 6)

It should be noted that without integrating speaking tests into the overall evaluation process, it is not possible to make the students regard speaking as an important skill. Unless they view it as such, they will probably not attach much importance to it and, accordingly, will not try to improve their speaking ability. So as to produce positive backwash, the language teacher needs to test his/her students’ speaking skills.

2.16. Qualities of Language Tests

When designing a language test, the most important consideration should be its usefulness, and this is generally defined in terms of six test qualities: reliability, validity, authenticity, interactiveness, impact, and practicality. Each of these contributes to test usefulness; therefore it is not sensible to evaluate them independently of each other. Moreover, as each testing situation is unique, the relative importance of these qualities will vary from one testing situation to another; hence it is better to evaluate test usefulness for specific testing situations.


It is not possible to prescribe the balance of these test qualities in advance because it can only be determined for a given testing situation. It is crucial that no single quality be emphasized at the expense of the others. Instead, as Bachman, Lyle F., Palmer, Adrian S. put it, “… we need to strive to achieve an appropriate balance, given the purpose of the test, the characteristics of the TLU (Target Language Use) domain and the test takers, and the way we have defined the construct to be measured.” (1996: p. 38)

Harris D. identifies three qualities which a good test should possess: “All good tests possess three qualities: validity, reliability, and practicality. That is to say, any test that we use must be appropriate in terms of our objectives, dependable in the evidence it provides, and applicable to our particular situation. To be sure, there are other test characteristics which are also of value, but these three constitute the sine qua non, without any one of which a test would be a poor investment in time and money. Whether the teacher is constructing his own test or is selecting a standard instrument for use in his class or school, he should certainly understand what these concepts mean and how to apply them.” (1969: p. 13)

Hughes, A. underlines that “Each testing situation is unique and sets a particular testing problem”. Unfortunately, a particular language test that proves ideal for one purpose may be useless for another. “… and so the first step must be to state this testing problem as clearly as possible.” Whatever test or testing system we then create should be one that:

 consistently provides accurate measures of precisely the abilities in which we are interested;

 has a beneficial effect on teaching;

 is economical in terms of time and money.


2.16.1. Validity

Validity is simply defined as the ability of a given test to measure what it is supposed to measure. Weir, Cyril J. defines it as “… the extent to which a test can be shown to produce data, i.e., test scores, which are an accurate representation of a candidate’s level of language knowledge or skills.” (2005: p. 12) Brown, James D. states that validity is “… the degree to which a test measures what it claims, or purports, to be measuring.” (2005: p. 220) Last but not least, Hughes, A. argues that “… a test is said to be valid if it measures accurately what it is intended to measure.” (2003: p. 26)

Language ability or knowledge is not a concrete and tangible entity, which makes it impossible to measure directly. This being the case, language teachers have no option but to attempt to make inferences about one’s ability through indirect measurement. “A fundamental question, then, is the extent to which this indirect measurement approximates a true indication of language ability. Technically speaking, this property of a test is called validity.” (Farhady, H., 2006: p. 150)

2.16.1.1. Content Validity

According to Heaton, J. B., “This kind of validity depends on a careful analysis of the language being tested and of the particular course objectives. The test should be so constructed as to contain a representative sample of the course, the relationship between the test items and the course objectives always being apparent.” (1990: p. 160) This clearly means that a grammar test, for instance, needs to contain items relating to the knowledge or control of grammar. “In order to judge whether or not a test has content validity, we need a specification of the skills or structures, etc. that it is meant to cover.” (Hughes, A., 2003: p. 26-27) Then it is possible to arrive at a conclusion as to the content validity of the test after comparing the test specifications and the test content. Content validity has great significance because a test that lacks it can be tagged as an inaccurate measure of what it claims to measure. Furthermore, a test lacking in content validity will have a harmful backwash effect.

2.16.1.2. Criterion-related Validity

Criterion-related validity refers to “… the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate’s ability. This independent assessment is thus the criterion measure against which the test is validated.” (Hughes, A., 2003: p. 27-28) According to Birjandi, P., et al., “… two tests are correlated and the coefficient of correlation is the validity. Therefore, this is an empirical method of determining validity.” (2001: p. 93) For example, suppose there are 50 students to be tested in an oral interview. Providing each student with a 45-minute interview would yield higher validity but, for practical reasons, only 10 minutes can be devoted to each student. The question is whether devoting 10 minutes instead of 45 minutes to each student will have a negative effect on the validity of the test. One way to find out is to choose a random sample of the students and expose them to the full 45-minute oral interview; this serves as the criterion against which the shorter test is judged. The students’ performances on both tests are then compared, and if the result is a high level of agreement, the 10-minute test may be considered to have criterion-related validity.
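The ‘level of agreement’ between the 10-minute scores and the 45-minute criterion scores is typically expressed as a correlation coefficient. The following sketch computes Pearson’s r from scratch; the score lists are invented purely for illustration:

```python
# Illustrative sketch: Pearson correlation between scores from a short
# (10-minute) interview and the full 45-minute criterion interview.
# The score lists below are invented for illustration only.
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores (out of 100) for a random sample of candidates:
short_test = [62, 75, 58, 90, 70, 81]
criterion  = [65, 78, 55, 92, 68, 85]

r = pearson_r(short_test, criterion)
print(round(r, 2))  # a value near +1 indicates high agreement
```

A coefficient close to +1 would suggest that the shorter interview ranks the candidates much as the full-length criterion interview does.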

2.16.1.3. Validity in Scoring

During the preparation stage of the exam, the designer pays great attention to the validity of the test. However, this alone is not sufficient to make the test valid. In other words, “… if a test is to have validity, not only the items but also the way in which the responses are scored must be valid. It is no use having excellent items if they are scored invalidly.” (Hughes, A., 2003: p. 32-33) For example, a reading test may require short written responses. When scoring the test, the scorer may take spelling and grammar into consideration; however, since the test is intended to measure reading comprehension, doing so will make the scoring, and thus the test, invalid.


2.16.1.4. Face Validity

Face validity is related to the appearance of the test and “… if a test item looks right to other testers, teachers, moderators, and testees, it can be described as having at least face validity. It is, therefore, often useful to show a test to colleagues and friends.” (Heaton, J. B., 1990: p. 159) Birjandi, P., et al. suggest that “For example, a test of reading comprehension which contains dialect words which might be unknown to the students lacks face validity.” (2001: p. 92)

2.16.1.5. How to Make Tests More Valid

When a test is administered, it will certainly affect many people both directly and indirectly, so maximum care should be taken regarding its validity. A test which lacks validity is a poor tool for measuring the intended skills of the learners and will cause more problems than benefits. Hughes, A. advises the following to make tests more valid:

“First, write explicit specifications for the test which take account of all that is known about the constructs that are to be measured. Make sure that you include a representative sample of the content of these in the test. Second, whenever feasible, use direct testing. If for some reason it is decided that indirect testing is necessary, reference should be made to the research literature to confirm that measurement of the relevant underlying constructs has been demonstrated using the testing techniques that are to be employed. Third, make sure that the scoring of responses relates directly to what is being tested. Finally, do everything possible to make the test reliable. If a test is not reliable, it cannot be valid.” (2003: p. 33-34)

2.16.2. Reliability

Another important quality of a good test is reliability. A reliable test is, as Madsen, Harold S. suggests, “… one that produces essentially the same results consistently on different occasions when the conditions of the test remain the same.” (1983: p. 179) According to Coombe, C., et al., “reliability refers to the consistency of test scores, which simply means that a test would offer similar results if it were given at another time.” (2007: p. XXXIII) If a test is said to be reliable, it means that one should be able to depend on the results it produces.

It is clear that test takers will get somewhat different scores on a test if it is administered on different occasions, such as on a Tuesday instead of a Monday. Even if the test is well designed, the conditions of administration are more or less identical, the scoring process does not require any kind of judgment on the part of the scorers, and no learning or forgetting takes place during the one-day interval, it is not possible to expect every test taker to get the same score on the Tuesday as s/he got on the Monday. This kind of uniform behavior cannot be expected of human beings because they usually do not behave in the same way on every occasion, even if the context seems to be identical. In this respect, it is impossible to have complete trust in any set of test scores. According to Hughes, A.:

“This is inevitable and we must accept this. What we have to do is construct, administer and score tests in such a way that the scores actually obtained on a test on a particular occasion are likely to be very similar to those which would have been obtained if it had been administered to the same students with the same ability, but at a different time. The more similar the scores, the more reliable the test is said to be.” (2003: p. 36)

2.16.2.1. Scorer Reliability

Some kinds of tests (objective tests), like multiple choice tests, do not call for any kind of personal judgment on the part of the scorer(s). However, other kinds of tests (subjective tests), such as oral interviews, require some degree of judgment on the part of the scorer(s), even if there are checklists or guidelines on which to base their scores. It is quite possible for the same scorer to give different grades to the same performance at different times, which relates to intra-rater reliability. Similarly, it is quite possible for different scorers to give different grades to the same performance, which relates to inter-rater reliability.
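One common way to summarize such scorer consistency on band scores is Cohen's kappa, which corrects the raw agreement between two sets of ratings for agreement expected by chance. The sketch below is a hypothetical illustration, not a procedure from this study; the band scores are invented.

```python
# Sketch: Cohen's kappa for two sets of ratings of the same speaking
# performances (the same rater on two occasions, or two different raters).
from collections import Counter

def cohens_kappa(ratings1, ratings2):
    """Chance-corrected agreement between two lists of categorical ratings."""
    n = len(ratings1)
    # Proportion of performances given identical ratings both times.
    observed = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    c1, c2 = Counter(ratings1), Counter(ratings2)
    # Agreement expected if the two sets of ratings were independent.
    expected = sum(c1[band] * c2[band] for band in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical band scores (A-E) awarded to ten interviews on two occasions.
first  = ["B", "C", "A", "D", "B", "C", "C", "E", "B", "A"]
second = ["B", "C", "A", "C", "B", "C", "B", "E", "B", "A"]

print(f"kappa = {cohens_kappa(first, second):.2f}")
```

A kappa of 1.0 indicates perfect agreement, while 0 indicates no agreement beyond chance; values in between quantify how far a rater's judgments are consistent.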

Figures

Table 4.1. Courses and their weekly hours in the SOFL
