
CHAPTER 4

4.2. Recommendations

…was found. In particular, higher statistical power rates were obtained when polytomous scoring was used together with the LORDIF analysis.

Overall, when the dichotomous and polytomous scoring models were compared, Type I error rates were lower and statistical power rates were higher under polytomous scoring. Accordingly, it was concluded that polytomous scoring is more successful in ability estimation and that the scoring model may have an effect on DIF results.

…cannot provide information about the underlying cause. Studies could therefore be conducted that focus on the response behaviors of individuals in the different groups examined for DIF, the thinking processes they go through while answering an item, and the relationship between all of these and DIF.

In this study, the MH and LORDIF DIF analyses were used for the dichotomous scoring model, and the ANOVA and LORDIF analyses for the polytomous scoring model. Future studies could make comparisons using other DIF detection analyses. Likewise, while this study used the Rasch and partial credit (PCM) models, comparisons within the scope of DIF could also be carried out with other IRT models.
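Both detection steps can be illustrated in R, the environment in which the LORDIF procedure is implemented. The sketch below is not the study's actual script: the data are randomly generated, the object names are hypothetical, and the difMH() and lordif() settings shown are the packages' documented defaults.

    # Minimal R sketch: MH for dichotomous scores, LORDIF for polytomous ones.
    # Hypothetical layout: 600 examinees, 20 items, reference/focal groups R/F.
    library(difR)    # provides difMH()
    library(lordif)  # provides lordif()

    set.seed(1)
    N <- 600; I <- 20
    grp   <- rep(c("R", "F"), each = N / 2)
    theta <- rnorm(N)
    # Dichotomous 0/1 and polytomous 0-4 scores driven by the same trait
    dich <- sapply(1:I, function(j) as.integer(theta + rnorm(N) > 0))
    poly <- sapply(1:I, function(j)
      findInterval(theta + rnorm(N), c(-1.5, -0.5, 0.5, 1.5)))

    # Mantel-Haenszel DIF test on the dichotomous scores ("F" = focal group)
    mh  <- difMH(Data = dich, group = grp, focal.name = "F")

    # Iterative ordinal logistic regression / IRT hybrid on the polytomous scores
    lor <- lordif(resp.data = poly, group = grp, criterion = "Chisqr", alpha = 0.01)

Because no DIF is built into these artificial data, any item flagged by either call would be a Type I error.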

In this study, five score categories (4-3-2-1-0) were used for the polytomous scoring model. Other studies could increase or decrease the number of categories and make DIF comparisons against dichotomous scoring.

Comparisons could also be carried out among different polytomous scoring models.
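As a concrete reference point, the five-category partial credit scoring used in this study can be simulated directly. A minimal base-R sketch follows; the step parameters are illustrative assumptions, and changing the length of the step vector changes the number of categories, which is exactly the manipulation suggested above.

    # Minimal R sketch: draw one 0-4 response under the partial credit model.
    # 'deltas' holds four illustrative step parameters -> five score categories.
    sim_pcm_item <- function(theta, deltas) {
      # Cumulative logits: psi_0 = 0, psi_h = sum over k <= h of (theta - delta_k)
      psi <- c(0, cumsum(theta - deltas))
      p   <- exp(psi - max(psi))   # subtract max to avoid overflow
      p   <- p / sum(p)            # PCM category probabilities
      sample(0:length(deltas), size = 1, prob = p)
    }

    set.seed(1)
    deltas <- c(-1.0, -0.3, 0.4, 1.2)                      # assumed step values
    x <- sapply(rnorm(10), sim_pcm_item, deltas = deltas)  # ten 0-4 scores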

This study used sample sizes of 600, 1200, and 2400 and group sample size ratios of 1:1 and 1:2. Future studies could repeat the design with smaller and larger sample sizes and with other sample size ratios (see the sketch after the next paragraph).

In this study, test length was held fixed at 20 items. Other studies could treat test length as a manipulated condition and examine its effect on Type I error and statistical power rates within the scope of DIF.
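Extending the design in these directions amounts to enlarging the simulation's condition grid. A minimal sketch, assuming the 1:1 and 1:2 ratios describe reference-to-focal group sizes and adding test length as a hypothetical manipulated factor (the 10- and 40-item levels are illustrative):

    # Minimal R sketch: cross the study's conditions with a test-length factor.
    conds <- expand.grid(
      N           = c(600, 1200, 2400),  # total sample sizes used in this study
      ratio       = c("1:1", "1:2"),     # group sample size ratios used here
      test_length = c(10, 20, 40)        # illustrative added factor
    )

    # Split a total N into two group sizes for a given ratio string
    split_groups <- function(N, ratio) {
      w <- as.numeric(strsplit(ratio, ":")[[1]])
      round(N * w / sum(w))
    }
    split_groups(1200, "1:2")  # 400 and 800 under the assumed reading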

In this study, the ability distribution was not treated as a condition. Future studies could also examine the ability distribution in connection with different scoring models and DIF.

This simulation study used 100 replications; future studies could increase the number of replications. The present study focused on Type I error and statistical power rates, but Type II error could also be used to compare both the different scoring models and the different techniques within the scope of DIF.
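Operationally, both rates are proportions of flagging decisions aggregated over replications: power is the share of truly DIF items flagged, and Type I error the share of DIF-free items falsely flagged. A minimal sketch in which the flag matrix and the identity of the DIF items are placeholders, not this study's results:

    # Minimal R sketch: aggregate DIF flags over 100 replications of 20 items.
    set.seed(1)
    R <- 100; I <- 20
    dif_items <- c(19, 20)                        # hypothetical truly-DIF items
    flags <- matrix(runif(R * I) < 0.05, R, I)    # placeholder decisions
    flags[, dif_items] <- runif(R * length(dif_items)) < 0.70

    type1_error <- mean(flags[, -dif_items])  # false flags on DIF-free items
    power       <- mean(flags[,  dif_items])  # correct flags on DIF items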

To obtain the desired benefit from polytomous scoring models, tests composed of high-quality items and high-quality response options are needed. Techniques for developing such tests and writing high-quality items should therefore be developed, so that the use of polytomous scoring models becomes more widespread.

Overall, the analyses conducted under the polytomous scoring model yielded lower Type I error rates and higher statistical power rates. That is, items that truly contain DIF are more likely to be flagged as DIF under the polytomous scoring model, and items that are truly DIF-free are less likely to be falsely flagged. Accordingly, the polytomous scoring model can be said to produce sounder results within the scope of DIF. It is therefore recommended to use scoring models that distinguish full knowledge, partial knowledge, and misinformation, thereby reducing the loss of information about individuals' ability.


APPENDICES
