Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole Publishing Company.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Bahmanabadi, S., Falsafinejad, M., Farrokhi, N., & Minaei, A. (2024). The role of test unidimensionality violation in equating errors of IRT and classical theory models. Quarterly of Educational Measurement, 14(56), 7–41. https://doi.org/10.22054/jem.2024.49153.1991 [in Persian]
Berenbon, R. F., & McHugh, B. C. (2023). Do subject matter experts’ judgments of multiple-choice format suitability predict item quality? Educational Measurement: Issues and Practice, 42(3), 13–21.
Crocker, L. M., & Algina, J. (2008). Introduction to classical and modern test theory. Cengage Learning.
Dorans, N. J., & Holland, P. W. (2000). Population invariance and the equitability of tests: Basic theory and the linear case. Journal of Educational Measurement, 37(4), 281–306.
Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. Professional Psychology: Research and Practice, 40(5), 532–538. https://doi.org/10.1037/a0015808
Ferrara, S., Svetina, D., Skucha, S., & Davidson, A. H. (2011). Test development with performance standards and achievement growth in mind. Educational Measurement: Issues and Practice, 30(4), 3–15. https://doi.org/10.1111/j.1745-3992.2011.00218.x
González Burgos, J. A., & Wiberg, M. (2017). Applying test equating methods: Using R. Springer.
Goodwin, L. D. (1996). Focus on quantitative methods: Determining cut-off scores. Research in Nursing & Health, 19, 249–256.
Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are and how to use them. Educational and Psychological Measurement, 66(6), 930–944.
Hair, J. F., Jr., Black, W. C., Babin, B. J., & Anderson, R. E. Multivariate data analysis (7th ed.).
Hair, J. F., Jr., Hult, G. T. M., Ringle, C. M., Sarstedt, M., Danks, N. P., & Ray, S. (2021). Partial least squares structural equation modeling (PLS-SEM) using R: A workbook. Springer Nature.
Hambleton, R. K., & Jirka, S. J. (2014). Anchor-based methods for judgmentally estimating item statistics. In Handbook of test development (pp. 413–434). Routledge.
Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 187–220). Westport, CT: American Council on Education/Praeger.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Kiessling, C., Lahner, F.-M., Winkelmann, A., & Bauer, D. (2018). When predicting item difficulty, is it better to ask authors or reviewers? Medical Education, 52(5), 571–572. https://doi.org/10.1111/medu.13570
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking: Methods and practices (3rd ed.). Springer. https://doi.org/10.1007/978-1-4939-0317-7
Liang, Z., Zhang, M., Huang, F., Kang, D., & Xu, L. (2021). Application innovation of educational measurement theory, method, and technology in China’s New College Entrance Examination Reform. Chinese/English Journal of Educational Measurement and Evaluation, 2(1), Article 3.
Livingston, S. A. (2014). Equating test scores (without IRT). Educational Testing Service.
Lord, F. M. (2012). Applications of item response theory to practical testing problems (A. Delavar & J. Younesi, Trans.). Tehran: Roshd Publications. (Original work published 1980) [in Persian]
MoghadamZade, A. (2015). Optimal smoothing method of data in test equating: The case of TOLIMO and comprehensive trial tests of Iran Educational Testing Organization. Quarterly of Educational Measurement, 6(21), 261–287. [in Persian]
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York, NY: McGraw-Hill.
Penfield, R. D. (2013). Item analysis. In K. F. Geisinger, B. A. Bracken, J. F. Carlson, J.-I. C. Hansen, N. R. Kuncel, S. P. Reise, & M. C. Rodriguez (Eds.), APA handbook of testing and assessment in psychology: Vol. 1. Test theory and testing and assessment in industrial and organizational psychology (pp. 121–138). American Psychological Association. https://doi.org/10.1037/14047-007
Raykov, T. (2001). Bias of coefficient α for fixed congeneric measures with correlated errors. Applied Psychological Measurement, 25(1), 69–76.
Rezigalla, A. A. (2024). AI in medical education: Uses of AI in construction type A MCQs. BMC Medical Education, 24(1), Article 247.
Sadeghi, M., Falsafinezhad, M., Delavar, A., Farrokhi, N., & Jamali, E. (2018). The effect of the weighting model and the composite score of academic records on the efficiency of selecting candidates to enter the universities and higher education centers of the country. Journal of Research in Educational Systems, 12(Special Issue), 27–43. [in Persian]
Schmeiser, C. B., & Welch, C. J. (2006). Test development. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 307–353). Westport, CT: American Council on Education/Praeger.
Sun, T., & Kim, S. Y. (2023). Evaluating equating methods for varying levels of form difference. Educational and Psychological Measurement. Advance online publication. https://doi.org/10.1177/00131644231176989
Tarrant, M., & Ware, J. (2008). Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Medical Education, 42(2), 198–206. https://doi.org/10.1111/j.1365-2923.2007.02957.
Thorndike, R. L. (1996). Applied psychometrics (H. A. Hooman, Trans.). Houghton Mifflin School. (Original work published 1982)
van de Watering, G., & van der Rijt, J. (2006). Teachers’ and students’ perceptions of assessments: A review and a study into the ability and accuracy of estimating the difficulty levels of assessment items. Educational Research Review, 1(2), 133–147. https://doi.org/10.1016/j.edurev.2006.05.001
Viladrich, C., Angulo-Brunet, A., & Doval, E. (2017). A journey around alpha and omega to estimate internal consistency reliability. Anales de Psicología, 33(3), 755–782.
Wendler, C. L., & Walker, M. E. (2015). Practical issues in designing and maintaining multiple test forms. In Handbook of test development (pp. 433–449). Routledge.