INTRODUCTION
Accountability and assessment literacy are now regarded as fundamental requirements for all teachers and play vital roles in teacher education programs (Xu & Brown, 2016). According to Scarino (2013), the effectiveness of a language program depends heavily on a deep understanding, clear awareness, and careful implementation of assessment techniques. The implementation of diverse assessment strategies to evaluate and enhance student performance has been a focus of attention in the field of English Language Teaching. Assessment is highly advantageous in education: it not only reflects teachers' success in teaching but also shows learners' progress and improvement in the classroom. Moreover, according to Öz and Atay (2017), assessment helps teachers to “recognize what is wrong, what is right, and what parts need to be changed, improved, or omitted” (p. 26). Integrating assessment and instruction to support, monitor, and report students’ learning and to demonstrate educational standards is recommended to teachers throughout the world (DeLuca, LaPointe-McEwan, & Luhanga, 2016; Gotch & French, 2014). In an effort to enhance teachers' classroom assessment methods, many researchers (e.g., DeLuca, LaPointe-McEwan, & Luhanga, 2016; Gotch & French, 2014; Mertler & Campbell, 2005) developed tools and instruments to investigate and monitor teachers' assessment literacy. In their systematic review of assessment literacy measures, however, Gotch and French (2014) found little psychometric evidence to support these measures and concluded that existing instruments lack representativeness and relevance of content in light of developments in the assessment field. These findings are not surprising considering that most assessment literacy instruments are based on the early 1990s assessment standards (i.e., Standards for Teacher Competence in Educational Assessment of Students [STCEAS]; American Federation of Teachers [AFT], National Council on Measurement in Education [NCME], & National Education Association [NEA], 1990; DeLuca, LaPointe-McEwan, & Luhanga, 2016; Gotch & French, 2014).
Using the 1990 Standards as a guideline, instruments such as the Teachers Assessment Literacy Questionnaire (TALQ; Plake, Impara, & Fager, 1993) and the Classroom Assessment Literacy Inventory (CALI; Mertler, 2003) were developed to investigate teachers’ assessment literacy levels. These instruments were designed to evaluate teachers' assessment knowledge and to highlight their strengths and weaknesses in assessment literacy. Although the strong and weak areas identified in these studies differed across samples, the overall finding was that teachers' assessment knowledge was often inadequate relative to the standards and expectations. Brookhart (2011) argued that the 1990 assessment standards no longer properly account for the diversity of assessment activities, the assessment expertise required of teachers in the current educational landscape, or their assessment needs.
LITERATURE REVIEW
Previous Instruments of Teachers’ Assessment Literacy
Several studies (e.g., DeLuca, LaPointe-McEwan, & Luhanga, 2016; Gotch & French, 2014; Mertler & Campbell, 2005; Plake et al., 1993) have investigated teachers’ and students’ perceptions of assessment literacy. Most of these studies were quantitative and based on the original 1990 Standards for Teacher Competence in Educational Assessment of Students (AFT, NCME, & NEA, 1990).
Mertler and Campbell (2005) developed the Assessment Literacy Inventory (ALI) to help teachers and school administrators establish a reliable and valid process of grading students. They also intended the inventory to serve teachers' professional development and in-class assessment.
Considering language assessment literacy from the viewpoint of teachers, Rezaeifard and Tabatabaei (2018) investigated 52 Iranian EFL teachers’ perceptions of assessment literacy, using Mertler’s (2003) Classroom Assessment Literacy Inventory (CALI) as the main instrument. Their analysis showed that the majority of the participants had a low level of assessment literacy. Furthermore, in a mixed methods study of 16 in-service and pre-service Iranian EFL teachers, Dehqan and Asadian Sorkhi (2020) found that years of teaching experience played a vital role in teachers’ knowledge of assessment: in-service teachers were more assessment literate than pre-service ones. They also reported that teachers were not interested in implementing assessment literacy skills in their classes and therefore suggested that both practical and theoretical concepts of assessment literacy be incorporated into teacher education programs.
Considering students’ perceptions of assessment literacy in the foreign language context of Iran, Brown, Pishghadam, and Sadafian (2014) used the Students’ Conceptions of Assessment (SCoA) inventory to examine learners' conceptions of assessment. Their findings showed that the 760 Iranian university students who participated held both positive and negative attitudes toward assessment; in other words, although assessment might improve both learning and teaching, it might also hinder learning development. This conclusion is partially supported by Tong and Adamson’s (2015) study, conducted in the foreign language teaching context of Hong Kong, which revealed that most students agreed that feedback helped their learning but were not satisfied with their teachers' feedback and held negative feelings toward the concept of assessment.
Considering teachers’ assessment competency and skills, Plake et al. (1993) conducted a large-scale study investigating the assessment skills of 555 teachers and 268 administrators from 45 states in the USA. The Teacher Competencies Assessment Questionnaire (TCAQ), a 35-item instrument, was designed in the first phase of the study to assess the seven competency criteria. The instrument was reviewed by a 10-member NCME panel to establish construct validity before it was pilot-tested with 70 instructors to obtain reliability estimates (Crocker & Algina, 2006). As the first study to investigate teacher assessment competency, it revealed considerable gaps in teachers' understanding and implementation of assessment, reflected in a 66% average score across the 35 items. In particular, participants lacked skill in reading, integrating, and communicating assessment data. The findings were later used to inform continuing professional development programs. Building on Plake et al.’s (1993) study, O'Sullivan and Johnson (1993) employed the TCAQ with 51 graduate students enrolled in a teacher assessment course. The course introduced students to performance-based assignments tied to the Standards (AFT, NCME, & NEA, 1990). In addition to the pre- and post-TCAQ administrations, participants completed the Classroom Assessment Tasks (CAT) survey to examine alignment between the Standards and the performance assessment tasks. Responses to the Classroom Assessment Tasks indicated substantial alignment of the performance tasks with the Standards, providing further validity evidence for the TCAQ. In a similar study, Campbell, Murphy, and Holt (2002) administered a revised version of the TCAQ to 220 undergraduate students enrolled in a pre-service measurement course. They found that teacher candidates' proficiency varied across the seven Standards and concluded that teacher candidates lacked crucial elements of competency as they entered the teaching profession.
Moreover, Mertler and Campbell (2004, 2005) collaborated on reconceptualizing the TCAQ into the Assessment Literacy Inventory (ALI). Their goal was to contextualize the items by reorganizing them into scenario-based questions, reflecting a more realistic approach to the Standards (AFT, NCME, & NEA, 1990). The ALI consisted of seven scenarios, each tied to one of the Standards and accompanied by a set of five multiple-choice items. Like O'Sullivan and Johnson (1993), Mertler and Campbell (2004, 2005) administered the ALI to instructors participating in a measurement course to assess student learning in reference to the Standards.
Moreover, in the area of critical language assessment, Tajeddin, Khatib, and Mahdavi (2022) recently developed an inventory to assess EFL teachers’ Critical Language Assessment Literacy (CLAL). The CLAL scale consists of five factors: (a) teachers’ knowledge of assessment objectives, scopes, and types; (b) assessment use consequences; (c) fairness; (d) assessment policies; and (e) national policy and ideology.
Taken together, the relevant studies indicate that a reliable and valid scale capturing teachers’ language assessment literacy needs is still lacking and highly needed.
PURPOSE OF THE STUDY
Assessment now serves as a vital component of the language teaching process. It enables teachers to improve or change their instructional practices and also helps them evaluate students’ progress and achievement in learning (Harris, Irving, & Peterson, 2008). Assessment literacy in foreign/second language learning and teaching is crucially important since it enables language teachers to understand, analyze, and utilize assessment information to improve their instruction (Falsgraf, 2005; Scarino, 2013). Furthermore, knowledge of assessment literacy helps language teachers choose the most effective and appropriate instruments to assess students’ learning and progress (Siegel & Wissehr, 2011).
Despite the increasing importance of language assessment literacy, teachers’ needs in the assessment literacy landscape have remained unexplored in many educational contexts. Moreover, although most teachers are involved in assessment-related decision-making and spend much of their professional time developing and designing assessment tasks and activities, their assessment literacy is still not satisfactory (Brookhart, 2011; DeLuca & Klinger, 2010; Galluzzo, 2005; Popham, 2009; Zhang & Burry-Stock, 1997). Therefore, the aim of this study is to develop a valid and reliable instrument that is representative of teachers’ needs in assessment literacy, specifically those of Iranian EFL high school teachers. In particular, an instrument is required that answers teachers' current needs within the existing educational accountability system and accounts for the numerous dimensions of assessment literacy beyond merely addressing assessment purposes.
In particular, this study drew on Fulcher’s (2012) assessment literacy framework as a basis for delineating EFL teachers’ assessment literacy needs.
METHOD
Participants
Since Iran is a geographically vast country, it was not possible to collect data from every province and municipality. Therefore, a convenience sampling strategy, in which participants are selected on the basis of their accessibility, was employed for data collection. Guided by Krejcie and Morgan’s (1970) sample size table, 159 Iranian EFL high school teachers working at public schools in different cities participated in the study. Most of the participants had majored in English language teaching; a few had studied linguistics, translation, or English literature. All had more than five years of teaching experience and held different university degrees (BA, MA, or PhD).
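For reference, Krejcie and Morgan's (1970) table is derived from the following sample size formula (reproduced here as a reader aid; it was not part of the original sampling procedure):

s = \frac{\chi^{2} N P (1 - P)}{d^{2}(N - 1) + \chi^{2} P (1 - P)}

where s is the required sample size, N the population size, P the population proportion (assumed to be .50 to maximize the required sample), d the degree of accuracy (typically .05), and \chi^{2} the chi-square value for one degree of freedom at the desired confidence level (3.841 at the .05 level).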
The Instrument-Developing Procedure
To create a complete assessment literacy inventory that is representative of EFL teachers’ current assessment literacy needs, the researchers adopted a multistep development method. To be more specific, the researchers (a) conducted a document analysis of prior and current assessment standards to aid them in early item construction, and (b) gathered validity data to support the intended interpretations and applications of the instrument. The 2014 Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, and NCME, 2014) describe five sources of validity evidence (content, response processes, internal structure, relationship to other variables, and consequences). This article presents evidence of validity based on content and internal structure (construct).
According to Dörnyei (2007), developing a standardized questionnaire is a challenging procedure that involves three stages: (a) initial item development, (b) initial piloting of the items, and (c) final piloting and item analysis.
Initial Item Development
The first stage of questionnaire development is collecting as many potential items as possible for each section, creating a collection of items called the “item pool”. In doing so, the researchers used two sources: (a) a review of the current literature on language assessment literacy, including similar questionnaires and Fulcher’s (2012) framework of language assessment literacy, from which some appropriate items were borrowed from published questionnaires and properly acknowledged; and (b) interviews with EFL teachers about their needs and challenges in assessment literacy.
Since Fulcher’s (2012) language assessment literacy framework was the basis of this study, the item pool consisted of items derived from this framework as well as items based on EFL teachers’ expectations of Continuing Professional Development (CPD) programs regarding assessment literacy (97 items in total). The items addressed EFL teachers’ knowledge and principles of assessment, familiarity with test processes, skills and abilities to apply this knowledge in real situations, and their expectations of CPD programs.
In creating the item pool, the form of the items, their wording, and the types of responses that the questionnaire was designed to elicit were taken into account. All items used a five-point Likert scale on which participants indicated the extent of their need in assessment literacy: (1) not at all, (2) very little, (3) little, (4) moderate, and (5) a lot.
Evidence of Validity Based on Content: Expert-panel Review
Content validity refers to the degree to which a measurement encompasses the dimensions of the concept under investigation; an instrument is deemed valid if it covers all the associated features of that concept. Accordingly, the second stage of developing a questionnaire is the initial piloting of the item pool, with the purpose of reducing the large list of items gathered in the previous stage to the intended final number. To do so, the researchers asked a panel of experts, including EFL head teachers, university instructors, and experts in assessment literacy, to go through the items and provide feedback. They were asked to check the face and content validity of the instrument and, if necessary, to change, add, or remove items.
As Morgado et al. (2017) noted, expert judges are well-versed in the topic of interest and/or scale development, whereas target population judges are potential scale users. Eleven university instructors and eighteen EFL head teachers who were experts in assessment literacy went through the items; based on their feedback, 21 items were removed, and the total number of items decreased to 76.
Evidence Based on Internal Structure: Pilot Testing
The third stage is the final piloting and item analysis. Based on the feedback received from the panel of experts, the researchers assembled a near-final version of the questionnaire that seemed satisfactory and free of glitches. Subsequently, they created an online version of the questionnaire using Google Docs. The link to the questionnaire was sent to the target group through social media networks. As participants completed and submitted the questionnaire, their responses were automatically stored in a database on the web server, from which they were downloaded into Microsoft Excel. In total, 159 questionnaires were used for data analysis.
Item analysis is the final stage of questionnaire development. To fine-tune and finalize the questionnaire, the researchers subjected the pilot group’s (EFL high school teachers’) responses to statistical analysis: they checked for missing responses and possible signs that the instructions had been misunderstood, and they examined the range of responses elicited by each item, the internal consistency of the multi-item scales, and the factor structure. Based on the results of the reliability and factor analyses, the researchers excluded the items that did not work properly and retained the items best suited to the purpose of the study.
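The screening steps described above can be illustrated with a minimal sketch; it is not the authors' actual script, and the file name "responses.xlsx" and the use of pandas are assumptions made here for illustration only.

import pandas as pd

# Load the Excel export of the 159 submitted questionnaires (hypothetical file name)
df = pd.read_excel("responses.xlsx")

# Check for missing responses per item and per respondent
print(df.isna().sum())               # missing values per item
print(df.isna().sum(axis=1).max())   # worst case per respondent

# Inspect the range of responses elicited by each item; items with no variance or an
# implausibly narrow range may signal that the instructions were misunderstood
print(df.describe().loc[["min", "max", "std"]])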
A heterogeneous sample, i.e., a sample that reflects and captures the range of the target population, was selected for the purpose of piloting.
RESULTS
Construct Validity Analysis
The construct validity of a questionnaire may be verified using factor analysis (Bornstedt, 1977; Ratray & Jones, 2007). A questionnaire has construct validity when all of its items represent the underlying construct. Based on the relationships between variables (in this study, the questionnaire items), exploratory factor analysis identifies the constructs, i.e., factors, that underlie a dataset (Field, 2009; Rietveld & Van Hout, 2011; Tabachnick & Fidell, 2007). The underlying constructs are assumed to be those that explain the greatest share of the variance common to the variables. Factor analysis, unlike the frequently used principal component analysis, does not assume that all variance within a dataset is shared (Costello & Osborne, 2005; Field, 2009; Rietveld & Van Hout, 2011; Tabachnick & Fidell, 2007). Factor analysis is therefore assumed to be a more reliable method for questionnaire evaluation than principal component analysis (Costello & Osborne, 2005).
Initially, the factorability of the 76 TALNs items was examined using several well-recognized criteria for the factorability of a correlation matrix. First, most items correlated with at least some other items, suggesting reasonable factorability. Second, a sufficiently large sample is required to undertake a credible factor analysis (Costello & Osborne, 2005; Field, 2009; Tabachnick & Fidell, 2007). To determine whether the sample size was large enough, the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO) was calculated. According to Field (2009, p. 647), the KMO "represents the ratio of the squared correlation of variables to the squared partial correlation of variables”. Table 1 presents the estimated KMO and Bartlett’s Test of Sphericity for the present study.
Table 1. KMO and Bartlett’s Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .730
Bartlett's Test of Sphericity: Approx. Chi-Square = 16540 (1.654E4); df = 2850; Sig. = .000
Based on Field (2009), when the KMO is near 0, it is difficult to extract a factor; conversely, when the KMO is close to 1, factors can most likely be extracted, since the opposite pattern holds. KMO “values between 0.5 and 0.7 are average, values between 0.7 and 0.8 are acceptable, values between 0.8 and 0.9 are excellent, and values beyond 0.9 are exceptional” (Field, 2009, p. 647).
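Field's verbal definition corresponds to the following standard formulation of the overall KMO statistic (provided here as a reader aid; it is not taken from the article):

KMO = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}}

where r_{ij} are the zero-order correlations and p_{ij} the partial correlations between pairs of items; values approach 1 when the partial correlations are small relative to the raw correlations.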
As Table 1 shows, the KMO of the present study is 0.730, which is acceptably close to 1 and justifies the sampling adequacy for factor analysis (Kaiser, 1974; Pallant, 2020; Field, 2009; Tabachnick & Fidell, 2007); the sample size was therefore large enough. Moreover, the communalities were all above 0.3, and Bartlett’s test of sphericity was significant (p < .05), meaning that the variables are correlated highly enough to provide a reasonable basis for factor analysis. Given these indicators, factor analysis was deemed suitable for all 76 items.
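As an illustration of how these factorability checks and the exploratory factor analysis reported in the next paragraph can be carried out, the following minimal sketch uses the open-source factor_analyzer package; it is not the authors' actual script, and the file name "responses.xlsx" is an assumption for illustration.

import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

df = pd.read_excel("responses.xlsx")   # 159 teachers x 76 Likert items

# Factorability checks corresponding to Table 1
chi_square, p_value = calculate_bartlett_sphericity(df)
kmo_per_item, kmo_total = calculate_kmo(df)
print(f"Bartlett chi2 = {chi_square:.1f}, p = {p_value:.3f}, overall KMO = {kmo_total:.3f}")

# Four-factor EFA with Varimax rotation
fa = FactorAnalyzer(n_factors=4, rotation="varimax")
fa.fit(df)
loadings = pd.DataFrame(fa.loadings_, index=df.columns, columns=[1, 2, 3, 4])

# Suppress loadings below .30 and flag items loading on three or four components
salient = loadings.abs() >= 0.30
print(loadings.where(salient).round(3))
print("Cross-loading items:", list(loadings.index[salient.sum(axis=1) >= 3]))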
The 76 items relating to EFL teachers’ assessment literacy needs were then subjected to Exploratory Factor Analysis (EFA) with Varimax rotation. The analysis yielded four components explaining a total of 53.545% of the variance in the entire set of variables. Component 1 was labeled knowledge and basics of Language Assessment Literacy (LAL) because of the high loadings of the items related to this issue. The second component was labeled practical effects of Language Assessment Literacy (LAL) on real situations because of the high loadings of the items related to the consequences of assessment on different issues. Component 3 was labeled principles and processes of Language Assessment Literacy (LAL) because of the high loadings of the items related to testing processes and principles. These component labels were based on Fulcher’s (2012) assessment literacy framework. Finally, the fourth component was labeled needs of Continuing Professional Development (CPD) programs because of the high loadings of the items related to EFL teachers’ expectations of CPD programs. After rotation, to improve clarity, variables with loadings lower than 0.3 were considered to have a non-significant impact on a factor and were therefore omitted (Field, 2009). Items loading on three or four components were also omitted (11 items in total). Table 2 displays the items and factor loadings for the rotated factors.
Table 2. Rotated component matrix
Item | As an EFL teacher I need to know more about … | Component | Loading
57 | the use and interpretation of both descriptive and inferential statistics | 1 | .906
30 | the knowledge of the process of conducting item analysis | 1 | .841
15 | the use of advanced statistics (e.g., Classical True Score theory, Generalizability theory, Item Response theory, SEM) | 1 | .828
53 | the history of language testing (pre-scientific, psychometric, structuralist, sociolinguistic-pragmatic) | 1 | .828
71 | research methods in setting up experiments in testing (e.g., quantitative, qualitative, and mixed-methods approaches) | 1 | .823
54 | the basic concepts of language testing and assessment (e.g., tests, measurement, evaluation, test use, test type, test format) | 1 | .821
29 | the knowledge of the process of conducting test analysis | 1 | .816
75 | knowing scales of measurement (e.g., nominal, ordinal, interval and ratio scale) | 1 | .794
2 | test validity and its different forms (e.g., predictive, concurrent, content, construct, face, response) | 1 | .788
3 | test reliability and its different forms (e.g., test-retest, parallel forms, split-halves, Kuder-Richardson formulae, Cronbach’s alpha, scorer reliability) | 1 | .769
16 | the use of more modern statistical tests (e.g., Multilevel modelling, Autoregressive SEM models, Latent growth curve modelling, Time series approaches, Event history analysis) | 1 | .738
35 | doing pre-test (item facility, item discrimination, choice distribution) | 1 | .677
68 | theories of testing (traditional testing, discrete-point testing, integrative testing, communicative testing) | 1 | .663
7 | testing models and frameworks | 1 | .648
18 | features of developing a good test | 1 | .636
19 | authenticity in test development | 1 | .587
9 | different types of test score interpretations (norm-referenced and criterion-referenced interpretation) | 1 | .587
17 | the principles of rubric development | 1 | .574
26 | interactiveness in testing (interaction between test takers’ characteristics and test tasks) | 1 | .551
28 | accountability (obligation of teachers to accept responsibility for students’ performance) | 1 | .540
70 | test critique (critical evaluation of tests) | 1 | .536
27 | learners’ preparation to take a test | 1 | .491
55 | different types of tests and designs of assessments for all four language skills (i.e., reading, writing, speaking, listening) | 4 | .488
6 | the differences between testing, assessment and measurement | 1 | .459
8 | different types of tests and assessments and their usages (objective versus subjective, essay type versus multiple choice) | 1 | .456
44 | the practical effects of assessment on students’ performance | 2 | .889
52 | the use of test scores and interpretations in educational programs | 2 | .848
45 | the social consequences of tests | 2 | .831
49 | the psychological consequences of tests (e.g., memory improvement, students’ learning style) | 2 | .819
43 | the practical effects of assessment literacy on teachers’ teaching strategies | 2 | .789
50 | the responsibility of test takers | 2 | .783
48 | the educational consequences of tests (e.g., educational decisions, reforming the curriculum) | 2 | .778
51 | the use and effects of tests on educational programs | 2 | .748
46 | the political consequences of tests (e.g., educational policies) | 2 | .730
47 | the economic consequences of tests | 2 | .704
37 | the effects of using different platforms of online assessment on educational programs (e.g., testmoze, google doc, Monta) | 2 | .637
38 | the effects of different types of tests on learning and teaching | 2 | .586
63 | various platforms for online assessment | 4 | .517
34 | the process of developing and using personal response assessments (e.g., checklists, journals, videotapes, audiotapes, self-assessment, teacher observation, portfolios, conferences, diaries) | 3 | .499
59 | the principles of developing a good test | 4 | .430
66 | different types of tests and their functions and effects | 2 | .413
69 | the use of alternative assessment | 2 | .386
64 | the effects of using various computer software programs for test construction, test analysis and test scoring on educational programs | 2 | .359
72 | the effect of test taking strategies on learning and teaching | 2 | .316
21 | the process of developing a good test and test specifications | 3 | .789
22 | the principles of educational measurement | 3 | .773
23 | the principles of using tests in society | 3 | .698
25 | the effect of tests on teaching/learning (washback) | 3 | .664
5 | the design of assessments for productive skills (speaking and writing) | 4 | .529
20 | test bias and analyzing it in test designs (e.g., cultural background, ethicality, sex, native language, background knowledge) | 3 | .512
11 | ethical issues in assessment | 3 | .502
32 | the process of administering oral/written exams | 4 | .483
60 | the responsibility of test takers and test givers | 4 | .435
33 | the process of developing and using constructed-response assessments (e.g., fill in the blank, short answer) | 3 | .402
41 | the process of test administration | 4 | .671
56 | different types and uses of test scores and their interpretation in educational programs | 4 | .625
40 | the functions of tests (achievement, proficiency, aptitude, selection, placement, diagnosis) | 4 | .623
39 | different interpretations of tests | 4 | .582
76 | administering and scoring oral and written exams | 4 | .582
42 | the process of writing test specifications | 4 | .507
65 | the process of making assessment real and personal | 4 | .497
31 | the process of administering and scoring computer-based testing | 4 | .410
73 | providing test security | 4 | .391
58 | the ethical issues in assessment | 4 | .389
4 | the design of assessments for receptive skills (reading and listening) | 4 | .355
After rotation, the first part of the questionnaire, which refers to EFL teachers’ needs to know more about Language Assessment Literacy (LAL) basics and history, comprised 24 items. Table 3 presents this part.
Table 3. The items related to Factor 1 (knowledge of LAL)
Item | As an EFL teacher how much training do you need on …? | Factor Loading
1 | the use and interpretation of both descriptive and inferential statistics | .906
2 | the knowledge of the process of conducting item analysis | .841
3 | the use of advanced statistics (e.g., Classical True Score theory, Generalizability theory, Item Response theory, SEM) | .828
4 | the history of language testing (pre-scientific, psychometric, structuralist, sociolinguistic-pragmatic) | .828
5 | research methods in setting up experiments in testing (e.g., quantitative, qualitative, and mixed-methods approaches) | .823
6 | the basic concepts of language testing and assessment (e.g., tests, measurement, evaluation, test use, test type, test format) | .821
7 | the knowledge of the process of conducting test analysis | .816
8 | knowing scales of measurement (e.g., nominal, ordinal, interval and ratio scale) | .794
9 | test validity and its different forms (e.g., predictive, concurrent, content, construct, face, response) | .788
10 | test reliability and its different forms (e.g., test-retest, parallel forms, split-halves, Kuder-Richardson formulae, Cronbach’s alpha, scorer reliability) | .769
11 | the use of more modern statistical tests (e.g., Multilevel modelling, Autoregressive SEM models, Latent growth curve modelling, Time series approaches, Event history analysis) | .738
12 | doing pre-test (item facility, item discrimination, choice distribution) | .677
13 | theories of testing (traditional testing, discrete-point testing, integrative testing, communicative testing) | .663
14 | testing models and frameworks | .648
15 | features of developing a good test | .636
16 | authenticity in test development | .587
17 | different types of test score interpretations (norm-referenced and criterion-referenced interpretation) | .587
18 | the principles of rubric development | .574
19 | interactiveness in testing (interaction between test takers’ characteristics and test tasks) | .551
20 | accountability (obligation of teachers to accept responsibility for students’ performance) | .540
21 | test critique (critical evaluation of tests) | .536
22 | learners’ preparation to take a test | .491
23 | the differences between testing, assessment and measurement | .459
24 | different types of tests and assessments and their usages (objective versus subjective, essay type versus multiple choice) | .456
The second part of the questionnaire, which refers to EFL teachers’ needs to know more about the effects of Language Assessment Literacy (LAL) on real-life situations, comprised 16 items. Table 4 presents the items of this part.
Table 4. The items of Factor 2 (effects of LAL)
Item | As an EFL teacher how much do you need to know about …? | Factor Loading
1 | the practical effects of assessment on students’ performance | .889
2 | the use of test scores and interpretations in educational programs | .848
3 | the social consequences of tests | .831
4 | the psychological consequences of tests (e.g., memory improvement, students’ learning style) | .819
5 | the practical effects of assessment literacy on teachers’ teaching strategies | .789
6 | the responsibility of test takers | .783
7 | the educational consequences of tests (e.g., educational decisions, reforming the curriculum) | .778
8 | the use and effects of tests on educational programs | .748
9 | the political consequences of tests (e.g., educational policies) | .730
10 | the economic consequences of tests | .704
11 | the effects of using different platforms of online assessment on educational programs (e.g., testmoze, google doc, Monta) | .637
12 | the effects of different types of tests on learning and teaching | .586
13 | different types of tests and their functions and effects | .413
14 | the use of alternative assessments | .386
15 | the effects of using various computer software programs for test construction, test analysis and test scoring on educational programs | .359
16 | the effect of test taking strategies on learning and teaching | .316
The third part of the questionnaire, which refers to EFL teachers’ needs to know more about the principles and processes of Language Assessment Literacy (LAL), comprised 8 items. Table 5 presents the items of this part.
Table 5. The items of Factor 3 (processes of LAL)
Item | As an EFL teacher how much do you need to know about …? | Factor Loading
1 | the process of developing a good test and test specifications | .789
2 | the principles of educational measurement | .773
3 | the principles of using tests in society | .698
4 | the effect of tests on teaching/learning (washback) | .664
5 | test bias and analyzing it in test designs (e.g., cultural background, ethicality, sex, native language, background knowledge) | .512
6 | the process of developing and using personal response assessments (e.g., checklists, journals, videotapes, audiotapes, self-assessment, teacher observation, portfolios, conferences, diaries) | .499
7 | ethical issues in assessment | .502
8 | the process of developing and using constructed-response assessments (e.g., fill in the blank, short answer) | .402
The fourth part of the questionnaire, which refers to EFL teachers’ Language Assessment Literacy (LAL) expectations of CPD programs, comprised 17 items. Table 6 presents the items of this part.
Table 6. The items of Factor 4 (CPD programs)
Item | As an EFL teacher I need to participate in CPD programs to know more about … | Factor Loading
1 | different types of tests and designs of assessments for all four language skills (i.e., reading, writing, speaking, listening) | .488
2 | various platforms for online assessment | .517
3 | the principles of developing a good test | .430
4 | the design of assessments for productive skills (speaking and writing) | .529
5 | the process of administering oral/written exams | .483
6 | the responsibility of test takers and test givers | .435
7 | the process of test administration | .671
8 | different types and uses of test scores and their interpretation in educational programs | .625
9 | the functions of tests (achievement, proficiency, aptitude, selection, placement, diagnosis) | .623
10 | different interpretations of tests | .582
11 | administering and scoring oral and written exams | .582
12 | the process of writing test specifications | .507
13 | the process of making assessment real and personal | .497
14 | the process of administering and scoring computer-based testing | .410
15 | providing test security | .391
16 | the ethical issues in assessment | .389
17 | the design of assessments for receptive skills (reading and listening) | .355
Reliability Analysis
After conducting the Exploratory Factor Analysis (EFA), removing the non-significant items, and grouping the items under their appropriate factors, the internal consistency of the questionnaire was assessed (Field, 2009). Cronbach’s alpha was used to estimate the reliability of the four parts of the questionnaire separately and of the questionnaire as a whole. Cronbach’s alpha is a coefficient used to rate the internal consistency (homogeneity), i.e., the intercorrelation, of the items in a questionnaire. Most measurement experts (e.g., Cortina, 1993; Field, 2009; Garson, 2010) agree that a questionnaire with strong internal consistency shows moderate correlations among its items. According to Field (2009), a questionnaire with an α of 0.8 or more is considered reliable. The reliability indices (Table 7) reveal that all parts of the questionnaire enjoy a high level of internal consistency (a computational sketch is given after Table 7).
Table 7. Cronbach’s Alpha reliability coefficients
Part | Reliability Coefficient (α)
Part 1: EFL teachers’ language assessment literacy knowledge | .916
Part 2: Principles and processes of language assessment literacy | .862
Part 3: Practical aspects of language assessment literacy | .906
Part 4: EFL teachers’ expectations of CPD programs | .854
Total | .901
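The following minimal sketch shows how Cronbach's alpha can be computed for one part of the questionnaire; it is an illustration under the assumption that a part's item responses are available as a numeric array, not the authors' actual analysis script.

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for one scale; rows = respondents, columns = items."""
    k = items.shape[1]                              # number of items in the scale
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of respondents' total scores
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Example call with simulated 5-point responses for a hypothetical 24-item part,
# purely to demonstrate the function; real data would replace this array.
rng = np.random.default_rng(0)
part1 = rng.integers(1, 6, size=(159, 24)).astype(float)
print(round(cronbach_alpha(part1), 3))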
DISCUSSION
Although language assessment literacy is considered essential for teacher development (DeLuca, LaPointe-McEwan, & Luhanga, 2016; Gotch & French, 2014), no validated instrument had been designed for investigating teachers’ assessment literacy needs. With that in mind, this study was an attempt to develop an inventory reflecting EFL teachers' needs in assessment literacy. To fulfill this purpose, exploratory factor analysis was used to examine the construct validity of the proposed inventory, and Fulcher’s (2012) assessment literacy framework was drawn on as the analytic model guiding the study. The EFA results indicated that the inventory is best explained by four components, three of which (language assessment literacy knowledge, principles and processes of language assessment literacy, and practical aspects of language assessment literacy) are based on Fulcher’s framework. The fourth component refers to teachers’ expectations of CPD programs.
The 24 items in the language assessment literacy knowledge factor clearly show the significance of knowing the basics of language assessment literacy for empowering teachers in classroom assessment. Knowing these basics, if taught thoroughly and meticulously, is believed to empower teachers to be not only knowledgeable but also creative (Crocker & Algina, 2006).
The 16 items in the practical aspects of language assessment literacy factor represent the consequences of assessment literacy for learners’ real-life situations. Teachers’ knowledge of the social, educational, and psychological consequences of assessment can enable them to be more precise in making decisions about learners’ futures. Knowing these practical features also helps teachers contextualize assessment activities (Mertler & Campbell, 2004, 2005). Thus, it is not surprising that, in the context of L2 education, teachers need to acquire knowledge of how to assess learners as well as how to make decisions about their future (Richards & Farrell, 2005).
The eight items in the principles and processes of language assessment literacy factor emphasize the role, in teacher education, of knowing the process of developing different types of assessment tasks. Familiarity with test methods, techniques, and processes, together with awareness of test principles and practices including ethics, helps teachers create new and motivating assessment contexts (Jeong, 2013).
The 17 items in the expectations of CPD programs factor reflect teachers’ expectations of CPD programs regarding language assessment literacy. These items also highlight the importance of CPD programs in equipping teachers with newly developed concepts in assessment literacy. Moreover, CPD programs not only equip teachers with learning- and learner-centered assessment tasks but also help them expand their technical knowledge of assessment (Behzadi, Golshan, & Sayadian, 2019).
CONCLUSION AND IMPLICATIONS
Although earlier inventories and surveys of teacher assessment literacy sparked much of this research, numerous scholars have pointed out that these measures are now out of date (e.g., Brookhart, 2011; DeLuca et al., 2016; Gotch & French, 2014), and none of them reflects teachers’ assessment literacy needs and expectations. In the absence of an instrument investigating teachers’ needs in assessment literacy, it has not been possible to quantify this construct in operational terms. Thus, the present study was conducted to design and validate an instrument unique to the EFL context.
The TALNs developed in this study is a new inventory for research and professional development in the field of teacher assessment literacy. It also gives practitioners and researchers in the realm of language assessment literacy direction for providing teachers with professional development programs based on their needs and expectations. This study provides initial validity and reliability evidence to support the TALNs as a helpful indicator of teachers' assessment literacy needs and their expectations of CPD programs. TALNs data can spark crucial studies on teachers' current assessment literacy levels and needs and can serve as a database for targeted professional learning.
As a result, we offer the TALNs in this paper as an instrument for use and further development by researchers and educational practitioners in the service of improving teachers' language assessment literacy and understanding their needs.
This study was an initial step towards identifying Iranian EFL teachers’ language assessment literacy needs, and further research is needed to make the instrument more comprehensive and generalizable. To this end, other EFL contexts such as language institutes, universities, and other areas of education can be considered in future studies. For triangulation, researchers can also benefit from other data gathering tools such as think-aloud protocols, interviews, and observation. Additionally, the present study drew on Fulcher’s (2012) language assessment literacy framework; other studies can adopt other assessment frameworks as their guiding analytic model. Finally, other statistical analyses such as Confirmatory Factor Analysis (CFA) or Rasch modeling can be employed to confirm the factors and item assignments.
Disclosure statement
No potential conflict of interest was reported by the authors.